funpack.parsing.variable_expression
This module contains functions for parsing conditional and logical expressions, and the VariableExpression class for representing a parsed expression.
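VariableExpression | The VariableExpression class is a convenience class which can be used to parse and access an expression.
parseVariableExpression | Parses a string containing an expression.
variablesInExpression | Given an expression returned by parseVariableExpression(), extracts all variables used in the expression.
calculateVariableExpressionEvaluationOrder | Identifies hierarchical relationships between variables.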
For a given variable, the ParentValues column of the variable table may contain one or more expressions, which define conditions that parent variables of the variable must meet in order for the variable value to be replaced. This module contains logic for parsing and evaluating a single expression. The evaluation of multiple comma-separated expressions is handled in the importing module.
An expression comprises one or more conditional statements (or statements for short). A statement has the form:
variable operator value
where:
variable is the ID of a parent variable of the variable in question. Variable IDs must be an integer preceded by the letter v.
operator is a comparison operator (e.g. equals, greater than, etc.).
value is one of:
- 'na', indicating missing,
- a numeric value, against which the parent variable is to be compared, or
- a non-numeric value (i.e. a string), against which the parent variable is to be compared. The value must be quoted with either single or double quotes.
The following comparison operators are allowed (the symbols used in a statement can be found in the SYMBOLS dictionary):
equal to
not equal to
greater than
greater than or equal to
less than
less than or equal to
The equal to and not equal to operators may be used with a value of 'na' to test whether the values for a variable are missing or present respectively. Similarly, the equal to and not equal to operators may be used with a non-numeric value to test for string equality.
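The statement rules above can be illustrated with a small standalone sketch. This is not funpack's parser (which is built with pyparsing); the regular expression, the parse_statement name, and the choice to represent a missing value as None are all assumptions made for illustration. It also assumes that na is written without quotes in the expression string.

```python
import re

# Hypothetical pattern: 'v' + integer ID, a comparison symbol, then a value.
# Two-character symbols are listed first so '>=' is not read as '>'.
STATEMENT = re.compile(r'^\s*v(\d+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$')


def parse_statement(text):
    """Split one statement into (variable_id, operator, value)."""
    match = STATEMENT.match(text)
    if match is None:
        raise ValueError(f'not a valid statement: {text!r}')
    vid, op, raw = match.groups()

    if raw == 'na':
        value = None            # assumed marker for "missing"
    elif len(raw) >= 2 and raw[0] in '\'"' and raw[-1] == raw[0]:
        value = raw[1:-1]       # quoted string value
    else:
        value = float(raw)      # numeric value; raises ValueError otherwise

    # Only equality tests are described for 'na' and string values.
    if not isinstance(value, float) and op not in ('==', '!='):
        raise ValueError(f'{op} cannot be used with {raw!r}')
    return int(vid), op, value
```

For example, parse_statement('v123 >= 4') yields the variable ID 123, the symbol '>=' and the number 4.0, while parse_statement('v7 > na') is rejected because only the equality operators are described for missing values.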
Multiple conditional statements may be combined with and, or, and not logical operations (specific symbols can be found in the SYMBOLS dictionary), and precedence may be enforced with the use of round brackets.
The any and all operations can be applied to statements which have been evaluated on multiple columns to combine the results column-wise.
- funpack.parsing.variable_expression.SYMBOLS = {'all': 'all', 'and': '&&', 'any': 'any', 'contains': 'contains', 'eq': '==', 'ge': '>=', 'gt': '>', 'le': '<=', 'lt': '<', 'na': 'na', 'ne': '!=', 'not': '~', 'or': '||', 'var': 'v'}
This dictionary contains the symbols for variables and operations that may be used in expressions.
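As a rough illustration of how these symbols fit together, the sketch below assembles an expression string from a copy of the dictionary. The statement and negate helpers are hypothetical and not part of funpack, and whether funpack accepts this exact spacing and bracketing is an assumption.

```python
# Copied from the SYMBOLS dictionary documented above.
SYMBOLS = {'all': 'all', 'and': '&&', 'any': 'any', 'contains': 'contains',
           'eq': '==', 'ge': '>=', 'gt': '>', 'le': '<=', 'lt': '<',
           'na': 'na', 'ne': '!=', 'not': '~', 'or': '||', 'var': 'v'}


def statement(vid, op, value):
    """Format one conditional statement, e.g. statement(123, 'ge', 4)."""
    return f"{SYMBOLS['var']}{vid} {SYMBOLS[op]} {value}"


def negate(expr):
    """Wrap an expression in round brackets and apply the 'not' symbol."""
    return f"{SYMBOLS['not']}({expr})"


# "variable 123 is at least 4, and variable 456 is not missing"
expr = (f"{statement(123, 'ge', 4)} {SYMBOLS['and']} "
        f"{negate(statement(456, 'eq', SYMBOLS['na']))}")
print(expr)   # v123 >= 4 && ~(v456 == na)
```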
- class funpack.parsing.variable_expression.VariableExpression(expr)
Bases: object
The VariableExpression class is a convenience class which can be used to parse and access an expression.
- evaluate(df, cols)
Evaluates this VariableExpression and returns the result.
- Parameters:
df – pandas.DataFrame containing the data.
cols – Dictionary containing { variable : [column_name] } mappings from the variables used in the expression to columns in df. Each mapping may also contain a single column name, instead of a list.
- Returns:
The outcome of the expression - a numpy boolean array.
- property variables
Return a list of all variables used in the expression.
- funpack.parsing.variable_expression.calculateVariableExpressionEvaluationOrder(vids, exprs)
Identifies hierarchical relationships between variables.
Given the variable table, identifies the hierarchical relationship order between all variables, and all parent variables used within their expressions.
- Parameters:
vids – Sequence of variable IDs.
exprs – Sequence of parsed expression functions (as returned by parseVariableExpression()), one for each variable in vids. For each variable, there may be either one expression function, or a sequence of them.
- Returns:
A list of tuples, each containing:
A hierarchy level
A list of all variables at that level
The list is in ascending order by hierarchy level, from most dependent to least dependent.
- funpack.parsing.variable_expression.makeParser()
Generates a pyparsing parser which can be used to parse expressions.
- Returns:
A pyparsing object which can parse an expression.
- funpack.parsing.variable_expression.parseBinary(toks)
Called by the parser created by makeParser(). Parses an expression of the form expression1 [and|or] expression2, where and/or are the corresponding symbols in the SYMBOLS dictionary, and expression1 and expression2 are conditional statements or logical expressions.

Binary expressions expect that the shape of both operands is equal. The number of rows is guaranteed to match (because ultimately the operands come from the same pandas.DataFrame), but the number of columns may differ if, for example, one operand has been calculated from a multi-valued variable, and another from a single-valued variable.

The outcome of this situation can be explicitly controlled in the query by use of the any and all operators, which can be used to collapse the columns of a variable down to a single column.

If this is not explicitly controlled, the default behaviour when the operands of a binary operator have a different number of columns is to collapse both operands down to a single column via the any operator. In other words, the values within each row are combined with a logical "or" operation.

Returns a function which can be used to evaluate the expression.
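The column-collapsing rule can be sketched without pandas or numpy by treating each operand as a list of rows of booleans. The helper names below are hypothetical and the code only mirrors the behaviour described above; it is not taken from funpack, and it assumes both operands are non-empty.

```python
def collapse_any(rows):
    """Collapse each row to one value with a logical "or"."""
    return [any(row) for row in rows]


def collapse_all(rows):
    """Collapse each row to one value with a logical "and"."""
    return [all(row) for row in rows]


def binary_and(left, right):
    """Row-wise "and" of two operands given as lists of rows.

    If the column counts differ, both operands are first collapsed to a
    single column with collapse_any, as in the default described above.
    """
    if len(left[0]) != len(right[0]):
        left = [[v] for v in collapse_any(left)]
        right = [[v] for v in collapse_any(right)]
    return [[a and b for a, b in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


# A three-column (multi-valued) operand and a single-column operand.
multi = [[True, False, False],
         [False, False, False]]
single = [[True],
          [True]]

print(binary_and(multi, single))   # [[True], [False]]
```

Applying collapse_all to the multi-valued operand before combining it gives a stricter result, which is the kind of explicit control the any and all operators provide.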
- funpack.parsing.variable_expression.parseCondition(toks)
Parses a conditional statement of the form:
variable operation value
- where:
variable is a variable identifier
operation is a comparison operation
value is a numeric value
Returns a function which can be used to evaluate the conditional statement. The function is constructed such that it expects a pandas.DataFrame, and will output a boolean numpy array.
- funpack.parsing.variable_expression.parseUnary(toks)
Called by the parser created by makeParser(). Parses an expression of the form [not|any|all] expression, where not/any/all is the corresponding symbol in the SYMBOLS dictionary, and expression is a conditional statement or logical expression.

Returns a function which can be used to evaluate the expression.
- funpack.parsing.variable_expression.parseVariable(toks)
Called by the parser created by makeParser(). Parses a variable identifier, returning an integer ID.
- funpack.parsing.variable_expression.parseVariableExpression(expr)
Parses a string containing an expression.

The expression may contain conditional statements of the form:
variable comparison_operator value
combined with logical expressions using symbols for and, or, and not.

The parseVariableExpression function, given an expression string, will return a function that can be used to evaluate the expression. An expression function expects to be given two arguments:
- A pandas.DataFrame which contains the data on all variables used in the expression
- A dictionary containing {variable : column} mappings from the variables used in the expression to the columns of the data frame.

An expression function will simply return True or False, depending on the outcome of the expression.

Expression functions have a few attributes containing metadata about the expression:
- ftype contains the expression type, either unary (for not, any and all operations), binary (for and/or operations), or condition (for comparison operations)
- operation contains the operation symbol

Boolean and/or functions contain operand1 and operand2 attributes which refer to the expression functions they will be applied to. Similarly, boolean not functions contain an operand attribute which refers to the expression function it will be applied to. Comparison expression functions contain variable and value attributes, which contain the variable name and the value involved in the comparison.
- Parameters:
expr – String containing an expression.
- Returns:
A function which can be used to evaluate the expression.
- funpack.parsing.variable_expression.variablesInExpression(expr)
Given an expression returned by parseVariableExpression(), extracts all variables used in the expression.
- Parameters:
expr – A parsed expression, produced by parseVariableExpression().
- Returns:
A set containing all of the variables that are mentioned in the expression.
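One way such an extraction could work is to walk the metadata attributes that expression functions carry (ftype, operand, operand1, operand2, variable). The sketch below is an assumption about the approach, not funpack's code: stub and collect_variables are hypothetical names, the stubs carry metadata only and cannot be evaluated, and it is assumed that any/all functions expose an operand attribute in the same way as not.

```python
def stub(ftype, operation, **attrs):
    """Build a placeholder expression function carrying metadata only."""
    def fn(df, cols):
        raise NotImplementedError('placeholder: metadata only')
    fn.ftype = ftype
    fn.operation = operation
    for name, val in attrs.items():
        setattr(fn, name, val)
    return fn


def collect_variables(expr):
    """Recursively gather the variables referenced by an expression."""
    if expr.ftype == 'condition':
        return {expr.variable}
    if expr.ftype == 'unary':
        return collect_variables(expr.operand)
    if expr.ftype == 'binary':
        return (collect_variables(expr.operand1) |
                collect_variables(expr.operand2))
    raise ValueError(f'unknown expression type: {expr.ftype!r}')


# Hand-built stand-in for the parsed form of: v1 > 5 && ~(v2 == na)
tree = stub('binary', '&&',
            operand1=stub('condition', '>', variable=1, value=5),
            operand2=stub('unary', '~',
                          operand=stub('condition', '==',
                                       variable=2, value=None)))

print(collect_variables(tree))   # {1, 2}
```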