funpack.parsing.variable_expression
This module contains functions for parsing conditional and logical
expressions, and the VariableExpression class for representing a
parsed expression.
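Summary of the main class and functions:
- VariableExpression: the VariableExpression class is a convenience class which can be used to parse and access an expression.
- parseVariableExpression(): Parses a string containing an expression.
- variablesInExpression(): Given an expression returned by parseVariableExpression(), extracts all variables used in the expression.
- calculateVariableExpressionEvaluationOrder(): Identifies hierarchical relationships between variables.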
For a given variable, the ParentValues column of the variable table may
contain one or more expressions, which define conditions that parent
variables of the variable may meet in order for the variable value to be
replaced. This module contains logic for parsing and evaluating a single
expression; the evaluation of multiple comma-separated expressions is
handled in the importing module.
An expression comprises one or more conditional statements (or statements for short). A statement has the form:
variable operator value
where:
- variable is the ID of a parent variable of the variable in question. A variable ID is an integer preceded by the letter v (e.g. v123).
- operator is a comparison operator (e.g. equal to, greater than, etc.).
- value is one of:
  - 'na', indicating missing,
  - a numeric value against which the parent variable is to be compared, or
  - a non-numeric value (i.e. a string) against which the parent variable is to be compared. The value must be quoted with either single or double quotes.
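As an illustration of the statement form, the following sketch splits one statement into its variable ID, operator and value with a regular expression. It is not FUNPACK's implementation (which uses the pyparsing parser produced by makeParser()); the helper name parse_statement is hypothetical, and the sketch assumes that missing is written as a bare na, that numbers are plain decimals, and that strings use matching single or double quotes.

```python
import re

# Hypothetical pattern for one statement: v<integer> <operator> <value>.
STATEMENT = re.compile(
    r"^\s*v(?P<vid>\d+)\s*"                                     # variable, e.g. v123
    r"(?P<op>==|!=|>=|<=|>|<)\s*"                               # comparison operator
    r"(?P<value>na|[-+]?\d+(?:\.\d+)?|'[^']*'|\"[^\"]*\")\s*$"  # value
)


def parse_statement(text):
    """Split a statement into (variable ID, operator, value).

    Sketch only: an unquoted na becomes None, numbers become floats,
    and quoted strings have their quotes removed.
    """
    match = STATEMENT.match(text)
    if match is None:
        raise ValueError(f"Not a valid statement: {text!r}")
    raw = match.group("value")
    if raw == "na":
        value = None
    elif raw[0] in "'\"":
        value = raw[1:-1]
    else:
        value = float(raw)
    return int(match.group("vid")), match.group("op"), value


print(parse_statement("v123 == na"))    # (123, '==', None)
print(parse_statement("v20 >= 2.5"))    # (20, '>=', 2.5)
print(parse_statement('v7 != "yes"'))   # (7, '!=', 'yes')
```

Listing ">=" and "<=" before ">" and "<" in the operator alternation matters, because the regular expression tries the alternatives from left to right.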
The following comparison operators are allowed (and the symbols used in a
statement can be found in the SYMBOLS dictionary):
- equal to (==)
- not equal to (!=)
- greater than (>)
- greater than or equal to (>=)
- less than (<)
- less than or equal to (<=)
The equal to and not equal to operators may be used with a value of
'na' to test whether the values for a variable are missing or present,
respectively. Similarly, the equal to and not equal to operators may be
used with a non-numeric value to test for string equality or inequality.
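The na and comparison semantics can be sketched on a plain list standing in for one data frame column. This is an illustrative model, not FUNPACK's code: compare() and is_missing() are hypothetical helpers, missing entries are assumed to be float('nan'), and ordinary comparisons are assumed to follow NumPy/pandas behaviour, where NaN is unequal to every value and every ordering comparison with NaN is false.

```python
import math
import operator

# Comparison symbols as listed in the SYMBOLS dictionary.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def is_missing(x):
    """In this sketch, missing entries are stored as float('nan')."""
    return isinstance(x, float) and math.isnan(x)


def compare(column, op, value):
    """Evaluate `column op value` element-wise; value=None stands for na."""
    if value is None:
        if op == "==":
            return [is_missing(x) for x in column]       # missing
        if op == "!=":
            return [not is_missing(x) for x in column]   # present
        raise ValueError("na can only be used with == or !=")
    fn = OPERATORS[op]
    # NaN is unequal to every value, and every ordering comparison
    # involving NaN is False (as in NumPy and pandas).
    return [fn(x, value) for x in column]


column = [1.0, float("nan"), 3.0]
print(compare(column, "==", None))  # [False, True, False]
print(compare(column, "!=", 1.0))   # [False, True, True]
```

Note that under these assumptions a missing entry satisfies "!= 1.0"; a test that should only match present values can be combined with a "!= na" statement.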
Multiple conditional statements may be combined with and, or, and
not logical operations (specific symbols can be found in the
SYMBOLS dictionary), and precedence may be enforced with the use of
round brackets.
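To show how and (&&), or (||), not (~) and round brackets fit together, here is a small recursive-descent parser written in plain Python. It is a sketch under stated assumptions, not FUNPACK's parser (FUNPACK builds a pyparsing parser via makeParser()): it evaluates one row at a time, given as a hypothetical {variable ID: value} dictionary, rather than whole DataFrame columns; it assumes that ~ binds more tightly than &&, which binds more tightly than ||; and it omits the any and all operators.

```python
import math
import operator
import re

OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}

# Token types, tried in order at each position (">=" is listed before ">").
TOKEN = re.compile(r"""\s*(?:
      (?P<var>v\d+) | (?P<op>==|!=|>=|<=|>|<) | (?P<logic>&&|\|\||~)
    | (?P<paren>[()]) | (?P<num>[-+]?\d+(?:\.\d+)?)
    | (?P<str>'[^']*'|"[^"]*") | (?P<na>na\b)
)""", re.VERBOSE)


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected input: {text[pos:]!r}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def condition(vid, op, value):
    """Row -> bool function for one statement; value None stands for na."""
    if value is None:
        if op not in ("==", "!="):
            raise ValueError("na can only be used with == or !=")

        def test(row):
            x = row.get(vid)  # an absent key also counts as missing
            missing = x is None or (isinstance(x, float) and math.isnan(x))
            return missing == (op == "==")
        return test
    return lambda row: OPS[op](row.get(vid, math.nan), value)


def parse_expression(text):
    """Parse text into a function mapping a {vid: value} row to True/False."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind, expected=None):
        nonlocal pos
        tok_kind, tok_text = peek()
        if tok_kind != kind or (expected is not None and tok_text != expected):
            raise ValueError(f"Expected {expected or kind}, got {tok_text!r}")
        pos += 1
        return tok_text

    def chain(symbol, operand, combine):
        # Left-associative sequence: operand (symbol operand)*
        nonlocal pos
        fn = operand()
        while peek() == ("logic", symbol):
            pos += 1
            fn = combine(fn, operand())
        return fn

    def or_expr():  # loosest binding (assumed)
        return chain("||", and_expr, lambda a, b: lambda row: a(row) or b(row))

    def and_expr():
        return chain("&&", unary, lambda a, b: lambda row: a(row) and b(row))

    def unary():  # ~ x, ( x ), or a single statement
        nonlocal pos
        if peek() == ("logic", "~"):
            pos += 1
            inner = unary()
            return lambda row: not inner(row)
        if peek() == ("paren", "("):
            pos += 1
            fn = or_expr()
            take("paren", ")")
            return fn
        vid, op = int(take("var")[1:]), take("op")
        kind, raw = peek()
        pos += 1
        if kind not in ("na", "num", "str"):
            raise ValueError(f"Expected a value, got {raw!r}")
        value = None if kind == "na" else float(raw) if kind == "num" else raw[1:-1]
        return condition(vid, op, value)

    fn = or_expr()
    if pos != len(tokens):
        raise ValueError(f"Unexpected token: {peek()[1]!r}")
    return fn


row = {1: 1.0, 2: 0.0, 3: 0.0}
print(parse_expression("v1 == 1 || v2 == 2 && v3 == 3")(row))    # True
print(parse_expression("(v1 == 1 || v2 == 2) && v3 == 3")(row))  # False
```

The two calls differ only in their brackets: without them, && groups v2 and v3 first, so the leading v1 == 1 is enough to make the whole expression true.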
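The any and all operations can be applied to statements which have
been evaluated on multiple columns (e.g. for a multi-valued variable), to
collapse the per-column results within each row down to a single column:
any is true for a row if the statement is true in at least one of its
columns, and all is true only if the statement is true in every column.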
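The following sketch models the any and all reductions with nested Python lists in place of a 2-D NumPy array: each inner list is one row, and each element is the outcome of a statement for one column. The data and the collapse() helper are hypothetical.

```python
# Hypothetical data: one variable stored in three columns (e.g. repeated
# measurements); each inner list holds the values for one row/subject.
rows = [[1, 2, 3], [0, 0, 0], [5, 4, 5]]

# Evaluating the statement "v1 > 2" on every column gives one boolean per
# row and column.
result = [[x > 2 for x in row] for row in rows]


def collapse(result, how):
    """Reduce each row of a rows x columns boolean result to one value."""
    reducer = {"any": any, "all": all}[how]
    return [reducer(row) for row in result]


print(collapse(result, "any"))  # [True, False, True]
print(collapse(result, "all"))  # [False, False, True]
```

With a 2-D NumPy boolean array, the equivalent reductions are result.any(axis=1) and result.all(axis=1).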
- funpack.parsing.variable_expression.SYMBOLS = {'all': 'all', 'and': '&&', 'any': 'any', 'contains': 'contains', 'eq': '==', 'ge': '>=', 'gt': '>', 'le': '<=', 'lt': '<', 'na': 'na', 'ne': '!=', 'not': '~', 'or': '||', 'var': 'v'}
This dictionary contains the symbols for variables and operations that may be used in expressions.
- class funpack.parsing.variable_expression.VariableExpression(expr)
Bases: object
The VariableExpression class is a convenience class which can be used to parse and access an expression.
- evaluate(df, cols)
Evaluates this VariableExpression and returns the result.
- Parameters:
df – pandas.DataFrame containing the data.
cols – Dictionary containing { variable : [column_name] } mappings from the variables used in the expression to columns in df. Each mapping may also contain a single column name, instead of a list.
- Returns:
The outcome of the expression, as a numpy boolean array.
- property variables
Return a list of all variables used in the expression.
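Because each entry in cols may be either a single column name or a list of names, one way to handle the mapping is to normalise it to lists before looking up columns. A minimal sketch of that step, assuming column names are strings and using a plain {column name: list of values} dictionary in place of a pandas.DataFrame; normalise_cols(), variable_rows() and the column names are hypothetical:

```python
def normalise_cols(cols):
    """Return a copy of cols in which every variable maps to a list."""
    return {var: [names] if isinstance(names, str) else list(names)
            for var, names in cols.items()}


def variable_rows(table, names):
    """Collect the listed columns of table as one list of values per row."""
    return [list(values) for values in zip(*(table[name] for name in names))]


# Hypothetical table: column name -> list of values (one entry per row).
table = {"1-0.0": [1, 2], "1-1.0": [3, 4], "2-0.0": [0, 1]}
cols = normalise_cols({1: ["1-0.0", "1-1.0"], 2: "2-0.0"})
print(cols)                            # {1: ['1-0.0', '1-1.0'], 2: ['2-0.0']}
print(variable_rows(table, cols[1]))   # [[1, 3], [2, 4]]
print(variable_rows(table, cols[2]))   # [[0], [1]]
```

After normalisation, a single-column variable is just the one-column case of a multi-column variable, so later steps need only one code path.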
- funpack.parsing.variable_expression.calculateVariableExpressionEvaluationOrder(vids, exprs)
Identifies hierarchical relationships between variables.
Given the variable table, identifies the hierarchical relationship order between all variables, and all parent variables used within their expressions.
- Parameters:
vids – Sequence of variable IDs
exprs – Sequence of parsed expression functions (as returned by parseVariableExpression()), one for each variable in vids. For each variable, there may be either one expression function, or a sequence of them.
- Returns:
A list of tuples, each containing:
A hierarchy level
A list of all variables at that level
The list is in ascending order, by the hierarchy level, from most dependent to least dependent.
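One way to compute such levels is a depth-first search over the parent relationships. This sketch is hypothetical: evaluation_order() takes a precomputed {variable ID: set of parent IDs} mapping (such sets could be obtained with variablesInExpression()) instead of expression functions, defines a variable's level as the length of its longest chain of parents, and reports circular dependencies as errors. Because levels are counted upwards from variables with no parents, the most-dependent-first order comes out as descending level numbers here; the numbering FUNPACK itself uses may differ.

```python
from collections import defaultdict


def evaluation_order(parents):
    """Group variables into hierarchy levels (sketch).

    parents maps each variable ID to the set of parent variable IDs used in
    its expressions. Returns (level, [variables]) tuples, most dependent first.
    """
    depth = {}
    in_progress = set()

    def visit(vid):
        # Level = length of the longest chain of parents above vid.
        if vid in depth:
            return depth[vid]
        if vid in in_progress:
            raise ValueError(f"Circular dependency involving v{vid}")
        in_progress.add(vid)
        ps = parents.get(vid, ())
        depth[vid] = 1 + max(visit(p) for p in ps) if ps else 0
        in_progress.discard(vid)
        return depth[vid]

    for vid in parents:
        visit(vid)

    levels = defaultdict(list)
    for vid, level in depth.items():
        levels[level].append(vid)
    return [(level, sorted(levels[level]))
            for level in sorted(levels, reverse=True)]


# v3 depends on v2, which depends on v1; v4 also depends on v1.
print(evaluation_order({3: {2}, 2: {1}, 4: {1}}))
# [(2, [3]), (1, [2, 4]), (0, [1])]
```

Parent-only variables (v1 above) are included at the lowest level, matching the description of identifying the order between all variables and all parent variables used in their expressions.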
- funpack.parsing.variable_expression.makeParser()
Generates a pyparsing parser which can be used to parse expressions.
- Returns:
A pyparsing object which can parse an expression.
- funpack.parsing.variable_expression.parseBinary(toks)
Called by the parser created by makeParser(). Parses an expression of the form expression1 [and|or] expression2, where and/or are the corresponding symbols in the SYMBOLS dictionary, and expression1 and expression2 are conditional statements or logical expressions.
Binary expressions expect both operands to have the same shape. The number of rows is guaranteed to match (because ultimately the operands come from the same pandas.DataFrame), but the number of columns may differ if, for example, one operand has been calculated from a multi-valued variable, and the other from a single-valued variable.
The outcome of this situation can be explicitly controlled in the query by use of the any and all operators, which can be used to collapse the columns of a variable down to a single column.
If this is not explicitly controlled, the default behaviour when the operands of a binary operator have a different number of columns is to collapse both operands down to a single column via the any operator; in other words, the values within each row are combined with a logical "or" operation.
Returns a function which can be used to evaluate the expression.
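The default column-collapsing rule described above can be sketched with nested lists standing in for 2-D boolean arrays (one inner list per row). combine() and width() are hypothetical names, and combining element-wise when both operands have the same number of columns is an assumption of this sketch.

```python
import operator


def width(result):
    """Number of columns in a rows x columns boolean result."""
    return len(result[0]) if result else 0


def combine(symbol, left, right):
    """Apply && or || to two rows x columns boolean results (sketch)."""
    op = {"&&": operator.and_, "||": operator.or_}[symbol]
    if len(left) != len(right):
        raise ValueError("operands must have the same number of rows")
    if width(left) != width(right):
        # Default rule: collapse each operand to a single column,
        # combining the values in each row with "any" (logical or).
        left = [[any(row)] for row in left]
        right = [[any(row)] for row in right]
    return [[op(a, b) for a, b in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


multi = [[True, False], [False, False]]   # from a multi-valued variable
single = [[True], [True]]                 # from a single-valued variable
print(combine("&&", multi, single))       # [[True], [False]]
```

Wrapping the multi-valued operand in all instead would make the first row false as well, which is the kind of explicit control the any and all operators provide.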
- funpack.parsing.variable_expression.parseCondition(toks)
Parses a conditional statement of the form:
variable operation value
where:
- variable is a variable identifier,
- operation is a comparison operation, and
- value is 'na', a numeric value, or a quoted string.
Returns a function which can be used to evaluate the conditional statement. The function is constructed such that it expects a pandas.DataFrame, and will output a boolean numpy array.
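A sketch of how such a function can be built as a closure. make_condition() is hypothetical, na handling is omitted, a dictionary of lists stands in for the pandas.DataFrame, and each variable is assumed to map to a single column; the metadata attribute names (ftype, operation, variable, value) are the ones documented for parseVariableExpression().

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def make_condition(variable, operation, value):
    """Return a function evaluating `variable operation value` on a table."""
    compare = OPS[operation]

    def condition(table, cols):
        # table: {column name: list of values}; cols: {variable: column name}
        return [compare(x, value) for x in table[cols[variable]]]

    # Metadata, mirroring the attributes documented for expression functions.
    condition.ftype = "condition"
    condition.operation = operation
    condition.variable = variable
    condition.value = value
    return condition


cond = make_condition(123, ">", 2)
print(cond({"123-0.0": [1, 3, 5]}, {123: "123-0.0"}))   # [False, True, True]
print(cond.ftype, cond.variable, cond.value)            # condition 123 2
```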
- funpack.parsing.variable_expression.parseUnary(toks)
Called by the parser created by makeParser(). Parses an expression of the form [not|any|all] expression, where not/any/all is the corresponding symbol in the SYMBOLS dictionary, and expression is a conditional statement or logical expression.
Returns a function which can be used to evaluate the expression.
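Along the same lines, a unary function can wrap the function of its operand. In this hypothetical sketch the operand returns a rows x columns nested list of booleans; ~ negates every element, while any and all collapse each row to a single column. The attribute names ftype, operation and operand follow the documentation of parseVariableExpression() (which documents operand for not functions; exposing it for any and all too is an assumption); make_unary() and constant() are illustrative only.

```python
def make_unary(operation, operand):
    """Wrap an expression function with ~ (not), any, or all (sketch)."""
    if operation not in ("~", "any", "all"):
        raise ValueError(f"Unknown unary operation: {operation!r}")

    def unary(table, cols):
        result = operand(table, cols)   # rows x columns of booleans
        if operation == "~":
            return [[not x for x in row] for row in result]
        reducer = any if operation == "any" else all
        return [[reducer(row)] for row in result]   # one column per row

    unary.ftype = "unary"
    unary.operation = operation
    unary.operand = operand
    return unary


def constant(result):
    """Hypothetical stand-in operand that always returns the same result."""
    return lambda table, cols: result


inner = constant([[True, False], [False, False]])
print(make_unary("~", inner)(None, None))    # [[False, True], [True, True]]
print(make_unary("any", inner)(None, None))  # [[True], [False]]
print(make_unary("all", inner)(None, None))  # [[False], [False]]
```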
- funpack.parsing.variable_expression.parseVariable(toks)
Called by the parser created by makeParser(). Parses a variable identifier, returning an integer ID.
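A plain-Python equivalent of this conversion, assuming the identifier is the letter v (the 'var' symbol in SYMBOLS) followed by decimal digits. parse_variable() is hypothetical and takes a string rather than pyparsing tokens:

```python
import re


def parse_variable(token):
    """Convert a variable identifier such as "v123" to the integer 123."""
    match = re.fullmatch(r"v(\d+)", token.strip())
    if match is None:
        raise ValueError(f"Invalid variable identifier: {token!r}")
    return int(match.group(1))


print(parse_variable("v123"))   # 123
```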
- funpack.parsing.variable_expression.parseVariableExpression(expr)
Parses a string containing an expression.
The expression may contain conditional statements of the form:
variable comparison_operator value
combined with logical expressions using the symbols for and, or, and not.
The parseVariableExpression function, given an expression string, will return a function that can be used to evaluate the expression. An expression function expects to be given two arguments:
- A pandas.DataFrame which contains the data on all variables used in the expression.
- A dictionary containing {variable : column} mappings from the variables used in the expression to the columns of the data frame.
An expression function returns the outcome of the expression (True or False) for each row of the data frame, as a boolean numpy array.
Expression functions have a few attributes containing metadata about the expression:
- ftype contains the expression type: either unary (for not, any and all operations), binary (for and/or operations), or condition (for comparison operations).
- operation contains the operation symbol.
Boolean and/or functions contain operand1 and operand2 attributes which refer to the expression functions they will be applied to. Similarly, boolean not functions contain an operand attribute which refers to the expression function it will be applied to. Comparison expression functions contain variable and value attributes, which contain the variable name and the value involved in the comparison.
- Parameters:
expr – String containing an expression.
- Returns:
A function which can be used to evaluate the expression.
- funpack.parsing.variable_expression.variablesInExpression(expr)
Given an expression returned by parseVariableExpression(), extracts all variables used in the expression.
- Parameters:
expr – A parsed expression, produced by parseVariableExpression().
- Returns:
A set containing all of the variables that are mentioned in the expression.
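Using the metadata attributes described under parseVariableExpression() (ftype, operand, operand1, operand2 and variable), the variables in an expression can be collected by walking the expression tree recursively. This is a sketch, not necessarily how FUNPACK implements variablesInExpression(): variables_in_expression() is a hypothetical name, unary any and all functions are assumed to expose operand like not functions do, and SimpleNamespace objects stand in for real expression functions.

```python
from types import SimpleNamespace


def variables_in_expression(expr):
    """Recursively collect the variables used in a parsed expression."""
    if expr.ftype == "condition":
        return {expr.variable}
    if expr.ftype == "unary":
        return variables_in_expression(expr.operand)
    if expr.ftype == "binary":
        return (variables_in_expression(expr.operand1)
                | variables_in_expression(expr.operand2))
    raise ValueError(f"Unknown expression type: {expr.ftype!r}")


# Hypothetical stand-ins for the parsed form of
# "v1 == 2 && ~ (v3 > 4 || v1 != na)"; None represents na here.
c1 = SimpleNamespace(ftype="condition", operation="==", variable=1, value=2)
c2 = SimpleNamespace(ftype="condition", operation=">", variable=3, value=4)
c3 = SimpleNamespace(ftype="condition", operation="!=", variable=1, value=None)
either = SimpleNamespace(ftype="binary", operation="||", operand1=c2, operand2=c3)
negated = SimpleNamespace(ftype="unary", operation="~", operand=either)
expr = SimpleNamespace(ftype="binary", operation="&&", operand1=c1, operand2=negated)

print(sorted(variables_in_expression(expr)))  # [1, 3]
```

Returning a set means a variable used in several statements (v1 above) is reported once.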