funpack.parsing.variable_expression

This module contains functions for parsing conditional and logical expressions, and the VariableExpression class for representing a parsed expression.

VariableExpression

The VariableExpression class is a convenience class which can be used to parse and access an expression.

parseVariableExpression

Parses a string containing an expression.

variablesInExpression

Given an expression returned by parseVariableExpression(), extracts all variables used in the expression.

calculateVariableExpressionEvaluationOrder

Identifies hierarchical relationships between variables.

For a given variable, the ParentValues column of the variable table may contain one or more expressions. Each expression defines conditions on the parent variables of that variable which, when met, cause the variable's value to be replaced. This module contains logic for parsing and evaluating a single expression; the evaluation of multiple comma-separated expressions is handled in the importing module.

An expression comprises one or more conditional statements (or statements for short). A statement has the form:

variable operator value

where:

  • variable is the ID of a parent variable of the variable in question. Variable IDs must be an integer preceded by the letter v.

  • operator is a comparison operator (e.g. equals, greater than, etc.).

  • value is one of:
    • 'na', indicating a missing value.

    • a numeric value against which the parent variable is to be compared.

    • a non-numeric value (i.e. a string), against which the parent variable is to be compared. The value must be quoted with either single or double quotes.

The following comparison operators are allowed (and the symbols used in a statement can be found in the SYMBOLS dictionary):

  • equal to

  • not equal to

  • greater than

  • greater than or equal to

  • less than

  • less than or equal to

The equal to and not equal to operators may be used with a value of 'na' to test whether the values for a variable are missing or present respectively. Similarly, the equal to and not equal to operators may be used with a non-numeric value to test for string equality.

Multiple conditional statements may be combined with and, or, and not logical operations (specific symbols can be found in the SYMBOLS dictionary), and precedence may be enforced with the use of round brackets.

The any and all operations can be applied to statements which have been evaluated on multiple columns to combine the results column-wise.
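As an illustration of the syntax described above, the following sketch assembles a few expression strings from entries of the SYMBOLS dictionary. The variable IDs, the values, the statement() helper, and the whitespace between tokens are all made up for the example; the exact spacing that the parser accepts is an assumption and is not confirmed here.

```python
# Copy of the relevant SYMBOLS entries documented for this module.
SYMBOLS = {'and': '&&', 'or': '||', 'not': '~', 'any': 'any', 'all': 'all',
           'eq': '==', 'ne': '!=', 'ge': '>=', 'gt': '>', 'le': '<=',
           'lt': '<', 'na': 'na', 'var': 'v'}


def statement(vid, op, value):
    """Build one "variable operator value" statement (hypothetical helper).

    vid is an integer variable ID, op is a key into SYMBOLS, and value is
    already formatted as it should appear in the expression.
    """
    return f"{SYMBOLS['var']}{vid} {SYMBOLS[op]} {value}"


# Numeric comparison against a hypothetical parent variable 1.
numeric = statement(1, 'ge', 5)

# Test for missing values in a hypothetical parent variable 2.
missing = statement(2, 'eq', SYMBOLS['na'])

# String comparison: non-numeric values must be quoted.
text = statement(3, 'ne', "'abc'")

# Combine statements with logical operators; round brackets set precedence.
combined = f"{numeric} {SYMBOLS['and']} ({missing} {SYMBOLS['or']} {text})"
print(combined)
```

Building the strings from SYMBOLS rather than hard-coding the operators keeps the example in step with the documented symbol table.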

funpack.parsing.variable_expression.SYMBOLS = {'all': 'all', 'and': '&&', 'any': 'any', 'contains': 'contains', 'eq': '==', 'ge': '>=', 'gt': '>', 'le': '<=', 'lt': '<', 'na': 'na', 'ne': '!=', 'not': '~', 'or': '||', 'var': 'v'}

This dictionary contains the symbols for variables and operations that may be used in expressions.

class funpack.parsing.variable_expression.VariableExpression(expr)[source]

Bases: object

The VariableExpression class is a convenience class which can be used to parse and access an expression.

evaluate(df, cols)[source]

Evaluates this VariableExpression and returns the result.

Parameters:
  • df – pandas.DataFrame containing the data.

  • cols – Dictionary containing { variable : [column_name] } mappings from the variables used in the expressions to columns in df. Each mapping may also contain a single column name, instead of a list.

Returns:

The outcome of the expression - a numpy boolean array.
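Because each mapping in cols may hold either a list of column names or a single column name, code that consumes such a dictionary might first normalise it. The following is a minimal sketch of that idea, not the class's own implementation; the normalise_columns() helper and the column names are hypothetical, and integer variable IDs as keys are an assumption.

```python
def normalise_columns(cols):
    """Return a copy of cols in which every value is a list of column names."""
    normalised = {}
    for variable, columns in cols.items():
        # A bare string is treated as a single column name.
        if isinstance(columns, str):
            columns = [columns]
        normalised[variable] = list(columns)
    return normalised


# Variable 1 maps to two columns, variable 2 to a single column name.
cols = {1: ['1-0.0', '1-1.0'], 2: '2-0.0'}
print(normalise_columns(cols))
```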

property variables

Return a list of all variables used in the expression.

funpack.parsing.variable_expression.calculateVariableExpressionEvaluationOrder(vids, exprs)[source]

Identifies hierarchical relationships between variables.

Given the variable table, identifies the hierarchical relationship order between all variables, and all parent variables used within their expressions.

Parameters:
  • vids – Sequence of variable IDs

  • exprs – Sequence of parsed expression functions (as returned by parseVariableExpression()), one for each variable in vids. For each variable, there may be either one expression function, or a sequence of them.

Returns:

A list of tuples, each containing:

  • A hierarchy level

  • A list of all variables at that level

The list is in ascending order, by the hierarchy level, from most dependent to least dependent.
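The general idea of grouping variables into hierarchy levels can be sketched as follows. This is an illustration only: the hierarchy_levels() function, the parents mapping it takes as input, and the numbering convention (level 0 for variables with no parents) are assumptions, and the direction in which the real function orders its levels may differ.

```python
from collections import defaultdict


def hierarchy_levels(parents):
    """Group variables by depth in a parent/child dependency graph.

    parents maps each variable ID to the set of parent variable IDs used
    in its expressions. Variables with no parents are given level 0 (an
    assumed convention); every other variable sits one level above its
    deepest parent.
    """
    cache = {}

    def level(vid, trail=()):
        if vid in trail:
            raise ValueError(f'circular dependency involving variable {vid}')
        if vid not in cache:
            ps = parents.get(vid, ())
            cache[vid] = 1 + max(
                (level(p, trail + (vid,)) for p in ps), default=-1)
        return cache[vid]

    # Include variables that only ever appear as parents.
    everything = set(parents)
    for ps in parents.values():
        everything.update(ps)

    levels = defaultdict(list)
    for vid in sorted(everything):
        levels[level(vid)].append(vid)
    return sorted(levels.items())


# Variable 1 depends on 2 and 3, variable 2 on 3, and variable 4 on 5.
parents = {1: {2, 3}, 2: {3}, 3: set(), 4: {5}}
print(hierarchy_levels(parents))
```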

funpack.parsing.variable_expression.makeParser()[source]

Generates a pyparsing parser which can be used to parse expressions.

Returns:

A pyparsing object which can parse an expression.

funpack.parsing.variable_expression.parseBinary(toks)[source]

Called by the parser created by makeParser(). Parses an expression of the form expression1 [and|or] expression2, where and/or are the corresponding symbols in the SYMBOLS dictionary, and expression1 and expression2 are conditional statements or logical expressions.

Binary expressions expect that the shape of both operands is equal. The number of rows is guaranteed to match (because ultimately the operands are coming from the same pandas.DataFrame), but the number of columns may differ if, for example, one operand has been calculated from a multi-valued variable, and another from a single-valued variable.

The outcome of this situation can be explicitly controlled in the query by use of the any and all operators, which can be used to collapse the columns of a variable down to a single column.

But if this is not explicitly controlled, the default behaviour which occurs when the operands of a binary operator have a different number of columns is to collapse both operands down to a single column via the any operator - in other words, combining values within each row with a logical “or” operation.

Returns a function which can be used to evaluate the expression.
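The default column-collapsing behaviour described above can be sketched in plain Python, using nested lists as a stand-in for the boolean arrays that the real expression functions work with. The helper names are hypothetical, and the sketch assumes non-empty operands with equal numbers of rows.

```python
def collapse_any(rows):
    """Collapse each row of booleans to one value with a logical "or"."""
    return [any(row) for row in rows]


def collapse_all(rows):
    """Collapse each row of booleans to one value with a logical "and"."""
    return [all(row) for row in rows]


def binary_and(a, b):
    """Element-wise "and" of two row-major boolean tables.

    If the operands have a different number of columns, both are first
    collapsed to a single column with collapse_any, mirroring the default
    behaviour described in the text.
    """
    if len(a[0]) != len(b[0]):
        a = [[v] for v in collapse_any(a)]
        b = [[v] for v in collapse_any(b)]
    return [[x and y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# A multi-valued variable (two columns) and a single-valued variable.
multi = [[True, False], [False, False], [True, True]]
single = [[True], [True], [False]]

# Default: the two-column operand is collapsed with "any" before combining.
print(binary_and(multi, single))

# Explicit control: collapse with "all" first, then combine.
explicit = [[v] for v in collapse_all(multi)]
print(binary_and(explicit, single))
```

Collapsing with any is the more permissive choice: a row passes if at least one of its columns satisfies the condition, whereas all requires every column to satisfy it.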

funpack.parsing.variable_expression.parseCondition(toks)[source]

Parses a conditional statement of the form:

variable operation value
where:
  • variable is a variable identifier

  • operation is a comparison operation

  • value is a numeric value

Returns a function which can be used to evaluate the conditional statement. The function is constructed such that it expects a pandas.DataFrame, and will output a boolean numpy array.

funpack.parsing.variable_expression.parseUnary(toks)[source]

Called by the parser created by makeParser(). Parses an expression of the form [not|any|all] expression, where not/any/all is the corresponding symbol in the SYMBOLS dictionary, and expression is a conditional statement or logical expression.

Returns a function which can be used to evaluate the expression.

funpack.parsing.variable_expression.parseVariable(toks)[source]

Called by the parser created by makeParser(). Parses a variable identifier, returning an integer ID.
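The conversion from a variable identifier to an integer ID can be illustrated with a small stand-alone sketch. It uses a regular expression rather than pyparsing tokens, so it does not reflect how this function is implemented; the parse_variable_token() name is hypothetical.

```python
import re

# A variable identifier is the letter "v" followed by an integer.
VAR_PATTERN = re.compile(r'v(\d+)')


def parse_variable_token(token):
    """Return the integer ID contained in a token such as "v123"."""
    match = VAR_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f'not a variable identifier: {token!r}')
    return int(match.group(1))


print(parse_variable_token('v123'))
```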

funpack.parsing.variable_expression.parseVariableExpression(expr)[source]

Parses a string containing an expression.

The expression may contain conditional statements of the form:

variable comparison_operator value

combined with logical expressions using symbols for and, or, and not.

The parseVariableExpression function, given an expression string, will return a function that can be used to evaluate the expression. An expression function expects to be given two arguments:

  • A pandas.DataFrame which contains the data on all variables used in the expression

  • A dictionary containing {variable : column} mappings from the variables used in the expression to the columns of the data frame.

An expression function returns a boolean result (described elsewhere in this module as a numpy boolean array), reflecting the outcome of the expression.

Expression functions have a few attributes containing metadata about the expression:

  • ftype contains the expression type, either unary (for not, any and all operations), binary (for and/or operations), or condition (for comparison operations)

  • operation contains the operation symbol

Boolean and/or functions contain operand1 and operand2 attributes which refer to the expression functions they will be applied to. Similarly, boolean not functions contain an operand attribute which refers to the expression function it will be applied to. Comparison expression functions contain variable and value attributes, which contain the variable name and the value involved in the comparison.

Parameters:

expr – String containing an expression.

Returns:

A function which can be used to evaluate the expression.

funpack.parsing.variable_expression.variablesInExpression(expr)[source]

Given an expression returned by parseVariableExpression(), extracts all variables used in the expression.

Parameters:

expr – A parsed expression, produced by parseVariableExpression().

Returns:

A set containing all of the variables that are mentioned in the expression.
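A traversal of this kind can be sketched using only the metadata attributes documented for expression functions (ftype, operation, operand, operand1, operand2, variable and value). The sketch below builds mock expression functions by hand instead of calling parseVariableExpression(), so it is an illustration of the idea and not the module's code. The mock_expression() and collect_variables() names are hypothetical, and it is an assumption that any and all functions expose an operand attribute in the same way as not.

```python
def mock_expression(ftype, operation, **attrs):
    """Create a stand-in for an expression function carrying only metadata."""
    def fn(df, cols):
        raise NotImplementedError('mock expression, metadata only')
    fn.ftype = ftype
    fn.operation = operation
    for name, val in attrs.items():
        setattr(fn, name, val)
    return fn


def collect_variables(expr):
    """Recursively gather the variables referenced by an expression."""
    if expr.ftype == 'condition':
        return {expr.variable}
    if expr.ftype == 'unary':
        return collect_variables(expr.operand)
    if expr.ftype == 'binary':
        return (collect_variables(expr.operand1) |
                collect_variables(expr.operand2))
    raise ValueError(f'unknown expression type: {expr.ftype}')


# Hand-built mock of: v1 >= 5 && ~(v2 == 0)
cond1 = mock_expression('condition', '>=', variable=1, value=5)
cond2 = mock_expression('condition', '==', variable=2, value=0)
negated = mock_expression('unary', '~', operand=cond2)
expr = mock_expression('binary', '&&', operand1=cond1, operand2=negated)

print(collect_variables(expr))
```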