funpack.importing.filter
This module contains functions used by the core.importData()
function to identify which columns should be imported, and to filter
rows from a data frame after it has been loaded.
- funpack.importing.filter.REMOVE_DUPLICATE_COLUMN_IDENTIFIER = '.REMOVE_DUPLICATE'
Identifier which is appended to the names of duplicate columns that are to be removed. Use of this identifier is not hard-coded anywhere; this module is just a convenient location for its definition. See the funpack.main.doImport() function.
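This module only defines the constant; the code that appends and later removes it lives elsewhere. The sketch below is a hypothetical illustration of how a suffix like this can mark duplicate column names so that they can be dropped in a later step. The helpers mark_duplicates and drop_marked are not part of funpack, and keeping the first occurrence of each name unmarked is an assumption made for the example.

```python
from typing import List, Sequence

REMOVE_DUPLICATE_COLUMN_IDENTIFIER = '.REMOVE_DUPLICATE'


def mark_duplicates(names: Sequence[str]) -> List[str]:
    """Hypothetical helper: append the identifier to every repeat of a
    column name, leaving the first occurrence unchanged."""
    seen = set()
    marked = []
    for name in names:
        if name in seen:
            marked.append(name + REMOVE_DUPLICATE_COLUMN_IDENTIFIER)
        else:
            seen.add(name)
            marked.append(name)
    return marked


def drop_marked(names: Sequence[str]) -> List[str]:
    """Hypothetical helper: discard every column marked for removal."""
    return [n for n in names
            if not n.endswith(REMOVE_DUPLICATE_COLUMN_IDENTIFIER)]


marked = mark_duplicates(['eid', '31-0.0', '21001-0.0', '31-0.0'])
print(marked)               # ['eid', '31-0.0', '21001-0.0', '31-0.0.REMOVE_DUPLICATE']
print(drop_marked(marked))  # ['eid', '31-0.0', '21001-0.0']
```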
- funpack.importing.filter.addAuxillaryVariables(fileinfo: FileInfo, proctable: DataFrame, variables: Sequence[int] | None = None, exclude: Sequence[int] | None = None) → Tuple[Sequence[int] | None, Sequence[int] | None]
Checks that auxiliary variables referred to by the processing rules are to be loaded.
- Parameters:
fileinfo – FileInfo object describing the input file(s).
proctable – Processing table.
variables – Variables to load, as returned by restrictVariables().
exclude – Variables to exclude, as returned by restrictVariables().
- Returns:
A tuple containing:
  - a sequence of variables to load, or None if all variables should be loaded;
  - a sequence of variables to exclude, or None if no variables should be excluded.
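A minimal sketch of the idea in plain Python. It assumes the auxiliary variable IDs have already been extracted from the processing table into a list (auxvids); the real function receives a FileInfo object and the processing table instead. Removing auxiliary variables from the exclusion list is also an assumption of this sketch, and add_auxiliary_variables is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def add_auxiliary_variables(
        auxvids: Sequence[int],
        variables: Optional[Sequence[int]] = None,
        exclude: Optional[Sequence[int]] = None,
) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    """Hypothetical sketch: make sure auxiliary variables get loaded.

    auxvids   -- variable IDs referenced by processing rules (assumed to be
                 pre-extracted from the processing table)
    variables -- variables to load, or None meaning "load everything"
    exclude   -- variables to exclude, or None meaning "exclude nothing"
    """
    aux = set(auxvids)
    newvars = None
    newexcl = None
    if variables is not None:
        # Append any auxiliary variable that is not already requested,
        # preserving the existing order.
        newvars = list(variables)
        for vid in auxvids:
            if vid not in newvars:
                newvars.append(vid)
    if exclude is not None:
        # Assumption: auxiliary variables are dropped from the exclusion
        # list so that the rules which need them still have their data.
        newexcl = [vid for vid in exclude if vid not in aux]
    return newvars, newexcl


print(add_auxiliary_variables([54], variables=[31, 21001], exclude=[54, 20003]))
# ([31, 21001, 54], [20003])
```

When variables is None every variable is loaded anyway, so there is nothing to add.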
- funpack.importing.filter.columnsToLoad(fileinfo, vartable, variables, exclude=None, colnames=None, excludeColnames=None)
Determines which columns should be loaded from the input data file(s). Peeks at the first line of the data file (assumed to contain column names), then uses the variable table to determine which of them should be loaded.
- Parameters:
fileinfo – FileInfo object describing the input file(s).
vartable – Variable table.
variables – List of variables to load.
exclude – List of variables to exclude.
colnames – List of column names/glob-style wildcard patterns, specifying columns to load.
excludeColnames – List of column name suffixes specifying columns to exclude. This overrides colnames.
- Returns:
A tuple containing:
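As a rough illustration of the header-peeking approach described above, the sketch below uses the standard csv and fnmatch modules. It is simplified and hypothetical: it works on plain column names rather than FileInfo and Column objects, assumes UK Biobank style names of the form <variable>-<visit>.<instance>, treats the first column as an always-loaded subject ID column, and loads a column if its variable is requested or its name matches a colnames pattern. peek_header and columns_to_load are illustrative names only.

```python
import csv
import fnmatch
import io
import re
from typing import List, Optional, Sequence

# Assumed column naming scheme: "<variable>-<visit>.<instance>"
COLUMN_PATTERN = re.compile(r'^(\d+)-(\d+)\.(\d+)$')


def peek_header(fileobj) -> List[str]:
    """Read only the first line of a CSV stream and return its fields."""
    return next(csv.reader([fileobj.readline()]))


def columns_to_load(names: Sequence[str],
                    variables: Optional[Sequence[int]] = None,
                    exclude: Optional[Sequence[int]] = None,
                    colnames: Optional[Sequence[str]] = None,
                    exclude_colnames: Optional[Sequence[str]] = None) -> List[str]:
    """Hypothetical sketch: choose which header columns to load."""
    load = [names[0]]  # assumption: first column is the subject ID
    for name in names[1:]:
        match = COLUMN_PATTERN.match(name)
        vid = int(match.group(1)) if match else None
        # Suffix-based exclusion overrides everything else.
        if exclude_colnames and any(name.endswith(s) for s in exclude_colnames):
            continue
        if exclude and vid in exclude:
            continue
        if variables is None and colnames is None:
            wanted = True  # nothing requested explicitly: load everything
        else:
            wanted = ((variables is not None and vid in variables) or
                      (colnames is not None and
                       any(fnmatch.fnmatch(name, p) for p in colnames)))
        if wanted:
            load.append(name)
    return load


header = peek_header(io.StringIO('eid,31-0.0,21001-0.0,21001-1.0,20003-0.0\n'
                                 '1,0,25.1,,\n'))
print(columns_to_load(header, variables=[21001]))
# ['eid', '21001-0.0', '21001-1.0']
print(columns_to_load(header, variables=[21001], exclude_colnames=['-1.0']))
# ['eid', '21001-0.0']
```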
- funpack.importing.filter.evaluateSubjectExpression(data, expr, cols)
Evaluates the given variable expression for each row in the data.
- Parameters:
data – A pandas.DataFrame instance.
expr – String containing a variable expression.
cols – Dict of {vid : [Column]} mappings.
- Returns:
A boolean numpy array containing the result of evaluating the expression at each row, or None, indicating that the expression was not evaluated (and every row passed).
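A hypothetical sketch of row-wise evaluation. To stay self-contained it replaces the string expression with a Python predicate plus the list of variable IDs it refers to, represents the data as a list of dicts keyed by column name, and uses only the first column of each variable. Returning None when a referenced variable has no loaded columns is an assumption about when an expression goes unevaluated; evaluate_subject_expression is an illustrative name.

```python
from typing import Callable, List, Mapping, Optional, Sequence

Row = Mapping[str, float]


def evaluate_subject_expression(
        rows: Sequence[Row],
        vids: Sequence[int],
        predicate: Callable[..., bool],
        cols: Mapping[int, Sequence[str]],
) -> Optional[List[bool]]:
    """Hypothetical sketch: apply predicate to each row.

    vids      -- variable IDs the expression refers to, in the order the
                 predicate expects its arguments
    predicate -- stands in for the parsed string expression
    cols      -- {vid : [column name]} mappings
    """
    # Assumption: if any referenced variable has no loaded columns the
    # expression is skipped, which the caller treats as "every row passed".
    if any(not cols.get(vid) for vid in vids):
        return None
    # Simplification: only the first column of each variable is used.
    names = [cols[vid][0] for vid in vids]
    return [bool(predicate(*(row[name] for name in names))) for row in rows]


rows = [{'eid': 1, '21001-0.0': 22.0}, {'eid': 2, '21001-0.0': 31.5}]
cols = {21001: ['21001-0.0']}
print(evaluate_subject_expression(rows, [21001], lambda bmi: bmi > 25, cols))
# [False, True]
print(evaluate_subject_expression(rows, [20003], lambda v: v == 1, cols))
# None
```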
- funpack.importing.filter.evaluateSubjectExpressions(data, allcols, subjectExprs)
Determines which subjects (rows) should be removed from the data according to subjectExprs.
- Parameters:
data – A pandas.DataFrame instance.
allcols – List of Column objects describing every column in the data set.
subjectExprs – List of strings containing expressions which identify subjects to be included. Subjects for which any expression evaluates to True will be included.
- Returns:
A 1D boolean numpy array containing True for subjects to be included and False for subjects to be excluded, or None, indicating that the expressions were not evaluated (and all rows passed).
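Combining several per-expression masks follows from the rule that a subject is included if any expression is True for it. The sketch below assumes that expressions which were not evaluated (None) are simply skipped, and that the combined result is None only when no expression was evaluated; both points are assumptions, and combine_subject_masks is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def combine_subject_masks(
        masks: Sequence[Optional[Sequence[bool]]]) -> Optional[List[bool]]:
    """Hypothetical sketch: OR together per-expression row masks.

    Each entry is a list of booleans (one per subject), or None if that
    expression was not evaluated.
    """
    # Assumption: unevaluated expressions are ignored rather than treated
    # as "every subject passed".
    evaluated = [mask for mask in masks if mask is not None]
    if not evaluated:
        return None
    # A subject is included if any evaluated expression is True for it.
    return [any(flags) for flags in zip(*evaluated)]


print(combine_subject_masks([[True, False, False], None, [False, False, True]]))
# [True, False, True]
```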
- funpack.importing.filter.filterSubjects(data, cols, subjects=None, subjectExprs=None, exclude=None)
Removes rows (subjects) from the data based on a list of subjects to include, conditional inclusion expressions (subjectExprs), and a list of subjects to exclude.
- Parameters:
data – A pandas.DataFrame instance.
cols – List of Column objects describing every column in the data set.
subjects – List of subjects to include.
subjectExprs – List of subject inclusion expressions.
exclude – List of subjects to exclude.
- Returns:
A pandas.DataFrame, potentially with rows removed.
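A simplified sketch of the filtering steps, using a list of subject IDs in place of the DataFrame index and returning the IDs that survive. It assumes that a subject passes if it is listed in subjects or matched by the expression mask (with no inclusion filtering when neither is given), and that exclude is applied last; how funpack actually combines the two inclusion criteria is not stated in this reference. filter_subjects is a hypothetical name.

```python
from typing import Iterable, List, Optional, Sequence


def filter_subjects(ids: Sequence[int],
                    subjects: Optional[Iterable[int]] = None,
                    expr_mask: Optional[Sequence[bool]] = None,
                    exclude: Optional[Iterable[int]] = None) -> List[int]:
    """Hypothetical sketch: return the subject IDs that are retained.

    expr_mask stands in for the result of evaluating subjectExprs
    (one boolean per entry in ids, or None if not evaluated).
    """
    include = set(subjects) if subjects is not None else None
    excluded = set(exclude) if exclude is not None else set()
    keep = []
    for i, sid in enumerate(ids):
        if include is None and expr_mask is None:
            passed = True  # no inclusion criteria: everyone passes
        else:
            # Assumption: listed subjects and expression matches are OR-ed.
            passed = ((include is not None and sid in include) or
                      (expr_mask is not None and expr_mask[i]))
        # Exclusions are applied after the inclusion criteria.
        if passed and sid not in excluded:
            keep.append(sid)
    return keep


ids = [1, 2, 3, 4, 5]
print(filter_subjects(ids, subjects=[1],
                      expr_mask=[False, False, True, False, True], exclude=[5]))
# [1, 3]
```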
- funpack.importing.filter.restrictVariables(cattable: DataFrame, variables: Sequence[int] | None = None, categories: Sequence[int | str] | None = None, excludeVariables: Sequence[int] | None = None, excludeCategories: Sequence[int | str] | None = None) → Tuple[Sequence[int] | None, Sequence[int] | None]
Determines which variables should be loaded (and the order they should appear in the output), and which variables should be excluded, from the given sequences of variables, categories, excludeVariables and excludeCategories.
- Parameters:
cattable – The category table
variables – List of variable IDs to import.
categories – List of category names or IDs to import.
excludeVariables – List of variable IDs to exclude.
excludeCategories – List of category names or IDs to exclude.
- Returns:
A tuple containing:
  - a sequence of variables to load, or None if all variables should be loaded;
  - a sequence of variables to exclude, or None if no variables should be excluded.
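A minimal sketch of the include/exclude resolution, assuming a hypothetical category table represented as a dict of {category ID: (name, [variable IDs])} rather than a DataFrame. Integer entries in the category lists are treated as IDs and strings as names; explicitly listed variables come first in the output, followed by category members, with duplicates dropped while preserving order. These ordering rules, and the helper names lookup_category, gather and restrict_variables, are assumptions of the sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Hypothetical stand-in for the category table:
# {category ID: (category name, [variable IDs])}
CategoryTable = Dict[int, Tuple[str, List[int]]]


def lookup_category(cattable: CategoryTable, cat: Union[int, str]) -> List[int]:
    """Return the variables in a category given by ID (int) or name (str)."""
    if isinstance(cat, int):
        return list(cattable[cat][1])
    for name, vids in cattable.values():
        if name == cat:
            return list(vids)
    raise KeyError(f'Unknown category: {cat!r}')


def gather(cattable: CategoryTable,
           vids: Optional[Sequence[int]],
           cats: Optional[Sequence[Union[int, str]]]) -> Optional[List[int]]:
    """Explicit variables first, then category members, without duplicates.
    Returns None when neither argument was given."""
    if vids is None and cats is None:
        return None
    out = list(vids or [])
    for cat in cats or []:
        out.extend(lookup_category(cattable, cat))
    # dict.fromkeys keeps the first occurrence of each ID, in order.
    return list(dict.fromkeys(out))


def restrict_variables(
        cattable: CategoryTable,
        variables: Optional[Sequence[int]] = None,
        categories: Optional[Sequence[Union[int, str]]] = None,
        exclude_variables: Optional[Sequence[int]] = None,
        exclude_categories: Optional[Sequence[Union[int, str]]] = None,
) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    load = gather(cattable, variables, categories)
    exclude = gather(cattable, exclude_variables, exclude_categories)
    return load, exclude


cattable = {1: ('Demographics', [31, 34]), 2: ('Body size', [21001, 50])}
print(restrict_variables(cattable, variables=[50], categories=['Demographics', 2]))
# ([50, 31, 34, 21001], None)
print(restrict_variables(cattable, exclude_categories=[1]))
# (None, [31, 34])
```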