funpack.cleaning
This module provides the cleanData() function, which performs
a set of cleaning steps on the data.
- funpack.cleaning.applyChildValues(dtable)
Fills missing values in variables which have ParentValues expressions defined.
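As a rough illustration of child value filling, the sketch below uses plain pandas rather than funpack itself. The helper `fill_child_values`, the column names, and the rule "if parent == 0, a missing child becomes 0" are all hypothetical; the real rules come from the ParentValues and ChildValues columns of the variable table.

```python
import numpy as np
import pandas as pd


def fill_child_values(df, child, parent_mask, value):
    """Fill missing entries of ``child`` where ``parent_mask`` is True.

    Hypothetical helper for illustration only; not part of funpack.
    """
    out = df.copy()
    # Only touch rows where the child is missing AND the parent condition holds
    fill = out[child].isna() & parent_mask
    out.loc[fill, child] = value
    return out


df = pd.DataFrame({
    "parent": [0.0, 1.0, 0.0, 1.0],
    "child": [np.nan, 5.0, 3.0, np.nan],
})

# Assumed rule: where parent == 0 and child is missing, set child to 0
filled = fill_child_values(df, "child", df["parent"] == 0, 0.0)
print(filled)
```

In this sketch the last row stays missing, because its parent value does not satisfy the assumed condition.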
- funpack.cleaning.applyCleaningFunctions(dtable)
Applies cleaning steps specified in the Clean column of the variable table.
- funpack.cleaning.applyNAInsertion(dtable)
Re-codes data which should be interpreted as missing/not available.
Certain variables can take values which should be interpreted as missing. These values are defined in the NAValues columns of the variable and data coding tables.
This function replaces all of those values with np.nan. The replacement is performed in-place.
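The effect of NA insertion can be sketched with plain pandas, without going through funpack. The column name `smoking` and the codes `-1` and `-3` are made up for the example; funpack takes the real codes from the NAValues columns.

```python
import numpy as np
import pandas as pd

# Hypothetical column and NA codes, chosen only for illustration
df = pd.DataFrame({"smoking": [1.0, -1.0, 2.0, -3.0, 0.0]})
na_values = [-1, -3]

# Overwrite the coded values with np.nan, modifying df in place
df.loc[df["smoking"].isin(na_values), "smoking"] = np.nan
print(df)
```

Assigning through `.loc` changes the existing frame rather than building a new one, which mirrors the in-place behaviour described above.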
- funpack.cleaning.applyNewLevels(dtable)
Applies recoding of categorical variables as specified by the RawLevels and NewLevels columns in the variable table.
For each column, if the new data (after recoding) is negatively correlated with the old data (before recoding), an 'inverted' flag is added to the column (via DataTable.addFlag()).
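The recoding and the inversion check can be sketched with plain pandas. The level values, the use of Pearson correlation, and the `flags` set (a stand-in for DataTable.addFlag()) are assumptions made for the example, not funpack's actual implementation.

```python
import pandas as pd

# Hypothetical raw-to-new mapping that reverses the ordering of the levels
raw_levels = [1, 2, 3]
new_levels = [3, 2, 1]
mapping = dict(zip(raw_levels, new_levels))

old = pd.Series([1, 2, 3, 3, 1, 2])
# Every value in ``old`` has an entry in ``mapping``, so map() leaves no gaps
new = old.map(mapping)

# Stand-in for the column's flags; funpack records this via DataTable.addFlag()
flags = set()
if old.corr(new) < 0:
    flags.add("inverted")

print(new.tolist(), flags)
```

Because the mapping reverses the level order, the recoded data is negatively correlated with the original and the sketch records the flag.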
- funpack.cleaning.cleanData(dtable, skipNAInsertion=False, skipCleanFuncs=False, skipChildValues=False, skipRecoding=False)
Perform data cleaning steps.
This function does the following:
1. Re-encodes missing values (the NAValues column in the variable table)
2. Applies cleaning (the Clean column in the variable table)
3. Fills missing values in child variables (the ParentValues and ChildValues columns in the variable table)
4. Re-encodes categorical variable values (the RawLevels and NewLevels columns in the variable table)
- Parameters:
dtable – The DataTable.
skipNAInsertion – If True, NA value recoding is skipped.
skipCleanFuncs – If True, cleaning functions defined in the variable table are not applied.
skipChildValues – If True, child value filling is skipped.
skipRecoding – If True, raw-to-new level recoding is skipped.
- Returns:
The DataTable, with cleaned data.
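How the skip flags select steps can be sketched with a stand-in function. `clean_data_sketch` is hypothetical and does not touch any data: it only records which steps would run, assuming each step corresponds to the like-named function documented in this module and that the steps run in the order listed.

```python
def clean_data_sketch(log,
                      skipNAInsertion=False,
                      skipCleanFuncs=False,
                      skipChildValues=False,
                      skipRecoding=False):
    """Record which cleaning steps would run, in order.

    Stand-in for illustration only: each step appends its name to
    ``log`` instead of modifying a DataTable.
    """
    if not skipNAInsertion:
        log.append("applyNAInsertion")
    if not skipCleanFuncs:
        log.append("applyCleaningFunctions")
    if not skipChildValues:
        log.append("applyChildValues")
    if not skipRecoding:
        log.append("applyNewLevels")
    return log


all_steps = clean_data_sketch([])
partial = clean_data_sketch([], skipCleanFuncs=True, skipRecoding=True)
print(all_steps)
print(partial)
```

With the real function, the equivalent call would pass a DataTable as the first argument and the same keyword flags.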