funpack.cleaning
This module provides the cleanData() function, which performs a set of cleaning steps on the data.
- funpack.cleaning.applyChildValues(dtable)
Fills missing values in variables which have ParentValues expressions defined.
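A minimal sketch of the child-value filling idea follows. It assumes a plain dict-of-lists table in which NaN marks a missing value. The rules are hypothetical Python callables standing in for ParentValues expressions, and each is paired with a fill value, roughly the role the ChildValues column plays. fill_child_values is an illustrative helper, not part of the funpack API. Applying the first matching rule is also an assumption of this sketch.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical table layout: column name -> list of floats, NaN = missing.
Table = Dict[str, List[float]]
# Hypothetical rule: (condition on one row's values, value to fill in).
Rule = Tuple[Callable[[Dict[str, float]], bool], float]


def fill_child_values(data: Table, child: str, rules: List[Rule]) -> None:
    """Fill missing entries of `child` in place.

    For each row where `child` is NaN, the first rule whose condition holds
    for that row supplies the fill value (first-match is an assumption).
    """
    for i, current in enumerate(data[child]):
        if not math.isnan(current):
            continue  # present values are never overwritten
        row = {col: values[i] for col, values in data.items()}
        for condition, fill in rules:
            if condition(row):
                data[child][i] = fill
                break


# Example: wherever the parent is 0 and the child is missing, fill with -1.
nan = float("nan")
data = {"parent": [0.0, 1.0, 0.0], "child": [nan, nan, 5.0]}
fill_child_values(data, "child", [(lambda row: row["parent"] == 0.0, -1.0)])
print(data["child"])  # [-1.0, nan, 5.0]
```

Row 1 stays missing because no rule matches it. Row 2 keeps its existing value because only missing entries are candidates for filling.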
- funpack.cleaning.applyCleaningFunctions(dtable)
Applies cleaning steps specified in the Clean column of the variable table.
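One way to dispatch per-variable cleaning steps by name is sketched below. It uses a hypothetical registry. The functions clip_negative_to_nan and scale are made up for illustration, and the (name, args) pairs do not reflect the actual syntax of the Clean column or funpack's built-in cleaning functions.

```python
import math
from typing import Callable, Dict, List, Tuple

Table = Dict[str, List[float]]      # hypothetical: column name -> values
CleanFunc = Callable[..., None]

# Hypothetical registry mapping a step name to its implementation.
CLEANERS: Dict[str, CleanFunc] = {}


def cleaner(func: CleanFunc) -> CleanFunc:
    """Register `func` under its own name."""
    CLEANERS[func.__name__] = func
    return func


@cleaner
def clip_negative_to_nan(data: Table, column: str) -> None:
    """Made-up step: treat negative values as missing."""
    data[column] = [math.nan if v < 0 else v for v in data[column]]


@cleaner
def scale(data: Table, column: str, factor: float) -> None:
    """Made-up step: multiply every value by `factor`."""
    data[column] = [v * factor for v in data[column]]


def apply_cleaning_functions(data: Table,
                             clean: Dict[str, List[Tuple[str, tuple]]]) -> None:
    """Run each column's listed steps in order; each step updates `data`."""
    for column, steps in clean.items():
        for name, args in steps:
            CLEANERS[name](data, column, *args)


data = {"age": [-1.0, 2.0, 3.0]}
apply_cleaning_functions(
    data, {"age": [("clip_negative_to_nan", ()), ("scale", (10.0,))]})
print(data["age"])  # [nan, 20.0, 30.0]
```

With a name-based registry, the table can refer to steps as plain strings, while the implementations stay ordinary functions.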
- funpack.cleaning.applyNAInsertion(dtable)
Re-codes data which should be interpreted as missing/not available.
Certain variables can take values which should be interpreted as missing; these are defined in the NAValues columns of the variable and data coding tables. This function replaces all of those values with np.nan. The replacement is performed in-place.
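Below is a minimal sketch of in-place NA insertion on a dict-of-lists table. It assumes the NA codes for a column have already been gathered into one collection, whether they came from the variable table or the data coding table. How funpack combines the two tables is not shown. insert_na is a hypothetical helper, and math.nan stands in for np.nan so that the example needs only the standard library.

```python
import math
from typing import Dict, Iterable, List

Table = Dict[str, List[float]]  # hypothetical: column name -> values


def insert_na(data: Table, column: str, na_values: Iterable[float]) -> None:
    """Replace every value of `column` found in `na_values` with NaN.

    The column's list is modified in place rather than rebuilt.
    """
    codes = set(na_values)
    values = data[column]
    for i, v in enumerate(values):
        if v in codes:
            values[i] = math.nan


data = {"x": [1.0, -1.0, 2.0, -3.0]}
insert_na(data, "x", [-1, -3])  # -1 and -3 are example NA codes
print(data["x"])  # [1.0, nan, 2.0, nan]
```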
- funpack.cleaning.applyNewLevels(dtable)
Applies recoding of categorical variables as specified by the RawLevels and NewLevels columns in the variable table. For each column, if the new data (after recoding) is negatively correlated with the old data (before recoding), an 'inverted' flag is added to the column (via DataTable.addFlag()).
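The recode-then-check-direction logic can be sketched as follows. The sketch makes three assumptions:

- RawLevels and NewLevels are parallel lists.
- Values not listed in RawLevels are left unchanged.
- A Pearson correlation computed over rows where both values are present is a reasonable reading of "negatively correlated". The exact measure funpack uses is not stated here.

A plain set of flag strings stands in for DataTable.addFlag(). apply_new_levels and pearson are hypothetical helpers.

```python
import math
from typing import List, Set


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def apply_new_levels(values: List[float], raw_levels: List[float],
                     new_levels: List[float], flags: Set[str]) -> List[float]:
    """Recode raw_levels[i] -> new_levels[i]; add 'inverted' to `flags`
    if the recoded data is negatively correlated with the original."""
    mapping = dict(zip(raw_levels, new_levels))
    recoded = [mapping.get(v, v) for v in values]  # unlisted values kept
    # Only compare rows where both the old and new values are present.
    pairs = [(o, r) for o, r in zip(values, recoded)
             if not (math.isnan(o) or math.isnan(r))]
    if pairs:
        old, new = zip(*pairs)
        if pearson(list(old), list(new)) < 0:  # NaN < 0 is False
            flags.add("inverted")
    return recoded


flags: Set[str] = set()
recoded = apply_new_levels([1.0, 2.0, 3.0, 1.0],
                           [1.0, 2.0, 3.0], [3.0, 2.0, 1.0], flags)
print(recoded, flags)  # [3.0, 2.0, 1.0, 3.0] {'inverted'}
```

Reversing the level order (1, 2, 3 to 3, 2, 1) makes the new data a decreasing function of the old, so the recoded column is flagged.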
- funpack.cleaning.cleanData(dtable, skipNAInsertion=False, skipCleanFuncs=False, skipChildValues=False, skipRecoding=False)
Perform data cleaning steps.
This function does the following:
1. Re-encodes missing values (the NAValues columns in the variable and data coding tables)
2. Applies cleaning (the Clean column in the variable table)
3. Fills missing values in child variables (the ParentValues and ChildValues columns in the variable table)
4. Re-encodes categorical variable values (the RawLevels and NewLevels columns in the variable table)
- Parameters:
dtable – The DataTable.
skipNAInsertion – If True, NA value recoding is skipped.
skipCleanFuncs – If True, cleaning functions defined in the variable table are not applied.
skipChildValues – If True, child value filling is skipped.
skipRecoding – If True, raw-to-new level recoding is skipped.
- Returns:
The DataTable, with cleaned data.
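The orchestration in cleanData() amounts to running four steps in the listed order and skipping any step whose flag is set. The sketch below follows that reading. Its steps are hypothetical placeholders that only record their own names, and a plain dict stands in for the DataTable. The snake_case keyword arguments mirror skipNAInsertion, skipCleanFuncs, skipChildValues and skipRecoding, but clean_data itself is illustrative, not funpack's code.

```python
from typing import Callable

Table = dict                      # hypothetical stand-in for a DataTable
Step = Callable[[Table], None]


def make_step(name: str) -> Step:
    """Build a placeholder step that appends its name to the table's log."""
    def step(dtable: Table) -> None:
        dtable.setdefault("log", []).append(name)
    return step


apply_na_insertion = make_step("na_insertion")
apply_cleaning_functions = make_step("cleaning_functions")
apply_child_values = make_step("child_values")
apply_new_levels = make_step("new_levels")


def clean_data(dtable: Table,
               skip_na_insertion: bool = False,
               skip_clean_funcs: bool = False,
               skip_child_values: bool = False,
               skip_recoding: bool = False) -> Table:
    """Run the four cleaning steps in order, honouring the skip flags."""
    steps = [
        (skip_na_insertion, apply_na_insertion),
        (skip_clean_funcs, apply_cleaning_functions),
        (skip_child_values, apply_child_values),
        (skip_recoding, apply_new_levels),
    ]
    for skipped, step in steps:
        if not skipped:
            step(dtable)
    return dtable


print(clean_data({}, skip_recoding=True)["log"])
# ['na_insertion', 'cleaning_functions', 'child_values']
```

Because each flag only gates its own step, any combination of steps can be disabled without changing the relative order of the rest.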