funpack.cleaning

This module provides the cleanData() function, which performs a set of cleaning steps on the data.

funpack.cleaning.applyChildValues(dtable)

Fills missing values in variables which have ParentValues expressions defined.
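The idea of child value filling can be sketched with plain pandas. This is not funpack's implementation: the helper fill_child, the column names, and the assumption that a child entry is filled only where it is missing and a condition on the parent holds are all illustrative.

```python
import numpy as np
import pandas as pd

def fill_child(df, child, parent_mask, child_value):
    """Hypothetical helper: fill missing child entries where parent_mask is True."""
    # Only touch rows where the child is missing AND the parent condition holds.
    mask = df[child].isna() & parent_mask
    df.loc[mask, child] = child_value
    return df

# Toy data (made up for illustration): non-smokers have no cigarette count.
df = pd.DataFrame({"smoker": [0, 1, 0, 1],
                   "cigs_per_day": [np.nan, 10.0, 3.0, np.nan]})

# Stand-in for a ParentValues-style condition ("smoker == 0") with fill value 0.
fill_child(df, "cigs_per_day", df["smoker"] == 0, 0.0)
```

In this sketch the first row is filled with 0.0, the existing value in the third row is left alone, and the last row stays missing because the parent condition does not hold there.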

funpack.cleaning.applyCleaningFunctions(dtable)

Applies cleaning steps specified in the Clean column of the variable table.

funpack.cleaning.applyNAInsertion(dtable)

Re-codes data which should be interpreted as missing/not available.

Certain variables can take values which should be interpreted as missing. These values are defined in the NAValues columns of the variable and data coding tables.

This function replaces all of those values with np.nan. The replacement is performed in-place.
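A minimal sketch of in-place NA insertion, using a plain pandas DataFrame rather than a funpack DataTable. The helper insert_na and the na_values dictionary are assumptions made for illustration; they stand in for the NAValues lookup and are not part of funpack.

```python
import numpy as np
import pandas as pd

def insert_na(df, na_values):
    """Hypothetical helper: replace listed values with np.nan, in place."""
    for col, values in na_values.items():
        if col not in df.columns:
            continue
        # Boolean mask of entries that should be treated as missing.
        mask = df[col].isin(values)
        df.loc[mask, col] = np.nan
    return df

# Toy data (made up): -1 and -3 encode "do not know" / "prefer not to answer".
df = pd.DataFrame({"age": [34.0, -1.0, 51.0, -3.0],
                   "score": [2.0, 99.0, 4.0, 1.0]})

insert_na(df, {"age": [-1, -3], "score": [99]})
```

Because the assignment goes through df.loc on the original frame, the caller's data is modified directly, which mirrors the in-place behaviour described above.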

funpack.cleaning.applyNewLevels(dtable)

Applies recoding of categorical variables as specified by the RawLevels and NewLevels columns in the variable table.

For each column, if the new data (after recoding) is negatively correlated with the old data (before recoding), an 'inverted' flag is added to the column (via DataTable.addFlag()).
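The inversion check can be illustrated with a small pandas sketch. The function recode_levels is hypothetical, and the use of Pearson correlation (the pandas default) is an assumption; the text above does not say which correlation measure funpack uses. Here the flag is simply returned as a boolean rather than stored via DataTable.addFlag().

```python
import pandas as pd

def recode_levels(series, raw_levels, new_levels):
    """Hypothetical helper: recode raw levels and report whether the order flipped."""
    mapping = dict(zip(raw_levels, new_levels))
    # Values without a mapping entry are passed through unchanged.
    recoded = series.map(lambda v: mapping.get(v, v))
    # Assumption: Pearson correlation between old and new values.
    inverted = bool(series.corr(recoded) < 0)
    return recoded, inverted

old = pd.Series([1, 2, 3, 1, 3])

# Reversing the level order yields a negative correlation.
flipped, flipped_inverted = recode_levels(old, [1, 2, 3], [3, 2, 1])

# Rescaling without reordering keeps the correlation positive.
scaled, scaled_inverted = recode_levels(old, [1, 2, 3], [10, 20, 30])
```

In the first call the recoded values are [3, 2, 1, 3, 1] and the flag is True; in the second the ordering is preserved and the flag is False.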

funpack.cleaning.cleanData(dtable, skipNAInsertion=False, skipCleanFuncs=False, skipChildValues=False, skipRecoding=False)

Perform data cleaning steps.

This function does the following:

  1. Re-encodes missing values (the NAValues column in the variable table)

  2. Applies cleaning functions (the Clean column in the variable table)

  3. Fills missing values in child variables (the ParentValues and ChildValues columns in the variable table)

  4. Re-encodes categorical variable values (the RawLevels and NewLevels columns in the variable table)

Parameters:
  • dtable – The DataTable.

  • skipNAInsertion – If True, NA value recoding is skipped.

  • skipCleanFuncs – If True, cleaning functions defined in the variable table are not applied.

  • skipChildValues – If True, child value filling is skipped.

  • skipRecoding – If True, raw-to-new level recoding is skipped.

Returns:

The DataTable, with cleaned data.
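How the four skip flags gate the four steps can be sketched as follows. This is an illustration only: clean_data_sketch and the stand-in step functions are hypothetical, and a plain list takes the place of the DataTable so that the order of applied steps is visible.

```python
def clean_data_sketch(table, steps,
                      skipNAInsertion=False,
                      skipCleanFuncs=False,
                      skipChildValues=False,
                      skipRecoding=False):
    """Hypothetical stand-in: run four steps in order, honouring the skip flags."""
    skip_flags = [skipNAInsertion, skipCleanFuncs, skipChildValues, skipRecoding]
    for step, skipped in zip(steps, skip_flags):
        if not skipped:
            step(table)  # each step modifies the table in place
    return table

# Stand-in steps that just record their name in the "table" (a list).
names = ("na_insertion", "clean_funcs", "child_values", "recoding")
steps = [lambda t, n=name: t.append(n) for name in names]

result = clean_data_sketch([], steps, skipChildValues=True)
```

With skipChildValues=True, result records the NA insertion, cleaning function, and recoding steps in that order, and the child value step is omitted.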