Cleaning and processing functions
=================================


This page lists all of the cleaning and processing functions that are
available within FUNPACK. Recall that running FUNPACK involves four stages:

1. **Import**: The requested rows (subjects) and columns
   (variables/datafields) are loaded into memory.

2. **Cleaning**: A number of steps are applied to each column, including NA
   value replacement, categorical recoding, child value replacement, and
   *variable-specific cleaning functions*.

3. **Processing**: Columns may be added or removed by a number of
   *processing functions*, such as removing columns due to sparsity, or
   generating new binary columns from an existing categorical column.

4. **Export**: The data is saved to a new file.


This page covers the *variable-specific cleaning functions*, and *processing
functions*, that are built into FUNPACK [*]_.

.. [*] Note that you can also write your own functions and load them as
       plugins with the ``--plugin_file`` command-line option.


Cleaning functions and processing functions differ in the following ways:

* Cleaning functions are applied to a single variable / data field at a time,
  whereas processing functions may be applied to multiple variables / data
  fields at once.

* Cleaning functions cannot add or remove columns, whereas processing
  functions are able to remove existing columns and add new columns.


Cleaning and processing functions may be specified on the command-line, or in
separate variable or processing tables (via the ``--variable_file`` and
``--processing_file`` command-line options). In all cases:

- You may specify a single function, or a comma-separated list of functions
  to be applied in order.

- When no arguments are specified, the function parentheses are optional.

- Function arguments may be passed as positional or keyword (named)
  arguments.

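
The rules above can be made concrete with a small Python sketch. This is not
how FUNPACK itself parses function lists; ``parse_function_list`` is a
hypothetical helper, written only to show how a comma-separated list with
optional parentheses and positional or keyword arguments breaks down into
individual calls.

```python
import ast


def parse_function_list(expr):
    """Split a function list into (name, args, kwargs) tuples.

    Hypothetical helper for illustration only; not part of FUNPACK.
    """
    # Wrapping the text in brackets turns it into a Python list expression,
    # so the standard library parser can do the tokenising for us.
    tree = ast.parse(f'[{expr}]', mode='eval')
    calls = []
    for node in tree.body.elts:
        if isinstance(node, ast.Name):
            # Bare name: parentheses omitted, so there are no arguments.
            calls.append((node.id, [], {}))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only literal argument values (numbers, strings, booleans, ...)
            # are accepted; anything else raises ValueError.
            args = [ast.literal_eval(a) for a in node.args]
            kwargs = {k.arg: ast.literal_eval(k.value)
                      for k in node.keywords}
            calls.append((node.func.id, args, kwargs))
        else:
            raise ValueError('Not a function name or call')
    return calls


calls = parse_function_list(
    "func1,func2(),func3(99),func4('arg1', arg2=1234)")
for name, args, kwargs in calls:
    print(name, args, kwargs)
```

In this sketch, ``func1`` and ``func2()`` produce the same result (no
arguments), and the order of the returned list is the order in which the
functions would be applied.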
For example, this command would run cleaning functions ``func1``, ``func2``,
``func3``, and ``func4`` on the columns of data field 12345::

    fmrib_unpack \
      -cl 12345 "func1,func2(),func3(99),func4('arg1', arg2=1234)" \
      output.tsv input.csv


.. _cleaning_functions:

Cleaning functions
------------------


Built-in cleaning functions are defined in the
:mod:`funpack.cleaning_functions` module. These functions may be used with the
``-cl`` / ``--clean`` command-line option. For example, to apply
``fillVisits`` and ``flattenHierarchical`` to the columns of data field
41202::

    fmrib_unpack -cl 41202 "fillVisits,flattenHierarchical" output.tsv input.csv


.. autofunction:: funpack.cleaning_functions.fillVisits
   :noindex:

.. autofunction:: funpack.cleaning_functions.fillMissing
   :noindex:

.. autofunction:: funpack.cleaning_functions.makeNa
   :noindex:

.. autofunction:: funpack.cleaning_functions.codeToNumeric
   :noindex:

.. autofunction:: funpack.cleaning_functions.flattenHierarchical
   :noindex:

.. autofunction:: funpack.cleaning_functions.parseSpirometryData
   :noindex:


.. _processing_functions:

Processing functions
--------------------


Built-in processing functions are defined in the
:mod:`funpack.processing_functions` module. These functions may be used with
the ``-ppr`` / ``--prepend_process`` and ``-apr`` / ``--append_process``
command-line flags. Being able to prepend or append processing functions is
useful when you are using a pre-defined processing table or configuration
profile, but wish to perform some additional steps for specific data fields.


For example, to apply ``removeIfSparse`` and ``binariseCategorical`` to the
columns of data fields 20001 and 20002::

    fmrib_unpack \
      -apr 20001,20002 \
      "removeIfSparse(50),binariseCategorical(acrossVisits=True)" \
      output.tsv input.csv


.. autofunction:: funpack.processing_functions.removeIfSparse
   :noindex:

.. autofunction:: funpack.processing_functions.removeIfRedundant
   :noindex:

.. autofunction:: funpack.processing_functions.binariseCategorical
   :noindex:

.. autofunction:: funpack.processing_functions.expandCompound
   :noindex:

.. autofunction:: funpack.processing_functions.createDiagnosisColumns
   :noindex:

.. autofunction:: funpack.processing_functions.removeField
   :noindex:
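
To make the distinction between cleaning and processing functions concrete,
the following Python sketch models a data set as a plain dictionary of
columns. It does not use FUNPACK's implementations or its plugin API; the
functions ``make_na``, ``remove_if_sparse`` and ``binarise`` are hypothetical
stand-ins that only mimic the general behaviour described on this page.

```python
# Illustrative only: a "table" here is a dict mapping column names to lists
# of values. None stands for a missing value.

def make_na(values, bad_value):
    """Cleaning-style step: changes values within one column, and never
    adds or removes columns."""
    return [None if v == bad_value else v for v in values]


def remove_if_sparse(table, columns, min_present):
    """Processing-style step: looks at several columns at once, and drops
    those with fewer than min_present non-missing values."""
    sparse = [c for c in columns
              if sum(v is not None for v in table[c]) < min_present]
    return {c: v for c, v in table.items() if c not in sparse}


def binarise(table, column):
    """Processing-style step: replaces one categorical column with one 0/1
    indicator column per distinct non-missing value. Treating missing
    values as 0 is an assumption of this sketch."""
    values = table[column]
    out = {c: v for c, v in table.items() if c != column}
    for cat in sorted({v for v in values if v is not None}):
        out[f'{column}.{cat}'] = [int(v == cat) for v in values]
    return out


table = {'a': [1, -1, 2, 1], 'b': [None, None, None, 5]}
table['a'] = make_na(table['a'], -1)                         # cleaning
table = remove_if_sparse(table, ['a', 'b'], min_present=2)   # processing
table = binarise(table, 'a')                                 # processing
print(table)
```

The cleaning step returns a column of the same length under the same name,
whereas the two processing steps change which columns exist: column ``b`` is
dropped for sparsity, and column ``a`` is replaced by two indicator columns.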