funpack.main

This module contains the funpack entry point.

funpack.main.configLogging(args)

Configures funpack logging.

Parameters:

args – argparse.Namespace object containing parsed command line arguments.

funpack.main.doCleanAndProcess(dtable, args)

Data cleaning and processing stage.

Parameters:
  • dtable – DataTable containing the data

  • args – argparse.Namespace object containing command line arguments

  • pool – multiprocessing.Pool object for parallelisation (may be None)
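
The "may be None" convention for the pool can be illustrated with a minimal sketch. The helper name apply_to_columns and its arguments are hypothetical and are not part of funpack; the sketch only assumes that a missing pool means the work is done serially.

```python
def apply_to_columns(func, columns, pool=None):
    """Apply func to every column, in parallel if a pool is given.

    Hypothetical helper, not part of funpack. ``pool`` is anything
    with a ``map`` method (e.g. multiprocessing.Pool), or None.
    """
    if pool is None:
        # No pool available: fall back to a plain serial loop.
        return [func(col) for col in columns]
    # Pool.map preserves the input order in its result list.
    return pool.map(func, columns)
```

Callers can then pass pool=None on a single-core run without changing any other code.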

funpack.main.doDescriptionExport(dtable, args)

If a --description_file has been specified, a description for every column is saved out to the file.
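
The "only export if the option was given" pattern can be sketched as follows. This is an illustration under stated assumptions, not the funpack implementation: the function name export_descriptions, the describe callback, and the tab-separated output layout are all hypothetical; only the description_file attribute name is taken from the option above.

```python
import argparse


def export_descriptions(columns, args, describe):
    """Write one 'name<TAB>description' line per column, if requested.

    Hypothetical sketch. ``columns`` is an iterable of column names,
    ``args`` an argparse.Namespace, and ``describe`` a callable that
    returns a description string for a column name.
    Returns True if a file was written, False otherwise.
    """
    outfile = getattr(args, 'description_file', None)
    if outfile is None:
        # Option not specified: nothing to do.
        return False
    with open(outfile, 'wt') as f:
        for col in columns:
            f.write(f'{col}\t{describe(col)}\n')
    return True
```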

funpack.main.doExport(dtable, args)

Data export stage.

Parameters:
  • dtable – DataTable containing the data

  • args – argparse.Namespace object containing command line arguments

funpack.main.doICD10Export(args)

If a --icd10_map_file has been specified, the ICD10 codes present in the data (and their converted values) are saved out to the file.

funpack.main.doImport(args, mgr)

Data import stage.

Parameters:
  • args – argparse.Namespace object containing command line arguments

  • mgr – multiprocessing.Manager object for parallelisation (may be None)

Returns:

A tuple containing:

  • A DataTable containing the data

  • A sequence of Column objects representing the unknown columns.

  • A sequence of Column objects representing columns which are uncategorised, and have no processing or cleaning rules specified on them.

  • A list of Column objects that were not loaded from each input file.

funpack.main.doSummaryExport(dtable, args)

If a --summary_file has been specified, a summary of the cleaning steps that have been applied to each variable is saved out to the file.

funpack.main.doUnknownsExport(dtable, args, unknowns, uncategorised)

If the --unknown_vars_file argument was used, the unknown/unprocessed columns are saved out to a file.

Parameters:
  • dtable – DataTable containing the data

  • args – argparse.Namespace object containing command line arguments

  • unknowns – List of Column objects representing the unknown columns.

  • uncategorised – A sequence of Column objects representing columns which are uncategorised, and have no processing or cleaning rules specified on them.

funpack.main.generateDescription(dtable, col)

Called by doDescriptionExport(). Generates and returns a suitable description for the given column.

Parameters:
  • dtable – DataTable instance

  • col – Column instance

funpack.main.main(argv=None)

funpack entry point.
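
How the stage functions on this page might fit together can be sketched with stand-in callables. The stage order (import, then clean and process, then export) is an assumption inferred from the stage names, and run_pipeline is a hypothetical driver rather than the body of main(). The call signatures mirror those documented for doImport, doCleanAndProcess and doExport.

```python
def run_pipeline(args, do_import, do_clean_and_process, do_export, mgr=None):
    """Run the three stages in an assumed order (hypothetical driver).

    ``do_import(args, mgr)`` must return a 4-tuple, as documented for
    doImport: (table, unknowns, uncategorised, columns not loaded).
    """
    dtable, unknowns, uncategorised, dropped = do_import(args, mgr)
    do_clean_and_process(dtable, args)
    do_export(dtable, args)
    # Hand the import results back so that optional exports
    # (e.g. of unknown columns) can be run afterwards.
    return dtable, unknowns, uncategorised, dropped
```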

funpack.main.splitDataTable(dtable, args)

Splits the DataTable into separate numeric/non-numeric tables.

Called by doExport(). If the --suppress_non_numerics and/or --write_non_numerics options are active, non-numeric columns need to be separated from numeric columns, and possibly saved to a separate output file.

Parameters:
  • dtable – DataTable containing the data

  • args – argparse.Namespace object containing command line arguments

Returns:

A list of (DataTable, filename) tuples, containing the DataTable instances and corresponding output file names.
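
The shape of the return value can be illustrated with a minimal sketch that uses a plain dict of column name to values in place of a DataTable. The function split_columns, its parameters, and the exact effect attributed to each option are assumptions made for illustration: here the "suppress" flag drops non-numeric columns from the main output, and the "write" flag adds a second (table, filename) pair for them.

```python
from numbers import Number


def split_columns(table, outfile, suppress_non_numerics=False,
                  write_non_numerics=False, non_numerics_file=None):
    """Return a list of (table, filename) pairs (hypothetical sketch).

    ``table`` is a dict mapping column name -> list of values,
    standing in for a DataTable.
    """
    if not (suppress_non_numerics or write_non_numerics):
        # Neither option active: a single output with every column.
        return [(table, outfile)]

    def is_numeric(values):
        # bool is a Number subclass, so exclude it explicitly.
        return all(isinstance(v, Number) and not isinstance(v, bool)
                   for v in values)

    numeric = {k: v for k, v in table.items() if is_numeric(v)}
    non_numeric = {k: v for k, v in table.items() if k not in numeric}

    main = numeric if suppress_non_numerics else table
    outputs = [(main, outfile)]
    if write_non_numerics:
        outputs.append((non_numeric, non_numerics_file))
    return outputs
```

Each pair in the returned list can then be written out independently by the export stage.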