funpack.util

This module contains a collection of miscellaneous utility functions, classes, and constants.

class funpack.util.CTYPES(value)

Bases: Enum

The CTYPES enum defines all the types that funpack is aware of.

categorical_multiple = 6
categorical_multiple_non_numeric = 7
categorical_single = 4
categorical_single_non_numeric = 5
compound = 11
continuous = 3
date = 9
integer = 2
sequence = 1
text = 10
time = 8
unknown = 12
funpack.util.DATA_TYPES = {CTYPES.categorical_multiple: <class 'numpy.float32'>, CTYPES.categorical_multiple_non_numeric: <class 'str'>, CTYPES.categorical_single: <class 'numpy.float32'>, CTYPES.categorical_single_non_numeric: <class 'str'>, CTYPES.compound: <class 'str'>, CTYPES.continuous: <class 'numpy.float32'>, CTYPES.integer: <class 'numpy.float32'>, CTYPES.sequence: <class 'numpy.uint32'>, CTYPES.text: <class 'str'>}

Default internal data type to use for the different variable types. Used by the columnTypes() function. These types may be overridden by the InternalType column of the variable table, which is populated from the funpack/schema/type.txt file (see loadTableBases()).
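The default-with-override lookup described above can be sketched as follows. This is an illustration only, not funpack's code: the `DEFAULT_TYPES` mapping and the `resolve_type` helper are hypothetical names, and dtype names are written as strings instead of the numpy/str classes held in DATA_TYPES. Note that date, time and unknown have no entry in the mapping shown above, so the sketch returns None for them.

```python
from enum import Enum


class CTYPES(Enum):
    # Subset of the members listed above, with the same values.
    sequence = 1
    integer = 2
    continuous = 3
    date = 9
    text = 10


# Stand-in for DATA_TYPES: dtype names as strings rather than classes.
DEFAULT_TYPES = {
    CTYPES.sequence: 'uint32',
    CTYPES.integer: 'float32',
    CTYPES.continuous: 'float32',
    CTYPES.text: 'str',
}


def resolve_type(ctype, override=None):
    """Hypothetical helper: an explicit override wins, else the default."""
    if override is not None:
        return override
    # None is returned when no default is defined for this type.
    return DEFAULT_TYPES.get(ctype)
```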

class funpack.util.Singleton(*args, **kwargs)

Bases: object

Manages a reference to a single instance of a class.

This is not a true singleton - there are no restrictions against multiple instances being created. However, a reference is only held to the first created instance.

The Singleton class is used as the base class for DataTable, to allow for shared-memory access to the DataTable by worker processes.

classmethod instance()

Return a reference to the singleton instance, or None if one does not exist.

classmethod setInstance(inst)

Set/override the singleton instance.
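A minimal sketch of the behaviour described above, assuming the reference is stored per subclass. It is not funpack's implementation; the `_instances` dictionary and the `Table` subclass are hypothetical.

```python
class Singleton:
    """Sketch: remembers the first instance created for each subclass."""

    _instances = {}  # maps class -> stored instance

    def __init__(self, *args, **kwargs):
        # Only the first created instance is recorded; later instances
        # can still be created, but they are not stored.
        Singleton._instances.setdefault(type(self), self)

    @classmethod
    def instance(cls):
        # Returns None if no instance has been created or set.
        return Singleton._instances.get(cls)

    @classmethod
    def setInstance(cls, inst):
        Singleton._instances[cls] = inst


class Table(Singleton):  # hypothetical subclass for illustration
    pass
```

Creating two `Table` objects leaves `Table.instance()` pointing at the first one until `Table.setInstance()` is called.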

funpack.util.cat(files, outfile)

Uses cat to concatenate files, saving the output to outfile.

Parameters:
  • files – Sequence of files to concatenate.

  • outfile – Name of file to save output to.
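One way to call cat from Python is sketched below. It assumes a POSIX `cat` executable is on the PATH; the name `cat_sketch` is hypothetical and the real function may invoke the command differently.

```python
import subprocess


def cat_sketch(files, outfile):
    """Concatenate files into outfile by running the cat command."""
    with open(outfile, 'wb') as f:
        # check=True raises CalledProcessError if cat exits with an error.
        subprocess.run(['cat', *files], stdout=f, check=True)
```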

funpack.util.dedup(seq)

Remove duplicates from a sequence, preserving order. Returns a list.
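A common way to implement order-preserving de-duplication is sketched below. It assumes the items are hashable; `dedup_sketch` is a hypothetical name, not the library function.

```python
def dedup_sketch(seq):
    """Return a list of the unique items in seq, in first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```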

funpack.util.deprecated(message)

Decorator used to mark a function or method as deprecated.
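A deprecation decorator of this kind is often built on the standard warnings module. The sketch below assumes a DeprecationWarning is emitted on each call; how funpack actually reports the deprecation is not stated here, and `deprecated_sketch` and `old_function` are hypothetical names.

```python
import functools
import warnings


def deprecated_sketch(message):
    """Return a decorator which warns whenever the wrapped function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller.
            warnings.warn(message, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_sketch('use new_function instead')
def old_function(x):
    return x * 2
```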

funpack.util.findConfigDir(dirname='configs')

Returns the first entry from findConfigDirs(). If $FUNPACK_CONFIG_DIR is set, that directory is returned. Otherwise, the result is the location of the funpack/configs/ directory as described in findConfigDirs().

funpack.util.findConfigDirs(dirname='configs')

Returns a list of candidate FUNPACK configuration directories.

The FUNPACK FMRIB configuration installs its config/table files into <python>/lib/python<X.Y>/site-packages/funpack/configs/. If FUNPACK is installed into that Python environment, this directory will be alongside the FUNPACK source code.

However, if FUNPACK is being executed from a source checkout, we have to use site.getsitepackages to find the location of the config directory.

The dirname argument may also be set to plugins, in which case the path to the funpack.plugins module will be returned.

The $FUNPACK_CONFIG_DIR environment variable can also be used to point to a configuration directory - if set, the returned list will include $FUNPACK_CONFIG_DIR/ at the beginning.

A RuntimeError is raised if the config directory cannot be found.

funpack.util.findConfigFile(filename, suffix='.cfg', dirname='configs')

Searches for a FUNPACK configuration file in a number of locations.

Parameters:
  • filename – Name of file to search for

  • suffix – Suffix to append, if the filename is specified without one (must include the leading period).

  • dirname – Name of internal/built-in directory to search - assumed to be within the funpack package directory, e.g. funpack/configs/.

Returns:

Absolute path to the found file, or filename unmodified if a match was not found.
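The search-and-fall-back behaviour can be sketched as below. The list of directories is passed in explicitly here (`search_dirs` is a hypothetical parameter); the locations and order that the real function searches are not reproduced.

```python
import os.path as op


def find_file_sketch(filename, search_dirs, suffix='.cfg'):
    """Look for filename (with or without suffix) in each directory."""
    candidates = [filename]
    if not filename.endswith(suffix):
        candidates.append(filename + suffix)
    for dirname in search_dirs:
        for candidate in candidates:
            path = op.join(dirname, candidate)
            if op.isfile(path):
                return op.abspath(path)
    # No match: hand the name back unmodified, as documented above.
    return filename
```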

funpack.util.findPluginFile(filename)

Searches for a FUNPACK plugin file - see findConfigFile().

funpack.util.findTableFile(filename)

Searches for a FUNPACK table file - see findConfigFile().

funpack.util.generateColumnName(variable, visit, instance)

Generate a column name for the given variable, visit and instance.

Parameters:
  • variable – Integer variable ID

  • visit – Visit number

  • instance – Instance number
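The output format is not stated above. The sketch below assumes the variable-visit.instance form that parseColumnName() accepts; that choice, and the name `generate_column_name_sketch`, are assumptions.

```python
def generate_column_name_sketch(variable, visit, instance):
    """Build a name of the (assumed) form variable-visit.instance."""
    return '{}-{}.{}'.format(variable, visit, instance)
```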

funpack.util.inMainProcess()

Returns True if the running process is the main (parent) process. Returns False if the running process is a child process (e.g. a multiprocessing worker process).
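One possible check, sketched with the standard multiprocessing module (Python 3.8 or newer). Whether funpack uses this test is an assumption; `in_main_process_sketch` is a hypothetical name.

```python
import multiprocessing as mp


def in_main_process_sketch():
    """True unless this process was started by multiprocessing."""
    # parent_process() returns None in the main process, and a Process
    # object in a multiprocessing child.
    return mp.parent_process() is None
```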

funpack.util.isna(val: Any) → bool

Test whether val is NaN. Return True if val is NaN, or if val is a sequence where every value contained within is NaN.
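The scalar-or-sequence rule can be sketched with the standard library alone. The handling of strings and of empty sequences below is an assumption made for this illustration, and `isna_sketch` is a hypothetical name.

```python
import math
from collections.abc import Sequence


def isna_sketch(val):
    """True for a NaN float, or a non-empty sequence containing only NaNs."""
    if isinstance(val, str):
        # Strings are sequences, but are never treated as NaN here.
        return False
    if isinstance(val, Sequence):
        # An empty sequence is reported as not-NaN (arbitrary choice).
        return len(val) > 0 and all(isna_sketch(v) for v in val)
    return isinstance(val, float) and math.isnan(val)
```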

funpack.util.logIfError(label)

Decorator which emits a log message with label if the decorated function raises an Exception.
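A sketch of such a decorator is given below. It assumes the exception is re-raised after logging and that the message is logged at ERROR level; neither is confirmed above, and `log_if_error_sketch` is a hypothetical name.

```python
import functools
import logging

log = logging.getLogger(__name__)


def log_if_error_sketch(label):
    """Return a decorator which logs label if the wrapped function fails."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                log.error('%s: %s', label, e)
                raise  # assumption: the error still propagates
        return wrapper
    return decorator
```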

funpack.util.parseColumnName(name)

Parses a UK Biobank column name, returns the components.

Three column naming formats are supported. The name is expected to be a string of one of the following forms:

variable-visit.instance
variable.instance
f.variable.visit.instance

where variable and visit are integers. instance is typically also an integer, but non-numeric values for instance are accepted. This (and the second form above) is to allow parsing of derived columns (see e.g. the processing_functions.binariseCategorical() processing function).

Some variables have the form:

f.variable..visit.instance

For these variables, the visit is interpreted as a negative number.

If name does not have one of the above forms, a ValueError is raised.

Note

For the vast majority of biobank variables, the second number in a column name (visit above) corresponds to the assessment visit. However, there are a small number of variables which are not associated with a specific visit, and thus for which this number does not correspond to a visit (e.g. variable 40006), but to some other coding.

Confusingly, the UK Biobank showcase refers to the coding that a variable adheres to as an “instancing”, whilst also using the term “instance” to refer to the columns of multi-valued variables - the instance element of the column name.

The “instancing” that a variable uses is contained in the Instancing column of the variable table. Variables for which the visit component of the column name does correspond to an actual visit have an instancing equal to 2.

Parameters:

name – Column name

Returns:

A tuple containing:

  • variable ID

  • visit number

  • instance (may be an integer or a string)
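The naming forms above can be matched with regular expressions, as in the sketch below. It is not funpack's parser: the patterns are written from the forms listed above, and returning a visit of 0 for the two-part variable.instance form is an assumption. `parse_column_name_sketch` is a hypothetical name.

```python
import re

# (pattern, sign applied to the visit number)
_PATTERNS = [
    (re.compile(r'^(\d+)-(\d+)\.(.+)$'), 1),         # variable-visit.instance
    (re.compile(r'^f\.(\d+)\.(\d+)\.(.+)$'), 1),     # f.variable.visit.instance
    (re.compile(r'^f\.(\d+)\.\.(\d+)\.(.+)$'), -1),  # f.variable..visit.instance
]
_SHORT = re.compile(r'^(\d+)\.(.+)$')                # variable.instance


def _to_instance(text):
    """Instances are integers where possible, otherwise strings."""
    try:
        return int(text)
    except ValueError:
        return text


def parse_column_name_sketch(name):
    for pattern, sign in _PATTERNS:
        match = pattern.match(name)
        if match:
            variable, visit, instance = match.groups()
            return int(variable), sign * int(visit), _to_instance(instance)
    match = _SHORT.match(name)
    if match:
        # Assumption: no visit in the name is reported as visit 0.
        return int(match.group(1)), 0, _to_instance(match.group(2))
    raise ValueError('Invalid column name: {}'.format(name))
```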

funpack.util.parseMatlabRange(r)

Parses a string containing a MATLAB-style start:stop or start:step:stop range (where the stop is inclusive).

Parameters:

r – String containing MATLAB-style range.

Returns:

List of integers in the fully expanded range.
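The inclusive-stop expansion can be sketched as below. Error handling and support for negative steps are assumptions of this illustration, and `parse_matlab_range_sketch` is a hypothetical name.

```python
def parse_matlab_range_sketch(r):
    """Expand 'start:stop' or 'start:step:stop' into a list of integers."""
    parts = [int(p) for p in r.split(':')]
    if len(parts) == 2:
        start, stop = parts
        step = 1
    elif len(parts) == 3:
        start, step, stop = parts
    else:
        raise ValueError('Invalid range: {}'.format(r))
    if step == 0:
        raise ValueError('Step must be non-zero: {}'.format(r))
    # Python ranges exclude the end point, so move it one past stop.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))
```

For example, `'1:5'` expands to `[1, 2, 3, 4, 5]` and `'1:2:10'` to `[1, 3, 5, 7, 9]`.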

funpack.util.tempdir(root=None, changeto=True)

Create and change into a temporary directory, deleting it on exit.

Parameters:
  • root – Create the directory as a sub-directory of root (default: $TMPDIR)

  • changeto – Change into the directory (default: True)
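A context manager with this behaviour can be sketched using tempfile and contextlib. Restoring the previous working directory on exit is an assumption of this sketch, and `tempdir_sketch` is a hypothetical name.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def tempdir_sketch(root=None, changeto=True):
    """Create a temporary directory, yield its path, delete it on exit."""
    # dir=None makes tempfile use its default location (e.g. $TMPDIR).
    path = tempfile.mkdtemp(dir=root)
    prevdir = os.getcwd()
    try:
        if changeto:
            os.chdir(path)
        yield path
    finally:
        if changeto:
            os.chdir(prevdir)  # assumption: return to where we started
        shutil.rmtree(path)
```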

funpack.util.timed(op=None, logger=None, lvl=None, fmt=None)

Context manager which times a section of code, and prints a log message afterwards.

Parameters:
  • op – Name of operation which is being timed

  • logger – Logger object to use - defaults to log.

  • lvl – Log level - defaults to logging.INFO.

  • fmt – Custom message. If not provided, a default message is used. Must be a '%'-style format string which accepts two parameters: the elapsed time (%s), and the memory usage (%i).
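The parameters above can be sketched as a context manager like the one below. The default message text and the use of the resource module for the memory figure are assumptions; how funpack measures memory is not stated here, and `timed_sketch` is a hypothetical name.

```python
import contextlib
import logging
import time

try:
    import resource
except ImportError:  # the resource module is unavailable on Windows
    resource = None

log = logging.getLogger(__name__)


def _peak_memory():
    """Peak resident set size; units are platform-dependent."""
    if resource is None:
        return 0
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


@contextlib.contextmanager
def timed_sketch(op=None, logger=None, lvl=None, fmt=None):
    if logger is None:
        logger = log
    if lvl is None:
        lvl = logging.INFO
    if fmt is None:
        # Escape % in the label so it cannot be read as a format code.
        label = str(op).replace('%', '%%')
        fmt = '[' + label + '] completed in %s seconds (peak memory: %i)'
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # fmt receives the elapsed time (%s) and the memory usage (%i).
        logger.log(lvl, fmt, '{:.2f}'.format(elapsed), _peak_memory())
```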

funpack.util.wc(fname)

Uses wc to count the number of lines in fname.

Parameters:

fname – Name of the file to check

Returns:

Number of lines in fname.
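Calling wc from Python might look like the sketch below. It assumes a POSIX `wc` executable is on the PATH and that `wc -l` prints the count as the first field of its output; `wc_sketch` is a hypothetical name.

```python
import subprocess


def wc_sketch(fname):
    """Return the number of lines in fname, as counted by wc -l."""
    result = subprocess.run(['wc', '-l', fname],
                            capture_output=True,
                            text=True,
                            check=True)
    # Output looks like '<count> <fname>'; take the first field.
    return int(result.stdout.split()[0])
```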