funpack.util
This module contains a collection of miscellaneous utility functions, classes, and constants.
- class funpack.util.CTYPES(value)
Bases: Enum
The CTYPES enum defines all the types that funpack is aware of.
- categorical_multiple = 6
- categorical_multiple_non_numeric = 7
- categorical_single = 4
- categorical_single_non_numeric = 5
- compound = 11
- continuous = 3
- date = 9
- integer = 2
- sequence = 1
- text = 10
- time = 8
- unknown = 12
- funpack.util.DATA_TYPES = {CTYPES.categorical_multiple: <class 'numpy.float32'>, CTYPES.categorical_multiple_non_numeric: <class 'str'>, CTYPES.categorical_single: <class 'numpy.float32'>, CTYPES.categorical_single_non_numeric: <class 'str'>, CTYPES.compound: <class 'str'>, CTYPES.continuous: <class 'numpy.float32'>, CTYPES.integer: <class 'numpy.float32'>, CTYPES.sequence: <class 'numpy.uint32'>, CTYPES.text: <class 'str'>}
Default internal data type to use for the different variable types. Used by the columnTypes() function. These types may be overridden by the InternalType column of the variable table, which is populated from the funpack/schema/type.txt file (see loadTableBases()).
- class funpack.util.Singleton(*args, **kwargs)
Bases: object
Manages a reference to a single instance of a class.
This is not a true singleton - there are no restrictions against multiple instances being created. However, a reference is only held to the first created instance.
The Singleton class is used as the base class for DataTable, to allow for shared-memory access to the DataTable by worker processes.
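The "first instance only" behaviour described above can be illustrated with a minimal sketch. This is not funpack's implementation: the class name FirstInstanceSketch, the _first attribute, and the instance() accessor are all hypothetical names chosen for this example.

```python
class FirstInstanceSketch:
    """Hypothetical base class which remembers the first instance created."""

    def __init__(self, *args, **kwargs):
        cls = type(self)
        # Only store a reference if this class has not stored one yet.
        # Later instances can still be created, but are not remembered.
        if '_first' not in cls.__dict__:
            cls._first = self

    @classmethod
    def instance(cls):
        """Return the first instance created for this class, or None."""
        return cls.__dict__.get('_first')


class DemoTable(FirstInstanceSketch):
    """Hypothetical subclass standing in for a shared table object."""


first = DemoTable()
second = DemoTable()
print(DemoTable.instance() is first)   # True
print(DemoTable.instance() is second)  # False
```

Looking the reference up in cls.__dict__ rather than via normal attribute access keeps each subclass's reference separate from that of its base class.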
- funpack.util.cat(files, outfile)
Uses cat to concatenate files, saving the output to outfile.
- Parameters:
files – Sequence of files to concatenate.
outfile – Name of file to save output to.
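A rough sketch of how such a helper could call the system cat command is shown here. It assumes a Unix-like system with cat on the PATH; the name cat_sketch is hypothetical, and this is not necessarily how funpack invokes the command.

```python
import subprocess


def cat_sketch(files, outfile):
    """Concatenate files into outfile by running the system cat command."""
    with open(outfile, 'wb') as f:
        # check=True raises CalledProcessError if cat exits with an error,
        # e.g. when one of the input files does not exist.
        subprocess.run(['cat', *files], stdout=f, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.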
- funpack.util.dedup(seq)
Remove duplicates from a sequence, preserving order. Returns a list.
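Order-preserving de-duplication can be sketched as follows. The name dedup_sketch is hypothetical, and the sketch assumes that the items are hashable; it is an illustration of the documented behaviour rather than funpack's own code.

```python
def dedup_sketch(seq):
    """Return a list of the items in seq, keeping only first occurrences."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(dedup_sketch([3, 1, 3, 2, 1]))  # [3, 1, 2]
```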
- funpack.util.findConfigDir(dirname='configs')
Returns the first entry from findConfigDirs(). If $FUNPACK_CONFIG_DIR is set, it will be returned. Otherwise, it will be the location of the funpack/configs/ directory as described in findConfigDirs().
- funpack.util.findConfigDirs(dirname='configs')
Returns a list of candidate FUNPACK configuration directories.
The FUNPACK FMRIB configuration installs its config/table files into <python>/lib/python<X.Y>/site-packages/funpack/configs/. If FUNPACK is installed into that Python environment, this directory will be alongside the FUNPACK source code.
However, if FUNPACK is being executed from a source checkout, we have to use site.getsitepackages to find the location of the config directory.
The dirname argument may also be set to plugins, in which case the path to the funpack.plugins module will be returned.
The $FUNPACK_CONFIG_DIR environment variable can also be used to point to a configuration directory - if set, the returned list will include $FUNPACK_CONFIG_DIR/ at the beginning.
A RuntimeError is raised if the config directory cannot be found.
- funpack.util.findConfigFile(filename, suffix='.cfg', dirname='configs')
Searches for a FUNPACK configuration file in a number of locations.
- Parameters:
filename – Name of file to search for
suffix – Suffix to append, if the filename is specified without one (must include the leading period).
dirname – Name of internal/built-in directory to search - assumed to be within the funpack package directory, e.g. funpack/configs/.
- Returns:
Absolute path to the found file, or filename unmodified if a match was not found.
- funpack.util.findPluginFile(filename)
Searches for a FUNPACK plugin file - see findConfigFile().
- funpack.util.findTableFile(filename)
Searches for a FUNPACK table file - see findConfigFile().
- funpack.util.generateColumnName(variable, visit, instance)
Generate a column name for the given variable, visit and instance.
- Parameters:
variable – Integer variable ID
visit – Visit number
instance – Instance number
- funpack.util.inMainProcess()
Returns True if the running process is the main (parent) process. Returns False if the running process is a child process (e.g. a multiprocessing worker process).
- funpack.util.isna(val: Any) -> bool
Test whether val is NaN. Return True if val is nan, or if val is a sequence where every value contained within is nan.
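The rule above (a scalar NaN, or a sequence made up entirely of NaNs) can be sketched using only the standard library. The name isna_sketch is hypothetical. Two choices here are assumptions of this sketch and not documented behaviour: strings are never treated as sequences, and an empty sequence gives False.

```python
import math
from collections.abc import Sequence


def isna_sketch(val):
    """Return True if val is NaN, or a non-empty sequence of only NaNs."""
    if isinstance(val, float):
        return math.isnan(val)
    # ASSUMPTION: strings and bytes are treated as scalars, not sequences.
    if isinstance(val, Sequence) and not isinstance(val, (str, bytes)):
        # ASSUMPTION: an empty sequence is not considered to be NaN.
        return len(val) > 0 and all(isna_sketch(v) for v in val)
    return False


nan = float('nan')
print(isna_sketch(nan))          # True
print(isna_sketch([nan, nan]))   # True
print(isna_sketch([nan, 1.0]))   # False
```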
- funpack.util.logIfError(label)
Decorator which emits a log message with label if the decorated function raises an Exception.
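A decorator of this kind might look like the sketch below. The name log_if_error_sketch is hypothetical, and the choices to log at ERROR level and to re-raise the exception afterwards are assumptions of this example, not documented behaviour.

```python
import functools
import logging

log = logging.getLogger(__name__)


def log_if_error_sketch(label):
    """Return a decorator which logs label if the wrapped function raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                # ASSUMPTION: log at ERROR level, then propagate the error.
                log.error('%s: %s', label, e, exc_info=True)
                raise
        return wrapper
    return decorator


@log_if_error_sketch('loading data')
def load_demo(path):
    """Hypothetical function which always fails, for demonstration."""
    raise ValueError('cannot read {}'.format(path))
```

Calling load_demo('data.tsv') would emit a log message beginning with "loading data", and the ValueError would then propagate to the caller.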
- funpack.util.parseColumnName(name)
Parses a UK Biobank column name, returns the components.
Two column naming formats are supported. The name is expected to be a string of one of the following forms:
variable-visit.instance
variable.instance
f.variable.visit.instance
where variable and visit are integers. instance is typically also an integer, but non-numeric values for instance are accepted. This (and the second form above) is to allow parsing of derived columns (see e.g. the processing_functions.binariseCategorical() processing function).
Some variables have the form:
f.variable..visit.instance
For these variables, the visit is interpreted as a negative number.
If name does not have one of the above forms, a ValueError is raised.
Note
For the vast majority of biobank variables, the second number in a column name (visit above) corresponds to the assessment visit. However, there are a small number of variables which are not associated with a specific visit, and thus for which this number does not correspond to a visit (e.g. variable 40006), but to some other coding.
Confusingly, the UK Biobank showcase refers to the coding that a variable adheres to as an “instancing”, whilst also using the term “instance” to refer to the columns of multi-valued variables - the instance element of the column name.
The “instancing” that a variable uses is contained in the Instancing column of the variable table. Variables for which the visit component of their column names does correspond to an actual visit have an instancing equal to 2.
- Parameters:
name – Column name
- Returns:
A tuple containing:
variable ID
visit number
instance (may be an integer or a string)
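The parsing rules above can be sketched with regular expressions. This is an illustration only, not funpack's implementation, and the name parse_column_name_sketch is hypothetical. The documentation does not say what visit is returned for the variable.instance form, so the value 0 used here is an assumption.

```python
import re

# One pattern per documented form. The instance may be non-numeric.
_PATTERNS = [
    # variable-visit.instance
    re.compile(r'^(?P<var>\d+)-(?P<visit>\d+)\.(?P<inst>.+)$'),
    # f.variable.visit.instance, or f.variable..visit.instance
    re.compile(r'^f\.(?P<var>\d+)\.(?P<neg>\.?)(?P<visit>\d+)\.(?P<inst>.+)$'),
    # variable.instance
    re.compile(r'^(?P<var>\d+)\.(?P<inst>.+)$'),
]


def parse_column_name_sketch(name):
    """Return (variable, visit, instance), or raise ValueError."""
    for pat in _PATTERNS:
        match = pat.match(name)
        if match is None:
            continue
        groups = match.groupdict()
        variable = int(groups['var'])
        # ASSUMPTION: visit is 0 when the name contains no visit component.
        visit = int(groups.get('visit') or 0)
        # A doubled period before the visit marks it as negative.
        if groups.get('neg'):
            visit = -visit
        inst = groups['inst']
        instance = int(inst) if inst.isdecimal() else inst
        return variable, visit, instance
    raise ValueError('Invalid column name: {}'.format(name))


print(parse_column_name_sketch('31-0.0'))        # (31, 0, 0)
print(parse_column_name_sketch('f.20002..1.3'))  # (20002, -1, 3)
```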
- funpack.util.parseMatlabRange(r)
Parses a string containing a MATLAB-style start:stop or start:step:stop range (where the stop is inclusive).
- Parameters:
r – String containing MATLAB-style range.
- Returns:
List of integers in the fully expanded range.
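Expanding an inclusive range of this kind can be sketched as below. The name parse_matlab_range_sketch is hypothetical, and the handling of negative and zero steps is an assumption of this sketch, not documented behaviour.

```python
def parse_matlab_range_sketch(r):
    """Expand a 'start:stop' or 'start:step:stop' string into a list."""
    parts = [int(p) for p in r.split(':')]
    if len(parts) == 2:
        start, stop = parts
        step = 1
    elif len(parts) == 3:
        start, step, stop = parts
    else:
        raise ValueError('Invalid range: {}'.format(r))
    if step == 0:
        raise ValueError('Step must be non-zero: {}'.format(r))
    # Python ranges exclude the end point, so move it one past stop
    # in the direction of travel to make stop inclusive.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


print(parse_matlab_range_sketch('1:5'))     # [1, 2, 3, 4, 5]
print(parse_matlab_range_sketch('1:2:10'))  # [1, 3, 5, 7, 9]
```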
- funpack.util.tempdir(root=None, changeto=True)
Create and change into a temporary directory, deleting it on exit.
- Parameters:
root – Create the directory as a sub-directory of root (default: $TMPDIR)
changeto – Change into the directory (default: True)
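One way to obtain this behaviour is with a context manager, sketched below using the standard library. That tempdir is used as a context manager is an assumption here, as is the choice to restore the previous working directory on exit; the name tempdir_sketch is hypothetical.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def tempdir_sketch(root=None, changeto=True):
    """Create a temporary directory, optionally change into it, and
    delete it when the with block exits."""
    # With dir=None, mkdtemp uses the system default location, which
    # honours $TMPDIR.
    path = tempfile.mkdtemp(dir=root)
    prevdir = os.getcwd()
    try:
        if changeto:
            os.chdir(path)
        yield path
    finally:
        # ASSUMPTION: return to the previous directory before deleting.
        os.chdir(prevdir)
        shutil.rmtree(path)
```

A typical use would be `with tempdir_sketch() as d:`, with any scratch files written inside the block removed along with the directory.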
- funpack.util.timed(op=None, logger=None, lvl=None, fmt=None)
Context manager which times a section of code, and prints a log message afterwards.
- Parameters:
op – Name of operation which is being timed
logger – Logger object to use - defaults to log.
lvl – Log level - defaults to logging.INFO.
fmt – Custom message. If not provided, a default message is used. Must be a '%'-style format string which accepts two parameters: the elapsed time (%s), and the memory usage (%i).
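A timing context manager with this signature could be sketched as follows. The name timed_sketch, the default message wording, and the way memory usage is measured (peak resident set size from the Unix-only resource module, in platform-dependent units, or 0 where that module is unavailable) are all assumptions of this sketch.

```python
import contextlib
import logging
import time

try:
    import resource  # Unix only
except ImportError:
    resource = None

log = logging.getLogger(__name__)


def _memory_usage():
    """Peak resident set size of this process, or 0 if unavailable."""
    if resource is None:
        return 0
    return int(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)


@contextlib.contextmanager
def timed_sketch(op=None, logger=None, lvl=None, fmt=None):
    """Time the enclosed block, and log the elapsed time afterwards."""
    if logger is None:
        logger = log
    if lvl is None:
        lvl = logging.INFO
    if fmt is None:
        # Escape any % in the operation name, as fmt is a %-style string.
        name = str(op).replace('%', '%%')
        fmt = name + ' took %s seconds (memory usage: %i)'
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # fmt receives the elapsed time (%s) and memory usage (%i).
        logger.log(lvl, fmt, '{:.2f}'.format(elapsed), _memory_usage())
```

Logging from the finally clause means that the message is emitted even if the timed block raises an exception.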