funpack.util
This module contains a collection of miscellaneous utility functions, classes, and constants.
- class funpack.util.CTYPES(value)
Bases:
Enum
The
CTYPES
enum defines all the types that funpack
is aware of.
- categorical_multiple = 6
- categorical_multiple_non_numeric = 7
- categorical_single = 4
- categorical_single_non_numeric = 5
- compound = 11
- continuous = 3
- date = 9
- integer = 2
- sequence = 1
- text = 10
- time = 8
- unknown = 12
- funpack.util.DATA_TYPES = {CTYPES.categorical_multiple: <class 'numpy.float32'>, CTYPES.categorical_multiple_non_numeric: <class 'str'>, CTYPES.categorical_single: <class 'numpy.float32'>, CTYPES.categorical_single_non_numeric: <class 'str'>, CTYPES.compound: <class 'str'>, CTYPES.continuous: <class 'numpy.float32'>, CTYPES.integer: <class 'numpy.float32'>, CTYPES.sequence: <class 'numpy.uint32'>, CTYPES.text: <class 'str'>}
Default internal data type to use for the different variable types. Used by the
columnTypes()
function. These types may be overridden by the InternalType
column of the variable table, which is populated from the funpack/schema/type.txt
file (see loadTableBases()
).
- class funpack.util.Singleton(*args, **kwargs)[source]
Bases:
object
Manages a reference to a single instance of a class.
This is not a true singleton - there are no restrictions against multiple instances being created. However, a reference is only held to the first created instance.
The
Singleton
class is used as the base class for DataTable
, to allow for shared-memory access to the DataTable
by worker processes.
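The "first instance only" behaviour described above can be illustrated with a minimal sketch. This is not the funpack implementation: the class name `FirstInstanceSketch`, the `instance()` accessor and the `Table` subclass are all hypothetical names chosen for illustration.

```python
class FirstInstanceSketch:
    """Remembers the first instance created of each subclass (hypothetical)."""

    _first = {}  # maps class -> first instance created of that class

    def __init__(self, *args, **kwargs):
        # Only the first instance of each concrete class is recorded.
        # Later instances are created normally, but are not remembered.
        FirstInstanceSketch._first.setdefault(type(self), self)

    @classmethod
    def instance(cls):
        """Return the first created instance of cls, or None."""
        return FirstInstanceSketch._first.get(cls)


class Table(FirstInstanceSketch):
    """Hypothetical subclass, standing in for a shared data container."""

    def __init__(self, name):
        super().__init__()
        self.name = name


a = Table('first')
b = Table('second')   # allowed: this is not a true singleton
print(Table.instance() is a)
```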
- funpack.util.cat(files, outfile)[source]
Uses
cat
to concatenate files
, saving the output to outfile
.
- Parameters:
files – Sequence of files to concatenate.
outfile – Name of file to save output to.
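As a rough sketch of concatenating files by delegating to the `cat` command, assuming a POSIX system with `cat` on the `PATH`: the name `cat_sketch` and the handling of an empty file list are assumptions, not taken from funpack.

```python
import subprocess


def cat_sketch(files, outfile):
    """Concatenate files into outfile by calling the external cat command."""
    files = list(files)
    with open(outfile, 'wb') as f:
        # Assumption: with no input files, just leave an empty output
        # file, since a bare "cat" would otherwise wait on stdin.
        if not files:
            return
        subprocess.run(['cat', *files],
                       stdin=subprocess.DEVNULL,
                       stdout=f,
                       check=True)  # raise CalledProcessError on failure
```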
- funpack.util.dedup(seq)[source]
Remove duplicates from a sequence, preserving order. Returns a list.
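One common way to remove duplicates while preserving order is shown below. This is a sketch and not necessarily how funpack does it. It assumes that the items are hashable, and `dedup_sketch` is a hypothetical name.

```python
def dedup_sketch(seq):
    """Return a list of the items in seq, duplicates removed, order kept."""
    # dict keys keep insertion order (Python 3.7+), so the keys are
    # the unique items in the order they were first seen.
    return list(dict.fromkeys(seq))


print(dedup_sketch([3, 1, 3, 2, 1]))  # [3, 1, 2]
```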
- funpack.util.findConfigDir(dirname='configs')[source]
Returns the first entry from
findConfigDirs
. If $FUNPACK_CONFIG_DIR
is set, it will be returned. Otherwise, it will be the location of the funpack/configs/ directory as described in findConfigDirs()
.
- funpack.util.findConfigDirs(dirname='configs')[source]
Returns a list of candidate FUNPACK configuration directories.
The FUNPACK FMRIB configuration installs its config/table files into
<python>/lib/python<X.Y>/site-packages/funpack/configs/
. If FUNPACK is installed into that Python environment, this directory will be alongside the FUNPACK source code.
However, if FUNPACK is being executed from a source checkout, we have to use
site.getsitepackages
to find the location of the config directory.
The
dirname
argument may also be set to plugins
, in which case the path to the funpack.plugins
module will be returned.
The
$FUNPACK_CONFIG_DIR
environment variable can also be used to point to a configuration directory - if set, the returned list will include $FUNPACK_CONFIG_DIR/
at the beginning.
A
RuntimeError
is raised if the config directory cannot be found.
- funpack.util.findConfigFile(filename, suffix='.cfg', dirname='configs')[source]
Searches for a FUNPACK configuration file in a number of locations.
- Parameters:
filename – Name of file to search for
suffix – Suffix to append, if the filename is specified without one (must include the leading period).
dirname – Name of internal/built-in directory to search - assumed to be within the
funpack
package directory, e.g. funpack/configs/
.
- Returns:
Absolute path to the found file, or
filename
unmodified if a match was not found.
- funpack.util.findPluginFile(filename)[source]
Searches for a FUNPACK plugin file - see
findConfigFile()
.
- funpack.util.findTableFile(filename)[source]
Searches for a FUNPACK table file - see
findConfigFile()
.
- funpack.util.generateColumnName(variable, visit, instance)[source]
Generate a column name for the given variable, visit and instance.
- Parameters:
variable – Integer variable ID
visit – Visit number
instance – Instance number
- funpack.util.inMainProcess()[source]
Returns
True
if the running process is the main (parent) process. Returns False
if the running process is a child process (e.g. a multiprocessing
worker process).
- funpack.util.isna(val: Any) → bool [source]
Test whether
val
is NaN. Return True
if val
is nan
, or if val
is a sequence where every value contained within is nan
.
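The rule described above (a scalar NaN, or a sequence containing only NaN) can be sketched with the standard library alone. The name `isna_sketch` is hypothetical. The sketch only handles plain Python floats and sequences, and its treatment of an empty sequence as not-NaN is an assumption, not documented behaviour.

```python
import math
from collections.abc import Sequence


def isna_sketch(val):
    """Return True if val is NaN, or a non-empty sequence of only NaNs."""
    if isinstance(val, float):
        return math.isnan(val)
    # Strings are sequences too, but are never treated as NaN containers
    if isinstance(val, Sequence) and not isinstance(val, (str, bytes)):
        # Assumption: an empty sequence is reported as not-NaN
        return len(val) > 0 and all(isna_sketch(v) for v in val)
    return False


print(isna_sketch(float('nan')))         # True
print(isna_sketch([float('nan')] * 3))   # True
print(isna_sketch([1.0, float('nan')]))  # False
```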
- funpack.util.logIfError(label)[source]
Decorator which emits a log message with
label
if the decorated function raises an Exception
.
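A decorator of this kind is typically written as a decorator factory. The sketch below is illustrative only: the name `log_if_error_sketch`, the message format, and the choice to re-raise the exception after logging are assumptions.

```python
import functools
import logging

log = logging.getLogger(__name__)


def log_if_error_sketch(label):
    """Decorator factory: log label if the wrapped function raises."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Emit the label together with the traceback, then
                # re-raise (assumption: the error is not swallowed).
                log.error('%s failed', label, exc_info=True)
                raise
        return wrapper
    return decorator


@log_if_error_sketch('divide')
def divide(a, b):
    return a / b
```

Calling `divide(1, 0)` in this sketch logs "divide failed" with a traceback and then propagates the `ZeroDivisionError` to the caller.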
- funpack.util.parseColumnName(name)[source]
Parses a UK Biobank column name, returns the components.
Two column naming formats are supported. The name is expected to be a string of one of the following forms:
variable-visit.instance
variable.instance
f.variable.visit.instance
where
variable
and visit
are integers. instance
is typically also an integer, but non-numeric values for instance
are accepted. This (and the second form above) is to allow parsing of derived columns (see e.g. the processing_functions.binariseCategorical()
processing function).
Some variables have the form:
f.variable..visit.instance
For these variables, the visit is interpreted as a negative number.
If
name
does not have one of the above forms, a ValueError
is raised.
Note
For the vast majority of biobank variables, the second number in a column name (
visit
above) corresponds to the assessment visit. However, there are a small number of variables which are not associated with a specific visit, and thus for which this number does not correspond to a visit (e.g. variable 40006), but to some other coding.
Confusingly, the UK Biobank showcase refers to the coding that a variable adheres to as an “instancing”, whilst also using the term “instance” to refer to the columns of multi-valued variables - the
instance
element of the column name.
The “instancing” that a variable uses is contained in the
Instancing
column of the variable table. Variables for which the visit
component of their column names does correspond to an actual visit have an instancing equal to 2.
- Parameters:
name – Column name
- Returns:
A tuple containing:
variable ID
visit number
instance (may be an integer or a string)
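The naming forms described above can be matched with regular expressions, as in the following sketch. It is not the funpack implementation. The name `parse_column_name_sketch` is hypothetical, and the use of 0 as the visit for the `variable.instance` form is an assumption, since the text does not say which visit is returned in that case.

```python
import re

# Patterns are tried in order. The optional extra "." in the second
# pattern marks the "f.variable..visit.instance" (negative visit) form.
_PATTERNS = [
    re.compile(r'^(?P<var>\d+)-(?P<visit>\d+)\.(?P<inst>.+)$'),
    re.compile(r'^f\.(?P<var>\d+)\.(?P<neg>\.?)(?P<visit>\d+)\.(?P<inst>.+)$'),
    re.compile(r'^(?P<var>\d+)\.(?P<inst>.+)$'),
]


def parse_column_name_sketch(name):
    """Return (variable, visit, instance) parsed from a column name."""
    for pat in _PATTERNS:
        match = pat.match(name)
        if match is None:
            continue
        groups = match.groupdict()
        variable = int(groups['var'])
        # Assumption: the "variable.instance" form has no visit, so use 0
        visit = int(groups.get('visit') or 0)
        if groups.get('neg'):
            visit = -visit
        inst = groups['inst']
        # The instance is converted to an integer when it is numeric
        instance = int(inst) if inst.isdecimal() else inst
        return variable, visit, instance
    raise ValueError('Invalid column name: {}'.format(name))


print(parse_column_name_sketch('31-0.0'))        # (31, 0, 0)
print(parse_column_name_sketch('f.40006..1.0'))  # (40006, -1, 0)
```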
- funpack.util.parseMatlabRange(r)[source]
Parses a string containing a MATLAB-style
start:stop
or start:step:stop
range, where the stop
is inclusive.
- Parameters:
r – String containing MATLAB-style range.
- Returns:
List of integers in the fully expanded range.
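Because Python's `range` excludes its end point, an inclusive MATLAB-style range needs the stop value shifted by one in the direction of the step. The following is a minimal sketch with the hypothetical name `parse_matlab_range_sketch`. Its error handling is an assumption.

```python
def parse_matlab_range_sketch(r):
    """Expand 'start:stop' or 'start:step:stop' into a list of integers."""
    parts = [int(p) for p in r.split(':')]
    if len(parts) == 2:
        start, stop = parts
        step = 1
    elif len(parts) == 3:
        start, step, stop = parts
    else:
        raise ValueError('Invalid range: {}'.format(r))
    if step == 0:
        raise ValueError('Step must be non-zero: {}'.format(r))
    # Move the end point one past stop, in the direction of travel,
    # so that stop itself is included when the step lands on it.
    end = stop + (1 if step > 0 else -1)
    return list(range(start, end, step))


print(parse_matlab_range_sketch('1:5'))      # [1, 2, 3, 4, 5]
print(parse_matlab_range_sketch('1:2:8'))    # [1, 3, 5, 7]
print(parse_matlab_range_sketch('10:-3:1'))  # [10, 7, 4, 1]
```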
- funpack.util.tempdir(root=None, changeto=True)[source]
Create and change into a temporary directory, deleting it on exit.
- Parameters:
root – Create the directory as a sub-directory of
root
(default: $TMPDIR
)
changeto – Change into the directory (default:
True
)
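A temporary-directory helper with these parameters can be sketched as a context manager. The name `tempdir_sketch` is hypothetical, and yielding the directory path and restoring the previous working directory on exit are assumptions about the behaviour.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def tempdir_sketch(root=None, changeto=True):
    """Create a temporary directory, optionally cd into it, delete on exit."""
    # dir=None makes tempfile use its default location (e.g. $TMPDIR)
    path = tempfile.mkdtemp(dir=root)
    prevdir = os.getcwd()
    try:
        if changeto:
            os.chdir(path)
        yield path
    finally:
        # Assumption: return to where we started before deleting
        if changeto:
            os.chdir(prevdir)
        shutil.rmtree(path)


with tempdir_sketch() as td:
    inside = os.path.realpath(os.getcwd()) == os.path.realpath(td)
exists_after = os.path.exists(td)
print(inside, exists_after)
```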
- funpack.util.timed(op=None, logger=None, lvl=None, fmt=None)[source]
Context manager which times a section of code, and prints a log message afterwards.
- Parameters:
op – Name of operation which is being timed
logger – Logger object to use - defaults to
log
.
lvl – Log level - defaults to
logging.INFO
.
fmt – Custom message. If not provided, a default message is used. Must be a
'%'
-style format string which accepts two parameters: the elapsed time (%s
), and the memory usage (%i
).
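The parameters above suggest a context manager along the following lines. This is a sketch under stated assumptions: the name `timed_sketch`, the default message text, the use of peak resident memory from the `resource` module (whose units differ between platforms) and logging even when the timed block raises are all illustrative choices.

```python
import contextlib
import logging
import time

try:
    import resource
except ImportError:  # the resource module is not available on Windows
    resource = None

log = logging.getLogger(__name__)


def _peak_memory():
    """Peak resident set size in platform units, or 0 if unavailable."""
    if resource is None:
        return 0
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


@contextlib.contextmanager
def timed_sketch(op=None, logger=None, lvl=None, fmt=None):
    """Time the enclosed block and log a message afterwards."""
    if logger is None:
        logger = log
    if lvl is None:
        lvl = logging.INFO
    if fmt is None:
        # Escape any '%' in op so it cannot break '%'-style formatting
        name = str(op).replace('%', '%%')
        fmt = '[' + name + '] completed in %s seconds (peak memory: %i)'
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        # fmt receives the elapsed time (%s) and the memory usage (%i)
        logger.log(lvl, fmt, '%0.2f' % elapsed, _peak_memory())


logging.basicConfig(level=logging.INFO)
with timed_sketch('sum'):
    total = sum(range(1000))
```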