funpack.schema.hierarchy
This module contains functions and data structures for working with hierarchical variables.
The loadHierarchyFile() function will read hierarchy information for one variable from a text file, and return a Hierarchy object.
The Hierarchy class allows the hierarchy information about a variable to be queried.
Data coding information for the UK Biobank is downloaded on demand from the UK Biobank showcase at
https://biobank.ctsu.ox.ac.uk/crystal/schema.cgi. Some pre-downloaded backup
files are stored in funpack/schema/hierarchy/.
- exception funpack.schema.hierarchy.CircularError
Bases: Exception
Error raised by the Hierarchy.parents() method in the event that a circular relationship is detected in a hierarchy.
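As a rough illustration of what a circular relationship means here, the sketch below walks a child-to-parent mapping and raises an error when a node is visited twice. It is not funpack's implementation: the `walk_parents` helper, the stand-in `CircularError` class, and the use of `None` to mark a top-level node are all assumptions made for this example.

```python
class CircularError(Exception):
    """Stand-in for funpack.schema.hierarchy.CircularError (illustration only)."""


def walk_parents(parent_of, node_id):
    """Return the ancestors of node_id, nearest first.

    parent_of is a hypothetical dict mapping each node ID to its
    parent ID, with None marking a top-level node.
    """
    seen = {node_id}
    ancestors = []
    current = parent_of.get(node_id)
    while current is not None:
        if current in seen:
            # We have come back to a node already on this path.
            raise CircularError(f'Circular relationship at node {current}')
        ancestors.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return ancestors


# Node 3 -> node 2 -> node 1 (top-level)
print(walk_parents({1: None, 2: 1, 3: 2}, 3))
```

Tracking visited nodes in a set keeps the check cheap and guarantees that the loop terminates even on malformed input.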
- funpack.schema.hierarchy.HIERARCHY_DATA_NAMES = {'icd10': 19, 'icd9': 87, 'opcs3': 259, 'opcs4': 240}
This dictionary contains some UK Biobank hierarchical data codings which can be looked up by name with the getHierarchyFilePath() function.
- class funpack.schema.hierarchy.Hierarchy(nodes, parents, codings, descs)
Bases: object
The Hierarchy class allows information in a hierarchical variable to be queried. The parents() method will return all parents in the hierarchy for a given value (a.k.a. coding), and the description() method will return the description for a value.
Additional metadata can be added and retrieved for codings via the set() and get() methods.
- property codings
Return a list of all unique codings in the hierarchy.
- get(coding, attr)
Get the given attribute for the given coding. Raises a KeyError for non-unique / invalid codings, or non-existent attributes.
- index(coding)
Return the node ID for the given coding. A KeyError is raised for non-unique or invalid codings.
- class funpack.schema.hierarchy.Node(node_id: int, parent_id: int, coding: Any, attrs: dict)
Bases: object
Represents a node in a hierarchical encoding. Used by the Hierarchy class.
- attrs: dict
- coding: Any
- node_id: int
- parent_id: int
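The four typed fields above suggest a simple record type. The following is a minimal sketch of such a record written as a dataclass; whether funpack actually defines Node as a dataclass is an assumption, and the node IDs in the usage line are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Node:
    """Illustrative stand-in mirroring the documented Node fields."""
    node_id: int    # unique numeric identifier of this node
    parent_id: int  # identifier of this node's parent
    coding: Any     # the variable value (not necessarily unique)
    attrs: dict     # additional metadata attached to the coding


# Hypothetical IDs, for illustration only
node = Node(node_id=2, parent_id=1, coding='A00', attrs={'meaning': 'Cholera'})
print(node.coding, node.attrs['meaning'])
```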
- funpack.schema.hierarchy.codeToNumeric(code, name=None, dtable=None, vid=None, coding=None, hier=None, download=True)
Converts a hierarchical code into a numeric version. See the getHierarchyFilePath() function for information on the arguments.
Some hierarchical codings in the UKB do not have unique coding values for parent/non-leaf nodes. If this function is passed such a value, np.nan is returned.
- Parameters:
code – Code to convert
name – Data coding name
dtable – The DataTable
vid – Variable ID
coding – Data coding ID
hier – A Hierarchy instance which, if provided, will be used instead of loading one from file using the other arguments.
download – Defaults to True, in which case coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.
- funpack.schema.hierarchy.dataCodingType(coding)
Returns a data type suitable for representing values in the given encoding.
- funpack.schema.hierarchy.getHierarchyCoding(dtable=None, vid=None, name=None, coding=None)
Return a data coding ID for the given vid, name, or coding. See the loadHierarchyFile() function for details.
- Parameters:
dtable – The DataTable
vid – Variable ID
name – Data coding name
coding – Data coding ID
- Returns:
An integer ID for the corresponding data coding.
- funpack.schema.hierarchy.getHierarchyFilePath(coding)
Return a file path to a backup file for the given coding. The file is not guaranteed to exist.
- funpack.schema.hierarchy.loadEncodingTable()
Loads the encoding.txt file built into funpack, which contains information about all UK Biobank encodings.
- funpack.schema.hierarchy.loadHierarchyFile(dtable=None, vid=None, name=None, coding=None, download=True)
Load an encoding file containing hierarchy information for the specified variable/name.
Hierarchy files can be looked up with one of the following methods, in order of precedence:
1. By specifying a data coding (coding). This takes precedence.
2. By specifying a name which is present in the HIERARCHY_DATA_NAMES.
3. By passing a DataTable (dtable) and variable ID (vid)
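The order of precedence above can be sketched as a small resolver. This is an illustration rather than funpack's code: `resolve_coding` is a hypothetical helper, the `vid_codings` dict is a stand-in for whatever the DataTable would report for a variable, and letting an unrecognised name fall through to the next method is an assumption.

```python
# Copied from the documented module-level dictionary
HIERARCHY_DATA_NAMES = {'icd10': 19, 'icd9': 87, 'opcs3': 259, 'opcs4': 240}


def resolve_coding(coding=None, name=None, vid=None, vid_codings=None):
    """Pick a data coding ID, trying coding, then name, then vid.

    vid_codings is a hypothetical {variable ID: data coding ID} dict
    standing in for a DataTable lookup.
    """
    # 1. An explicit data coding always wins.
    if coding is not None:
        return int(coding)
    # 2. A recognised data coding name.
    if name is not None and name in HIERARCHY_DATA_NAMES:
        return HIERARCHY_DATA_NAMES[name]
    # 3. The data coding associated with a variable ID.
    if vid is not None and vid_codings is not None and vid in vid_codings:
        return vid_codings[vid]
    raise ValueError('Unknown variable, or no listed data coding')


print(resolve_coding(name='icd10'))
print(resolve_coding(coding=240, name='icd10'))
```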
The recognised data type names for use with the second method are listed in the HIERARCHY_DATA_NAMES dictionary.
A ValueError is raised if the variable is unknown, or does not have a listed data coding.
The hierarchy coding file that is downloaded/loaded is assumed to be a tab-separated file containing the following columns:
coding: A variable value (not necessarily unique)
meaning: Description
node_id: Unique numeric identifier for each node
parent_id: Identifier of each node’s parent
It is assumed that all codings have a unique node_id. Top-level parent nodes (nodes with no parent of their own) often have an ID of 0, although this is not assumed.
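To make the file layout concrete, here is a minimal sketch that reads a tab-separated file with these four columns using only the standard library. It does not reproduce what loadHierarchyFile() does internally: the `read_hierarchy` helper and the three sample rows are invented for illustration.

```python
import csv
import io

# Made-up sample with the four documented columns
SAMPLE = (
    "coding\tmeaning\tnode_id\tparent_id\n"
    "X\tTop-level group\t1\t0\n"
    "X1\tFirst child\t2\t1\n"
    "X2\tSecond child\t3\t1\n"
)


def read_hierarchy(fileobj):
    """Read a tab-separated hierarchy file into three dicts keyed by node_id."""
    parents, codings, meanings = {}, {}, {}
    for row in csv.DictReader(fileobj, delimiter='\t'):
        node_id = int(row['node_id'])
        parents[node_id] = int(row['parent_id'])
        codings[node_id] = row['coding']
        meanings[node_id] = row['meaning']
    return parents, codings, meanings


parents, codings, meanings = read_hierarchy(io.StringIO(SAMPLE))
print(parents)
```

Keying everything by node_id rather than by coding reflects the note above that coding values are not necessarily unique, while node IDs are.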
- funpack.schema.hierarchy.numericToCode(code, name=None, dtable=None, vid=None, coding=None, download=True)
Converts a numeric hierarchical code into its original version.
- Parameters:
code – Code to convert
name – Data coding name
dtable – The DataTable
vid – Variable ID
coding – Data coding ID
download – Defaults to True, in which case coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.