funpack.schema.hierarchy
This module contains functions and data structures for working with hierarchical variables.
The loadHierarchyFile() function will read hierarchy information for
one variable from a text file, and return a Hierarchy object.
The Hierarchy class allows the hierarchy information about a variable
to be queried.
The data coding information for UK Biobank data codings is downloaded
on-demand from the UK Biobank showcase at
https://biobank.ctsu.ox.ac.uk/crystal/schema.cgi. Some pre-downloaded backup
files are stored in funpack/schema/hierarchy/.
- exception funpack.schema.hierarchy.CircularError[source]
Bases: Exception
Error raised by the Hierarchy.parents() method in the event that a circular relationship is detected in a hierarchy.
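As an illustration of the kind of check that could trigger this error, the sketch below walks a child-to-parent mapping and raises an exception when a node is visited twice. It is not funpack's implementation: the CircularError class, the parents helper, the parent_of mapping, and the use of 0 as the "no parent" marker are all stand-ins defined here for the example.

```python
class CircularError(Exception):
    """Stand-in for funpack.schema.hierarchy.CircularError (hypothetical)."""


def parents(node_id, parent_of, root=0):
    """Return the ancestors of node_id, nearest first.

    parent_of maps each node ID to its parent's ID, and root marks
    "no parent". Both are assumptions made for this sketch.
    """
    seen = {node_id}
    chain = []
    current = parent_of.get(node_id, root)
    while current != root:
        if current in seen:
            # We have come back to a node already on the path: a cycle
            raise CircularError(f"circular relationship at node {current}")
        seen.add(current)
        chain.append(current)
        current = parent_of.get(current, root)
    return chain


# A well-formed chain: 3 -> 2 -> 1 -> root
print(parents(3, {1: 0, 2: 1, 3: 2}))

# A cycle: 1 -> 2 -> 1
try:
    parents(1, {1: 2, 2: 1})
except CircularError as e:
    print("detected:", e)
```

Tracking visited nodes in a set keeps the check linear in the length of the path, and guarantees that the loop terminates even on malformed input.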
- funpack.schema.hierarchy.HIERARCHY_DATA_NAMES = {'icd10': 19, 'icd9': 87, 'opcs3': 259, 'opcs4': 240}
This dictionary contains some UK Biobank hierarchical data codings which can be looked up by name with the getHierarchyFilePath() function.
- class funpack.schema.hierarchy.Hierarchy(nodes, parents, codings, descs)[source]
Bases: object
The Hierarchy class allows information in a hierarchical variable to be queried. The parents() method will return all parents in the hierarchy for a given value (a.k.a. coding), and the description() method will return the description for a value.
Additional metadata can be added and retrieved for codings via the set() and get() methods.
- property codings
Return a list of all unique codings in the hierarchy.
- get(coding, attr)[source]
Get the given attribute for the given coding. Raises a KeyError for non-unique / invalid codings, or non-existent attributes.
- index(coding)[source]
Return the node ID for the given coding. A KeyError is raised for non-unique or invalid codings.
- class funpack.schema.hierarchy.Node(node_id: int, parent_id: int, coding: Any, attrs: dict)[source]
Bases: object
Represents a node in a hierarchical encoding. Used by the Hierarchy class.
- attrs: dict
- coding: Any
- node_id: int
- parent_id: int
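The four fields above can be pictured as a small record type. The following sketch mirrors the documented field names with a dataclass. Whether funpack defines Node this way is an assumption, as are the empty-dict default for attrs and the example values, none of which come from a UK Biobank coding file.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Illustrative stand-in mirroring the documented Node fields."""
    node_id: int
    parent_id: int
    coding: Any
    # The empty-dict default is an assumption made for convenience here
    attrs: dict = field(default_factory=dict)


# Hypothetical values for illustration only
root = Node(node_id=1, parent_id=0, coding="A", attrs={"meaning": "Example chapter"})
leaf = Node(node_id=2, parent_id=1, coding="A1")

# A child refers to its parent through parent_id
print(leaf.parent_id == root.node_id)
print(leaf.attrs)
```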
- funpack.schema.hierarchy.codeToNumeric(code, name=None, dtable=None, vid=None, coding=None, hier=None, download=True)[source]
Converts a hierarchical code into a numeric version. See the getHierarchyFilePath() function for information on the arguments.
Some hierarchical codings in the UKB do not have unique coding values for parent/non-leaf nodes. If this function is passed such a value, np.nan is returned.
- Parameters:
code – Code to convert
name – Data coding name
dtable – The DataTable
vid – Variable ID
coding – Data coding ID
hier – A Hierarchy instance which, if provided, will be used instead of loading one from file using the other arguments.
download – Defaults to True - coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.
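The non-unique case can be sketched without funpack. The example below assumes that the "numeric version" of a code is its node ID, which this page does not state, and uses math.nan in place of np.nan to avoid a NumPy dependency. The code_to_numeric helper and the rows are hypothetical, and returning NaN for an unknown code is also an assumption.

```python
import math
from collections import Counter

# Hypothetical (coding, node_id) rows; "Block" is deliberately repeated
rows = [("A00", 10), ("A01", 11), ("Block", 1), ("Block", 2)]


def code_to_numeric(code, rows):
    """Map a coding to its node ID.

    Returns NaN when the coding is shared by several nodes. Returning NaN
    for an unknown coding as well is an assumption of this sketch.
    """
    counts = Counter(coding for coding, _ in rows)
    if counts[code] != 1:
        return math.nan
    return next(node_id for coding, node_id in rows if coding == code)


print(code_to_numeric("A01", rows))
print(code_to_numeric("Block", rows))
```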
- funpack.schema.hierarchy.dataCodingType(coding)[source]
Returns a data type suitable for representing values in the given encoding.
- funpack.schema.hierarchy.getHierarchyCoding(dtable=None, vid=None, name=None, coding=None)[source]
Return a data coding ID for the given vid, name, or coding. See the loadHierarchyFile() function for details.
- Parameters:
dtable – The DataTable
vid – Variable ID
name – Data coding name
coding – Data coding ID
- Returns:
An integer ID for the corresponding data coding.
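The loadHierarchyFile() entry describes a lookup precedence: an explicit coding first, then a recognised name, then a variable ID. A minimal sketch of that ordering follows. The resolve_coding function and its vid_codings dict (a stand-in for whatever a DataTable would provide) are hypothetical, and falling through to the variable ID when a name is not recognised is an assumption.

```python
HIERARCHY_DATA_NAMES = {'icd10': 19, 'icd9': 87, 'opcs3': 259, 'opcs4': 240}


def resolve_coding(vid_codings=None, vid=None, name=None, coding=None):
    """Pick a data coding ID: explicit coding, then name, then variable ID.

    vid_codings is a hypothetical {variable ID: coding ID} dict used here
    in place of a DataTable.
    """
    if coding is not None:
        return int(coding)
    if name in HIERARCHY_DATA_NAMES:
        return HIERARCHY_DATA_NAMES[name]
    if vid_codings is not None and vid in vid_codings:
        return vid_codings[vid]
    raise ValueError("variable is unknown or has no listed data coding")


print(resolve_coding(name="icd10"))
# An explicit coding wins over a name
print(resolve_coding(name="icd10", coding=240))
# 12345 is a made-up variable ID
print(resolve_coding(vid_codings={12345: 87}, vid=12345))
```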
- funpack.schema.hierarchy.getHierarchyFilePath(coding)[source]
Return a file path to a backup file for the given coding. The file is not guaranteed to exist.
- funpack.schema.hierarchy.loadEncodingTable()[source]
Loads the encoding.txt file built into funpack, which contains information about all UK Biobank encodings.
- funpack.schema.hierarchy.loadHierarchyFile(dtable=None, vid=None, name=None, coding=None, download=True)[source]
Load an encoding file containing hierarchy information for the specified variable/name.
Hierarchy files can be looked up with one of the following methods, in order of precedence:
1. By specifying a data coding (coding). This takes precedence.
2. By specifying a name which is present in the HIERARCHY_DATA_NAMES.
3. By passing a DataTable (dtable) and variable ID (vid)
The recognised data type names for use with the second method are listed in the HIERARCHY_DATA_NAMES dictionary.
A ValueError is raised if the variable is unknown, or does not have a listed data coding.
The hierarchy coding file that is downloaded/loaded is assumed to be a tab-separated file containing the following columns:
coding: A variable value (not necessarily unique)
meaning: Description
node_id: Unique numeric identifier for each node
parent_id: Identifier of each node’s parent
It is assumed that all codings have a unique node_id. Top-level parent nodes (nodes with no parent of their own) often have an ID of 0, although this is not assumed.
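To make the file layout concrete, the sketch below parses a small tab-separated table with the four documented columns using the standard library. The file contents are invented for illustration, and the parsing approach is an assumption rather than funpack's own loader. Any node whose parent_id does not match another node's node_id is treated as top-level, so no particular root value is assumed.

```python
import csv
import io

# Hypothetical file contents with the four documented columns
text = (
    "coding\tmeaning\tnode_id\tparent_id\n"
    "Chapter I\tExample chapter\t1\t0\n"
    "A00\tExample leaf\t2\t1\n"
    "A01\tAnother leaf\t3\t1\n"
)

reader = csv.DictReader(io.StringIO(text), delimiter="\t")

parent_of = {}   # node_id -> parent_id
coding_of = {}   # node_id -> coding
meaning_of = {}  # node_id -> description
for row in reader:
    node_id = int(row["node_id"])
    parent_of[node_id] = int(row["parent_id"])
    coding_of[node_id] = row["coding"]
    meaning_of[node_id] = row["meaning"]

# Top-level nodes: the parent ID does not refer to any node in the file
top_level = [n for n, p in parent_of.items() if p not in parent_of]
print(top_level)

# Codings of the direct children of node 1
print([coding_of[n] for n, p in parent_of.items() if p == 1])
```

Keying every lookup on node_id rather than coding reflects the note above that coding values are not necessarily unique, while node IDs are.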
- funpack.schema.hierarchy.numericToCode(code, name=None, dtable=None, vid=None, coding=None, download=True)[source]
Converts a numeric hierarchical code into its original version.
- Parameters:
code – Code to convert
name – Data coding name
dtable – The DataTable
vid – Variable ID
coding – Data coding ID
download – Defaults to True - coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.
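The reverse direction can be sketched in the same spirit. As before, this assumes that the numeric form of a code is its node ID, which this page does not confirm. The numeric_to_code helper and the rows are hypothetical.

```python
# Hypothetical (coding, node_id) rows; "Block" is shared by two nodes
rows = [("A00", 10), ("A01", 11), ("Block", 1), ("Block", 2)]

# node_id is unique per node, so the inverse lookup is unambiguous
coding_by_node = {node_id: coding for coding, node_id in rows}


def numeric_to_code(numeric):
    """Return the original coding for a node ID (KeyError if unknown)."""
    return coding_by_node[int(numeric)]


print(numeric_to_code(11))
# Works even where the coding itself is not unique
print(numeric_to_code(2))
```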