funpack.schema.hierarchy

This module contains functions and data structures for working with hierarchical variables.

The loadHierarchyFile() function will read hierarchy information for one variable from a text file, and return a Hierarchy object.

The Hierarchy class allows the hierarchy information about a variable to be queried.

Information about UK Biobank data codings is downloaded on demand from the UK Biobank showcase at https://biobank.ctsu.ox.ac.uk/crystal/schema.cgi. Some pre-downloaded backup files are stored in funpack/schema/hierarchy/.

exception funpack.schema.hierarchy.CircularError[source]

Bases: Exception

Error raised by the Hierarchy.parents() method in the event that a circular relationship is detected in a hierarchy.
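The kind of check that could trigger this error can be sketched as follows. This is a minimal, self-contained illustration and not funpack's implementation: the walk_parents helper, the parent_of mapping, and the local CircularError stand-in are all hypothetical names introduced here.

```python
class CircularError(Exception):
    """Local stand-in for the exception described above."""


def walk_parents(parent_of, node_id):
    """Return the ancestor IDs of node_id, nearest ancestor first.

    parent_of is a hypothetical {node ID: parent ID} mapping. The walk
    stops at the first parent ID that is not itself a node.
    """
    seen = {node_id}
    ancestors = []
    current = parent_of.get(node_id)
    while current is not None and current in parent_of:
        if current in seen:
            # We have returned to a node already visited: a cycle.
            raise CircularError(f'cycle detected at node {current}')
        seen.add(current)
        ancestors.append(current)
        current = parent_of[current]
    return ancestors


# A well-formed chain: 3 -> 2 -> 1 -> 0 (0 is not a node, so it ends the walk)
chain = {1: 0, 2: 1, 3: 2}

# A malformed pair of nodes that name each other as parent
loop = {1: 2, 2: 1}
```

Calling walk_parents(chain, 3) yields the ancestors of node 3, whereas walk_parents(loop, 1) raises the stand-in CircularError.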

funpack.schema.hierarchy.HIERARCHY_DATA_NAMES = {'icd10': 19, 'icd9': 87, 'opcs3': 259, 'opcs4': 240}

This dictionary contains some UK Biobank hierarchical data codings which can be looked up by name with the getHierarchyCoding() and loadHierarchyFile() functions.

class funpack.schema.hierarchy.Hierarchy(nodes, parents, codings, descs)[source]

Bases: object

The Hierarchy class allows information in a hierarchical variable to be queried. The parents() method will return all parents in the hierarchy for a given value (a.k.a. coding), and the description() method will return the description for a value.

Additional metadata can be added and retrieved for codings via the set() and get() methods.

coding(nodeID)[source]

Return the coding for the given nodeID.

property codings

Return a list of all unique codings in the hierarchy.

description(coding)[source]

Return the description for the given coding.

get(coding, attr)[source]

Get the given attribute for the given coding. Raises a KeyError for non-unique or invalid codings, or for non-existent attributes.

index(coding)[source]

Return the node ID for the given coding. A KeyError is raised for non-unique or invalid codings.

parentIDs(nodeID)[source]

Return IDs of all parents of the given node.

parents(coding)[source]

Return codings for all parents of the given coding. A KeyError is raised for non-unique or invalid codings.

set(coding, attr, value)[source]

Set an attribute for the given coding. Raises a KeyError for non-unique or invalid codings.
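The behaviour documented for index(), parents(), description(), get() and set() can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the funpack Hierarchy class: ToyHierarchy is a hypothetical name, its constructor is assumed to take four parallel sequences, and it omits the circular-relationship check.

```python
from collections import Counter


class ToyHierarchy:
    """Hypothetical stand-in mimicking the documented Hierarchy behaviour."""

    def __init__(self, nodes, parents, codings, descs):
        counts = Counter(codings)
        self._parent = dict(zip(nodes, parents))
        self._coding = dict(zip(nodes, codings))
        self._desc = dict(zip(nodes, descs))
        # Only codings that occur exactly once can be looked up by value.
        self._index = {c: n for n, c in zip(nodes, codings) if counts[c] == 1}
        self._attrs = {}

    def index(self, coding):
        # KeyError for non-unique or unknown codings
        return self._index[coding]

    def description(self, coding):
        return self._desc[self.index(coding)]

    def parents(self, coding):
        result = []
        node = self._parent[self.index(coding)]
        # Walk upwards until the parent ID is not a known node.
        while node in self._coding:
            result.append(self._coding[node])
            node = self._parent[node]
        return result

    def set(self, coding, attr, value):
        self._attrs.setdefault(self.index(coding), {})[attr] = value

    def get(self, coding, attr):
        # KeyError for bad codings and for attributes never set
        return self._attrs[self.index(coding)][attr]


# Illustrative data: the coding 'X' appears twice, so it is ambiguous.
hier = ToyHierarchy(
    nodes=[1, 2, 3, 4, 5],
    parents=[0, 1, 2, 0, 0],
    codings=['A', 'A1', 'A1.1', 'X', 'X'],
    descs=['Group A', 'Subgroup A1', 'Leaf A1.1', 'Block X', 'Block X'],
)
hier.set('A1', 'flag', True)
```

With this data, hier.parents('A1.1') walks up through 'A1' and 'A', while hier.index('X') raises a KeyError because two nodes share that coding.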

class funpack.schema.hierarchy.Node(node_id: int, parent_id: int, coding: Any, attrs: dict)[source]

Bases: object

Represents a node in a hierarchical encoding. Used by the Hierarchy class.

attrs: dict
coding: Any
node_id: int
parent_id: int

funpack.schema.hierarchy.codeToNumeric(code, name=None, dtable=None, vid=None, coding=None, hier=None, download=True)[source]

Converts a hierarchical code into a numeric version. See the loadHierarchyFile() function for information on the arguments.

Some hierarchical codings in the UKB do not have unique coding values for parent/non-leaf nodes. If this function is passed such a value, np.nan is returned.

Parameters:
  • code – Code to convert

  • name – Data coding name

  • dtable – The DataTable

  • vid – Variable ID

  • coding – Data coding ID

  • hier – A Hierarchy instance which, if provided, will be used instead of loading one from file using the other arguments.

  • download – Defaults to True - coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.
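The fallback for non-unique codings described above can be sketched as below. This is an illustration only: it assumes that the numeric version of a code is its node ID, which the text above does not state, and the function names, the rows data, and the use of math.nan in place of np.nan are all choices made for this example.

```python
import math
from collections import Counter


def code_to_numeric(code, index):
    """Return the node ID for code, or NaN if it is unknown or ambiguous.

    Assumption: the "numeric version" of a code is its node ID.
    """
    try:
        return int(index[code])
    except KeyError:
        return math.nan


def numeric_to_code(node_id, codings):
    """Reverse lookup: map a node ID back to its coding."""
    return codings[int(node_id)]


# Illustrative (node_id, coding) pairs; 'Block' is shared by two parent nodes.
rows = [(1, 'A'), (2, 'A1'), (3, 'Block'), (4, 'Block')]

counts = Counter(coding for _, coding in rows)
# Only unique codings get an entry in the forward index.
index = {coding: node for node, coding in rows if counts[coding] == 1}
codings = {node: coding for node, coding in rows}
```

Here code_to_numeric('A1', index) gives the node ID of 'A1', and code_to_numeric('Block', index) gives NaN because the value does not identify a single node.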

funpack.schema.hierarchy.dataCodingType(coding)[source]

Returns a data type suitable for representing values in the given encoding.

funpack.schema.hierarchy.getHierarchyCoding(dtable=None, vid=None, name=None, coding=None)[source]

Return a data coding ID for the given vid, name, or coding. See the loadHierarchyFile() function for details.

Parameters:
  • dtable – The DataTable

  • vid – Variable ID

  • name – Data coding name

  • coding – Data coding ID

Returns:

An integer ID for the corresponding data coding.

funpack.schema.hierarchy.getHierarchyFilePath(coding)[source]

Return a file path to a backup file for the given coding. The file is not guaranteed to exist.

funpack.schema.hierarchy.loadEncodingTable()[source]

Loads the encoding.txt file built into funpack, which contains information about all UK Biobank encodings.

funpack.schema.hierarchy.loadHierarchyFile(dtable=None, vid=None, name=None, coding=None, download=True)[source]

Load an encoding file containing hierarchy information for the specified variable/name.

Hierarchy files can be looked up with one of the following methods, in order of precedence:

  1. By specifying a data coding (coding). This takes precedence.

  2. By specifying a name which is present in the HIERARCHY_DATA_NAMES dictionary.

  3. By passing a DataTable (dtable) and variable ID (vid).

The recognised data type names for use with the second method are listed in the HIERARCHY_DATA_NAMES dictionary.

A ValueError is raised if the variable is unknown, or does not have a listed data coding.
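The order of precedence above can be sketched as a small resolver. This is not funpack's code: resolve_coding is a hypothetical helper, the vid_codings dictionary stands in for whatever lookup a DataTable provides, and raising ValueError for an unrecognised name is an assumption.

```python
# Copied from the HIERARCHY_DATA_NAMES dictionary documented above
HIERARCHY_DATA_NAMES = {'icd10': 19, 'icd9': 87, 'opcs3': 259, 'opcs4': 240}


def resolve_coding(coding=None, name=None, vid=None, vid_codings=None):
    """Return a data coding ID, trying coding, then name, then vid.

    vid_codings is a hypothetical {variable ID: data coding ID} mapping
    used here in place of a DataTable.
    """
    # 1. An explicit data coding ID always wins.
    if coding is not None:
        return int(coding)

    # 2. Otherwise fall back to a recognised name.
    if name is not None:
        if name not in HIERARCHY_DATA_NAMES:
            raise ValueError(f'Unknown data coding name: {name}')
        return HIERARCHY_DATA_NAMES[name]

    # 3. Finally, look the variable up.
    if vid is not None and vid_codings is not None and vid in vid_codings:
        return vid_codings[vid]

    raise ValueError('Unknown variable, or variable has no data coding')
```

For example, resolve_coding(coding=6, name='icd10') returns the explicit coding rather than the ID registered for 'icd10', and a call with no usable arguments raises ValueError.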

The hierarchy coding file that is downloaded/loaded is assumed to be a tab-separated file containing the following columns:

  • coding: A variable value (not necessarily unique)

  • meaning: Description

  • node_id: Unique numeric identifier for each node

  • parent_id: Identifier of each node’s parent

It is assumed that all codings have a unique node_id. Top-level parent nodes (nodes with no parent of their own) often have an ID of 0, although this is not assumed.
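A file with this layout can be read using only the standard library, as in the sketch below. The sample rows are invented for illustration and read_hierarchy_rows is a hypothetical helper; funpack's own loader may work differently.

```python
import csv
import io

# Invented sample in the tab-separated layout described above
SAMPLE = (
    "coding\tmeaning\tnode_id\tparent_id\n"
    "A\tGroup A\t1\t0\n"
    "A1\tSubgroup A1\t2\t1\n"
    "A1.1\tLeaf A1.1\t3\t2\n"
)


def read_hierarchy_rows(fileobj):
    """Read the four columns into parallel lists.

    Columns are located by header name, so their order in the file
    does not matter. Node and parent IDs are converted to integers.
    """
    nodes, parents, codings, descs = [], [], [], []
    for row in csv.DictReader(fileobj, delimiter='\t'):
        nodes.append(int(row['node_id']))
        parents.append(int(row['parent_id']))
        codings.append(row['coding'])
        descs.append(row['meaning'])
    return nodes, parents, codings, descs


nodes, parents, codings, descs = read_hierarchy_rows(io.StringIO(SAMPLE))
```

The four lists line up row by row, so the parent of nodes[i] is parents[i] and its coding is codings[i].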

Parameters:
  • dtable – The DataTable

  • vid – Variable ID

  • name – Data coding name

  • coding – Data coding ID

  • download – Defaults to True - coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.

Returns:

A Hierarchy object.

funpack.schema.hierarchy.numericToCode(code, name=None, dtable=None, vid=None, coding=None, download=True)[source]

Converts a numeric hierarchical code into its original version.

Parameters:
  • code – Code to convert

  • name – Data coding name

  • dtable – The DataTable

  • vid – Variable ID

  • coding – Data coding ID

  • download – Defaults to True - coding files are downloaded from the UK Biobank showcase. Set to False to force loading from the backup files in funpack/schema/hierarchy/.