
This module contains the FileInfo class, and the sniff() and fileinfo() functions, for getting information about input data files.

class funpack.fileinfo.FileInfo(datafiles, indexes=None, loaders=None, encodings=None, renameDuplicates=False, renameSuffix=None)[source]

Bases: object

The FileInfo class is a container for the information generated by the fileinfo() function for a collection of input datafiles.

property allColumns: Sequence[Column]

Returns a list containing all columns from all data files. The result is the concatenation of the lists returned by columns() for each data file.

property allVariables: Sequence[int]

Returns a list containing all variable IDs from all data files. Duplicates are removed, and the IDs are sorted.
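The behaviour of allColumns and allVariables can be illustrated with a minimal sketch. This is not funpack's implementation: the Column namedtuple, the columns_by_file dict, and the all_columns() and all_variables() helpers are hypothetical stand-ins, and skipping columns whose variable is None is an assumption.

```python
from collections import namedtuple
from itertools import chain

# Hypothetical stand-in for funpack's Column class; only the
# attributes needed for this illustration are modelled.
Column = namedtuple('Column', ['datafile', 'name', 'variable'])

# Hypothetical per-file column lists, as columns() might return them.
columns_by_file = {
    'a.csv': [Column('a.csv', 'eid', 0), Column('a.csv', '31-0.0', 31)],
    'b.csv': [Column('b.csv', 'eid', 0), Column('b.csv', '21-0.0', 21)],
}


def all_columns(columns_by_file, datafiles):
    """Concatenate the per-file column lists, in data file order."""
    return list(chain.from_iterable(columns_by_file[f] for f in datafiles))


def all_variables(columns_by_file, datafiles):
    """Collect variable IDs, dropping duplicates and sorting the result."""
    cols = all_columns(columns_by_file, datafiles)
    # Assumption: columns without a variable ID are skipped.
    return sorted({c.variable for c in cols if c.variable is not None})
```

Calling all_variables(columns_by_file, ['a.csv', 'b.csv']) on the data above yields each ID once, in ascending order, even though variable 0 appears in both files.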


Return a list of Column objects representing each of the columns that are present in the given datafile.

property datafiles

Return a list containing the data files.


Return the CSV dialect type for the given datafile.


Return the encoding for the given datafile, or None if no custom encoding was specified.


Return True if the given datafile has a header row, False otherwise.


Return the index column for the given data file.


Return the custom loader for the given datafile, or None if there is no custom loader.


Return the custom sniffer for the given datafile, or None if there is no custom sniffer. This is equivalent to loader(), as sniffer/loader functions are always paired.

funpack.fileinfo.fileinfo(datafiles, indexes=None, sniffers=None, encodings=None, renameDuplicates=False, renameSuffix=None)[source]

Identifies the format of each input data file, and extracts/generates column names and variable IDs for every column.

  • datafiles – Sequence of data files to be loaded.

  • indexes – Dict containing {filename : [index]} mappings, specifying which column(s) to use as the index. Defaults to 0 (the first column).

  • sniffers – Dict containing {file : snifferName} mappings, specifying custom sniffers to be used for specific files. See the custom module.

  • encodings – Dict of {datafile : encoding} mappings, specifying non-standard file encodings. If not specified, latin1 is assumed.

  • renameDuplicates – Defaults to False. If True, columns which have the same name are renamed; see renameDuplicateColumns().

  • renameSuffix – Passed as suffix to renameDuplicateColumns(), if renameDuplicates is True.


Returns a tuple containing:

  • List of csv dialect types

  • List of booleans, indicating whether or not each file has a header row.

  • List of lists of Column objects, representing the columns in each file.

funpack.fileinfo.has_header(sample, dialect, candidateTypes=None, missingValues=None)[source]

Used in place of the csv.Sniffer.has_header method.

The Sniffer.has_header method can fail in some circumstances, e.g.:

  • for files which only contain a single column.

  • for files which contain many missing values.

This function works in essentially the same manner as the csv.Sniffer.has_header method, but handles the situations listed above.

  • sample – Text sample.

  • dialect – CSV dialect as returned by the csv.Sniffer.sniff method, or a string describing the dialect (e.g. 'whitespace').

  • candidateTypes – Sequence of types to check. Defaults to [float].

  • missingValues – Sequence of missing values to ignore. Defaults to ['', 'na', 'n/a', 'nan']. The missing value test is case insensitive.


Returns True if the sample looks like it contains a header, False otherwise.
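The general idea of a header check that tolerates single-column files and missing values can be sketched as follows. This is a simplified illustration, not funpack's actual algorithm: the looks_like_header() name, its delimiter argument, and the voting rule are assumptions made for this example. Only the default candidate types and missing values are taken from the parameter descriptions above.

```python
import csv
import io

DEFAULT_MISSING = ('', 'na', 'n/a', 'nan')


def _is_type(value, types):
    """Return True if value can be converted to any of the given types."""
    for t in types:
        try:
            t(value)
            return True
        except (ValueError, TypeError):
            pass
    return False


def looks_like_header(sample, delimiter=',', candidate_types=(float,),
                      missing_values=DEFAULT_MISSING):
    """Hypothetical heuristic: the first row is a header if its cells do
    not convert to a candidate type while the data below them does."""
    rows = [r for r in csv.reader(io.StringIO(sample), delimiter=delimiter)
            if r]
    if len(rows) < 2:
        return False

    first, rest = rows[0], rows[1:]
    missing = {m.lower() for m in missing_values}
    votes = 0

    for i, head in enumerate(first):
        body = [r[i].strip() for r in rest if i < len(r)]
        # Ignore missing values (case insensitive) when typing a column.
        body = [v for v in body if v.lower() not in missing]
        if not body:
            continue  # column is entirely missing: no evidence either way

        body_typed = all(_is_type(v, candidate_types) for v in body)
        head_typed = _is_type(head.strip(), candidate_types)

        if body_typed and not head_typed:
            votes += 1   # typed data under an untyped first cell
        elif body_typed and head_typed:
            votes -= 1   # first cell looks like more data

    return votes > 0
```

Because each column votes independently and missing cells are discarded before typing, a single-column sample such as 'eid\n1\n2\n3\n' is still classified as having a header.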

funpack.fileinfo.renameDuplicateColumns(cols, suffix=None)[source]

Identifies any columns which have the same name, and renames all but the first. If N columns have the same name X, they are renamed X, X.1<suffix>, X.2<suffix>, ..., X.<N-1><suffix>.

The name attribute of each Column object is modified in-place.

  • cols – Sequence of Column objects.

  • suffix – String to append to the name of all renamed columns. Defaults to an empty string.
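The renaming scheme described above can be sketched in a few lines. This is an illustration under stated assumptions rather than funpack's code: the Column class here is a hypothetical stand-in that models only the name attribute, and rename_duplicates() is a hypothetical helper name.

```python
from collections import defaultdict


class Column:
    """Hypothetical stand-in for funpack's Column; only name is modelled."""

    def __init__(self, name):
        self.name = name


def rename_duplicates(cols, suffix=None):
    """Rename the 2nd, 3rd, ... occurrence of a name X to X.1<suffix>,
    X.2<suffix>, ... The name attributes are modified in place."""
    suffix = suffix or ''
    seen = defaultdict(int)  # original name -> occurrences so far

    for col in cols:
        original = col.name
        count = seen[original]
        if count > 0:
            col.name = f'{original}.{count}{suffix}'
        seen[original] += 1

    return cols
```

For example, columns named X, Y, X, X with suffix '_dup' end up as X, Y, X.1_dup, X.2_dup. This sketch does not guard against a generated name colliding with a name that already exists in the input.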

funpack.fileinfo.sniff(datafile, encoding=None)[source]

Identifies the format of the given input data file.

  • datafile – Input data file

  • encoding – File encoding (default: 'latin1')


Returns a tuple containing:

  • A csv dialect type

  • List of Column objects. The name attributes will be None if the file does not have a header row. The variable, visit, and instance attributes will be None if the file does not have UKB-style column names.
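The dialect detection step can be approximated with the standard library alone. The sketch below is not the body of sniff(): sniff_dialect() is a hypothetical helper, the sample size and the restricted delimiter set are assumptions, and only the latin1 default encoding is taken from the description above.

```python
import csv


def sniff_dialect(path, encoding='latin1', sample_size=4096):
    """Guess the CSV dialect of a file from a sample of its contents."""
    with open(path, 'rt', encoding=encoding, newline='') as f:
        sample = f.read(sample_size)

    # Assumption: only consider commas, tabs and semicolons as delimiters.
    return csv.Sniffer().sniff(sample, delimiters=',\t;')
```

The returned dialect class exposes attributes such as delimiter and quotechar, and can be passed directly to csv.reader when loading the file.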