funpack.fileinfo
This module contains the FileInfo class, and the sniff() and
fileinfo() functions, for getting information about input data files.
- class funpack.fileinfo.FileInfo(datafiles, indexes=None, loaders=None, encodings=None, renameDuplicates=False, renameSuffix=None)
Bases: object
The FileInfo class is a container for the information generated by the fileinfo() function for a collection of input datafiles.
- property allColumns: Sequence[Column]
Returns a list containing all columns from all data files. The result is just a concatenation of the lists returned by columns() for each data file.
- property allVariables: Sequence[int]
Returns a list containing all variable IDs from all data files. Duplicates are removed, and the IDs are sorted.
- columns(datafile)
Return a list of Column objects representing each of the columns that are present in the given datafile.
- property datafiles
Return a list containing the data files.
- encoding(datafile)
Return the encoding for the given datafile, or None if no custom encoding was specified.
- funpack.fileinfo.fileinfo(datafiles, indexes=None, sniffers=None, encodings=None, renameDuplicates=False, renameSuffix=None)
Identifies the format of each input data file, and extracts/generates column names and variable IDs for every column.
- Parameters:
datafiles – Sequence of data files to be loaded.
indexes – Dict containing {filename : [index]} mappings, specifying which column(s) to use as the index. Defaults to 0 (the first column).
sniffers – Dict containing {file : snifferName} mappings, specifying custom sniffers to be used for specific files. See the custom module.
encodings – Dict of {datafile : encoding} mappings, specifying non-standard file encodings. If not specified, latin1 is assumed.
renameDuplicates – Defaults to False. If True, columns which have the same name are renamed - see renameDuplicateColumns().
renameSuffix – Passed as suffix to renameDuplicateColumns(), if renameDuplicates is True.
- Returns:
A tuple containing:
List of csv dialect types
List of booleans, indicating whether or not each file has a header row.
List of lists, Column objects representing the columns in each file.
- funpack.fileinfo.has_header(sample, dialect, candidateTypes=None, missingValues=None)
Used in place of the csv.Sniffer.has_header method.
The Sniffer.has_header method can fail in some circumstances, e.g.:
for files which only contain a single column.
for files which contain lots of missing values.
This function works in essentially the same manner as the csv.Sniffer.has_header method, but handles the above situations.
- Parameters:
sample – Text sample.
dialect – CSV dialect as returned by the csv.Sniffer.sniff method, or a string describing the dialect (e.g. 'whitespace').
candidateTypes – Sequence of types to check. Defaults to [float].
missingValues – Sequence of missing values to ignore. Defaults to ['', 'na', 'n/a', 'nan']. The missing value test is case insensitive.
- Returns:
True if the sample looks like it contains a header, False otherwise.
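The idea behind this kind of header test can be illustrated with a minimal sketch. It is not funpack's implementation: the function name looks_like_header and its delimiter argument are hypothetical, and the rule used here (a column whose non-missing body values all convert to a candidate type, while its first-row value does not) is only an assumption about how such a heuristic might work.

```python
import csv
import io


def looks_like_header(sample, delimiter=',', candidate_types=(float,),
                      missing_values=('', 'na', 'n/a', 'nan')):
    """Hypothetical sketch of a header test; not funpack's has_header."""
    rows = [r for r in csv.reader(io.StringIO(sample), delimiter=delimiter) if r]
    if len(rows) < 2:
        return False

    # The missing value test is case insensitive.
    missing = {m.lower() for m in missing_values}

    def converts(value):
        # True if the value can be converted to any candidate type.
        for ctype in candidate_types:
            try:
                ctype(value)
                return True
            except (ValueError, TypeError):
                pass
        return False

    header, body = rows[0], rows[1:]
    for col, name in enumerate(header):
        values = [r[col].strip() for r in body if col < len(r)]
        values = [v for v in values if v.lower() not in missing]
        if not values:
            # Column contains only missing values: no evidence either way.
            continue
        # Typed body values under an untyped first row suggest a header.
        if all(converts(v) for v in values) and not converts(name.strip()):
            return True
    return False
```

Because missing values are dropped before the type check, a single-column file or a file with many empty cells can still be classified, as long as at least one column has some non-missing data.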
- funpack.fileinfo.renameDuplicateColumns(cols, suffix=None)
Identifies any columns which have the same name, and re-names the subsequent ones. If N columns have the same name X, they are renamed X, X.1<suffix>, X.2<suffix>, ..., X.<N-1><suffix>.
The name attribute of each Column object is modified in-place.
- Parameters:
cols – Sequence of Column objects.
suffix – String to append to the name of all renamed columns. Defaults to an empty string.
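The renaming scheme described above can be sketched on plain strings. This is an illustration only: the helper rename_duplicates is hypothetical, it returns a new list rather than modifying Column objects in-place, and it does not guard against a generated name colliding with an existing column name.

```python
from collections import defaultdict


def rename_duplicates(names, suffix=''):
    """Hypothetical sketch: X, X, X -> X, X.1<suffix>, X.2<suffix>."""
    seen = defaultdict(int)   # how many times each name has occurred so far
    renamed = []
    for name in names:
        count = seen[name]
        seen[name] += 1
        if count == 0:
            # The first occurrence keeps its original name.
            renamed.append(name)
        else:
            # Subsequent occurrences get a running index and the suffix.
            renamed.append(f'{name}.{count}{suffix}')
    return renamed
```

For example, rename_duplicates(['X', 'Y', 'X', 'X']) gives ['X', 'Y', 'X.1', 'X.2'].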
- funpack.fileinfo.sniff(datafile, encoding=None)
Identifies the format of the given input data file.
- Parameters:
datafile – Input data file
encoding – File encoding (default: 'latin1')
- Returns:
A tuple containing:
A csv dialect type
List of Column objects. The name attributes will be None if the file does not have a header row. The variable, visit, and instance attributes will be None if the file does not have UKB-style column names.
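For orientation, dialect detection of this kind can be sketched with the standard library alone. The helper sniff_dialect and its sample_chars parameter are hypothetical; whether funpack's sniff() reads the file this way is an assumption, and the sketch returns only the dialect, without any Column objects.

```python
import csv


def sniff_dialect(path, encoding='latin1', sample_chars=4096):
    """Hypothetical sketch: guess the CSV dialect from the start of a file."""
    # newline='' leaves line endings untouched, as the csv module recommends.
    with open(path, 'rt', encoding=encoding, newline='') as f:
        sample = f.read(sample_chars)
    # csv.Sniffer.sniff raises csv.Error if no dialect can be determined.
    return csv.Sniffer().sniff(sample)
```

The returned dialect can be passed directly to csv.reader, and its delimiter attribute shows which separator was detected.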