funpack.exporting_tsv
This module provides the exportTSV() and exportCSV() functions, which export the data contained in a DataTable to a TSV or CSV file.
- funpack.exporting_tsv.NUM_ROWS = 10000
Default number of rows to export at a time by exportTSV(). This is the default value for its numRows argument.
- funpack.exporting_tsv.exportCSV(dtable, outfile, **kwargs)[source]
Export data to a CSV-style file.
This function is identical to exportTSV(), except that the default value for the sep argument is ',' instead of '\t'.
- funpack.exporting_tsv.exportTSV(dtable, outfile, sep=None, missingValues=None, escapeNewlines=False, numRows=None, dropNaRows=False, dateFormat=None, timeFormat=None, formatters=None, **kwargs)[source]
Export data to a TSV-style file.
This may be parallelised by row: chunks of numRows rows will be saved to separate temporary output files in parallel, and then concatenated afterwards to produce the final output file.
Parameters:
dtable – DataTable containing the data
outfile – File to output to
sep – Separator character to use. Defaults to '\t'.
missingValues – String to use for missing/NA values. Defaults to the empty string.
escapeNewlines – If True, all string/object types are escaped using shlex.quote.
numRows – Number of rows to write at a time. Defaults to NUM_ROWS.
dropNaRows – If True, rows which do not contain data for any columns are not exported.
dateFormat – Name of formatter to use for date columns.
timeFormat – Name of formatter to use for time columns.
formatters – Dict of { [vid|column] : formatter } mappings, specifying custom formatters to use for specific variables.
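The chunk-and-concatenate strategy described for exportTSV() can be illustrated with a minimal, standard-library-only sketch. This is not funpack's implementation: the names write_chunk and export_chunked are hypothetical, rows are plain tuples rather than a DataTable, and a thread pool stands in for whatever parallelism funpack actually uses.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_chunk(rows, path, columns, header, sep):
    """Write one chunk of rows to its own file (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        if header:
            f.write(sep.join(columns) + "\n")
        for row in rows:
            f.write(sep.join(str(v) for v in row) + "\n")
    return path


def export_chunked(rows, columns, outfile, sep="\t", num_rows=10000):
    """Write rows in chunks of num_rows, then concatenate the pieces."""
    # Always keep at least one (possibly empty) chunk so the header is written.
    chunks = [rows[i:i + num_rows] for i in range(0, len(rows), num_rows)] or [[]]
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [Path(tmpdir) / f"chunk_{i}.tsv" for i in range(len(chunks))]
        with ThreadPoolExecutor() as pool:
            # Each chunk goes to a separate temporary file, so workers
            # never share a file handle. Only chunk 0 carries the header.
            futures = [
                pool.submit(write_chunk, chunk, path, columns, i == 0, sep)
                for i, (chunk, path) in enumerate(zip(chunks, paths))
            ]
            for future in futures:
                future.result()  # re-raise any error from a worker
        # Concatenate in chunk order so the original row order is preserved.
        with open(outfile, "wb") as out:
            for path in paths:
                with open(path, "rb") as part:
                    shutil.copyfileobj(part, out)
```

Writing the header only with the first chunk and concatenating in index order is what keeps the final file identical to a single sequential write.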
- funpack.exporting_tsv.writeDataFrame(dtable, outfile, header, chunki, sep, missingValues, dropNaRows, dateFormat, timeFormat, formatters)[source]
Writes all of the data in dtable to outfile.
Called by exportTSV() to output one chunk of data.
Parameters:
dtable – DataTable containing the data
outfile – File to output to
header – If True, write the header row (column names).
chunki – Chunk index (used for logging)
sep – Separator character to use.
missingValues – String to use for missing/NA values.
dropNaRows – If True, rows which do not contain data for any columns are not exported.
dateFormat – Name of formatter to use for date columns.
timeFormat – Name of formatter to use for time columns.
formatters – Dict of { [vid|column] : formatter } mappings, specifying custom formatters to use for specific variables.
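To make the per-chunk options concrete, the following sketch shows how header, missingValues, dropNaRows and formatters might interact when one chunk is rendered. It is an illustration under stated assumptions, not funpack code: format_chunk and is_missing are hypothetical names, rows are plain dicts, and formatters are given as callables keyed by column name, whereas funpack refers to formatters by name.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def format_chunk(rows, columns, sep="\t", missing_values="",
                 drop_na_rows=False, formatters=None, header=True):
    """Return the output lines for one chunk of dict-style rows."""
    formatters = formatters or {}
    lines = [sep.join(columns)] if header else []
    for row in rows:
        values = [row.get(col) for col in columns]
        if drop_na_rows and all(is_missing(v) for v in values):
            continue  # the row has no data in any column
        cells = []
        for col, val in zip(columns, values):
            if is_missing(val):
                cells.append(missing_values)       # placeholder for NA
            elif col in formatters:
                cells.append(formatters[col](val))  # column-specific formatter
            else:
                cells.append(str(val))
        lines.append(sep.join(cells))
    return lines
```

For example, calling format_chunk with drop_na_rows=True skips a row whose every column is None or NaN, while a row with at least one value is kept and its gaps are filled with missing_values.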