funpack.exporting_tsv

This module provides the exportTSV() and exportCSV() functions, which export the data contained in a DataTable to a TSV or CSV file.

funpack.exporting_tsv.NUM_ROWS = 10000

Default number of rows exported at a time by exportTSV(). This is the default value for its numRows argument.

funpack.exporting_tsv.exportCSV(dtable, outfile, **kwargs)

Export data to a CSV-style file.

This function is identical to exportTSV(), except that the default value for the sep argument is ',' instead of '\t'.
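The relationship between the two functions can be illustrated with a minimal sketch. The names export_tsv_sketch and export_csv_sketch are hypothetical stand-ins, not part of funpack, and they take a plain list of rows rather than a DataTable; the only point is how a wrapper can swap the default separator and forward everything else unchanged.

```python
import io


def export_tsv_sketch(rows, outfile, sep=None, **kwargs):
    """Hypothetical stand-in for exportTSV(): write rows joined by sep."""
    if sep is None:
        sep = '\t'  # tab is the default separator
    for row in rows:
        outfile.write(sep.join(str(v) for v in row) + '\n')


def export_csv_sketch(rows, outfile, **kwargs):
    """Hypothetical stand-in for exportCSV(): same call, comma by default."""
    if kwargs.get('sep') is None:
        kwargs['sep'] = ','  # only the default differs
    export_tsv_sketch(rows, outfile, **kwargs)


buf = io.StringIO()
export_csv_sketch([[1, 2], [3, 4]], buf)
print(buf.getvalue())
```

An explicit sep passed by the caller still takes precedence in this sketch, so export_csv_sketch(rows, buf, sep=';') writes semicolon-separated output.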

funpack.exporting_tsv.exportTSV(dtable, outfile, sep=None, missingValues=None, escapeNewlines=False, numRows=None, dropNaRows=False, dateFormat=None, timeFormat=None, formatters=None, **kwargs)

Export data to a TSV-style file.

This may be parallelised by row - chunks of numRows rows will be saved to separate temporary output files in parallel, and then concatenated afterwards to produce the final output file.

Parameters:
  • dtable – DataTable containing the data

  • outfile – File to output to

  • sep – Separator character to use. Defaults to '\t'

  • missingValues – String to use for missing/NA values. Defaults to the empty string.

  • escapeNewlines – If True, all string/object types are escaped using shlex.quote.

  • numRows – Number of rows to write at a time. Defaults to NUM_ROWS.

  • dropNaRows – If True, rows which do not contain data for any columns are not exported.

  • dateFormat – Name of formatter to use for date columns.

  • timeFormat – Name of formatter to use for time columns.

  • formatters – Dict of { [vid|column] : formatter } mappings, specifying custom formatters to use for specific variables.
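The row-wise chunking described above (write chunks of numRows rows to separate temporary files, then concatenate them) can be sketched with the standard library alone. This is an illustration under stated assumptions, not funpack's implementation: export_in_chunks and write_chunk are hypothetical names, the input is a list of tuples rather than a DataTable, and a thread pool stands in for whatever parallelism funpack actually uses.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_chunk(rows, path, header, sep):
    """Write one chunk of rows to its own file (header may be None)."""
    with open(path, 'w', newline='') as f:
        if header is not None:
            f.write(sep.join(header) + '\n')
        for row in rows:
            f.write(sep.join(str(v) for v in row) + '\n')


def export_in_chunks(rows, columns, outfile, sep='\t', num_rows=10000):
    """Write rows in chunks of num_rows to temp files, then concatenate."""
    chunks = [rows[i:i + num_rows] for i in range(0, len(rows), num_rows)]
    if not chunks:
        chunks = [[]]  # still emit the header for an empty table

    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, f'chunk{i}.txt')
                 for i in range(len(chunks))]

        # Only the first chunk carries the header row.
        with ThreadPoolExecutor() as pool:
            futures = [
                pool.submit(write_chunk, chunk, path,
                            columns if i == 0 else None, sep)
                for i, (chunk, path) in enumerate(zip(chunks, paths))
            ]
            for fut in futures:
                fut.result()  # re-raise any error from a worker

        # Concatenate the chunk files in their original order.
        with open(outfile, 'w', newline='') as out:
            for path in paths:
                with open(path, newline='') as f:
                    shutil.copyfileobj(f, out)
```

Writing the header only with the first chunk keeps the concatenation step a plain byte-for-byte copy, which is why the per-chunk writer takes a header argument.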

funpack.exporting_tsv.writeDataFrame(dtable, outfile, header, chunki, sep, missingValues, dropNaRows, dateFormat, timeFormat, formatters)

Writes all of the data in dtable to outfile.

Called by exportTSV() to output one chunk of data.

Parameters:
  • dtable – DataTable containing the data

  • outfile – File to output to

  • header – If True, write the header row (column names).

  • chunki – Chunk index (used for logging)

  • sep – Separator character to use.

  • missingValues – String to use for missing/NA values.

  • dropNaRows – If True, rows which do not contain data for any columns are not exported.

  • dateFormat – Name of formatter to use for date columns.

  • timeFormat – Name of formatter to use for time columns.

  • formatters – Dict of { [vid|column] : formatter } mappings, specifying custom formatters to use for specific variables.
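How the header, sep, missingValues and dropNaRows parameters interact when a single chunk is written can be sketched as follows. The function write_chunk_sketch and the helper is_missing are hypothetical, operate on plain tuples rather than a DataTable, and treat None and NaN as missing; date, time and custom formatters are left out.

```python
import io
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def write_chunk_sketch(rows, columns, outfile, header, sep,
                       missing_values, drop_na_rows):
    """Write one chunk of rows, optionally preceded by a header row."""
    if header:
        outfile.write(sep.join(columns) + '\n')
    for row in rows:
        # Skip rows in which every column is missing.
        if drop_na_rows and all(is_missing(v) for v in row):
            continue
        cells = [missing_values if is_missing(v) else str(v) for v in row]
        outfile.write(sep.join(cells) + '\n')


buf = io.StringIO()
write_chunk_sketch([(1, None), (None, float('nan')), (2.5, 'x')],
                   ['a', 'b'], buf, header=True, sep='\t',
                   missing_values='NA', drop_na_rows=True)
print(buf.getvalue())
```

In this sketch the second row is dropped because both of its values are missing, while the single missing value in the first row is replaced by the missing_values string.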