CSV requirements

In general, we recommend using Parquet files, as they are compressed and store properly typed columns. Data can also be provided in CSV format, either uncompressed or gzip-compressed (.gz), provided the files adhere to the following rules: they must be encoded in UTF-8, use commas (,), semicolons (;) or tabs (\t) as column separators, and start with a single header line containing the column names.
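The following is a minimal sketch, using only Python's standard library, of producing a file that meets these basic rules: UTF-8 encoding, gzip compression, a comma separator, and exactly one header line. The helper names `write_csv_gz` and `read_csv_gz`, the file name, and the sample data are hypothetical and only serve as illustration.

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, header, rows, delimiter=","):
    """Write a gzip-compressed, UTF-8 encoded CSV file with a single header line."""
    # newline="" leaves line-ending handling to the csv module.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        writer.writerow(header)  # exactly one header line
        writer.writerows(rows)   # None values are written as empty cells


def read_csv_gz(path, delimiter=","):
    """Read the file back as a list of rows (lists of strings)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))


# Usage example with made-up data.
path = os.path.join(tempfile.mkdtemp(), "example.csv.gz")
write_csv_gz(path, ["id", "city", "amount"], [[1, "Wien", 12.5], [2, "Zürich", None]])
records = read_csv_gz(path)
print(records[0])  # ['id', 'city', 'amount']
```

Python's `csv` writer emits `None` as an empty cell, which matches the missing-value convention used for datetime and numerical columns.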

1. Header row

  • The first row must contain the column names.
  • Each column name in a table must be unique.
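As a sketch, assuming the file content is available as a Python string, the uniqueness rule can be checked by counting the names in the first row. `duplicate_column_names` is a hypothetical helper; its comparison is exact and case-sensitive, since the rules above do not say whether names that differ only in case count as duplicates.

```python
import csv
import io
from collections import Counter


def duplicate_column_names(csv_text, delimiter=","):
    """Return the column names that appear more than once in the header row."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    header = next(reader)  # the first row holds the column names
    counts = Counter(header)
    return [name for name, n in counts.items() if n > 1]


print(duplicate_column_names("id,name,id\n1,Ann,7\n"))  # ['id']
```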

2. Rows

  • Each row in the file must contain the same number of cells.
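The row rule can be checked with a short Python sketch. `ragged_rows` is a hypothetical helper that parses with Python's `csv` reader, so a line break inside a quoted entry does not split a record. It reports 1-based record numbers (the header is record 1) rather than physical line numbers.

```python
import csv
import io


def ragged_rows(csv_text, delimiter=","):
    """Return (record_number, cell_count) for each record whose cell count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    expected = len(next(reader))  # the header defines the expected number of cells
    problems = []
    for number, row in enumerate(reader, start=2):
        if len(row) != expected:
            problems.append((number, len(row)))
    return problems


# Record 3 has only two cells; record 4 spans two physical lines but has three cells.
sample = 'a,b,c\n1,2,3\n4,5\n"x\ny",7,8\n'
print(ragged_rows(sample))  # [(3, 2)]
```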

3. Alphanumeric entries (text, categories, strings)

  • Entries containing the column separator, a line break, or spaces at the beginning or end must be enclosed in double quotes ("), for example:
      "this is, one column"
      "this is \n two lines"   (where \n stands for an actual line break)
      " space at the beginning and end "
  • Double quotes within an entry must be escaped by doubling them, for example:
      "this does contain ""quoted text"""
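A minimal Python sketch of these quoting rules, written by hand so that each rule is applied explicitly rather than depending on a library's quoting defaults. `quote_field` is a hypothetical helper. It also treats a carriage return as a line break and any leading or trailing whitespace (not just spaces) as requiring quotes, which errs on the side of quoting. The round trip through Python's `csv` reader checks that the quoted line parses back to the original values.

```python
import csv
import io


def quote_field(value, delimiter=","):
    """Quote a text entry if the rules require it, doubling any embedded double quotes."""
    needs_quotes = (
        delimiter in value            # e.g. "this is, one column"
        or '"' in value               # embedded double quotes must be escaped
        or "\n" in value              # line break
        or "\r" in value              # carriage return, treated as a line break
        or value != value.strip()     # leading or trailing whitespace
    )
    if not needs_quotes:
        return value
    return '"' + value.replace('"', '""') + '"'


values = [
    "this is, one column",
    "this is\ntwo lines",
    " space at the beginning and end ",
    'this does contain "quoted text"',
    "plain",
]
line = ",".join(quote_field(v) for v in values)
print(quote_field('this does contain "quoted text"'))  # "this does contain ""quoted text"""
# Round trip: Python's csv reader recovers the original values.
print(next(csv.reader(io.StringIO(line, newline=""))) == values)  # True
```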

4. Datetime values

  • Must be encoded in one of the formats below.
  • Missing values must appear as empty strings.
Type                         Format                      Example
Date                         yyyy-MM-dd                  2020-02-08
Datetime with hours          yyyy-MM-dd HH               2020-02-08 09
                             yyyy-MM-ddTHH               2020-02-08T09
                             yyyy-MM-ddTHHZ              2020-02-08T09Z
Datetime with minutes        yyyy-MM-dd HH:mm            2020-02-08 09:30
                             yyyy-MM-ddTHH:mm            2020-02-08T09:30
                             yyyy-MM-ddTHH:mmZ           2020-02-08T09:30Z
Datetime with seconds        yyyy-MM-dd HH:mm:ss         2020-02-08 09:30:26
                             yyyy-MM-ddTHH:mm:ss         2020-02-08T09:30:26
                             yyyy-MM-ddTHH:mm:ssZ        2020-02-08T09:30:26Z
Datetime with milliseconds   yyyy-MM-dd HH:mm:ss.SSS     2020-02-08 09:30:26.123
                             yyyy-MM-ddTHH:mm:ss.SSS     2020-02-08T09:30:26.123
                             yyyy-MM-ddTHH:mm:ss.SSSZ    2020-02-08T09:30:26.123Z
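A minimal Python sketch for producing and checking these formats. The regular expression accepts exactly the combinations listed in the table (the `Z` suffix only together with the `T` separator) plus the empty string for missing values. `is_valid_datetime` and `format_utc_millis` are hypothetical helpers, and the sketch assumes that `Z` denotes UTC, as in ISO 8601.

```python
import re
from datetime import date, datetime, timezone

# Time part: HH, HH:mm, HH:mm:ss or HH:mm:ss.SSS.
_TIME = r"\d{2}(?::\d{2}(?::\d{2}(?:\.\d{3})?)?)?"
# A date, optionally followed by a time after a space, or after "T" with an optional trailing "Z".
_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}(?: " + _TIME + r"|T" + _TIME + r"Z?)?")


def is_valid_datetime(value):
    """Return True for an empty string (missing value) or a value in one of the listed formats.

    Checks the layout and the calendar date only; hour, minute and second ranges are not checked.
    """
    if value == "":
        return True
    if _DATETIME.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value[:10])  # rejects impossible dates such as 2020-02-30
    except ValueError:
        return False
    return True


def format_utc_millis(dt):
    """Render a timezone-aware datetime as yyyy-MM-ddTHH:mm:ss.SSSZ in UTC."""
    utc = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(timespec="milliseconds") + "Z"


dt = datetime(2020, 2, 8, 9, 30, 26, 123000, tzinfo=timezone.utc)
print(format_utc_millis(dt))                   # 2020-02-08T09:30:26.123Z
print(dt.strftime("%Y-%m-%d %H:%M"))           # 2020-02-08 09:30
print(is_valid_datetime("2020-02-08 09"))      # True
print(is_valid_datetime("2020-02-08 09:30Z"))  # False: "Z" only appears with the "T" separator
print(is_valid_datetime("08.02.2020"))         # False
```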

5. Numerical values

  • Must use a period (.) as the decimal separator.
  • Must not contain a thousands separator.
  • Missing values must be encoded as empty strings.
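A minimal validator for a single numerical cell, in Python. The accepted grammar (an optional minus sign, digits, and an optional fractional part after a period) is an assumption made for illustration: the rules above do not say whether a leading plus sign or exponent notation such as `1e-05` is accepted, so this sketch rejects both. `is_valid_number` is a hypothetical helper.

```python
import re

# Optional minus sign, digits, then optionally "." and more digits.
# Thousands separators and decimal commas are rejected; so is exponent notation,
# which the rules above do not mention.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def is_valid_number(value):
    """Return True for an empty string (missing value) or a plain decimal number."""
    return value == "" or _NUMBER.fullmatch(value) is not None


samples = ["1234.5", "-0.25", "", "1,234.5", "1234,5", "1 234"]
print([is_valid_number(s) for s in samples])
# [True, True, True, False, False, False]
```

When generating values from Python, `str()` of an `int` or `float` always uses `.` and never inserts grouping separators, independent of locale, but it switches to exponent notation for very large or very small floats (for example `str(1e-5)` gives `1e-05`).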