CSV requirements

To be synthesized successfully, your dataset must be encoded in UTF-8, use commas (,) or semicolons (;) as column separators, and adhere to the following rules:

1. Header row

  • The first row must contain the column names.
  • Each column name in a table must be unique and may not exceed 255 characters.
  • Column names cannot contain special characters such as commas, semicolons, colons, slashes, dollar signs, backslashes, single quotes, or double quotes.
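
The header rules above can be checked before upload. The following is a minimal sketch, not part of the product: the function name `header_problems` and the `FORBIDDEN_CHARS` set are hypothetical, and the set only covers the characters named explicitly in the rule, so it may be incomplete.

```python
# Characters named in the header rule; the original list is open-ended,
# so this set is an assumption and may be incomplete.
FORBIDDEN_CHARS = set(",;:/$\\'\"")


def header_problems(names):
    """Return a list of human-readable problems found in the column names."""
    problems = []
    seen = set()
    for name in names:
        # Each column name must be unique within the table.
        if name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)
        # Column names may not exceed 255 characters.
        if len(name) > 255:
            problems.append(f"column name longer than 255 characters: {name[:20]!r}...")
        # Column names may not contain the listed special characters.
        bad = sorted(set(name) & FORBIDDEN_CHARS)
        if bad:
            problems.append(f"forbidden characters {bad} in column name: {name!r}")
    return problems


if __name__ == "__main__":
    for problem in header_problems(["id", "price$", "id"]):
        print(problem)
```

An empty list means the header passed these three checks.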

2. Rows

Each row in the file must contain the same number of cells.
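
One way to verify this rule is to compare every row's cell count with the header's. The sketch below uses Python's standard `csv` module; the function name `rows_with_wrong_length` is hypothetical, and the delimiter is passed explicitly rather than detected.

```python
import csv
import io


def rows_with_wrong_length(text, delimiter=","):
    """Return (row_number, cell_count) for each row whose length differs from the header."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare
    expected = len(header)
    # Row numbers are 1-based and count the header as row 1.
    return [
        (row_number, len(row))
        for row_number, row in enumerate(reader, start=2)
        if len(row) != expected
    ]


if __name__ == "__main__":
    sample = "id,name\n1,Ann\n2\n"
    print(rows_with_wrong_length(sample))
```

Because the `csv` module parses quoted cells, a quoted entry that contains a separator or a line break still counts as a single cell.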

3. Alphanumeric entries (text, categories, strings)

  • Entries containing separators, line breaks, or spaces at the beginning or end must be enclosed in double quotes.
    "this is, one column"
    "this is \n two lines"
    " space at the beginning and end "
  • Double quotes within an entry must be escaped by doubling them.
    "this does contain ""quoted text"""
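
The quoting and escaping rules above can be applied to a single text entry as follows. This is an illustrative sketch: the helper `quote_cell` is hypothetical, and quoting entries that contain the separator is an assumption drawn from the first example rather than from the rule text.

```python
def quote_cell(value, delimiter=","):
    """Quote a text entry only when the rules require it, doubling embedded quotes."""
    needs_quotes = (
        delimiter in value          # assumption: separators inside an entry need quoting
        or '"' in value             # embedded double quotes
        or "\n" in value            # line breaks
        or "\r" in value
        or value != value.strip()   # leading or trailing whitespace
    )
    if not needs_quotes:
        return value
    # Escape each embedded double quote by doubling it, then wrap the entry.
    return '"' + value.replace('"', '""') + '"'


if __name__ == "__main__":
    for entry in ["plain", "this is, one column", " padded ", 'say "hi"']:
        print(quote_cell(entry))
```

When writing whole files, Python's built-in `csv.writer` with `quoting=csv.QUOTE_ALL` is a simpler alternative: it quotes every cell, which also covers entries with leading or trailing spaces.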

4. Datetime values

  • must be encoded in one of the formats below
  • must have missing values encoded as empty strings
Type                        Format                    Example
Date                        yyyy-MM-dd                2020-02-08
Datetime with hours         yyyy-MM-dd HH             2020-02-08 09
                            yyyy-MM-ddTHH             2020-02-08T09
                            yyyy-MM-ddTHHZ            2020-02-08T09Z
Datetime with minutes       yyyy-MM-dd HH:mm          2020-02-08 09:30
                            yyyy-MM-ddTHH:mm          2020-02-08T09:30
                            yyyy-MM-ddTHH:mmZ         2020-02-08T09:30Z
Datetime with seconds       yyyy-MM-dd HH:mm:ss       2020-02-08 09:30:26
                            yyyy-MM-ddTHH:mm:ss       2020-02-08T09:30:26
                            yyyy-MM-ddTHH:mm:ssZ      2020-02-08T09:30:26Z
Datetime with milliseconds  yyyy-MM-dd HH:mm:ss.SSS   2020-02-08 09:30:26.123
                            yyyy-MM-ddTHH:mm:ss.SSS   2020-02-08T09:30:26.123
                            yyyy-MM-ddTHH:mm:ss.SSSZ  2020-02-08T09:30:26.123Z
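
The accepted formats share one shape: a date, optionally followed by a space or T, a time of decreasing precision, and (for the T variants only) a trailing Z. The sketch below encodes that shape as a regular expression. The names `SUPPORTED_DATETIME` and `is_supported_datetime` are hypothetical, and the check covers the layout only, not calendar validity (it would accept month 13).

```python
import re

# Time part: HH, HH:mm, HH:mm:ss, or HH:mm:ss.SSS
_TIME = r"\d{2}(?::\d{2}(?::\d{2}(?:\.\d{3})?)?)?"

# Date alone, date + space + time, or date + T + time with an optional Z.
SUPPORTED_DATETIME = re.compile(
    rf"\d{{4}}-\d{{2}}-\d{{2}}(?: {_TIME}|T{_TIME}Z?)?"
)


def is_supported_datetime(value):
    """Return True for an empty string (missing value) or a supported layout."""
    return value == "" or SUPPORTED_DATETIME.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["2020-02-08", "2020-02-08T09:30Z", "20200208T0930", ""]:
        print(repr(candidate), is_supported_datetime(candidate))
```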

The following formats are not supported:

  • Any format with a week number
    Example: 2020-W06-5 (Week 6, Day 5 of 2020)
  • Any format with ordinal dates
    Example: 2020-039 (Day 39 of 2020)
  • Formats with a time zone offset that do not contain a Z
    Example: 2020-02-08 09+07:00
  • Compact formats that omit separators such as - and :
    Example: 20200208T0930
  • Formats that separate seconds and milliseconds with a comma
    Example: 2020-02-08T09:30:26,123
  • Formats that separate seconds and milliseconds with a colon
    Example: 2020-02-08 09:30:26:123
  • Date-only formats that have a time zone component
    Example: 2020-02-08Z
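
Values in some of the unsupported formats above can be converted before upload. The following sketch uses only Python's standard `datetime` module; the three helper names are hypothetical, and it assumes that a trailing Z denotes UTC, as in ISO 8601, which this guide does not state explicitly.

```python
from datetime import datetime, timezone


def offset_to_utc_z(value):
    """Convert a timestamp with a numeric UTC offset to the yyyy-MM-ddTHH:mm:ssZ layout."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"no UTC offset in {value!r}")
    # Assumption: Z means UTC, so shift the time to UTC before appending it.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def week_date_to_date(value):
    """Convert an ISO week date such as 2020-W06-5 to yyyy-MM-dd."""
    return datetime.strptime(value, "%G-W%V-%u").strftime("%Y-%m-%d")


def ordinal_to_date(value):
    """Convert an ordinal date such as 2020-039 to yyyy-MM-dd."""
    return datetime.strptime(value, "%Y-%j").strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(offset_to_utc_z("2020-02-08T09:30:26+07:00"))
    print(week_date_to_date("2020-W06-5"))
    print(ordinal_to_date("2020-039"))
```

Converting an offset to UTC changes the wall-clock time shown in the value, so check that downstream consumers expect UTC before applying it.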

5. Numerical values

  • must have a . as decimal separator
  • must not have a thousands separator
  • must have missing values encoded as empty strings
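
A quick check for the numerical rules above might look like the sketch below. The names `PLAIN_NUMBER` and `is_supported_number` are hypothetical. Accepting an optional leading sign and rejecting scientific notation are assumptions, since this guide does not mention either.

```python
import re

# Optional sign, digits, and an optional fractional part introduced by "."
# Assumption: signs are allowed and exponents (1e5) are not.
PLAIN_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def is_supported_number(value):
    """Return True for an empty string (missing value) or a plain decimal number."""
    return value == "" or PLAIN_NUMBER.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["1234.56", "1,234.56", "1234,56", ""]:
        print(repr(candidate), is_supported_number(candidate))
```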