Prepare your data

Before you generate synthetic data with MOSTLY AI, review the following considerations and requirements. They help you avoid unexpected errors, protect the privacy of the data subjects (people, companies, or other entities), and improve the accuracy of the generated synthetic data.

If your original data is in CSV files, see CSV file requirements.
Before you train a two-table or multi-table generator, see Subject and linked table requirements.

In the context of MOSTLY AI, subject tables are the tables that contain the private information of people, companies, or other entities.

Linked tables typically have foreign keys that reference subject tables. Before you train a generator on subject and linked tables, make sure you understand how the original data in those tables must be structured.
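
As an illustration of the subject and linked table relationship, the following minimal sketch checks a pair of tables before training. It is not part of the MOSTLY AI API. The function name `check_foreign_keys`, the column names `id` and `customer_id`, and the sample rows are all hypothetical, and the sketch assumes that each subject is identified by a unique primary key. See Subject and linked table requirements for the actual rules.

```python
def check_foreign_keys(subject_rows, linked_rows,
                       primary_key="id", foreign_key="customer_id"):
    """Return (duplicate_keys, orphan_rows) for a subject/linked table pair.

    Hypothetical helper: column names are assumptions for illustration.
    """
    seen = set()
    duplicates = set()
    for row in subject_rows:
        key = row[primary_key]
        if key in seen:
            # The same primary key appears more than once in the subject table.
            duplicates.add(key)
        seen.add(key)

    # Linked rows whose foreign key matches no subject row.
    orphans = [row for row in linked_rows if row[foreign_key] not in seen]
    return sorted(duplicates), orphans


# Hypothetical sample data: customers (subject table), purchases (linked table).
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
purchases = [
    {"purchase_id": 10, "customer_id": 1},
    {"purchase_id": 11, "customer_id": 2},
    {"purchase_id": 12, "customer_id": 3},  # references a customer that does not exist
]

duplicates, orphans = check_foreign_keys(customers, purchases)
```

In this sample, `duplicates` is empty and `orphans` contains the purchase that points to the missing customer. A check like this can surface structural problems in the original data early, before you start training.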