Synthetic datasets
To generate synthetic data in MOSTLY AI, you start a new synthetic dataset. You can view all finished, canceled, failed, and in-progress synthetic datasets on the Synthetic datasets page.
What is a synthetic dataset?
A synthetic dataset contains the generated (single- or multi-table) data as well as a number of additional artifacts:
- Generated synthetic data (available to download in CSV, Parquet, XLSX formats)
- Usage statistics
  - Generated data points
  - Credits used
- Data insights
  - Generator quality - Overall, Univariate, Bivariate, Coherence
  - Distances
  - Model report for the quality of the generator
  - Data report for the quality of the synthetic dataset
  - Data samples - 10 generated samples from the generated data (that you can resample as needed)
- Configuration
  - JSON dictionary of the synthetic dataset configuration
  - Python client code to access the synthetic data via Python or Jupyter Notebook
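If you download the generated data as CSV, you can preview it locally in a way that resembles the 10-row data samples described above. The following is a minimal sketch that uses only the Python standard library. It does not use the MOSTLY AI Python client, and it is not how the platform draws its samples. The helper name `sample_rows` and the demo columns `id`, `age`, and `country` are hypothetical and only stand in for a real download.

```python
import csv
import io
import random


def sample_rows(csv_text, k=10, seed=None):
    """Return up to k randomly chosen rows (as dicts) from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)
    # Never request more rows than the file contains.
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical stand-in for the contents of a downloaded CSV file.
demo_csv = "id,age,country\n" + "\n".join(
    f"{i},{20 + i},AT" for i in range(25)
)

# Draw 10 rows, then draw again with a different seed to "resample".
samples = sample_rows(demo_csv, k=10, seed=1)
resampled = sample_rows(demo_csv, k=10, seed=2)

for row in samples:
    print(row)
```

For a real file, you could read its contents with `open(path, newline="").read()` and pass the resulting string to `sample_rows`.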
Create a synthetic dataset
For more information, see Generate single- and multi-table synthetic datasets.
Configure a synthetic dataset
- Set sample size and temperature
- Rebalance columns
- Impute data
- Evaluate quality
- Deliver to databases and cloud buckets
- Use a seed dataset for conditional generation
- Fair synthetic data