The Home page of the MOSTLY AI synthetic data platform gives you direct access to features for starting synthetic data generation or for accessing previously generated synthetic data. These features are described below.

- Upload files: On the Upload files tab, you can upload CSV or Parquet files (drag and drop them, or browse to select them) to immediately configure and start a synthetic data job.
- Connect to a source: On the Connect to a source tab, you can immediately create a connection to a new database or cloud bucket.
- Generate synthetic data from a sample dataset: Under Or use sample data, you can immediately start a synthetic data job with any of the available datasets. Pick one and start a synthetic data job for it with the Start button.
- Last six completed jobs: Under Existing synthetic datasets, you can review the last six completed jobs. The card for each job indicates whether the synthetic data passed the Privacy check and shows its overall Accuracy.
Upload files
Use the Upload files tab on the left side to drag and drop the CSV (`.csv`) or Parquet (`.parquet`) files from which you want to generate synthetic data.
You can upload only one table of data. If a table is split into multiple files, you can drag and upload all of its files together. To upload files that contain two or more tables, start an Ad hoc job from the Jobs tab.
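If a table is split across several CSV files, it can help to confirm locally that the parts line up before you drag them in. The Python sketch below checks that every part has the same header row. It assumes that the parts must have identical column names in the same order to be read as one table; this is an assumption, not a documented platform rule. The function names and file names in the demo are hypothetical, and Parquet files (which would need a library such as pyarrow) are not covered.

```python
import csv
import tempfile
from pathlib import Path


def read_header(path: Path) -> list[str]:
    """Return the column names (first row) of a CSV file, or [] if it is empty."""
    # utf-8-sig strips a byte-order mark that spreadsheet exports often add
    with path.open(newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f), [])


def parts_share_schema(paths: list[Path]) -> bool:
    """Check that all CSV parts of one split table have identical headers.

    Assumption (not a documented platform rule): the parts must have the
    same column names in the same order to be treated as a single table.
    """
    if not paths:
        return False
    expected = read_header(paths[0])
    return all(read_header(p) == expected for p in paths[1:])


if __name__ == "__main__":
    # Demo with hypothetical file names in a throwaway directory
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "part_1.csv").write_text("age,income\n39,50000\n", encoding="utf-8")
        (d / "part_2.csv").write_text("age,income\n50,42000\n", encoding="utf-8")
        (d / "other.csv").write_text("id,amount\n1,9.99\n", encoding="utf-8")
        print(parts_share_schema([d / "part_1.csv", d / "part_2.csv"]))  # True
        print(parts_share_schema([d / "part_1.csv", d / "other.csv"]))   # False
```

Comparing only the header row keeps the check cheap for large files; it does not validate data types or values in the rows.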

For next steps on how to configure a job to create synthetic data, see Create synthetic data.
Connect to a source
If you want to generate synthetic data from an existing database or from a file in a cloud bucket, select the Connect to a source tab for direct access to the Create connector workflow.
For next steps on how to configure a connector, see Connect to your data sources.
Generate synthetic data from a sample dataset
Under Or use sample dataset, you can immediately generate synthetic data from one of the datasets that we prepared for you. Select a dataset and click Start to open the Start job screen, where you can configure the job settings.

Below is a description of each of the sample datasets.
Dataset | Description |
---|---|
UCI Adult dataset 10k samples | The UCI Adult dataset, also known as the Census Income dataset, is a well-known dataset used in machine learning and statistics. It contains census data from 1994 and, in its full form, consists of 48,842 instances, each representing an individual. |
Online Shoppers | The Online Shoppers Purchasing Intention dataset is a popular dataset hosted in the UCI Machine Learning Repository at the University of California, Irvine. It contains information on the browsing and purchasing behavior of visitors to an online store over a period of one year (from May 2010 to May 2011). |
Bank Marketing | The Bank Marketing dataset is another well-known dataset used in machine learning and statistics. It is also known as the UCI Bank Marketing dataset because it is hosted in the UCI Machine Learning Repository at the University of California, Irvine. It contains data from direct marketing campaigns of a Portuguese banking institution. |
Last six completed jobs
Under Existing synthetic datasets, you can view the last six completed jobs. The card for each job indicates whether the generated synthetic data passed the Privacy check and shows its overall Accuracy.
The overall Accuracy is the average of all univariate (per-column) and bivariate (per-column-pair) accuracy scores in the QA report.
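As a worked illustration of this averaging, the Python sketch below pools the univariate and bivariate scores and takes their unweighted mean. It assumes that every score counts equally; the QA report may aggregate differently (for example, by first averaging the univariate and bivariate scores separately and then averaging those two means), which gives a different result whenever the two groups contain different numbers of scores. The function name and the score values are hypothetical.

```python
from statistics import fmean


def overall_accuracy(univariate: list[float], bivariate: list[float]) -> float:
    """Unweighted mean over all univariate and bivariate accuracy scores.

    Assumption: every score counts equally; the platform's QA report may
    aggregate the scores differently.
    """
    # Pool both groups so each individual score has the same weight
    scores = [*univariate, *bivariate]
    if not scores:
        raise ValueError("no accuracy scores given")
    return fmean(scores)
```

For example, with three univariate scores of 0.98, 0.96, and 0.94 and two bivariate scores of 0.90 and 0.92, `overall_accuracy` returns approximately 0.94 (4.70 / 5).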
