Welcome to the MOSTLY AI synthetic data platform!
With the platform, you can generate structured synthetic data from several types of data sources:
- Sample datasets
- Local files
- Databases
- Cloud bucket files
The easiest way to start is with a sample dataset or a local file. The following sections cover the steps for each.
Select one and generate your first synthetic dataset!
Generate synthetic data from a sample dataset
For a quick and easy start, you can use one of the sample datasets on the Home page for your first synthetic dataset.
Steps
- On the Home page, click Start for one of the available sample datasets.
- On the Start job screen, click Launch job.
Result
MOSTLY AI redirects your browser to the Jobs tab where you can track the status of the job that generates your synthetic dataset.
What’s next
After the job completes, you can download the generated synthetic data.
Generate synthetic data from a local file
If you have a dataset saved in a local file, you can use it to generate synthetic data with the steps below.
The steps below use the drag area on the Home page to upload a single table, which can span multiple files. To upload files that contain multiple tables, select the Jobs tab and click Create synthetic data. For more information, see Create synthetic data.
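Because a single table can span multiple files, it may help to check locally that those files share the same column header before you upload them. The sketch below is only an illustration and assumes CSV files with a header row. The function names `read_header` and `find_header_mismatches` are hypothetical helpers, not part of the MOSTLY AI platform, and the platform may apply its own rules when it combines files.

```python
import csv
from pathlib import Path


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        # An empty file has no header row, so fall back to an empty list.
        return next(csv.reader(handle), [])


def find_header_mismatches(paths):
    """Compare the header of every file with the header of the first file.

    Hypothetical pre-upload check: returns a dict that maps each
    mismatching file path (as a string) to the header found in that file.
    """
    paths = [Path(p) for p in paths]
    if not paths:
        return {}

    expected = read_header(paths[0])
    mismatches = {}
    for path in paths[1:]:
        header = read_header(path)
        if header != expected:
            mismatches[str(path)] = header
    return mismatches
```

An empty result means every file has the same header as the first one. Any entry in the result points to a file worth reviewing before the upload.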
Steps
- On the Home page, drag the dataset file into the area under Upload files.
- On the Start job screen, click Launch job.
Result
MOSTLY AI redirects your browser to the Jobs tab where you can track the status of the job that generates your synthetic dataset.
What’s next
After the job completes, you can download the generated synthetic data.
Download the generated synthetic data
Use the Download synthetic data button for the completed job to download the generated synthetic data.
Steps
- On the Jobs screen, click the Download synthetic data button.
- In the pop-up menu, select whether to download the data as a CSV or Parquet file.
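After a CSV download, a quick local check can confirm that the file contains the columns and number of rows you expect. The sketch below uses only the Python standard library and assumes the file has a header row. The function name `summarize_csv` is a hypothetical helper, not part of the platform.

```python
import csv


def summarize_csv(path, preview_rows=3):
    """Return the column names, row count, and first few rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        preview = []
        row_count = 0
        for row in reader:
            # Keep only the first few rows so large files stay cheap to inspect.
            if row_count < preview_rows:
                preview.append(row)
            row_count += 1
        return {
            # fieldnames is None for an empty file, so fall back to a list.
            "columns": reader.fieldnames or [],
            "row_count": row_count,
            "preview": preview,
        }
```

To use it, pass the path of the downloaded file, for example `summarize_csv("synthetic_data.csv")`, where the file name is a placeholder. For a Parquet download, a library such as pandas with `pandas.read_parquet` can read the file instead.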
What’s next
You can review how to read the QA report and understand the quality scores of the generated dataset.
If you want to generate synthetic data from cloud buckets or databases, see the links below.