The Home page in the MOSTLY AI Synthetic Data Platform provides direct access to features that you can use to start synthetic data generation or access previously generated synthetic data. You can review the features below.
- Upload files
On the Upload files tab, you can upload CSV, TSV, or Parquet files (drag and drop them or browse to select them) and immediately configure and start a synthetic dataset.
- Connect to a source
On the Connect to a source tab, you can immediately create a connection to a new database or cloud bucket.
- Synthesize a sample dataset
Under Or use sample data, you can immediately start a synthetic dataset with any of the datasets that are available. Pick one and start a new synthetic dataset for it with the Start button.
- Last six generated synthetic datasets
Under Existing synthetic datasets, you can review the last six generated synthetic datasets. The card for each synthetic dataset indicates the overall Accuracy of the trained AI models that generate the synthetic data.
Use the Upload files tab on the left side to drag-and-drop CSV (.csv), TSV (.tsv), or Parquet (.parquet) files from which you want to generate synthetic data.
You can upload only one table of data at a time. If that table is split across multiple files, you can drag and drop all of its files together.
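Before uploading a table that is split across several files, it can help to confirm locally that the parts belong together. The sketch below is one possible pre-upload check, not part of the platform: the helper names `read_header` and `check_split_table` are hypothetical, and it assumes that each CSV or TSV part starts with a header row and that all parts of one table should share the same columns. Parquet files are only checked by extension here.

```python
import csv
from pathlib import Path

# Extensions the Upload files tab is documented to accept.
SUPPORTED = {".csv", ".tsv", ".parquet"}


def read_header(path: Path) -> list[str]:
    """Return the first row of a CSV or TSV file as a list of column names."""
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def check_split_table(paths: list[Path]) -> list[str]:
    """Return a list of problems; an empty list means none were detected."""
    problems = []
    reference = None  # header of the first CSV/TSV part seen
    for path in paths:
        suffix = path.suffix.lower()
        if suffix not in SUPPORTED:
            problems.append(f"{path.name}: unsupported extension {suffix!r}")
            continue
        if suffix == ".parquet":
            continue  # Parquet schemas are not inspected in this sketch
        header = read_header(path)
        if reference is None:
            reference = header
        elif header != reference:
            # Assumption: all parts of one table share identical columns.
            problems.append(f"{path.name}: header differs from the first file")
    return problems
```

For example, `check_split_table([Path("part1.csv"), Path("part2.csv")])` returns an empty list when both files have the same header row, and a list of messages otherwise.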
For next steps on how to configure a synthetic dataset, see Configure synthetic datasets.
If you want to generate synthetic data from an existing database or from files in a cloud bucket, select the Connect to a source tab for direct access to the Create connector workflow.
For next steps on how to configure a connector, see Connectors.
Under Or use sample dataset, you can immediately generate synthetic data from one of the datasets that we prepared for you. Select a dataset and click its Start button to go to the Start job screen, where you can configure the synthetic dataset settings.
Below is a description of each of the sample datasets.
|Dataset|Description|Source|
|---|---|---|
|UCI Adult dataset|The UCI Adult dataset, also known as the Census Income dataset, is a well-known dataset used in machine learning and statistics. It contains census data from 1994 and consists of 48,842 instances, each representing an individual.|Link|
|Bank Marketing|The Bank Marketing dataset is another well-known dataset used in machine learning and statistics. It is also known as the UCI Bank Marketing dataset because it is hosted by the University of California, Irvine.|Link|
|Online Shoppers|The Online Shoppers Purchasing Intention dataset is a popular dataset hosted by the University of California, Irvine. It contains information on the browsing and purchasing behavior of visitors to an online store over a period of one year (from May 2010 to May 2011).|Link|
You have an overview of the last six generated synthetic datasets under Existing synthetic datasets. The card for each synthetic dataset shows the overall Accuracy of the trained AI models that generate the synthetic data.
The overall Accuracy is an aggregated statistic. To learn more about the QA Report and how the accuracy score is calculated, see Read the QA Report.