The Home page of the MOSTLY AI synthetic data platform gives you direct access to the features for starting a synthetic data job and for opening previously generated synthetic data. Each feature is described below.

MOSTLY AI Home page
  • Upload files
    On the Upload files tab, you can upload (drag-and-drop or browse to select) CSV or Parquet files to immediately configure and start a synthetic data job.

  • Connect to a source
    On the Connect to a source tab, you can immediately create a connection to a new database or cloud bucket.

  • Generate synthetic data from a sample dataset
    Under Or use sample data, you can immediately start a synthetic data job with any of the available sample datasets. Pick one and click Start.

  • Last six completed jobs
    Under Existing synthetic datasets, you can review the last six completed jobs. The card for each job indicates whether the synthetic data passed the Privacy check and shows its overall Accuracy.

Upload files

Use the Upload files tab on the left side to drag-and-drop CSV (.csv) or Parquet (.parquet) files from which you want to generate synthetic data.

You can upload only one table of data. If a single table is split across multiple files, you can drag and drop all of its files together.
If you want to upload files that include two or more tables, start an Ad hoc job from the Jobs tab.
MOSTLY AI Home page Upload files tab
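
If you are unsure whether several files really belong to the same table, a quick local check before uploading can help. The sketch below is not part of MOSTLY AI. It assumes the files are CSV files whose first row is a header, and the function name `same_header` is hypothetical. It only compares the header rows and does not inspect the data.

```python
import csv


def same_header(paths):
    """Return True if every CSV file in `paths` has the same header row."""
    headers = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # Assumption: the first row of each file is the column header.
            headers.append(next(csv.reader(handle), None))
    # All headers must equal the first one (an empty list counts as consistent).
    return all(header == headers[0] for header in headers)
```

If the function returns False for a set of files, they likely describe different tables, in which case the Ad hoc job route mentioned above is the better fit.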

For next steps on how to configure a job to create synthetic data, see Create synthetic data.

Connect to a source

If you want to generate synthetic data from an existing database or a file in a cloud bucket, select the Connect to a source tab for direct access to the Create connector workflow.

MOSTLY AI Home page Connect to a source tab

 
For next steps on how to configure a connector, see Connect to your data sources.

Generate synthetic data from a sample dataset

Under Or use sample data, you can immediately generate synthetic data from one of the datasets that we prepared for you. Select a dataset and click Start to open the Start job screen, where you can configure the job settings.

MOSTLY AI Home page Use sample data

 

Below is a description of each of the sample datasets.

Dataset, Description, More info

  • UCI Adult dataset 10k samples
    The UCI Adult dataset, also known as the Census Income dataset, is a well-known dataset used in machine learning and statistics. It contains census data from 1994 and consists of 48,842 instances, each representing an individual. The MOSTLY AI dataset includes only 10k samples.
    More info: Link

  • Online Shoppers
    The Online Shoppers Purchasing Intention dataset is a popular dataset hosted by the University of California, Irvine. It contains information on the browsing and purchasing behavior of visitors to an online store over a period of one year (from May 2010 to May 2011).
    More info: Link

  • Bank Marketing
    The Bank Marketing dataset is another well-known dataset used in machine learning and statistics. It is also known as the UCI Bank Marketing dataset because it is hosted by the University of California, Irvine.

Last six completed jobs

Under Existing synthetic datasets, you can quickly review the last six completed jobs. The card for each job indicates whether the generated synthetic data passed the Privacy check and shows its overall Accuracy.

The overall Accuracy is an average of all univariate and bivariate accuracy scores for each column of data from the QA report.
MOSTLY AI Home page Last six jobs
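
As a rough illustration of the averaging described above, the sketch below takes the plain mean of a set of univariate and bivariate accuracy scores. The function name and the score values are made up for the example, and treating the result as an unweighted mean is an assumption, since the exact calculation in the QA report is not spelled out here.

```python
from statistics import mean


def overall_accuracy(univariate, bivariate):
    """Unweighted mean over all univariate and bivariate accuracy scores."""
    # Assumption: every score contributes equally to the overall value.
    return mean(list(univariate) + list(bivariate))


# Hypothetical scores, expressed as fractions between 0 and 1.
univariate_scores = [0.98, 0.96]
bivariate_scores = [0.94, 0.92]
print(f"{overall_accuracy(univariate_scores, bivariate_scores):.2%}")
```

Because every score counts equally in this sketch, a single low bivariate score pulls the overall value down by the same amount as a low univariate score would.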