Home page

The Home page in the MOSTLY AI Synthetic Data Platform provides direct access to features that you can use to start synthetic data generation or access previously generated synthetic data. You can review the features below.

MOSTLY AI - Home page overview
  • Upload files
    On the Upload files tab, you can upload CSV, TSV, or Parquet files (drag and drop, or browse to select) and immediately configure and start a synthetic dataset.
  • Connect to a source
    On the Connect to a source tab, you can immediately create a connection to a new database or cloud bucket.
  • Synthesize a sample dataset
    Under Or use sample data, you can start a synthetic dataset from any of the available sample datasets. Pick one and click Start.
  • Last six generated synthetic datasets
    Under Existing synthetic datasets, you can review the last six generated synthetic datasets. The card for each synthetic dataset indicates the overall Accuracy of the trained AI models that generate the synthetic data.

Upload files

Use the Upload files tab on the left side to drag and drop the CSV (.csv), TSV (.tsv), or Parquet (.parquet) files from which you want to generate synthetic data.

Note
You can upload only one table of data. If that table is split across multiple files, you can drag and upload all of the files together.
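
Before uploading a table that is split across several files, it can help to check the files locally. The following is a minimal sketch, not part of the MOSTLY AI platform: the helper names (`unsupported_files`, `read_header`, `mismatched_headers`) are hypothetical, and it assumes that every CSV or TSV part carries its own header row, which this page does not state as a requirement.

```python
import csv
from pathlib import Path

# File extensions listed in the Upload files section.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".parquet"}


def unsupported_files(paths):
    """Return the paths whose extension is not in SUPPORTED_EXTENSIONS."""
    return [p for p in paths if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS]


def read_header(path):
    """Read the first row of a CSV or TSV file as a list of column names.

    Assumption: the first row is a header. Parquet files are not handled here.
    """
    delimiter = "\t" if Path(path).suffix.lower() == ".tsv" else ","
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def mismatched_headers(paths):
    """Return the paths whose header differs from the first file's header."""
    paths = list(paths)
    if not paths:
        return []
    reference = read_header(paths[0])
    return [p for p in paths[1:] if read_header(p) != reference]
```

If `mismatched_headers` returns a non-empty list, those files probably do not belong to the same table and are worth reviewing before upload.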

MOSTLY AI - Home page Upload files tab

For next steps on how to configure a synthetic dataset, see Configure synthetic datasets.

Connect to a source

If you want to generate synthetic data from an existing database or from files in a cloud bucket, select the Connect to a source tab for direct access to the Create connector workflow.

MOSTLY AI - Home page Connect to a source tab

For next steps on how to configure a connector, see Connectors.

Synthesize a sample dataset

Under Or use sample data, you can immediately generate synthetic data from one of the datasets that we prepared for you. Select a dataset and click Start to go to the Start job screen, where you can configure the synthetic dataset settings.

MOSTLY AI - Home page Use sample data

Below is a description of each of the sample datasets.

Dataset | Description | More info
UCI Adult dataset | The UCI Adult dataset, also known as the Census Income dataset, is a well-known dataset used in machine learning and statistics. It contains census data from 1994 and consists of 48,842 instances, each representing an individual. | Link
Bank Marketing | The Bank Marketing dataset is another well-known dataset used in machine learning and statistics. It is also known as the UCI Bank Marketing dataset because it is hosted by the University of California, Irvine. | Link
Online Shoppers | The Online Shoppers Purchasing Intention dataset is a popular dataset hosted by the University of California, Irvine. It contains information on the browsing and purchasing behavior of visitors to an online store over a period of one year (from May 2010 to May 2011). | Link

Last six generated synthetic datasets

Under Existing synthetic datasets, you can see an overview of the last six generated synthetic datasets. The card for each synthetic dataset shows the overall Accuracy of the trained AI models that generate the synthetic data.

The overall Accuracy is an aggregated statistic. To learn more about the QA Report and how the accuracy score is calculated, see Read the QA Report.

MOSTLY AI - Home page Existing synthetic datasets