Track the progress of your synthesization job,
review the column details, or stop AI model training.

You need to start a job before you can use this guide.
If you don’t have one yet, start one of our tutorials.

This guide takes about 5 minutes to read.
You’ll learn where to find the synthesization tasks and the training history of each table.


The Jobs page

The Jobs page appears automatically when you log in to MOSTLY AI.
Let’s take a look around before we explore your job’s details.

Getting around

The Jobs page is divided into the following areas:

1. Main navigation

Jobs

See the status of your synthetic data generation jobs,
share them with others, download the synthetic data and QA report,
or generate more data from the job’s trained AI model.

Catalogs

Catalogs are synthetic data templates for your data sources.
Click this menu item to browse your catalogs or create a new one.

Connectors

Connectors enable you to connect to your company’s data sources.
Click this menu item to browse your connectors or create a new one.


2. Settings & documentation

docs

Open the documentation

profile

Sign out of MOSTLY AI.

settings

Modify the system settings. Only visible to admin users.


3. Create synthetic data button

Click Create new synthetic data to open a workspace where you can upload your data, connect to your data sources, and configure the synthesization settings.

4. Jobs list

See whether each job is in progress, finished, failed, or canceled.
The following actions are available:

cancel

Cancel the job.

view job summary

Download the synthetic data’s QA report.

generate more data

Reuse the job’s trained AI model to generate more data.

download

Download the job’s synthetic data, its settings, or its logs.

sharing options

Share your job, including its synthetic data and QA report, with other user groups.

delete

Delete your job, including its synthetic data and QA report.

Exploring the job details

Job details tab


1. Switch between Job details and QA report

When the job is completed, the QA report tab opens automatically so you can check the synthetic data’s quality and see whether it passed the privacy tests. You can switch back to the Job details tab to see the synthesization history.


2. Job summary

Job type

There are four job types:

  • Ad hoc synthesizes a dataset uploaded through the web UI.

  • Catalog job synthesizes a database or dataset stored in a cloud bucket
    or local server.

  • Generate with subject count creates a specified number of new synthetic
    subjects from a previous job’s already trained AI model.

  • Generate with seed creates a linked table for an uploaded subject table
    using a previous job’s already trained AI model.

Job started

This field indicates when the job was started.

Tasks completed

The number of tasks in this job that have been completed.

Catalog

The name of the catalog that is being synthesized.

Destination

The destination the synthetic data is written to.
This field is not shown for Ad hoc jobs.
You can always download the synthetic data as CSV or Parquet files.


3. Table list

Table name

Name of the table.

Current task

Learn which synthesization task is currently being performed.
See the table below for further details.

Status

Whether the task is in progress, finished, failed, or canceled.

Duration

How much time has elapsed.

The kebab menu on the right-hand side of each row lets you choose between View tasks and View column details. Either option opens a drawer where you can learn more about the table’s synthesization process or its encoding types, respectively.

View tasks


Each task consists of one or more steps:

Task: Synthesizing table

  • Generating text

  • Organizing data: Ensures that very large tables can be processed
    regardless of system memory size.

  • Data analysis: The table is analyzed for its data types and unique values.

  • Transforming data: The table is transformed for efficient processing.

  • AI training: Using generative neural networks, a model is trained to
    retain your dataset’s granularity, statistical correlations, structures,
    and time dependencies.

  • Generating synthetic data: The resulting AI model is used to create a
    synthetic version of the table.

Task: Packaging synthetic data

  • Creating ZIP archive: Creates a ZIP archive with the synthetic version
    of the dataset.

Task: Creating the quality assurance report

  • Analyzing synthetic data for quality and accuracy: The resulting
    synthetic table is tested against the original for accuracy and privacy.
    The tests check for identical matches and whether the synthetic subjects
    are dissimilar enough from the original subjects to prevent
    re-identification.
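The task and step structure above can be thought of as plain data. The sketch below is only an illustration and is not part of any MOSTLY AI API. It assumes that the steps run in the order the table lists them, and it includes "Generating text" in that order even though the table gives no description for it. The names `PIPELINE`, `task_for_step`, and `steps_remaining` are hypothetical.

```python
# Hypothetical illustration only: not a MOSTLY AI API.
# Assumption: steps run in the order listed in the task table.
PIPELINE: list[tuple[str, list[str]]] = [
    ("Synthesizing table", [
        "Generating text",
        "Organizing data",
        "Data analysis",
        "Transforming data",
        "AI training",
        "Generating synthetic data",
    ]),
    ("Packaging synthetic data", ["Creating ZIP archive"]),
    ("Creating the quality assurance report",
     ["Analyzing synthetic data for quality and accuracy"]),
]


def task_for_step(step: str) -> str:
    """Return the name of the task that contains the given step."""
    for task, steps in PIPELINE:
        if step in steps:
            return task
    raise KeyError(step)


def steps_remaining(current_step: str) -> list[str]:
    """Return the steps that follow current_step, assuming sequential order."""
    flat = [s for _, steps in PIPELINE for s in steps]
    return flat[flat.index(current_step) + 1:]
```

For example, if a table’s current step is "AI training", `task_for_step` maps it back to the Synthesizing table task, and `steps_remaining` lists what would still be ahead under the sequential-order assumption.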


View column details


4. Expand row to view table synthesization settings

Number of training rows

The number of rows used for training.

Number of generated rows

The number of rows being generated.

Training goal

Whether the training goal has been set to
Accuracy or Speed.

Training epochs

The maximum number of epochs set in the training settings.

Model size

The model size selected in the training settings,
either Small, Medium, or Large.

Batch size

The batch size set in the training settings.
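The per-table settings listed above can be summarized as one record. This is a minimal sketch, not MOSTLY AI’s API or configuration format. The class name `TableSettings` and its field names are hypothetical. The allowed values for the training goal and model size come from this guide. The requirement that every count be positive is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical record of the settings shown in an expanded table row.
# Field names are illustrative only; they are not a MOSTLY AI API.
@dataclass(frozen=True)
class TableSettings:
    training_rows: int
    generated_rows: int
    training_goal: Literal["Accuracy", "Speed"]
    max_training_epochs: int
    model_size: Literal["Small", "Medium", "Large"]
    batch_size: int

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check the choices explicitly.
        if self.training_goal not in ("Accuracy", "Speed"):
            raise ValueError(f"unknown training goal: {self.training_goal}")
        if self.model_size not in ("Small", "Medium", "Large"):
            raise ValueError(f"unknown model size: {self.model_size}")
        # Assumption: all counts must be positive.
        for name in ("training_rows", "generated_rows",
                     "max_training_epochs", "batch_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
```

A frozen dataclass fits here because the guide presents these values as read-only once a job has run.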