Track the progress of your synthesization job, review the column details, or stop AI model training.
A job needs to be started before you can use this guide.
If you don’t have one yet, feel free to start one of our tutorials.
It will take 5 mins to read this guide.
You’ll learn where to find the synthesization tasks and training history per table.
The Jobs page
The Jobs page appears automatically when you log into MOSTLY AI.
Let’s take a look around before we start configuring your synthetic data generation job.
The Jobs page gives you several options to choose from:
Main navigation
Jobs
See the status of your synthetic data generation jobs,
share them with others, download the synthetic data and QA report,
or generate more data from the job’s trained AI model.
Catalogs
Catalogs are synthetic data templates for your data sources.
Click this menu item to browse your catalogs or create a new one.
Connectors
Connectors enable you to connect to your company’s data sources.
Click this menu item to browse your connectors or create a new one.
Settings & documentation
Open the documentation
Sign out from MOSTLY AI
Modify the system settings (only visible to admin users)
Create synthetic data button
Click to open a workspace where you can upload your data, connect to your data sources, and configure the synthesization settings.
Jobs list
Learn the current status of each of your jobs.
The following actions are available:
Cancel the job.
Download the synthetic data’s QA report.
Reuse the job’s trained AI model to generate more data.
Download the job’s synthetic data, its settings, or its logs.
Share your job, including its synthetic data and QA report, with other user groups.
Delete your job, including its synthetic data and QA report.
Exploring the job details
Switch between Job details and QA report
When the job is completed, the QA report tab opens automatically so you can learn about the synthetic data’s quality and whether it passed the privacy tests. You can switch back to the Job details tab to see the synthesization history.
Job summary
Job type
There are four job types:
Ad hoc synthesizes a dataset uploaded using the web UI.
Catalog job synthesizes a database or dataset stored in a cloud bucket
or local server.
Generate with subject count creates a specified number of new synthetic
subjects from a previous job’s already trained AI model.
Generate with seed creates a linked table for an uploaded subject table
using a previous job’s already trained AI model.
Job started
This field indicates when the job was started.
Tasks completed
The number of tasks in this job that have been completed.
Catalog
The name of the catalog that is being synthesized.
Destination
The destination where the synthetic data will be written to.
This field is not shown for Ad hoc jobs.
You can always download the synthetic data as CSV or Parquet files.
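As a minimal sketch of working with a downloaded CSV file, the snippet below reads rows with Python's standard csv module. The function name read_synthetic_rows and the sample columns are hypothetical, and an in-memory string stands in for a real download. Parquet files would need a third-party reader and are not covered here.

```python
import csv
import io


def read_synthetic_rows(handle):
    """Read CSV text from an open file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


# In-memory stand-in for a downloaded file (hypothetical columns).
# With a real download, pass open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("id,age\n1,34\n2,51\n")
rows = read_synthetic_rows(sample)
row_count = len(rows)
```

All values are read as strings, so numeric columns need to be converted before any further analysis.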
Table list
Table name
Name of the table.
Current task
Learn which synthesization task is currently being performed.
See the table below for further details.
Status
The current status of the task.
Duration
How much time has elapsed.
The kebab menu on the right-hand side of the row lets you choose between View tasks and View column details. In both cases, a drawer opens where you can learn more about the table’s synthesization process or its encoding types, respectively.
View tasks
Task
Step
Description
Synthesizing table / Generating text
Organizing data
Ensures that very large tables can be processed regardless of system memory size.
Data analysis
The table is analyzed for its data types and unique values.
Transforming data
The table is transformed for efficient processing.
AI training
Using generative neural networks, a model is trained to retain your dataset’s granularity, statistical correlations, structures, and time-dependencies.
Generating synthetic data
The resulting AI model is used to create a synthetic version of the table.
Packaging synthetic data
Creating zip archive
Creates a ZIP archive with the synthetic version of the dataset.
Creating the quality assurance report
Analyzing synthetic data for quality and accuracy
The resulting synthetic table is tested against the original for accuracy and privacy. The test checks for identical matches and whether the synthetic subjects are dissimilar enough from the original subjects to prevent re-identification.
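To make the idea behind the two privacy checks more concrete, here is a toy sketch in Python. It is not MOSTLY AI's implementation: the function names, the use of Euclidean distance, and the purely numeric sample rows are all assumptions made for illustration. The first function measures the share of synthetic rows that are exact copies of an original row; the second reports the median distance from each synthetic row to its closest original row.

```python
import math
import statistics


def exact_match_share(original, synthetic):
    """Fraction of synthetic rows identical to at least one original row."""
    original_set = {tuple(row) for row in original}
    matches = sum(1 for row in synthetic if tuple(row) in original_set)
    return matches / len(synthetic)


def median_distance_to_closest(synthetic, original):
    """Median Euclidean distance from each synthetic row to its nearest original row."""
    nearest = [
        min(math.dist(row, ref) for ref in original)
        for row in synthetic
    ]
    return statistics.median(nearest)


# Hypothetical numeric rows, for illustration only.
original_rows = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
synthetic_rows = [(1.0, 2.0), (3.5, 4.0), (9.0, 9.0)]

share = exact_match_share(original_rows, synthetic_rows)
median_dcr = median_distance_to_closest(synthetic_rows, original_rows)
```

A high share of exact matches or very small distances would suggest that synthetic rows sit too close to real ones. Real tables with categorical or mixed-type columns would need a different distance measure than the one assumed here.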
View column details
Expand a row to view the table’s synthesization settings
Number of training rows
The number of rows used for training.
Number of generated rows
The number of rows being generated.
Training goal
Whether the training goal has been set to Accuracy or Speed.
Training epochs
The maximum number of epochs set in the training settings.
Model size
The model size selected in the training settings,
either Small, Medium, or Large.