Once you’ve completed all configuration steps, click Launch Job. This starts the synthesization job and opens the Job summary page, where you can review the job details, track its progress, and browse the QA report once the job has finished.
There are two sections on this page that inform you about the synthetic data generation process:
- The top section provides general information:
  - Job name: The name of the job as it was specified during its configuration.
  - Job type: There are four job types:
    - Ad hoc: Synthesizes a dataset uploaded using the web UI.
    - Data catalog: Synthesizes a database or a dataset stored in a cloud bucket or on a local server.
    - Generate with subject count: Creates a specified number of new synthetic subjects from a previous job’s already trained AI model.
    - Generate with seed: Creates a linked table for an uploaded subject table using a previous job’s already trained AI model.
  - Uploaded: Indicates when the original dataset was uploaded.
- The Job summary section shows which stage the synthesization process is in. Your dataset goes through the following six stages before you can download its synthetic version:
  - Submitted: MOSTLY AI has received your dataset and run configuration.
  - Provisioning: Compute resources are being allocated to your run.
  - Encoding: Your dataset is analyzed for its data types and unique values and transformed for efficient processing.
  - Training: Using generative neural networks, a model is trained to retain your dataset’s granularity, statistical correlations, structures, and time dependencies.
  - Generating: Without accessing your original dataset, MOSTLY AI uses the trained model to create a synthetic version of it.
  - Analyzing: The synthetic data is tested against the original data for accuracy and privacy. This stage checks for synthetic records that are identical to original ones and whether the synthetic subjects are dissimilar enough from the original subjects to prevent re-identification. MOSTLY AI discards the original dataset once this stage is completed.
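To make the fixed ordering of the six stages concrete, here is a minimal sketch that models them as an ordered Python enum. The names `JobStage`, `remaining_stages`, and `is_final_stage` are hypothetical illustrations, not part of any MOSTLY AI API; they only encode the sequence listed above.

```python
from enum import Enum


class JobStage(Enum):
    """Hypothetical model of the six job stages, numbered in the order they run."""

    SUBMITTED = 1
    PROVISIONING = 2
    ENCODING = 3
    TRAINING = 4
    GENERATING = 5
    ANALYZING = 6


def remaining_stages(current: JobStage) -> list[JobStage]:
    """Return the stages that still follow the current one, in order."""
    # Iterating over an Enum yields members in definition order,
    # so the result follows the documented pipeline order.
    return [stage for stage in JobStage if stage.value > current.value]


def is_final_stage(current: JobStage) -> bool:
    """True when the job is in Analyzing, the last of the six stages."""
    return current is JobStage.ANALYZING


if __name__ == "__main__":
    current = JobStage.TRAINING
    names = [stage.name.title() for stage in remaining_stages(current)]
    print(f"{current.name.title()} -> still to come: {', '.join(names)}")
```

Running the script prints `Training -> still to come: Generating, Analyzing`. Integer values were chosen so that comparing stages follows the documented order; the status identifiers actually reported by the web UI may differ.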