Once you’ve completed all configuration steps, click Launch Job. This starts the synthesization job and opens the Job summary page, where you can view the job details, track its progress, and browse the QA report once the job has finished.

There are two sections on this page that inform you about the synthetic data generation process:

  1. The top section provides general information:

    Job name

    The name of the job as it was specified during its configuration.

    Job type

    There are four job types:

    • Ad hoc synthesizes a dataset uploaded using the web UI.

    • Data catalog synthesizes a database or dataset stored in a cloud bucket
      or on a local server.

    • Generate with subject count creates a specified number of new synthetic
      subjects using the AI model already trained in a previous job.

    • Generate with seed creates a linked table for an uploaded subject table
      using the AI model already trained in a previous job.

    Uploaded

    This field indicates when the original dataset was uploaded.


  2. The Job summary section tells you which stage the synthesization process is in.
    Your dataset goes through the following six stages before you can download the synthetic version:

    Submitted

    MOSTLY AI has received your dataset and run configuration.

    Provisioning

    Compute resources are being allocated to your run.

    Encoding

    MOSTLY AI analyzes your dataset’s data types and unique values, then transforms the data for efficient processing.

    Training

    MOSTLY AI uses generative neural networks to train a model that retains your dataset’s granularity, statistical correlations, structures, and time dependencies.

    Generating

    Without accessing your original dataset, MOSTLY AI uses the trained model to create a synthetic version of it.

    Analyzing

    MOSTLY AI tests the resulting synthetic data against the original data for accuracy and privacy. These tests check for identical matches between the two datasets and verify that the synthetic subjects are dissimilar enough from the original subjects to prevent re-identification. MOSTLY AI discards the original dataset once this stage is complete.

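The six stages always run in the same fixed order, so a job’s progress can be read as a position in that sequence. The sketch below models this as an ordered enum. It is purely illustrative: `JobStage`, `progress`, and `next_stage` are hypothetical names and are not part of any MOSTLY AI API or SDK.

```python
from enum import IntEnum
from typing import Optional


class JobStage(IntEnum):
    """Hypothetical model of the six job stages, in the order the Job summary shows them."""
    SUBMITTED = 1
    PROVISIONING = 2
    ENCODING = 3
    TRAINING = 4
    GENERATING = 5
    ANALYZING = 6


def progress(stage: JobStage) -> str:
    """Return a human-readable 'step n of 6' label for a stage."""
    return f"{stage.name.capitalize()} (step {stage.value} of {len(JobStage)})"


def next_stage(stage: JobStage) -> Optional[JobStage]:
    """Return the stage that follows `stage`, or None after the final Analyzing stage."""
    if stage < JobStage.ANALYZING:
        return JobStage(stage + 1)
    return None


if __name__ == "__main__":
    # Walk through the whole (hypothetical) job lifecycle in order.
    stage: Optional[JobStage] = JobStage.SUBMITTED
    while stage is not None:
        print(progress(stage))
        stage = next_stage(stage)
```

Using an `IntEnum` keeps the ordering explicit, so comparisons such as `stage < JobStage.ANALYZING` reflect the sequence described above.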