Once you’ve completed all configuration steps, click Launch Job. This starts the synthesis job and opens the Job summary page, where you can review the job details, track the job's progress, and browse the QA report once the job is finished.

There are two sections on this page that inform you about the synthetic data generation process:

  1. The top section provides general information:

    Job name

    The name of the job as it was specified during its configuration.

    Job type

    There are four job types:

    • Ad hoc synthesizes a dataset uploaded using the web UI.

    • Data catalog synthesizes a database or dataset stored in a cloud bucket
      or on a local server.

    • Generate with subject count creates a specified number of new synthetic
      subjects from a previous job’s already trained AI model.

    • Generate with seed creates a linked table for an uploaded subject table
      using a previous job’s already trained AI model.

    Uploaded

    This field indicates when the original dataset was uploaded.


  2. The Job summary section informs you about the synthesis tasks currently being performed. It shows which tables are being synthesized, the current task for each table, its status, and the total duration. In addition, you can click the kebab icon on the right side of each entry to see a detailed task list or an overview of the columns' generation methods and encoding types.

    A task list appears when you choose View tasks from the kebab menu. The table below provides an overview of all the tasks and steps you will see in this list.

    | Task | Step | Description |
    |---|---|---|
    | Synthetizing table / Generating text | Organizing data | Ensures that very large tables can be processed regardless of system memory size. |
    | | Data analysis | The table is analyzed for its data types and unique values. |
    | | Transforming data | The table is transformed for efficient processing. |
    | | AI training | Using generative neural networks, a model is trained to retain your dataset’s granularity, statistical correlations, structures, and time-dependencies. |
    | | Generating synthetic data | The resulting AI model is used to create a synthetic version of the table. |
    | Packaging synthetic data | Creating zip archive | Creates a ZIP archive with the synthetic version of the dataset. |
    | Creating the quality assurance report | Analyzing synthetic data for quality and accuracy | The resulting synthetic table is tested against the original for accuracy and privacy. It checks for identical information matches and whether the synthetic subjects are dissimilar enough to the original subjects to prevent re-identification. |
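
    To illustrate what a check for identical information matches can look like, here is a minimal Python sketch. It is not the platform's actual implementation; the function name `identical_match_share` and the sample rows are hypothetical and serve only as an example. It computes the share of synthetic rows that also appear verbatim in the original table.

```python
def identical_match_share(original_rows, synthetic_rows):
    """Return the fraction of synthetic rows that appear verbatim in the original rows."""
    if not synthetic_rows:
        return 0.0
    # Tuples are hashable, so set membership gives a fast exact-match lookup.
    original_set = {tuple(row) for row in original_rows}
    matches = sum(1 for row in synthetic_rows if tuple(row) in original_set)
    return matches / len(synthetic_rows)


# Hypothetical sample data: one synthetic row copies an original row exactly.
original = [("Anna", 34, "Vienna"), ("Ben", 51, "Graz"), ("Cara", 28, "Linz")]
synthetic = [
    ("Dora", 33, "Vienna"),
    ("Ben", 51, "Graz"),
    ("Emil", 29, "Linz"),
    ("Finn", 47, "Graz"),
]

share = identical_match_share(original, synthetic)
print(f"Share of identical matches: {share:.2f}")
```

    A low share alone does not show that synthetic subjects are dissimilar enough from the original ones; that requires a separate distance-based comparison, which this sketch does not cover.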

    To follow the progress of the AI training step, click View training logs. The training log shows a chart of the training and validation loss per epoch.

    If you consider the model to be sufficiently trained, or you want to speed up the synthesis process, you can click Stop training to skip to the synthetic data generation step.
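
    One common way to judge whether a model is sufficiently trained is to check if the validation loss has stopped improving. The following Python sketch shows that heuristic under stated assumptions: the function `has_plateaued`, its `patience` and `min_delta` parameters, and the loss values are all hypothetical, and the platform may use different criteria internally.

```python
def has_plateaued(val_losses, patience=3, min_delta=0.001):
    """Return True if the validation loss did not improve by at least
    `min_delta` during the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False  # not enough epochs to compare against
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent > best_before - min_delta


# Hypothetical per-epoch validation losses read from a training chart.
flattening = [1.0, 0.8, 0.7, 0.69, 0.70, 0.695, 0.698]
still_improving = [1.0, 0.8, 0.6, 0.5, 0.4]

print(has_plateaued(flattening))
print(has_plateaued(still_improving))
```

    With these sample values, the first series would suggest that stopping is reasonable, while the second is still improving with each epoch.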