Key use cases

  • Ad hoc jobs are ideal for one-off synthetic data generation: datasets that you synthesize once and don’t expect to revisit.

  • Another benefit of ad hoc jobs is that you can easily upload CSV files from your computer. Each file can be up to 5 GB in size. Read the Preparing your dataset section to learn how to format your files.
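Before uploading, it can help to confirm locally that each CSV file fits within the per-file size limit. The following is a minimal sketch, not part of MOSTLY AI. It assumes the limit means 5 gigabytes counted as 5 × 10^9 bytes, and the names `MAX_UPLOAD_BYTES`, `within_limit`, and `oversized_files` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Assumption: the documented per-file limit is 5 gigabytes, read here as
# 5 * 10^9 bytes. Adjust the constant if the platform counts differently.
MAX_UPLOAD_BYTES = 5 * 10**9


def within_limit(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a file of this size fits within the upload limit."""
    return 0 <= size_bytes <= limit_bytes


def oversized_files(paths, limit_bytes: int = MAX_UPLOAD_BYTES):
    """Return the paths whose on-disk size exceeds the limit."""
    return [
        p for p in map(Path, paths)
        if not within_limit(p.stat().st_size, limit_bytes)
    ]
```

For example, calling `oversized_files(["customers.csv", "transactions.csv"])` (hypothetical file names) returns only the files that would need to be split or reduced before upload.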

This guide presents a comprehensive walkthrough of setting up and running ad hoc jobs. Its nine steps guide you through uploading your dataset and configuring the job settings, column details, and training parameters. Once everything’s set, you can start your synthetization job.

Each of these steps comes with tips that help you tailor your synthetic data to your needs:

Learn about data augmentation and subsetting when configuring the job settings.

You can use these features to ensure that downstream analytics and AI training applications have enough data points to produce accurate and realistic results. For downstream QA and test engineering tasks, you can generate test data that’s a smaller-sized, representative subset of your data, while retaining all the relevant business scenarios for testing.

Fine-tune privacy protection and realism of your synthetic dataset.

By configuring which rare categories are protected and how they’re protected, you can enhance realism while optimizing privacy.

Use the training parameters to make appropriate trade-offs between accuracy and speed.

Sometimes you need your data quickly. MOSTLY AI will always guarantee the privacy and realism of the synthetic version of your data, regardless of your choices.

To start an ad hoc job, simply select Jobs in the main menu on the left-hand side and then click on the Ad hoc job icon in the Launch new job pane.

Launch new job


Below the Launch new job section, you can find a list of previous jobs.
Each entry shows the following details:

Job name

The name of the job as it was specified during its configuration.

Job type

Whether it was an Ad hoc job, Data catalog job, Generate with subject count job, or Generate with seed job.

Date

The date and time that the job was started.

Status

Whether the job is Queued, Running, Finished, or Failed.
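If you track jobs outside the web interface, for example in your own notes or scripts, the four details listed above map naturally onto a small record type. The sketch below is illustrative only and does not represent a MOSTLY AI API. The class names `JobType`, `JobStatus`, and `JobEntry` and the helper `in_progress` are hypothetical, while the enum values mirror the job types and statuses named in the list.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class JobType(Enum):
    AD_HOC = "Ad hoc job"
    DATA_CATALOG = "Data catalog job"
    GENERATE_WITH_SUBJECT_COUNT = "Generate with subject count job"
    GENERATE_WITH_SEED = "Generate with seed job"


class JobStatus(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAILED = "Failed"


@dataclass(frozen=True)
class JobEntry:
    """One row of the job list (hypothetical local representation)."""
    name: str             # Job name as specified during configuration
    job_type: JobType     # Which kind of job was launched
    started_at: datetime  # Date and time the job was started
    status: JobStatus     # Current state of the job


def in_progress(jobs):
    """Return entries still queued or running, most recently started first."""
    pending = [
        j for j in jobs
        if j.status in (JobStatus.QUEUED, JobStatus.RUNNING)
    ]
    return sorted(pending, key=lambda j: j.started_at, reverse=True)
```

Modeling the type and status as enumerations keeps the allowed values explicit, so a misspelled status is caught when the record is created rather than when it is filtered.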
