Click on the Settings tab to optionally specify the number of training and generated subjects for each subject table.

General settings

By specifying the size of a subject table, you also determine the size of its linked tables. MOSTLY AI processes the linked tables' entries as properties of the subject table’s entries. Therefore, if the number of generated subjects is larger or smaller than in the original data, the size of the linked tables changes accordingly. These changes have no impact on the size of reference tables.
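As a rough illustration of how a linked table's size follows the subject count, the sketch below estimates the number of linked rows. It assumes the average number of linked rows per subject stays about the same as in the original data. The function name and the example figures are hypothetical and are not part of MOSTLY AI.

```python
def estimate_linked_rows(original_subjects: int,
                         original_linked_rows: int,
                         generated_subjects: int) -> int:
    """Estimate the linked table size for a given number of generated subjects.

    Assumption: the average number of linked rows per subject is preserved,
    so the linked table scales in proportion to the subject table.
    """
    if original_subjects <= 0:
        raise ValueError("original_subjects must be positive")
    rows_per_subject = original_linked_rows / original_subjects
    return round(rows_per_subject * generated_subjects)


# Hypothetical example: 1,000 subjects with 5,000 linked rows (5 per subject).
larger = estimate_linked_rows(1_000, 5_000, 2_500)   # more subjects than the original
smaller = estimate_linked_rows(1_000, 5_000, 100)    # fewer subjects than the original
print(larger, smaller)
```

The actual row counts in a generated linked table can differ from this estimate, because the number of linked rows per synthetic subject varies from subject to subject.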

If you leave these fields blank, then MOSTLY AI will use the same number of subjects during training and generation as the original dataset. By changing these values, you can do some nifty things:

Speed up AI model training, but at the cost of accuracy

Simply specify a lower number of training subjects than what’s present in your subject table to speed up AI model training. However, the resulting synthetic data will be a less accurate representation of your original data.

Increase the size of small datasets

You can produce more synthetic subjects than there are subjects in your original dataset. In data science, this is called data augmentation. It is particularly useful for downstream AI model training tasks, where the increased number of subjects can result in better performing, more accurate models.

To augment a small dataset, leave the number of training subjects field blank and increase the number of generated subjects to a useful quantity for AI model training.

Create a representative subset of your data

This feature makes generating test data a breeze. MOSTLY AI can use all data points from your original dataset to create a smaller-sized, representative subset of your data that retains all statistical features, breathes realism, and covers all the business scenarios for testing.

You can create a subset of your data by leaving the number of training subjects field blank and decreasing the number of generated subjects to the desired size.
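The three scenarios above differ only in which field is set and in which direction. The following sketch models the two fields as optional values, where a blank field falls back to the number of subjects in the original dataset. It only illustrates the logic described on this page. The function name and the numbers are hypothetical and do not correspond to any MOSTLY AI API.

```python
from typing import Optional, Tuple


def resolve_subject_counts(original: int,
                           training: Optional[int] = None,
                           generated: Optional[int] = None) -> Tuple[int, int]:
    """Return (training_subjects, generated_subjects).

    A value of None stands for a blank field, which defaults to the
    number of subjects in the original dataset.
    """
    train = original if training is None else training
    gen = original if generated is None else generated
    return train, gen


original = 10_000  # hypothetical number of subjects in the original table

# Both fields blank: train on and generate the original number of subjects.
print(resolve_subject_counts(original))

# Speed up training: fewer training subjects, generated count unchanged.
print(resolve_subject_counts(original, training=2_000))

# Augment a small dataset: training blank, more generated subjects.
print(resolve_subject_counts(original, generated=50_000))

# Representative subset: training blank, fewer generated subjects.
print(resolve_subject_counts(original, generated=500))
```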

For best results, we strongly recommend that you start your first run with a small number of subjects and then increase it for later runs.