You can optionally use the table settings to specify:

  • Whether AI model training should prioritize speed or accuracy.

  • For linked tables, whether and how the number of records per subject should be limited.

Additionally, if the results of an earlier job did not reach the desired accuracy or took too long to generate, you can open the advanced settings to improve AI model training performance.

You can find the table settings at the very bottom of the Table details tab. Select the table you want to configure from the table list and scroll down. Read on to learn more about each setting.

Table settings

Training goal

Use the Optimize for dropdown menu to select a training goal that best suits your use case.
The following options are available:


Recommended for ML/Analytics use cases
This option trains the table’s AI model to achieve the highest attainable synthetic data accuracy. The training is stopped when the validation loss stops improving.


Recommended for Testing use cases
This option trains the table’s AI model to deliver accurate synthetic data using significantly shorter training times. The training is stopped as soon as the rate of improvement of the validation loss decreases.
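The difference between the two stopping rules can be sketched in a few lines of Python. This is only an illustration of the idea, not MOSTLY AI's actual implementation: the function names, the `patience` parameter, and the `min_rel_gain` threshold are assumptions made for this example.

```python
def stop_epoch_on_plateau(val_losses, patience=2):
    """Accuracy-style rule (assumed): stop once the validation loss has not
    beaten its best value for `patience` consecutive epochs.
    Returns the 1-based epoch at which training stops."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


def stop_epoch_on_slowdown(val_losses, min_rel_gain=0.05):
    """Speed-style rule (assumed): stop as soon as the relative improvement
    over the previous epoch drops below `min_rel_gain`.
    Returns the 1-based epoch at which training stops."""
    for epoch in range(2, len(val_losses) + 1):
        prev, curr = val_losses[epoch - 2], val_losses[epoch - 1]
        rel_gain = (prev - curr) / prev  # assumes positive loss values
        if rel_gain < min_rel_gain:
            return epoch
    return len(val_losses)
```

With the same loss curve, the second rule stops earlier than the first, which is the trade-off between training time and accuracy described above.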

Limit records per subject

For linked tables, you can choose whether you want to limit the number of records per subject. Some subjects may have a large number of records that add little from a statistical point of view. By limiting the number of records, you can reduce the computational resources required to process your dataset.

For instance, bank transaction datasets often have a very skewed distribution of the number of events per customer. Customers have 150 transactions per account on average, but there are also outliers with up to 1,000 transactions.

MOSTLY AI lets you limit the number of records per subject or drop subjects that exceed the limit from the dataset entirely.

To limit the number of records per subject, specify the Max records per subject and use the dropdown menu to select how subjects that exceed this limit are treated. You can choose from three options: "No, keep all", "Yes, limit records", and "Yes, drop subjects".
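The effect of the three options can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function name and the mode names `keep_all`, `limit`, and `drop` are hypothetical stand-ins for the dropdown options, and the sketch assumes that limiting keeps the first records of a subject in input order.

```python
from collections import defaultdict

# Hypothetical mode names mirroring the three dropdown options.
MODES = ("keep_all", "limit", "drop")


def limit_records_per_subject(records, subject_key, max_records, mode="limit"):
    """Return the records that remain after applying the per-subject limit."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if mode == "keep_all":
        return list(records)

    # Group records by subject, preserving the input order within each subject.
    by_subject = defaultdict(list)
    for record in records:
        by_subject[record[subject_key]].append(record)

    result = []
    for subject_records in by_subject.values():
        if len(subject_records) <= max_records:
            result.extend(subject_records)
        elif mode == "limit":
            # Assumption: keep the first max_records records of the subject.
            result.extend(subject_records[:max_records])
        # mode == "drop": a subject over the limit is skipped entirely.
    return result
```

For example, with a limit of 3, a customer with 5 transactions keeps 3 of them in `limit` mode and disappears from the dataset in `drop` mode, while customers at or below the limit are unaffected.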

Limit records

If the number of records per subject varies widely in your dataset, we recommend that you keep Limit records per subject turned on. This feature reduces the privacy risk for outlier subjects.

Advanced settings

Open the advanced settings if you want to improve AI model training performance. Based on the results of previous jobs, you can use these settings to improve synthetic data accuracy and training time. Read on to learn how to use them.

Maximum training epochs

An epoch refers to the process of passing the entire table forward and backward through the neural network exactly once. MOSTLY AI will start new epochs until the neural network has optimally learned your dataset’s features. Unfortunately, it’s not possible to know beforehand how many epochs are needed.

This setting allows you to limit the number of epochs to, for instance, 2, 5, or 10. This significantly reduces the time needed to generate your synthetic dataset, but at the cost of accuracy.
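How an epoch cap interacts with the regular stopping rule can be sketched with a generic training loop. This is an illustrative sketch only: `train`, `run_epoch`, and `should_stop` are hypothetical names, and the loop is not taken from MOSTLY AI's code.

```python
def train(run_epoch, should_stop, max_epochs=None):
    """Run epochs until `should_stop` says the model has converged or the
    optional `max_epochs` cap is reached, whichever comes first.

    run_epoch(epoch)     -> validation loss for that epoch (hypothetical callback)
    should_stop(history) -> True when training should end (hypothetical callback)
    """
    history = []
    epoch = 0
    while max_epochs is None or epoch < max_epochs:
        epoch += 1
        history.append(run_epoch(epoch))
        if should_stop(history):
            break
    return history


# Toy callbacks: the loss shrinks as 1/epoch, and training counts as
# converged once the per-epoch improvement falls below 0.01.
toy_epoch = lambda epoch: 1.0 / epoch
toy_stop = lambda h: len(h) >= 2 and h[-2] - h[-1] < 0.01

uncapped = train(toy_epoch, toy_stop)               # runs until convergence
capped = train(toy_epoch, toy_stop, max_epochs=5)   # cut off after 5 epochs
```

In the toy run, the capped job finishes in fewer epochs but ends with a higher final loss than the uncapped job, which mirrors the time versus accuracy trade-off of this setting.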

Model size

Adjust the model size if the synthetization job runs into memory issues, takes too long to complete, or produces synthetic data with less than the desired accuracy.

Model size dimensions

Smaller sizes require less memory, run faster, and reduce synthetic data accuracy, whereas bigger sizes increase accuracy, require more memory, and take more time to complete.

Batch size

Batch size refers to the number of records used for each training step. Selecting a larger batch size can speed up training, but consumes more memory and can decrease accuracy.

If you get out-of-memory errors during the training stage, you can try to resolve them by decreasing the batch size.
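To see how batch size relates to the number of training steps, consider the following sketch. It is plain arithmetic rather than anything specific to MOSTLY AI, and the helper names `steps_per_epoch` and `batches` are made up for this example.

```python
import math


def steps_per_epoch(num_records, batch_size):
    """Number of training steps needed to pass over all records once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_records / batch_size)


def batches(records, batch_size):
    """Yield consecutive slices of at most `batch_size` records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```

Doubling the batch size roughly halves the number of steps per epoch, but each step has to hold twice as many records in memory at once. This is why a smaller batch size can help when training runs out of memory.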