Holdout set

Holdout data (also called testing data) is a portion of the original data that is held out of the datasets used for training and validating synthetic data models. Its purpose is to provide a final, unbiased comparison between a machine learning model trained on the original data and one trained on the synthetic data. Accurate synthetic data should not overfit the original training set. Instead, it should generalize, so that a model trained on the synthetic data achieves results on the original holdout set comparable to those of a model trained on the original training set.
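The comparison described above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy dataset, the `toy_synthesizer` function (a hypothetical stand-in that samples from per-label Gaussians, not a real synthetic data model), and the nearest-centroid classifier. The point is only the workflow: split off the holdout first, fit the generator on the training portion alone, then score both downstream models on the same holdout rows.

```python
import random
import statistics


def make_original_data(n, rng):
    """Toy 'original' dataset: one numeric feature and a binary label."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(2.0 if label else -2.0, 1.0)
        rows.append((feature, label))
    return rows


def split_holdout(rows, holdout_fraction, rng):
    """Shuffle a copy of the rows and set aside a holdout portion."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, holdout)


def toy_synthesizer(train_rows, n, rng):
    """Hypothetical stand-in for a synthetic data model.

    Fits one Gaussian per label on the training rows only and samples
    new rows from it. The holdout rows are never seen here.
    """
    by_label = {0: [], 1: []}
    for feature, label in train_rows:
        by_label[label].append(feature)
    params = {
        label: (statistics.mean(values), statistics.stdev(values))
        for label, values in by_label.items()
    }
    share_of_ones = len(by_label[1]) / len(train_rows)
    synthetic = []
    for _ in range(n):
        label = 1 if rng.random() < share_of_ones else 0
        mean, stdev = params[label]
        synthetic.append((rng.gauss(mean, stdev), label))
    return synthetic


def fit_centroids(rows):
    """Nearest-centroid 'model': the mean feature value per label."""
    by_label = {}
    for feature, label in rows:
        by_label.setdefault(label, []).append(feature)
    return {label: statistics.mean(values) for label, values in by_label.items()}


def accuracy(centroids, rows):
    """Share of rows whose nearest centroid matches the true label."""
    correct = 0
    for feature, label in rows:
        predicted = min(centroids, key=lambda c: abs(feature - centroids[c]))
        correct += predicted == label
    return correct / len(rows)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = make_original_data(2000, rng)
train, holdout = split_holdout(original, 0.2, rng)
synthetic = toy_synthesizer(train, len(train), rng)

# Both models are scored on the same original holdout set.
acc_original = accuracy(fit_centroids(train), holdout)
acc_synthetic = accuracy(fit_centroids(synthetic), holdout)
print(f"trained on original:  {acc_original:.3f}")
print(f"trained on synthetic: {acc_synthetic:.3f}")
```

A small gap between the two scores suggests the synthetic data preserved the signal the downstream model needs. A large gap, or a synthetic score that only looks good on the training rows, would point to poor fidelity or overfitting.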