Quality assurance

Quality assurance of synthetic data compares the univariate and multivariate probability distributions of the columns in the original and synthetic data sets. The goal is to minimize the difference between them while preventing privacy leakage, and both aspects can be quantified with accuracy and privacy metrics. Such an evaluation is a useful check that the original data set was synthesized correctly, but it provides little insight into how the synthetic data will perform in a particular use case. Verifying that the synthetic data meets use case-specific requirements is therefore critical to the success of any data synthesization project.
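To make the comparison more concrete, here is a minimal Python sketch of the three kinds of checks described above. It is an illustration under stated assumptions, not the metric set of any particular tool: the function names (`total_variation_distance`, `pearson`, `exact_match_share`), the choice of total variation distance on binned histograms as the univariate measure, Pearson correlation as the multivariate measure, and the share of exactly copied rows as a crude privacy signal are all picked here for simplicity. The `age` and `income` columns are randomly generated toy data.

```python
import math
import random
from collections import Counter


def total_variation_distance(original, synthetic, bins=10):
    """Univariate check: distance between binned histograms of one numeric
    column. 0.0 means identical histograms, 1.0 means no overlap."""
    lo = min(min(original), min(synthetic))
    hi = max(max(original), max(synthetic))
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant columns

    def histogram(values):
        # The maximum value would land in bin index `bins`, so clamp it.
        counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
        return [counts.get(b, 0) / len(values) for b in range(bins)]

    p, q = histogram(original), histogram(synthetic)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pearson(xs, ys):
    """Multivariate check (pairwise): Pearson correlation of two columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def exact_match_share(original_rows, synthetic_rows):
    """Crude privacy check: share of synthetic rows that are exact copies
    of an original row. Rows must be hashable, for example tuples."""
    seen = set(original_rows)
    return sum(row in seen for row in synthetic_rows) / len(synthetic_rows)


def toy_table(n, rng):
    """Hypothetical two-column table: income loosely depends on age."""
    age = [rng.gauss(40, 10) for _ in range(n)]
    income = [1000 * a + rng.gauss(0, 5000) for a in age]
    return age, income


rng = random.Random(0)
orig_age, orig_income = toy_table(1000, rng)
# Stand-in for a synthesizer's output: a fresh sample from the same process.
syn_age, syn_income = toy_table(1000, rng)

print("age TVD:", total_variation_distance(orig_age, syn_age))
print("income TVD:", total_variation_distance(orig_income, syn_income))
print("correlation gap:",
      abs(pearson(orig_age, orig_income) - pearson(syn_age, syn_income)))
print("exact copies:",
      exact_match_share(list(zip(orig_age, orig_income)),
                        list(zip(syn_age, syn_income))))
```

Low distances and a small correlation gap suggest the synthetic data follows the original distributions, while a high share of exact copies would point to privacy leakage. None of these numbers says whether the data is fit for a specific downstream task, which is why a use case-specific check is still needed.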