In this tutorial we look at how the size of your training sample affects the accuracy of the resulting synthetic dataset. Understanding this relationship can help you cut computational costs and generation time while maintaining data accuracy. We walk through the topic step by step.

Here is the publicly available notebook, so you can follow along and experiment with different datasets, models, and synthesizers:

Access our state-of-the-art synthetic data generator for free here:

If you want to know more about how MOSTLY AI's synthetic data generator compares to other generators, read our benchmarking blog post:

Here is what you can expect:

Introduction - 00:00:00
Hypothesis - 00:00:22
Data Setup - 00:01:56
Generating Synthetic Datasets - 00:02:45
Assessing Quality - 00:03:52
Data Comparison - 00:05:01
Rule Adherence - 00:06:45
Machine Learning Evaluation - 00:07:53
Key Takeaways - 00:09:52