August 16, 2023
2m 34s

Tips and Tricks For Fast Synthetic Data Generation

MOSTLY AI's synthetic data generator comes with a turbo engine. If you want to generate synthetic data really fast, for example to check its structure or to produce test data, and you don't care about the accuracy of your synthetic data, the turbo setting is your best friend. Check out this Tips and Tricks tutorial for speeding up your synthetic data pipelines!
Get hands on with synthetic data generation here ➡️ https://bit.ly/43IGYSv


[00:00:01] Hello. In today's Tips and Tricks video, I want to take a look at the turbo training setting.

[00:00:07] This is very useful if you want to make quick progress in your data synthesis and, at least initially, you don't care so much about the quality of the synthetic data that's generated.

[00:00:21] We have introduced this new turbo setting. In a linked-table setup, such as the one that I'm looking at right now with my baseball players and my seasons, I have to apply it separately to each table: first to the players and then to the seasons.

[00:00:37] That will speed up the generation of the synthetic data. To show you, I ran a job in preparation for this recording, and you can already see the synthetic data.

[00:00:50] To be honest, it looks very good to the human eye. You can absolutely use this already as synthetic data, but of course, some of the correlations are not going to be that good.

[00:01:04] We can see the accuracy in the QA report in a bit. As I said, we're doing this because we want to make fast progress. This job only ran for about four minutes.

[00:01:15] A normal job with full accuracy, by comparison, takes about 30 minutes or so. Now, one cool little thing that I also like about our QA report is that it actually teaches you some interesting facts about the original data.

[00:01:31] In case you didn't have the time to rigorously do data exploration before in other tools, you can totally use the QA report to understand more stuff about your original data.

[00:01:43] Under univariate distributions, for example, here we can see the birth dates distributed over time or the height curve, the weight curve, et cetera.
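
To make the idea of a univariate distribution concrete, here is a minimal sketch in plain Python that bins a single column and prints a text histogram. The heights are made-up values for illustration, and the `histogram` helper is hypothetical; this is not how the QA report computes its charts.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per fixed-width bin; each key is a bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical player heights in inches (invented for this example).
heights = [68, 70, 71, 72, 72, 73, 74, 74, 75, 77]

height_hist = histogram(heights, bin_width=2)

# Print one bar per bin, e.g. the bin covering 72 and 73 inches.
for lower, count in height_hist.items():
    print(f"{lower}-{lower + 1}: {'#' * count}")
```

The same helper could be pointed at any single numeric column, such as weight or birth year, to get a rough look at its shape.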

[00:01:56] Under bivariate distributions, you can see correlations between data points in the dataset.

[00:02:03] One interesting one here is batting and throwing with your right hand or your left hand.

[00:02:09] The darker shade means a higher proportion of the dataset. You can see there that a lot of people bat with their right hand and throw with their right hand, but then there are also some people who bat with both and throw with the right hand.
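
The batting and throwing chart described here is essentially a table of joint proportions, where a darker cell corresponds to a larger share of rows. A minimal sketch in plain Python, assuming hypothetical column names `bats` and `throws` and invented rows (not the actual baseball dataset or the QA report's own method):

```python
from collections import Counter


def joint_shares(rows, col_a, col_b):
    """Return the share of rows for each (col_a, col_b) value pair."""
    counts = Counter((row[col_a], row[col_b]) for row in rows)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Invented sample: R = right, L = left, B = both.
players = (
    [{"bats": "R", "throws": "R"}] * 5
    + [{"bats": "L", "throws": "L"}] * 2
    + [{"bats": "L", "throws": "R"}] * 1
    + [{"bats": "B", "throws": "R"}] * 2
)

shares = joint_shares(players, "bats", "throws")

# List the combinations from most to least common.
for (bats, throws), share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"bats={bats} throws={throws}: {share:.0%}")
```

Running the same calculation on the original and the synthetic table and comparing the shares gives a rough feel for how well a correlation has been preserved.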

[00:02:27] That's it for today. Remember to use the turbo setting to get to results quickly.
