August 16, 2023
1m 39s

Getting Started with MOSTLY AI - Synthetic Data Generation Output Settings Explained


In this video, we'll show you how you can adjust the output of your synthetic data generation on MOSTLY AI. If you want to create smaller or larger versions of your dataset, MOSTLY AI's synthetic data generation allows you to quickly and safely subset or supersize your data, while keeping the statistical properties intact.

Get started with synthetic data generation for free here ➡️ https://bit.ly/43IGYSv


[00:00:00] Hi, everyone. In this video we will talk about the output settings.

[00:00:04] There are only two things that I can configure here. The first is the Data destination, which is where the synthetic data will be stored.

[00:00:11] The first option is download as CSV/Parquet, which means the data won't be automatically stored anywhere. Once the job is completed, I can go into the job and download my synthetic data from there as a CSV or Parquet file. That's always possible.

[00:00:20] If I have defined some connectors, such as databases or cloud storage, I can select them here to store the data automatically in those data destinations.
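
The destination choice described above can be sketched as a small data model. This is only an illustration, not MOSTLY AI's API: the class `OutputDestination`, its `connector` field, and the connector name in the usage note are hypothetical, and the sketch assumes (from "That's always possible") that downloading from the finished job remains available even when a connector is selected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutputDestination:
    # Hypothetical model: None stands for the "download as CSV/Parquet" choice,
    # a string stands for the name of a previously defined connector.
    connector: Optional[str] = None

    def delivery_paths(self) -> List[str]:
        # Assumption: the download from the completed job is always available.
        paths = ["download from job (CSV or Parquet)"]
        if self.connector is not None:
            # With a connector selected, the data is also stored there automatically.
            paths.append(f"automatic write to connector '{self.connector}'")
        return paths
```

For example, `OutputDestination().delivery_paths()` lists only the download, while `OutputDestination("my_database").delivery_paths()` lists the download plus the automatic write.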

[00:00:38] The second setting is the number of generated subjects, which I can define for every subject table. If this is left blank, the job generates exactly as many rows as exist in the original subject table.

[00:00:53] Let's say, for example, this User Data table has 10,000 rows of data originally. If I leave this blank, 10,000 rows of synthetic data will be generated.

[00:01:03] I could also create a lower number of subjects. I can say only create 5,000 subjects, which means that, effectively, the data would be subset.

[00:01:12] The other way around, I could also say, "Let's create more data." I could say, "Let's create 50,000 rows of synthetic data," which means we would have more data than we had originally.
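
The rule for the number of generated subjects can be written out as a short sketch. The function names `resolve_generated_subjects` and `size_ratio` are hypothetical and only restate the behavior described in the video (blank means "same as the original"); they are not part of the MOSTLY AI platform.

```python
from typing import Optional


def resolve_generated_subjects(original_rows: int, requested: Optional[int] = None) -> int:
    """Return how many synthetic subjects a table would get under the rule above."""
    if requested is None:
        # Field left blank: generate as many subjects as the original table has rows.
        return original_rows
    if requested <= 0:
        raise ValueError("requested number of subjects must be positive")
    return requested


def size_ratio(original_rows: int, requested: Optional[int] = None) -> float:
    """Below 1.0 means the data is subset, above 1.0 means it is supersized."""
    return resolve_generated_subjects(original_rows, requested) / original_rows


original = 10_000  # the User Data example from the video
print(resolve_generated_subjects(original))  # blank field
print(size_ratio(original, 5_000))           # subsetting
print(size_ratio(original, 50_000))          # supersizing
```

With the 10,000-row example, a blank field resolves to 10,000 subjects, 5,000 gives a ratio of 0.5, and 50,000 gives a ratio of 5.0.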

[00:01:26] That's it for the configuration of the output settings. Once you have completed the Tables, Data settings and Output settings, you can click Launch job and take it from there. Thanks for watching.
