August 16, 2023
1m 42s

Getting started with MOSTLY AI - Creating Synthetic Data Using Sample Data


It's super easy to start generating synthetic data. The first step is to select a sample dataset for the synthetic data generator to learn from. Check out this video to see how you can select a preloaded dataset on MOSTLY AI's synthetic data platform to start generating your own synthetic data!
[00:00:00] Hello. In this video, I'm going to show you how easy it is to create synthetic data using the MOSTLY AI synthetic data platform and its provided sample data.

[00:00:11] First, I have to log in. If I don't have an account, I can sign up here. I have an account. I'm using Google authentication to log in.

[00:00:20] Then here on the homepage, you see sample data. The MOSTLY AI synthetic data platform takes existing structured tabular data and turns it into a synthetic version. For that, we need some input data.

[00:00:34] Here we have three samples that we can use. Actually, if you go into our documentation, you can learn more about those datasets.

[00:00:41] For the purpose of this demonstration, I'm going to use the UC Irvine Adult dataset. It's a well-known dataset in the machine learning community that contains census data from 1994.

[00:00:55] I'm just going to click here on start. That data is now pre-loaded here for this job. Then there's really nothing I need to do. I can just click launch job.

[00:01:06] Actually, there's just one thing I need to do. I need to define where the synthetic data should eventually be stored. For that, I'm just going to say download as a CSV or Parquet file and then click launch job again. There's really nothing that I need to configure or worry about.

[00:01:23] Now, the job is starting. It's going to take about one or two minutes to complete. I already did this previously, so this is what a completed job looks like, and we'll get to that in the next video. Super easy, and thanks for watching.

