August 16, 2023
1m 37s

Getting Started with MOSTLY AI - Synthetic Data Subject Tables Explained


In this video, we'll be discussing what subject tables are, how to generate synthetic data, and how to protect the privacy of the individuals contained within the data.

If you're looking to get started with synthetic data generation, then this video is a must-see! The concept of subject tables is fundamental, and you need to understand it to create privacy-safe synthetic data effectively.

[00:00:00] Hi everyone. In this video, we'll talk about subject tables and what they actually are.

[00:00:06] I've uploaded a dataset here, a single CSV file. I see the table User Data, and it's labeled Subject table.

[00:00:19] Maybe you see it somewhere else in our platform, but it's an important concept and something that you should really understand when you synthesize data.

[00:00:28] You might want to look into our documentation. If you search there for subject tables, you'll find that subject tables basically contain one record per subject, and it's the privacy of those subjects that we want to protect.

[00:00:45] What does that really mean? If you look at this dataset here, we have User Data, which contains the names, email addresses, genders, and so forth.

[00:00:57] Every individual in that dataset is one row of data. Every row, every individual, is one subject. These subjects, these individuals, are the entities whose privacy we want to protect.

[00:01:15] Subjects can be users, as in this example, or business partners, or other entities, but it's important that there is always one subject per row and that each subject is unique. That's the concept of subject tables.
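
To make the "one subject per row, and unique" rule concrete, here is a minimal sketch in plain Python that checks whether a CSV file could serve as a subject table. It is not part of the MOSTLY AI platform or its API. The sample data, the column names, the choice of `email` as the identifying column, and both function names are assumptions made purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the rows and column names are assumptions
# for illustration, not the dataset shown in the video.
SAMPLE_CSV = """name,email,gender
Alice,alice@example.com,F
Bob,bob@example.com,M
Carol,carol@example.com,F
"""


def duplicate_subjects(csv_text, key):
    """Return the values of the `key` column that appear in more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[key] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


def looks_like_subject_table(csv_text, key):
    """True if every row has a distinct value in the `key` column,
    i.e. each subject occupies exactly one row."""
    return not duplicate_subjects(csv_text, key)


if __name__ == "__main__":
    print(looks_like_subject_table(SAMPLE_CSV, "email"))
    print(duplicate_subjects(SAMPLE_CSV, "gender"))
```

With `email` as the identifier, every row maps to a different subject, so the check passes. Using `gender` instead would fail, because several rows share the same value and the column does not identify one individual per row.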

[00:01:35] Thanks for watching.

