Welcome to our tutorial on quality assurance (QA) for synthetic data. Understanding the quality of your synthetic data is crucial for anyone who works with it. In this detailed walkthrough, we cover the key concepts behind MOSTLY AI's QA reports, focusing on both the privacy and the accuracy of the generated data.

📈 What You'll Learn:

- Understand the fundamentals of MOSTLY AI's QA reports.
- Navigate and interpret different sections of a QA report, including the model and data QA reports.
- Grasp the significance of univariate and bivariate distributions, correlations, accuracy, and different privacy metrics in QA.
- Follow a practical coding session that approximates how MOSTLY AI calculates these metrics.
- Discover how to generate, analyze, and evaluate synthetic data using MOSTLY AI.

Dataset and code: https://bit.ly/47e5lKq

Synthetic data generation platform: https://bit.ly/3M8Lhkb

Key moments:

00:00 - Overview of MOSTLY AI's QA Reports

00:04 - Introduction to the key concepts behind MOSTLY AI's QA reports

00:16 - Explaining how MOSTLY AI quantifies both the privacy and accuracy parts of its QA reports.

00:21 - Guide on how to navigate to the QA report section in MOSTLY AI after running synthetic data generation jobs.

00:35 - Deep Dive into QA Report Details

00:43 - Exploration of QA reports, including correlations, accuracy, distributions, and privacy.

00:51 - Starting the walkthrough of Python code that approximates how MOSTLY AI calculates its QA metrics.

01:31 - Demonstrating data synthesis using MOSTLY AI with the UCI Adult Income dataset.

02:14 - Analyzing the QA report generated from the synthetic data job.

02:52 - Step-by-step guide to calculate both accuracy and privacy metrics manually.

02:59 - Checking Python library versions and preparing the target dataset for analysis.

07:53 - Creating plots for univariate and bivariate accuracy metrics using Python.

09:17 - Explanation of how MOSTLY AI calculates privacy metrics, including distance measurements and nearest neighbor analysis.
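
For a quick feel of the two kinds of metrics covered in the video, here is a minimal Python sketch. It is not MOSTLY AI's implementation, and it is not the code from the linked notebook. It assumes that accuracy for a categorical column can be approximated as one minus the total variation distance between category frequencies, and that privacy can be checked by comparing each synthetic record's distance to its closest training record with its distance to its closest holdout record. The Hamming distance and all function names are illustrative assumptions.

```python
from collections import Counter


def univariate_accuracy(target, synthetic):
    """Assumed definition: 1 - total variation distance between the
    category frequencies of a target column and a synthetic column."""
    p = Counter(target)
    q = Counter(synthetic)
    categories = set(p) | set(q)
    tvd = 0.5 * sum(
        abs(p[c] / len(target) - q[c] / len(synthetic)) for c in categories
    )
    return 1.0 - tvd


def hamming(a, b):
    """Number of positions in which two records differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def distance_to_closest_record(records, reference):
    """For each record, the distance to its nearest neighbor in reference."""
    return [min(hamming(r, ref) for ref in reference) for r in records]


def share_closer_to_training(synthetic, training, holdout):
    """Share of synthetic records that are closer to the training data
    than to the holdout data; ties count as half."""
    dcr_train = distance_to_closest_record(synthetic, training)
    dcr_holdout = distance_to_closest_record(synthetic, holdout)
    closer = sum(t < h for t, h in zip(dcr_train, dcr_holdout))
    ties = sum(t == h for t, h in zip(dcr_train, dcr_holdout))
    return (closer + 0.5 * ties) / len(synthetic)


# Toy data, made up for illustration only.
target_column = ["a", "a", "b", "c"]
synthetic_column = ["a", "b", "b", "c"]
accuracy = univariate_accuracy(target_column, synthetic_column)

training = [("Private", "HS"), ("Gov", "BSc")]
holdout = [("Private", "BSc"), ("Self", "MSc")]
synthetic = [("Private", "HS"), ("Self", "MSc")]
share = share_closer_to_training(synthetic, training, holdout)
```

Under these assumptions, a share close to 0.5 suggests that synthetic records sit no closer to the training data than to unseen holdout data, while a share well above 0.5 would be a warning sign that the generator has memorized training records.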