February 13, 2025
3m 20s

Synthetic Data SDK on Snowflake

In this tutorial, we’ll walk you through how to create a privacy-preserving synthetic dataset directly within a notebook in Snowflake using Python and the Synthetic Data SDK. You’ll learn how to install and configure the Synthetic Data SDK locally, set up a generator for synthetic data, and customize privacy settings for your dataset.

We’ll demonstrate how to generate synthetic data from a real-world dataset, preserving the statistical distributions of the original data while protecting privacy. By the end of this video, you’ll know how to generate privacy-preserving synthetic data and share it with your team, all within minutes!

Key Steps Covered:

- Installing the Synthetic Data SDK locally in your environment
- Importing and using pandas to load datasets
- Setting up and configuring the synthetic data generator
- Enabling differential privacy and accessing model metrics
- Real-time sampling of synthetic data
- Sharing and collaborating with colleagues on privacy-preserving data
- Key troubleshooting tips for Snowflake notebook environments
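
The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the exact code from the video: it assumes the SDK is installed as the `mostlyai` package, that `MostlyAI(local=True)`, `train(config=...)` and `probe(...)` behave as described in the SDK documentation, and that differential privacy is enabled through a `differential_privacy` entry with a `max_epsilon` value inside `tabular_model_configuration`. The helper names `build_generator_config` and `train_and_sample` are hypothetical, so check the key names against the SDK version you install.

```python
def build_generator_config(table_name, data, max_epsilon=None):
    """Build a single-table generator config as a plain dict.

    `data` can be a pandas DataFrame or a path to a CSV file.
    The key names are assumptions based on the SDK's config schema.
    """
    table = {"name": table_name, "data": data}
    if max_epsilon is not None:
        # Assumed location of the differential privacy settings.
        table["tabular_model_configuration"] = {
            "differential_privacy": {"max_epsilon": max_epsilon}
        }
    return {"name": f"{table_name} generator", "tables": [table]}


def train_and_sample(config, n_samples=10):
    """Train a generator in local mode and draw a small real-time sample."""
    # Imported lazily so the config helper also works without the SDK installed.
    from mostlyai.sdk import MostlyAI

    mostly = MostlyAI(local=True)            # run the SDK inside the notebook
    generator = mostly.train(config=config)  # train on the original data
    return mostly.probe(generator, size=n_samples)  # sample synthetic rows
```

In a notebook cell you would load your data with pandas, for example `df = pd.read_csv("my_table.csv")` (a placeholder file name), then call `train_and_sample(build_generator_config("my_table", df, max_epsilon=5.0))`. The epsilon value here is only illustrative; smaller values give stronger privacy guarantees at some cost in accuracy.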

We also explain the network rules needed to enable external access, so that the required libraries can be installed and datasets downloaded from within Snowflake.
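
As a rough illustration of what such rules look like, the snippet below renders the two Snowflake statements typically involved: a `NETWORK RULE` that allows egress to a list of hosts, and an `EXTERNAL ACCESS INTEGRATION` that references it. The helper `external_access_sql`, the object names, and the host list (the public PyPI hosts) are assumptions for illustration; the hosts you need depend on where your libraries and datasets are served from.

```python
def external_access_sql(rule_name, integration_name, hosts):
    """Render Snowflake SQL that allows outbound access to the given hosts."""
    host_list = ", ".join(f"'{h}'" for h in hosts)
    network_rule = (
        f"CREATE OR REPLACE NETWORK RULE {rule_name}\n"
        f"  MODE = EGRESS\n"
        f"  TYPE = HOST_PORT\n"
        f"  VALUE_LIST = ({host_list});"
    )
    integration = (
        f"CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION {integration_name}\n"
        f"  ALLOWED_NETWORK_RULES = ({rule_name})\n"
        f"  ENABLED = TRUE;"
    )
    return [network_rule, integration]


# Assumed hosts for installing packages from PyPI; add dataset hosts as needed.
PYPI_HOSTS = ["pypi.org", "files.pythonhosted.org"]

statements = external_access_sql(
    "pypi_network_rule", "pypi_access_integration", PYPI_HOSTS
)
for statement in statements:
    print(statement, end="\n\n")
```

The printed statements can be run in a Snowflake worksheet by a role with the required privileges, after which the integration still has to be enabled for the notebook in its settings.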

For more tutorials and examples, visit our GitHub, where you can explore open-source resources for creating synthetic data across different environments.
