To truly unlock the power of AI, organizations need faster, more flexible access to high-quality data. But slow, restricted data access and centralized data teams often create bottlenecks. Privacy-preserving synthetic data removes these barriers by enabling on-demand access to realistic data. With MOSTLY AI's open-source Synthetic Data SDK, organizations can generate high-fidelity synthetic data directly within their local AWS compute environment, including SageMaker Unified Studio. This helps teams democratize analytics, accelerate AI model development, and streamline software testing, all without relying on sensitive or restricted datasets.
Getting Started
Getting started is as easy as executing one line of code in your SageMaker notebook:
```python
!pip install -U 'mostlyai[local]'
```
Once installed, you can launch an SDK instance in local mode:
```python
# 1) Initialize the SDK in local mode
import pandas as pd
from mostlyai.sdk import MostlyAI

mostly = MostlyAI(local=True)
```
Then load your original data. In this example we're loading the popular adult census dataset:
```python
# 2) Load your original data
trn_df = pd.read_csv('https://github.com/mostly-ai/public-demo-data/raw/dev/census/census.csv.gz')
trn_df.head()
```
And train your first synthetic data generator:
```python
# 3) Train a synthetic data generator
g = mostly.train(name='census', data=trn_df)
```
Now you can probe your generator live, on demand. In this example, we probe it for 10 rows of synthetic census data:
```python
# 4) Live-probe the generator
df_samples = mostly.probe(g, size=10)
df_samples
```
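Ten probed rows are enough to eyeball the output, but not to judge fidelity. As a rough sanity check, one could draw a larger sample (for example, by passing a larger `size` to `mostly.probe`) and compare per-column category frequencies against the original data. The helper below, `marginal_tvd`, is a hypothetical illustration and not part of the MOSTLY AI SDK. It computes the total variation distance between one column's category frequencies in two DataFrames, where 0 means identical frequencies and 1 means no overlap at all.

```python
import pandas as pd


def marginal_tvd(original: pd.DataFrame, synthetic: pd.DataFrame, column: str) -> float:
    """Hypothetical helper: total variation distance between the category
    frequencies of `column` in the original and the synthetic data."""
    p = original[column].value_counts(normalize=True)
    q = synthetic[column].value_counts(normalize=True)
    # Align on the union of categories; a category absent on one side gets frequency 0.
    p, q = p.align(q, fill_value=0.0)
    return 0.5 * float((p - q).abs().sum())


# Toy, made-up data purely for illustration (not the census dataset).
orig = pd.DataFrame({"sex": ["Male", "Male", "Female", "Female"]})
syn = pd.DataFrame({"sex": ["Male", "Male", "Male", "Female"]})
print(marginal_tvd(orig, syn, "sex"))  # 0.25
```

On real data, one might loop over the string columns, e.g. `{c: marginal_tvd(trn_df, syn_df, c) for c in trn_df.select_dtypes(include='object').columns}`, where `syn_df` is a hypothetical name for a larger synthetic sample. Numeric columns would need to be binned first for this comparison to be meaningful.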
And there you have it! High-fidelity, privacy-preserving synthetic data generated directly in your SageMaker environment, with just a few lines of code!
![](https://mostly.ai/wp-content/uploads/2025/01/image1-1-1024x479.png)
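One simple, admittedly crude way to spot-check the privacy side is to measure how many synthetic rows are exact copies of training rows. The hypothetical helper `exact_match_share` below is not part of the SDK and is no substitute for a proper privacy evaluation. Also note that in datasets with only a few low-cardinality columns, some exact matches can occur by chance without any memorization.

```python
import pandas as pd


def exact_match_share(original: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Hypothetical helper: share of synthetic rows that exactly duplicate
    at least one row of the original data (assumes identical column names)."""
    cols = list(original.columns)
    # Deduplicate the original so the left merge keeps exactly one row per synthetic row.
    uniq = original[cols].drop_duplicates()
    merged = synthetic[cols].merge(uniq, on=cols, how="left", indicator=True)
    # "_merge" is "both" when the synthetic row also appears in the original data.
    return float((merged["_merge"] == "both").mean())


# Toy, made-up data purely for illustration.
orig = pd.DataFrame({"age": [25, 40, 40], "sex": ["Male", "Female", "Female"]})
syn = pd.DataFrame({"age": [25, 33, 40, 51], "sex": ["Male", "Male", "Female", "Female"]})
print(exact_match_share(orig, syn))  # 0.5
```

Deduplicating before the merge is the key design choice: without it, a synthetic row matching a training row that appears twice would be counted twice.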
Video
Check out this short video where Julio walks through the entire flow:
Conclusion
In one of our previous blog posts, we demonstrated how to unlock data democratization and accelerate model development on Amazon SageMaker with privacy-preserving synthetic data. Now, with our open-source Synthetic Data SDK, organizations can take an even more frictionless approach to unlocking the power of AI in AWS, without the limitations of real-world data.