The Synthetic Data SDK is an open-source Python toolkit for creating high-fidelity, privacy-safe synthetic data.
The SDK allows you to programmatically create, browse and manage three key resources:
# install or upgrade the SDK (the leading "!" is Jupyter notebook syntax; omit it in a shell)
!pip install -U mostlyai
# initialize the SDK
from mostlyai.sdk import MostlyAI
mostly = MostlyAI()
# train a generator
g = mostly.train(data="/path/to/data")
# inspect generator quality
g.reports(display=True)
# generate any number of new privacy-safe samples
mostly.probe(g, size=1_000_000)
# generate new synthetic samples that match the seed values you provide
mostly.probe(g, seed=[{'age': 65, 'gender': 'male'}])
# export and share your generator
g.export_to_file()
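
The seeded call above passes a single record. When several combinations of values are needed, the seed list can be built programmatically. The following is a minimal sketch using only the standard library; `build_seed` is a hypothetical helper, not part of the SDK, and the `age` and `gender` column names are simply the ones used in the quick start.

```python
from itertools import product


def build_seed(ages, genders):
    """Return one seed record per (age, gender) combination.

    Hypothetical helper for illustration; not part of the SDK.
    """
    return [{"age": a, "gender": g} for a, g in product(ages, genders)]


# Three ages crossed with two genders gives six seed records.
seed = build_seed(ages=[25, 45, 65], genders=["male", "female"])
print(len(seed))
print(seed[0])

# Assumed usage, mirroring the seeded quick-start call
# (requires the SDK and a trained generator `g`):
# mostly.probe(g, seed=seed)
```

Each dictionary has the same shape as the record passed to `seed` in the quick start, so the resulting list is assumed to be usable in its place.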