In this follow-up demo, we take the realistic mock healthcare data generated in the previous video and scale it up using the MOSTLY AI Assistant. You'll see how we go from a small dataset of 150 patients to 100,000 synthetic patient records with referential integrity preserved across tables, and send the output directly to a Databricks destination.
🚀 What you’ll see in this video:
How to configure a generator based on existing mock data using natural language prompts
How to confirm relational integrity between tables using the visual UI
How to scale the dataset from 150 to 100,000 patients with all related tables adjusting automatically
How to send synthetic data directly to Databricks using a preconfigured schema
How to verify table creation, data structure, and sample data inside the Databricks workspace
How to confirm the dataset contains exactly 100,000 rows using a quick query
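As a rough sketch of the last verification steps, the Python snippet below runs a row-count query and a check for child rows that point at a missing parent. It uses an in-memory SQLite database as a stand-in for the Databricks destination, and the table and column names (patients, visits, patient_id) are hypothetical, not taken from the video. In a Databricks SQL editor, the same SELECT statements would be run against the generated tables.

```python
import sqlite3


def count_rows(conn, table):
    """Return the number of rows in `table` (pass trusted identifiers only)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def count_orphans(conn, child, fk, parent, pk):
    """Count child rows whose foreign key matches no row in the parent table."""
    query = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE p.{pk} IS NULL"
    )
    return conn.execute(query).fetchone()[0]


# Stand-in data: hypothetical schema, SQLite instead of Databricks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, patient_id INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?)",
    [(i,) for i in range(1, 100_001)],
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(i, (i % 100_000) + 1) for i in range(1, 200_001)],
)

patient_count = count_rows(conn, "patients")
orphan_visits = count_orphans(conn, "visits", "patient_id", "patients", "patient_id")
print(f"patients: {patient_count}, orphaned visits: {orphan_visits}")
```

A count equal to the requested size and zero orphaned rows is a quick sanity check. It does not replace the relationship view in the UI.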
💡 Why this is useful:
This scaled synthetic dataset can be used for software development, testing, or demo purposes. It keeps relationships intact and protects privacy by not relying on any real patient data. No sensitive information is exposed, and no legal approvals are required.
🛠️ This workflow supports multiple destinations, including Databricks, Snowflake, and MySQL. The Assistant and destination connectors make synthetic data delivery fast and safe across these environments.
📥 Try it for yourself and see how fast you can go from mock data to production-scale synthetic datasets using MOSTLY AI.
📺 Be sure to watch the previous video to see how the original mock data was created with natural language.