Synthetic data generation for free forever, up to 100K rows per day

The best AI-driven synthetic data generator is available free of charge for up to 100K rows daily. Do you want to generate high-quality, privacy-safe synthetic versions of your datasets? MOSTLY AI's synthetic data generator is at your service for machine learning, testing, and data sharing use cases. The best test data and the best training data are always synthetic. The generator is available straight from your browser after a simple registration. Once you have signed up, you will receive an email with your login details. Please check your spam folder if you can't find it.

How do I generate synthetic data?

Forget cumbersome manual data generation and risky data anonymization. Generate flexibly sized, realistic synthetic data at the push of a button. Simply log in with your email address, upload your data directly or connect to a database, configure the synthesization process, and you are good to go. Read more about how to generate synthetic data, including tips for data prep.

How does MOSTLY AI's synthetic data generator work?

A number of approaches to creating synthetic data have emerged, including generative adversarial networks (GANs), variational autoencoders (VAEs), and autoregressive networks. We actively research all of them and have developed our own unique combination of techniques to give our customers the best possible results in terms of accuracy, privacy, and flexibility. Our approach continues to outperform other solutions by a wide margin, and our team of world-class AI experts will keep actively advancing the field over the coming years.

What is synthetic data? And why do we need synthetic data generation tools?

Synthetic data generators with powerful AI engines can learn the patterns and correlations of a dataset. Once the AI engine has been trained on a dataset, the synthetic data generator can create as much or as little statistically similar data as you need. The resulting synthetic data contains none of the original data points and is privacy-safe. At the same time, synthetic data retains the intelligence of the original data, including its business rules. This makes it well suited to software testing and to building machine learning and AI models. You can synthesize entire databases and automate your test data pipeline. Synthetic data is both a powerful privacy-enhancing technology and a test data generation tool.
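To make "learning the patterns and correlations of a dataset" concrete, here is a deliberately simple Python sketch. It is not MOSTLY AI's engine, which the page describes as a combination of deep generative techniques; it swaps in a plain bivariate normal model that estimates the means, spreads, and correlation of two numeric columns and then samples brand-new rows with the same correlation. The function names and the toy age/income data are hypothetical.

```python
import math
import random
import statistics


def fit_bivariate(rows):
    """Estimate means, standard deviations and the correlation of (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_bivariate(params, n, seed=0):
    """Draw n new (x, y) pairs from a bivariate normal with the fitted parameters."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Mix two independent standard normals so that corr(x, y) equals rho.
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    # Hypothetical "original" table: income rises with age, plus noise.
    rng = random.Random(1)
    original = []
    for _ in range(5000):
        age = rng.gauss(40, 10)
        original.append((age, 1000 * age + rng.gauss(0, 5000)))

    model = fit_bivariate(original)
    synthetic = sample_bivariate(model, 5000)
    print("original correlation: ", round(model[4], 3))
    print("synthetic correlation:", round(fit_bivariate(synthetic)[4], 3))
```

The synthetic rows are freshly sampled rather than copied, yet their correlation closely matches the original. A real generator also has to handle categorical columns, non-normal distributions, and relationships across tables, which is where the deep generative models mentioned above come in.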

What if my source dataset is very large, and I don’t want to touch the production version of this dataset or database?

If you don’t want to touch your production dataset or database, or it is too big to handle easily, you can always make a copy of it or sample the part of it that you consider most important to synthesize. Large databases sometimes contain multiple tables that are not important, so you can sample only the tables and data that you and your team consider relevant for synthesization. In addition, MOSTLY AI allows you to create Data Catalogs for your databases, which let you select only the tables and columns that you want to synthesize and rank their importance. You can also make sure that references are properly mapped using the Reference Manager. Our video tutorial guides you through the steps to achieve the best results when generating synthetic data with MOSTLY AI.
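If you take a sample rather than a full copy, it helps to sample at the level of subjects and keep every linked row that belongs to them, so that foreign keys in the sample still resolve. Below is a minimal Python sketch of that idea, assuming a hypothetical `customers` subject table and an `orders` table linked through a `customer_id` column. It runs on your side before upload and is not a MOSTLY AI feature.

```python
import random


def sample_subjects(subjects, linked, key, fraction, seed=0):
    """Keep a random fraction of subject rows plus all linked rows that reference them."""
    rng = random.Random(seed)
    k = max(1, round(len(subjects) * fraction))
    kept = rng.sample(subjects, k)
    kept_ids = {row[key] for row in kept}
    # Filtering child rows by the kept keys preserves referential integrity.
    kept_linked = [row for row in linked if row[key] in kept_ids]
    return kept, kept_linked


if __name__ == "__main__":
    # Hypothetical tables: 1,000 customers with exactly 5 orders each.
    customers = [{"customer_id": i, "segment": "A" if i % 2 else "B"} for i in range(1000)]
    orders = [{"order_id": j, "customer_id": j % 1000, "amount": 10 + j % 7} for j in range(5000)]
    sub_customers, sub_orders = sample_subjects(customers, orders, "customer_id", 0.1)
    print(len(sub_customers), "customers,", len(sub_orders), "orders")  # prints: 100 customers, 500 orders
```

Sampling orders independently would instead leave some orders pointing at customers that are missing from the sample, and would split each customer's history.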

How safe is my data?

We use a secure AWS cloud environment. We do not see or retain any of your uploaded data. Once the job is completed, the original dataset is deleted. The generated synthetic data comes with an automatically generated privacy and accuracy QA report. If the synthetic data passes the privacy and accuracy checks, you can safely use it without any privacy concerns.
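The page does not say which checks the QA report runs. As a generic illustration only, one widely used privacy metric for synthetic data is the distance to closest record (DCR): for each synthetic row, the distance to its nearest neighbour in the original data. A distance of zero means an original record was reproduced verbatim. The sketch below assumes purely numeric rows and plain Euclidean distance; the function names are hypothetical.

```python
import math


def distance_to_closest_record(synthetic, original):
    """Return, for each synthetic row, the Euclidean distance to its nearest original row."""
    return [min(math.dist(s, o) for o in original) for s in synthetic]


def share_of_exact_copies(synthetic, original):
    """Fraction of synthetic rows that coincide exactly with some original row."""
    dcr = distance_to_closest_record(synthetic, original)
    return sum(d == 0 for d in dcr) / len(dcr)


if __name__ == "__main__":
    original = [(25, 3.0), (40, 5.5), (61, 2.0)]
    synthetic = [(27, 3.1), (40, 5.5), (55, 2.4)]
    print(distance_to_closest_record(synthetic, original))
    print(share_of_exact_copies(synthetic, original))  # one of three rows is an exact copy
```

This brute-force search compares every synthetic row with every original row, which is fine for a toy; for large tables you would want an indexed nearest-neighbour search instead.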

What is the optimal size of source data for MOSTLY AI to learn from and accurately generate synthetic data?

We recommend that your subject tables include more than 5,000 subjects. Although the minimum number of subjects needed to start a synthetic data generation job is 100, the more subjects are available, the better the training algorithm can generalize their features, which also lowers the privacy risk. Our guide on Preparing your dataset gives more details on how to achieve the best results with MOSTLY AI.
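The two thresholds above can be encoded as a quick pre-upload check. They apply to subjects (distinct entities such as customers), not to rows, so the sketch counts distinct subject IDs and does not double-count duplicated keys. The function name and return labels are hypothetical; only the 100 and 5,000 figures come from this page.

```python
MIN_SUBJECTS = 100           # minimum to start a job, per this page
RECOMMENDED_SUBJECTS = 5000  # the recommendation is "more than 5,000"


def check_subject_count(subject_ids):
    """Classify a subject table by its number of distinct subject IDs."""
    n = len(set(subject_ids))
    if n < MIN_SUBJECTS:
        return "too_few"
    if n <= RECOMMENDED_SUBJECTS:
        return "below_recommended"
    return "recommended"


if __name__ == "__main__":
    ids = [i % 3000 for i in range(12000)]  # 12,000 IDs but only 3,000 distinct subjects
    print(check_subject_count(ids))  # prints: below_recommended
```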

What is a test data generator?

A test data generator is a must-have tool in software testing. Mock data generators are widely available; however, mock or fake data lacks realism. MOSTLY AI's test data generator uses the power of AI to provide realism: it learns the business rules embedded in the production data, as well as the correlations between columns and tables. Once the AI has learned the properties of your sample dataset, you can generate as much or as little test data as you need. Our test data generator is available free of charge for generating up to 100K rows of data per day. Should you need more than that, please reach out to our team for help!

I need help with synthetic data generation. What should I do? 

We are there for you every step of the way throughout your synthetic data journey. If you have a specific question about using MOSTLY AI's synthetic data generator, join MOSTLY AI's synthetic data community on Discord and one of our synthetic data experts will be happy to help you with the task at hand. If you would like to learn more about how to use a synthetic data generator, take a look at our documentation, where you will find a quick start guide to synthetic data generation and tutorials to help you get started. Alternatively, you can also send an email to and we will get back to you. 

How can I get access to the paid version?

If you would like to use the proprietary version of MOSTLY AI's synthetic data platform - with advanced data connection capabilities for synthesizing complex data structures with referential integrity - please contact us.

Synthetic data generation has never been easier

MOSTLY AI's synthetic data generator offers an easy way to generate synthetic data with reliable results and built-in privacy mechanisms. Synthetic data generation is a must-have capability for building better, privacy-safe machine learning models and for collaborating safely and easily with others on data projects involving sensitive customer data. Learn how to generate synthetic data to unlock a whole new world of data agility!