


Synthetic data generation for free forever, up to 100K rows per day

The best AI-powered synthetic data generator is available free of charge for up to 100K rows daily. Generate high-quality, privacy-safe synthetic versions of your datasets for ML, advanced analytics, software testing and data sharing. MOSTLY AI's synthetic data generator also offers AI-powered data augmentation features, such as rebalancing and synthetic data imputation. It's available straight from your browser after a simple registration. No credit card required.
Once you have signed up, you will receive an email with your login details. If you can't find it, please check your spam folder.

How to generate synthetic data?

Forget cumbersome manual data generation and risky data anonymization. Generate realistic synthetic data of any size with the push of a button: log in with your email address, upload your data directly or connect to a database, configure the synthesization process, and you are good to go. Read more about how to generate synthetic data, including tips for data preparation.

How does MOSTLY AI's synthetic data generator work?

A number of approaches have emerged for creating synthetic data, including generative adversarial networks (GANs), variational autoencoders (VAEs) and autoregressive networks. We actively research all of them and have developed our own combination of techniques to give our customers the best possible results in terms of accuracy, privacy and flexibility. Our approach continues to outperform other solutions by a wide margin, and our team of world-class AI experts will keep advancing the field over the coming years.

What is synthetic data? And why do we need synthetic data generation tools?

Synthetic data generators with powerful AI engines learn the patterns and correlations of a dataset. Once the AI engine has been trained on a dataset, the synthetic data generator can produce as much or as little statistically representative data as you need. The resulting synthetic data contains none of the original data points and is privacy-safe. At the same time, it retains the statistical properties of the original data, including its business rules. This makes it well suited for software testing and for building machine learning and AI models. You can synthesize entire databases and automate your test data pipeline. Synthetic data is both a powerful privacy-enhancing technology and a test data generation tool.

How to connect to a local database?

To connect a database hosted on your own machine, you first need to expose it to the Internet. A local database is reachable only at localhost, the hostname that refers to the computer you are working on, so an external service such as MOSTLY AI cannot reach it there. To accept external connections, you need a tunneling tool such as ngrok. If your goal is to connect a local database to MOSTLY AI, we have a tutorial on how to create a data destination on your local machine.

What if my source dataset is very large, and I don’t want to touch the production version of this dataset or database?

If you don’t want to touch your production dataset or database, or it is too big to handle easily, you can always make a copy of it, or sample the part that you consider most important to synthesize. Large databases often contain tables that are not relevant, so you can sample only the tables and data that you and your team consider relevant for synthesization. In addition, MOSTLY AI lets you create Data Catalogs for your databases, in which you select only the tables and columns that you want to synthesize and rank their importance. You can also make sure that the references between tables are mapped correctly using the Reference Manager. A video tutorial guides you through the steps to achieve the best results when generating synthetic data with MOSTLY AI.
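If you prefer to prepare a smaller copy yourself before uploading, one common approach is to sample whole subjects and then pull in every row that references them, so the copy keeps its referential integrity. The sketch below assumes a hypothetical two-table schema (`customers` as the subject table, `orders` as the linked table) in SQLite, using only Python's standard `sqlite3` module; it is a tool-agnostic illustration, not how MOSTLY AI's Data Catalogs work internally. In practice `src` would be a read-only copy of production rather than the live database.

```python
import random
import sqlite3


def chunked(seq, size=500):
    """Yield slices small enough to stay under SQLite's bound-parameter limit."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def sample_subjects(src, dst, fraction, seed=0):
    """Copy a random share of customers plus all of their orders into dst.

    Sampling whole subjects, rather than each table independently, keeps every
    order's customer_id pointing at a customer that also exists in dst.
    """
    rng = random.Random(seed)
    ids = [row[0] for row in src.execute("SELECT id FROM customers ORDER BY id")]
    keep = sorted(rng.sample(ids, max(1, round(len(ids) * fraction))))

    dst.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
    dst.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), amount REAL)"
    )
    for part in chunked(keep):
        marks = ",".join("?" * len(part))  # e.g. "?,?,?"
        dst.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            src.execute(f"SELECT id, country FROM customers WHERE id IN ({marks})", part),
        )
        dst.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            src.execute(
                f"SELECT id, customer_id, amount FROM orders WHERE customer_id IN ({marks})",
                part,
            ),
        )
    dst.commit()
    return len(keep)


# Hypothetical stand-in for a copy of production: 10,000 customers, 3 orders each.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
src.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, "AT" if i % 2 else "US") for i in range(1, 10001)])
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(j, j % 10000 + 1, j * 1.5) for j in range(1, 30001)])

dst = sqlite3.connect(":memory:")
n_subjects = sample_subjects(src, dst, fraction=0.1)
```

The design choice that matters is sampling at the subject level: sampling 10% of `orders` independently of `customers` would leave orders whose customer is missing from the copy.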

How safe is my data?

We use a secure AWS cloud environment. We do not see or retain any of your uploaded data. Once the job is completed, the original dataset is deleted. The generated synthetic data comes with an automatically generated privacy and accuracy QA report. If the synthetic data passes the privacy and accuracy checks, you can safely use it without any privacy concerns.
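The exact checks in MOSTLY AI's QA report are not described here, so the sketch below only illustrates one widely used idea behind synthetic data privacy checks: the distance to closest record (DCR). For each synthetic row it measures how far away the nearest original row is; a distance of zero means an original record was reproduced exactly. The data and function names are hypothetical, and a real check would scale or encode the columns first, since raw Euclidean distance lets large-valued columns dominate.

```python
import math


def distance_to_closest_record(synthetic, original):
    """For each synthetic row, return the Euclidean distance to its nearest original row."""
    return [min(math.dist(s, o) for o in original) for s in synthetic]


def count_exact_copies(synthetic, original):
    """Count synthetic rows that are identical to some original row."""
    originals = {tuple(o) for o in original}
    return sum(tuple(s) in originals for s in synthetic)


# Hypothetical rows of (age, score); columns are left unscaled for readability.
original = [(25, 3.1), (40, 5.0), (58, 2.2)]
synthetic = [(26, 3.0), (40, 5.0), (50, 4.0)]

dcr = distance_to_closest_record(synthetic, original)
copies = count_exact_copies(synthetic, original)
print([round(d, 3) for d in dcr], copies)
```

Here the second synthetic row is an exact copy of an original record, which is exactly the kind of leak a privacy check should flag. The brute-force nearest-neighbour search is quadratic and only meant for small illustrations.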

What is the optimal size of source data for MOSTLY AI to learn and accurately generate synthetic data?

We recommend that your subject tables include more than 5000 subjects. Although the minimum number of subjects needed to start a synthetic data generation job is 100, the more subjects are available, the better the training algorithm can generalize their features, which reduces the privacy risk. Our guide on Preparing your dataset gives you more details on how to achieve the best results with MOSTLY AI.
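Before uploading, you can check your subject table against the two thresholds mentioned above: at least 100 subjects to start a job, and more than 5000 recommended. The helper below is a hypothetical pre-flight check written for illustration, not part of MOSTLY AI's product; it counts distinct subject IDs so that duplicate rows are not mistaken for extra subjects.

```python
MIN_SUBJECTS = 100           # minimum to start a generation job (per the FAQ above)
RECOMMENDED_SUBJECTS = 5000  # recommendation: more than this many subjects


def check_subject_count(subject_ids):
    """Return (distinct subject count, verdict) for a hypothetical pre-upload check."""
    n = len(set(subject_ids))
    if n < MIN_SUBJECTS:
        return n, "too small: a job needs at least 100 subjects"
    if n <= RECOMMENDED_SUBJECTS:
        return n, "usable, but more than 5000 subjects is recommended"
    return n, "ok"


print(check_subject_count(range(50)))
print(check_subject_count(range(1200)))
print(check_subject_count(range(6000)))
```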

What is a test data generator?

A test data generator is a must-have tool in software testing. Mock data generators are widely available; however, mock or fake data lacks realism. MOSTLY AI's test data generator uses the power of AI to provide that realism. It learns the business rules embedded in your production data, as well as the correlations between columns and tables. Once the AI has learned the properties of your sample dataset, you can generate as much or as little test data as you need. Our test data generator is available free of charge for up to 100K rows of data per day. Should you need more than that, please reach out to our team for help!

I need help with synthetic data generation. What should I do? 

We are here for you every step of the way throughout your synthetic data journey. If you have a specific question about using MOSTLY AI's synthetic data generator, join MOSTLY AI's synthetic data community on Discord, and one of our synthetic data experts will be happy to help you with the task at hand. If you would like to learn more about how to use a synthetic data generator, take a look at our documentation, where you will find a quick start guide to synthetic data generation and tutorials to help you get started. Alternatively, you can send an email to support@mostly.ai and we will get back to you.

How to use an SQL table for synthesization?

To use an SQL table for synthesization with MOSTLY AI, you only need to create a connector to your database and enter your authentication details. Currently, MOSTLY AI can connect to databases such as DB2, MariaDB, MS SQL Server, MySQL, Oracle and PostgreSQL, with many more to come in the near future. If your database runs locally (on localhost), you will first need to expose it to the Internet by creating a data destination on your local machine.

How can I get access to the paid version?

If you would like to use the proprietary version of MOSTLY AI's synthetic data platform - with advanced data connection capabilities for synthesizing complex data structures with referential integrity - please contact us.

Synthetic data generation has never been easier

MOSTLY AI's synthetic data generator offers an easy way to generate synthetic data with reliable results and built-in privacy mechanisms. Synthetic data generation is a must-have capability for building better, privacy-safe machine learning models and for collaborating safely and easily with others on data projects involving sensitive customer data. Learn how to generate synthetic data to unlock a whole new world of data agility!