
Synthetic data generation for free forever, up to 5 credits per day

MOSTLY AI's AI-powered synthetic data generator is available free of charge for up to 5 credits per day. Generate high-quality, privacy-safe synthetic versions of your datasets for ML, advanced analytics, software testing and data sharing.
Trusted by leading enterprises: Citi, City of Vienna, ERSTE, Merkur Versicherung, Merkur Innovation Lab, Nvidia Inception Program, Telefonica

MOSTLY AI Synthetic Data Platform Frequently Asked Questions

A number of deep learning architectures and approaches have emerged for creating synthetic data, including Transformers, GANs, Variational Autoencoders and Autoregressive Networks. We have researched all of them, and continue to do so, and have developed our own combination of techniques to give our customers the best possible results in terms of accuracy, privacy and flexibility. Our approach continues to outperform other solutions by a wide margin.
To load original data from a database for synthesization with MOSTLY AI, you only need to create a connector to your database and enter your authentication details. Currently, MOSTLY AI can connect to databases such as BigQuery, MS SQL, MySQL, Oracle and PostgreSQL.

To connect to a database that is hosted on your own machine, you need to expose it to the Internet. Data destinations on a local machine use localhost as the endpoint, a hostname that always refers to the computer you are working on and therefore cannot be reached from outside it. To accept external connections, you can use a tool like ngrok to expose your localhost to the Internet. If your goal is to connect a local database to MOSTLY AI, we have a tutorial on how to create a data destination on your local machine.
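To illustrate why a localhost endpoint cannot be used directly, here is a minimal sketch that classifies a connector host before you enter it. The helper name is_public_address is hypothetical and is not part of MOSTLY AI or ngrok. It only looks at IP literals and the name localhost, and a public address can still be blocked by a firewall, so treat the result as a first hint rather than a connectivity test.

```python
import ipaddress


def is_public_address(host):
    """Rough check of whether a host is a publicly routable address.

    Returns False for "localhost", loopback and private addresses,
    True for globally routable IP literals, and None for any other
    hostname, because classifying it would need a DNS lookup.
    """
    cleaned = host.strip()
    if cleaned.lower() == "localhost":
        return False
    try:
        address = ipaddress.ip_address(cleaned)
    except ValueError:
        return None  # a hostname; not resolved in this sketch
    return address.is_global


# Hosts that a cloud service could never reach on its own.
local_hosts = ["localhost", "127.0.0.1", "::1", "192.168.1.10"]
local_results = [is_public_address(h) for h in local_hosts]
```

If the check returns False, the database needs a tunnel or a public endpoint before a cloud-hosted service can connect to it.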
If you don’t want to touch your production dataset or database, or it is too big to handle easily, you can make a copy or sample the part of it that you consider most important to synthesize. Large databases often contain tables that are not important; sampling lets you include only the tables and data that you and your team consider relevant for synthesization.
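As a rough illustration of the sampling idea above, the sketch below copies a random subset of subjects, together with only the linked rows that belong to them, into a separate database. It uses Python's built-in sqlite3 module so that the example is self-contained. The table names (customers, orders), the column names and the helper sample_subjects are hypothetical and are not part of the MOSTLY AI platform.

```python
import random
import sqlite3


def sample_subjects(src, dst, n_subjects, seed=0):
    """Copy a random sample of subjects and their linked rows from src to dst.

    Hypothetical schema: customers(id, name) is the subject table and
    orders(id, customer_id, amount) is a linked table.
    """
    ids = [row[0] for row in src.execute("SELECT id FROM customers ORDER BY id")]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(ids, min(n_subjects, len(ids)))

    dst.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    dst.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    for cid in chosen:
        # Copy the subject row, then every linked row that refers to it.
        subject_rows = src.execute(
            "SELECT id, name FROM customers WHERE id = ?", (cid,)
        ).fetchall()
        order_rows = src.execute(
            "SELECT id, customer_id, amount FROM orders WHERE customer_id = ?", (cid,)
        ).fetchall()
        dst.executemany("INSERT INTO customers VALUES (?, ?)", subject_rows)
        dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", order_rows)
    dst.commit()
    return sorted(chosen)


# Build a small stand-in for the "production" database: 100 customers, 300 orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(i, f"customer-{i}") for i in range(1, 101)]
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, (i % 100) + 1, float(i)) for i in range(1, 301)],
)
source.commit()

# The sample lives in its own database; the source is only read, never modified.
sample_db = sqlite3.connect(":memory:")
sampled_ids = sample_subjects(source, sample_db, n_subjects=10)
```

Sampling by subject rather than by row keeps each sampled subject's linked records complete, so the copy has no orders pointing at customers that were left out.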
We use a secure AWS cloud environment. We do not see or retain any of your uploaded data. Once the Generator has been created, the original dataset is deleted. MOSTLY AI is both ISO 27001 and SOC 2 Type 2 certified and we use industry best practices throughout our information security measures.
We recommend that your subject tables include more than 5,000 subjects. Even though there is no minimum number to start the training of a Generator, the more subjects available, the better the Generative AI algorithms can learn from the original data, which results in higher synthetic data quality. Our guide on Preparing your data for synthesization can give you more details on how to achieve the best results with MOSTLY AI.
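The recommendation above is easy to check before you upload anything. The sketch below counts the distinct subjects in a CSV export and compares the count with the 5,000 guideline. The column name customer_id, the helpers count_subjects and meets_recommendation, and the constant name are hypothetical. The threshold is the recommendation from the text, not a hard limit.

```python
import csv
import io

RECOMMENDED_MIN_SUBJECTS = 5_000  # guideline from the text, not a hard minimum


def count_subjects(csv_file, id_column):
    """Return the number of distinct values in id_column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return len({row[id_column] for row in reader})


def meets_recommendation(n_subjects, threshold=RECOMMENDED_MIN_SUBJECTS):
    """True if there are more subjects than the recommended threshold."""
    return n_subjects > threshold


# Tiny in-memory example; a real export would be opened with open(path, newline="").
example = io.StringIO("customer_id,age\n1,34\n2,51\n2,51\n3,27\n")
n = count_subjects(example, "customer_id")
enough = meets_recommendation(n)
```

Counting distinct IDs rather than rows matters here, because a subject that appears in several rows should only be counted once.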
We are there for you every step of the way throughout your synthetic data journey. If you would like to learn more about how to use our Synthetic Data Platform, take a look at our documentation, where you will find a quick start guide to synthetic data generation and tutorials to help you get started. Alternatively, you can also send an email to support@mostly.ai and we will get back to you. 
If you would like to use the paid version of MOSTLY AI's Synthetic Data Platform, which has no credit limits and can be installed in your own compute environment, please contact us.

Synthetic data generation has never been easier

MOSTLY AI's Synthetic Data Generator offers an easy way to generate synthetic data with reliable results and built-in privacy mechanisms. Synthetic data generation is a must-have capability for building better, privacy-safe machine learning models and for collaborating safely and easily with others on data projects that involve sensitive customer data. Learn how to generate synthetic data to unlock a whole new world of data agility!