We had such a busy start to 2021! Our developers worked hard to deliver much-anticipated new features that simplify our customers’ lives with faster, safer, and easier processes. A serious legal assessment was underway, while the MOSTLY AI team also made the SOC 2 certification happen. Microsoft, Telefónica, the City of Vienna, and many others have been using our synthetic data generation platform to make the most of their data assets, with Erste Group signing a 3-year partnership last month. We also published an important piece of research showing that Explainable AI will be an important use case for synthetic data.
The feedback we have received so far makes it abundantly clear that AI-generated synthetic data is the way to go for large organizations looking to step up their data game. The new version of our category-leading synthetic data generator, MOSTLY GENERATE 1.5, provides the maturity, usability, and data quality that are crucial for scaling synthetic data across an organization.
Legal support for synthetic data is part of the product upgrade
Privacy protection and data security have a special place in our hearts. We take both very seriously, and completing the SOC 2 certification is a meaningful step for the team, reinforcing all that we stand for. SOC 2 assures our customers that we follow consistent security practices and that we keep their valuable data safe and protected at all times through standardized controls.
Another important way in which we support our customers’ legal teams is by providing a Data Protection Impact Assessment (DPIA) blueprint for MOSTLY GENERATE. This document, created in collaboration with the reputable law firm Taylor Wessing, will allow legal teams to demonstrate compliance to regulators easily.
Work faster and synthesize data more easily
You can now use the Data Catalog to enable carefree automation of synthetic data pipelines and store links to data sources together with their configuration settings. Synthetization is now a one-click job.
Using the REST API, you can create fully automated synthetic data pipelines. You can easily integrate MOSTLY GENERATE with upstream ETL applications and downstream post-processing tools.
GPU-accelerated synthetic data is like synthetic data with wings. Using the brand-new GPU training option, you can now synthesize your sequential datasets in considerably less time, without any impact on synthetic data quality or privacy.
MOSTLY GENERATE 1.5 now natively supports Parquet files, enabling faster time-to-data, as converting to CSV is no longer necessary. From now on, you can save your encoding configurations as a JSON file and use your own tooling to generate configuration settings for datasets with a large number of columns.
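To illustrate what generating such a configuration with your own tooling could look like, here is a minimal Python sketch. Please note that the JSON layout (`columns`, `name`, `encoding_type`) and the labels `numeric` and `categorical` are assumptions made up for this example, not the actual MOSTLY GENERATE configuration schema. Check the product documentation for the real format.

```python
import json


def guess_encoding(sample_values):
    """Pick a hypothetical encoding-type label from a few sample values."""
    non_null = [v for v in sample_values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if non_null and all(is_number(v) for v in non_null):
        return "numeric"
    return "categorical"


def build_encoding_config(samples):
    """Build one config entry per column from {column_name: sample_values}.

    The output structure is an assumed, illustrative schema.
    """
    return {
        "columns": [
            {"name": name, "encoding_type": guess_encoding(values)}
            for name, values in samples.items()
        ]
    }


# A few sample values per column, e.g. taken from the first rows of a dataset.
samples = {
    "age": [34, 51, None],
    "country": ["AT", "DE", "AT"],
    "balance": [10.5, 0.0, 99.9],
}

config_json = json.dumps(build_encoding_config(samples), indent=2)
print(config_json)
```

With hundreds of columns, a script along these lines saves you from clicking through every column by hand. You would only review the entries where the guess is wrong.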
There is also a turbo button for synthetic data generation: you can now choose to optimize model training for speed. It’s really fast, and the resulting synthetic data is only a little less accurate. This is great for use cases where speed matters most and accuracy isn’t paramount, like creating realistic data for testing.
Stay safe with added synthetic data controls
MOSTLY GENERATE’s new User Management system allows you to securely control user access to data, run job details, and synthetic data generation features. Onboarding and offboarding employees is now a breeze. Users can log in using their Active Directory credentials.
You can now use stochastic rare category protection thresholds for categorical variables. These randomize the decision of whether to include or exclude categories whose frequency in the data is very close to the inclusion threshold. This makes it impossible to infer even the parameters of the rare category protection, adding a further layer of protection for outliers and extreme values.
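The idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not MOSTLY GENERATE's actual implementation: the function name, the `threshold` and `jitter` parameters, the uniform noise, and the `_RARE_` placeholder are all assumptions for the sake of the example.

```python
import random
from collections import Counter

RARE_TOKEN = "_RARE_"  # hypothetical placeholder for suppressed categories


def protect_rare_categories(values, threshold=5, jitter=2, rng=None):
    """Replace rare categories using a randomized inclusion threshold.

    Each category is compared against `threshold` plus random noise drawn
    uniformly from [-jitter, +jitter]. Categories far from the threshold are
    always kept or always dropped; only borderline ones are randomized.
    """
    rng = rng or random.Random()
    counts = Counter(values)
    kept = set()
    for category, count in counts.items():
        effective_threshold = threshold + rng.randint(-jitter, jitter)
        if count >= effective_threshold:
            kept.add(category)
    return [v if v in kept else RARE_TOKEN for v in values]


# "a" is frequent, "b" is very rare, "c" sits exactly on the threshold.
column = ["a"] * 20 + ["b"] * 1 + ["c"] * 5
protected = protect_rare_categories(column, rng=random.Random(42))
print(Counter(protected))
```

Because the outcome for a borderline category like "c" changes from run to run, an observer looking at the output cannot work out the exact threshold that was used.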
The consistency correction feature helps generate consistent historical sequences for your synthetic subjects when there is a large variety of values. Users can enable consistency correction per categorical column in their event table, and Admins can configure in the Global run settings whether Users can work with this feature.
A new encoding type: synthetic geolocation data
Due to popular demand, we now support the synthetization of geolocation data with latitude and longitude encoding types. It’s time to get those footprint datasets ready to work for you in a privacy-preserving way!
We would love to hear your feedback! If you are using MOSTLY GENERATE 1.5, please let us know what you think, as we continuously strive to build an even better product for you. If you are not yet our customer but are curious to find out how our synthetic data platform can increase the ROI of your data projects, contact us for a personalized demo!