The workflow for creating a data catalog for databases is largely an automated process. As databases already contain descriptions of their tables and relationships in their schemas, MOSTLY AI just needs to read them and configure the synthetic data generation job accordingly.
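To show what reading tables and relationships from a schema can look like, here is a minimal sketch. It assumes a SQLite database and uses only Python's standard `sqlite3` module. It is not how MOSTLY AI reads schemas. The `read_schema` helper and the `customers`/`orders` tables are hypothetical and exist only for this example.

```python
import sqlite3


def read_schema(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str, str]]]:
    """Hypothetical helper: map each table to its foreign keys as
    (column, referenced_table, referenced_column) tuples."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Each PRAGMA row is: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f"PRAGMA foreign_key_list('{table}')").fetchall()
        schema[table] = [(fk[3], fk[2], fk[4]) for fk in fks]
    return schema


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical example schema: each order references a customer
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        """
    )
    for table, fks in read_schema(conn).items():
        print(table, fks)
```

In this example, `customers` has no foreign keys and `orders` references it through `customer_id`. A relationship like this is exactly the kind of information a schema already contains, so no manual description is needed.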

However, to protect the privacy of your data subjects correctly, MOSTLY AI does need to know which tables contain their profiles. Once it has read the database schema, a screen appears where you can classify these tables as subject tables. MOSTLY AI then determines the appropriate job configuration, which you can review. At this point you can also optionally configure the Smart Select relationships of your database. Finally, you can save the resulting data catalog and start the synthetization job.

The flowchart below gives an overview of all the steps involved.

Data catalog workflow overview

Before you start creating a data catalog for your database, keep in mind that you need at least two data connectors configured: one for the source database and one for the destination database.

Once the synthetic data generation job starts, the destination database will be fully erased and rewritten with the synthetic version of the source database.