Creating a data catalog differs only slightly from configuring an ad hoc job. The synthetization parameters are identical, but a data catalog also needs to know where your original data is stored so that the synthetization job can be repeated or automated. The flowchart below gives an overview of the steps involved.

Data catalog workflow overview

For MOSTLY AI to obtain your original dataset, you might first need to configure a data connector. A connector enables MOSTLY AI to log in to a data source, such as a cloud storage bucket or a database, and load data from it.

Once you have configured the data connector, you can create a new data catalog, specify the paths to the subject table and the event table you want to synthesize, and link the two tables. MOSTLY AI then performs a full analysis of the data to determine the correct encoding types for the job. Finally, you can review the resulting settings and save the data catalog for later use.
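To make the relationship between these pieces concrete, the steps above can be sketched as a small data model. This is only an illustration and not the MOSTLY AI API: the class names (`DataConnector`, `Table`, `DataCatalog`), the `infer_encoding` helper, the encoding labels "numeric" and "categorical", and the example paths are all hypothetical, and the real analysis step is considerably more involved than the toy rule used here.

```python
from dataclasses import dataclass, field


@dataclass
class DataConnector:
    """Hypothetical description of where the original data is stored."""
    name: str
    kind: str       # illustrative values: "s3", "postgres"
    location: str   # bucket URI or database host


@dataclass
class Table:
    """A table referenced by its path inside a connector."""
    path: str
    key: str                                     # column identifying a subject
    columns: dict = field(default_factory=dict)  # column name -> sample values


def infer_encoding(values):
    """Toy stand-in for the analysis step: guess an encoding from samples."""
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    return "numeric" if all(is_number(v) for v in values) else "categorical"


@dataclass
class DataCatalog:
    connector: DataConnector
    subject: Table
    events: Table
    encodings: dict = field(default_factory=dict)

    def link(self):
        """Verify the event table carries the subject key so rows can be linked."""
        if self.subject.key not in self.events.columns:
            raise ValueError(f"event table lacks key column {self.subject.key!r}")

    def analyze(self):
        """Assign an encoding type to every column of both tables."""
        for table in (self.subject, self.events):
            for column, values in table.columns.items():
                self.encodings[f"{table.path}:{column}"] = infer_encoding(values)
        return self.encodings


# Walk through the workflow: connector, catalog, link, analyze, review.
connector = DataConnector("demo", "s3", "s3://example-bucket/raw")
subject = Table("customers.csv", key="customer_id",
                columns={"customer_id": [1, 2], "country": ["AT", "DE"]})
events = Table("purchases.csv", key="customer_id",
               columns={"customer_id": [1, 1, 2], "amount": [9.5, 20.0, 3.25]})

catalog = DataCatalog(connector, subject, events)
catalog.link()
encodings = catalog.analyze()
for column, encoding in encodings.items():
    print(column, "->", encoding)
```

The sketch keeps the connector separate from the catalog because that separation is what makes a catalog repeatable: the catalog stores only paths and settings, while the connector holds everything needed to reach the source again.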