Create a catalog

To synthesize data from relational databases or from datasets that you host on cloud object storage, you create a catalog in MOSTLY AI.

You can create two types of catalogs: database catalogs and cloud storage catalogs.

Create a database catalog

With a database catalog, you can create a pre-configured synthetic dataset that you can use to synthesize data from database tables.

Prerequisites

Create a database connector to use as a data source.

Steps

  1. In MOSTLY AI, select the Catalogs tab.
  2. Click Create catalog.
  3. From the left, select Database.
  4. Select an existing database connector and click Next. Step result: MOSTLY AI creates an empty catalog that has the same name as the selected database connector. At this point, the catalog is not saved yet.
  5. Add tables to the new catalog.
    1. Click Add table. Step result: The Select a database table drawer opens.
    2. Select a database table from the drop-down list.
      Tip: Type in the box to filter the tables by name.
    3. (Optional) To add all related database tables, select the Include child tables checkbox. MOSTLY AI then adds all related database tables to the catalog automatically.
    4. Click Proceed. Step result: The table is now added to the catalog and the catalog is saved.

      By default, all tables are initially added as subject tables.

    5. Repeat steps 5a-d to add more database tables to the catalog.
  6. (Optional) Configure the table, data, and output settings for the catalog. For more information, see Configure catalogs.

Result

Your database catalog saves automatically as soon as you add the first table. From then on, it is available in the catalogs list in the Catalogs tab.

By default, the added tables appear as subject tables in the catalog. You can configure the table relationships to define the linked tables and the Context and Smart Select foreign keys.
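
The catalog behavior described above can be summarized in a small Python model. This is only an illustrative sketch and not the MOSTLY AI API: the names Catalog, Table, create_catalog, and add_table, as well as the connector name "sales-db", are hypothetical. It shows the three rules stated in this section: a new catalog takes the name of its connector, it is saved only once the first table is added, and every table starts as a subject table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Table:
    # Hypothetical model of a catalog table, not a MOSTLY AI class.
    name: str
    role: str = "subject"  # tables are initially added as subject tables


@dataclass
class Catalog:
    # Hypothetical model of a catalog, not a MOSTLY AI class.
    name: str
    tables: List[Table] = field(default_factory=list)
    saved: bool = False  # a new catalog is not saved yet

    def add_table(self, table_name: str) -> Table:
        """Add a table; the catalog is saved as soon as a table is added."""
        table = Table(table_name)
        self.tables.append(table)
        self.saved = True
        return table


def create_catalog(connector_name: str) -> Catalog:
    """Create an empty, unsaved catalog named after the selected connector."""
    return Catalog(name=connector_name)


catalog = create_catalog("sales-db")  # "sales-db" is a made-up connector name
saved_before = catalog.saved          # empty catalog, not saved yet
first = catalog.add_table("customers")
saved_after = catalog.saved           # saved once the first table is added
```

In this model, changing a table's role away from "subject" would correspond to configuring table relationships, which the catalog configuration covers.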

Create a cloud storage catalog

With a cloud storage catalog, you can create a pre-configured synthetic dataset that you can use to synthesize data from tables that you host in cloud object storage buckets.

Prerequisites

  • Create a cloud storage connector.
  • If you want to synthesize two or more tables, make sure that all tables are available in the cloud bucket for which you have a connector.

Steps

  1. In MOSTLY AI, select the Catalogs tab.
  2. Click Create catalog.
  3. From the left, select Cloud storage.
  4. Select an existing cloud storage connector and click Next. Step result: MOSTLY AI creates an empty catalog that has the same name as the selected cloud storage connector. At this point, the catalog is not saved yet.
  5. Add tables to the new catalog.
    1. Click Add table. Step result: The Specify table path drawer opens.
    2. For Table path, specify the bucket path to the table you want to add.
    3. For Table name, specify the name of the table as you want it to appear in the catalog and in the generated synthetic data.
    4. Click Proceed. Step result: The table is now added to the catalog and the catalog is saved.

      By default, all tables are initially added as subject tables.

    5. Repeat steps 5a-d to add more tables to the catalog.
  6. (Optional) Configure the table, data, and output settings for the catalog. For more information, see Configure catalogs.

Result

Your catalog saves automatically as soon as you add the first table. From then on, it is available in the catalogs list in the Catalogs tab.

By default, the added tables appear as subject tables in the catalog.
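
Before you add several tables, it can help to check the prerequisite that every table path points into the bucket covered by your connector. The following Python sketch is one possible pre-flight check and is not part of MOSTLY AI. It assumes that table paths are written as URLs of the form scheme://bucket/key (for example s3://my-bucket/data/customers.csv); the function name tables_outside_bucket and all bucket, path, and table names are made up for illustration.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def tables_outside_bucket(
    connector_bucket: str, tables: List[Tuple[str, str]]
) -> List[str]:
    """Return the table paths that do not point into the connector's bucket.

    `tables` is a list of (table_path, table_name) pairs, mirroring the
    Table path and Table name fields. Paths are assumed to look like
    scheme://bucket/key, which is an assumption of this sketch.
    """
    outside = []
    for table_path, _table_name in tables:
        bucket = urlparse(table_path).netloc  # the part after "scheme://"
        if bucket != connector_bucket:
            outside.append(table_path)
    return outside


# Hypothetical plan of tables to add to a cloud storage catalog.
plan = [
    ("s3://my-bucket/data/customers.csv", "customers"),
    ("s3://my-bucket/data/orders.csv", "orders"),
    ("s3://other-bucket/data/payments.csv", "payments"),
]
misplaced = tables_outside_bucket("my-bucket", plan)
```

Any path returned by the check would need to be moved into the connector's bucket, or added through a separate catalog that uses a connector for the other bucket.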

What's next

With a database or cloud storage catalog created, you can now configure the catalog.

When you open the catalog, you can start the configuration of a new synthetic dataset by clicking the Next button. You can then also configure a data destination for the synthetic dataset.