Create a catalog
To synthesize data from relational databases or from datasets that you host on cloud object storage, you create a catalog in MOSTLY AI.
You can create two types of catalogs: database catalogs and cloud storage catalogs.
Create a database catalog
With a database catalog, you can create a pre-configured synthetic dataset that you can use to synthesize data from database tables.
Prerequisites
Create a database connector to use as a data source.
Steps
- In MOSTLY AI, select the Catalogs tab.
- Click Create catalog.
- From the left, select Database.
- Select an existing database connector and click Next.
Step result: MOSTLY AI creates an empty catalog which has the same name as the selected database connector. At this point, the catalog is not saved yet.
- Add tables to the new catalog.
- Click Add table.
Step result: The Select a database table drawer opens.
- Select a database table from the drop-down list.
Tip: Type in the box to filter the tables by name.
- (Optional) To add all related database tables, select the Include child tables checkbox.
MOSTLY AI then automatically adds all related database tables to the catalog.
- Click Proceed.
Step result: The table is now added to the catalog and the catalog is saved.
By default, all tables are initially added as subject tables.
- To add more database tables to the catalog, click Add table again and repeat the table selection steps above.
- (Optional) Configure the table, data, and output settings for the catalog. For more information, see Configure catalogs.
Result
Your database catalog saves automatically as soon as you add the first table. From then on, it is available in the catalogs list in the Catalogs tab.
By default, the added tables appear as subject tables in the catalog. You can configure the table relationships to define the linked tables and the Context and Smart Select foreign keys.
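The catalog lifecycle described above can be pictured with a small in-memory model. This is only an illustrative sketch, not the MOSTLY AI API: the `Catalog` and `Table` classes, the `create_catalog` function, and the connector name `sales-db` are all hypothetical names chosen for this example. It captures three behaviors from the steps: the catalog takes the connector's name, it is not saved while empty, and each added table starts as a subject table.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    # Per the steps above, tables are initially added as subject tables.
    role: str = "subject"


@dataclass
class Catalog:
    name: str
    tables: list = field(default_factory=list)
    # An empty catalog exists in the UI but is not saved yet.
    saved: bool = False

    def add_table(self, name: str) -> Table:
        table = Table(name)
        self.tables.append(table)
        # The catalog saves as soon as the first table is added.
        self.saved = True
        return table


def create_catalog(connector_name: str) -> Catalog:
    # Hypothetical helper: the new catalog takes the connector's name.
    return Catalog(name=connector_name)


catalog = create_catalog("sales-db")
print(catalog.saved)  # the empty catalog is unsaved
catalog.add_table("customers")
print(catalog.saved, [t.role for t in catalog.tables])
```

In this model, changing a table's role away from `subject` would correspond to configuring table relationships in the catalog.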
Create a cloud storage catalog
With a cloud storage catalog, you can create a pre-configured synthetic dataset that can synthesize data from tables that you host on cloud object storage buckets.
Prerequisites
- Create a cloud storage connector.
- If you want to synthesize two or more tables, make sure that all tables are available in the cloud bucket for which you have a connector.
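Before you start, it can help to confirm that every table you plan to add sits in the connector's bucket. The following is a minimal sketch that assumes table locations are written as URIs such as `s3://bucket/path`. That format, the bucket names, and the `tables_outside_bucket` helper are assumptions made for illustration and are not part of MOSTLY AI.

```python
from urllib.parse import urlparse


def tables_outside_bucket(connector_bucket, table_paths):
    """Return the paths whose bucket differs from the connector's bucket."""
    # For a URI like s3://my-bucket/data/file.csv, netloc is "my-bucket".
    return [p for p in table_paths if urlparse(p).netloc != connector_bucket]


# Hypothetical paths for illustration only.
paths = [
    "s3://my-bucket/data/customers.csv",
    "s3://other-bucket/data/orders.csv",
]
print(tables_outside_bucket("my-bucket", paths))
```

Any path that the function returns would need to be moved into the connector's bucket, or be covered by a separate connector and catalog.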
Steps
- In MOSTLY AI, select the Catalogs tab.
- Click Create catalog.
- From the left, select Cloud storage.
- Select an existing cloud storage connector and click Next.
Step result: MOSTLY AI creates an empty catalog which has the same name as the selected cloud storage connector. At this point, the catalog is not saved yet.
- Add tables to the new catalog.
- Click Add table.
Step result: The Specify table path drawer opens.
- For Table path, specify the bucket path to the table you want to add.
- For Table name, specify the name of the table as you want it to appear in the catalog and in the generated synthetic data.
- Click Proceed.
Step result: The table is now added to the catalog and the catalog is saved.
By default, all tables are initially added as subject tables.
- To add more tables to the catalog, click Add table again and repeat the table path and table name steps above.
- (Optional) Configure the table, data, and output settings for the catalog. For more information, see Configure catalogs.
Result
Your catalog saves automatically as soon as you add the first table. From then on, it is available in the catalogs list in the Catalogs tab.
By default, the added tables appear as subject tables in the catalog.
What's next
With a database or cloud storage catalog created, you can now configure the catalog.
When you open the catalog, you can start the configuration of a new synthetic dataset by clicking the Next button. You can then also configure a data destination for the synthetic dataset.