When creating a new data catalog, you’ll need to specify which data source it’s for. Using the Select a data connector screen shown below, you can select the server you want to connect to. The connectors are categorized into three tabs, including Cloud storage.
Once you click Proceed, MOSTLY AI will ask you to specify the location of your tables.
Because generating synthetic data is all about protecting the privacy of your data subjects, MOSTLY AI needs to know whose privacy you’re going to protect. The first location you need to specify is therefore always that of a subject table. This table describes your data subjects’ profiles: a set of attributes that say something about each of them. Examples include names, places of residence, email addresses, birthdates, and other types of privacy-sensitive information.
In the field labeled Specify path here, enter the directory where the subject table is stored.
You can also specify the location of a linked table. This allows you to process lists, sequential data, or time-series data, such as online shopping carts, buyer journeys, purchase histories, or financial transactions.
Click Add new table to add a linked table to your dataset.
A second field appears where you can enter the location of your linked table.
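To make the difference between the two table types concrete, here is a minimal Python sketch. It is not MOSTLY AI code: the column names, the sample rows, the `group_by_subject` helper, and the idea that linked rows refer to a subject through a shared `customer_id` column are all assumptions made for illustration. It shows a subject table with one row per data subject and a linked table holding a sequence of events for each subject.

```python
# Hypothetical subject table: exactly one row per data subject.
subjects = [
    {"customer_id": 1, "name": "Ada", "city": "Vienna"},
    {"customer_id": 2, "name": "Ben", "city": "Graz"},
]

# Hypothetical linked table: zero or more rows per subject.
# Each row is assumed to point back to its subject via customer_id.
purchases = [
    {"customer_id": 1, "item": "book", "amount": 12.50},
    {"customer_id": 1, "item": "pen", "amount": 1.20},
    {"customer_id": 2, "item": "lamp", "amount": 30.00},
]


def group_by_subject(subject_rows, linked_rows, key):
    """Return {subject key: list of linked rows}, rejecting orphaned linked rows."""
    known = {row[key] for row in subject_rows}
    grouped = {k: [] for k in known}
    for row in linked_rows:
        if row[key] not in known:
            raise ValueError(f"linked row references unknown subject: {row[key]}")
        grouped[row[key]].append(row)
    return grouped


# Each subject ends up with its own ordered sequence of linked rows.
sequences = group_by_subject(subjects, purchases, "customer_id")
for subject in subjects:
    events = sequences[subject["customer_id"]]
    print(subject["name"], [event["item"] for event in events])
```

Running a check like this on your own files before pointing MOSTLY AI at them can help you confirm that every row in the linked table belongs to a subject that actually exists in the subject table.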
You can optionally change the names of your tables. These names are used to indicate which table you’re working on when configuring your job. They won’t be present in the resulting synthetic dataset.
Click on the grey text field indicated by the pen icon and fill out the table name.