When creating a new data catalog, you need to specify which data source it is for. On the Select a data connector screen shown below, select the connector for the data source you want to connect to. The connectors are grouped into three tabs: Local storage, Cloud storage, and Databases.

Select a data connector

Once you click Proceed, MOSTLY AI will ask you to specify the location of your tables.

Because generating synthetic data is all about protecting the privacy of your data subjects, MOSTLY AI needs to know whose privacy you are going to protect. The first location you need to specify is therefore always that of a subject table. This table describes the profiles of your data subjects: a set of attributes such as names, places of residence, email addresses, birthdates, and other types of privacy-sensitive information.

  1. In the field labeled Specify path here, enter the directory where the subject table is stored.

    Specify the location of your tables

You can also specify the location of a linked table. This allows you to process lists, sequential data, or time-series data, such as online shopping carts, buyer journeys, purchase histories, or financial transactions.

  1. Click Add new table to add a linked table to your dataset.
    A second field appears where you can enter the location of your linked table.

    Specify the location of your tables
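To make the difference between the two table types concrete, here is a minimal sketch in plain Python. It assumes that a linked table refers to the subject table through a shared identifier column. The column name customer_id, the sample rows, and the helper functions are hypothetical and only serve as illustration; they are not part of MOSTLY AI.

```python
import csv
import io
from collections import defaultdict

# Hypothetical subject table: exactly one row per data subject.
SUBJECTS_CSV = """customer_id,name,city,birthdate
1,Ada Example,Vienna,1985-03-12
2,Ben Sample,Graz,1990-11-02
"""

# Hypothetical linked table: any number of rows per data subject.
PURCHASES_CSV = """customer_id,purchased_at,amount
1,2023-01-05,19.99
1,2023-02-11,5.49
2,2023-01-20,102.00
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_subject(subjects, linked, key="customer_id"):
    """Group linked rows by subject and reject rows without a known subject."""
    known = {row[key] for row in subjects}
    grouped = defaultdict(list)
    for row in linked:
        if row[key] not in known:
            raise ValueError(f"linked row refers to unknown subject {row[key]!r}")
        grouped[row[key]].append(row)
    return dict(grouped)


subjects = read_rows(SUBJECTS_CSV)
purchases = read_rows(PURCHASES_CSV)
sequences = group_by_subject(subjects, purchases)

for subject in subjects:
    events = sequences.get(subject["customer_id"], [])
    print(subject["name"], "has", len(events), "linked rows")
```

In this sketch, each subject appears once in the subject table, while the linked table holds that subject's sequence of events.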

You can optionally change the names of your tables. These names are used to indicate which table you’re working on when configuring your job. They won’t be present in the resulting synthetic dataset.

  1. Click the grey text field marked with the pen icon and enter the table name.

    Upload subject table