Multi-table relationships

In a multi-table scenario, where you add 3 or more tables to a generator, you can have multiple tables with multiple relationships to other tables. For example, consider the Berka dataset.

MOSTLY AI - Multi-table relationships - Berka table schema

When you configure a multi-table generator, the most important task is to define the table relationships. The relationships define how the generator trains on your original data and how it retains the correlations and the referential integrity of your original data.

Context and non-context relationships

With a context relationship, your generator can fully retain the correlations and maintain the referential integrity between two tables. A non-context relationship maintains only the referential integrity. A table can have only one context relationship. Any additional relationships can be only non-context.

You define a relationship as context when you want the data in a table (child) to be generated in the context of the table it references (parent). Context relationships also help to fully retain the correlations across a grandparent - parent - child table hierarchy. You can again consider the Berka dataset.

Due to the context relationships, the correlations among account - disposition - card hierarchy are fully retained. The non-context relationship between client and disposition only maintains the referential integrity. You can switch this to a context relationship, but then the generator loses the opportunity to retain the hierarchical correlations among account - disposition - card.

MOSTLY AI - Multi-table relationships - Berka relationship hierarchy

Decide: context or non-context

  • If a linked table has a single foreign key to another table, set that as context.
  • If a table has multiple foreign keys, prioritize as context the table with which you want to retain the correlations. Leave the remaining foreign keys as non-context.

Example: Berka relationships

For an example on how to set the table relationships in a multi-table dataset, you can use the Berka dataset. For reference, you can use the table schema from the illustrations above.

From the web interface, you can define each relationship with the configuration of a foreign key between two tables. For reference, use the diagrams above.

Steps

  1. Download the Berka dataset and add all tables to a new generator. MOSTLY AI - Multi-table relationships - add Berka tables
  2. Define foreign keys for all linked tables that refer the account table.
    1. Set a context foreign key from the loan table (account_id column) to the account table. MOSTLY AI - Multi-table relationships - Set foreign key for loan table
    2. Set a context foreign key from the order table (account_id column) to the account table. MOSTLY AI - Multi-table relationships - Set foreign key for order table
    3. Set a context foreign key from the transaction table (account_id column) to the account table. MOSTLY AI - Multi-table relationships - Set foreign key for transaction table
    4. Set two foreign keys from the disp table: one to the account table, and another to the client table.
      💡

      Only one of the foreign keys can be context. Set as context the relationship for which you want to fully retain the correlations between the two tables.

      MOSTLY AI - Multi-table relationships - Set foreign keys for disp table
    5. Set a context foreign key from the card table (disp_id column) to the disp table. MOSTLY AI - Multi-table relationships - Set foreign keys for card table
  3. Set the primary keys for each table that doesn't already have one.
    1. Click the Primary key column for a table and select its primary key column.
    2. Repeat for each table. MOSTLY AI - Multi-table relationships - Set foreign keys for card table

Result

With all required foreign keys set, the Data configuration page provides a hierarchical overview of the table relationships and lists all primary keys and foreign keys

MOSTLY AI - Multi-table relationships - Set foreign keys for card table

You can now take a look at the relationship diagram.

Exception: Child to grandparent relationship

If you have a hierarchy of three tables where a child table references both the grandparent and the parent, you might run into issues during generator training.

The solution is to exclude the grandparent foreign key column of the child table from training and propagate the column as a post-processing step after you generate a synthetic dataset.

MOSTLY AI - Multi-table relationships - Set foreign keys for card table