Concepts
Context processing in multi-table datasets

Context processing in multi-table schemas

MOSTLY AI provides powerful features for the training on and generation of multi-table datasets. For such datasets, the Sequential Context Processor (SCP) is a key feature that aids the generation of nested child tables with rich context from any surrounding tables. Thus, synthetic data that you generate with MOSTLY AI can fully retain the existing correlations between nested linked tables in complex schemas.

Example of context processing

To examine how sequential context processing works, let's consider a multi-table scenario with six tables, as listed below.

  • customers table
  • loans table
  • payments table
  • accounts table
  • cards table
  • transactions table

The diagram below illustrates the relationships in this table schema.

Six-table schema
💡

For ease of reading, the text below refers to generating records in the context of other records. However, the same rules of context apply when training MOSTLY AI generators.

So, when you read generated in the context of, bear in mind that trained on in the context of applies just as equally.

customers table

At the top of the hierarchy is the customers table. This table acts as the primary context for all child and grandchild tables that follow in the hierarchy. This is also known as a subject table.

Single table - no context

loans table

The loans table is the first child table in the hierarchy. Along with the parent customers table, both tables represent a two-table scenario, where the customers table is a subject table and the loans table is a linked table.

The records in the loans table are generated in the context of the parent records from the customers table.

Two tables - the context are the parent records

In addition, each record in the loans table is generated in the context of all same-table sibling records. Let's consider how the loans records are generated for the parent Lucia Garcia.

After the first loan record for Lucia Garcia is generated, that first record is then used as context for the second loan record of Lucia Garcia.

Same-table siblings context - generating the 2nd record

Afterwards, all previously generated sibling records (with parent Lucia Garcia) are passed as context for each new sibling to be generated.

Same-table siblings context - generating the 3rd record

To summarize, when MOSTLY AI generates a loan record, it does so in the context of:

  • parent customer record (Lucia Garcia)
  • same-table sibling loan records (the second Consumer loan is generated in the context of the first two loans Consumer and Mortgage of Lucia Garcia)
💡

For details on how to set up a two-table scenario in MOSTLY AI, see Two-table relationships.

payments table

The payments table is the first grandchild table in the hierarchy. It is a child to the loans table, and a grandchild to the customers table.

When MOSTLY AI generates a payment record, it does so in the context of:

  • the grandparent customer record (Lucia Garcia)
  • parent loan record (Consumer loan)
  • parent sibling loan records (the Mortgage and the two Consumer loans)
  • same-table sibling payment records (the payment records that belong to the same Consumer loan)

The context not used is as follows:

  • same-table cousin records (payment records that belong to other loan parent records)
Three tables - context are the parent and grandparent records

accounts table

The accounts table is the second child to the customers table and a sibling to the loans table. Just like the customers records, the accounts records are generated in the context of the customers records.

However, what the Sequential Context Processor includes is also the loans records as context. This means that every time an account record is generated, MOSTLY AI provides as context:

  • parent customer record (Lucia Garcia)
  • cross-table sibling loan records (the 2 Consumer loans and the Mortgage loan)
  • same table sibling account records (the Savings account is passed as context when generating the Checking account)

The context not used is as follows:

  • any records from the payments table
Three tables - context are the parent and cross-table siblings

cards table

The cards table is the second grandchild table in the hierarchy. It is a child to the accounts table and a grandchild to the customers table.

When MOSTLY AI generates a card record, it does so in the context of:

  • the grandparent customer record (Lucia Garcia)
  • the parent account record (the Savings account of Lucia Garcia)
  • all same-table parent sibling account records (the Checking account of Lucia Garcia)
  • all cross-table parent sibling loan records (the 2 Consumer and 1 Mortgage records)
  • all previously generated same-table sibling card records

The context not used is as follows:

  • all cross-table cousin records from the payments table (payments records whose parent loan record has as parent Lucia Garcia)
  • all same-table cousin records from the cards table (cards records that have another account as a parent whose parent is Lucia Garcia)
Three tables - context are the parent and cross-table siblings

transactions table

The generation of the transactions records occurs with the richest context compared to the rest of the tables.

When MOSTLY AI generates transaction records for anaccount (for example, the Savings account as shown in the diagram below), it does so in the context of:

  • the grandparent customer record (Lucia Garcia)
  • the parent account record (the Savings account that belongs to Lucia Garcia)
  • all same-table parent sibling account records (the Checking account that also belongs to Lucia Garcia)
  • all cross-table parent sibling loan records (the 2 Mortgage and 1 Consumer loans that belong to Lucia Garcia)
  • all cross-table sibling card records (the Debit and Credit cards that also belong to the same parent Savings account)
  • all previously generated same-table sibling transaction records

The context not used is as follows:

  • cross-table cousin records from the cards table (cards that have another account as parent whose parent is Lucia Garcia)
  • cross-table cousin records from the payments table (payments records whose parent loan record belongs to Lucia Garcia)
  • same-table cousin records from the transactions table (transaction records with a different account parent record that belongs to Lucia Garcia)
Three tables - context are the parent and cross-table siblings

Summary of context processing scenarios

The table below summarizes the types of records that can be passed as context by the Sequential Context Processor.

Records typesTABULARLANGUAGE
parentyesyes
grandparentyesyes
same-table siblingsyesyes
cross-table siblingsyesyes
same-table parent siblingsyesX
cross-table parent siblingsyesX
same-table cousinsXX
cross-table cousinsXX