Use this guide to improve the accuracy of your database’s synthetic data.

To complete this guide, you need a database catalog that is already configured.

This guide takes about 15 minutes to read.
How database synthesis works
During synthesis, MOSTLY AI distinguishes between two relationship types:
- Relationships that are critical for privacy protection and synthetic data accuracy. MOSTLY AI classifies these as Context foreign key relationships.
- Relationships that are necessary for maintaining referential integrity. MOSTLY AI classifies these as Smart Select foreign key relationships.
This distinction allows MOSTLY AI to ensure that your synthetic database is accurate and privacy-secure while maintaining the referential integrity of the original database.
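As a minimal sketch of this distinction, each foreign key below carries a relationship type tag. The `RelationshipType` and `ForeignKey` names, the `customer_id` and `order_id` columns, and the hand-assigned types (taken from the Customers - Orders - Payments example in this guide) are hypothetical illustrations, not MOSTLY AI's internal data model.

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    # Context: the referring table is synthesized with its parent as context.
    CONTEXT = "context"
    # Smart Select: keys are assigned afterwards to keep referential integrity.
    SMART_SELECT = "smart_select"


@dataclass(frozen=True)
class ForeignKey:
    table: str               # referring table
    column: str              # foreign key column in the referring table (hypothetical name)
    references: str          # referenced table
    kind: RelationshipType   # assigned by hand in this sketch


# Hypothetical foreign keys for the Customers - Orders - Payments example.
FOREIGN_KEYS = [
    ForeignKey("Orders", "customer_id", "Customers", RelationshipType.CONTEXT),
    ForeignKey("Payments", "customer_id", "Customers", RelationshipType.CONTEXT),
    ForeignKey("Payments", "order_id", "Orders", RelationshipType.SMART_SELECT),
]


def by_type(foreign_keys, kind):
    """Return the foreign keys of one relationship type, in input order."""
    return [fk for fk in foreign_keys if fk.kind is kind]


if __name__ == "__main__":
    for fk in by_type(FOREIGN_KEYS, RelationshipType.SMART_SELECT):
        print(f"{fk.table}.{fk.column} -> {fk.references}")
```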
To understand how this works, let’s take a look at the example Customers - Orders - Payments database shown below.

The Customers table is the subject table, and the Orders and Payments tables contain privacy-sensitive information about these customers. Because the Orders and Payments tables refer to these subjects, MOSTLY AI will assign them a Context foreign key relationship. This means that these tables will be synthesized in the following way:
- First, the Customers table will be synthesized.
- Then, the Orders table will be synthesized with the synthetic Customers table as its context.
- Lastly, the Payments table will be synthesized, also with the synthetic Customers table as its context.
The resulting synthetic tables contain a fictional set of customers who placed fictional orders and made fictional payments. The Customers - Orders and Customers - Payments relationships retain the original data’s statistical patterns, distributions, and correlations with the highest possible accuracy.
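The synthesis order above can be sketched as a small planning function. It is a hypothetical illustration, not MOSTLY AI's scheduler, and it assumes a single subject table and that each table is reached through one context foreign key. Smart Select foreign keys are planned last because, as described later in this guide, they are populated once the synthetic tables exist. The function name, the tuple layout, and the column names are assumptions.

```python
from collections import deque


def plan_synthesis(subject, foreign_keys):
    """Return an ordered list of hypothetical synthesis steps.

    foreign_keys: list of (referring_table, fk_column, referenced_table, kind)
    tuples, where kind is "context" or "smart_select".
    """
    steps = [("synthesize", subject, None)]
    done = {subject}
    queue = deque([subject])
    # Walk context relationships outward from the subject table, so each
    # table is synthesized after the table it uses as context.
    while queue:
        parent = queue.popleft()
        for table, _column, referenced, kind in foreign_keys:
            if kind == "context" and referenced == parent and table not in done:
                steps.append(("synthesize", table, parent))
                done.add(table)
                queue.append(table)
    # Smart Select keys are assigned only after all synthetic tables exist.
    for table, column, referenced, kind in foreign_keys:
        if kind == "smart_select":
            steps.append(("smart_select", f"{table}.{column}", referenced))
    return steps


foreign_keys = [
    ("Orders", "customer_id", "Customers", "context"),
    ("Payments", "customer_id", "Customers", "context"),
    ("Payments", "order_id", "Orders", "smart_select"),
]

for step in plan_synthesis("Customers", foreign_keys):
    print(step)
```

The Orders - Payments link never influences the synthesis order here; it only adds a final key-assignment step.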

But what about the Orders - Payments Smart Select foreign key relationship? To maintain the referential integrity of the original database, MOSTLY AI must somehow link the new, fictional entries in the Orders table to the new, fictional entries in the Payments table.

The Smart Select algorithm does this by learning the characteristics of the relationship between the original referenced and referring tables. It then uses those characteristics to find appropriate matches between the entries in the synthetic referenced and referring tables.
A good analogy for the Smart Select algorithm is the job of a hiring manager. Let’s say that you have a list of all vacant positions in your company and a list of potential candidates for these jobs. A hiring manager would look at the requirements and expectations for each of these jobs and match the appropriate candidates based on their abilities, skills, work experience, and interests. For example, someone with a computer science degree and ten years of experience in software development would be a suitable candidate for a position in engineering. Similarly, someone with an extensive blogging portfolio, SEO skills, and web analytics knowledge could make a great contributor to a marketing team.
The Smart Select algorithm evaluates the referenced and referring tables of the synthetic database in a similar way:
- First, you need to help the algorithm a little by telling it where to find the "requirements and expectations" for matching entries in the synthetic referenced and referring tables, because your original referenced and referring tables won’t state these things explicitly. In the UI, you can specify which columns the algorithm should look at to learn the correlations between the attributes in the original referenced and referring tables. We recommend selecting attributes that are relevant to how the entries are matched: work experience, abilities, and skills are very relevant when matching candidates to vacant positions, while place of birth, age, and gender are not relevant at all.
- Next, while the synthetic data generation models are trained, the algorithm learns these correlations and uses them to sketch an outline of how entries in the synthetic referenced and referring tables can be matched.
- Once the synthetic versions of your tables have been generated, this outline is used to select the appropriate entry in the synthetic referenced table for each entry in the synthetic referring table.
- Lastly, the Smart Select algorithm populates the referring table’s foreign key column with the primary keys of the selected entries.
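A toy version of these four steps is sketched below. It is not MOSTLY AI's Smart Select algorithm, whose internals this guide does not describe; it swaps in a deliberately simple technique. For each selected column pair it learns the mean ratio between the referring and referenced values in the original data, then gives each synthetic referring row the synthetic referenced row whose scaled values come closest. The `total` and `amount` columns and all data are hypothetical.

```python
from statistics import mean


def learn_ratios(original_pairs, columns):
    """Step 2 (learning): for each selected (referenced, referring) column
    pair, record the mean ratio referring_value / referenced_value across
    the original rows that were linked by the foreign key."""
    return {
        (ref_col, rfg_col): mean(
            child[rfg_col] / parent[ref_col] for parent, child in original_pairs
        )
        for ref_col, rfg_col in columns
    }


def distance(parent, child, ratios):
    """How far a referring row is from what the learned ratios predict."""
    return sum(
        abs(parent[ref_col] * ratio - child[rfg_col])
        for (ref_col, rfg_col), ratio in ratios.items()
    )


def smart_select(synthetic_referenced, synthetic_referring, ratios, fk_column):
    """Step 3 (matching): pick the closest synthetic referenced row.
    Step 4 (populating): write its primary key into the foreign key column."""
    for child in synthetic_referring:
        child[fk_column] = min(
            synthetic_referenced,
            key=lambda pk: distance(synthetic_referenced[pk], child, ratios),
        )
    return synthetic_referring


# Step 1 (column selection): relate order totals to payment amounts.
columns = [("total", "amount")]
# Original linked (order, payment) rows: each payment is half the order total.
original_pairs = [
    ({"total": 100.0}, {"amount": 50.0}),
    ({"total": 40.0}, {"amount": 20.0}),
]
synthetic_orders = {"o1": {"total": 30.0}, "o2": {"total": 90.0}}
synthetic_payments = [{"amount": 44.0}, {"amount": 16.0}]

ratios = learn_ratios(original_pairs, columns)
for payment in smart_select(synthetic_orders, synthetic_payments, ratios, "order_id"):
    print(payment)
```

The ratio trick only works for numeric columns with non-zero referenced values, and nothing here limits how many referring rows may point to the same referenced row; a real matcher would need to handle both.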
Reviewing the preconfigured relationships
To find out which linked tables have Smart Select foreign keys that point to other linked tables, click through all linked tables in the Table list and check each one's column list for such a foreign key. A column like this has Smart Select foreign key → [name of another linked table] as its Generation method.
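If you have noted each table's columns and generation methods while clicking through the Table list, the check above amounts to a simple filter. The dictionary layout below is a hypothetical stand-in for what you read off the UI, not an export format MOSTLY AI provides; apart from the Smart Select foreign key → [table] label quoted above, the generation method strings are placeholders.

```python
PREFIX = "Smart Select foreign key"


def smart_select_between_linked(tables):
    """Return (table, column, target) for every Smart Select foreign key in a
    linked table that points to another linked table."""
    found = []
    for name, table in tables.items():
        if table["type"] != "linked":
            continue
        for column, method in table["columns"].items():
            # Split e.g. "Smart Select foreign key → Orders" at the arrow.
            head, arrow, target = method.partition("→")
            target = target.strip()
            is_smart_select = bool(arrow) and head.strip() == PREFIX
            if is_smart_select and tables.get(target, {}).get("type") == "linked":
                found.append((name, column, target))
    return found


# Hypothetical notes taken from the Table list; "(...)" entries are placeholders.
tables = {
    "Customers": {"type": "subject", "columns": {"id": "(primary key)"}},
    "Orders": {
        "type": "linked",
        "columns": {"id": "(primary key)", "customer_id": "(context foreign key)"},
    },
    "Payments": {
        "type": "linked",
        "columns": {
            "customer_id": "(context foreign key)",
            "order_id": "Smart Select foreign key → Orders",
        },
    },
}

print(smart_select_between_linked(tables))
```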

Configure Smart Select relationships by subject table
The easiest way to configure Smart Select columns is to configure them by subject table. This lets you configure all relationships between the table you selected and its referring tables at once.
To configure the subject table’s Smart Select columns, go to the Table settings tab and click Edit Smart Select.

A drawer appears where you can choose which columns are to be used as Smart Select columns.

- Click Add row to select a column from the drop-down menu. Each column you add will improve the accuracy with which MOSTLY AI can match the entries in this table and its referring tables.
- Next, you can rank the selected Smart Select columns by order of importance by dragging them up or down. This further improves the accuracy with which the relationship is rendered.
- Once you have completed the configuration, click Apply to referring tables. This applies the configuration to the Smart Select foreign keys of the referring tables.
- To verify that your Smart Select configuration was applied to the referring tables, select one of them from the Table list and follow the steps in Configure Smart Select relationships by column below.
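Conceptually, Apply to referring tables copies one ranked column list to every Smart Select foreign key that references the selected subject table. The sketch below illustrates only that idea; the `Accounts`, `Transfers`, and `Loans` tables, their columns, and the `apply_to_referring_tables` function are hypothetical, and this is not how MOSTLY AI stores the configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ForeignKey:
    table: str        # referring table
    column: str       # foreign key column
    references: str   # referenced table
    kind: str         # "context" or "smart_select"
    smart_select_columns: list[str] = field(default_factory=list)


def apply_to_referring_tables(selected_table, ranked_columns, foreign_keys):
    """Copy one ranked Smart Select column list to every Smart Select foreign
    key that references selected_table, and return the updated keys."""
    updated = []
    for fk in foreign_keys:
        if fk.kind == "smart_select" and fk.references == selected_table:
            fk.smart_select_columns = list(ranked_columns)  # copy, keep rank order
            updated.append(fk)
    return updated


# Hypothetical schema: a transfer's sender is its context; the recipient and
# a loan's guarantor are other accounts linked via Smart Select.
keys = [
    ForeignKey("Transfers", "sender_id", "Accounts", "context"),
    ForeignKey("Transfers", "recipient_id", "Accounts", "smart_select"),
    ForeignKey("Loans", "guarantor_id", "Accounts", "smart_select"),
]

for fk in apply_to_referring_tables("Accounts", ["country", "segment"], keys):
    print(fk.table, fk.column, fk.smart_select_columns)
```

The context foreign key `sender_id` keeps an empty list, since the configuration applies only to Smart Select foreign keys. Inspecting `smart_select_columns` on one key plays the role of the verification step above.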
Configure Smart Select relationships by column
You can also configure the Smart Select columns directly in the Column settings drawer of a Smart Select foreign key column. This will allow you to configure relationships that don’t have a subject table as a parent.
Go to the Data settings tab, select a linked table, locate the Smart Select foreign key you want to configure, and click its cog icon.

A drawer appears where you can choose which columns are to be used as Smart Select columns.

- Click Add row to select a column from the drop-down menu. Each column you add will improve the accuracy with which MOSTLY AI can match the entries in the referring table and the table it references.
- Next, you can rank the selected Smart Select columns by order of importance by dragging them up or down. This further improves the accuracy with which the relationship is rendered.
- Click Save to save your configuration.
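This guide does not say how MOSTLY AI uses the ranking internally. One plausible way an ordered column list could influence matching is as decreasing weights in a distance function. The sketch below assumes linearly decreasing weights (n, n-1, ..., 1) and columns with the same names in both tables; both are illustrative choices, and the `region` and `currency` columns are hypothetical.

```python
def rank_weights(ranked_columns):
    """Map an ordered column list to linearly decreasing weights (n, ..., 1)."""
    n = len(ranked_columns)
    return {column: n - i for i, column in enumerate(ranked_columns)}


def weighted_mismatch(referenced_row, referring_row, ranked_columns):
    """Sum the weights of the columns whose values disagree."""
    weights = rank_weights(ranked_columns)
    return sum(w for col, w in weights.items() if referenced_row[col] != referring_row[col])


def best_match(referenced_rows, referring_row, ranked_columns):
    """Primary key of the referenced row with the smallest weighted mismatch."""
    return min(
        referenced_rows,
        key=lambda pk: weighted_mismatch(referenced_rows[pk], referring_row, ranked_columns),
    )


orders = {
    "o1": {"region": "EU", "currency": "USD"},
    "o2": {"region": "US", "currency": "EUR"},
}
payment = {"region": "EU", "currency": "EUR"}

print(best_match(orders, payment, ["region", "currency"]))
print(best_match(orders, payment, ["currency", "region"]))
```

Each candidate order disagrees with the payment on exactly one column, so swapping the rank order flips which order is chosen: the higher-ranked column dominates when the columns disagree.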