August 16, 2023
2m 38s

Getting Started with MOSTLY AI - Understanding Two Table Setups in Synthetic Data Generation


In this video, you will learn how to generate time-series data on MOSTLY AI's synthetic data platform. To synthesize time-series or sequential data, we first need to understand what the two-table setup is and how to upload and configure data in a two-table setting.

Get started with synthetic data generation for free here ➡️ https://bit.ly/43IGYSv


[00:00:01] Hi, everyone. In this video, I want to talk about the important concept of what we call two tables: one subject table and one linked table. To create such a dataset, we need to provide two data tables to the platform.

[00:00:20] I'm uploading the first one here, an accounts table. Now I will choose Add table and upload the second table, which contains transactions. You can think of this as a simple banking dataset where we have one table with information about account holders. It just contains an ID and gender. Then there is the table with transactions.

[00:00:46] Here we have the type of transaction, the dates, the amounts, and so forth. This second table, the transactions table, is linked to the accounts table by the accounts ID here. This accounts ID refers to the ID in the accounts table and makes clear who these transactions belong to.
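
To make the link between the two tables concrete, here is a minimal sketch in pandas. It is not the platform's API. The column names (`id`, `gender`, `account_id`, `type`, `date`, `amount`) and all values are assumptions made up for illustration, modeled on the banking example in the video.

```python
import pandas as pd

# Hypothetical subject table: one row per account holder.
accounts = pd.DataFrame({
    "id": [1, 2, 3],
    "gender": ["F", "M", "F"],
})

# Hypothetical linked table: every row points back to one account via account_id.
transactions = pd.DataFrame({
    "account_id": [1, 1, 2, 3, 3, 3],
    "type": ["debit", "credit", "debit", "debit", "credit", "debit"],
    "date": pd.to_datetime([
        "2023-01-05", "2023-01-02", "2023-01-03",
        "2023-02-01", "2023-01-15", "2023-01-20",
    ]),
    "amount": [50.0, 20.0, 10.0, 75.0, 30.0, 5.0],
})

# Resolve each transaction to its account holder through the key relationship.
joined = transactions.merge(
    accounts, left_on="account_id", right_on="id", how="left"
)
```

Each row of `joined` is a transaction with the matching account holder's attributes attached, which is the relationship the accounts ID expresses.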

[00:01:09] That means one account holder here, one subject in the accounts table, can have many, many transactions in the second table. This is what we call a two-table setup. It contains time information, sequential data, time-series data. That could be, for example, bank transactions or credit card transactions, but also policy claims data or health care records: anywhere there's a time component attached to the data.
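
The one-to-many relationship and the time component can be sketched as follows. This is an illustrative pandas example with made-up values and assumed column names (`account_id`, `date`, `amount`), not anything taken from the platform.

```python
import pandas as pd

# Hypothetical linked table with a foreign key and a time column.
transactions = pd.DataFrame({
    "account_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2023-01-05", "2023-01-02", "2023-01-03",
        "2023-02-01", "2023-01-15", "2023-01-20",
    ]),
    "amount": [50.0, 20.0, 10.0, 75.0, 30.0, 5.0],
})

# One subject can own many linked rows: count the rows per account.
sequence_lengths = transactions.groupby("account_id").size()

# Order each subject's rows by date to view them as a time-ordered sequence.
ordered = transactions.sort_values(["account_id", "date"]).reset_index(drop=True)
```

Here `sequence_lengths` holds the number of transactions per account, and `ordered` lists each account's transactions in chronological order.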

[00:01:37] What we now need to do is configure the platform to make sure that this is detected and synthesized as a two-table setup. The first thing we need to do is define this as a primary key, and we can pick whatever duration format we want. Now ID is defined as the primary key in the accounts table.

[00:01:56] Now we need to make sure that this accounts ID here is referenced as a foreign key. We go here into the Generation method and we pick Foreign key. Now we select here, Context key, and the Parent table, actually, in this case, is the accounts table,

[00:02:12] and there, yes. The Primary key ID is already pre-selected, so that's correct.

[00:02:18] We save this, and now here, we have the Generation method, Context Foreign key that's linking here back to the accounts table.

[00:02:27] You see also here, now how the color has changed. That's how we can then launch this job and synthesize a two-table setup.
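
Before uploading, it can help to check that the two tables actually satisfy the key relationship configured above. The following is a rough sketch in pandas under assumptions: `check_two_table_keys` is a hypothetical helper written for this example, not a platform function, and the column names are placeholders.

```python
import pandas as pd

def check_two_table_keys(subject, linked, primary_key, foreign_key):
    """Return a list of problems; an empty list means the keys look consistent."""
    problems = []
    # A primary key must be present on every subject row.
    if subject[primary_key].isna().any():
        problems.append("primary key contains missing values")
    # A primary key must identify each subject uniquely.
    if subject[primary_key].duplicated().any():
        problems.append("primary key contains duplicates")
    # Every foreign key value must refer to an existing subject.
    orphans = ~linked[foreign_key].isin(subject[primary_key])
    if orphans.any():
        problems.append(
            f"{int(orphans.sum())} linked rows reference an unknown subject"
        )
    return problems

# Hypothetical example data.
accounts = pd.DataFrame({"id": [1, 2, 3], "gender": ["F", "M", "F"]})
transactions = pd.DataFrame({"account_id": [1, 1, 2, 3], "amount": [5.0, 7.5, 1.0, 9.0]})

issues = check_two_table_keys(accounts, transactions, "id", "account_id")
```

An empty `issues` list suggests the ID column can serve as the primary key and the accounts ID column as a foreign key pointing at it.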

[00:02:36] Thanks for watching.
