You can use this guide as a reference when configuring a job. Check out the View the job progress guide when running a job.
You will need a dataset or a readily configured catalog to complete this guide.
Feel free to download a ready-to-use dataset if you don’t have anything at hand.
It will take 30 mins to complete this guide.
You’ll promptly be sharing your synthetic data across your business and partnerships.
Upload a dataset or select a catalog
On the Jobs page, click Create synthetic data to begin.
A new page appears where you can upload your dataset or select a catalog with preconfigured data sources. Take the following actions to do so:
You can also upload tables that are partitioned over different files as long as they have the same schema.
Drag your subject table file(s) to the respective upload area or click to use your computer’s file browser. The linked table upload area becomes available once you’ve specified the subject table files.
Use the Table name fields to optionally rename your tables.
Click to upload your files to the MOSTLY AI server and continue to the job settings.
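If you plan to upload a table that is partitioned over several files, it can help to confirm locally that the partitions share a schema before uploading. The following is a minimal sketch, not part of MOSTLY AI: it assumes CSV partitions and treats "same schema" as "same column names in the same order", which is an assumption about what the platform checks. The function names and sample data are hypothetical.

```python
import csv
import io


def read_header(text_file):
    """Return the column names from the first row of a CSV file object."""
    return next(csv.reader(text_file), [])


def partitions_share_schema(files):
    """Return True if every partition has the same column names in the same order.

    Assumption: comparing header rows is a sufficient local check;
    column types are not inspected here.
    """
    headers = [read_header(f) for f in files]
    return all(header == headers[0] for header in headers)


# Hypothetical in-memory partitions standing in for files on disk.
matching = [
    io.StringIO("id,name,country\n1,Ann,AT\n"),
    io.StringIO("id,name,country\n2,Bob,DE\n"),
]
mismatched = [
    io.StringIO("id,name,country\n1,Ann,AT\n"),
    io.StringIO("id,name\n3,Cy\n"),
]

print(partitions_share_schema(matching))
print(partitions_share_schema(mismatched))
```

For files on disk, the same function works with objects returned by `open(path, newline="")`.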
Catalog
Select the catalog you want to synthesize.
Click to select a data destination and start the job.
Or, click to review the catalog before starting the job.
Relationships
If your data contains one or more subject tables and one or more linked tables, you can use this tab to specify how they’re linked.
When synthesizing uploaded files or a file-based catalog, MOSTLY AI will automatically link them if the subject table contains a column called id and the first column of the linked table contains _id in its name (for instance, players_id).
Please make sure your tables are correctly linked before proceeding.
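The automatic linking rule for uploaded files can be checked against your column names before you upload. This is a rough sketch of the rule as stated above, not MOSTLY AI's actual implementation: the function name is hypothetical, and exact, case-sensitive matching is an assumption.

```python
def would_auto_link(subject_columns, linked_columns):
    """Apply the naming rule described above to two lists of column names.

    The subject table must contain a column called "id", and the FIRST
    column of the linked table must contain "_id" in its name.
    Assumption: matching is case-sensitive.
    """
    subject_has_id = "id" in subject_columns
    first_linked_has_id = bool(linked_columns) and "_id" in linked_columns[0]
    return subject_has_id and first_linked_has_id


# Hypothetical column layouts.
print(would_auto_link(["id", "name"], ["players_id", "season"]))
print(would_auto_link(["id", "name"], ["season", "players_id"]))
```

In the second call the `players_id` column exists but is not the first column of the linked table, so the rule as described would not apply and the relationship would need to be set manually.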
If you selected a database catalog in the previous step with no or partially defined relationships, then you can use this tab to specify these.
Let’s take a look at the options that are available to manage the relationships:
Table list
This list shows all the tables that will appear in your synthetic data.
They’re sorted by table type. Subject tables are at the top, linked tables in the middle, and reference tables at the bottom.
Click on a table to open the relationship drawer and edit its primary and foreign keys.
Referenced tables
This part shows which tables are referenced by the tables in the table list.
Clicking on the row opens the relationship drawer of the table in the table list.
Filter
Filter the relationships view by subject tables, linked tables, reference tables or tables without relations.
Add, modify, or delete relationships
Hovering over a row reveals the following options:
You can use the bulk editor to configure multiple columns at once.
Tick the checkboxes of the columns you want to configure and use the settings fields in the top row to adjust them.
Training settings
Use these settings to specify whether AI model training needs to be done quickly or accurately.
You can also optimize training performance if the results of an earlier job were not of the desired accuracy or took too long to generate.
Let’s take a look at the options on this page:
Table list
This list shows all the tables that will appear in your synthetic data.
Click on a table to view or modify its training settings.
Training settings
The following training settings are available:
Training goal
Select Accuracy to achieve the highest attainable synthetic data accuracy.
Or select Speed to deliver accurate synthetic data with significantly shorter training times.
Maximum epochs
This setting allows you to limit the number of epochs to, for instance, 2, 5, or 10. This can significantly reduce training time, but comes at the cost of accuracy.
Model size
Adjust the model size if the synthesization job runs into memory issues, takes too long to complete, or produces synthetic data with less than the desired accuracy. Smaller sizes require less memory, run faster, and reduce synthetic data accuracy, whereas bigger sizes increase accuracy, require more memory, and take more time to complete.
Batch size
Batch size refers to the number of records used for each training step. Selecting a larger batch size can speed up training, but consumes more memory and can decrease accuracy.
If you get out-of-memory errors during training, you can try to resolve them by decreasing the batch size.
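To get a feel for how batch size relates to the number of training steps, the arithmetic can be sketched as follows. This uses the general definition of a training step (one batch of records) and assumes one epoch is a single pass over all records; it says nothing about how MOSTLY AI schedules training internally. The record count and batch sizes are made-up examples.

```python
import math


def steps_per_epoch(n_records, batch_size):
    """Number of training steps needed to pass over all records once.

    The last batch may be smaller than batch_size, hence the ceiling.
    """
    return math.ceil(n_records / batch_size)


# Hypothetical table with 100,000 records.
for batch_size in (512, 2048):
    print(batch_size, steps_per_epoch(100_000, batch_size))
```

A larger batch size means fewer steps per epoch, but each step holds more records in memory at once, which is why lowering it is a reasonable first response to out-of-memory errors.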
Click to open the bulk editor.
You can use the bulk editor to configure multiple tables at once.
Tick the checkboxes of the tables you want to configure and use the settings fields in the top row to adjust them.
Click to improve synthetic data accuracy of databases.
To maintain referential integrity, MOSTLY AI needs to make matches between the entries of the referenced and referring tables of Smart Select relationships. By default, these are randomly linked: the foreign key column is populated with IDs drawn at random from the primary key.
You can change this behavior by designating one or more columns of the parent table in a relationship as Smart Select columns. MOSTLY AI can then use these attributes to find appropriate matches with the entries in the referring table of a relationship. This will result in a more accurate rendering of these relationships in the synthetic database.
Click and select a suitable column from the drop-down menu.
Drag to rank the columns by importance.
Click to complete the configuration. The settings will be applied to the Smart Select foreign keys of the referring tables.
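The default random linking described above can be pictured with a small sketch. It only illustrates the idea of filling a foreign key column with IDs drawn at random from the primary key; it is not MOSTLY AI's code, and the function name, seed, and sample IDs are hypothetical. The matching logic that Smart Select columns enable is not shown, because this guide does not describe it.

```python
import random


def assign_random_foreign_keys(primary_keys, n_referring_rows, seed=0):
    """Fill a foreign key column with IDs drawn at random from the primary key.

    Every drawn value exists in primary_keys, so referential integrity
    holds, but the pairing ignores all other attributes of both tables.
    """
    rng = random.Random(seed)
    return [rng.choice(primary_keys) for _ in range(n_referring_rows)]


# Hypothetical referenced table with three IDs and five referring rows.
customer_ids = ["c1", "c2", "c3"]
foreign_keys = assign_random_foreign_keys(customer_ids, 5)
print(foreign_keys)
```

Because the draw ignores the attributes of either table, any correlation between parent and child records is lost, which is the gap that designating Smart Select columns is meant to close.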
Output
Select a data destination
Choose a destination from the drop-down menu.
You can always download the synthetic data as CSV or Parquet files.
Optionally specify the size of the synthetic data
Specifying the number of generated subjects will determine the size of the synthetic data.
If you leave these fields blank, MOSTLY AI will use the same number of subjects during training and generation as the original dataset.