Data imputation

You can fill null values in your original data with statistically reliable values. To do so, use the Imputed columns setting when you configure a new synthetic dataset.

Prerequisites

Before you can use Imputed columns, the model for the table that contains missing or null values must have flexible generation enabled.

Add imputed columns to a synthetic dataset

When you configure a new synthetic dataset, you can specify which columns to impute.

If you use the web application, you can select which columns to impute from the Imputed columns menu on the Configure Synthetic Dataset page.

💡

This example uses a generator trained on the CDNOW dataset with Flexible generation enabled. The age_category column in the customers table contains 557 null values out of 23,570 rows (about 2.4%).
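To decide whether a column is worth imputing, it can help to count its nulls first. The following is a minimal sketch, not part of the product: the helper name `null_share` is hypothetical, the column is a stand-in list built from the row counts quoted above, and the category value `"adult"` is a placeholder.

```python
def null_share(values):
    """Return (null_count, total, percent) for a sequence of column values."""
    values = list(values)
    # Treat both None and empty strings as null.
    nulls = sum(1 for v in values if v is None or v == "")
    total = len(values)
    percent = 100.0 * nulls / total if total else 0.0
    return nulls, total, percent


# Stand-in column: 557 nulls in 23,570 rows; "adult" is a placeholder value.
age_category = [None] * 557 + ["adult"] * (23570 - 557)

nulls, total, percent = null_share(age_category)
print(f"{nulls} of {total} values are null ({percent:.2f}%)")
```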

Steps

  1. Open the generator you want to use.
  2. Click Generate synthetic data in the upper right.
  3. Expand the generation settings for a table.
  4. Click inside the Imputed columns box and select a column to impute.
  5. (Optional) Add more columns to impute.
  6. Click Start generation.
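If you script generation instead of using the web application, the selection made in the steps above could be captured as a configuration object. The sketch below is an assumption, not the documented API: the helper `build_imputation_config` and the key names (`tables`, `configuration`, `imputation`, `columns`) are hypothetical and should be checked against the client reference before use.

```python
import json


def build_imputation_config(table_name, columns):
    """Build a hypothetical generation config that imputes the given columns."""
    columns = list(columns)
    if not columns:
        raise ValueError("select at least one column to impute")
    # Key names below are assumptions that mirror the settings in the UI.
    return {
        "tables": [
            {
                "name": table_name,
                "configuration": {"imputation": {"columns": columns}},
            }
        ]
    }


config = build_imputation_config("customers", ["age_category"])
print(json.dumps(config, indent=2))
```

Keeping the selection in one small structure makes it easy to review which columns are imputed for each table before a generation job starts.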

Result

When you examine the age_category column in the customers table of the generated data, no null values appear.
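One way to confirm this outside the web application is to scan an exported copy of the table. The sketch below assumes the generated data was downloaded as CSV and that nulls appear as empty cells. The helper `null_counts`, the `customer_id` column, and the sample rows are illustrative, not taken from the actual dataset.

```python
import csv
import io


def null_counts(csv_file, columns):
    """Count empty cells per column in an open CSV file."""
    counts = {c: 0 for c in columns}
    for row in csv.DictReader(csv_file):
        for c in columns:
            # Missing or whitespace-only cells count as null.
            if (row.get(c) or "").strip() == "":
                counts[c] += 1
    return counts


# Tiny in-memory stand-in for an exported customers table.
sample = io.StringIO("customer_id,age_category\n1,adult\n2,senior\n3,adult\n")

counts = null_counts(sample, ["age_category"])
print(counts)
```

For a real export, pass an open file handle instead, for example `open(path, newline="")`, where `path` points to the downloaded file.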