Fine-tune privacy mechanisms

You can enable or disable Value protection for each table in your generator. When enabled, Value protection replaces rare categories and removes numeric and datetime outliers from your original dataset before generator training.

Value protection for categorical columns

When enabled, Value protection replaces categories that occur rarely in a column and can, therefore, pose a re-identification risk. For example, the category of President of the United States in a Job title column will likely occur only once in that column and will be replaced. The replacement depends on the selected method.

  • Constant. The method maintains the original distribution of categories but replaces any rare categories with the _RARE_ token in synthetic datasets.
  • Sample. The method replaces rare categories with one of the other non-rare categories in the column. This can lead to a skewed distribution of categories in synthetic datasets.
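
The difference between the two methods can be illustrated with a minimal Python sketch. It is not MOSTLY AI's implementation: the function name `protect_rare_categories`, the `min_count` threshold, and the uniform random choice used for the Sample method are all assumptions made for illustration, since this page does not state how rarity is determined or how replacements are drawn.

```python
import random
from collections import Counter

RARE_TOKEN = "_RARE_"


def protect_rare_categories(values, min_count=3, method="constant", seed=0):
    """Replace categories that occur fewer than min_count times.

    min_count is a hypothetical threshold chosen for this sketch.
    """
    if method not in ("constant", "sample"):
        raise ValueError("method must be 'constant' or 'sample'")
    counts = Counter(values)
    # Categories frequent enough to be kept as they are.
    common = [category for category, n in counts.items() if n >= min_count]
    rng = random.Random(seed)
    protected = []
    for value in values:
        if counts[value] >= min_count:
            protected.append(value)
        elif method == "constant" or not common:
            # Constant method (also the fallback when no category is common).
            protected.append(RARE_TOKEN)
        else:
            # Sample method: substitute one of the non-rare categories.
            protected.append(rng.choice(common))
    return protected


job_titles = ["Engineer"] * 5 + ["Teacher"] * 4 + ["President of the United States"]
constant_result = protect_rare_categories(job_titles, method="constant")
sample_result = protect_rare_categories(job_titles, method="sample")
```

In this sketch, `constant_result` ends with the `_RARE_` token, while `sample_result` ends with either `Engineer` or `Teacher`, which slightly inflates the count of whichever category is drawn.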

Value protection for numeric and datetime columns

When enabled, Value protection removes the extreme low and high values (outliers) that might make it possible to re-identify the subject they belong to.
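
As a rough illustration of the idea, the following Python sketch drops values that fall outside bounds taken from the data itself. The function name `remove_extreme_values` and the `n_extremes` parameter are hypothetical, and the rule used here (keep only values between the (n+1)-th smallest and the (n+1)-th largest) is an assumption; this page does not describe how MOSTLY AI identifies outliers.

```python
def remove_extreme_values(values, n_extremes=1):
    """Keep only values between the (n+1)-th smallest and (n+1)-th largest.

    n_extremes is a hypothetical parameter chosen for this sketch.
    """
    if n_extremes < 0:
        raise ValueError("n_extremes must be non-negative")
    if len(values) <= 2 * n_extremes:
        # Too few values to keep anything after trimming both ends.
        return []
    ordered = sorted(values)
    low = ordered[n_extremes]
    high = ordered[-n_extremes - 1]
    # Preserve the original order of the remaining values.
    return [value for value in values if low <= value <= high]


salaries_k = [48, 950, 52, 12, 55, 61, 50]
trimmed = remove_extreme_values(salaries_k, n_extremes=1)
```

Here the single lowest value (12) and the single highest value (950) are dropped, and the remaining values keep their original order. The same comparison logic also works for `datetime` objects, because they can be sorted and compared.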

💡

Privacy protection is built into the design of MOSTLY AI's generative AI models. For example, data overfitting and extreme sequence length protection are always enabled.

For information about all privacy-protection mechanisms, see Privacy-protection mechanisms.

Set value protection

You can switch Value protection on or off for each table in a generator from the Model configuration page.

Steps

  1. With a generator open, click Configure models in the upper right.
  2. On the Model configuration page, configure the privacy-protection mechanism.
    1. Under Value protection, enable or disable value protection with the On and Off buttons.

      When set to On, this enables both rare category protection and extreme numeric and datetime value protection.

    2. If Value protection is enabled, define how to replace rare categories.
      • Constant. Replaces rare categories with the _RARE_ token and maintains the original distribution in categorical columns.
      • Sample. Replaces rare categories with one of the other non-rare categories in the column. This can skew the original distribution in categorical columns.