
L1 distance

L1 distance is one way to measure the distance between two probability vectors p and q. It is calculated as the sum of the absolute differences between corresponding entries: L1(p, q) = ∑ |pᵢ - qᵢ|. The smaller the distance between the observed probability vectors, the higher the accuracy of the synthetic data.

L1 distance can be used to compare the empirical probability distributions of a feature in the original and synthetic datasets. The smaller the L1 distance between the two distributions, the more similar the datasets are and the more accurate the synthetic data.
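
As a rough illustration of the comparison described above, the following sketch computes the empirical distribution of one categorical feature in two small datasets and the L1 distance between them. The function names (`empirical_distribution`, `l1_distance`) and the toy data are hypothetical and chosen only for this example; they are not part of any particular library.

```python
from collections import Counter


def empirical_distribution(values, categories):
    """Relative frequency of each category in `values`, ordered as in `categories`."""
    counts = Counter(values)
    n = len(values)
    return [counts[c] / n for c in categories]


def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Toy data (illustrative only): one categorical feature in each dataset.
original = ["a", "a", "b", "c"]
synthetic = ["a", "b", "b", "c"]

# Use the union of categories so both vectors share the same index.
categories = sorted(set(original) | set(synthetic))
p = empirical_distribution(original, categories)   # [0.5, 0.25, 0.25]
q = empirical_distribution(synthetic, categories)  # [0.25, 0.5, 0.25]

distance = l1_distance(p, q)
print(distance)  # 0.5
```

Building both vectors over the union of categories matters: a category that appears in only one dataset then contributes its full probability to the distance instead of being silently dropped.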

The accuracy of synthetic data can be assessed by measuring statistical distances between the synthetic and the original data. The metric of choice for the statistical distance is the total variation distance (TVD), which is calculated for the discretized empirical distributions. Subtracting the TVD from 100% then yields the reported accuracy measure. The TVD and the resulting accuracy are calculated for all univariate and all bivariate distributions.
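
The accuracy calculation described above might be sketched as follows. The text does not specify how the data is discretized, so the fixed bin edges used here are an assumption made purely for illustration, and the helper names (`discretize`, `accuracy`) are hypothetical. A bivariate distribution is handled by treating each pair of bins as a single outcome.

```python
from collections import Counter


def discretize(values, edges):
    """Map each numeric value to a bin index; out-of-range values are clipped
    to the first or last bin. The bin edges are an assumption of this sketch."""
    n_bins = len(edges) - 1
    result = []
    for v in values:
        idx = 0
        while idx < n_bins - 1 and v >= edges[idx + 1]:
            idx += 1
        result.append(idx)
    return result


def accuracy(original_outcomes, synthetic_outcomes):
    """1 - TVD between the empirical distributions of two outcome lists."""
    p = Counter(original_outcomes)
    q = Counter(synthetic_outcomes)
    n_p, n_q = len(original_outcomes), len(synthetic_outcomes)
    outcomes = set(p) | set(q)
    tvd = 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in outcomes)
    return 1.0 - tvd


# Toy data (illustrative only).
edges = [0, 10, 20, 30]
orig_age = discretize([5, 15, 25, 25], edges)  # [0, 1, 2, 2]
syn_age = discretize([5, 5, 15, 25], edges)    # [0, 0, 1, 2]
orig_cat = ["x", "y", "x", "y"]
syn_cat = ["x", "x", "y", "y"]

# Univariate accuracy for the discretized numeric feature.
univariate = accuracy(orig_age, syn_age)

# Bivariate accuracy: each (age bin, category) pair is one outcome.
bivariate = accuracy(list(zip(orig_age, orig_cat)), list(zip(syn_age, syn_cat)))

print(f"{univariate:.0%} {bivariate:.0%}")  # 75% 75%
```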

TVD is the L1 distance normalized by a factor of 1/2: TVD(p, q) = 1/2 ∑ |pᵢ - qᵢ|
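
The factor of 1/2 is what makes the TVD usable as a percentage: the L1 distance between two probability vectors lies between 0 and 2, so the TVD lies between 0 and 1. The minimal sketch below checks the boundary cases; the function name `tvd` and the example vectors are illustrative assumptions.

```python
def tvd(p, q):
    """Total variation distance: half the L1 distance between two probability vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Identical distributions: TVD is 0, so the accuracy would be 100%.
identical = tvd([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])

# Distributions with no overlap: L1 distance is 2, TVD is 1, accuracy 0%.
disjoint = tvd([1.0, 0.0], [0.0, 1.0])

# Partial overlap: L1 distance is 0.5, TVD is 0.25, accuracy 75%.
partial = tvd([0.5, 0.25, 0.25], [0.25, 0.5, 0.25])

print(identical, disjoint, partial)  # 0.0 1.0 0.25
```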