
L1 distance

L1 distance is one of several ways to measure the distance between two probability vectors. It is calculated as the sum of the absolute differences between corresponding entries of the two vectors. The smaller the distance between the probability vectors observed in the original and the synthetic data, the higher the accuracy of the synthetic data.
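Written out as a formula, the definition looks as follows. The symbols p, q, and k are notation introduced here for illustration (two probability vectors over the same k categories); they do not appear in the original text.

```latex
d_{L1}(p, q) = \sum_{i=1}^{k} \lvert p_i - q_i \rvert

% Worked example with p = (0.5, 0.3, 0.2) and q = (0.4, 0.4, 0.2):
d_{L1}(p, q) = \lvert 0.5 - 0.4 \rvert + \lvert 0.3 - 0.4 \rvert + \lvert 0.2 - 0.2 \rvert = 0.2
```

Because both vectors sum to 1, the L1 distance ranges from 0 (identical distributions) to 2 (distributions with no category in common).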

L1 distance can be used to compare the empirical probability distributions of features in the original and synthetic datasets. The smaller the L1 distance between a feature's distribution in the two datasets, the more closely the synthetic data reproduces that feature, and the more accurate the synthetic data is.
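As a minimal sketch of this comparison for a single categorical feature, the snippet below builds the empirical distribution of the feature in each dataset and sums the absolute differences. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def empirical_distribution(values, categories):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    # Counter returns 0 for categories that never occur in `values`.
    return [counts[c] / len(values) for c in categories]


def l1_distance(original, synthetic):
    """Sum of absolute differences between two empirical distributions."""
    # Use the union of categories so both vectors have the same length.
    categories = sorted(set(original) | set(synthetic))
    p = empirical_distribution(original, categories)
    q = empirical_distribution(synthetic, categories)
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Toy data for one feature (hypothetical values).
original = ["a", "a", "b", "c"]
synthetic = ["a", "b", "b", "c"]
print(l1_distance(original, synthetic))
```

Here the original shares are (0.5, 0.25, 0.25) and the synthetic shares are (0.25, 0.5, 0.25), so the distance is 0.25 + 0.25 + 0 = 0.5.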

The accuracy of synthetic data can be assessed by measuring statistical distances between the synthetic and the original data. The metric of choice for the statistical distance is the total variation distance (TVD), which for discrete distributions equals half the L1 distance and is calculated on the discretized empirical distributions. Subtracting the TVD from 100% then yields the reported accuracy measure. This is calculated for all univariate and all bivariate distributions.
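The calculation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes every column has already been discretized into labels, the names `tvd` and `accuracy_report` are hypothetical, and the result is left as one score per distribution because the text does not say how scores are combined.

```python
from collections import Counter
from itertools import combinations


def tvd(original, synthetic):
    """Total variation distance: half the L1 distance between shares."""
    p, q = Counter(original), Counter(synthetic)
    n_p, n_q = len(original), len(synthetic)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in keys)


def accuracy_report(original, synthetic):
    """Accuracy (1 - TVD) for every univariate and bivariate distribution.

    Both arguments map a column name to a list of discretized values
    (an assumed input format for this sketch).
    """
    columns = sorted(original)
    report = {}
    for col in columns:
        report[(col,)] = 1.0 - tvd(original[col], synthetic[col])
    for a, b in combinations(columns, 2):
        # A bivariate distribution is the distribution of value pairs.
        orig_pairs = list(zip(original[a], original[b]))
        syn_pairs = list(zip(synthetic[a], synthetic[b]))
        report[(a, b)] = 1.0 - tvd(orig_pairs, syn_pairs)
    return report


# Toy, already-discretized data (hypothetical values).
original = {
    "age_bin": ["young", "young", "old", "old"],
    "owns_car": ["no", "yes", "yes", "yes"],
}
synthetic = {
    "age_bin": ["young", "old", "old", "old"],
    "owns_car": ["no", "yes", "yes", "yes"],
}
report = accuracy_report(original, synthetic)
for key, value in report.items():
    print(key, f"{value:.0%}")
```

In this toy example the `owns_car` column matches exactly (100% accuracy), while `age_bin` and the `age_bin`/`owns_car` pair each have a TVD of 0.25, giving 75% accuracy.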
