The pitfall of classic anonymization techniques is that they mask or obfuscate only parts of the data while leaving everything else intact. But in the era of big data, there is no such thing as a non-sensitive attribute, and any information left intact gives adversaries a foothold for de-anonymization attacks.
Synthesizing data, on the other hand, is a fundamentally different approach to big data anonymization. Instead of modifying an existing dataset, a deep neural network automatically learns the structures and patterns in the actual data. Once training is complete, the model uses this learned knowledge to generate new synthetic data from scratch. This artificially generated data is highly representative, yet completely anonymous: because it contains no one-to-one relationships to actual data subjects, the risk of re-identification is effectively eliminated.
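The train-then-sample idea can be illustrated with a deliberately simple stand-in for the neural network: fitting per-column Gaussians to a tiny numeric dataset and drawing fresh rows from the fitted model. The dataset, column meanings, and model choice here are invented for illustration; a real synthetic-data pipeline would train a deep generative model on the full joint distribution rather than independent marginals.

```python
import random
import statistics

# Toy "real" dataset: (age, income) pairs -- invented for illustration.
real_rows = [(34, 52000), (45, 61000), (29, 48000), (52, 75000), (38, 56000)]

# "Training": learn a simple per-column model (mean and standard deviation).
# A deep generative model would instead capture correlations between columns.
columns = list(zip(*real_rows))
model = [(statistics.mean(col), statistics.stdev(col)) for col in columns]

# "Generation": sample brand-new rows from the learned model.
# No synthetic row maps back to any single real record.
random.seed(0)
synthetic_rows = [
    tuple(random.gauss(mu, sigma) for mu, sigma in model)
    for _ in range(3)
]

for row in synthetic_rows:
    print(tuple(round(v, 1) for v in row))
```

The key property illustrated is that generation starts from the model's learned parameters, not from any individual record, so a synthetic row cannot be traced back to a specific data subject.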