- Data anonymization is increasingly hard to get right, and successfully anonymizing complex datasets is practically impossible. As few as 15 characteristics are enough to reidentify people in the US.
- Preventing this data from leaking is also impossible if companies defend only against attacks from the outside: 59% of privacy incidents originate with an organization's own employees.
- According to a financial data risk report, new hires at financial institutions, the organizations with the highest level of data restriction, have unrestricted access to 11 million files on their first day of work.
- Legacy data anonymization techniques, such as randomization, permutation, and generalization, destroy feature correlations and render the data useless for training machine learning models.
- Pseudonymization, or data masking, is not data anonymization from either a legal or a practical standpoint. Pseudonymized data is still personal data and remains highly vulnerable to privacy attacks.
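
The claim that permutation destroys feature correlations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy age and income columns are synthetic, and the helper names `pearson`, `make_toy_data`, and `permute_column` are hypothetical rather than taken from any anonymization library. The sketch shuffles one column independently of the other and compares the Pearson correlation before and after.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_toy_data(n, rng):
    """Synthetic (assumed) data: income grows with age, plus noise."""
    ages = [rng.uniform(20, 65) for _ in range(n)]
    incomes = [1000 * age + rng.gauss(0, 5000) for age in ages]
    return ages, incomes


def permute_column(values, rng):
    """Permutation-style anonymization: shuffle one column on its own."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


rng = random.Random(0)  # fixed seed so the run is repeatable
ages, incomes = make_toy_data(2000, rng)

r_before = pearson(ages, incomes)
r_after = pearson(ages, permute_column(incomes, rng))

print(f"correlation before permutation: {r_before:.3f}")
print(f"correlation after permutation:  {r_after:.3f}")
```

In this toy setup the shuffled column keeps exactly the same values, so its marginal distribution is unchanged, but the link between a person's age and income is gone. A model trained on the permuted table would have nothing left to learn about that relationship.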