Home insurance pricing was a risky business for our client. The insurance company insured homes across the United States, in areas with vastly different climates and risk profiles. The CCPA prohibited the data science team from using customers’ personal data, such as their addresses, in modeling, so the team could not assess that risk or reflect it in pricing.
Using synthetic home addresses eliminated the risk of re-identification and unlocked new insights. The team established a synthetization framework, tailored to modeling and based on privacy-risk classification, and shortened time-to-data from 6 months to 3 days. The process kept 100% of the data’s utility, perfectly retaining the statistical dispersion of the original and providing an as-good-as-real alternative for training.