Health insurance companies have long been on the front lines of data-driven decision making. From advanced analytics to AI and machine learning applications, data is everywhere in insurance. Increasing the accessibility of these valuable datasets is a business-critical mission that is ripe for a synthetic data revolution. The Humana synthetic data sandbox is a prime example of how the development of data-centric products, such as those using AI and machine learning, can be accelerated.

Humana, the third largest health insurance provider in the U.S., published a synthetic data exchange platform. The aim is to unlock new insights and bring advanced products to market. The data exchange offers access to 1,500,000 synthetic patient records that are representative of Humana’s member population.

The data challenge in health insurance

Data sharing in a highly regulated and sensitive environment is a hard, slow and often painful process. Legal and regulatory pressures make it impossible to collaborate with external vendors efficiently. What’s more, health insurance providers want to be good stewards of the sensitive data their patients trust them with. The Humana synthetic data exchange allows product developers to test and learn faster and to deliver better value to Humana’s members, all while keeping members’ personal healthcare information safe.

Healthcare data platforms not only benefit health insurance companies but are also used to accelerate research and policymaking across the world.

Our favorite synthetic data innovation hub

To overcome these challenges, Humana set up a synthetic data sandbox. Its granular, high-quality synthetic datasets preserve the relationships between different variables of interest as well as the important context in which patient care takes place. Developers and data scientists can see where the care journey has taken a synthetic individual and how that individual interacted with different sites of care. The data keeps its specificity, yet no real person is identifiable.

A synthetic data dashboard gives instant access to the data. Sample datasets can be downloaded for schema and data quality exploration, and a comprehensive data dictionary serves as documentation. The synthetic datasets cover demographics and coverage details, medical and pharmacy claims, dates, diagnoses and sites of care, with correlations and relationships maintained throughout. This is exactly how a synthetic data sandbox should be.
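To make "schema and data quality exploration" concrete, here is a minimal Python sketch of a first pass over a downloaded sample file. It is only an illustration: the inline CSV, the column names and the `profile` helper are all hypothetical and are not taken from Humana's data dictionary or sample files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample file contents; the column names are illustrative only
# and are not taken from Humana's data dictionary.
SAMPLE_CSV = """member_id,age,claim_type,diagnosis_code,site_of_care
M001,67,medical,E11.9,outpatient
M002,54,pharmacy,,retail_pharmacy
M003,,medical,I10,inpatient
M004,71,medical,E11.9,outpatient
"""


def profile(csv_text):
    """Return the row count and, per column, missing and distinct value counts."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v != ""]  # empty string = missing
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return len(rows), report


row_count, report = profile(SAMPLE_CSV)
print(f"rows: {row_count}")
for col, stats in report.items():
    print(col, stats)
```

With a real sample file, the inline string would be replaced by the file's contents, and the resulting report could be compared against the data dictionary to confirm that columns and value ranges match the documentation.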

The advantages of a synthetic data sandbox

By offering easy and safe access to high-quality synthetic data, Humana gives developers the most important ingredient for successful product development. This granular, as-good-as-real source of knowledge is invaluable in identifying cohorts and improving customer experience. Proof of value is also much easier to come by: a solution’s benefits and accuracy, especially for machine learning applications, only become apparent when the data is highly realistic.

New tools developed with realistic synthetic data assets allow the insurance provider to assess where and how to use them along the care journey. The Humana synthetic data sandbox allows product developers to work in a production-like data environment without the security risks or lengthy access processes.

The Humana synthetic data sandbox provides realistic, granular data samples

Synthetic healthcare data is on the rise in Europe too

On the other side of the Atlantic, the European Union has published a research paper in which the authors generated synthetic versions of 2 million cancer patients' records. According to their assessment, the accuracy of the resulting synthetic data (98.8%) makes it suitable for collaborative research projects across institutions and countries.

The synthetic patient data was also rebalanced during the synthesis process so that it represents minority groups better. This is crucial for training machine learning models, which might otherwise fail to pick up on rare cancer types. The EU's Joint Research Centre expects that synthetic data will revolutionize medical AI by eliminating the data hurdle.
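The idea of rebalancing can be illustrated with a small Python sketch. The article does not describe how the rebalancing was actually done, so this example swaps in plain random oversampling, which simply duplicates existing minority records. A synthetic data generator would instead sample new records for the underrepresented group. The `oversample` helper and the toy `cancer_type` records are hypothetical.

```python
import random
from collections import Counter


def oversample(records, label_key, seed=0):
    """Duplicate randomly chosen records until every class matches the largest one.

    This is naive random oversampling, used here only as a stand-in to show
    what a rebalanced dataset looks like.
    """
    rng = random.Random(seed)
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up smaller classes with random duplicates of their own records.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical toy records: 9 common cases and 1 rare case.
records = [{"cancer_type": "common"} for _ in range(9)] + [{"cancer_type": "rare"}]
balanced = oversample(records, "cancer_type")
counts = Counter(rec["cancer_type"] for rec in balanced)
print(counts)
```

After rebalancing, both classes contribute the same number of training examples, so a model is less likely to ignore the rare class.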

The Humana synthetic data blueprint for healthcare data management

AI-generated synthetic data is the tool enabling a wide variety of data-driven use cases in healthcare and health insurance - fighting cancer is only one of them. Humana’s Data Exchange offers real hope for the acceleration of health innovation. Without data, nothing is possible.

Humana is well ahead of others in its synthetic data journey and is already exploring ways to use synthetic data for ethical AI. Undoubtedly, this is one of the most exciting use cases of synthetic data, bringing fairness, privacy and explainability to AI and machine learning models. Check out the Fair synthetic data and ethical AI in healthcare podcast episode, where Laura Mariano, Lead Ethical AI Data Scientist, and Brent Sundheimer, Principal AI Architect at Humana, explain how fair synthetic data helps them create fair predictions!