
Predictive analytics in healthcare benefits from synthetic data generation

John Sullivan Customer Success Lead MOSTLY AI


in prediction accuracy


costs saved


synthetic data accuracy

Challenges of predictive analytics in healthcare

Prediction in healthcare must come with laser-sharp accuracy. CAR-T therapy is an immunotherapy capable of treating certain types of blood cancer. Although the therapy can be very effective, achieving long-lasting remission or even a cure, it can also come with serious neurological side effects. Accurately predicting which patients would benefit from CAR-T therapy is therefore extremely important: even a small increase in true positives could mean lives saved. Since only a small percentage of people can benefit from the therapy, the machine learning models used to predict who will utilize it have only limited data to learn from.

The synthetic data solution for predictive analytics

Synthetic data generation was used to upsample the minority class in the training data for the machine learning model, so that data subjects were split 50-50 between two groups: those who could utilize the therapy and those who couldn't. Upsampling minority classes with synthetic data can be a highly effective way to improve the performance of machine learning models, particularly when the data is highly imbalanced. With additional data points for the minority class, the model has more examples to learn from and can become more accurate in its predictions, which can ultimately lead to better decision-making and outcomes.
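To make the rebalancing step concrete, here is a minimal Python sketch of upsampling a minority class to a 50-50 split. It is not the generative model used in the case study: as a simple stand-in, it creates new minority records by interpolating between random pairs of existing ones. The function name `upsample_minority` and the toy dataset are assumptions made for illustration only.

```python
import random

def upsample_minority(rows, labels, minority_label, seed=0):
    """Top up the minority class until it matches the size of the rest.

    Stand-in technique (an assumption, not the case study's method):
    each new record is a random interpolation between two existing
    minority records, so it lies on the line segment between them.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    needed = majority_count - len(minority)
    # Nothing to do if already balanced or too few records to interpolate.
    if needed <= 0 or len(minority) < 2:
        return list(rows), list(labels)
    new_rows = []
    for _ in range(needed):
        a, b = rng.sample(minority, 2)   # two distinct minority records
        t = rng.random()                 # interpolation weight in [0, 1)
        new_rows.append([x + t * (y - x) for x, y in zip(a, b)])
    return list(rows) + new_rows, list(labels) + [minority_label] * needed

# Hypothetical toy data: 100 records, only 5 of them in the minority class.
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i < 5 else 0 for i in range(100)]

balanced_rows, balanced_labels = upsample_minority(rows, labels, minority_label=1)
print(balanced_labels.count(1), balanced_labels.count(0))
```

A dedicated synthetic data generator would learn the joint distribution of the minority records instead of interpolating between pairs, but the resulting class balance in the training set is the same idea.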

Proof of results using synthetic training data

The machine learning models trained on synthetic data outperformed the target by 2-3%. Overall accuracy remained high even after rebalancing, at close to 98%. As a result of the improved ML performance, new patients were identified who could benefit from the therapy. Besides the improved health outcomes for these people, an estimated $8M+ in cost savings was achieved. Privacy checks were also passed, so the machine learning model was trained without data privacy risk.
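The short sketch below illustrates why true positives matter alongside overall accuracy when only a small share of patients is eligible. The confusion-matrix counts are invented purely for illustration and are not results from the case study. They show two hypothetical models with nearly identical accuracy but very different recall, which is the share of eligible patients actually found.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    """Share of truly eligible patients that the model identifies."""
    return tp / (tp + fn)

# Hypothetical counts for 1,000 patients, 20 of whom are truly eligible.
before = {"tp": 10, "fp": 5, "tn": 975, "fn": 10}   # trained on imbalanced data
after = {"tp": 16, "fp": 14, "tn": 966, "fn": 4}    # trained on rebalanced data

for name, m in (("before", before), ("after", after)):
    acc = accuracy(m["tp"], m["fp"], m["tn"], m["fn"])
    rec = recall(m["tp"], m["fn"])
    print(f"{name}: accuracy={acc:.3f} recall={rec:.2f}")
```

In this made-up example, accuracy barely moves while recall rises substantially, so reporting both gives a fuller picture than accuracy alone.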