Test data management is a messy business, especially in complex enterprise environments riddled with decades-old components, databases, and systems. Today there are basically two approaches to generating test data: either you use production data, or you make up rule-based mock data.
Running tests with production data may get the job done, but it is certainly not a safe practice. Many companies therefore turn to legacy anonymization techniques or, even worse, simple de-identification. The privacy risks associated with these approaches are well documented today.
Rule-based mock data is no real solution either. Test engineers often don't know the exact data schemas and have only a rough idea of what the data is supposed to look like. On top of that, defining all the rules is a time-consuming and tedious task. Although this approach is safe from a privacy perspective, it has one last major disadvantage: mock data carries little statistical insight and can't be used for anything more sophisticated than simple testing.
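To make the "little statistical insight" point concrete, here is a minimal sketch of what a rule-based mock generator might look like. The field names, value ranges, and the `make_mock_customer` helper are all hypothetical and chosen only for illustration. Each field follows its own hand-written rule, so relationships you would expect in real data, such as a link between age and income, simply aren't there.

```python
import random
import statistics


def make_mock_customer(rng):
    """Rule-based mock record: every field follows its own independent rule.

    The fields and ranges are made up for this illustration.
    """
    return {
        "age": rng.randint(18, 90),
        "income": rng.randint(20_000, 150_000),
        "country": rng.choice(["AT", "DE", "US"]),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is reproducible
rows = [make_mock_customer(rng) for _ in range(5_000)]

ages = [row["age"] for row in rows]
incomes = [row["income"] for row in rows]
r = pearson(ages, incomes)
print(f"age/income correlation in mock data: {r:.3f}")
```

Because the rules draw every field independently, the printed correlation should hover around zero. The records are valid enough for a simple form or schema check, but anything that depends on realistic relationships between columns has nothing to work with.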
But there is an alternative: synthetic test data!