A: Synthetic data is artificially generated information that mimics the statistical properties, patterns, and relationships of real data without containing any actual sensitive information. Think of it as realistic "fake" data that behaves like the real thing.
For example, in the Panenka project, we generated thousands of player names and faces that look and feel authentic, but none of them are real people. The names sound culturally appropriate for their countries of origin, the faces look photorealistic and age naturally over time, yet no actual person's data or likeness was used.
The key difference from simple "dummy data" or "mock data" is sophistication. Dummy data only fills database fields with random values, whereas synthetic data preserves the complex relationships, statistical distributions, and realistic variations of the original, which makes it suitable for training AI models, thorough testing, and meaningful analysis.
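To make the contrast concrete, here is a minimal Python sketch. It is not how the Panenka project generated its data; the height and weight columns, the numbers, and every function name are hypothetical, and the stand-in "real" dataset is itself simulated. The synthetic generator is a deliberately simple one (a fitted normal distribution plus a linear relationship with noise), chosen only to show that synthetic data carries over the relationship between columns while dummy data does not.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit(xs, ys):
    """Learn a tiny model of the data: x is normal, y is linear in x plus noise."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return {
        "mean_x": mx,
        "sd_x": statistics.pstdev(xs),
        "slope": slope,
        "intercept": intercept,
        "sd_resid": statistics.pstdev(residuals),
    }


def sample_synthetic(model, n, rng):
    """Draw new rows from the fitted model; no original row is copied."""
    rows = []
    for _ in range(n):
        x = rng.gauss(model["mean_x"], model["sd_x"])
        y = rng.gauss(model["intercept"] + model["slope"] * x, model["sd_resid"])
        rows.append((x, y))
    return rows


def sample_dummy(xs, ys, n, rng):
    """Fill each column independently with random values in the observed range."""
    return [(rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
            for _ in range(n)]


rng = random.Random(42)

# Stand-in for a real dataset (hypothetical): height in cm, weight in kg.
heights = [rng.gauss(180, 7) for _ in range(2000)]
weights = [0.9 * h - 85 + rng.gauss(0, 4) for h in heights]

model = fit(heights, weights)
synthetic = sample_synthetic(model, 2000, rng)
dummy = sample_dummy(heights, weights, 2000, rng)

real_r = pearson(heights, weights)
synth_r = pearson([r[0] for r in synthetic], [r[1] for r in synthetic])
dummy_r = pearson([r[0] for r in dummy], [r[1] for r in dummy])

print(f"real correlation:      {real_r:.2f}")
print(f"synthetic correlation: {synth_r:.2f}")
print(f"dummy correlation:     {dummy_r:.2f}")
```

Running it should show the synthetic rows reproducing roughly the same height-weight correlation as the source data, while the dummy rows show almost none, even though both contain plausible-looking individual values. Real generators for names or faces are far more elaborate, but the principle of learning the structure and then sampling from it is the same.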