September 29, 2025

Data for Everyone: A Manifesto

For too long, data has spoken a language understood only by a few. True democratization means breaking these barriers. Data must be accessible to everyone. It must be safe, fair, and intelligent. Only then can we explore, question, and learn from it.

Democratization requires more than opening the gates. Without usability, access is incomplete. Without truth, access is misleading. Without fairness, access is unjust. That is why providing data alone is not enough.

Data must also be:

True: reflecting the complexity of the real world.
Private and safe: protecting individuals while unlocking collective value.
Fair: representing everyone and ensuring opportunity for all.
Intelligent: enabling natural interaction without expertise.

This is Data for Everyone. It is a vision not of exclusivity, but of empowerment; not of limits, but of possibilities.

To make Data for Everyone concrete, five ingredients are essential: Actual Data, Synthetic Data, Simulated Data, Mock Data, and the AI Assistant. Some data already describes the world safely, some must be transformed, some imagined, some enriched. Together, they make knowledge accessible to all.

Actual Data: Learning from the World as It Is

Every dataset begins with reality. The transactions we make, the journeys we take, the records of health, climate, and society. Actual Data is the raw memory of the world as it truly happened.

Not all of it is sensitive. Some data can be shared openly without risk, like weather and climate records, traffic flows, or economic indicators. These descriptions capture the world without revealing private information, and they are already valuable for science, policy, and innovation.

Yet too often, even this openly shareable data remains locked away in silos, scattered across institutions, or hidden behind complex systems. Those who could benefit most are left unable to access or comprehend it.

To achieve Data for Everyone, Actual Data must be made accessible and understandable. It must protect individuals while serving the collective, and it must be presented in ways people can use directly, not only as input for synthetic data, but as knowledge in its own right.

When data cannot be shared safely, Synthetic Data carries the insight forward without the risk.

Synthetic Data: Making the Invisible Visible

Most of the world’s data remains hidden, locked in silos due to privacy and business concerns. What is visible today - public datasets, open benchmarks, and internal records - is only a fraction of what exists. Without a safe mechanism to share, knowledge and innovation are lost.

The pandemic showed the cost of this. Nations collected vast amounts of health data, yet rarely shared it. Privacy laws and national interests turned insights into silos, slowing collaboration and costing lives. Where safe sharing was possible, as in the UK’s RECOVERY trial, thousands of patients were saved.

Synthetic Data changes this. By recreating the statistical truths of private datasets without exposing individuals, it turns the unreachable into the usable. Banks can innovate without revealing customer secrets. Hospitals can share insights without breaching trust. Researchers can test bold ideas without ethical risk.
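To make "recreating the statistical truths" tangible, here is a minimal sketch in Python. It assumes the simplest possible generator, a two-column Gaussian fit, standing in for the far richer generative models the manifesto has in mind. The records, the column meanings (age, monthly spend), and the function names are all invented for illustration, and a fit like this carries no formal privacy guarantee on its own.

```python
import math
import random
import statistics

def fit_summary(rows):
    """Reduce two numeric columns to means, standard deviations, and correlation."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance (n - 1 denominator, matching statistics.stdev).
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mx, my), "std": (sx, sy), "rho": cov / (sx * sy)}

def sample_synthetic(summary, n, seed=0):
    """Draw n new (x, y) pairs from a bivariate normal with the fitted summary."""
    rng = random.Random(seed)
    (mx, my), (sx, sy), rho = summary["mean"], summary["std"], summary["rho"]
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Mixing z1 into y is what reproduces the fitted correlation.
        rows.append((mx + sx * z1, my + sy * (rho * z1 + residual * z2)))
    return rows

# Hypothetical "private" records: (age, monthly_spend). Invented for illustration.
private_rows = [
    (23, 410.0), (35, 820.0), (41, 760.0), (52, 1190.0),
    (29, 530.0), (61, 1340.0), (47, 990.0), (38, 700.0),
]

summary = fit_summary(private_rows)
synthetic_rows = sample_synthetic(summary, n=2000, seed=42)
```

The synthetic rows follow the same averages and the same relationship between the two columns, yet none of them is a copy of an original record. Real systems replace the Gaussian with a learned generative model and add explicit privacy safeguards.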

To be truly trustworthy, the models that generate synthetic data must also be open. Open source, and open source without restrictions. Models must be transparent, reproducible, and verifiable by the community. Anything less risks keeping power in the hands of a few. Only fully open foundations allow society to trust the data they generate and to progress beyond the status quo.

Synthetic Data brings hidden truths into the open while protecting individuals and sensitive information. And once generative intelligence has learned the structure of data, replication is only the beginning. It can generate alternative versions of the world, rooted in what we know, to explore what could be.

Simulated Data: Exploring Alternative Realities

Synthetic Data reveals what is hidden, but generative intelligence can go further. It can create alternative versions of the world, consistent with what we know yet never observed. This is not prediction. It is the construction of plausible scenarios rooted in truth, allowing us to ask what if without risk.

Take the marathon. From thousands of past races, we have data on splits, weather, endurance, and fatigue. Now someone is running her first. As her data is captured in real time, a generative model can simulate hundreds of outcomes, including different paces, strategies, and conditions. It can show how the race might unfold, where fatigue could strike, and which pacing strategy is most likely to bring her to her goal. She is not just running. She is experiencing a spectrum of possible futures, guided by the knowledge of those who came before.
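The marathon example can be sketched as a small Monte Carlo simulation. This is a toy under stated assumptions, not a model learned from race data: the fatigue rule (pace slows after kilometre 30, more sharply the further the runner started below her comfortable pace), every constant, and the function names are hypothetical.

```python
import random
import statistics

SEGMENTS_KM = [1.0] * 42 + [0.195]  # 42.195 km in total

def simulate_finish_minutes(start_pace, comfortable_pace, rng):
    """Simulate one race and return the finish time in minutes.

    Paces are in minutes per km. Assumed toy rule: starting faster than the
    comfortable pace raises the fatigue rate that applies from km 30 onward.
    """
    overreach = max(0.0, comfortable_pace - start_pace)
    fatigue_rate = abs(rng.gauss(0.02 + 1.5 * overreach ** 2, 0.01))
    total = 0.0
    for km, length in enumerate(SEGMENTS_KM):
        pace = start_pace + rng.gauss(0, 0.05)  # small per-km variation
        if km >= 30:
            pace += fatigue_rate * (km - 29)    # fatigue grows late in the race
        total += pace * length
    return total

def compare_strategies(start_paces, comfortable_pace, goal_minutes, runs=500, seed=0):
    """Run many simulated races per starting pace and summarise the outcomes."""
    rng = random.Random(seed)
    report = {}
    for start_pace in start_paces:
        times = [simulate_finish_minutes(start_pace, comfortable_pace, rng)
                 for _ in range(runs)]
        report[start_pace] = {
            "median_minutes": statistics.median(times),
            "share_under_goal": sum(t <= goal_minutes for t in times) / runs,
        }
    return report

report = compare_strategies([5.4, 5.7, 6.0], comfortable_pace=5.9, goal_minutes=255)
best = min(report, key=lambda p: report[p]["median_minutes"])
```

Each starting pace yields a distribution of finishes rather than a single forecast, which is the sense in which simulation differs from prediction. In this toy setup the aggressive start pays for its early speed late in the race, so the moderate start tends to come out ahead.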

This is the essence of Simulated Data: foresight without fortune-telling, imagination grounded in truth. Policymakers can test reforms before enacting them. Businesses can stress-test strategies against rare conditions. Scientists can study dynamics too dangerous or unethical to reproduce in reality.

With Simulated Data, we are no longer bound to a single history. We can explore alternatives, learn from them, and prepare for challenges before they arrive. Yet imagination alone is not enough. Data must also be enriched, completed, and adapted. This is the role of Mock Data.

Mock Data: Enriching, Completing, Creating

Mock Data enriches and completes existing datasets. It adds edge cases, rare scenarios, and contextual detail that real or synthetic data may lack. And when no source exists, it can generate lifelike records from scratch, safe and ready for experimentation.

Consider a team designing a new digital health app. Before launch, they must test every flow, including onboarding, billing, and handling unusual patient histories. With Mock Data, they can simulate thousands of journeys without touching a single real medical record. Or take a bank preparing its next mobile platform. Developers can stress-test logins, transactions, and fraud scenarios using safe, LLM-enriched datasets that mimic reality without exposing customers.
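For the health app case, a mock data generator might look like the sketch below. It uses a seeded random generator in place of the LLM enrichment mentioned above, and the record fields, condition names, billing plans, and edge cases are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class MockPatient:
    patient_id: str
    age: int
    conditions: list
    billing_plan: str
    onboarding_completed: bool

# Hypothetical vocabularies; a real app would define its own.
CONDITIONS = ["asthma", "diabetes", "hypertension", "migraine"]
PLANS = ["free", "monthly", "annual"]

def make_patient(rng, index):
    """Build one ordinary-looking record with no link to any real person."""
    return MockPatient(
        patient_id=f"mock-{index:05d}",
        age=rng.randint(18, 90),
        conditions=rng.sample(CONDITIONS, k=rng.randint(0, 2)),
        billing_plan=rng.choice(PLANS),
        onboarding_completed=rng.random() < 0.9,
    )

def edge_cases():
    """Hand-written unusual records that random sampling would rarely produce."""
    return [
        MockPatient("mock-edge-0", 0, [], "free", False),                 # newborn, no history
        MockPatient("mock-edge-1", 120, CONDITIONS * 5, "annual", True),  # very long history
        MockPatient("mock-edge-2", 45, [], "", True),                     # missing billing plan
    ]

def generate(n, seed=0):
    """Return n random records followed by the fixed edge cases."""
    rng = random.Random(seed)
    return [make_patient(rng, i) for i in range(n)] + edge_cases()

patients = generate(1000, seed=7)
```

Seeding the generator keeps test runs reproducible, and keeping the edge cases in their own function makes it easy to add a new unusual history whenever a bug report reveals one.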

Mock Data empowers builders. It turns testing into a creative process, where systems can be pushed to their limits without breaking trust. Innovation never stalls for lack of risk-free data.

Yet even with Actual, Synthetic, Simulated, and Mock Data, one barrier remains: complexity. Data, no matter how rich or safe, can still overwhelm. It takes expertise to navigate, analyze, and interpret. This is where the final ingredient enters: the AI Assistant.

AI Assistant: Making Data Human

Not everyone can code, query, or model. This is why the final ingredient is the AI Assistant: the voice that makes complexity simple, the guide that turns everyone into a data consumer.

The Assistant stands on top of all forms of data, transforming them into insight through natural interaction. With it, the vision of Data for Everyone is not just about access, but about true understanding.

Imagine a city planner facing an upcoming heatwave. She asks the Assistant: “How should we prepare?”

The Assistant does not start from scratch. It begins with Actual Data, the historical record of what has already happened. It then draws on Synthetic Data, which safely mirrors patterns from past heatwaves, such as granular energy consumption, hospital admissions, and demographic vulnerabilities. Next, it creates Simulated Data, alternative versions of the city under different scenarios: a five-day heatwave, a ten-day heatwave, and interventions applied earlier or later. To ensure resilience, it incorporates Mock Data, including rare but plausible events like sudden power outages or surges in emergency calls.

Finally, it translates all of this into clear, actionable guidance: “If cooling centers open two days earlier, hospitalizations could drop by 15%. If power grid reinforcement is delayed, outages become three times more likely. Here are three steps to minimize risk.”

The planner did not need to sift through datasets or build models. She simply asked, and she understood.

This is what the AI Assistant makes possible. It unifies the power of generative intelligence across all forms of data (actual, synthetic, simulated, and mock), executes analyses securely, and delivers insights in human language. For the first time, everyone, not just experts, can engage with data as naturally as having a conversation.

The AI Assistant makes Data for Everyone human.

Conclusion: Data for Everyone

Data for Everyone is a belief that knowledge should not be confined to experts. It is a conviction that innovation should not be limited to those with privileged access. It is a demand that truth should not be restricted to silos.

With Actual Data, we see reality.

With Synthetic Data, we unlock truths.

With Simulated Data, we explore possibilities.

With Mock Data, we extend knowledge.

With the AI Assistant, we transform knowledge into action.

This is Data for Everyone.

This is our manifesto.