
Data: The Public Infrastructure of the Future?
It can be, but how it should be treated depends on the objective being pursued.
A few days ago, at the GovTech4Impact conference in Madrid, I met up with good friends who are even better experts: Dan Abadie, Aura Cifuentes, Mara Balestrini... Listening to them brought me back to a question that has been nagging at me for a long time: can we say that data is public infrastructure?
The answer, as is often the case with big questions, is not a simple yes or no, but an “it depends”.
Starting with the basics, why does this question matter?
Not just out of more or less academic interest. If we say, and decide, that certain data is public infrastructure, we must accept that this classification has consequences:
- Investment and Shared Benefits: Infrastructure (such as roads or power grids) requires large initial investments. In return, it generates benefits that extend to all of society, although whoever builds it cannot always directly monetize all those benefits. If certain data is infrastructure, we may need public intervention to ensure that enough is invested in generating it.
- Reliability and Constant Availability: If we rely on certain data to operate essential services (think of the real-time information on energy supply and demand needed to manage the electricity grid), we must ensure that it is reliable and always available.
- Equitable Access: Because infrastructure is the foundation on which many other services and activities are built, it is essential that it be accessible (though not necessarily open).
Some traits that make data comparable to infrastructure
Defining what “infrastructure” is poses a challenge. Applying the term to the digital world and adding the word “public” complicates matters even more. Mariana Mazzucato, Dave Eaves and Beatriz Vasconcellos have developed these concepts in several articles (see for example here and here), and I recommend them to anyone who wants to delve deeper into the subject.
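Here it is enough to say that certain data must have at least five characteristics to work as infrastructure: non-rivalry, positive externalities, high investment costs, systemic risk, and the need for neutrality and trust.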
Think of the cadastre or the land registry. These databases have all of these traits. For this reason, it is generally the Government that collects the data, manages it and regulates access to this “single source of truth”. Although some parts of the management of these records can be outsourced, ultimate control over the data and its access rules is the Government's responsibility.
Another example: the real-time location of public transport vehicles. Unlike the previous examples, this data emerges as a by-product of providing the transport service, but it can also serve as public infrastructure that benefits citizens, application developers and urban planners.
What about the data collected by companies? It can also be infrastructure
Data managed by the Government is not the only data that can have these characteristics. Some data sets collected by private companies have these traits too.
Take the case of Experian, a credit reporting agency. Experian collects information on more than 1 billion people and companies, data that banks and other institutions then use to make credit decisions. Let's see whether this data also has the five “infrastructural” features:
- Non-rivalry: Multiple banks or lenders can check the same credit report without “using it up”.
- Positive externalities: An accurate credit report reduces risk for financial institutions, can lower the cost of credit, and can speed up rental or hiring processes.
- High investment costs: Building these records at the national level takes decades and requires complex and costly processes.
- Systemic risk: A serious error in Experian's data could paralyze lending and affect the economy.
- Need for neutrality and trust: The algorithms and processes of these companies must be transparent, auditable and non-discriminatory.
Precisely because of this need for neutrality and its “systemic” role, this data is subject to regulation. The Government intervenes through laws (such as the Fair Credit Reporting Act in the US) that impose standards on accuracy, auditability, non-discrimination and access, similar to what happens in other critical and regulated sectors of the economy.
When the essential thing is for data to flow
Sometimes, public value does not reside so much in a specific database (the “asset”), but rather in its fluid circulation between different entities. In these cases, the Government's objective is to facilitate this flow, establishing interoperability rules (so that systems “talk” to each other), standards and, sometimes, obligations to share data through APIs (protocols that allow different systems to communicate).
To what end? To promote some of the benefits of data exchange, such as innovation, or to prevent a few entities from hoarding all the data in a sector and limiting competition. This is the objective of the European Data Act. In other cases, the circulation of such data serves public objectives such as better transport management. The new EU regulation on real-time traffic and safety data seeks precisely to guarantee this flow in order to improve the safety and sustainability of transport.
For data to flow, the Government can directly manage the technological layer (the “pipes” through which the data circulates), as with the well-known Estonian X-Road platform or other data exchange mechanisms (data exchanges). However, it is not always necessary for the Government to own these “pipes”. Using Dave Eaves's stack analogy, we could imagine the data itself (data as an asset) at the base, and the flow of that data as an essential intermediate layer for connecting systems and generating value.

“Data Infrastructure” Doesn't Mean “Open Data”
An important clarification: calling some data “public infrastructure” does not automatically imply that it should be freely accessible. We may want to protect certain data because of its critical role, but the decision about who accesses it and under what conditions requires considering other factors: the costs of producing and opening it, the risks to privacy, or the impact on competition and public value.
A data set may be infrastructural and have restricted access (such as power grid data), while other data may not be infrastructural and be completely open (such as aggregated data from shared bicycles in a city).
Different government roles for different objectives
Not all data is the same, so there is no single way to manage the assets or data flows that we consider to be public infrastructures.
The Government can play eight different roles (four objectives crossed with two kinds of lead actor), depending on the objective being pursued. Picture a chart with two axes:
- Vertical Axis - Public Objective: What do we want to achieve? Maintain data accuracy? Facilitate access to the data? Ensure that there are common “pipes” for its circulation? Or keep those pipes working properly?
- Horizontal Axis - Main Actor: Who should take the lead in achieving that goal? The Government directly, or a private actor under a regulatory framework?
The position of a database or a data flow on this “map” helps us think more precisely about how the Government should intervene.
So, is data a public infrastructure or not?
Yes, it can be. The important thing, however, is that defining certain data as “public infrastructure” is only a first diagnosis. The key is to identify the treatment: what type of intervention (direct management or regulatory oversight) is needed. For this reason, it is necessary to be clear about the objective and about where the possible “traffic jam” lies: in the availability and quality of the data set itself, or in the “pipes” that allow its circulation and use?
Guest author: Fernando Fernandez-Monge
Senior Associate | Bloomberg-Harvard City Leadership Initiative
This article was originally published in English in Datapolis.