Data: The Public Infrastructure of the Future?

By Fernando Fernandez-Monge | Guest Author*

Publication date: 9/6/25

It can be, but how it should be treated depends on the objective being pursued.

A few days ago, at the GovTech4Impact conference in Madrid, I met up with good friends who are even better experts: Dan Abadie, Aura Cifuentes, Mara Balestrini... Listening to them brought me back to a question that has been nagging at me for a long time: can we say that data is public infrastructure?

The answer, as is often the case with big questions, is not a simple yes or no, but an “it depends”.

Starting with the basics, why does this question matter?

Not just out of a more or less academic interest. If we say, and decide, that certain data is public infrastructure, we must accept that this classification has consequences:

  1. Investment and Shared Benefits: Infrastructure (such as roads or power grids) requires large initial investments. In return, it generates benefits that extend to all of society, although whoever builds it cannot always directly monetize all those benefits. If certain data is infrastructure, we may need public intervention to ensure that enough is invested in generating it.
  2. Reliability and Constant Availability: If we rely on certain data to run essential services (think of the real-time information on energy supply and demand needed to manage the electricity grid), we must ensure that it is reliable and always available.
  3. Equitable Access: As infrastructures are the foundation on which many other services and activities are built, it is essential that they be accessible (not necessarily open).

Some traits that make data comparable to infrastructure

Defining what “infrastructure” means is a challenge. Applying the term to the digital world and adding the word “public” complicates things even more. Mariana Mazzucato, Dave Eaves and Beatriz Vasconcellos have developed these concepts in several articles (see for example here and here), and I recommend them to anyone who wants to delve deeper into the subject.

Here it is enough to say that certain data needs at least five characteristics to work as infrastructure:

Characteristic | What does it imply?
Non-rivalry | Use by one person does not prevent others from using it at the same time (the same data serves many).
Positive externalities | It generates indirect benefits for many actors and for society in general.
High initial investment costs | Creating and maintaining it requires considerable economic and technical effort.
Systemic risks | If it fails or is incorrect, it can cause serious chain-reaction problems.
Need for neutrality and trust | It must be managed impartially and inspire trust in its accuracy.

Think of the cadastre or the property registry. These databases have all of these traits. For this reason, it is generally the Government that collects the data, manages it and regulates access to this “single source of truth”. Although some parts of managing these records can be outsourced, ultimate control over the data and its access rules remains the Government's responsibility.

Another example: the real-time location of public transport vehicles. Unlike the previous examples, this data emerges as a by-product of providing the transport service, but it can also serve as public infrastructure that benefits citizens, application developers and urban planners.

What about the data collected by companies? It can also be infrastructure

Government-managed data is not the only data that can have these characteristics. Some data sets collected by private companies have them too.

Take the case of Experian, a credit reporting agency. Experian collects information on more than 1 billion people and companies, data that banks and other institutions then use to make credit decisions. Let's see whether this data also has the five “infrastructural” features:

  • Non-rivalry: Multiple banks or lenders can check the same credit report without “spending” it.
  • Positive externalities: An accurate credit report reduces risk for financial institutions, can lower the cost of credit and can speed up rental or hiring processes.
  • High investment costs: Building these records at the national level takes decades and requires complex and costly processes.
  • Systemic risk: A serious error in Experian's data could paralyze lending and affect the economy.
  • Need for neutrality and trust: The algorithms and processes of these companies must be transparent, auditable and non-discriminatory.

Precisely because of this need for neutrality and its “systemic” role, this data is subject to regulation. The Government intervenes through laws (such as the Fair Credit Reporting Act in the US) that impose standards on accuracy, auditability, non-discrimination and access, similar to what happens in other critical and regulated sectors of the economy.

When the essential thing is for data to flow

Sometimes, public value resides not so much in a specific database (the “asset”) as in the fluid circulation of data between different entities. In these cases, the Government's objective is to facilitate this flow by establishing interoperability rules (so that systems “talk” to each other), standards and, sometimes, obligations to share data through APIs (interfaces that allow different systems to communicate).

To what end? To promote some of the benefits of data exchange, such as innovation, or to prevent a few entities from hoarding all the data in a sector and limiting competition. This is the objective of the European Data Act. In other cases, the circulation of data serves public objectives such as better transport management. The new EU regulation on real-time traffic and safety data seeks precisely to guarantee this flow in order to improve the safety and sustainability of transport.

For data to flow, the Government can directly manage the technological layer (the “pipes” through which the data circulates), as with Estonia's well-known X-Road platform or other data exchange mechanisms (data exchanges). However, it is not always necessary for the Government to own these “pipes”. Borrowing Dave Eaves's stack analogy, we can picture the data itself (data as an asset) sitting in the database, and the flow of that data as an essential intermediate layer for connecting it and generating value.

“Data Infrastructure” Doesn't Mean “Open Data”

An important clarification: calling some data “public infrastructure” does not automatically imply that it should be freely accessible. We may want to protect certain data because of its critical role, but the decision about who accesses it and under what conditions requires considering other factors: the costs of producing and opening it, the risks to privacy, or the impact on competition and public value.

A data set may be infrastructural and have restricted access (such as power grid data), while other data may not be infrastructural and be completely open (such as aggregated data from shared bicycles in a city).

Different government roles for different objectives

Not all data is the same, so there is no single way to manage the assets or data flows that we consider to be public infrastructures.

The Government can play eight different roles, depending on the objective being pursued. Let's imagine a grid with two axes:

  • Vertical Axis - Public Objective: What do we want to achieve? Maintain data accuracy? Facilitate access to the data? Ensure that there are common “pipes” for its circulation? Or keep those pipes working properly?
  • Horizontal Axis - Main Actor: Who should lead in achieving that goal? The Government directly, or a private actor under a regulatory framework?

What is the public objective? | Main actor: public sector (the Government owns and/or manages the asset or the technological layer) | Main actor: private sector (the Government regulates or contracts with the owner or operator)
Accuracy of the asset (the data) | Establish and manage an official register. Example: The cadastre and the property registry keep parcel boundaries, cadastral values, property titles and security interests up to date. | Set data quality audit standards. Example: Credit agency data is audited under consumer credit laws to verify error rates and non-discrimination.
Access to the asset (the data) | Publish under an open data license and with a free API. Example: The Hamburg Data Platform publishes open-access microdata. | Impose bulk access obligations or FRAND conditions. Example: EU regulations that require traffic and safety data to be shared under FRAND conditions.
Guaranteeing the flow of data without directly managing the “pipes” | Invest in developing and managing standards and APIs. Example: Government promotion and recognition of the NeTEx standards to ensure the exchange of public transport data through National Access Points. | Set interoperability standards and an open API mandate. Example: Development of the MDS mobility standards and data-sharing mandates, with public and private participation through the Open Mobility Foundation.
Direct management of the “pipes” to guarantee the flow of data | Manage a public infrastructure. Example: Estonia's X-Road platform guarantees availability in all data exchanges between state bodies. | Designate as an “essential service” and require redundancy plans. Example: The SWIFT network, which is private but overseen by central banks and cybersecurity regulators.

The position of a database or a data flow on this “map” helps us think more precisely about how the Government should intervene.
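
For readers who think in code, the map can be sketched as a simple lookup keyed by public objective and main actor. This is only an illustration: the names `Objective`, `Actor`, `ROLES` and `government_role` are hypothetical, chosen for this sketch, and the role descriptions paraphrase the cells of the table above.

```python
from enum import Enum


class Objective(Enum):
    """Vertical axis: what the public objective is (hypothetical labels)."""
    ACCURACY = "accuracy of the asset"
    ACCESS = "access to the asset"
    FLOW_WITHOUT_PIPES = "guarantee the flow without managing the pipes"
    MANAGE_PIPES = "directly manage the pipes"


class Actor(Enum):
    """Horizontal axis: who leads."""
    PUBLIC = "public sector"
    PRIVATE = "private sector"


# One entry per cell of the map: 4 objectives x 2 actors = 8 roles.
ROLES = {
    (Objective.ACCURACY, Actor.PUBLIC): "Establish and manage an official register",
    (Objective.ACCURACY, Actor.PRIVATE): "Set data quality audit standards",
    (Objective.ACCESS, Actor.PUBLIC): "Publish under an open data license with a free API",
    (Objective.ACCESS, Actor.PRIVATE): "Impose bulk access obligations or FRAND conditions",
    (Objective.FLOW_WITHOUT_PIPES, Actor.PUBLIC): "Invest in developing and managing standards and APIs",
    (Objective.FLOW_WITHOUT_PIPES, Actor.PRIVATE): "Set interoperability standards and an open API mandate",
    (Objective.MANAGE_PIPES, Actor.PUBLIC): "Manage a public infrastructure",
    (Objective.MANAGE_PIPES, Actor.PRIVATE): "Designate as an essential service and require redundancy plans",
}


def government_role(objective: Objective, actor: Actor) -> str:
    """Return the Government's role for a given position on the map."""
    return ROLES[(objective, actor)]


# Example: credit bureau data, where accuracy matters and a private actor leads.
print(government_role(Objective.ACCURACY, Actor.PRIVATE))
```

The point of the lookup is that the intervention follows from two separate questions, the objective and the main actor, rather than from the “infrastructure” label alone.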

So, is data a public infrastructure or not?

Yes, it can be. The important thing, however, is that defining certain data as “public infrastructure” is only a first diagnosis. The key is to identify the treatment: what type of intervention (direct management or regulatory oversight) is needed. For this reason, it is necessary to be clear about the objective and about where the possible “bottleneck” lies: in the availability and quality of the data set itself, or in the “pipes” that allow its circulation and use?

*Guest author: Fernando Fernandez-Monge

Senior Associate | Bloomberg-Harvard City Leadership Initiative

This article was originally published in English in Datapolis. Read the article here.