Introduction


  • Interoperability ensures that data can be understood, combined, accessed, and reused across tools, institutions, and workflows with minimal manual intervention.

  • Interoperability operates at three complementary layers: structural (how data is encoded and organized), semantic (how data is described and interpreted), and technical (how data is accessed and exchanged).

  • The FAIR interoperability principles I1–I3 primarily address the semantic layer. They provide essential guidance on shared metadata languages, vocabularies, and references, but they do not fully cover structural and technical interoperability.

  • In climate and atmospheric science, all three layers are required for practical reuse. Structural standards (e.g., NetCDF, Zarr), semantic conventions (e.g., CF), and technical mechanisms (e.g., APIs, OPeNDAP, THREDDS) must work together.

  • Many real-world barriers to dataset reuse (unclear metadata, missing units, inconsistent coordinate systems, incompatible file formats, unstable access mechanisms) are failures of one or more interoperability layers.

  • Interoperable research workflows rely on established community formats, standardized metadata conventions, stable access protocols, and scalable cloud-native layouts that allow large heterogeneous datasets to be aligned, streamed, and analysed consistently.

  • Interoperability is essential in climate science because datasets come from diverse sources (models, satellites, sensors, reanalysis) and must be combined into integrated analyses that are reproducible and machine-actionable.

Structural interoperability


  • Structural interoperability concerns how data are organized, not what they mean.
  • Open standards are essential for machine-actionability and long-term reuse.
  • Structural interoperability is enforced by data models, not file extensions.
  • Standards maintained by communities (e.g. Unidata, Pangeo, OGC/WMO) encode shared structural contracts that tools and workflows can reliably depend on.
  • NetCDF exemplifies structural interoperability for multidimensional geoscience data.
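The NetCDF data model can be illustrated without any NetCDF library at all. The sketch below uses plain Python structures to show the structural contract the bullets describe: named dimensions, typed variables that reference those dimensions, and attributes. All names and values are illustrative, not a real dataset.

```python
# Illustrative sketch of the NetCDF "classic" data model: structure is an
# explicit, checkable contract (dimensions, typed variables, attributes),
# independent of the file extension. All names here are made up.

dataset = {
    "dimensions": {"time": 3, "lat": 2, "lon": 4},
    "variables": {
        "tas": {
            "dims": ("time", "lat", "lon"),
            "dtype": "float32",
            "attrs": {"long_name": "near-surface air temperature"},
        },
        "time": {"dims": ("time",), "dtype": "float64", "attrs": {}},
    },
    "attrs": {"title": "toy example"},
}

def check_structure(ds):
    """Verify every variable's dimensions are declared in the dataset."""
    declared = set(ds["dimensions"])
    for name, var in ds["variables"].items():
        missing = set(var["dims"]) - declared
        if missing:
            raise ValueError(f"{name} uses undeclared dims: {missing}")
    return True

print(check_structure(dataset))  # True: the structural contract holds
```

Because the structure is explicit, any tool that understands the data model can validate and navigate the file without guessing, which is exactly what distinguishes a data model from a file extension.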

Semantic interoperability


  • Semantic interoperability ensures that data variables have shared, machine-actionable scientific meaning, not just readable structure.

  • Structural interoperability is necessary but insufficient for reliable comparison and reuse across datasets.

  • The CF Conventions provide a community-governed semantic layer on top of NetCDF through standard names, units, and coordinate semantics.

  • CF compliance enables automated discovery, comparison, and integration in climate and atmospheric science workflows.

  • Semantic interoperability depends on community-agreed conventions, not on file formats or variable names alone.
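The difference between a readable structure and machine-actionable meaning can be sketched in a few lines. In the example below, the variable name "tas" is arbitrary; what an automated workflow relies on are the CF attributes. The standard_name and units values are real CF conventions, but the checker itself is an illustrative sketch, not a real CF validator.

```python
# Sketch of CF-style semantic metadata: tools match on standard_name and
# units, not on variable names. The checker is illustrative only.

tas_attrs = {
    "standard_name": "air_temperature",  # from the CF standard name table
    "units": "K",                        # UDUNITS-compatible unit string
    "long_name": "near-surface air temperature",
}

REQUIRED_CF_ATTRS = ("standard_name", "units")

def cf_semantics_present(attrs):
    """Report whether the attributes needed for automated comparison exist."""
    missing = [a for a in REQUIRED_CF_ATTRS if a not in attrs]
    return {"complete": not missing, "missing": missing}

print(cf_semantics_present(tas_attrs))
print(cf_semantics_present({"long_name": "temperature"}))  # structure, no semantics
```

The second call shows the failure mode the bullets warn about: a structurally valid variable that no tool can safely compare or convert, because the shared vocabulary is absent.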

Technical interoperability: Streaming protocols


  • Technical interoperability enables machine-to-machine data exchange through standardized protocols.

  • OPeNDAP implements the DAP protocol for remote access to structured scientific datasets.

  • Remote datasets can be explored without full download.

  • Server-side subsetting reduces bandwidth and supports scalable workflows.

  • Streaming protocols transform data repositories into interoperable computational infrastructure.
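Server-side subsetting in DAP2 works by appending a constraint expression to the dataset URL, using the hyperslab syntax var[start:stride:stop] per dimension. The sketch below builds such a request URL; the server address and variable name are hypothetical placeholders, and the request is only constructed, not sent.

```python
# Sketch of DAP2 server-side subsetting: the client asks for a hyperslab
# (e.g. tas[0:1:9][10:1:20][30:1:40]) and only that subset is streamed.
# The base URL and variable name below are hypothetical.

def dap_subset_url(base_url, var, slices):
    """Build an OPeNDAP data request with a DAP2 constraint expression."""
    hyperslab = "".join(f"[{a}:{step}:{b}]" for a, step, b in slices)
    return f"{base_url}.dods?{var}{hyperslab}"

url = dap_subset_url(
    "https://example.org/thredds/dodsC/reanalysis",  # hypothetical server
    "tas",
    [(0, 1, 9), (10, 1, 20), (30, 1, 40)],  # time, lat, lon index ranges
)
print(url)
```

Because the subsetting happens on the server, the client never transfers the full dataset, which is what makes remote exploration and bandwidth-efficient workflows possible.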

Technical interoperability: APIs


  • APIs operationalize technical interoperability by enabling standardized machine-to-machine interaction.

  • REST APIs use HTTP methods, predictable endpoints, JSON representations, stable identifiers, and authentication mechanisms.

  • APIs depend on structural interoperability (schemas) and semantic interoperability (controlled vocabularies).

  • Command-line tools such as curl provide direct access to API functionality and enable automation.

  • The 4TU.ResearchData API supports full dataset lifecycle management: discovery, creation, metadata update, file upload, and submission for review.
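The anatomy of such a REST call can be sketched with the standard library alone. The snippet below assembles a hypothetical "create dataset" request in the style described above (HTTP method, predictable endpoint, JSON body, token authentication); the base URL, endpoint path, token, and payload fields are all placeholders, not the actual 4TU.ResearchData API, and the request is deliberately built but never sent.

```python
# Sketch of a REST API request for dataset creation. Endpoint, token, and
# payload are hypothetical placeholders; the request is not sent.

import json
import urllib.request

API_BASE = "https://api.example.org/v2"   # placeholder base URL
TOKEN = "YOUR_PERSONAL_TOKEN"             # placeholder credential

payload = json.dumps({"title": "Toy climate dataset"}).encode("utf-8")
req = urllib.request.Request(
    url=f"{API_BASE}/account/articles",   # hypothetical 'create dataset' path
    data=payload,
    method="POST",
    headers={
        "Authorization": f"token {TOKEN}",
        "Content-Type": "application/json",
    },
)

print(req.method, req.full_url)
print(req.get_header("Content-type"))  # urllib stores headers capitalized
```

The same request maps one-to-one onto a curl invocation (method flag, -H headers, -d body), which is why curl is so useful for testing and automating API workflows.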

Cloud-native layouts


  • Cloud-native layouts are optimized for object storage and HTTP access.
  • NetCDF works well on HPC systems but is not optimized for cloud-native environments.
  • Zarr stores data in chunks, enabling efficient parallel access.
  • Kerchunk enables cloud-native access to NetCDF without data duplication.
  • Kerchunk changes the access pattern, not the data itself.
  • Cloud-native layouts affect structural interoperability, while semantic interoperability depends on metadata standards such as CF conventions.
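The efficiency of chunked layouts comes from a simple calculation: a reader maps the requested index ranges onto chunk keys and fetches only those objects over HTTP. The sketch below computes which chunks a slice touches; the "i.j" key naming mirrors Zarr v2 conventions, while the array shape and chunk sizes are illustrative.

```python
# Sketch of chunked (Zarr-style) access: each chunk is an independent
# object in storage, so a reader fetches only the chunks its slice
# overlaps. Key format "i.j" mirrors Zarr v2; sizes are illustrative.

import itertools

def chunks_for_slice(chunks, starts, stops):
    """Return the chunk keys overlapped by half-open index ranges."""
    ranges = [range(a // c, (b - 1) // c + 1)
              for c, a, b in zip(chunks, starts, stops)]
    return [".".join(map(str, idx)) for idx in itertools.product(*ranges)]

# A 100x100 array in 50x50 chunks: reading [0:60, 0:10] touches only the
# two chunks in the first column, not all four chunks of the array.
print(chunks_for_slice((50, 50), (0, 0), (60, 10)))  # ['0.0', '1.0']
```

This is also the idea behind Kerchunk: it publishes a chunk-to-byte-range index for an existing NetCDF file, so the same selective, HTTP-range access pattern works without rewriting the data.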

Interoperable infrastructure in the AI era


  • AI-ready infrastructures require interoperable data layers.
  • Structural, semantic, and technical interoperability jointly enable AI workflows.
  • Cloud-native formats and consistent metadata are essential for scalable AI.
  • APIs, catalogs, identifiers, and versioning ensure reproducibility and automation.
  • AI reliability depends as much on infrastructure design as on model quality.