Introduction


  • Interoperability ensures that data can be understood, combined, accessed, and reused across tools, institutions, and workflows with minimal manual intervention.

  • Interoperability operates at three complementary layers: structural (how data is encoded and organized), semantic (how data is described and interpreted), and technical (how data is accessed and exchanged).

  • The FAIR interoperability principles I1–I3 primarily address the semantic layer. They provide essential guidance on shared metadata languages, vocabularies, and references, but they do not fully cover structural and technical interoperability.

  • In climate and atmospheric science, all three layers are required for practical reuse. Structural standards (e.g., NetCDF, Zarr), semantic conventions (e.g., CF), and technical mechanisms (e.g., APIs, OPeNDAP, THREDDS) must work together.

  • Many real-world barriers to dataset reuse (unclear metadata, missing units, inconsistent coordinate systems, incompatible file formats, unstable access mechanisms) are failures of one or more interoperability layers.

  • Interoperable research workflows rely on established community formats, standardized metadata conventions, stable access protocols, and scalable cloud-native layouts that allow large heterogeneous datasets to be aligned, streamed, and analysed consistently.

  • Interoperability is essential in climate science because datasets come from diverse sources (models, satellites, sensors, reanalysis) and must be combined into integrated analyses that are reproducible and machine-actionable.

Structural interoperability


  • Structural interoperability concerns how data are organized, not what they mean.
  • Open standards are essential for machine-actionability and long-term reuse.
  • Structural interoperability is enforced by data models, not file extensions.
  • Standards maintained by communities (e.g. Unidata, Pangeo, OGC/WMO) encode shared structural contracts that tools and workflows can reliably depend on.
  • NetCDF exemplifies structural interoperability for multidimensional geoscience data.
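The NetCDF data model can be illustrated without any NetCDF library at all. The sketch below uses plain Python structures to show the structural contract the bullets describe: named dimensions, typed variables that reference those dimensions, and attributes. All names and values are illustrative, not a real dataset.

```python
# Illustrative sketch of the NetCDF "classic" data model: structure is an
# explicit, checkable contract (dimensions, typed variables, attributes),
# independent of the file extension. All names here are made up.

dataset = {
    "dimensions": {"time": 3, "lat": 2, "lon": 4},
    "variables": {
        "tas": {
            "dims": ("time", "lat", "lon"),
            "dtype": "float32",
            "attrs": {"long_name": "near-surface air temperature"},
        },
        "time": {"dims": ("time",), "dtype": "float64", "attrs": {}},
    },
    "attrs": {"title": "toy example"},
}

def check_structure(ds):
    """Verify every variable's dimensions are declared in the dataset."""
    declared = set(ds["dimensions"])
    for name, var in ds["variables"].items():
        missing = set(var["dims"]) - declared
        if missing:
            raise ValueError(f"{name} uses undeclared dims: {missing}")
    return True

print(check_structure(dataset))  # True: the structural contract holds
```

Because the structure is explicit, any tool that understands the data model can validate and navigate the file without guessing, which is exactly what distinguishes a data model from a file extension.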

Semantic interoperability


  • Semantic interoperability ensures that data variables have shared, machine-actionable scientific meaning, not just readable structure.

  • Structural interoperability is necessary but insufficient for reliable comparison and reuse across datasets.

  • The CF Conventions provide a community-governed semantic layer on top of NetCDF through standard names, units, and coordinate semantics.

  • CF compliance enables automated discovery, comparison, and integration in climate and atmospheric science workflows.

  • Semantic interoperability depends on community-agreed conventions, not on file formats or variable names alone.
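The difference between a readable structure and machine-actionable meaning can be sketched in a few lines. In the example below, the variable name "tas" is arbitrary; what an automated workflow relies on are the CF attributes. The standard_name and units values are real CF conventions, but the checker itself is an illustrative sketch, not a real CF validator.

```python
# Sketch of CF-style semantic metadata: tools match on standard_name and
# units, not on variable names. The checker is illustrative only.

tas_attrs = {
    "standard_name": "air_temperature",  # from the CF standard name table
    "units": "K",                        # UDUNITS-compatible unit string
    "long_name": "near-surface air temperature",
}

REQUIRED_CF_ATTRS = ("standard_name", "units")

def cf_semantics_present(attrs):
    """Report whether the attributes needed for automated comparison exist."""
    missing = [a for a in REQUIRED_CF_ATTRS if a not in attrs]
    return {"complete": not missing, "missing": missing}

print(cf_semantics_present(tas_attrs))
print(cf_semantics_present({"long_name": "temperature"}))  # structure, no semantics
```

The second call shows the failure mode the bullets warn about: a structurally valid variable that no tool can safely compare or convert, because the shared vocabulary is absent.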

Technical interoperability: Streaming protocols


  • Technical interoperability enables machine-to-machine data exchange through standardized protocols.

  • OPeNDAP implements the DAP protocol for remote access to structured scientific datasets.

  • Remote datasets can be explored without full download.

  • Server-side subsetting reduces bandwidth and supports scalable workflows.

  • Streaming protocols transform data repositories into interoperable computational infrastructure.
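Server-side subsetting in DAP2 works by appending a constraint expression to the dataset URL, using the hyperslab syntax var[start:stride:stop] per dimension. The sketch below builds such a request URL; the server address and variable name are hypothetical placeholders, and the request is only constructed, not sent.

```python
# Sketch of DAP2 server-side subsetting: the client asks for a hyperslab
# (e.g. tas[0:1:9][10:1:20][30:1:40]) and only that subset is streamed.
# The base URL and variable name below are hypothetical.

def dap_subset_url(base_url, var, slices):
    """Build an OPeNDAP data request with a DAP2 constraint expression."""
    hyperslab = "".join(f"[{a}:{step}:{b}]" for a, step, b in slices)
    return f"{base_url}.dods?{var}{hyperslab}"

url = dap_subset_url(
    "https://example.org/thredds/dodsC/reanalysis",  # hypothetical server
    "tas",
    [(0, 1, 9), (10, 1, 20), (30, 1, 40)],  # time, lat, lon index ranges
)
print(url)
```

Because the subsetting happens on the server, the client never transfers the full dataset, which is what makes remote exploration and bandwidth-efficient workflows possible.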

Technical interoperability: APIs


  • APIs operationalize technical interoperability by enabling standardized machine-to-machine interaction.

  • REST APIs use HTTP methods, predictable endpoints, JSON representations, stable identifiers, and authentication mechanisms.

  • APIs depend on structural interoperability (schemas) and semantic interoperability (controlled vocabularies).

  • Command-line tools such as curl provide direct access to API functionality and enable automation.

  • The 4TU.ResearchData API supports full dataset lifecycle management: discovery, creation, metadata update, file upload, and submission for review.
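The anatomy of such a REST call can be sketched with the standard library alone. The snippet below assembles a hypothetical "create dataset" request in the style described above (HTTP method, predictable endpoint, JSON body, token authentication); the base URL, endpoint path, token, and payload fields are all placeholders, not the actual 4TU.ResearchData API, and the request is deliberately built but never sent.

```python
# Sketch of a REST API request for dataset creation. Endpoint, token, and
# payload are hypothetical placeholders; the request is not sent.

import json
import urllib.request

API_BASE = "https://api.example.org/v2"   # placeholder base URL
TOKEN = "YOUR_PERSONAL_TOKEN"             # placeholder credential

payload = json.dumps({"title": "Toy climate dataset"}).encode("utf-8")
req = urllib.request.Request(
    url=f"{API_BASE}/account/articles",   # hypothetical 'create dataset' path
    data=payload,
    method="POST",
    headers={
        "Authorization": f"token {TOKEN}",
        "Content-Type": "application/json",
    },
)

print(req.method, req.full_url)
print(req.get_header("Content-type"))  # urllib stores headers capitalized
```

The same request maps one-to-one onto a curl invocation (method flag, -H headers, -d body), which is why curl is so useful for testing and automating API workflows.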

Cloud-native layouts


  • Cloud-native layouts are optimized for object storage and HTTP access.
  • NetCDF works well on HPC systems but is not optimized for cloud-native environments.
  • Zarr stores data in chunks, enabling efficient parallel access.
  • Kerchunk enables cloud-native access to NetCDF without data duplication.
  • Kerchunk changes the access pattern, not the data itself.
  • Cloud-native layouts affect structural interoperability, while semantic interoperability depends on metadata standards such as CF conventions.
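The efficiency of chunked layouts comes from a simple calculation: a reader maps the requested index ranges onto chunk keys and fetches only those objects over HTTP. The sketch below computes which chunks a slice touches; the "i.j" key naming mirrors Zarr v2 conventions, while the array shape and chunk sizes are illustrative.

```python
# Sketch of chunked (Zarr-style) access: each chunk is an independent
# object in storage, so a reader fetches only the chunks its slice
# overlaps. Key format "i.j" mirrors Zarr v2; sizes are illustrative.

import itertools

def chunks_for_slice(chunks, starts, stops):
    """Return the chunk keys overlapped by half-open index ranges."""
    ranges = [range(a // c, (b - 1) // c + 1)
              for c, a, b in zip(chunks, starts, stops)]
    return [".".join(map(str, idx)) for idx in itertools.product(*ranges)]

# A 100x100 array in 50x50 chunks: reading [0:60, 0:10] touches only the
# two chunks in the first column, not all four chunks of the array.
print(chunks_for_slice((50, 50), (0, 0), (60, 10)))  # ['0.0', '1.0']
```

This is also the idea behind Kerchunk: it publishes a chunk-to-byte-range index for an existing NetCDF file, so the same selective, HTTP-range access pattern works without rewriting the data.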

Interoperable infrastructure in the AI era


  • AI-ready infrastructures require interoperable data layers.
  • Structural, semantic, and technical interoperability jointly enable AI workflows.
  • Cloud-native formats and consistent metadata are essential for scalable AI.
  • APIs, catalogs, identifiers, and versioning ensure reproducibility and automation.
  • AI reliability depends as much on infrastructure design as on model quality.