Technical interoperability: Streaming protocols

Last updated on 2026-03-26

Overview

Questions

  • What is technical interoperability?
  • What is the DAP (Data Access Protocol)?
  • How does OPeNDAP enable remote access without full download?
  • What happens when we open a remote NetCDF file using xarray.open_dataset()?
  • Why are streaming protocols essential for large-scale scientific workflows?

Objectives

By the end of this episode, learners will be able to:

  • Define technical interoperability in the context of scientific data infrastructures.
  • Explain how DAP enables interoperable machine-to-machine data access.
  • Access a remote NetCDF dataset via OPeNDAP using Python.
  • Perform server-side subsetting of variables and dimensions.
  • Distinguish between metadata access and actual data transfer.

What is technical interoperability?


Technical interoperability concerns machine-to-machine communication.

Systems are technically interoperable when they can exchange and access data through standardized protocols without manual intervention.

If structural interoperability answers:

“Can I read this file?”

Technical interoperability answers:

“Can I access and exchange this data across systems in a scalable way?”

This layer operates below semantics.
It is about transport, protocol, and infrastructure.

Examples include:

  • HTTP
  • REST APIs
  • OPeNDAP
  • OGC services

In scientific data infrastructures, technical interoperability enables remote analysis workflows.

Why file download is not scalable


Large scientific datasets (climate reanalysis, ocean models, satellite archives) often reach:

  • Tens of gigabytes
  • Terabytes
  • Petabytes

Downloading entire files:

  • Is inefficient
  • Consumes bandwidth
  • Duplicates storage
  • Breaks reproducibility pipelines

Modern workflows require:

  • Remote access
  • Server-side filtering
  • On-demand subsetting
  • Integration into automated pipelines

This is where streaming protocols become essential.
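To make the scale argument concrete, the following minimal sketch compares the size of a full variable with the size of a small subset. The array shape and the float32 data type are hypothetical values chosen for illustration; they do not describe any particular dataset.

```python
from math import prod


def array_nbytes(shape, itemsize=4):
    """Bytes needed for a dense array of the given shape.

    itemsize=4 corresponds to float32 values (an assumption for this sketch).
    """
    return prod(shape) * itemsize


# Hypothetical variable: 10,000 time steps x 512 x 512 grid points, float32
full = array_nbytes((10_000, 512, 512))

# A 10 x 10 x 512 subset of the same hypothetical variable
subset = array_nbytes((10, 10, 512))

print(f"full variable: {full / 1e9:.1f} GB")
print(f"subset:        {subset / 1e3:.1f} kB")
print(f"fraction of the variable transferred: {subset / full:.6%}")
```

Under these assumptions, the subset is a tiny fraction of the full variable, which is the saving that on-demand subsetting aims for.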

DAP and OPeNDAP


The Data Access Protocol (DAP) is a protocol designed to enable remote access to structured scientific data.

OPeNDAP is a widely adopted implementation of DAP.

DAP allows:

  • Access to metadata without full download
  • Server-side slicing (e.g., select time range, variable subset)
  • Transmission of only requested data

In practice, this means:

You interact with a dataset hosted on a remote server as if it were local — but only the necessary data is transferred.

This is technical interoperability in action.
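In DAP version 2, these capabilities are exposed through URL suffixes and an optional constraint expression: `.dds` returns the dataset structure, `.das` returns the attributes, and `.dods` (binary) or `.ascii` (text, where the server supports it) returns data. The sketch below only builds such URLs and does not contact any server. The helper name `dap2_url` is hypothetical, and the index ranges in the example are illustrative.

```python
def dap2_url(base, service, constraint=""):
    """Build a DAP2 request URL for a dataset endpoint.

    base       -- OPeNDAP dataset URL (without suffix)
    service    -- "dds" (structure), "das" (attributes),
                  "dods" (binary data) or "ascii" (text data)
    constraint -- optional constraint expression, e.g. "var[0:9][0:9]"
    """
    allowed = {"dds", "das", "dods", "ascii"}
    if service not in allowed:
        raise ValueError(f"service must be one of {sorted(allowed)}")
    url = f"{base}.{service}"
    if constraint:
        url += "?" + constraint
    return url


base = "https://opendap.4tu.nl/thredds/dodsC/IDRA/2019/01/02/IDRA_2019-01-02_12-00_raw_data.nc"

# Metadata only: structure and attributes
print(dap2_url(base, "dds"))
print(dap2_url(base, "das"))

# Data for one variable, restricted to an illustrative 10 x 10 index range
print(dap2_url(base, "ascii", "spectrum_width[0:9][0:9]"))
```

Client libraries normally generate these requests for you, so you rarely need to write them by hand. Depending on the server configuration, the square brackets may need to be percent-encoded.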

Hands-on: Accessing NetCDF via OPeNDAP in Python


We now move from concept to practice.

We will use:

  • xarray
  • A remote OPeNDAP endpoint
  • A NetCDF dataset hosted on a THREDDS server
  • Jupyter Lab

Step 1 – Open a remote dataset

  • Activate the appropriate environment for this lesson (see Setup).

  • To launch Jupyter Lab, open a terminal and type:

BASH


jupyter lab
  • Open a new notebook
  • Check that xarray is installed by importing it

PYTHON

import xarray as xr
  • Open a dataset

PYTHON


url = "https://opendap.4tu.nl/thredds/dodsC/IDRA/2019/01/02/IDRA_2019-01-02_12-00_raw_data.nc"

ds = xr.open_dataset(url, engine="pydap")

ds

Observe:

  • The dataset structure loads immediately.

  • Dimensions and metadata are visible.

  • The file has not been fully downloaded.

What happened?

Only metadata and coordinate information were accessed.

Step 2 – Select a variable

PYTHON

ds["spectrum_width"] # still no full download, just metadata

Step 3 – Perform server-side subsetting

  • In this step, actual data transfer occurs, but only for the values we request.

  • Select the variable “spectrum_width” and take a 10×10 subset along two dimensions using positional indexing.

PYTHON


ds["spectrum_width"].isel(time_processed_data=slice(0, 10), range=slice(0, 10))
  • Now print the values of this subset

PYTHON


ds["spectrum_width"].isel(time_processed_data=slice(0, 10), range=slice(0, 10)).values  # print the values on the screen
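Behind a positional selection like this, a DAP2 client typically sends a constraint expression of the form `variable[start:stride:stop]` for each dimension. Unlike Python slices, the DAP2 stop index is inclusive. The sketch below shows the translation; the helper names and the dimension sizes (512 × 512) are hypothetical, and the exact request a given client sends may differ.

```python
def hyperslab(sl, size):
    """Convert a Python slice (exclusive stop) into a DAP2 index
    range "[start:stride:stop]" (inclusive stop)."""
    start, stop, step = sl.indices(size)
    if step <= 0 or stop <= start:
        raise ValueError("this sketch only handles non-empty forward slices")
    # Last index that the Python slice actually selects
    last = start + ((stop - start - 1) // step) * step
    return f"[{start}:{step}:{last}]"


def constraint(var, slices, shape):
    """Build a DAP2 constraint expression for one variable."""
    return var + "".join(hyperslab(s, n) for s, n in zip(slices, shape))


# Dimension sizes are assumed for illustration only
expr = constraint("spectrum_width", [slice(0, 10), slice(0, 10)], shape=(512, 512))
print(expr)  # spectrum_width[0:1:9][0:1:9]
```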
  • Slice by coordinate values (labels) instead of positions

PYTHON


ds["spectrum_width"].sel(
    time_processed_data=slice("2019-01-02T12:00:00.000000000", "2019-01-02T12:00:02.097152173"),
    range=slice(0, 1000)
)
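Label-based selection has to be resolved to index positions before a subset can be requested. The following simplified sketch shows the idea for a sorted coordinate, with both endpoints included, as in label-based slicing. It is not how xarray implements `.sel()` internally; the function name and the coordinate values (one range bin every 30 m) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def label_slice_to_positions(coord, lo, hi):
    """Return the positional slice covering lo <= coord[i] <= hi.

    coord must be sorted in ascending order.
    """
    return slice(bisect_left(coord, lo), bisect_right(coord, hi))


# Hypothetical range coordinate in metres: 0, 30, 60, ..., 2970
range_m = [i * 30.0 for i in range(100)]

pos = label_slice_to_positions(range_m, 0, 1000)
print(pos)               # slice(0, 34, None)
print(range_m[pos][-1])  # 990.0, the last bin at or below 1000 m
```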
  • Using head

PYTHON


ds["spectrum_width"].head()
ds["spectrum_width"].head(time_processed_data=10)
ds["spectrum_width"].head(range=2)
ds["spectrum_width"].head(range=2).to_pandas() # tabular view

PYTHON


ds["spectrum_width"].isel(time_processed_data=0).values  # one radar profile (1D slice)

ds["spectrum_width"].isel(range=1).values  # one time series

Now actual data transfer occurs — but only for:

  • One variable

  • A limited time window

This is server-side subsetting enabled by DAP.
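When a variable is too large for a single request, the same idea can be applied repeatedly by asking for one window at a time. The sketch below generates consecutive index windows. The helper name `index_windows` and the window size are hypothetical, and the commented usage assumes the `ds` object opened in Step 1.

```python
def index_windows(n, size):
    """Yield consecutive slices of at most `size` elements covering 0..n-1."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, n, size):
        yield slice(start, min(start + size, n))


# Example with an assumed dimension length of 250 and windows of 100
for window in index_windows(250, 100):
    print(window)

# Usage sketch with the remote dataset (requires network access):
# n = ds.sizes["time_processed_data"]
# for window in index_windows(n, 100):
#     block = ds["spectrum_width"].isel(time_processed_data=window).values
#     ...  # process one block, then let it go out of scope
```

Each iteration transfers only one window, so memory use and request size stay bounded regardless of the total length of the dimension.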

Step 4 – Plot a profile

PYTHON


import matplotlib.pyplot as plt 

ds["spectrum_width"].isel(time_processed_data=0).plot()

ds["spectrum_width"].head(range=10).plot()

xarray offers several complementary ways to select and inspect data:

  • .isel() → positional slicing (what you used first)

  • .sel() → coordinate-aware slicing

  • .head() → quick inspection

  • .values → raw data extraction

  • .plot() → visual interpretation

Relevance for research workflows


Streaming protocols enable:

  • Scalable climate analysis (ERA5, CMIP6)

  • AI/ML training pipelines

  • Reproducible notebooks

  • Cloud-based workflows

  • Data repository integration

Technical interoperability ensures that:

Data repositories are not only storage systems; they become computational infrastructure.

Challenge

Technical interoperability — True or False?

Indicate whether each statement is True or False and justify your answer.

  • Opening a remote dataset with xarray.open_dataset() automatically downloads the entire file.

  • DAP enables server-side filtering before data transfer.

  • Streaming protocols replace the need for structural interoperability.

  • OPeNDAP works independently of file formats.

  • Technical interoperability enables automated workflows across infrastructures.

  • Statement 1: False. Only metadata is accessed initially; data is transferred upon explicit selection.

  • Statement 2: True. Subsetting occurs on the server before transmission.

  • Statement 3: False. Technical interoperability depends on structural interoperability.

  • Statement 4: False. DAP operates on structured data models (e.g., NetCDF).

  • Statement 5: True. It enables scalable machine-to-machine access.

Key Points
  • Technical interoperability enables machine-to-machine data exchange through standardized protocols.

  • OPeNDAP implements the DAP protocol for remote access to structured scientific datasets.

  • Remote datasets can be explored without full download.

  • Server-side subsetting reduces bandwidth and supports scalable workflows.

  • Streaming protocols transform data repositories into interoperable computational infrastructure.