Summary and Setup

This lesson is about Interoperability in Climate and Atmospheric Sciences. The value of scientific data depends not only on its scientific content but on how easily it can be found, accessed, integrated, and reused by others, whether they are human researchers or automated computational workflows.

This course focuses on how to create first-class research outputs using the NetCDF format and publish them through the 4TU.ResearchData repository. By following community best practices, these datasets can:

  • be easily found through rich, machine-actionable metadata,

  • be reliably accessed using open standards and stable identifiers,

  • be seamlessly integrated with other datasets, and

  • be confidently reused.

Throughout this course, you will learn how to produce NetCDF datasets that meet these standards: datasets that are not only scientifically valuable today, but that remain accessible, interoperable, and reusable for years to come.

Target audience


This lesson is intended for researchers in the climate and atmospheric sciences who handle multidimensional NetCDF datasets and intend to make their data and software more reusable by others.

Leo’s challenge: combining climate data (use case)


Leo is studying extreme heatwaves in Europe. He wants to compare his climate model results with satellite observations, urban sensor data, and aircraft measurements.

He starts searching across platforms like the Copernicus Climate Data Store, NASA EarthData, and 4TU.ResearchData. At first, everything seems available. But once he begins working with the data, problems appear: data is spread across different repositories with different access methods, files come in many formats (NetCDF, CSV, GeoTIFF, Excel), and variable names, units, and metadata are inconsistent or unclear.

Instead of focusing on heatwaves, Leo spends days just trying to understand and prepare the data. His problem is not a lack of data or tools. It is a lack of interoperability:

  • Data was not created using shared standards
  • Metadata is not machine-readable or consistent
  • Datasets are difficult to combine across sources

If datasets followed community practices, Leo could:

  • Find data faster
  • Access it programmatically
  • Combine datasets without manual cleanup
  • Focus on science instead of data wrangling

This is why interoperability matters: it turns data into something that can be easily reused, combined, and trusted. This lesson helps researchers in climate and atmospheric sciences recognize and apply this essential aspect of modern research.

Learning objectives


By the end of this lesson, learners will be able to:

  • Analyze climate and atmospheric datasets to distinguish interoperable from non-interoperable systems across structural, semantic, and technical layers.

  • Analyze a NetCDF dataset to evaluate how its data model, dimensions, variables, and metadata organization enable structural interoperability.

  • Evaluate the semantic interoperability of a NetCDF dataset using CF Conventions and explain how shared vocabularies enable machine-actionable meaning.

  • Apply OPeNDAP to access and subset remote NetCDF datasets, distinguishing between metadata retrieval and data transfer in distributed infrastructures.

  • Apply and analyze REST API principles to programmatically create and manage repository metadata, explaining how APIs operationalize technical interoperability.

  • Analyze how cloud-native data layouts (NetCDF vs Zarr) affect performance, scalability, and structural interoperability in distributed environments.

  • Evaluate a research data infrastructure against AI-readiness requirements by linking structural, semantic, and technical interoperability components to scalable machine learning workflows.
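
Several of these objectives revolve around machine-actionable metadata. The idea can be sketched with plain Python dictionaries: CF Conventions expect each variable to carry attributes such as `standard_name` and `units`, and software can check for them automatically. The attribute names below come from CF; the helper function itself is a hypothetical illustration, not part of any library.

PYTHON

```python
# Illustrative sketch: CF-style attribute checks on plain dictionaries.
# REQUIRED_CF_ATTRS and missing_cf_attrs are our own illustrative names.

REQUIRED_CF_ATTRS = ("standard_name", "units")

def missing_cf_attrs(var_attrs):
    """Return the CF attributes a variable is missing."""
    return [a for a in REQUIRED_CF_ATTRS if a not in var_attrs]

# A well-described temperature variable, as CF would expect it:
good = {"standard_name": "air_temperature", "units": "K",
        "long_name": "Near-surface air temperature"}

# A variable like the ones that slowed Leo down:
bad = {"units": "degC??"}  # no standard_name, ambiguous units

print(missing_cf_attrs(good))  # []
print(missing_cf_attrs(bad))   # ['standard_name']
```

A check like this is what "machine-actionable" means in practice: a program, not a human, decides whether the metadata is usable.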

Prerequisite

To follow this lesson, learners should already have:

  • Working knowledge of Python (able to write and execute short scripts)
  • Awareness of the NetCDF format

Project Setup


Create a working directory for this course:

BASH

cd ~/Desktop
mkdir Interoperability_climate_sciences
cd Interoperability_climate_sciences

Software Setup


We will use JupyterLab for live coding and exercises.

This course requires:

  • A Python 3 environment
  • A Unix-like terminal
  • Several Python libraries (installed via requirements.txt)

Follow the steps below carefully.


1. Install Python 3 (Required)


Download Python from:

👉 https://www.python.org/downloads/

This course was tested with Python 3.11, but any supported version should work: https://devguide.python.org/versions/#versions

⚠️ Python 2.7 is not supported.


Verify Installation

Open a terminal and run:

BASH

python3 --version   # macOS / Linux
python --version    # Windows

Expected output (example):

BASH

Python 3.11.4

You can also start Python interactively:

BASH

python3   # or python on Windows

Exit with:

BASH

exit()

or press CTRL+D.
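
The same version requirement can be checked from inside Python using only the standard library. The `(3, 9)` floor below is our assumption; the course was tested with Python 3.11, and any currently supported Python 3 should work.

PYTHON

```python
# Check the running interpreter from inside Python (stdlib only).
# The minimum version tuple is an assumption for illustration.
import sys

def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum

print(sys.version.split()[0], "supported:", python_is_supported())
```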


2. Set Up the Python Environment


We will:

  1. Create a virtual environment
  2. Define dependencies in requirements.txt
  3. Install all libraries in one step

Step 1 — Create a Virtual Environment

BASH

python3 -m venv nes-course-env

Activate it:

  • macOS / Linux

    BASH

    source nes-course-env/bin/activate
  • Windows (PowerShell)

    POWERSHELL

    nes-course-env\Scripts\Activate.ps1

You should now see (nes-course-env) in your terminal prompt.
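
If the prompt is ambiguous, you can also confirm from inside Python that a virtual environment is active: in a venv, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation. The helper name below is ours.

PYTHON

```python
# Detect an active virtual environment using only the standard library.
import sys

def in_virtualenv():
    """Return True when running inside a virtual environment."""
    return sys.prefix != sys.base_prefix

print("virtualenv active:", in_virtualenv())
```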


Step 2 — Create requirements.txt

Make sure you are in your project folder:

BASH

cd ~/Desktop/Interoperability_climate_sciences

Create a file named:

BASH

touch requirements.txt

Add the following content:

TXT

# Core scientific stack
xarray
netCDF4
pydap
matplotlib
scipy

# Cloud-native data access
zarr
kerchunk
fsspec[http]
h5netcdf
h5py

# Interactive environment
jupyterlab
ipykernel

Step 3 — Install Dependencies

Upgrade pip and install all packages:

BASH

pip install --upgrade pip
pip install -r requirements.txt

Step 4 — Verify the Installation

Check that the key packages import correctly:

BASH

python -c "import xarray, netCDF4, pydap, zarr, kerchunk, fsspec; print('All good')"
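
Beyond the one-line import check, the standard library's `importlib.metadata` can report which versions were actually installed, handling missing packages gracefully. The helper function name is ours.

PYTHON

```python
# Report installed versions of the course dependencies (stdlib only).
# importlib.metadata queries installed distributions (Python 3.8+).
from importlib.metadata import version, PackageNotFoundError

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

report = installed_versions(["xarray", "netCDF4", "pydap",
                             "zarr", "kerchunk", "fsspec"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```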

Step 5 — Register the Environment in Jupyter

BASH

python -m ipykernel install --user --name nes-course-env --display-name "NES Course (Python)"

Step 6 — Launch JupyterLab

BASH

jupyter lab

In JupyterLab:

  • Open a notebook
  • Select kernel: “NES Course (Python)”

3. Unix Terminal (Required for API Episodes)


You will need a Unix-like terminal.

Linux

Use the default terminal.

macOS

Use the default Terminal app.

Windows

Install one of:

  • Git Bash (included with Git for Windows)
  • Windows Subsystem for Linux (WSL)

4. API Command-Line Tools (Required for REST API Episodes)


yq (Required)

YAML processor for working with metadata.

Linux

BASH

sudo apt-get update
sudo apt-get install -y yq

macOS

BASH

brew install yq

Windows (PowerShell)

Install Scoop:

POWERSHELL

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Invoke-RestMethod -Uri https://get.scoop.sh | Invoke-Expression

Then install yq:

POWERSHELL

scoop install yq

jq (Required)

JSON processor for formatting API output.

Linux

BASH

sudo apt-get update
sudo apt-get install -y jq

macOS

BASH

brew install jq

Windows

POWERSHELL

scoop install main/jq

Verify Installation

BASH

yq --version
jq --version
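
If jq is unavailable, the pretty-printing it provides in the API episodes can also be done with Python's standard library `json` module. The sample record below is made up for illustration.

PYTHON

```python
# jq-style pretty-printing of a JSON API response using the stdlib.
import json

raw = '{"title": "Heatwave dataset", "format": "NetCDF", "year": 2024}'
record = json.loads(raw)                      # parse the response text
pretty = json.dumps(record, indent=2, sort_keys=True)
print(pretty)
```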