Summary and Schedule
This lesson is about interoperability in the climate and atmospheric sciences. The value of scientific data depends not only on its scientific content but also on how easily it can be found, accessed, integrated, and reused by others, whether they are human researchers or automated computational workflows.
This course focuses on creating first-class research outputs in the NetCDF format and publishing them through the 4TU.ResearchData repository. By following community best practices, these datasets can:
- be easily found through rich, machine-actionable metadata,
- be reliably accessed using open standards and stable identifiers,
- be seamlessly integrated with other datasets, and
- be confidently reused.
Throughout this course, you will learn how to produce NetCDF datasets that meet these standards: datasets that are not only scientifically valuable today but also remain accessible, interoperable, and reusable for years to come.
Target audience
This lesson is intended for researchers in the climate and atmospheric sciences who handle multidimensional NetCDF datasets and intend to make their data and software more reusable by others.
Leo’s challenge: combining climate data (use case)
Leo is studying extreme heatwaves in Europe. He wants to compare his climate model results with satellite observations, urban sensor data, and aircraft measurements.
He starts searching across platforms like Copernicus Climate Data Store, NASA EarthData, and 4TU.ResearchData. At first, everything seems available. But once he begins working with the data, problems appear: data is spread across different repositories with different access methods, files come in many formats (NetCDF, CSV, GeoTIFF, Excel), and variable names, units, and metadata are inconsistent or unclear.
Instead of focusing on heatwaves, Leo spends days just trying to understand and prepare the data. Leo’s problem is not a lack of data or tools. It is a lack of interoperability:
- Data was not created using shared standards
- Metadata is not machine-readable or consistent
- Datasets are difficult to combine across sources
If datasets followed shared community practices, Leo could:
- Find data faster
- Access it programmatically
- Combine datasets without manual cleanup
- Focus on science instead of data wrangling
This is why interoperability matters: it turns data into something that can be easily reused, combined, and trusted. This lesson helps researchers in climate and atmospheric sciences recognize and apply this essential aspect of modern research.
Learning objectives
By the end of this lesson, learners will be able to:
Analyze climate and atmospheric datasets to distinguish interoperable from non-interoperable systems across structural, semantic, and technical layers.
Analyze a NetCDF dataset to evaluate how its data model, dimensions, variables, and metadata organization enable structural interoperability.
Evaluate the semantic interoperability of a NetCDF dataset using CF Conventions and explain how shared vocabularies enable machine-actionable meaning.
Apply OPeNDAP to access and subset remote NetCDF datasets, distinguishing between metadata retrieval and data transfer in distributed infrastructures.
Apply and analyze REST API principles to programmatically create and manage repository metadata, explaining how APIs operationalize technical interoperability.
Analyze how cloud-native data layouts (NetCDF vs Zarr) affect performance, scalability, and structural interoperability in distributed environments.
Evaluate a research data infrastructure against AI-readiness requirements by linking structural, semantic, and technical interoperability components to scalable machine learning workflows.
To follow this lesson, learners should already have:
- Working knowledge of Python (the ability to write and execute short scripts)
- Awareness of the NetCDF format
| Setup Instructions | Download files required for the lesson | |
| Duration: 00h 00m | 1. Introduction | Why is interoperability important when dealing with research data? What are the three layers of interoperability? How can you identify whether a dataset is interoperable or not? |
| Duration: 01h 00m | 2. Structural interoperability | What is structural interoperability? How do open standards and community governance enable structurally interoperable research data? Which structural expectations must a data format satisfy to support automated, machine-actionable workflows? Which open standards are commonly used in climate and atmospheric sciences to achieve structural interoperability? What is NetCDF’s data structure? |
| Duration: 01h 45m | 3. Semantic interoperability | What is semantic interoperability? Why is structural interoperability alone insufficient for meaningful data reuse? How do community metadata conventions (e.g. CF) encode shared scientific meaning? What does it mean for a NetCDF file to be “CF-compliant”? |
| Duration: 02h 05m | 4. Technical interoperability: Streaming protocols | What is technical interoperability? What is the DAP (Data Access Protocol)? How does OPeNDAP enable remote access without full download? What happens when we open a remote NetCDF file using xarray.open_dataset()? Why are streaming protocols essential for large-scale scientific workflows? |
| Duration: 02h 50m | 5. Technical interoperability: API | What is technical interoperability in research data infrastructures? What is a REST API? How do APIs enable machine-to-machine workflows? How do APIs depend on structural and semantic interoperability? How can we programmatically manage datasets using the 4TU.ResearchData API? |
| Duration: 04h 50m | 6. Cloud-Native Layouts | What does “cloud-native” mean in the context of scientific data? Why can NetCDF struggle in cloud environments? How is Zarr different from NetCDF? Which part of interoperability is affected by cloud-native layouts? |
| Duration: 05h 35m | 7. Interoperable Infrastructure in the AI Era | What does “AI-ready” mean in the context of climate data infrastructures? Why is interoperability a prerequisite for trustworthy AI? Which infrastructural components enable AI at scale? |
| Duration: 06h 05m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Project Setup
Create a working directory for this course:
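For example (the folder name `nc-interop-course` is only a suggestion; any name works):

```shell
# Create a working directory for the course material and move into it.
# The name "nc-interop-course" is a hypothetical example.
mkdir -p nc-interop-course
cd nc-interop-course
```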
Software Setup
We will use JupyterLab for live coding and exercises.
This course requires:
- A Python 3 environment
- A Unix-like terminal
- Several Python libraries (installed via requirements.txt)
Follow the steps below carefully.
1. Install Python 3 (Required)
Download Python from:
👉 https://www.python.org/downloads/
This course was tested with Python 3.11, but any supported version should work: https://devguide.python.org/versions/#versions
⚠️ Python 2.7 is not supported
2. Set Up the Python Environment
We will:
- Create a virtual environment
- Define dependencies in requirements.txt
- Install all libraries in one step
Step 1 — Create a Virtual Environment
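A minimal sketch, using the environment name `nes-course-env` (the same name that appears in the prompt once the environment is activated):

```shell
# Create a virtual environment named "nes-course-env" in the project folder.
python3 -m venv nes-course-env
```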
Activate it:
- macOS / Linux
- Windows (PowerShell)

You should now see (nes-course-env) in your terminal prompt.
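The activation command differs per platform. A sketch, assuming the `nes-course-env` environment from Step 1 lives in the current folder:

```shell
# Ensure the environment exists (safe to re-run), then activate it.
python3 -m venv nes-course-env

# macOS / Linux:
source nes-course-env/bin/activate

# Windows (PowerShell) -- use this line instead of "source" above:
# .\nes-course-env\Scripts\Activate.ps1
```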
Step 2 — Create requirements.txt
Make sure you are in your project folder. Then create a file named requirements.txt and add the following content:
# Core scientific stack
xarray
netCDF4
pydap
matplotlib
scipy
# Cloud-native data access
zarr
kerchunk
fsspec[http]
h5netcdf
h5py
# Interactive environment
jupyterlab
ipykernel
Step 3 — Install Dependencies
Upgrade pip and install all packages:
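For example (run inside the activated environment, where plain `python` also works):

```shell
# Upgrade pip, then install every package listed in requirements.txt.
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```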
Instructor note: consider sending this step to participants before the lesson (for example, in a pre-workshop email), since installation can take some time (~20 minutes), and recommend that they complete the setup in advance. If that is not possible, plan for a 30-minute buffer to complete all installations.
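Once installation finishes, a quick optional way to verify the setup is to check that the main course libraries can be imported. This snippet is a sketch; the names mirror the packages in requirements.txt:

```python
# Optional sanity check: report which course libraries are importable
# in the current environment.
import importlib.util

# Import names for the packages in requirements.txt
# (note that the netCDF4 import name keeps its capitalization).
packages = ["xarray", "netCDF4", "pydap", "matplotlib",
            "scipy", "zarr", "fsspec", "h5py"]

def is_installed(name):
    # find_spec returns None when the package cannot be found
    return importlib.util.find_spec(name) is not None

for name in packages:
    print(f"{name}: {'OK' if is_installed(name) else 'MISSING'}")
```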
3. Unix Terminal (Required for API Episodes)
You will need a Unix-like terminal.
Windows
Install one of:
- Git Bash: https://git-scm.com/downloads
- Windows Subsystem for Linux (WSL): https://learn.microsoft.com/en-us/windows/wsl/install