# Weeks 4–6 Tutorial  
## Reading Geoscience Data & Plotting (Step-by-Step)

You should:
- Read **every Markdown cell**
- Read **every code comment**
- Run cells slowly and experiment

Repetition is part of the learning process.


Over the next **2–3 weeks**, you will practice:
- Opening netCDF files
- Understanding dimensions and variables
- Making the *same plot many times*
- Improving plots step by step



# WEEK 4 — Learning How to Read Data



## 1. What does “reading data” mean?

Reading data is **not** just loading a file.

Before plotting anything, you should be able to answer:
1. What does each variable represent physically?
2. What are the units?
3. What are the dimensions?
4. What is one single number measuring?

We will practice answering these questions repeatedly.
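Question 4 gets easier once you see that an array index is only a position. The coordinate arrays tell you what that position means.

The sketch below rebuilds the time, latitude, and longitude axes of the practice file created in Section 3 (same start date and grid spacing) using plain NumPy. It then looks up the date and location behind one index triple. The indices `(45, 1, 4)` and the helper name `describe_index` are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

import numpy as np

# Coordinate axes matching the practice file built in Section 3
times = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(365)]
lats = np.linspace(38, 44, 4)      # 38, 40, 42, 44 degrees north
lons = np.linspace(-118, -108, 6)  # -118 ... -108 degrees east (negative = west)


def describe_index(t, i, j):
    """Return the date, latitude, and longitude for array position [t, i, j].

    Hypothetical helper for illustration: the data arrays are indexed as
    (time, lat, lon), so each index is looked up in its own coordinate array.
    """
    return times[t].date(), float(lats[i]), float(lons[j])


when, lat_val, lon_val = describe_index(45, 1, 4)
print(f"Value [45, 1, 4] is for {when} at {lat_val:.0f}°N, {abs(lon_val):.0f}°W")
# prints: Value [45, 1, 4] is for 2024-02-15 at 40°N, 110°W
```

The number itself would come from the data array. Everything else you need to interpret it (when and where) comes from the coordinates.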



## 2. Import required libraries

We use:
- `netCDF4` to write netCDF files
- `xarray` to read and inspect them
- `numpy` for numerical arrays and math
- `pandas` for tables
- `matplotlib` for plotting


```python
# Import libraries
# If this cell fails, your environment is not set up correctly.

import netCDF4 as nc      # for working directly with netCDF files
import xarray as xr       # higher-level interface for labeled data
import numpy as np        # numerical arrays and math
import pandas as pd       # tabular data structures
import matplotlib.pyplot as plt  # plotting

print("All libraries imported successfully.")
```

Output:

```
All libraries imported successfully.
```



## 3. Creating a small practice netCDF file

Instead of downloading real data, we first create a **synthetic dataset**.

Why?
- You know exactly what the data represent
- You can focus on *skills*, not data hunting

Dataset description:
- Daily data for 1 year (365 days starting 1 January 2024)
- Latitude × longitude grid of 4 × 6 points (38–44°N, 118–108°W)
- Variables:
  - temperature (°C)
  - precipitation (mm/day)


```python
# Tools for working with dates and files
from datetime import datetime, timedelta
import os

# Name of the file we will create
fname = "practice_climate.nc"

# If the file already exists, delete it so we start clean
if os.path.exists(fname):
    os.remove(fname)

# Define dataset size
ntime = 365   # number of days (2024 is a leap year, so the last date is 2024-12-30)
nlat = 4      # number of latitude points
nlon = 6      # number of longitude points

# Create a list of daily dates starting on 1 January 2024
times = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(ntime)]

# Define latitude and longitude arrays
lats = np.linspace(38, 44, nlat)     # degrees north: 38, 40, 42, 44
lons = np.linspace(-118, -108, nlon) # degrees east (negative = west): -118, -116, ..., -108

# Create synthetic temperature data (°C): annual mean of 10 °C
# + seasonal cycle with a 9 °C amplitude + random day-to-day noise.
# The cosine is shifted so the coldest day is in mid-January and the
# warmest in mid-July, as expected at these Northern Hemisphere latitudes.
day = np.arange(ntime)[:, None, None]  # shape (ntime, 1, 1) broadcasts over lat and lon
temp = (
    10
    - 9 * np.cos(2 * np.pi * (day - 15) / 365)
    + np.random.normal(0, 1.2, (ntime, nlat, nlon))
)

# Create synthetic precipitation data (mm/day)
# Gamma distribution (shape 1, scale 2, mean 2 mm/day) mimics many small events
# and few large ones. Note: every day gets some rain; real records also have dry days.
precip = np.random.gamma(1.0, 2.0, (ntime, nlat, nlon))

# Create the netCDF file
ds = nc.Dataset(fname, "w")

# Define dimensions
ds.createDimension("time", ntime)
ds.createDimension("lat", nlat)
ds.createDimension("lon", nlon)

# Create coordinate variables (same name as their dimension)
time_var = ds.createVariable("time", "i4", ("time",))
lat_var = ds.createVariable("lat", "f4", ("lat",))
lon_var = ds.createVariable("lon", "f4", ("lon",))

# Create data variables on the (time, lat, lon) grid
temp_var = ds.createVariable("temperature", "f4", ("time", "lat", "lon"))
precip_var = ds.createVariable("precipitation", "f4", ("time", "lat", "lon"))

# Assign units and values
time_var.units = "days since 1970-01-01"
time_var[:] = nc.date2num(times, time_var.units)

lat_var.units = "degrees_north"
lat_var[:] = lats

lon_var.units = "degrees_east"
lon_var[:] = lons

temp_var.units = "degC"   # CF-style unit string for degrees Celsius
temp_var.long_name = "Daily mean surface temperature"
temp_var[:] = temp

precip_var.units = "mm/day"
precip_var.long_name = "Daily precipitation"
precip_var[:] = precip

# Close the file (VERY IMPORTANT): this writes everything to disk and releases the file
ds.close()

print("Created netCDF file:", fname)
```

Output:

```
Created netCDF file: practice_climate.nc
```
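The code comment above says a gamma distribution gives "many small events and few large ones." You can check that claim by drawing a large sample with the same parameters (shape 1.0, scale 2.0) and comparing the mean with the median. In a right-skewed distribution, the median sits below the mean and most days fall below the average. The seed and sample size below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the numbers are repeatable

# Same parameters as the practice file: shape k = 1.0, scale theta = 2.0 (mm/day)
sample = rng.gamma(shape=1.0, scale=2.0, size=100_000)

mean = sample.mean()                      # theoretical mean: k * theta = 2.0 mm/day
median = np.median(sample)                # theoretical median: 2 * ln(2), about 1.39 mm/day
frac_below_mean = (sample < mean).mean()  # theoretical value: 1 - exp(-1), about 0.63
frac_heavy = (sample > 10).mean()         # days above 10 mm/day: exp(-5), about 0.007

print(f"mean = {mean:.2f}, median = {median:.2f} mm/day")
print(f"{frac_below_mean:.0%} of days are below the mean; {frac_heavy:.1%} exceed 10 mm/day")
```

Roughly two thirds of the days are "below average" and only a handful are heavy. That is the skewed shape the precipitation histograms in Week 5 should reveal.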



## 4. Opening the netCDF file with xarray

`xarray` shows the *structure* of the dataset: its dimensions, coordinates, data variables, and attributes (such as units).


```python
# Open the dataset
ds = xr.open_dataset("practice_climate.nc")

# Display dataset summary (in Jupyter, the last line of a cell is shown automatically)
ds
```



# Week 4 — Exercises

## Exercise 4.1 — Inventory the dataset (very basic, very important)

**Task:**

1. List **all variables** in the dataset.
2. For each variable, write down:
   - Dimensions
   - Units
   - Physical meaning (in your own words)
---

## Exercise 4.2 — Dimensions vs. coordinates

**Task:**

1. Identify which names in the dataset are dimensions, which are coordinates, and which are data variables.
2. Explain the difference between the three in your own words.

---

## Exercise 4.3 — What does one number mean?

**Task:**

1. Select one single data value, for example:
```python
ds["temperature"].isel(time=100, lat=2, lon=3)
```
2. Explain in 2–3 sentences:
   - Where is this value located?
   - Which date does it correspond to?
   - What does it represent physically?

---

## Exercise 4.4 — Shape and size reasoning (no plotting)

**Task:**

1. Print the shape of each variable.
2. Compute:
   - Total number of temperature values
   - Approximate storage size of the temperature values if stored as 32-bit floats (4 bytes per value)

**Hint:**
```python
ds["temperature"].size
```
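The size estimate is plain arithmetic: number of values × bytes per value, where a 32-bit float takes 4 bytes. NumPy can do the bookkeeping through `.size`, `.itemsize`, and `.nbytes`.

The sketch below uses a small, made-up shape (10 × 3 × 5), so you can check the method before applying it to `temperature`. A real netCDF file will be somewhat larger than this estimate, because it also stores coordinates, attributes, and header information.

```python
import numpy as np

# Made-up array shape (time, lat, lon) = (10, 3, 5), stored as 32-bit floats
arr = np.zeros((10, 3, 5), dtype=np.float32)

n_values = arr.size               # 10 * 3 * 5 = 150 values
bytes_per_value = arr.itemsize    # 4 bytes for float32
total_bytes = n_values * bytes_per_value

print(n_values, "values ×", bytes_per_value, "bytes =", total_bytes, "bytes")
print("NumPy agrees:", arr.nbytes, "bytes")
```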
---

## Exercise 4.5 — Compare two grid points

**Task:**

1. Select two different grid points.
2. Compute:
   - Mean temperature at each point
   - Mean precipitation at each point
---

## Exercise 4.6 — Time slicing practice

**Task:**

1. Select precipitation data from:
   - One month
   - One season (e.g., Jun–Aug)
---



# WEEK 5 — Plotting Fundamentals



## 5. Selecting a subset of the data

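We almost never plot the full dataset directly.
We first reduce it to something manageable. Here we pick a single grid point, which turns each 3-D (time, lat, lon) variable into a 1-D time series.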


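```python
# Select the grid point closest to a chosen location.
# 41°N and 113°W each fall exactly halfway between two grid values,
# so print the coordinates that "nearest" actually picked.
point = ds.sel(lat=41, lon=-113, method="nearest")
print("Selected grid point:", float(point.lat), "°N,", float(point.lon), "°E")

# Convert to a pandas DataFrame (one row per day, indexed by date)
df = point.to_dataframe()

# Display first few rows
df.head()
```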
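`method="nearest"` picks, along each dimension, the coordinate value closest to the one you ask for. The NumPy sketch below shows the same idea by hand for the practice grid: compute the distance to every grid value, then take the position of the smallest one.

This is a conceptual illustration, not how xarray implements the lookup internally. The target location (39.3°N, 115.4°W) is arbitrary. It was chosen away from grid midpoints, because exactly-halfway requests such as 41°N on this grid are ties, and different tools may break ties differently. Here, `np.argmin` keeps the first match.

```python
import numpy as np

lats = np.linspace(38, 44, 4)      # practice grid: 38, 40, 42, 44
lons = np.linspace(-118, -108, 6)  # -118, -116, ..., -108


def nearest_index(coord, target):
    """Position of the grid value closest to target (first one wins on an exact tie)."""
    return int(np.argmin(np.abs(coord - target)))


i = nearest_index(lats, 39.3)    # distances 1.3, 0.7, 2.7, 4.7 -> index 1
j = nearest_index(lons, -115.4)  # closest value is -116 -> index 1
print("Nearest grid point:", lats[i], "°N,", lons[j], "°E")
```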



## 6. Time series plots

Time series plots answer questions like:
- How does a variable change over time?
- Is there seasonality?


```python
# Basic temperature time series
plt.figure(figsize=(10,3))
plt.plot(df.index, df["temperature"])

plt.xlabel("Date")
plt.ylabel("Temperature (°C)")
plt.title("Daily Temperature at One Location")

plt.tight_layout()
plt.show()
```



### Practice
- Change line color and thickness
- Plot precipitation instead of temperature
- Zoom into one month
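For the "zoom into one month" practice, one option is pandas partial-string indexing. This assumes `df` has a `DatetimeIndex`, as it does after `to_dataframe()` in Section 5. With that index, `.loc["2024-03"]` keeps every row whose date falls in March 2024.

The sketch below shows this on a small, made-up daily series so it runs on its own. The values are placeholders, not the practice data.

```python
import numpy as np
import pandas as pd

# A made-up daily series standing in for df["temperature"] (placeholder values)
dates = pd.date_range("2024-01-01", periods=365, freq="D")
series = pd.Series(np.arange(365, dtype=float), index=dates)

march = series.loc["2024-03"]                       # all days in March 2024
first_week = series.loc["2024-06-01":"2024-06-07"]  # label slices include both ends

print(len(march), "days in March;", len(first_week), "days in the June slice")
```

The same selection works on the whole DataFrame. For example, `df.loc["2024-03"]` gives March rows for both variables, which you can pass to `plt.plot` exactly as before.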



## 7. Histograms

Histograms show how values are distributed.
The number of bins changes what the histogram appears to show: too few bins hide structure, and too many make it look noisy.


```python
# Histogram of temperature
plt.figure(figsize=(6,3))
plt.hist(df["temperature"], bins=20)

plt.xlabel("Temperature (°C)")
plt.ylabel("Count")
plt.title("Temperature Distribution")

plt.tight_layout()
plt.show()
```



### Practice
- Try 10 bins, then 40 bins
- Make a histogram of precipitation



## 8. Scatter plots

Scatter plots show relationships between two variables.
Each point is one day.


```python
# Scatter plot: temperature vs precipitation
plt.figure(figsize=(6,4))
plt.scatter(df["temperature"], df["precipitation"], alpha=0.4)

plt.xlabel("Temperature (°C)")
plt.ylabel("Precipitation (mm/day)")
plt.title("Temperature vs Precipitation")

plt.tight_layout()
plt.show()
```
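A scatter plot shows a relationship. A correlation coefficient puts one number on the *linear* part of it. In the practice file, temperature and precipitation are generated independently (a seasonal cycle plus noise for temperature, separate gamma draws for precipitation), so any correlation you find there should be close to zero.

The sketch below reproduces a similar setup with NumPy and computes Pearson's r with `np.corrcoef`. The seed and the sample size of 365 × 24 (one year at all 24 grid points) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 365 * 24  # one year of days at all 24 grid points

day = np.arange(n) % 365
temperature = 10 + 9 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 1.2, n)
precipitation = rng.gamma(1.0, 2.0, n)  # drawn without looking at temperature

r = np.corrcoef(temperature, precipitation)[0, 1]
print(f"Pearson r = {r:.3f}")  # near 0, because the two were generated independently
```

With real data, a value near zero does not rule out a nonlinear relationship, which is one reason to look at the scatter plot as well as the number.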



# WEEK 6 — Plot Quality & Interpretation



## 9. Improving plot quality

A good scientific plot:
- Has labels with units
- Has a clear title
- Is readable without explanation


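```python
# Improved time series plot:
# thinner line so daily wiggles don't merge, a light grid,
# and a title that says what is plotted and where
lat_sel = float(point.lat)
lon_sel = float(point.lon)

plt.figure(figsize=(10,3))
plt.plot(df.index, df["temperature"], linewidth=0.8)
plt.grid(alpha=0.3)

plt.xlabel("Date")
plt.ylabel("Temperature (°C)")
plt.title(f"Daily Mean Surface Temperature at {lat_sel:.0f}°N, {abs(lon_sel):.0f}°W (2024)")

plt.tight_layout()
plt.show()
```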
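The checklist at the start of this section (labels with units, a clear title, readable without explanation) can be built into a small reusable function, so every time series you make meets it by default. The sketch below is one way to do that.

`plot_timeseries` is a hypothetical helper, not part of any library. It uses matplotlib's `AutoDateLocator` and `ConciseDateFormatter` to keep the date labels on the x-axis short.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_timeseries(series, ylabel, title):
    """Plot a date-indexed pandas Series with labeled axes and compact date ticks.

    Hypothetical helper for illustration; returns (fig, ax) so the caller can adjust further.
    """
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.plot(series.index, series.values, linewidth=0.8)

    locator = mdates.AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

    ax.set_xlabel("Date")
    ax.set_ylabel(ylabel)   # include units, e.g. "Temperature (°C)"
    ax.set_title(title)     # say what is plotted and where
    ax.grid(alpha=0.3)      # a light grid helps read values off the plot
    fig.tight_layout()
    return fig, ax


# Usage with placeholder data (replace with df["temperature"] from Section 5)
dates = pd.date_range("2024-01-01", periods=365, freq="D")
demo = pd.Series(np.zeros(365), index=dates)
fig, ax = plot_timeseries(demo, "Temperature (°C)", "Daily mean surface temperature (placeholder data)")
```

In a notebook, the figure appears when the cell finishes. In a script, call `plt.show()` afterwards.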



## 10. Required interpretation

Write **one full paragraph (5–7 sentences)** answering:
- What patterns do you see?
- Which plot was most informative?
- What question would you ask next?



# Weeks 5–6 — Exercises

These exercises focus on making many plots from the same data and thinking about plotting choices.
Complete the required tasks and write short reflections in Markdown cells below each exercise or in a single `week5_exercises.md` file.

---

## Exercise 5.1 — Time series

**Task:**

1. Make a time series plot of temperature at one grid point.
2. Make a second time series plot of precipitation at the same point.

**Requirements:**

- Axis labels with units
- Title describing *what* and *where*
- Readable time axis

---

## Exercise 5.2 — Plot the same data, three ways

**Task:** Create three different time series plots of temperature:

1. Raw daily values  
2. 7-day rolling mean  
3. Monthly mean

---

## Exercise 5.3 — Zooming in and out (scale awareness)

**Task:**

1. Plot temperature for the entire year
2. Plot temperature for one month
3. Plot temperature for one week

---

## Exercise 5.4 — Histogram

**Task:** Make three histograms of precipitation with different bin counts:
- 10 bins
- 25 bins
- 50 bins

---

## Exercise 5.5 — Comparing distributions

**Task:**

1. Select two different grid points.
2. Make overlaid histograms of temperature at both locations (use the same bins for both, and set `alpha` so that both are visible).
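Overlaid histograms are only comparable when both use the *same* bin edges. If each call picks its own edges from its own data range, the bars line up differently and the comparison misleads.

One way to get shared edges is to call `np.histogram_bin_edges` on the two samples combined, then pass the resulting array as `bins=` to both `plt.hist` calls. The two normal samples below are placeholders for two grid points.

```python
import numpy as np

rng = np.random.default_rng(3)
temps_a = rng.normal(8, 6, 365)   # placeholder: daily temperature at grid point A
temps_b = rng.normal(12, 6, 365)  # placeholder: daily temperature at grid point B

# One set of 25 bins spanning both samples
edges = np.histogram_bin_edges(np.concatenate([temps_a, temps_b]), bins=25)

counts_a, _ = np.histogram(temps_a, bins=edges)
counts_b, _ = np.histogram(temps_b, bins=edges)

# With matplotlib: plt.hist(temps_a, bins=edges, alpha=0.5, label="Point A")
#                  plt.hist(temps_b, bins=edges, alpha=0.5, label="Point B")
print(len(edges) - 1, "shared bins from", round(edges[0], 1), "to", round(edges[-1], 1), "°C")
```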

---

## Exercise 5.6 — Scatter plots

**Task:**

1. Create a scatter plot of temperature vs precipitation, either at your chosen grid point or using daily averages over all grid points.
2. Color points by month or season to reveal structure.
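To color points by season, you first need a season label for each date. Meteorological seasons group the months into DJF (Dec–Feb), MAM (Mar–May), JJA (Jun–Aug), and SON (Sep–Nov). A lookup from month number, as sketched below, is one simple way to build these labels. The dictionary and function names are illustrative. With a `DatetimeIndex`, `df.index.month` supplies the month numbers.

```python
import pandas as pd

# Meteorological seasons by month number (Northern Hemisphere naming)
MONTH_TO_SEASON = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def season_labels(index):
    """Return a season label ("DJF", "MAM", "JJA", "SON") for each date in a DatetimeIndex."""
    return pd.Series(index.month, index=index).map(MONTH_TO_SEASON)


dates = pd.date_range("2024-01-01", periods=365, freq="D")
seasons = season_labels(dates)
print(seasons.value_counts().sort_index())
```

`plt.scatter` expects numbers for `c=`, so one option is to loop over the four labels. Call `plt.scatter` once per season on the matching rows, with `label=season`, then add `plt.legend()`.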

---

## Exercise 5.7 — Choose the right plot

For each question below, choose the best plot type and justify it (1–2 sentences), then make the plot:

1. “How does temperature vary over the year?”  
2. “What temperatures are most common?”  
3. “Is precipitation related to temperature?”

---