

| [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](../LICENSE) | [![Python](https://img.shields.io/badge/Python-3.10+-black.svg)](https://www.python.org/) | [![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-red.svg)](https://jupyter.org/) | [![GeoPandas](https://img.shields.io/badge/Geo-GeoPandas-darkgreen.svg)](https://geopandas.org/) | [![Requests](https://img.shields.io/badge/HTTP-Requests-darkred.svg)](https://docs.python-requests.org/) | [![Pathlib](https://img.shields.io/badge/FS-Pathlib-black.svg)](https://docs.python.org/3/library/pathlib.html) | [![JSON](https://img.shields.io/badge/Data-JSON-grey.svg)](https://www.json.org/) |
|---|---|---|---|---|---|---|



### **Notebook 1 - Data Collection, Inspection and Preparation**

---

**Flood Risk in Lærdal: Official municipality boundary (kommune 4642, Vestland)**      
Source: [https://www.geonorge.no](https://www.geonorge.no)

### Project overview

**Goal:** Download and validate the official boundary for Lærdal municipality to use as a base layer in flood risk mapping and geospatial analysis.      
**Method:** HTTP download → ZIP → vector extraction → GeoDataFrame → exploratory checks → output for reuse.  
**Tools:** Python (`requests`, `zipfile`, `pathlib`, `geopandas`, `json`)
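
The download and ZIP steps in the method can be guarded by an integrity check before any extraction. The sketch below is not part of the original notebook: `validate_zip` is a hypothetical helper, and the in-memory archive and its member name are illustrative stand-ins for the file downloaded from GeoNorge. It uses only the standard library (`zipfile.is_zipfile` and `ZipFile.testzip`).

```python
import io
import zipfile


def validate_zip(source):
    """Return the member names of a ZIP archive, or raise ValueError if it is unusable.

    `source` may be a filesystem path or a binary file-like object.
    (Hypothetical helper, not part of the notebook.)
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("Not a ZIP archive (the server may have returned an error page)")
    if hasattr(source, "seek"):
        source.seek(0)  # rewind file-like objects before reopening them
    with zipfile.ZipFile(source) as zf:
        bad_member = zf.testzip()  # name of the first corrupt member, or None
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        return zf.namelist()


# Illustrative in-memory archive standing in for the downloaded file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Kommune_Grense.geojson", '{"type": "FeatureCollection", "features": []}')

members = validate_zip(buffer)
print(members)
```

In the notebook, the same helper could be called as `validate_zip(ZIP_PATH)` right after the download cell, so that an HTML error page saved under a `.zip` name fails early with a clear message.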

### Reproducibility – quick reference | Reproduserbarhet – hurtigoversikt

**Outputs (this notebook):**

* `data/raw/laerdal_boundary_2023.zip` *(ZIP archive from GeoNorge)*
* `data/processed/laerdal_boundary_lines_2023.geojson` *(boundary lines, reprojected to EPSG:25832)*
* `data/processed/laerdal_bbox.json` *(bounding box, EPSG:25832)*
* `data/processed/laerdal_centroid.json` *(centroid, EPSG:4326)*
* `results/laerdal_boundary_preview_raw_geometry_2023.png` *(preview of the raw geometry)*

**Parameters (this notebook):**

* **Kommune number** = 4642 (Lærdal)
* **Projection for area calculation** = EPSG:25832
* **Projection for visualization** = EPSG:4326

---

In [None]:
# (EN) Reproducibility parameters and paths
# (NO) Reproduserbarhetsparametere og stier

MUNICIPALITY = "Lærdal"
KOMMUNE_NR = "4642"
YEAR = "2023"

# Coordinate Reference Systems
CRS_EPSG = "EPSG:25832"  # Projected CRS for spatial analysis (UTM zone 32N)
CRS_WGS84 = "EPSG:4326"  # Geographic CRS for visualization (lat/lon)

# Source download (GeoNorge vector ZIP for Lærdal boundary)
ZIP_URL = "https://nedlasting.geonorge.no/api/download/order/7bc71575-8447-4159-a928-26274eb4a001/458a3462-df83-4b35-9d51-d893a5a135f3"

# Project directories (relative to notebooks)
from pathlib import Path
PROJECT_ROOT = Path.cwd().parent
RAW_DIR = PROJECT_ROOT / "data" / "raw"
PROCESSED_DIR = PROJECT_ROOT / "data" / "processed"
RESULT_DIR = PROJECT_ROOT / "results" 

# Ensure output folders exist
for folder in [RAW_DIR, PROCESSED_DIR, RESULT_DIR]:
    folder.mkdir(parents=True, exist_ok=True)

# Define standard output paths
BOUNDARY_LINES_GEOJSON = PROCESSED_DIR / f"laerdal_boundary_lines_{YEAR}.geojson"
BOUNDARY_PLOT_PNG = RESULT_DIR / f"laerdal_boundary_preview_raw_geometry_{YEAR}.png"
ZIP_PATH = RAW_DIR / f"laerdal_boundary_{YEAR}.zip"

In [None]:
# (EN) Import required libraries and verify data/result folders
# (NO) Importerer nødvendige biblioteker og bekrefter mapper for data/resultater

import json
import zipfile
import geopandas as gpd
import requests
import matplotlib.pyplot as plt

# Confirm folder setup
print(f"  • RAW_DIR exists:        {RAW_DIR.exists()}")
print(f"  • PROCESSED_DIR exists:  {PROCESSED_DIR.exists()}")
print(f"  • RESULT_DIR exists:    {RESULT_DIR.exists()}")

In [None]:
# (EN) Download boundary ZIP file
# (NO) Last ned ZIP‑fil med kommunegrense

print("Downloading municipality boundary ZIP file...")
response = requests.get(ZIP_URL, timeout=60)
response.raise_for_status()

# Save 
ZIP_PATH.write_bytes(response.content)
print(f"Saved ZIP to: {ZIP_PATH}")

In [None]:
# (EN) Read metadata file (if available)
# (NO) Les metadatafil (om tilgjengelig)

with zipfile.ZipFile(ZIP_PATH, 'r') as zf:
    metadata_files = [f for f in zf.namelist() if "metadata" in f.lower() and f.lower().endswith(".json")]
    
    if metadata_files:
        print(f"Found metadata file: {metadata_files[0]}")
        with zf.open(metadata_files[0]) as f:
            metadata = json.load(f)
            for key, value in metadata.items():
                print(f"{key}: {value}")
    else:
        print("No metadata JSON file found in this dataset. Optional step skipped.")

In [None]:
# (EN) List all files inside the ZIP archive to understand its structure
# (NO) Vis alle filer i ZIP-arkivet for å forstå strukturen

with zipfile.ZipFile(ZIP_PATH, 'r') as zf:
    print("Files in ZIP archive:\n")
    for name in zf.namelist():
        print("•", name)

In [None]:
# (EN) Extract vector layer (municipality boundary)- If multiple layers are present, prefer the one with 'Grense' in its name to ensure clarity
# (NO) Hent geodata-laget som inneholder kommunegrensen til Lærdal - Hvis flere lag finnes, prioriter filen med 'Grense' i navnet for tydelighet

with zipfile.ZipFile(ZIP_PATH, 'r') as zf:
    members = zf.namelist()
    vector_files = [f for f in members if f.lower().endswith((".geojson", ".gml", ".shp", ".gpkg"))]

    if not vector_files:
        raise ValueError("No vector file found. Available files:\n" + "\n".join(members))

    # Prefer file containing 'Grense' in name
    selected_file = next((f for f in vector_files if "Grense" in f), vector_files[0])
    print("Selected vector layer:", selected_file)

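    # Reading from a file-like object suits single-file formats (GeoJSON, GML, GPKG).
    # A .shp also needs its sidecar files (.shx, .dbf), so extract the archive first in that case.
    gdf_all = gpd.read_file(zf.open(selected_file))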

print("Loaded features:", len(gdf_all))

In [None]:
# (EN) Preview dataset structure - Check first rows, dataset structure, and coordinate reference system (CRS)
# (NO) Forhåndsvis datastrukturen - Sjekk de første radene, datastruktur og koordinatsystem 

print("First rows:")
display(gdf_all.head())

print("\nCoordinate Reference System (CRS):")
print(gdf_all.crs)

print("\nDataset info:")
gdf_all.info()

In [None]:
# (EN) Check for missing values and unique object types
# (NO) Sjekk for manglende verdier og unike objekttyper

print("Missing values per column:")
display(gdf_all.isna().sum())

if "objtype" in gdf_all.columns:
    print("\nUnique object types:")
    display(gdf_all["objtype"].unique())

In [None]:
# (EN) Visual preview of the raw geometry (not reprojected)
# (NO) Visuell forhåndsvisning av rågeometrien (ikke reprojisert)

ax = gdf_all.plot(edgecolor="black", figsize=(6, 6))
ax.set_title("Visual preview of the raw geometry (not reprojected)")
ax.set_axis_off()

plt.savefig(BOUNDARY_PLOT_PNG, dpi=300, bbox_inches="tight", pad_inches=0.1)
plt.show()

print(f"Saved figure to: {BOUNDARY_PLOT_PNG}")

In [None]:
# (EN) Filter for Lærdal municipality (kommune 4642)
# (NO) Filtrer for Lærdal kommune (4642)

if "kommunenummer" in gdf_all.columns:
    print(f"Filtering by kommunenummer == {KOMMUNE_NR}")
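    # Compare as strings so the filter works whether the column holds text or integers
    gdf = gdf_all[gdf_all["kommunenummer"].astype(str) == KOMMUNE_NR].copy()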

    if gdf.empty:
        raise ValueError(f"Kommune {KOMMUNE_NR} not found. Available values: {gdf_all['kommunenummer'].unique()}")
else:
    print("Column 'kommunenummer' not present. Assuming dataset contains only Lærdal.")
    gdf = gdf_all.copy()

In [None]:
# (EN) Reproject to UTM zone 32N for accurate area calculation
# (NO) Reprojiser til UTM sone 32N for nøyaktig arealberegning

gdf = gdf.to_crs(CRS_EPSG)
print(f"Reprojected to: {CRS_EPSG}")

# (EN) Save as GeoJSON in the 'processed' folder
# (NO) Lagre som GeoJSON i mappen 'processed'

gdf.to_file(BOUNDARY_LINES_GEOJSON, driver="GeoJSON")
print(f"Saved boundary to: {BOUNDARY_LINES_GEOJSON}")

Save municipal boundary: The filtered and reprojected boundary lines of Lærdal municipality are now saved as a GeoJSON file in EPSG:25832. Later notebooks can reuse this file for spatial clipping, flood zone analysis, and visualization.
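
One point to keep in mind when reusing the file: RFC 7946 defines GeoJSON coordinates as WGS 84 longitude/latitude, so a file stored in EPSG:25832 depends on a legacy `crs` member that not every reader honors. The sketch below shows one way a downstream step might check the declared CRS with the standard library only. `geojson_crs_name` is a hypothetical helper, and the sample document is illustrative rather than actual notebook output.

```python
import json

# Default CRS assumed by RFC 7946 when no "crs" member is present
RFC7946_DEFAULT_CRS = "urn:ogc:def:crs:OGC:1.3:CRS84"


def geojson_crs_name(doc):
    """Return the CRS name declared in a GeoJSON document, or the RFC 7946 default."""
    crs = doc.get("crs") or {}
    return crs.get("properties", {}).get("name", RFC7946_DEFAULT_CRS)


# Illustrative document shaped like the legacy "crs" member that some writers emit
sample = json.loads("""
{
  "type": "FeatureCollection",
  "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::25832"}},
  "features": []
}
""")

crs_name = geojson_crs_name(sample)
print(crs_name)
print(crs_name.endswith(":25832"))
```

If the check fails in a later notebook, reprojecting explicitly after loading (as this notebook does with `to_crs`) is the safer route than trusting the file header.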

In [None]:
# (EN) Save bounding box
# (NO) Lagre bounding box

minx, miny, maxx, maxy = gdf.total_bounds
bbox_data = {
    "minx": minx,
    "miny": miny,
    "maxx": maxx,
    "maxy": maxy,
    "crs": CRS_EPSG
}
with open(PROCESSED_DIR / "laerdal_bbox.json", "w") as f:
    json.dump(bbox_data, f, indent=2)

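# (EN) Save centroid in WGS84 (lat/lon): computed in the projected CRS over all features, then converted
# (NO) Lagre sentrumspunkt i WGS84 (bredde-/lengdegrad): beregnet i projisert CRS over alle objekter, deretter konvertert

centroid = gdf.dissolve().geometry.centroid.to_crs(CRS_WGS84).iloc[0]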
centroid_data = {
    "lat": float(centroid.y),
    "lon": float(centroid.x),
    "crs": CRS_WGS84
}
with open(PROCESSED_DIR / "laerdal_centroid.json", "w") as f: 
    json.dump(centroid_data, f, indent=2)

print("Saved: bbox and centroid JSON files to PROCESSED_DIR")

Bounding box and centroid: The bounding box defines the rectangular extent of Lærdal municipality in EPSG:25832.  
It can be used to filter spatial APIs, set plot limits, or clip other datasets. The centroid is stored in WGS84 (latitude/longitude), which suits web maps, tooltips, and location-based filters.
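
As a rough illustration of how the saved bounding box might be consumed later, the sketch below formats it as a query string and tests whether a point falls inside it. The coordinate values are hypothetical placeholders, not the real extent of Lærdal, and both helper functions are assumptions introduced for this example. The axis order expected by a given API varies, so the `minx,miny,maxx,maxy` order should be checked against that API's documentation.

```python
import json

# Hypothetical values for illustration only; the real ones come from laerdal_bbox.json
bbox_json = (
    '{"minx": 400000.0, "miny": 6750000.0, '
    '"maxx": 450000.0, "maxy": 6800000.0, "crs": "EPSG:25832"}'
)
bbox = json.loads(bbox_json)


def bbox_query_string(bbox):
    """Format a bbox dict as 'minx,miny,maxx,maxy'."""
    return ",".join(str(bbox[key]) for key in ("minx", "miny", "maxx", "maxy"))


def bbox_contains(bbox, x, y):
    """Return True if point (x, y), given in the bbox CRS, lies inside or on the bbox."""
    return bbox["minx"] <= x <= bbox["maxx"] and bbox["miny"] <= y <= bbox["maxy"]


print(bbox_query_string(bbox))
print(bbox_contains(bbox, 420000.0, 6770000.0))  # point inside the placeholder box
print(bbox_contains(bbox, 500000.0, 6770000.0))  # point east of the placeholder box
```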

In [None]:
# (EN) Check geometry type and validity 
# (NO) Sjekk geometriens type og gyldighet

# Load boundary file from processed folder
gdf = gpd.read_file(BOUNDARY_LINES_GEOJSON)

print("Geometry type:")
display(gdf.geometry.type.unique())

print("Is geometry valid?")
display(gdf.is_valid.all())

print("Any empty geometries?")
display(gdf.is_empty.any())

print("CRS:")
display(gdf.crs)

**EN:** Phase 1 is complete:
- Set reproducibility parameters and folder layout; ensured standard paths (`RAW_DIR`, `PROCESSED_DIR`, `RESULT_DIR`).
- Downloaded the **Lærdal** boundary ZIP from GeoNorge and extracted the vector layer.
- Loaded features into GeoPandas and previewed structure (columns, head, info, missing values, unique `objtype`).
- Filtered to municipality **4642 Lærdal** when applicable.
- Reprojected to **ETRS89 / UTM zone 32N** (**EPSG:25832**).
- Saved the boundary lines as GeoJSON, plus the bounding box and centroid as JSON.

*Next:* Polygonize boundary lines into a closed municipality polygon, verify topology, and prepare for clipping of flood-hazard layers (20-, 200-, and 1000-year zones).

**NO:** Fase 1 er fullført:
- Satt opp parametere for reproduserbarhet og mappeoppsett; bekreftet standardstier (`RAW_DIR`, `PROCESSED_DIR`, `RESULT_DIR`).
- Lastet ned ZIP-filen med **Lærdal** kommunegrense fra GeoNorge og hentet vektorlaget.
- Leste inn data i GeoPandas og forhåndsviste struktur (kolonner, head, info, manglende verdier, unike `objtype`).
- Filtrerte til kommune **4642 Lærdal** der det var relevant.
- Reprojiserte til **ETRS89 / UTM sone 32N** (**EPSG:25832**).
- Lagret grenselinjene som GeoJSON, samt bounding box og sentrumspunkt som JSON.

*Neste steg:* Polygonisere grenselinjene til et lukket kommune-polygon, verifisere topologi og forberede klipping av flomsonekart (20/200/1000 års).
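
As a preview of the polygonization step named above, the sketch below shows the general idea on a toy example. It assumes Shapely's `shapely.ops.polygonize` and `unary_union`; the four segments forming a unit square are illustrative only, and the actual approach in Notebook 2 may differ.

```python
from shapely.geometry import LineString
from shapely.ops import polygonize, unary_union

# Four illustrative segments that together form a closed unit square
lines = [
    LineString([(0, 0), (1, 0)]),
    LineString([(1, 0), (1, 1)]),
    LineString([(1, 1), (0, 1)]),
    LineString([(0, 1), (0, 0)]),
]

# Union first so the segments are noded where they meet, then build polygons
merged = unary_union(lines)
polygons = list(polygonize(merged))

print(len(polygons))
print(polygons[0].area)
```

With the real data, the input would be the geometries from `laerdal_boundary_lines_2023.geojson`. If no polygon is produced, that usually points to gaps between line endpoints, which is why the topology check is listed as part of the next step.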

**Navigation Links**
  
- [Notebook 2 - Boundary Polygonization and Flood Clipping](./02_floodzone_analysis.ipynb)  