# Week 7: Supervised Land Cover Classification with Google Earth Engine

This lab introduces Google Earth Engine (GEE), a powerful platform for planetary-scale geospatial analysis. We'll use GEE to perform supervised land cover classification on Sentinel-2 imagery over Vienna, using training data from last week's lab.

GEE is widely used in remote sensing because it handles the infrastructure for you—no need to download imagery, manage storage, or provision compute. The tradeoff is that you're working with a different programming model (lazy evaluation, server-side compute) that takes some getting used to.
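To make "lazy evaluation" concrete, here is a toy sketch in plain Python. It does not use the Earth Engine API at all; `LazyNumber` and its methods are hypothetical names invented for illustration. The idea it mimics is that chained calls only record what should be computed, and the work happens when you explicitly ask for the result (in GEE, typically with `.getInfo()` or when a map tile is rendered).

```python
class LazyNumber:
    """Toy stand-in for a server-side object: records operations instead of running them."""

    def __init__(self, compute, description):
        self._compute = compute          # zero-argument function, run only in get_info()
        self.description = description   # readable form of the recorded expression

    @classmethod
    def constant(cls, value):
        return cls(lambda: value, str(value))

    def add(self, other):
        return LazyNumber(
            lambda: self._compute() + other._compute(),
            f"({self.description} + {other.description})",
        )

    def multiply(self, other):
        return LazyNumber(
            lambda: self._compute() * other._compute(),
            f"({self.description} * {other.description})",
        )

    def get_info(self):
        """Evaluate the recorded expression (loosely analogous to .getInfo())."""
        return self._compute()


# Building the expression does no arithmetic yet
expr = LazyNumber.constant(2).add(LazyNumber.constant(3)).multiply(LazyNumber.constant(4))
print(expr.description)

# The arithmetic runs only when the value is requested
print(expr.get_info())
```

In the real API the recorded expression is sent to Google's servers for evaluation, which is why printing an `ee` object without `.getInfo()` shows a description of the computation rather than a value.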

In [3]:
import ee
import geemap
import geopandas as gpd
import pandas as pd

## 1. Authentication and Setup

Before using Earth Engine, you need to:
1. Have a Google account
2. Register for Earth Engine at https://earthengine.google.com/
3. Create a Google Cloud Project (free tier is fine)

The first time you run this, `ee.Authenticate()` will open a browser window for OAuth. After that, credentials are cached locally.

In [4]:
# Authenticate (only needed once per machine)
ee.Authenticate()

True

In [None]:
# Initialize with your Google Cloud project
# Replace the project ID below with your own
ee.Initialize(project="grand-magpie-459819-i5")

## 2. Load Sentinel-2 Imagery for Vienna

We'll use the same area and time period as last week (March 2020), but now at the native 10 m resolution. GEE parallelizes computation across its infrastructure, so processing all of Vienna at 10 m runs in seconds. We keep only scenes with less than 30% cloud cover, mask the remaining cloudy pixels, and take a median composite to reduce noise.
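Why a median rather than a mean? The sketch below uses NumPy on made-up numbers (the reflectance values and scene count are hypothetical, not taken from the Vienna data) to show how a per-pixel median behaves when some acquisitions are cloud-contaminated, and how masking the flagged acquisitions first tightens the estimate further.

```python
import numpy as np

# Hypothetical red-band values for ONE pixel across five acquisitions.
# Two acquisitions are cloud-contaminated and therefore very bright.
reflectance = np.array([820.0, 6400.0, 790.0, 5900.0, 805.0])

# Matching Scene Classification (SCL) codes: 4 = vegetation, 9 = cloud (high probability)
scl = np.array([4, 9, 4, 9, 4])

# Per-pixel cloud mask: drop cloud shadows (3), clouds (9), and cirrus (10)
keep = ~np.isin(scl, [3, 9, 10])
masked = np.where(keep, reflectance, np.nan)

mean_all = reflectance.mean()         # pulled far upward by the two cloudy values
median_all = np.median(reflectance)   # much closer to the clear-sky values
median_masked = np.nanmedian(masked)  # median over clear acquisitions only

print(f"Mean of all acquisitions:   {mean_all:.1f}")
print(f"Median of all acquisitions: {median_all:.1f}")
print(f"Median after masking:       {median_masked:.1f}")
```

The median alone only protects you while contaminated acquisitions are a minority at that pixel, which is one reason to mask before compositing.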

In [6]:
# Vienna bounding box (same as week 6)
vienna_bounds = [16.32, 47.86, 16.9, 48.407]
vienna_bbox = ee.Geometry.Rectangle(vienna_bounds)

# Sentinel-2 Surface Reflectance, March 2020
# Note: filterDate's end date is exclusive, so this range covers March 1-30
s2 = (
    ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
    .filterBounds(vienna_bbox)
    .filterDate("2020-03-01", "2020-03-31")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 30))
)

print(f"Found {s2.size().getInfo()} Sentinel-2 scenes")

Found 7 Sentinel-2 scenes


In [18]:
# Create a cloud-masked median composite


def mask_clouds(image):
    """Mask clouds using the SCL band."""
    scl = image.select("SCL")
    # Mask out cloud shadows (3), high-probability clouds (9), and thin cirrus (10).
    # Medium-probability clouds (8) are not masked here.
    mask = scl.neq(3).And(scl.neq(9)).And(scl.neq(10))
    return image.updateMask(mask)


# Select spectral bands and compute median
bands = ["B2", "B3", "B4", "B8", "B11", "B12"]  # Blue, Green, Red, NIR, SWIR1, SWIR2
band_names = ["blue", "green", "red", "nir", "swir1", "swir2"]

composite = s2.map(mask_clouds).select(bands, band_names).median().clip(vienna_bbox)

# Add spectral indices
ndvi = composite.normalizedDifference(["nir", "red"]).rename("ndvi")
mndwi = composite.normalizedDifference(["green", "swir1"]).rename("mndwi")

composite = composite.addBands([ndvi, mndwi])
print("Bands in composite:", composite.bandNames().getInfo())

Bands in composite: ['blue', 'green', 'red', 'nir', 'swir1', 'swir2', 'ndvi', 'mndwi']
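Both indices above are normalized differences, (a - b) / (a + b), so they always fall between -1 and 1. As a quick sanity check on how to read them, here is a small NumPy sketch. The three pixels and their reflectance values are hypothetical, chosen only to resemble vegetation, water, and a built-up surface.

```python
import numpy as np


def normalized_difference(a, b):
    """Compute (a - b) / (a + b) element-wise."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)


# Hypothetical reflectance values for three pixels:
# [dense vegetation, open water, built-up surface]
green = np.array([600.0, 500.0, 1400.0])
red = np.array([400.0, 300.0, 1500.0])
nir = np.array([3600.0, 200.0, 2000.0])
swir1 = np.array([1500.0, 100.0, 2400.0])

ndvi = normalized_difference(nir, red)       # high for green vegetation
mndwi = normalized_difference(green, swir1)  # positive for open water

for name, v, w in zip(["vegetation", "water", "built-up"], ndvi, mndwi):
    print(f"{name:10s} NDVI={v:+.2f}  MNDWI={w:+.2f}")
```

In this toy example only the vegetation pixel has a high NDVI and only the water pixel has a positive MNDWI, which is the separation the classifier can exploit.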


In [19]:
# Visualize the composite
Map = geemap.Map()
Map.centerObject(vienna_bbox, 10)

vis_params = {"bands": ["red", "green", "blue"], "min": 0, "max": 3000}
Map.addLayer(composite, vis_params, "Sentinel-2 RGB")

fc_params = {"bands": ["nir", "red", "green"], "min": 0, "max": 4000}
Map.addLayer(composite, fc_params, "False Color (NIR-R-G)")

Map

## 3. Load Training Data

We'll use the same training data from week 6 (`vienna_samples.parquet`). The `geemap.gdf_to_ee()` function converts a local GeoDataFrame directly to an Earth Engine FeatureCollection—no need to upload assets.

In [9]:
# Load training data from week 6
gdf_samples = gpd.read_parquet("../week06/vienna_samples.parquet")

# Check class distribution
class_counts = gdf_samples["class_name"].value_counts()
print("Training samples per class:")
print(class_counts.to_string())
print(f"\nTotal: {len(gdf_samples)} samples")

# Map class names to integer IDs
# GEE expects class labels as consecutive integers starting from 0
class_mapping = {
    "Water": 0,
    "Wetland": 1,
    "Urban / built-up": 2,
    "Cropland": 3,
    "Grassland": 4,
    "Forest / shrub": 5,
}
gdf_samples["class_id"] = gdf_samples["class_name"].map(class_mapping)

# Convert to EE FeatureCollection
training_fc = geemap.gdf_to_ee(gdf_samples)
print(
    f"\nConverted to EE FeatureCollection with {training_fc.size().getInfo()} features"
)

Training samples per class:
class_name
Cropland            5438
Forest / shrub      1853
Urban / built-up    1496
Grassland            538
Water                414
Wetland              261

Total: 10000 samples

Converted to EE FeatureCollection with 10000 features
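One thing to watch with `Series.map()` and a dictionary: any class name missing from the dictionary silently becomes `NaN` instead of raising an error. The sketch below shows a check you could run before converting to a FeatureCollection. The `labels` series is a hypothetical example with a deliberately misspelled class, not the week 6 data.

```python
import pandas as pd

class_mapping = {
    "Water": 0,
    "Wetland": 1,
    "Urban / built-up": 2,
    "Cropland": 3,
    "Grassland": 4,
    "Forest / shrub": 5,
}

# Hypothetical labels; "Forest/shrub" is misspelled (missing spaces) on purpose
labels = pd.Series(["Water", "Cropland", "Forest/shrub", "Grassland"], name="class_name")
class_id = labels.map(class_mapping)

# Names that did not match any dictionary key end up as NaN
unmapped = sorted(labels[class_id.isna()].unique())
print("Unmapped class names:", unmapped)

# The IDs themselves should be consecutive integers starting from 0
ids_are_consecutive = sorted(class_mapping.values()) == list(range(len(class_mapping)))
print("IDs are consecutive from 0:", ids_are_consecutive)
```

If `unmapped` is non-empty, fix the labels or the dictionary before training; otherwise those samples carry no usable class ID.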


## 4. Train Random Forest Classifier

We'll train a Random Forest model using GEE's `ee.Classifier.smileRandomForest()`.
The workflow is:
1. Sample the image at training point locations to get feature values
2. Split into train/validation sets using a random column
3. Train the classifier
4. Evaluate on the validation set

**Note:** GEE doesn't have built-in k-fold cross-validation like sklearn. For rigorous validation, you'd either implement manual folds or export data and run sklearn locally. For this lab, we use a simple 80/20 split.
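If you did want manual folds, the same random column can do the job: instead of one cut at 0.8, slice the [0, 1) interval into k equal bins. The sketch below shows only the fold-assignment logic, in NumPy with a simulated random column. The sample count and `k` are hypothetical, and nothing here calls Earth Engine.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000  # hypothetical number of sampled training points
k = 5             # number of folds

# Uniform values in [0, 1), playing the role of the "random" column
random_col = rng.random(n_samples)

# Fold i holds the samples whose random value lies in [i/k, (i+1)/k)
fold = np.floor(random_col * k).astype(int)

fold_sizes = np.bincount(fold, minlength=k)
for i in range(k):
    val_mask = fold == i    # held out for validation in this round
    train_mask = ~val_mask  # everything else is used for training
    print(f"Fold {i}: train={train_mask.sum()}, validation={val_mask.sum()}")
```

On the GEE side, each round would correspond to filtering the `random` property to one such interval for validation and to its complement for training, then training and evaluating one classifier per fold and averaging the metrics.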

In [10]:
# Sample the image at training points
feature_bands = ["blue", "green", "red", "nir", "swir1", "swir2", "ndvi", "mndwi"]
label = "class_id"

# Extract pixel values at training points
# (points that fall on masked pixels return no sample, so fewer than 10000 come back)
training_data = composite.select(feature_bands).sampleRegions(
    collection=training_fc,
    properties=[label],
    scale=10,  # Native Sentinel-2 resolution
    geometries=True,
)

print(f"Training samples with features: {training_data.size().getInfo()}")

# Add random column for train/validation split
training_data = training_data.randomColumn(seed=42)

# 80/20 split
train_set = training_data.filter(ee.Filter.lt("random", 0.8))
val_set = training_data.filter(ee.Filter.gte("random", 0.8))

print(f"Training set: {train_set.size().getInfo()}")
print(f"Validation set: {val_set.size().getInfo()}")

Training samples with features: 8437
Training set: 6749
Validation set: 1688


In [11]:
# Train Random Forest classifier
classifier = ee.Classifier.smileRandomForest(
    numberOfTrees=100,
    minLeafPopulation=5,
    seed=42,
).train(
    features=train_set,
    classProperty=label,
    inputProperties=feature_bands,
)

print("Classifier trained successfully")

Classifier trained successfully


In [12]:
# Evaluate on validation set
validated = val_set.classify(classifier)

# Compute confusion matrix
confusion_matrix = validated.errorMatrix(label, "classification")

# Extract metrics
accuracy = confusion_matrix.accuracy().getInfo()
kappa = confusion_matrix.kappa().getInfo()
producers = confusion_matrix.producersAccuracy().getInfo()
consumers = confusion_matrix.consumersAccuracy().getInfo()

print("=" * 50)
print("VALIDATION RESULTS")
print("=" * 50)
print(f"Overall Accuracy: {accuracy:.4f}")
print(f"Kappa Coefficient: {kappa:.4f}")
print()
print("Producer's Accuracy (Recall) per class:")
class_names = ["Water", "Wetland", "Urban", "Cropland", "Grassland", "Forest"]
for i, name in enumerate(class_names):
    print(f"  {name}: {producers[i][0]:.4f}")
print()
print("Consumer's Accuracy (Precision) per class:")
for i, name in enumerate(class_names):
    print(f"  {name}: {consumers[0][i]:.4f}")

VALIDATION RESULTS
Overall Accuracy: 0.8086
Kappa Coefficient: 0.6649

Producer's Accuracy (Recall) per class:
  Water: 0.8621
  Wetland: 0.5750
  Urban: 0.5983
  Cropland: 0.9607
  Grassland: 0.0659
  Forest: 0.7304

Consumer's Accuracy (Precision) per class:
  Water: 0.9615
  Wetland: 0.9200
  Urban: 0.7114
  Cropland: 0.8236
  Grassland: 0.3750
  Forest: 0.8045


In [13]:
# Display confusion matrix
cm_array = confusion_matrix.getInfo()
cm_df = pd.DataFrame(
    cm_array,
    index=[f"True: {n}" for n in class_names],
    columns=[f"Pred: {n}" for n in class_names],
)
print("Confusion Matrix:")
cm_df

Confusion Matrix:


| | Pred: Water | Pred: Wetland | Pred: Urban | Pred: Cropland | Pred: Grassland | Pred: Forest |
|---|---|---|---|---|---|---|
| True: Water | 50 | 0 | 4 | 1 | 0 | 3 |
| True: Wetland | 0 | 23 | 1 | 12 | 1 | 3 |
| True: Urban | 0 | 0 | 143 | 76 | 1 | 19 |
| True: Cropland | 0 | 2 | 16 | 929 | 4 | 16 |
| True: Grassland | 0 | 0 | 22 | 52 | 6 | 11 |
| True: Forest | 2 | 0 | 15 | 58 | 4 | 214 |
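It is worth knowing how the summary metrics fall out of this matrix. The sketch below recomputes them with NumPy from the counts shown above (rows are true classes, columns are predictions). It is an independent cross-check using the standard formulas, not a description of GEE's internal implementation.

```python
import numpy as np

class_names = ["Water", "Wetland", "Urban", "Cropland", "Grassland", "Forest"]

# Confusion matrix from the validation set: rows = true class, columns = predicted class
cm = np.array(
    [
        [50, 0, 4, 1, 0, 3],
        [0, 23, 1, 12, 1, 3],
        [0, 0, 143, 76, 1, 19],
        [0, 2, 16, 929, 4, 16],
        [0, 0, 22, 52, 6, 11],
        [2, 0, 15, 58, 4, 214],
    ]
)

n = cm.sum()
row_totals = cm.sum(axis=1)  # number of true samples per class
col_totals = cm.sum(axis=0)  # number of predictions per class
correct = np.diag(cm)

overall_accuracy = correct.sum() / n

# Cohen's kappa: agreement beyond what the row/column totals would give by chance
expected_agreement = (row_totals * col_totals).sum() / n**2
kappa = (overall_accuracy - expected_agreement) / (1 - expected_agreement)

producers = correct / row_totals  # recall: correct / all true samples of that class
consumers = correct / col_totals  # precision: correct / all predictions of that class

print(f"Overall accuracy: {overall_accuracy:.4f}")
print(f"Kappa:            {kappa:.4f}")
for name, p, c in zip(class_names, producers, consumers):
    print(f"  {name:10s} producer's={p:.4f}  consumer's={c:.4f}")
```

Reading along the Grassland row shows where its low recall comes from: most true grassland samples are predicted as Cropland or Urban.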


## 5. Classify the Full Image

Now we apply our trained classifier to the entire Vienna scene. GEE handles this efficiently—classification happens server-side and only the rendered tiles are transferred to your browser.

In [14]:
# Classify the full image
classified = composite.select(feature_bands).classify(classifier)

In [15]:
# Create a new map with classification results
Map2 = geemap.Map()
Map2.centerObject(vienna_bbox, 10)

# Class visualization parameters
class_palette = [
    "0064c8",  # Water - blue
    "7a7aff",  # Wetland - light purple
    "e60000",  # Urban - red
    "ffd37f",  # Cropland - yellow
    "a8e600",  # Grassland - light green
    "267300",  # Forest - dark green
]

class_vis = {"min": 0, "max": 5, "palette": class_palette}

# Add layers
rgb_vis = {"bands": ["red", "green", "blue"], "min": 0, "max": 3000}
Map2.addLayer(composite, rgb_vis, "Sentinel-2 RGB")
Map2.addLayer(classified, class_vis, "Land Cover Classification")

# Add legend
legend_dict = {
    "Water": "0064c8",
    "Wetland": "7a7aff",
    "Urban": "e60000",
    "Cropland": "ffd37f",
    "Grassland": "a8e600",
    "Forest": "267300",
}
Map2.add_legend(title="Land Cover", legend_dict=legend_dict)

Map2

## 6. Feature Importance

Random Forest provides feature importance through `classifier.explain()`. This tells us which bands and indices contributed most to the classification. Recall from last week that this impurity-based measure is distinct from permutation importance or SHAP, and that it can be misleading for high-cardinality and/or multicollinear features.

In [16]:
# Get classifier explanation
explanation = classifier.explain().getInfo()

# Extract feature importance
importance = explanation.get("importance", {})
importance_df = pd.DataFrame(
    {"Feature": list(importance.keys()), "Importance": list(importance.values())}
).sort_values("Importance", ascending=False)

print("Feature Importance (Mean Decrease in Impurity):")
print("=" * 45)
for _, row in importance_df.iterrows():
    bar = "█" * int(row["Importance"] / importance_df["Importance"].max() * 30)
    print(f"{row['Feature']:8s} {row['Importance']:8.4f} {bar}")

Feature Importance (Mean Decrease in Impurity):
swir2    183.0310 ██████████████████████████████
ndvi     159.8957 ██████████████████████████
nir      144.5961 ███████████████████████
swir1    144.1539 ███████████████████████
mndwi    130.0786 █████████████████████
red      117.9535 ███████████████████
blue     112.9113 ██████████████████
green    112.7990 ██████████████████
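The raw importance values are on an arbitrary scale, so they are easier to compare as shares of the total. The sketch below does that with pandas, using the values printed above. The normalization is our own convenience step, not something `explain()` returns.

```python
import pandas as pd

# Importance values copied from the output above
importance = {
    "swir2": 183.0310,
    "ndvi": 159.8957,
    "nir": 144.5961,
    "swir1": 144.1539,
    "mndwi": 130.0786,
    "red": 117.9535,
    "blue": 112.9113,
    "green": 112.7990,
}

importance_df = pd.DataFrame(
    {"Feature": list(importance.keys()), "Importance": list(importance.values())}
).sort_values("Importance", ascending=False)

# Express each feature's importance as a percentage of the total
total = importance_df["Importance"].sum()
importance_df["Share (%)"] = 100 * importance_df["Importance"] / total

print(importance_df.to_string(index=False, float_format="%.2f"))
```

Seen this way, the spread between the top and bottom feature is fairly modest, which is what you would expect when several bands carry overlapping information.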


## 7. Austria-Wide Comparison with ESA WorldCover

Earth Engine's major advantage is its scalability. Here, we'll apply our trained classifier to all of Austria and compare the output to ESA WorldCover. Notice how quickly this runs: applying the classifier and rendering the results interactively should take less than 30 seconds. How do the results compare to WorldCover? Recall that the classes aren't exactly the same, and that our model is admittedly a toy trained only on Vienna, not the real deal. Still... not bad, all things considered.
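Because the two legends differ, any numeric comparison first needs a crosswalk between them. The sketch below shows one possible way to do that on small NumPy arrays. The mapping in `WC_TO_OURS` is our own assumption (for example, it folds WorldCover's tree cover and shrubland into our "Forest / shrub" class), the pixel values are made up, and WorldCover classes with no counterpart in our legend are simply excluded from the comparison.

```python
import numpy as np

# ASSUMED crosswalk: WorldCover code -> our class ID (not an official mapping)
WC_TO_OURS = {
    10: 5,  # Tree cover -> Forest / shrub
    20: 5,  # Shrubland  -> Forest / shrub
    30: 4,  # Grassland  -> Grassland
    40: 3,  # Cropland   -> Cropland
    50: 2,  # Built-up   -> Urban / built-up
    80: 0,  # Water      -> Water
    90: 1,  # Wetland    -> Wetland
}
NO_MATCH = -1  # bare/sparse, snow/ice, mangroves, moss/lichen have no counterpart


def crosswalk(wc_codes):
    """Translate WorldCover codes to our class IDs, using NO_MATCH where none applies."""
    wc_codes = np.asarray(wc_codes)
    out = np.full(wc_codes.shape, NO_MATCH, dtype=int)
    for code, class_id in WC_TO_OURS.items():
        out[wc_codes == code] = class_id
    return out


def agreement(ours, wc_codes):
    """Fraction of comparable pixels where both maps assign the same class."""
    ours = np.asarray(ours)
    wc = crosswalk(wc_codes)
    comparable = wc != NO_MATCH
    return float((ours[comparable] == wc[comparable]).mean())


# Hypothetical pixels: our predictions and the WorldCover codes at the same locations
ours = np.array([5, 3, 3, 2, 0, 4, 5])
wc = np.array([10, 40, 30, 50, 80, 30, 60])

print("Crosswalked WorldCover:", crosswalk(wc).tolist())
print(f"Agreement on comparable pixels: {agreement(ours, wc):.3f}")
```

A simple agreement rate like this measures consistency between two maps, not accuracy, since WorldCover is itself a model output.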

In [20]:
# Austria bounding box
austria_bbox = ee.Geometry.Rectangle([9.5, 46.3, 17.2, 49.0])

# Load ESA WorldCover 2020 for Austria
worldcover = ee.Image("ESA/WorldCover/v100/2020").clip(austria_bbox)

# Classify all of Austria using our Vienna-trained model
austria_composite = (
    ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
    .filterBounds(austria_bbox)
    .filterDate("2020-03-01", "2020-03-31")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 30))
    .map(mask_clouds)
    .select(bands, band_names)
    .median()
    .clip(austria_bbox)
)

# Add indices
ndvi_austria = austria_composite.normalizedDifference(["nir", "red"]).rename("ndvi")
mndwi_austria = austria_composite.normalizedDifference(["green", "swir1"]).rename(
    "mndwi"
)
austria_composite = austria_composite.addBands([ndvi_austria, mndwi_austria])

# Classify Austria with Vienna-trained model
austria_classified = austria_composite.select(feature_bands).classify(classifier)

In [21]:
# Comparison map
Map3 = geemap.Map()
Map3.centerObject(austria_bbox, 7)

# Our classification
our_palette = [
    "0064c8",  # Water
    "7a7aff",  # Wetland
    "e60000",  # Urban
    "ffd37f",  # Cropland
    "a8e600",  # Grassland
    "267300",  # Forest
]
our_vis = {"min": 0, "max": 5, "palette": our_palette}

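# WorldCover visualization (using their official palette)
# WorldCover class codes are not evenly spaced (..., 80, 90, 95, 100), so a linear
# min/max stretch over 10-100 would not land each code on its own palette color.
# Remap the codes to 0-10 first so each class gets exactly one palette entry.
wc_values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100]
wc_palette = [
    "006400",  # Tree cover
    "ffbb22",  # Shrubland
    "ffff4c",  # Grassland
    "f096ff",  # Cropland
    "fa0000",  # Built-up
    "b4b4b4",  # Bare/sparse
    "f0f0f0",  # Snow/ice
    "0064c8",  # Water
    "0096a0",  # Wetland
    "00cf75",  # Mangroves
    "fae6a0",  # Moss/lichen
]
wc_vis = {"min": 0, "max": 10, "palette": wc_palette}
worldcover_display = worldcover.remap(wc_values, list(range(len(wc_values))))


# Add layers
Map3.addLayer(austria_classified, our_vis, "Our Classification")
Map3.addLayer(worldcover_display, wc_vis, "ESA WorldCover 2020", shown=False)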

# Add legends
our_legend = {
    "Water": "0064c8",
    "Wetland": "7a7aff",
    "Urban": "e60000",
    "Cropland": "ffd37f",
    "Grassland": "a8e600",
    "Forest": "267300",
}

wc_legend = {
    "Tree cover": "006400",
    "Shrubland": "ffbb22",
    "Grassland": "ffff4c",
    "Cropland": "f096ff",
    "Built-up": "fa0000",
    "Water": "0064c8",
    "Wetland": "0096a0",
}

Map3.add_legend(title="Our Classes", legend_dict=our_legend, position="bottomleft")
Map3.add_legend(title="WorldCover", legend_dict=wc_legend, position="bottomright")

Map3
