<a href="https://colab.research.google.com/github/werowe/deepstate-map-data/blob/main/class/annotated_DeepStateMap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This document explains the logic and techniques used in the Jupyter Notebook.

The explanation is followed by the heavily annotated version of the notebook, annotated\_DeepStateMap.ipynb.

### **Part 1: Detailed Code Explanation**

This notebook performs a sophisticated geospatial Extract-Transform-Load (ETL) process. It scrapes live war-zone data, cleanses geometry, verifies data freshness, validates topology, and exports the result.

#### **1\. Data Extraction (The API Hack)**

The script bypasses the DeepStateMap frontend and calls the backend API that the site itself uses.

* **Discovery:** The markdown explains how to use the browser's Network tab to find the API calls (XHR/Fetch) that the page makes in the background.  
* **Request:** It uses requests.get() to call https://deepstatemap.live/api/history/last.  
* **Significance:** Instead of scraping HTML (which is messy), this retrieves the raw GeoJSON data used to render the map, so the geometry arrives exactly as the site itself uses it.

#### **2\. Geometry Flattening (The 3D to 2D Fix)**

The raw data comes in (Longitude, Latitude, Altitude) or (X, Y, Z) format.

* **The Problem:** Most 2D mapping libraries (like Matplotlib or standard Shapefiles) struggle with the Z-axis (Altitude), or it causes errors during topological operations (like intersections).  
* **The Solution:** The code uses a serialization round trip:

```python
wkt.loads(wkt.dumps(shape(geom), output_dimension=2))
```

It converts the shape to a text representation (WKT), forces it to 2 dimensions (dropping Z), and converts it back to a geometric object.
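To see what the round trip achieves without any geometry library, the third coordinate can also be stripped from the GeoJSON coordinate arrays directly. The sketch below is an alternative illustration, not the notebook's method; the helper name `drop_z` and the sample geometry are made up for the example.

```python
def drop_z(coords):
    """Recursively keep only (x, y) in nested GeoJSON coordinate arrays."""
    # A position is a list of numbers: keep the first two (lon, lat).
    if coords and isinstance(coords[0], (int, float)):
        return list(coords[:2])
    # Otherwise this is a list of positions, rings, or polygons: recurse.
    return [drop_z(part) for part in coords]


# Hypothetical 3D polygon in GeoJSON form (lon, lat, altitude).
geometry_3d = {
    "type": "Polygon",
    "coordinates": [
        [[37.0, 48.0, 0.0], [37.1, 48.0, 0.0], [37.1, 48.1, 0.0], [37.0, 48.0, 0.0]]
    ],
}

geometry_2d = {
    "type": geometry_3d["type"],
    "coordinates": drop_z(geometry_3d["coordinates"]),
}
print(geometry_2d["coordinates"][0][0])  # [37.0, 48.0]
```

The WKT round trip used in the notebook has the advantage of working on any Shapely geometry type without custom recursion.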

#### **3\. Data Parsing & Translation**

The name field in the raw data contains a string separated by /// (e.g., "Ukrainian Text /// English Text /// Code").

* **Logic:** A custom function extract\_first\_part splits this string and grabs index 1 (the English text) to make the dataset internationally readable.
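
Indexing the split result directly raises an `IndexError` if a name happens to contain no separator. A slightly more defensive variant could look like the sketch below; the function name `english_label` and its fallback behavior are assumptions for illustration, not part of the notebook.

```python
def english_label(name, sep="///", index=1):
    """Return the part at `index` of a separator-delimited name, or the whole name."""
    parts = [part.strip() for part in name.split(sep)]
    # Fall back to the full, stripped string when the expected part is missing.
    return parts[index] if len(parts) > index else name.strip()


print(english_label("Ukrainian Text /// English Text /// Code"))  # English Text
print(english_label("No separator here"))                         # No separator here
```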

#### **4\. Geospatial Filtering**

The dataset contains mixed types: Points (cities, events) and Polygons (territories).

* **Type Filtering:** gdf.geometry.apply(lambda x: isinstance(x, Polygon)) creates a mask to keep only territorial shapes, discarding points.  
* **Attribute Filtering:** It filters the dataframe for specific keys: \['CADR and CALR', 'Occupied', 'Occupied Crimea'\]. This isolates the Russian-controlled territories from liberated or contested zones.
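
The two filters combine into a single predicate: the geometry must be a polygon and the name must be one of the target keys. The sketch below expresses that on plain GeoJSON-style dictionaries rather than on a GeoDataFrame; `select_occupied` and the three sample features are hypothetical.

```python
TARGETS = {"CADR and CALR", "Occupied", "Occupied Crimea"}


def select_occupied(features, targets=TARGETS):
    """Keep Polygon features whose (already translated) name is in `targets`."""
    return [
        f for f in features
        if f["geometry"]["type"] == "Polygon" and f["name"] in targets
    ]


# Hypothetical, heavily simplified features.
features = [
    {"name": "Occupied", "geometry": {"type": "Polygon", "coordinates": []}},
    {"name": "Occupied", "geometry": {"type": "Point", "coordinates": [37.5, 48.2]}},
    {"name": "Liberated", "geometry": {"type": "Polygon", "coordinates": []}},
]
print([f["name"] for f in select_occupied(features)])  # ['Occupied']
```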

#### **5\. Data Verification (The "Contains" Check)**

To ensure the map isn't stale, the script performs a "Point-in-Polygon" test.

* **Control Point:** A hardcoded coordinate (\_latest\_rus\_advance) is created at the location of a very recent frontline change.  
* **Logic:** occupied\_ua\_gdf\_raw.contains(point) tests the point against each downloaded polygon and returns one boolean per polygon. If any of them is True, the data already includes that advance and is treated as current.
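
Conceptually, a point-in-polygon test can be done by casting a horizontal ray from the point and counting how many polygon edges it crosses. The sketch below shows that idea in plain Python; it is not how Shapely implements `contains` internally, it ignores holes and does not treat boundary points specially, and the square ring is a made-up example.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the closed ring of (x, y) vertices?"""
    inside = False
    # Walk each edge (x1, y1) -> (x2, y2) of the closed ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point; odd means inside.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square around the control point (lon, lat order, as in Shapely).
ring = [(37.4, 48.1), (37.6, 48.1), (37.6, 48.3), (37.4, 48.3), (37.4, 48.1)]
print(point_in_ring(37.505, 48.212, ring))  # True
print(point_in_ring(38.000, 48.212, ring))  # False
```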

#### **6\. Topological Merging & Repair (The Buffer Trick)**

This is the most advanced part of the script.

* **Union:** union\_all() merges hundreds of small polygons into one giant shape (a MultiPolygon).  
* **Artifacts:** Merging often creates "slivers" or "ghost gaps" due to floating-point math errors at polygon boundaries.  
* **The Fix (Buffer/Debuffer):**  
  * **Buffer (+$\\epsilon$):** Expands the shape outward by a tiny distance ($0.000009$ degrees, roughly one meter of latitude). This forces edges to overlap and snaps gaps shut.  
  * **Buffer (-$\\epsilon$):** Shrinks the shape back by the same amount.  
  * **Result:** The general shape is preserved, but internal artifacts/holes are healed.
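
The same expand-then-shrink idea can be shown in one dimension, where shapes are intervals and slivers are tiny gaps between them. The sketch below is only an analogy for the buffer/debuffer step, written in plain Python; `close_gaps` and the sample segments are hypothetical and no Shapely calls are involved.

```python
def close_gaps(intervals, eps):
    """Fuse 1D intervals separated by gaps of at most 2 * eps, keeping their extent."""
    if not intervals:
        return []
    # Buffer (+eps): grow every interval on both sides.
    grown = sorted((lo - eps, hi + eps) for lo, hi in intervals)
    merged = [grown[0]]
    for lo, hi in grown[1:]:
        last_lo, last_hi = merged[-1]
        if lo <= last_hi:
            # The grown intervals overlap, so fuse them into one.
            merged[-1] = (last_lo, max(last_hi, hi))
        else:
            merged.append((lo, hi))
    # Buffer (-eps): shrink the fused intervals back by the same amount.
    return [(lo + eps, hi - eps) for lo, hi in merged]


# Two segments separated by a sliver of 0.000001, plus a distant third one.
segments = [(0.0, 1.0), (1.000001, 2.0), (5.0, 6.0)]
healed = close_gaps(segments, eps=0.000009)
print(len(healed))  # 2
```

The sliver between the first two segments disappears, while the outer extent of each fused piece is unchanged up to floating-point rounding and the distant segment stays separate.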

#### **7\. Validation & CRS Projection**

* **Projection:** Coordinates are transformed from EPSG:4326 (longitude/latitude, measured in degrees) to EPSG:9835 (a local projection for Donetsk/Ukraine, measured in meters), because areas computed in degrees are not physically meaningful.  
* **Area Check:** It calculates the area of the original shape vs. the healed shape. The difference is 0.000027%, which indicates that the repair altered the geometry only microscopically (removing artifacts) without distorting it.
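
Written out, the check is a relative difference between the two projected areas. The symbols $A_{\text{raw}}$ (area before the repair) and $A_{\text{clean}}$ (area after it) are introduced here only for illustration:

```latex
\Delta_{\%} = \left| \frac{A_{\text{clean}} - A_{\text{raw}}}{A_{\text{raw}}} \right| \times 100
```

The notebook treats a value below 0.01 as a pass.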

---

### **Part 2: Annotated Notebook**

The content below is that of annotated\_DeepStateMap.ipynb. It contains the original code, broken down with Markdown cells that explain every step in detail, and inline comments for complex lines.

# DeepStateMap Geospatial ETL Pipeline

## 1. Environment Setup
Importing necessary libraries for HTTP requests (`requests`), handling geospatial dataframes (`geopandas`), geometric manipulation (`shapely`), and visualization (`matplotlib`).

In [None]:
import requests
import geopandas as gpd
from shapely.geometry import shape
from shapely.geometry import Polygon
from shapely.geometry import Point
from shapely import wkt
from shapely.geometry import JOIN_STYLE
import matplotlib.pyplot as plt

## 2. API Extraction Strategy
Instead of scraping HTML, we target the backend API directly. This provides the raw GeoJSON data used by the website to render the map.

**Note:** This endpoint (`api/history/last`) retrieves the current state of the map.

In [None]:
url = 'https://deepstatemap.live/api/history/last'

# Execute GET request to fetch raw map state (the timeout avoids hanging indefinitely)
response = requests.get(url, timeout=30)

# Check HTTP 200 (Success)
response.status_code

## 3. Data Exploration & Parsing
We parse the JSON response. The top-level object has keys such as `id` and `map`; the `features` list under `map` contains the actual geometric data.

In [None]:
deep_state_data_raw = response.json()

# Inspect top-level keys to ensure structure matches expectations
print(deep_state_data_raw.keys())
print(deep_state_data_raw['map'].keys())
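
Because the rest of the pipeline depends on the `map` -> `features` nesting, it can help to fail early with a clear message if the API ever changes shape. The guard below is a minimal sketch; `get_features` and the miniature `sample` payload are hypothetical, and only the `id`, `map`, and `features` keys are taken from the notebook.

```python
def get_features(payload):
    """Return payload['map']['features'], raising a clear error if the shape differs."""
    try:
        features = payload["map"]["features"]
    except (KeyError, TypeError) as exc:
        raise ValueError("unexpected payload: missing map -> features") from exc
    if not isinstance(features, list):
        raise ValueError("unexpected payload: 'features' is not a list")
    return features


# Hypothetical miniature payload with the same nesting as the API response.
sample = {"id": 1, "map": {"features": []}}
print(len(get_features(sample)))  # 0
```

In the notebook this would be called as `get_features(deep_state_data_raw)` before the flattening loop.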

## 4. Geometry Flattening (3D to 2D)
**Critical Step:** The API returns coordinates in `(x, y, z)` format (Longitude, Latitude, Altitude). Many 2D tools struggle with Z-coordinates, and they can cause errors in topological operations such as intersections.

We loop through the features and use `shapely.wkt` to serialize the shape to text while forcing `output_dimension=2`, effectively dropping the Z-axis.

In [None]:
geo_list = []

for f in deep_state_data_raw['map']['features']:
    geom = f['geometry']
    name = f['properties']['name']

    # Convert raw dict to Shape object -> Dump to WKT (Text) forcing 2D -> Load back to Shape
    # This effectively strips the 'Z' coordinate
    clean_geometry = wkt.loads(wkt.dumps(shape(geom), output_dimension=2))

    new_feature = {
      "name": name,
      "geometry": clean_geometry
    }

    geo_list.append(new_feature)

# Check total count of features retrieved
print(f"Total features extracted: {len(geo_list)}")

## 5. Metadata Translation
The `name` field is formatted as `Ukrainian /// English /// Code`. We create a utility to split this string and retain only the English label.

In [None]:
def extract_first_part(name, part=0):
    """Split a DeepState name on '///' and return the stripped part at index `part`."""
    # Despite the function name, any index can be requested (0 = Ukrainian, 1 = English)
    return name.split('///')[part].strip()

# Apply translation to all items in the list (Index 1 = English)
for item in geo_list:
    item['name'] = extract_first_part(item['name'], part=1)

# Verify the first item is now English
print(geo_list[0])

## 6. GeoDataFrame Construction
We convert the list of dicts into a `GeoDataFrame`. We explicitly set the CRS (Coordinate Reference System) to **EPSG:4326** (WGS84 - Standard Latitude/Longitude).

In [None]:
raw_deepstatemap_gdf = gpd.GeoDataFrame(geo_list)
raw_deepstatemap_gdf = raw_deepstatemap_gdf.set_crs(4326)

# Visual inspection of the raw data
raw_deepstatemap_gdf.plot()
plt.title("Raw Extracted Features")
plt.show()

## 7. Data Cleaning & Filtering
The dataset contains mixed geometries (Points for events/cities, Polygons for territories). We filter to keep **only Polygons**.

In [None]:
# Filter: Keep rows where geometry is a Polygon
deepstatemap_gdf = raw_deepstatemap_gdf[
    raw_deepstatemap_gdf.geometry.apply(lambda x: isinstance(x, Polygon))
].copy()

print(f"Polygons remaining: {len(deepstatemap_gdf)}")

## 8. Identifying Occupied Territories
We keep only the names associated with Russian occupation (`CADR and CALR`, `Occupied`, `Occupied Crimea`). This discards 'Liberated' or 'Grey Zone' areas.

In [None]:
targets = ['CADR and CALR', 'Occupied', 'Occupied Crimea']

occupied_ua_gdf_raw = deepstatemap_gdf[
    deepstatemap_gdf['name'].isin(targets)
].copy().reset_index()

occupied_ua_gdf_raw.plot(cmap='viridis')
plt.title("Filtered Occupied Territories")
plt.show()

## 9. Data Freshness Verification (Point-in-Polygon)
To validate the data is not stale, we define a point (`_latest_rus_advance`) known to be recently occupied.

We use the `.contains()` method. If the point lies inside our polygon set, the map data includes the latest updates.

In [None]:
# Define a control point (Shapely order: x = longitude, y = latitude)
_latest_rus_advance = Point(37.50500679016114, 48.21177662359289)

# Check if this point exists within any of the occupied polygons
is_up_to_date = occupied_ua_gdf_raw.contains(_latest_rus_advance).any()

print(f"Data Freshness Confirmed: {is_up_to_date}")

# Visual Verification
fig, ax = plt.subplots()
occupied_ua_gdf_raw.plot(ax=ax, alpha=0.5)
ax.scatter(_latest_rus_advance.x, _latest_rus_advance.y, color='red', marker='o', label='Latest Advance')

# Zoom in on the point
buffer = 0.05
ax.set_xlim(_latest_rus_advance.x - buffer, _latest_rus_advance.x + buffer)
ax.set_ylim(_latest_rus_advance.y - buffer, _latest_rus_advance.y + buffer)
plt.legend()
plt.show()

## 10. Topology Repair (The Buffer Trick)
We merge all individual polygons into a single geometry (typically a `MultiPolygon`) using `union_all()`. However, floating-point rounding often leaves microscopic gaps (slivers) between polygons that should touch.

**The Fix:**
1. **Buffer (+epsilon):** Expand the shape slightly so neighboring edges overlap and the gaps close.
2. **Buffer (-epsilon):** Shrink the shape back by the same amount to restore its original size.

We use `JOIN_STYLE.mitre` to preserve sharp corners.

In [None]:
# 1. Create a single fused geometry
occupied_ua_gsr = gpd.GeoSeries(occupied_ua_gdf_raw.union_all(), crs=4326)

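# 2. Apply Buffer/Debuffer trick to remove artifacts
# The CRS is EPSG:4326, so eps is in degrees (about 1 m of latitude)
eps = 0.000009

occupied_ua = (
    occupied_ua_gsr
    .buffer(eps, 1, join_style=JOIN_STYLE.mitre)  # Expand (second argument is resolution=1)
    .buffer(-eps, 1, join_style=JOIN_STYLE.mitre) # Shrink by the same amount
    .to_crs(4326)  # No-op here: the series is already in EPSG:4326
    .copy()
)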

# Plot boundaries to ensure no internal artifacts remain
occupied_ua.boundary.plot()
plt.title("Cleaned Geometry Boundaries")
plt.show()
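
Because the buffer is applied while the data is still in EPSG:4326, `eps` is an angle, and the same number of degrees covers less ground east-west than north-south. The sketch below estimates both distances using a spherical Earth approximation; the helper `degrees_to_meters`, the radius constant, and the sample latitude of 48.2 (taken from the control point above) are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def degrees_to_meters(deg, latitude_deg):
    """Approximate ground length of `deg` degrees along each axis at a latitude."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = deg * meters_per_degree
    # Meridians converge toward the poles, so east-west distances shrink by cos(lat).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = degrees_to_meters(0.000009, latitude_deg=48.2)
print(f"{ns:.2f} m north-south, {ew:.2f} m east-west")
```

At this latitude the buffer is roughly 1 m north-south and about two thirds of that east-west, which is small enough to close slivers without visibly moving the frontline.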

## 11. Statistical Validation
Did the buffer trick distort the map? We check the area difference.

**Note:** We calculate area in `EPSG:9835` (UCS-2000), a projected system in meters suited to the Donetsk region, because areas computed directly from Lat/Lon degrees are not physically meaningful.

In [None]:
# Calculate area difference between raw and cleaned versions using local projection
area_cleaned = occupied_ua.to_crs(9835).area.sum()
area_raw = occupied_ua_gsr.to_crs(9835).area.sum()

pct_diff = abs((area_cleaned - area_raw) / area_raw) * 100

print(f"Area Difference: {pct_diff:.6f}%")
if pct_diff < 0.01:
    print("Validation Passed: Topology repair did not distort geometry.")

## 12. Export
Saving the finalized, cleaned, and verified geometry to GeoJSON.

In [None]:
occupied_ua.to_crs(4326).to_file("occupied-areas-ua.geojson", driver="GeoJSON")
print("Export Complete.")