# Learning GeoPandas

## Data Preparation

### 1. Load Data
- Load two datasets:
  - `price`: dataset with price data.
  - `districts`: dataset with municipal district boundaries.

### 2. Clean Data
- Replace `"N"` in the `CENA` column with `None` and convert to `float`.
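
The cleaning step can be tried on a toy column before touching the shapefile. This is a minimal sketch, assuming `CENA` holds numeric strings with `"N"` as the only placeholder; the sample values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the CENA column: numeric strings plus the "N" placeholder.
raw = pd.Series(["1200", "N", "850", "N", "4300"], name="CENA")

# Blank out the placeholder (mask inserts missing values where the condition holds),
# then parse the remaining strings as numbers.
cleaned = pd.to_numeric(raw.mask(raw == "N"))

print(cleaned)
print("missing values:", cleaned.isna().sum())
```

Missing prices are skipped by default in later aggregations such as `.mean()`, so the placeholder rows do not distort the district averages.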


In [None]:
import pandas as pd
import geopandas as gpd

# Forward slashes work on Windows, Linux, and macOS alike
price_file_path = "data/SED_CenovaMapa_p.shp"
districts_file_path = "data/MAP_MESTSKECASTI_P.shp"

# Load datasets
price = gpd.read_file(price_file_path)
districts = gpd.read_file(districts_file_path)

# Replace the 'N' placeholder with NaN, then convert the column to float
price['CENA'] = price['CENA'].replace('N', float('nan')).astype(float)

# Display the first few rows of the cleaned price dataset
# (only the last expression in a cell renders automatically, so call display() here)
display(price.head())

# Display the first few rows of the districts dataset
districts.head()

## Map Making

### 1. Visualize Data
- Create a map showing:
  - Price distribution.
  - Overlay of district boundaries.

### 2. Map Requirements
- Use different colors for boundaries and prices.
- Include a legend.
- Apply **CartoDB Voyager** or **CartoDB Dark Matter** basemap.


In [None]:

# Interactive map using GeoDataFrame.explore()

# Plot the price data on the CartoDB Voyager basemap
m = price.explore(
    column="CENA",
    cmap="viridis",
    tiles="CartoDB Voyager",
    legend=True,
    legend_kwds={
        'caption': 'Price Legend',
        'colorbar': True,
        'orientation': 'vertical',
        'interval': True
    },
    style_kwds=dict(color="white", weight=1, opacity=0.6),
)

# Add district boundaries as unfilled outlines, in a color distinct from the price layer
districts.explore(
    m=m,
    color="black",
    style_kwds=dict(fill=False, weight=1, opacity=0.8),
    popup=True,
    # popup_kwds={
    #     'fields': ['DAT_VZNIK', 'DAT_ZMENA', 'PLOCHA', 'POSKYT', 'NAZEV_MC'],
    #     'aliases': ['Date of Creation', 'Date of Change', 'Area', 'Provider', 'District Name'],
    # }
)


## Measuring

### 1. Geometric Analysis
- Create convex hulls around each polygon in `price`.
- Calculate the area of these convex hulls.
- Filter convex hulls within the 40th to 60th percentile and save them.
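
The three steps above can be sketched on toy geometries. This is an illustration under assumptions, not the notebook's data: the shapes and the `name` column are invented, and areas come out in squared CRS units, so a projected CRS is assumed for real data.

```python
import geopandas as gpd
from shapely.geometry import Polygon, box

# Toy data: one L-shaped polygon plus squares with side lengths 1 to 4.
l_shape = Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])
toy = gpd.GeoDataFrame(
    {"name": ["L", "s1", "s2", "s3", "s4"]},
    geometry=[l_shape] + [box(0, 0, s, s) for s in (1, 2, 3, 4)],
)

# Build a frame whose active geometry is the convex hull of each polygon.
hulls = toy.copy()
hulls["geometry"] = toy.geometry.convex_hull
hulls["convex_hull_area"] = hulls.geometry.area

# The hull of a concave shape is larger than the shape itself.
print(toy.geometry.area.iloc[0], hulls["convex_hull_area"].iloc[0])

# Keep hulls whose area lies between the 40th and 60th percentiles (inclusive).
lower_bound = hulls["convex_hull_area"].quantile(0.40)
upper_bound = hulls["convex_hull_area"].quantile(0.60)
middle = hulls[hulls["convex_hull_area"].between(lower_bound, upper_bound)]

print(middle["name"].tolist())
```

Because `middle` carries the hulls as its geometry, it could then be written out with `GeoDataFrame.to_file` to satisfy the "save them" requirement.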

### 2. Multi-layer Map
- Color smaller areas differently from larger ones.
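
One possible alternative to stacking separate layers is to label each row with a size class and color by that label. The sketch below uses invented areas and a hypothetical `size_class` column name; it only shows the labelling step.

```python
import pandas as pd

# Toy hull areas; in the notebook these would come from price["convex_hull_area"].
areas = pd.Series([1.0, 3.5, 4.0, 9.0, 16.0], name="convex_hull_area")

# Threshold at the 40th percentile, as for the "smallest" group.
lower_bound = areas.quantile(0.40)

# Label each row instead of splitting the frame into separate layers.
size_class = areas.lt(lower_bound).map({True: "smaller", False: "larger"})
size_class.name = "size_class"  # hypothetical column name

print(size_class.tolist())
```

Assigned back to the GeoDataFrame, such a column could be passed to `explore(column="size_class")`, which treats non-numeric columns as categorical and gives each class its own color and legend entry.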


In [None]:
# Create convex hull
price['convex_hull'] = price.geometry.convex_hull

# Calculate the area of convex hulls
price['convex_hull_area'] = price['convex_hull'].area

# Compute the 40th and 60th percentile values
lower_bound = price['convex_hull_area'].quantile(0.40)
upper_bound = price['convex_hull_area'].quantile(0.60)
average = price[(price['convex_hull_area'] >= lower_bound) & (price['convex_hull_area'] <= upper_bound)]

smallest = price[price['convex_hull_area'] < lower_bound]
rest = price[price['convex_hull_area'] >= lower_bound]


# Create map
m = price.explore(
    column='convex_hull_area',
    cmap='coolwarm',
    tiles='CartoDB Voyager',
    legend=True,
    legend_kwds={
        'caption': 'Convex Hull Area',
        'colorbar': True,
        'orientation': 'vertical'
    },
    style_kwds=dict(color='white', weight=1, opacity=0.6)
)

# Add district boundaries
districts.explore(
    m=m,
    color='black',
    style_kwds=dict(weight=1, opacity=0.8)
)

# Add the smallest areas (below the 40th percentile) to the map
smallest.explore(
    m=m,
    color='red',
    style_kwds=dict(weight=0.5, opacity=0.6, color='white'),
    legend=False
)

# Add the remaining areas to the map
rest.explore(
    m=m,
    color='gray',
    style_kwds=dict(weight=1, opacity=0.2),
    legend=False
)

## Joining

### 1. Merge GeoDataFrames
- Use `.sjoin()` or `.overlay()` to join `price` and `districts`.

### 2. Data Analysis
- Is the mean price higher in **Praha 3** or **Praha 6**?
- Which district is the **cheapest**?
- What is the difference between the cheapest and most expensive districts?

### 3. Additional Info
- District names are in the `"NAZEV_1"` column.
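
A spatial join followed by a group-wise mean can be sketched on toy data before running it on the real layers. Everything below is invented for illustration: two square "districts" named `A` and `B`, and four price polygons, one of which straddles the shared border.

```python
import geopandas as gpd
from shapely.geometry import box

# Two adjacent toy districts sharing the edge x = 10.
districts_toy = gpd.GeoDataFrame(
    {"NAZEV_MC": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
)

# Four toy price polygons; the last one crosses the district border.
price_toy = gpd.GeoDataFrame(
    {"CENA": [100.0, 300.0, 50.0, 400.0]},
    geometry=[box(1, 1, 2, 2), box(3, 3, 4, 4), box(12, 1, 13, 2), box(9, 5, 11, 6)],
)

# "intersects" matches a polygon with every district it touches,
# so the border polygon appears once per district in the result.
joined = gpd.sjoin(price_toy, districts_toy, how="inner", predicate="intersects")
print(len(price_toy), len(joined))

# Mean price per district, then the cheapest and most expensive ones.
mean_by_district = joined.groupby("NAZEV_MC")["CENA"].mean()
print(mean_by_district)
print(mean_by_district.idxmin(), mean_by_district.idxmax())
print(mean_by_district.max() - mean_by_district.min())
```

The border polygon contributes to both district means here. If each polygon should count only once, a stricter rule may be preferable, for example joining on `representative_point()` with `predicate="within"`, assuming the districts do not overlap.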


In [None]:

# Spatially join price polygons to districts using the "intersects" predicate
price_districts = gpd.sjoin(price, districts, how="inner", predicate="intersects")

# Calculate the mean price per district
mean_price_per_district = price_districts.groupby('NAZEV_MC')['CENA'].mean()

# Determine if the mean price is higher in Praha 3 or Praha 6
mean_price_praha_3 = mean_price_per_district['Praha 3']
mean_price_praha_6 = mean_price_per_district['Praha 6']

district_price_higher = 'Praha 3' if mean_price_praha_3 > mean_price_praha_6 else 'Praha 6'
print(f"The district with the higher mean price is {district_price_higher}.")

# Find the cheapest district
cheapest_district = mean_price_per_district.idxmin()
cheapest_price = mean_price_per_district.min()

# Find the most expensive district
most_expensive_district = mean_price_per_district.idxmax()
most_expensive_price = mean_price_per_district.max()

# Calculate the difference between the cheapest and the most expensive district
price_difference = most_expensive_price - cheapest_price

print(f"The cheapest district is {cheapest_district} with a mean price of {round(cheapest_price, 2)}.")
print(f"The most expensive district is {most_expensive_district} with a mean price of {round(most_expensive_price, 2)}.")
print(f"The difference between the cheapest and the most expensive district is {round(price_difference, 2)}.")
