## Geocode Provinces (Deprecated)

Purpose: Legacy workflow for assigning geographic coordinates (latitude, longitude) to each province using the Nominatim (OpenStreetMap) API. Kept only for historical reference.

- Read the province list from the input CSV.
- Use geocode_location to query Nominatim and retrieve coordinates for each province.
- Handle network failures and empty-result cases.
- Save the results, including province names and coordinates, to vietnam_provinces_geocoded.csv.

### 1. Setup


In [1]:
import pandas as pd
import requests
import time
from pathlib import Path
from tqdm import tqdm

### 2. Paths


In [2]:
notebook_dir = Path.cwd()  # the notebook runs from a subdirectory of the project
project_root = notebook_dir.parent
input_csv_path = project_root / "resources" / "vietnam_provinces.csv"
output_csv_path = project_root / "resources" / "vietnam_provinces_geocoded.csv"

print(f"Input file path: {input_csv_path.resolve()}")
print(f"Output file path: {output_csv_path.resolve()}")

Input file path: /home/tan/geo-weather-lake/resources/vietnam_provinces.csv
Output file path: /home/tan/geo-weather-lake/resources/vietnam_provinces_geocoded.csv


### 3. Nominatim API URL


In [3]:
NOMINATIM_API_URL = "https://nominatim.openstreetmap.org/search"

### 4. Load provinces CSV


In [4]:
df_provinces = pd.read_csv(input_csv_path)
df_provinces

Unnamed: 0,name
0,Thành phố Hải Phòng
1,Thành phố Hồ Chí Minh
2,Cà Mau
3,Gia Lai
4,Đồng Tháp
5,Thành phố Cần Thơ
6,Lâm Đồng
7,An Giang
8,Quảng Ngãi
9,Quảng Trị


### 5. Geocoding function


In [5]:
def geocode_location(query_string: str) -> tuple:
    """Return (lat, lon) for the top Nominatim match, or (None, None) on any failure."""
    params = {
        'q': query_string,
        'format': 'json',
        'addressdetails': 1,
        'limit': 1  # only the best match is used
    }

    # Nominatim's usage policy asks for a User-Agent that identifies the application
    headers = {
        'User-Agent': 'weather-de'
    }

    try:
        # The timeout keeps a stalled connection from hanging the whole batch
        response = requests.get(NOMINATIM_API_URL, params=params, headers=headers, timeout=10)

        if response.status_code == 200:
            results = response.json()
            if results:
                top_result = results[0]
                lat = float(top_result.get('lat'))
                lon = float(top_result.get('lon'))
                return lat, lon
            else:
                print(f"No results found for query: {query_string}")
                return None, None
        else:
            print(f"API request failed for '{query_string}' with status code {response.status_code}")
            return None, None
    except requests.exceptions.RequestException as e:
        # Covers connection errors and timeouts
        print(f"A network error occurred: {e}")
        return None, None

In [6]:
test_province = "Thủ đô Hà Nội"
test_lat, test_lon = geocode_location(test_province)
print(f"Test geocoding for {test_province}: Latitude={test_lat}, Longitude={test_lon}")

Test geocoding for Thủ đô Hà Nội: Latitude=21.0151214, Longitude=105.832535
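
Because geocode_location reports every failure as (None, None), a transient network error costs that province its coordinates. One possible mitigation is a small retry wrapper with exponential backoff. This is a sketch, not part of the original workflow: `geocode_with_retry`, its parameters, and the injectable `sleep` argument are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float]]


def geocode_with_retry(
    geocode: Callable[[str], Coords],
    query: str,
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Coords:
    """Call `geocode` until it returns coordinates or attempts run out."""
    for attempt in range(max_attempts):
        lat, lon = geocode(query)
        if lat is not None and lon is not None:
            return lat, lon
        # Wait 2 s, 4 s, 8 s, ... between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None, None
```

In the notebook this would be called as `geocode_with_retry(geocode_location, query)`. Since empty results and network errors are indistinguishable to the wrapper, a query with no match is retried too, which only costs time.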


### 6. Batch Geocoding


In [7]:
geocoded_res = []
# Iterate over the names directly; the row index is not needed
for province in tqdm(df_provinces['name'], total=len(df_provinces)):
    query = f"{province}, Việt Nam"
    lat, lon = geocode_location(query)
    geocoded_res.append({'latitude': lat, 'longitude': lon})
    time.sleep(1.1)  # stay under Nominatim's limit of one request per second

100%|██████████| 34/34 [01:21<00:00,  2.41s/it]
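
The progress bar shows about 2.4 s per province: the request time plus a fixed 1.1 s pause. The pause exists because the public Nominatim usage policy allows at most one request per second. A throttle that sleeps only for the remainder of the interval would respect the same limit with less idle time. The sketch below is an illustration under that assumption; the `Throttle` class and its injectable `clock` and `sleep` arguments are hypothetical and not part of the original notebook.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=1.1, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In the batch loop, a single `throttle = Throttle()` created before the loop and a `throttle.wait()` call before each geocode_location request would take the place of the fixed sleep.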


In [8]:
df_coords = pd.DataFrame(geocoded_res)
geocoded_res


[{'latitude': 20.8623278, 'longitude': 106.6799266},
 {'latitude': 10.7755254, 'longitude': 106.7021047},
 {'latitude': 9.0180177, 'longitude': 105.0869724},
 {'latitude': 14.0201373, 'longitude': 108.6354524},
 {'latitude': 10.425183, 'longitude': 105.9271362},
 {'latitude': 10.0362046, 'longitude': 105.7872656},
 {'latitude': 11.6614957, 'longitude': 108.1335279},
 {'latitude': 10.3188672, 'longitude': 105.0432488},
 {'latitude': 14.9953739, 'longitude': 108.691729},
 {'latitude': 17.2166964, 'longitude': 106.9548246},
 {'latitude': 20.6065846, 'longitude': 106.2843471},
 {'latitude': 22.7426936, 'longitude': 106.1060926},
 {'latitude': 22.3069302, 'longitude': 104.1829592},
 {'latitude': 20.2421142, 'longitude': 105.9746207},
 {'latitude': 12.2980751, 'longitude': 108.9950386},
 {'latitude': 21.9610968, 'longitude': 105.8440789},
 {'latitude': 11.0800527, 'longitude': 106.2610531},
 {'latitude': 21.2276769, 'longitude': 104.1575944},
 {'latitude': 19.1976001, 'longitude': 105.060676

### 7. Combine results and check failed geocodes


In [9]:
df_geocoded = pd.concat([df_provinces.reset_index(drop=True), df_coords], axis=1)
df_geocoded

Unnamed: 0,name,latitude,longitude
0,Thành phố Hải Phòng,20.862328,106.679927
1,Thành phố Hồ Chí Minh,10.775525,106.702105
2,Cà Mau,9.018018,105.086972
3,Gia Lai,14.020137,108.635452
4,Đồng Tháp,10.425183,105.927136
5,Thành phố Cần Thơ,10.036205,105.787266
6,Lâm Đồng,11.661496,108.133528
7,An Giang,10.318867,105.043249
8,Quảng Ngãi,14.995374,108.691729
9,Quảng Trị,17.216696,106.954825
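
Because the query keeps only the top match (`limit: 1`), a lookup can succeed and still point at the wrong place. A coarse sanity check is to confirm that each coordinate pair falls inside a bounding box around Vietnam. The bounds below are rough, hand-picked assumptions for the mainland, and `in_vietnam_bbox` is a hypothetical helper, so treat this as a sketch and not a precise border test.

```python
# Approximate bounding box for mainland Vietnam (assumed values, deliberately loose).
VN_LAT_RANGE = (8.0, 23.5)
VN_LON_RANGE = (102.0, 110.0)


def in_vietnam_bbox(lat, lon):
    """Return True if (lat, lon) lies inside the rough Vietnam bounding box."""
    if lat is None or lon is None:
        return False
    return (
        VN_LAT_RANGE[0] <= lat <= VN_LAT_RANGE[1]
        and VN_LON_RANGE[0] <= lon <= VN_LON_RANGE[1]
    )


print(in_vietnam_bbox(20.862328, 106.679927))  # Hải Phòng row from the table → True
print(in_vietnam_bbox(48.8566, 2.3522))        # a point in Europe → False
```

Rows that fail this check are candidates for manual review, not automatic rejection, since the box also covers parts of neighbouring countries.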


In [10]:
failed_count = df_geocoded['latitude'].isnull().sum()
failed_count

0
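
A count of zero means every province received coordinates in this run. When the count is non-zero, the names of the affected provinces are more useful than the number alone. A minimal sketch on a hypothetical two-row frame (`df_sample`, where the second row stands in for a failed lookup):

```python
import pandas as pd

# Hypothetical sample: the second row simulates a failed geocode.
df_sample = pd.DataFrame(
    {
        "name": ["Cà Mau", "Gia Lai"],
        "latitude": [9.0180177, None],
        "longitude": [105.0869724, None],
    }
)

# A row counts as failed if either coordinate is missing.
failed_mask = df_sample[["latitude", "longitude"]].isnull().any(axis=1)
failed_names = df_sample.loc[failed_mask, "name"].tolist()

print(failed_names)  # → ['Gia Lai']
```

The same mask applied to df_geocoded would give the list of provinces to retry or fix by hand.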

### 8. Save Geocoded CSV


In [11]:
df_geocoded.to_csv(output_csv_path, index=False)