# Urban Demography - Demographic Change

- Author: Andrew Zimmer 
- Date Created: 2024-11-14 
- Last Edited:  2024-11-14 
- Version: 1.0

Description: 
- This script performs data processing and analysis on population metrics. 
- It calculates changes in population, migration, and other demographic indicators for urban centers between 2000 and 2020.

Input Files:
- '../01_data/04_final_demographic_data/01_static_boundaries/gudd_annual_metrics_static_boundaries.csv'

Output Files:
- '../01_data/04_final_demographic_data/01_static_boundaries/gudd_change_2000_2020_static_boundaries.csv': Final dataset with calculated metrics

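Steps in Script:
- Reads the annual metrics CSV into a DataFrame.
- Extracts one row of descriptive details per city.
- Keeps the 2000 and 2020 rows and computes the change in each metric between them.
- Sums births and deaths per city to get natural change, then derives net migration as total population change minus natural change.
- Merges in the 2020 population and the city details.
- Saves the final dataset to a CSV file and prints summary totals.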


In [1]:
# imports
import pandas as pd
from pathlib import Path

# data folder
data_folder = '../01_data/'

In [2]:
gudd_annual_metrics = pd.read_csv(Path(data_folder) / '04_final_demographic_data/01_static_boundaries/gudd_annual_metrics_static_boundaries.csv')
gudd_annual_metrics.head()

Unnamed: 0,ID_UC_G0,year,Name,Country,Continent,Development,YearOfBirth,YearOfDeath,latitude,longitude,...,old_sr,women_cr,general_fr,death_rate,deaths_total,pop_change,natural_change,migration,births,migration_annual_perc
0,4.0,2000,Papeete,French Polynesia,Oceania,High income,1980.0,2030.0,-17.555923,-149.581676,...,0.913754,368.682914,80.022962,4.559418,177.395559,-878.902679,645.486612,-1524.389292,822.882172,1.734423
1,4.0,2001,Papeete,French Polynesia,Oceania,High income,1980.0,2030.0,-17.555923,-149.581676,...,0.916727,369.59706,81.266897,4.551271,173.078471,-1561.575348,648.9879,-2210.563248,822.066371,1.415598
2,4.0,2002,Papeete,French Polynesia,Oceania,High income,1980.0,2030.0,-17.555923,-149.581676,...,0.918553,361.336994,79.091896,4.568052,166.583254,4344.895988,606.487183,3738.408805,773.070437,0.860414
3,4.0,2003,Papeete,French Polynesia,Oceania,High income,1980.0,2030.0,-17.555923,-149.581676,...,0.920591,347.200486,63.052923,4.599687,187.722073,-2610.77446,507.498412,-3118.272871,695.220485,1.194386
4,4.0,2004,Papeete,French Polynesia,Oceania,High income,1980.0,2030.0,-17.555923,-149.581676,...,0.92327,332.904904,71.268351,4.593705,175.484793,85.950363,564.858535,-478.908172,740.343328,-5.571916


In [3]:
# extract unique city info for use later
city_details = gudd_annual_metrics[["ID_UC_G0", "Name", "Country", "Continent",
                               "Development", "YearOfBirth", "YearOfDeath",
                               "latitude", "longitude"]].drop_duplicates()



In [4]:
# filter to keep only 2000 and 2020
interim_change = gudd_annual_metrics[["ID_UC_G0", "year", 
                                 "young_pop", "working_pop", "old_pop", "total_pop",
                                 "total_dr", "young_dr", "old_dr",
                                 "women_cba", "women_cr",
                                 "young_sr", "working_sr", "old_sr", "total_sr", "general_fr"]]

interim_change = interim_change[interim_change["year"].isin([2000, 2020])]


In [5]:
# calculate deltas (change from 2000 to 2020) - group by city and take the difference

metric_cols = ["young_pop", "working_pop", "old_pop", "total_pop",
               "total_dr", "young_dr", "old_dr",
               "women_cba", "women_cr",
               "young_sr", "working_sr", "old_sr", "total_sr", "general_fr"]

# sort once so 2000 precedes 2020 within each city, and so the positional
# re-attachment of ID_UC_G0 and year in the next cell lines up with the diff output
interim_change = interim_change.sort_values(["ID_UC_G0", "year"])

change_variables = (interim_change
                   .groupby("ID_UC_G0")[metric_cols]
                   .diff()
                   .add_suffix("_Delta")
                   .reset_index())


In [6]:
# re-attach the city id and year columns by position (rows must be in the same order as the diff output)
change_variables["ID_UC_G0"] = interim_change["ID_UC_G0"].values
change_variables["year"] = interim_change["year"].values


In [7]:
# filter rows where at least one delta is non-NA
change_variables = change_variables.dropna(subset=[col for col in change_variables.columns if "_Delta" in col], how='all')
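
The delta calculation in cells 5 to 7 can be checked on a small example. The sketch below uses a hypothetical toy frame (`toy`, with made-up populations) and joins the deltas back on the shared index rather than assigning by position, which is one way to avoid depending on row order. It illustrates the technique and is not the notebook's own code.

```python
import pandas as pd

# Hypothetical toy data: two cities, deliberately out of order.
toy = pd.DataFrame({
    "ID_UC_G0": [2.0, 1.0, 1.0, 2.0],
    "year": [2020, 2000, 2020, 2000],
    "total_pop": [600.0, 100.0, 150.0, 500.0],
})

# Sort so that, within each city, the 2000 row precedes the 2020 row.
toy = toy.sort_values(["ID_UC_G0", "year"]).reset_index(drop=True)

# Within-city difference: the 2000 row becomes NaN,
# the 2020 row holds (2020 value - 2000 value).
deltas = toy.groupby("ID_UC_G0")[["total_pop"]].diff().add_suffix("_Delta")

# Join on the shared index, then drop the all-NaN 2000 rows.
toy_change = (toy[["ID_UC_G0", "year"]]
              .join(deltas)
              .dropna(subset=["total_pop_Delta"]))
print(toy_change)
```

Because `diff()` keeps the index of the sorted frame, the join aligns each delta with its own city and year regardless of how the input rows were ordered.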


In [8]:
# sum births and deaths per city over every year in the file, then take natural change as births minus deaths
sum_variables = (gudd_annual_metrics
                .groupby("ID_UC_G0")
                .agg(sum_births=("births", "sum"),
                     sum_deaths=("deaths_total", "sum"))
                .reset_index())
sum_variables["natural_change"] = sum_variables["sum_births"] - sum_variables["sum_deaths"]
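
The aggregation above sums over every year present in the input. If the file were ever to contain years outside the study period, one option would be to filter first. The sketch below shows this on hypothetical toy data. The window bounds and whether to include the endpoint years are assumptions that depend on how the annual births and deaths are defined.

```python
import pandas as pd

# Hypothetical toy data; 1999 lies outside the assumed study window.
toy = pd.DataFrame({
    "ID_UC_G0": [1.0, 1.0, 1.0, 2.0, 2.0],
    "year": [1999, 2000, 2001, 2000, 2001],
    "births": [10.0, 12.0, 11.0, 5.0, 6.0],
    "deaths_total": [4.0, 5.0, 6.0, 2.0, 3.0],
})

# Assumed window: 2000 to 2020, inclusive on both ends.
in_window = toy["year"].between(2000, 2020)

sums = (toy[in_window]
        .groupby("ID_UC_G0")
        .agg(sum_births=("births", "sum"),
             sum_deaths=("deaths_total", "sum"))
        .reset_index())
sums["natural_change"] = sums["sum_births"] - sums["sum_deaths"]
print(sums)
```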

In [9]:
# merge natural change into change_variables
change_variables = change_variables.merge(sum_variables, on="ID_UC_G0", how="left")

# calculate total migration
change_variables["total_migration"] = change_variables["total_pop_Delta"] - change_variables["natural_change"]
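
The migration figure here is a residual: whatever part of the population change is not explained by births minus deaths is attributed to net migration. Written out, with symbols that are shorthand introduced for this note rather than names from the dataset ($P$ for `total_pop`, $B_t$ for `births`, $D_t$ for `deaths_total`, $N$ for `natural_change`, $M$ for `total_migration`):

```latex
\begin{aligned}
\Delta P &= P_{2020} - P_{2000} \\
N &= \sum_t B_t - \sum_t D_t \\
M &= \Delta P - N
\end{aligned}
```

As a hypothetical illustration, a city that grew by 1,000 people while recording 700 births and 300 deaths would have $N = 400$ and $M = 1000 - 400 = 600$.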

In [10]:
# extract total population for 2020
total_pop_2020 = (gudd_annual_metrics
                  .loc[gudd_annual_metrics["year"] == 2020, ["ID_UC_G0", "total_pop"]]
                  .rename(columns={"total_pop": "total_pop_2020"}))
change_variables = change_variables.merge(total_pop_2020, on="ID_UC_G0", how="left")


In [11]:
# calculate percentage of population change from migration
change_variables["perc_from_migration"] = (change_variables["total_migration"] / change_variables["total_pop_Delta"]) * 100

# merge in city details
change_variables = change_variables.merge(city_details, on="ID_UC_G0", how="left")

# save final csv
change_variables.to_csv(Path(data_folder) / '04_final_demographic_data/01_static_boundaries/gudd_change_2000_2020_static_boundaries.csv', index=False)
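
The percentage above divides by `total_pop_Delta`, so a city whose population did not change between 2000 and 2020 would produce an infinite (or undefined) value. One possible guard, shown on hypothetical toy numbers, is to treat a zero denominator as missing. Whether NaN is the right outcome for such cities is a judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the second city has zero net population change.
toy = pd.DataFrame({
    "total_pop_Delta": [200.0, 0.0, -50.0],
    "total_migration": [50.0, 10.0, -20.0],
})

# Replace zero denominators with NaN so the division yields NaN instead of inf.
denom = toy["total_pop_Delta"].replace(0, np.nan)
toy["perc_from_migration"] = toy["total_migration"] / denom * 100
print(toy)
```

When the population change is negative, the percentage should be read as the share of the decline attributable to net out-migration.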

In [12]:
# summary statistics
total_pop_change = change_variables["total_pop_Delta"].sum(skipna=True)
total_natural_change = change_variables["natural_change"].sum(skipna=True)
total_migration = change_variables["total_migration"].sum(skipna=True)

# calculate percentages
natural_change_pct = (total_natural_change / total_pop_change) * 100 if total_pop_change else 0
migration_pct = (total_migration / total_pop_change) * 100 if total_pop_change else 0

# print with commas and extra info
print(f"Total population change across all cities: {total_pop_change:,.0f}")
print(f"Total natural change (births - deaths): {total_natural_change:,.0f} "
      f"({natural_change_pct:.1f}% of total change)")
print(f"Total net migration: {total_migration:,.0f} "
      f"({migration_pct:.1f}% of total change)")


Total population change across all cities: 785,987,044
Total natural change (births - deaths): 431,975,361 (55.0% of total change)
Total net migration: 354,011,682 (45.0% of total change)
