# General tasks and directions

- Add your name, today's date, and the assignment title to the designated cell.
- Write your answers in the cells that contain the line `Add your answer here.`
- Write your code in the cells that contain the line `# Add your implementation here.`
- Use the autograder tests, which are provided for your convenience.
- Don't change or delete any provided code (including [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) such as `%%capture output`).


## Add your name, today's date, and the assignment title

author: Ratanak Uddam Chea

date: 02/08/2023

assignment: exercise1


# World

## Description

Extract data about countries from multiple files and store it in a `dictionary` of `namedtuple` objects.

This exercise consists of several tasks:

1. Read official names and various codes from *world_codes.csv* into a dictionary `world_codes_data`.
1. Read demographic data from *world_demo.csv* into a dictionary `world_demo_data`.
1. Read geographical data from *world_geo.csv* into a dictionary `world_geo_data`.
1. Read government information from *world_gov.csv* into a dictionary `world_gov_data`.
1. Read regional location data from *world_regions.csv* into a dictionary `world_regions_data`.
1. Merge all the dictionaries into one big dictionary `the_world`. Note that the number of countries and territories differs between the provided files, so you should use the *Country* value in *world_codes.csv* as the standard name and consider UN member states **only** when merging.

All data files have headers and use a comma (`,`) to separate values. You *should* use the `DictReader` class from the `csv` module.
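As a minimal sketch of how `csv.DictReader` treats a file with a header row, the example below reads a small in-memory sample instead of one of the assignment files. The two-column layout and the names `sample`, `rows`, and `capitals` are illustrative assumptions, not part of the assignment.

```python
import csv
import io

# Hypothetical two-row sample; the real files have more columns and rows.
sample = io.StringIO(
    "Country,Capital\n"
    "Cape Verde,Praia\n"
    "United States of America,Washington D.C.\n"
)

# DictReader takes the column names from the header row and yields
# one dictionary per data row, keyed by those column names.
rows = list(csv.DictReader(sample))

# Build a lookup from country name to capital.
capitals = {row["Country"]: row["Capital"] for row in rows}
```

Every value produced by `DictReader` is a string, which is why the numeric columns in the tasks below need an explicit conversion.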

## References

- [Lists of countries and territories - Wikipedia](https://en.wikipedia.org/wiki/Lists_of_countries_and_territories)
- [List of ISO 3166 country codes - Wikipedia](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes)
- [List of countries by United Nations geoscheme - Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_by_United_Nations_geoscheme)
- [List of countries by system of government - Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_by_system_of_government)
- [List of current heads of state and government - Wikipedia](https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government)
- [List of countries and dependencies by area - Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area)
- [List of countries by population (United Nations) - Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations))


In [1]:
# Import statements

import csv  # Optional but recommended

from collections import namedtuple, defaultdict
from pprint import pprint

## Task 1

Read the data from *world_codes.csv* and store the result as a `dictionary` of dictionaries.

Use country names as keys in the `world_codes_data` dictionary and country details ("Country", "Alpha-2 code", "Alpha-3 code", "Internet ccTLD", "Numeric code", "Official state name", and "Sovereignty") as keys of the embedded dictionary.

Note that "Numeric code" must be stored as an integer number.
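The shape of the expected result can be sketched on a small in-memory excerpt. The sample below keeps only three of the columns and is an assumption made for illustration; the names `sample`, `codes`, and `record` are hypothetical.

```python
import csv
import io

# Hypothetical excerpt with only three of the columns of world_codes.csv.
sample = io.StringIO(
    "Country,Alpha-2 code,Numeric code\n"
    "United States of America,US,840\n"
    "Svalbard,SJ,744\n"
)

codes = {}
for row in csv.DictReader(sample):
    record = dict(row)  # plain copy of the row, all values are strings
    record["Numeric code"] = int(record["Numeric code"])  # "840" -> 840
    codes[record["Country"]] = record  # outer key is the country name
```

The outer dictionary is keyed by country name, and each embedded dictionary still contains the "Country" entry, which matches the shape checked by the autograder tests.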


In [2]:
world_codes_data = dict()

with open("world_codes.csv", "r", encoding="utf-8") as f:
    # Add your implementation here.
    # A plain dict is enough here: every key is assigned explicitly,
    # so no default values are needed.
    reader = csv.DictReader(f)
    for row in reader:
        inner_dict = {}
        for key, value in sorted(row.items()):
            if key == "Numeric code":
                # Store the numeric code as an integer.
                inner_dict[key] = int(value)
            else:
                inner_dict[key] = value
        world_codes_data[row["Country"]] = inner_dict

In [3]:
assert isinstance(world_codes_data, dict)

In [4]:
assert len(world_codes_data) == 249

In [5]:
assert world_codes_data["United States of America"] == {
    'Alpha-2 code': 'US',
    'Alpha-3 code': 'USA',
    'Country': 'United States of America',
    'Internet ccTLD': '.us',
    'Numeric code': 840,
    'Official state name': 'The United States of America',
    'Sovereignty': 'UN member state'
}

In [6]:
assert world_codes_data["Svalbard"] == {
    'Alpha-2 code': 'SJ',
    'Alpha-3 code': 'SJM',
    'Country': 'Svalbard',
    'Internet ccTLD': '',
    'Numeric code': 744,
    'Official state name': 'Svalbard and Jan Mayen',
    'Sovereignty': 'Norway'
}

## Task 2

Read data from *world_demo.csv* and store the result as a `dictionary` of dictionaries.

Use country names as keys in the `world_demo_data` dictionary and country details ("Country", "Population") as keys of the embedded dictionary.

Note that "Population" must be stored as an integer number.
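Population figures may be written with thousands separators, so a direct `int()` call can fail. The helper below is a minimal sketch that assumes commas appear only as thousands separators; `parse_int` is a hypothetical name, not something the assignment provides.

```python
def parse_int(text):
    """Convert a string such as "329,064,917" to an integer.

    Hypothetical helper; assumes commas are used only as thousands separators.
    """
    return int(text.replace(",", ""))


# Values with and without separators are both handled.
us_population = parse_int("329,064,917")
vatican_population = parse_int("799")
```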

 

In [7]:
world_demo_data = dict()

with open("world_demo.csv", "r", encoding="utf-8") as f:
    # Add your implementation here.
    reader = csv.DictReader(f)
    for row in reader:
        inner_dict = {}
        for key, value in sorted(row.items()):
            if key == "Population":
                inner_dict[key] = int(value.replace(",", ""))
            else:
                inner_dict[key] = value
        world_demo_data[row["Country"]] = inner_dict

In [8]:
assert isinstance(world_demo_data, dict)

In [9]:
assert len(world_demo_data) == 233

In [10]:
assert world_demo_data["United States of America"] == {
    'Country': 'United States of America',
    'Population': 329064917
}

In [11]:
assert world_demo_data["Vatican City"] == {
    'Country': 'Vatican City',
    'Population': 799
}

## Task 3

Read data from *world_geo.csv* and store the result as a `dictionary` of dictionaries.

Use country names as keys in the `world_geo_data` dictionary and country details ("Country", "Total (km2)", "Land (km2)", "Water (km2)") as keys of the embedded dictionary.

Note that "Total (km2)", "Land (km2)", and "Water (km2)" must be stored as integer numbers.
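Some area values can be blank, as the expected result for Saint Barthélemy in the tests suggests. The sketch below converts the area columns where possible and leaves blank or `-` placeholders unchanged. The helper name `convert_areas`, the constant `AREA_COLUMNS`, and the raw input row are assumptions made for illustration.

```python
AREA_COLUMNS = ("Total (km2)", "Land (km2)", "Water (km2)")


def convert_areas(row):
    """Return a copy of row with area columns converted to int where possible.

    Hypothetical helper: values that are empty or "-" are kept unchanged.
    """
    record = dict(row)
    for column in AREA_COLUMNS:
        value = record[column]
        if value not in ("", "-"):
            # Drop thousands separators before converting.
            record[column] = int(value.replace(",", ""))
    return record


# Assumed raw row, with every value still a string as DictReader returns it.
raw = {"Country": "Saint Barthélemy", "Total (km2)": "21",
       "Land (km2)": "", "Water (km2)": ""}
converted = convert_areas(raw)
```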

 

In [12]:
world_geo_data = dict()

with open("world_geo.csv", "r", encoding="utf-8") as f:
    # Add your implementation here.
    reader = csv.DictReader(f)
    for row in reader:
        inner_dict = {}
        for key, value in sorted(row.items()):
            if key in ("Land (km2)", "Total (km2)", "Water (km2)"):
                if value == "" or value == "-":
                    inner_dict[key] = value
                else:
                    inner_dict[key] = int(value.replace(",", ""))
            else:
                inner_dict[key] = value
        world_geo_data[row["Country"]] = inner_dict

In [13]:
assert isinstance(world_geo_data, dict)

In [14]:
assert len(world_geo_data) == 261

In [15]:
assert world_geo_data["United States of America"] == {
    'Country': 'United States of America',
    'Land (km2)': 9147593,
    'Total (km2)': 9525067,
    'Water (km2)': 377424
}

In [16]:
assert world_geo_data["Saint Barthélemy"] == {
    'Country': 'Saint Barthélemy',
    'Land (km2)': '',
    'Total (km2)': 21,
    'Water (km2)': ''
}

## Task 4

Read data from *world_gov.csv* and store the result as a `dictionary` of dictionaries.

Use country names as keys in the `world_gov_data` dictionary and country details ("Country", "Constitutional form", "Head of state", "Capital") as keys of the embedded dictionary.
 

In [17]:
world_gov_data = dict()

with open("world_gov.csv", "r", encoding="utf-8") as f:
    # Add your implementation here.
    reader = csv.DictReader(f)
    for row in reader:
        inner_dict = {}
        for key, value in sorted(row.items()):
            inner_dict[key] = value
        world_gov_data[row["Country"]] = inner_dict

In [18]:
assert isinstance(world_gov_data, dict)

In [19]:
assert len(world_gov_data) == 195

In [20]:
assert world_gov_data["United States of America"] == {
    'Capital': 'Washington D.C.',
    'Constitutional form': 'Republic',
    'Country': 'United States of America',
    'Head of state': 'Joe Biden'
}

In [21]:
assert world_gov_data["Cape Verde"] == {
    'Capital': 'Praia',
    'Constitutional form': 'Republic',
    'Country': 'Cape Verde',
    'Head of state': 'José Maria Neves'
}

## Task 5

Read data from *world_regions.csv* and store the result as a `dictionary` of dictionaries.

Use country names as keys in the `world_regions_data` dictionary and country details ("Country", "Region", "Continent") as keys of the embedded dictionary.
 

In [22]:
world_regions_data = dict()

with open("world_regions.csv", "r", encoding="utf-8") as f:
    # Add your implementation here.
    reader = csv.DictReader(f)
    for row in reader:
        inner_dict = {}
        for key, value in sorted(row.items()):
            inner_dict[key] = value
        world_regions_data[row["Country"]] = inner_dict

In [23]:
assert isinstance(world_regions_data, dict)

In [24]:
assert len(world_regions_data) == 247

In [25]:
assert world_regions_data["United States of America"] == {
    'Continent': 'North America',
    'Country': 'United States of America',
    'Region': 'Northern America'
}

## Task 6

Store all the **UN member states** data as a single dictionary of `namedtuple` objects, `the_world`.

There must be the following relation between the previously gathered country properties and the attributes of the named tuple `Country`:

- "Country": "name"
- "Official state name": "official_name"
- "Sovereignty": "sovereignty"
- "Constitutional form": "constitutional_form"
- "Head of state": "head_of_state"
- "Population": "population"
- "Capital": "capital"
- "Region": "region"
- "Continent": "continent"
- "Land (km2)": "landarea"
- "Water (km2)": "water_area"
- "Total (km2)": "total_area"
- "Alpha-2 code": "alpha2_code"
- "Alpha-3 code": "alpha3_code"
- "Numeric code": "numeric_code"
- "Internet ccTLD": "ccTLD"

Any missing value or an empty string must be replaced with `None` in the named tuple.

Note that the names of some countries differ between data files (e.g. *Cabo Verde* vs *Cape Verde*), so you must use the country names found in *world_codes.csv* as the dictionary keys.
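The merging rules above can be sketched on a reduced example. The three-field tuple `MiniCountry`, the alias table `GOV_ALIASES`, the helper `none_if_empty`, and all sample data below are hypothetical and exist only to illustrate the idea; the real `Country` tuple has sixteen attributes, and the assignment does not prescribe an alias table.

```python
from collections import namedtuple

# Reduced, hypothetical tuple with three fields instead of sixteen.
MiniCountry = namedtuple("MiniCountry", ["name", "capital", "population"])

# Hypothetical alias table: standard name -> name used in another file.
GOV_ALIASES = {"Cabo Verde": "Cape Verde"}


def none_if_empty(value):
    """Map a missing value (None) or an empty string to None."""
    return None if value in (None, "") else value


# Made-up sample data shaped like the dictionaries from Tasks 1, 2, and 4.
codes = {"Cabo Verde": {"Sovereignty": "UN member state"},
         "Svalbard": {"Sovereignty": "Norway"}}
gov = {"Cape Verde": {"Capital": "Praia"}}
demo = {"Cabo Verde": {"Population": ""}}

merged = {}
for name, info in codes.items():
    if info["Sovereignty"] != "UN member state":
        continue  # skip territories and non-member states
    gov_row = gov.get(GOV_ALIASES.get(name, name), {})
    demo_row = demo.get(name, {})
    merged[name] = MiniCountry(
        name=name,  # the standard name is always the dictionary key
        capital=none_if_empty(gov_row.get("Capital")),
        population=none_if_empty(demo_row.get("Population")),
    )
```

Looking up each file with `.get(..., {})` keeps the loop free of `KeyError` handling: a country that is absent from a file simply yields `None` for the corresponding attributes.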


In [26]:
assert world_regions_data["Tokelau"] == {
    'Continent': 'Oceania',
    'Country': 'Tokelau',
    'Region': 'Polynesia'
}

In [27]:
# named tuple Country has 16 attributes

Country = namedtuple("Country", [
    "name",
    "official_name",
    "sovereignty",
    "constitutional_form",
    "head_of_state",
    "population",
    "capital",
    "region",
    "continent",
    "landarea",
    "water_area",
    "total_area",
    "alpha2_code",
    "alpha3_code",
    "numeric_code",
    "ccTLD",
])


In [165]:
the_world = dict()

# Add your implementation here.
def value_or_none(data, key):
    # Return None when the key is missing or its value is an empty string.
    value = data.get(key)
    return None if value == "" else value


for country, info in world_codes_data.items():
    # Keep UN member states only.
    if info["Sovereignty"] == "UN member state":
        # Fall back to an empty dict when a file has no entry for this name.
        gov_data = world_gov_data.get(country, {})
        demo_data = world_demo_data.get(country, {})
        regions_data = world_regions_data.get(country, {})
        geo_data = world_geo_data.get(country, {})

        the_world[country] = Country(
            name=country,
            official_name=value_or_none(info, "Official state name"),
            sovereignty=value_or_none(info, "Sovereignty"),
            constitutional_form=value_or_none(gov_data, "Constitutional form"),
            head_of_state=value_or_none(gov_data, "Head of state"),
            population=value_or_none(demo_data, "Population"),
            capital=value_or_none(gov_data, "Capital"),
            region=value_or_none(regions_data, "Region"),
            continent=value_or_none(regions_data, "Continent"),
            landarea=value_or_none(geo_data, "Land (km2)"),
            water_area=value_or_none(geo_data, "Water (km2)"),
            total_area=value_or_none(geo_data, "Total (km2)"),
            alpha2_code=value_or_none(info, "Alpha-2 code"),
            alpha3_code=value_or_none(info, "Alpha-3 code"),
            numeric_code=value_or_none(info, "Numeric code"),
            ccTLD=value_or_none(info, "Internet ccTLD"),
        )

In [166]:
assert len(the_world) == 193

In [167]:
assert the_world["Cabo Verde"] == Country(
    name='Cabo Verde',
    official_name='The Republic of Cabo Verde',
    sovereignty='UN member state',
    constitutional_form=None,
    head_of_state=None,
    population=None,
    capital=None,
    region='Western Africa',
    continent='Africa',
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='CV',
    alpha3_code='CPV',
    numeric_code=132,
    ccTLD='.cv'
)

In [168]:
assert the_world["Congo"] == Country(
    name='Congo',
    official_name='The Republic of the Congo',
    sovereignty='UN member state',
    constitutional_form=None,
    head_of_state=None,
    population=5380508,
    capital=None,
    region=None,
    continent=None,
    landarea=341500,
    water_area=500,
    total_area=342000,
    alpha2_code='CG',
    alpha3_code='COG',
    numeric_code=178,
    ccTLD='.cg'
)

In [170]:
assert the_world["Côte d'Ivoire"] == Country(
    name="Côte d'Ivoire",
    official_name="The Republic of Côte d'Ivoire",
    sovereignty='UN member state',
    constitutional_form='Republic',
    head_of_state='Alassane Ouattara',
    population=None,
    capital='Yamoussoukro',
    region='Western Africa',
    continent='Africa',
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='CI',
    alpha3_code='CIV',
    numeric_code=384,
    ccTLD='.ci'
)

In [171]:
assert the_world["Democratic Republic of the Congo"] == Country(
    name='Democratic Republic of the Congo',
    official_name='The Democratic Republic of the Congo',
    sovereignty='UN member state',
    constitutional_form='Republic',
    head_of_state='Félix Tshisekedi',
    population=86790567,
    capital='Kinshasa',
    region='Middle Africa',
    continent='Africa',
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='CD',
    alpha3_code='COD',
    numeric_code=180,
    ccTLD='.cd'
)

In [172]:
assert the_world["Republic of Korea"] == Country(
    name='Republic of Korea',
    official_name='The Republic of Korea',
    sovereignty='UN member state',
    constitutional_form='Republic',
    head_of_state='Moon Jae-in',
    population=None,
    capital='Seoul',
    region='Eastern Asia',
    continent='Asia',
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='KR',
    alpha3_code='KOR',
    numeric_code=410,
    ccTLD='.kr'
)

In [173]:
assert the_world["Timor-Leste"] == Country(
    name='Timor-Leste',
    official_name='The Democratic Republic of Timor-Leste',
    sovereignty='UN member state',
    constitutional_form='Republic',
    head_of_state='Francisco Guterres',
    population=1293119,
    capital='Dili',
    region='South-eastern Asia',
    continent='Asia',
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='TL',
    alpha3_code='TLS',
    numeric_code=626,
    ccTLD='.tl'
)

In [174]:
assert the_world["United Kingdom of Great Britain and Northern Ireland"] == Country(
    name='United Kingdom of Great Britain and Northern Ireland',
    official_name='The United Kingdom of Great Britain and Northern Ireland',
    sovereignty='UN member state',
    constitutional_form=None,
    head_of_state=None,
    population=None,
    capital=None,
    region=None,
    continent=None,
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='GB',
    alpha3_code='GBR',
    numeric_code=826,
    ccTLD='.uk'
)


In [175]:
assert the_world["United States of America"] == Country(
    name='United States of America',
    official_name='The United States of America',
    sovereignty='UN member state',
    constitutional_form='Republic',
    head_of_state='Joe Biden',
    population=329064917,
    capital='Washington D.C.',
    region='Northern America',
    continent='North America',
    landarea=9147593,
    water_area=377424,
    total_area=9525067,
    alpha2_code='US',
    alpha3_code='USA',
    numeric_code=840,
    ccTLD='.us'
)

In [176]:
assert the_world["Viet Nam"] == Country(
    name='Viet Nam',
    official_name='The Socialist Republic of Viet Nam',
    sovereignty='UN member state',
    constitutional_form=None,
    head_of_state=None,
    population=None,
    capital=None,
    region=None,
    continent=None,
    landarea=None,
    water_area=None,
    total_area=None,
    alpha2_code='VN',
    alpha3_code='VNM',
    numeric_code=704,
    ccTLD='.vn'
)

Done!

## Submission Checklist

- [ ] Your name, today's date, and the assignment title in the designated cell.
- [ ] Your answers in the designated cells (if required).
- [ ] Your code runs and produces the expected output.
- [ ] The validity of your code is verified by autograders (if provided).
- [ ] Restart the kernel and run all cells (in the menubar, select *Kernel*, then *Restart Kernel and Run All Cells*).
- [ ] Save the notebook.
- [ ] Submit the assignment.
