### Steps

1. Load in the pandas DataFrame with headline, countries, and cities.
  + If a headline contains multiple cities/countries, decide which single one to keep.
2. For each city/country, match the name to the latitude and longitude in `geonamescache`.
  + You can use the function `gc.get_cities_by_names_` `_(“city_name”)`.
  + Some cities will return multiple matches with the previous function in different countries. You’ll have to decide which city to keep based on a heuristic (rule of thumb).
  + If you have trouble, work with a single problematic city until you figure it out, then write a function to apply on all headlines.
3. Add longitude and latitude coordinates to your DataFrame for each headline.
  + It will be helpful to get the `countrycode` of each headline at this point.
  + If you were not able to find many countries, think about **dropping the column**. You also need to decide what to do with headlines that have no coordinates.
  + You should end up with over 600 headlines that have geographic coordinates.

### Questions

* What does "dropping the column" mean? 


In [1]:
import pandas as pd
import re
import geonamescache

In [2]:
gc = geonamescache.GeonamesCache() 

In [3]:
DF_PICK = './data/headlines_cities_countries.pkl'

In [4]:
df = pd.read_pickle(DF_PICK)

In [5]:
countries = gc.get_countries_by_names()
cities = gc.get_cities()

In [14]:
def unpack_city_results(cities):
    # TODO: rearrange city results so tht you have just an array of objects.
    out = []
    for c in cities:
        for k,v in c.items():
            out.append(v)
    return out

def pick_city(city_results):
    print(city_results)
    city_results = unpack_city_results(city_results)
    cr = sorted(city_results, key=lambda x: x['population'], reverse=True)
    print(cr)
    print("\n")
    return cr[0]
    
def find_country_coordinates(country):
    global cities

    cities_in_country = filter(lambda x: x.countrycode == cc, cities)
    cities_in_country.sort(key="population", reverse=True)
    return { k:cities_in_country[0][k] for k in ['latitude', 'longitude'] }

def find_coords(row):
    global countries
    
    if row.City:
        print(row.City)
        # LEFT OFF HERE - YOU CAN'T LOOK UP CITIES BY NAME B/C YOU REMOVED ACCENT MARKS. 
        city = pick_city(gc.get_cities_by_name(row.City))
        return { k:city[k] for k in ['latitude', 'longitude']}
    elif row.Country:
        country = countries[row.Country]
        return find_country_coordinates(country)
    return ''


In [15]:
df['geo'] = df.apply(find_coords, axis=1)

Miami
[{'4164138': {'geonameid': 4164138, 'name': 'Miami', 'latitude': 25.77427, 'longitude': -80.19366, 'countrycode': 'US', 'population': 441003, 'timezone': 'America/New_York', 'admin1code': 'FL'}}]
[{'geonameid': 4164138, 'name': 'Miami', 'latitude': 25.77427, 'longitude': -80.19366, 'countrycode': 'US', 'population': 441003, 'timezone': 'America/New_York', 'admin1code': 'FL'}]


New York City
[{'5128581': {'geonameid': 5128581, 'name': 'New York City', 'latitude': 40.71427, 'longitude': -74.00597, 'countrycode': 'US', 'population': 8175133, 'timezone': 'America/New_York', 'admin1code': 'NY'}}]
[{'geonameid': 5128581, 'name': 'New York City', 'latitude': 40.71427, 'longitude': -74.00597, 'countrycode': 'US', 'population': 8175133, 'timezone': 'America/New_York', 'admin1code': 'NY'}]


Miami Beach
[{'4164143': {'geonameid': 4164143, 'name': 'Miami Beach', 'latitude': 25.79065, 'longitude': -80.13005, 'countrycode': 'US', 'population': 92312, 'timezone': 'America/New_York', 'admin1co

IndexError: ('list index out of range', 'occurred at index 7')

In [23]:
df.head()

Unnamed: 0,Headline,City,Country,geo
0,Zika Outbreak Hits Miami,Miami,,"[{'4164138': {'geonameid': 4164138, 'name': 'M..."
1,Could Zika Reach New York City?,New York City,,"[{'5128581': {'geonameid': 5128581, 'name': 'N..."
2,First Case of Zika in Miami Beach,Miami Beach,,"[{'4164143': {'geonameid': 4164143, 'name': 'M..."
3,"Mystery Virus Spreads in Recife, Brazil",Recife,Brazil,"[{'3390760': {'geonameid': 3390760, 'name': 'R..."
4,Dallas man comes down with case of Zika,Dallas,,"[{'4684888': {'geonameid': 4684888, 'name': 'D..."


### Testing

In [25]:
gc.get_countries()


{'AD': {'geonameid': 3041565,
  'name': 'Andorra',
  'iso': 'AD',
  'iso3': 'AND',
  'isonumeric': 20,
  'fips': 'AN',
  'continentcode': 'EU',
  'capital': 'Andorra la Vella',
  'areakm2': 468,
  'population': 84000,
  'tld': '.ad',
  'currencycode': 'EUR',
  'currencyname': 'Euro',
  'phone': '376',
  'postalcoderegex': '^(?:AD)*(\\d{3})$',
  'languages': 'ca',
  'neighbours': 'ES,FR'},
 'AE': {'geonameid': 290557,
  'name': 'United Arab Emirates',
  'iso': 'AE',
  'iso3': 'ARE',
  'isonumeric': 784,
  'fips': 'AE',
  'continentcode': 'AS',
  'capital': 'Abu Dhabi',
  'areakm2': 82880,
  'population': 4975593,
  'tld': '.ae',
  'currencycode': 'AED',
  'currencyname': 'Dirham',
  'phone': '971',
  'postalcoderegex': '',
  'languages': 'ar-AE,fa,en,hi,ur',
  'neighbours': 'SA,OM'},
 'AF': {'geonameid': 1149361,
  'name': 'Afghanistan',
  'iso': 'AF',
  'iso3': 'AFG',
  'isonumeric': 4,
  'fips': 'AF',
  'continentcode': 'AS',
  'capital': 'Kabul',
  'areakm2': 647500,
  'population': 

In [7]:
gc.get_cities_by_name('Boston')

[{'2655138': {'geonameid': 2655138,
   'name': 'Boston',
   'latitude': 52.97633,
   'longitude': -0.02664,
   'countrycode': 'GB',
   'population': 41340,
   'timezone': 'Europe/London',
   'admin1code': 'ENG'}},
 {'4930956': {'geonameid': 4930956,
   'name': 'Boston',
   'latitude': 42.35843,
   'longitude': -71.05977,
   'countrycode': 'US',
   'population': 667137,
   'timezone': 'America/New_York',
   'admin1code': 'MA'}}]

In [13]:
countries['Brazil']

{'geonameid': 3469034,
 'name': 'Brazil',
 'iso': 'BR',
 'iso3': 'BRA',
 'isonumeric': 76,
 'fips': 'BR',
 'continentcode': 'SA',
 'capital': 'Brasilia',
 'areakm2': 8511965,
 'population': 201103330,
 'tld': '.br',
 'currencycode': 'BRL',
 'currencyname': 'Real',
 'phone': '55',
 'postalcoderegex': '^\\d{5}-\\d{3}$',
 'languages': 'pt-BR,es,en,fr',
 'neighbours': 'SR,PE,BO,UY,GY,PY,GF,VE,CO,AR'}

In [14]:
countries['United States']

{'geonameid': 6252001,
 'name': 'United States',
 'iso': 'US',
 'iso3': 'USA',
 'isonumeric': 840,
 'fips': 'US',
 'continentcode': 'NA',
 'capital': 'Washington',
 'areakm2': 9629091,
 'population': 310232863,
 'tld': '.us',
 'currencycode': 'USD',
 'currencyname': 'Dollar',
 'phone': '1',
 'postalcoderegex': '^\\d{5}(-\\d{4})?$',
 'languages': 'en-US,es-US,haw,fr',
 'neighbours': 'CA,MX,CU'}

In [15]:
countries['United States']['iso'] #countrycode

'US'

AttributeError: 'GeonamesCache' object has no attribute 'get_cities_by_names'