# Get Cities of The World Quality of Life Data

In this exercise, we will try to get scoring information about the quality of life for several cities around the world. 🌍

For this exercise, we will be using the following API:

- <a href="https://developers.teleport.org/api/getting_started/" target="_blank">Teleport</a>

We will also need a website called RandomList.com, which will give us random cities around the world to score.

Then we will store the data in an S3 bucket!

Quite a project, right? 🥵

🥰🥰 You'll learn a lot during this exercise 🥰🥰 

So let's go 💪💪💪

## Part 1: Get data for 1 City 

To simplify this exercise, let's start by trying to scrape data for only 1 city: Paris. In another part, we'll try to get scores for 100 different cities.

- Import the `requests` and `pandas` libraries:

In [40]:
import requests
import pandas as pd

* Check Teleport's API documentation to find a way to search for information on Paris. In particular, we need its `geonameid`

  * Here is the link for the documentation 👉👉👉 [Teleport API](https://developers.teleport.org/api/getting_started/)

In [41]:
paris_search = requests.get("https://api.teleport.org/api/cities/?search=paris").json()

ℹ️ℹ️You should get the following result ℹ️ℹ️

In [42]:
display(paris_search)

{'_embedded': {'city:search-results': [{'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:2988507/'}},
    'matching_alternate_names': [{'name': 'Paris'},
     {'name': 'paris'},
     {'name': 'Parisi'}],
    'matching_full_name': 'Paris, Île-de-France, France'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:4717560/'}},
    'matching_alternate_names': [{'name': 'Paris'}],
    'matching_full_name': 'Paris, Texas, United States'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:3489854/'}},
    'matching_alternate_names': [],
    'matching_full_name': 'Kingston, Kingston, Jamaica'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:966166/'}},
    'matching_alternate_names': [{'name': 'Paris'}],
    'matching_full_name': 'Parys, Orange Free State, South Africa (Paris)'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geoname
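City names with spaces or accents need to be percent-encoded before they go into a query string. Here is a minimal sketch of how the search URL could be built by hand with the standard library. `build_search_url` is a hypothetical helper name, and the endpoint is the one used in the cell above.

```python
from urllib.parse import urlencode

# Endpoint taken from the search request above
BASE_URL = "https://api.teleport.org/api/cities/"


def build_search_url(city_name):
    """Return the search URL for a city, percent-encoding the name."""
    return BASE_URL + "?" + urlencode({"search": city_name})


print(build_search_url("paris"))
print(build_search_url("São Paulo"))
```

With `requests`, passing `params={"search": city_name}` to `requests.get` should have the same effect without building the string yourself.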

* Now that you have a list of search results, try to isolate Paris' `geonameid`

In [43]:
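def extract_city(search_json, country):
    # Return the URL of the first search result whose full name mentions `country`
    results = search_json['_embedded']['city:search-results']
    for city in results:
        if country.lower() in city['matching_full_name'].lower():
            return city["_links"]['city:item']['href']
    return None  # no search result matched the country

paris_url = extract_city(paris_search, 'France')
display(paris_url)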

'https://api.teleport.org/api/cities/geonameid:2988507/'
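The URL above embeds the `geonameid`. If you want the number itself rather than the full URL, one possible approach is a small regular expression. `geonameid_from_href` is a hypothetical helper, and the sketch assumes the URL always contains `geonameid:<digits>` as in the output above.

```python
import re


def geonameid_from_href(href):
    """Extract the integer geonameid from a city URL, or return None."""
    match = re.search(r"geonameid:(\d+)", href)
    return int(match.group(1)) if match else None


paris_href = "https://api.teleport.org/api/cities/geonameid:2988507/"
print(geonameid_from_href(paris_href))  # 2988507
```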

* Use `requests` to get information about Paris 

In [44]:
paris = requests.get(paris_url).json()
display(paris)

{'_links': {'city:admin1_division': {'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/admin1_divisions/geonames:11/',
   'name': 'Île-de-France'},
  'city:alternate-names': {'href': 'https://api.teleport.org/api/cities/geonameid:2988507/alternate_names/'},
  'city:country': {'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/',
   'name': 'France'},
  'city:timezone': {'href': 'https://api.teleport.org/api/timezones/iana:Europe%2FParis/',
   'name': 'Europe/Paris'},
  'city:urban_area': {'href': 'https://api.teleport.org/api/urban_areas/slug:paris/',
   'name': 'Paris'},
  'curies': [{'href': 'https://developers.teleport.org/api/resources/Location/#!/relations/{rel}/',
    'name': 'location',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/City/#!/relations/{rel}/',
    'name': 'city',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/UrbanArea/#!/relations/{rel}/',
    'name': 'ua',
    'templa

In [45]:
paris_ql_url = paris["_links"]["city:urban_area"]["href"]
display(paris_ql_url)

'https://api.teleport.org/api/urban_areas/slug:paris/'
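Chaining dictionary keys like this raises a `KeyError` as soon as one link is missing. Below is a minimal sketch of a more forgiving lookup. It assumes (this is not shown in the outputs above) that some city resources may lack a `city:urban_area` link. `get_link` is a hypothetical helper, not part of `requests` or of the Teleport API.

```python
def get_link(resource, rel):
    """Return the href for relation `rel` in a resource's `_links`, or None."""
    link = resource.get("_links", {}).get(rel)
    if isinstance(link, dict):
        return link.get("href")
    return None


# Trimmed-down copy of the Paris resource shown above
sample_city = {
    "_links": {
        "city:urban_area": {
            "href": "https://api.teleport.org/api/urban_areas/slug:paris/",
            "name": "Paris",
        }
    }
}

print(get_link(sample_city, "city:urban_area"))
print(get_link({"_links": {}}, "city:urban_area"))  # None: no urban area link
```

Returning `None` lets the caller decide whether to skip the city or report it, instead of failing in the middle of a chain of lookups.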

* You should now be able to get Paris' quality of life scores 

In [46]:
paris_ql = requests.get(paris_ql_url).json()
display(paris_ql)

{'_links': {'curies': [{'href': 'https://developers.teleport.org/api/resources/Location/#!/relations/{rel}/',
    'name': 'location',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/City/#!/relations/{rel}/',
    'name': 'city',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/UrbanArea/#!/relations/{rel}/',
    'name': 'ua',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Country/#!/relations/{rel}/',
    'name': 'country',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Admin1Division/#!/relations/{rel}/',
    'name': 'a1',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Timezone/#!/relations/{rel}/',
    'name': 'tz',
    'templated': True}],
  'self': {'href': 'https://api.teleport.org/api/urban_areas/slug:paris/'},
  'ua:admin1-divisions': [{'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/admin1_

* Use `pandas` to create a DataFrame containing all the scores for Paris

In [72]:
scores = requests.get(paris_ql["_links"]["ua:scores"]["href"]).json()
df = pd.DataFrame(scores["categories"])
df.head()

| | color | name | score_out_of_10 |
|---|---|---|---|
| 0 | #f3c32c | Housing | 3.5835 |
| 1 | #f3d630 | Cost of Living | 3.664 |
| 2 | #f4eb33 | Startups | 9.2765 |
| 3 | #d2ed31 | Venture Capital | 7.513 |
| 4 | #7adc29 | Travel Connectivity | 10.0 |


* We now need to upload this DataFrame to S3. Let's first create a Boto3 session
  * For the next steps, refer to the documentation 👉👉👉 [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

In [9]:
!pip install Boto3

Collecting Boto3
  Downloading boto3-1.16.43-py2.py3-none-any.whl (130 kB)
Collecting botocore<1.20.0,>=1.19.43
  Downloading botocore-1.19.43-py2.py3-none-any.whl (7.2 MB)
Collecting jmespath<1.0.0,>=0.7.1
  Using cached jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Collecting s3transfer<0.4.0,>=0.3.0
  Using cached s3transfer-0.3.3-py2.py3-none-any.whl (69 kB)
Installing collected packages: jmespath, botocore, s3transfer, Boto3
Successfully installed Boto3-1.16.43 botocore-1.19.43 jmespath-0.10.0 s3transfer-0.3.3


In [58]:
import boto3

* Now create a resource session 

In [59]:
s3 = boto3.resource("s3")

* Create a bucket named `scoring-cities-in-the-world`. S3 bucket names are globally unique, so add a prefix of your own (the solution below uses `tibo-`)

In [60]:
s3.create_bucket(Bucket="tibo-scoring-cities-in-the-world")

s3.Bucket(name='tibo-scoring-cities-in-the-world')
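The call above passes only the bucket name, which works when your session targets `us-east-1`. In other regions, S3 expects a `CreateBucketConfiguration` with a `LocationConstraint`. Here is a minimal sketch of how the arguments could be assembled. `create_bucket_kwargs` is a hypothetical helper, and the bucket name and region are placeholders, not values from this exercise.

```python
def create_bucket_kwargs(bucket_name, region=None):
    """Build keyword arguments for `create_bucket` depending on the region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("my-prefix-scoring-cities-in-the-world", "eu-west-3"))

# Possible usage with the resource created earlier (not run here):
# s3.create_bucket(**create_bucket_kwargs("my-prefix-scoring-cities-in-the-world", "eu-west-3"))
```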

* Use `Pandas` to export your DataFrame as a csv file

In [74]:
from io import StringIO
csv_buffer = StringIO()
df.to_csv(csv_buffer, index=False)
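Writing to a `StringIO` buffer keeps the CSV in memory, so nothing has to be saved to disk before the upload. To see what such a buffer holds, here is a small sketch that uses the standard `csv` module as a stand-in for `DataFrame.to_csv`. The two rows are copied from the scores shown earlier.

```python
import csv
from io import StringIO

rows = [
    {"name": "Housing", "score_out_of_10": 3.5835},
    {"name": "Cost of Living", "score_out_of_10": 3.664},
]

buffer = StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["name", "score_out_of_10"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

# getvalue() returns the whole CSV as one string, ready to be used as a request body
csv_text = buffer.getvalue()
print(csv_text)
```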

* Use the `put_object()` method to create an object in the bucket you just created

In [75]:
bucket = s3.Bucket("tibo-scoring-cities-in-the-world")
bucket.put_object(Key="paris.csv", Body=csv_buffer.getvalue())

s3.Object(bucket_name='tibo-scoring-cities-in-the-world', key='paris.csv')

## Part 2: Get Data for Several Cities

😉 Congrats! 😉 You made it to the second part of the exercise. We now need data for more cities so that we can compare them later. Let's find a way to get data for a lot more cities

* Go to [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_largest_cities). There you'll find a list of the world's largest cities.
  * Use `scrapy` to scrape the city names directly from this page 😎

* Read the JSON file containing the crawl results:

In [48]:
cities_df = pd.read_json("cities_ranking/cities_ranking.json")
cities_df.head()

| | cities | countries |
|---|---|---|
| 0 | Tokyo | Japan |
| 1 | Delhi | India |
| 2 | Shanghai | China |
| 3 | São Paulo | Brazil |
| 4 | Mexico City | Mexico |


* Finally, create a loop that goes through each city, searches for its information and stores it in your S3 bucket
  * You will probably get some errors, so use a `try` / `except` structure
  * (It's totally fine if you can't get info for all cities) 😌😌
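One way to structure such a loop is to keep track of which cities succeeded and which failed, along with the type of error. The sketch below illustrates the pattern under stated assumptions: `process_cities` and `fake_worker` are hypothetical names, and `fake_worker` only simulates a failure instead of calling the API or S3.

```python
def process_cities(cities, worker):
    """Call worker(city, country) for each pair and return (done, failed)."""
    done, failed = [], []
    for city, country in cities:
        try:
            worker(city, country)
            done.append(city)
        except Exception as error:  # broad on purpose, but not a bare `except:`
            failed.append((city, type(error).__name__))
    return done, failed


def fake_worker(city, country):
    """Hypothetical stand-in for the real API and S3 calls."""
    if city == "Dhaka":
        raise KeyError("city:urban_area")


done, failed = process_cities(
    [("Tokyo", "Japan"), ("Dhaka", "Bangladesh")], fake_worker
)
print(done)    # ['Tokyo']
print(failed)  # [('Dhaka', 'KeyError')]
```

Recording the exception type makes it easier to tell a missing city apart from, say, a network problem.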

In [89]:
def get_city_life_quality(city, country):
    # Search for the city and keep the result that matches the country
    search = requests.get("https://api.teleport.org/api/cities/?search=" + city.lower()).json()
    city_url = extract_city(search, country)
    # Follow the links: city -> urban area -> scores
    city_dict = requests.get(city_url).json()
    city_ql = requests.get(city_dict["_links"]["city:urban_area"]["href"]).json()
    city_score = requests.get(city_ql["_links"]["ua:scores"]["href"]).json()
    # Serialize the scores to CSV in memory and upload them to the bucket
    csv_buffer = StringIO()
    pd.DataFrame(city_score["categories"]).to_csv(csv_buffer, index=False)
    bucket = s3.Bucket("tibo-scoring-cities-in-the-world")
    key = city.lower().replace(" ", "_").replace(".", "").replace(",", "") + ".csv"
    bucket.put_object(Key=key, Body=csv_buffer.getvalue())
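The object key above is built with a chain of `replace()` calls. As an alternative, here is a sketch of a slug helper that also strips accents and any other punctuation. `city_to_key` is a hypothetical name, and note that it does not produce exactly the same keys as the chain above (for example it yields `washington_d_c.csv` rather than `washington_dc.csv`).

```python
import re
import unicodedata


def city_to_key(city, extension=".csv"):
    """Turn a city name into a lowercase ASCII object key."""
    # Decompose accented characters, then drop the non-ASCII combining marks
    ascii_name = (
        unicodedata.normalize("NFKD", city).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")
    return slug + extension


print(city_to_key("São Paulo"))         # sao_paulo.csv
print(city_to_key("Washington, D.C."))  # washington_d_c.csv
```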

In [90]:
bucket.objects.all().delete()  # start from an empty bucket
for index, row in cities_df.iterrows():
    try:
        get_city_life_quality(row["cities"], row["countries"])
        display(row["cities"] + " done!")
    except Exception:  # a bare `except:` would also swallow KeyboardInterrupt
        display("Couldn't find results for " + row["cities"])

'Tokyo done!'

'Delhi done!'

'Shanghai done!'

'São Paulo done!'

'Mexico City done!'

'Cairo done!'

'Mumbai done!'

'Beijing done!'

"Couldn't find results for Dhaka"

'Osaka done!'

'New York City done!'

"Couldn't find results for Karachi"

'Buenos Aires done!'

"Couldn't find results for Chongqing"

'Istanbul done!'

"Couldn't find results for Kolkata"

'Manila done!'

'Lagos done!'

'Rio de Janeiro done!'

"Couldn't find results for Tianjin"

"Couldn't find results for Kinshasa"

'Guangzhou done!'

'Los Angeles done!'

'Moscow done!'

'Shenzhen done!'

"Couldn't find results for Lahore"

'Bangalore done!'

'Paris done!'

'Bogotá done!'

'Jakarta done!'

'Chennai done!'

'Lima done!'

'Bangkok done!'

'Seoul done!'

"Couldn't find results for Nagoya"

'Hyderabad done!'

'London done!'

'Tehran done!'

'Chicago done!'

"Couldn't find results for Chengdu"

"Couldn't find results for Nanjing"

"Couldn't find results for Wuhan"

'Ho Chi Minh City done!'

"Couldn't find results for Luanda"

"Couldn't find results for Ahmedabad"

'Kuala Lumpur done!'

"Couldn't find results for Xi'an"

"Couldn't find results for Hong Kong"

'Dongguan done!'

'Hangzhou done!'

'Foshan done!'

"Couldn't find results for Shenyang"

'Riyadh done!'

"Couldn't find results for Baghdad"

'Santiago done!'

"Couldn't find results for Surat"

'Madrid done!'

'Suzhou done!'

"Couldn't find results for Pune"

"Couldn't find results for Harbin"

'Houston done!'

'Dallas done!'

'Toronto done!'

'Dar es Salaam done!'

'Miami done!'

"Couldn't find results for Belo Horizonte"

'Singapore done!'

'Philadelphia done!'

'Atlanta done!'

'Fukuoka done!'

"Couldn't find results for Khartoum"

'Barcelona done!'

'Johannesburg done!'

'Saint Petersburg done!'

"Couldn't find results for Qingdao"

"Couldn't find results for Dalian"

'Washington, D.C. done!'

"Couldn't find results for Yangon"

"Couldn't find results for Alexandria"

"Couldn't find results for Jinan"

'Guadalajara done!'

🎊🎊🎊 Congratulations, you made it to the end of this exercise! 🎊🎊🎊

In [91]:
bucket.objects.all().delete()
bucket.delete()

{'ResponseMetadata': {'RequestId': '8CE5785A5E0E7428',
  'HostId': 'jU5YkfSpMAPhJjRiKZwZTZoQzFU4FXqjYMHcvEF2SyhYkKUzu7M1IHCp46s1BITx/pc+2VirG+U=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'jU5YkfSpMAPhJjRiKZwZTZoQzFU4FXqjYMHcvEF2SyhYkKUzu7M1IHCp46s1BITx/pc+2VirG+U=',
   'x-amz-request-id': '8CE5785A5E0E7428',
   'date': 'Sun, 27 Dec 2020 19:28:12 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}