### Women in Data Science ATX Meetup - November 30, 2017

# Getting Data Using APIs

In this notebook we introduce a few Python libraries for getting and dealing with data using web Application Programming Interfaces (APIs), then work through some examples. Here is a quick overview of the topic directed toward data scientists:

https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-apis-application-programming-interfaces-5-apis-a-data-scientist-must-know/

In the author's words:
> "In simple words, an API is a (hypothetical) contract between 2 softwares saying if the user software provides input in a pre-defined format, the later with extend its functionality and provide the outcome to the user software. Think of it like this, Graphical user interface (GUI) or command line interface (CLI) allows humans to Interact with code, where as an Application programmable interface (API) allows one piece of code to interact with other code.
..."

> "An API is a set of rules with which the interaction between various entities is defined. We are specifically talking about interaction between two software."

We'll be accessing data from the following APIs:
- [USNO Astronomical Applications Department API](http://aa.usno.navy.mil/data/docs/api.php)
- [Google Maps APIs](https://developers.google.com/maps/web-services/)
- [OpenWeatherMap API](https://openweathermap.org/api)
- [Planet Data API](https://www.planet.com/docs/reference/data-api/)
- [Twitter API](https://developer.twitter.com/en/docs)

***
## Useful Packages
We'll use the following packages to make requests, parse responses, and read in authorization credentials for accessing APIs that require them:
- [requests](http://docs.python-requests.org/en/master/) "is the only Non-GMO HTTP library for Python, safe for human consumption"
- [json](https://docs.python.org/3/library/json.html) is a common package for encoding and decoding JSON (JavaScript Object Notation)
- [pyyaml](https://github.com/yaml/pyyaml) is a package for reading YAML-formatted files (YAML stands for "YAML Ain't Markup Language", originally "Yet Another Markup Language")

### JSON Package
Many APIs these days return data in JavaScript Object Notation (JSON) format. It looks a lot like a Python `dict`, but is a language-agnostic serialization standard. We will use the `json` package for converting JSON-formatted responses to Python data structures such as dictionaries. Double-quotes are a big deal to JSON, whereas Python is all like, "meh" (or, 'meh'?).

Let's try an example of parsing JSON, taken from "Data Science from Scratch" by Joel Grus.

In [1]:
import json


serialized = """{ "title" : "Data Science Book",
                  "author" : "Joel Grus",
                  "publicationYear" : 2014,
                  "topics" : [ "data", "science", "data science"] }"""
print("Serialized: {}".format(type(serialized)))
print(serialized)
print("")

# parse the JSON to create a Python object
print("...parsing json...")
print("")
deserialized = json.loads(serialized)
if "data science" in deserialized["topics"]:
    print("Deserialized: {}".format(type(deserialized)))
    print(deserialized)

Serialized: <class 'str'>
{ "title" : "Data Science Book",
                  "author" : "Joel Grus",
                  "publicationYear" : 2014,
                  "topics" : [ "data", "science", "data science"] }

...parsing json...

Deserialized: <class 'dict'>
{'title': 'Data Science Book', 'author': 'Joel Grus', 'publicationYear': 2014, 'topics': ['data', 'science', 'data science']}
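
The remark about double quotes can be checked directly. The following is a minimal sketch; the `inPrint` and `edition` keys are made up for illustration and are not part of the book example. It shows that single-quoted text is rejected by the JSON parser, and that encoding a Python `dict` produces double quotes along with JSON's own spellings of `True` and `None`.

```python
import json

# Python accepts either quote style for string literals...
python_dict = {'title': "Data Science Book", 'inPrint': True, 'edition': None}

# ...but JSON text must use double quotes, so single-quoted text is rejected.
try:
    json.loads("{'title': 'Data Science Book'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

# Encoding a Python dict emits double quotes and JSON-style literals.
encoded = json.dumps(python_dict)
print(single_quotes_ok)
print(encoded)
```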


### Requests Package
We will use the Python Requests library for making HTTP requests.
- http://docs.python-requests.org/en/master/
- http://docs.python-requests.org/en/master/user/quickstart/

This package is not in the Python standard library, but it is pleasant to use and well documented. The standard library's alternatives include `http.client` and `urllib`; see https://docs.python.org/3/library/internet.html

Let's try another example from "Data Science from Scratch" for making a `GET` `HTTP` request.

In [2]:
import json
import requests


endpoint = "https://api.github.com/users/joelgrus/repos"

response = requests.get(endpoint)

repos = json.loads(response.text)  # "loads" means "load string"

Use `json.loads()` for decoding **from** JSON and `json.dumps()` for encoding **to** JSON.
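
As a quick illustration of the two directions, here is a small round trip. The `record` dictionary is an invented stand-in, not data from the GitHub response.

```python
import json

record = {"title": "Data Science Book", "topics": ["data", "science"]}

# dumps: Python object -> JSON string ("dump string")
as_text = json.dumps(record, indent=2, sort_keys=True)
print(as_text)

# loads: JSON string -> Python object; the round trip gives back an equal dict
round_tripped = json.loads(as_text)
print(round_tripped == record)
```

The `indent` argument only affects how the string is laid out, which is handy when printing a large response for reading.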

<div class="alert alert-info">
**Exercise:** Replace `json.loads(response.text)` with `response.json()` and verify that they return the same thing.
</div>

In [3]:
## CODE HERE ##

In [4]:
print(json.loads(response.text)[0])

{'id': 11189868, 'name': 'bitstarter', 'full_name': 'joelgrus/bitstarter', 'owner': {'login': 'joelgrus', 'id': 1308313, 'avatar_url': 'https://avatars1.githubusercontent.com/u/1308313?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/joelgrus', 'html_url': 'https://github.com/joelgrus', 'followers_url': 'https://api.github.com/users/joelgrus/followers', 'following_url': 'https://api.github.com/users/joelgrus/following{/other_user}', 'gists_url': 'https://api.github.com/users/joelgrus/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/joelgrus/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/joelgrus/subscriptions', 'organizations_url': 'https://api.github.com/users/joelgrus/orgs', 'repos_url': 'https://api.github.com/users/joelgrus/repos', 'events_url': 'https://api.github.com/users/joelgrus/events{/privacy}', 'received_events_url': 'https://api.github.com/users/joelgrus/received_events', 'type': 'User', 'site_admin': False}, 'private

In [5]:
print(response.json()[0])

{'id': 11189868, 'name': 'bitstarter', 'full_name': 'joelgrus/bitstarter', 'owner': {'login': 'joelgrus', 'id': 1308313, 'avatar_url': 'https://avatars1.githubusercontent.com/u/1308313?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/joelgrus', 'html_url': 'https://github.com/joelgrus', 'followers_url': 'https://api.github.com/users/joelgrus/followers', 'following_url': 'https://api.github.com/users/joelgrus/following{/other_user}', 'gists_url': 'https://api.github.com/users/joelgrus/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/joelgrus/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/joelgrus/subscriptions', 'organizations_url': 'https://api.github.com/users/joelgrus/orgs', 'repos_url': 'https://api.github.com/users/joelgrus/repos', 'events_url': 'https://api.github.com/users/joelgrus/events{/privacy}', 'received_events_url': 'https://api.github.com/users/joelgrus/received_events', 'type': 'User', 'site_admin': False}, 'private

<div class="alert alert-info">
**Exercise:** If you have a GitHub account, replace `joelgrus` with your username and explore.
</div>

In [6]:
## CODE HERE ##

In [7]:
endpoint = "https://api.github.com/users/womenindatascienceatx/repos"
response = requests.get(endpoint)
repos = json.loads(response.text)  # "loads" means "load string"
for repo in repos:
    print(repo['name'])

Clustering
data-science-from-scratch-workshop
Decision-Trees
KNIME_2017_Workshop
logistic-regression
Meetup-Slides
multiple-linear-regression-and-gradient-descent
NaiveBayesPresentation
Neural_Networks
PCA_SVD
RecommenderSystems
SQLworkshop
stats_and_regression
titanic-data-munging
titanic-EDA
titanic-hypothesis-testing
wids-github
widsatx-mapreduce
widsatx-nlp
widsatx-python
wids_network_analysis


<div class="alert alert-info">
**Exercise:** The `repos` object above is a list of dicts. What do you get when you make a `GET` request to the endpoints `https://api.github.com` and `https://api.github.com/users`?
</div>

In [8]:
## CODE HERE ##

In [9]:
endpoint = "https://api.github.com"
response = requests.get(endpoint)
print(type(response.json()))

endpoint = "https://api.github.com/users"
response = requests.get(endpoint)
print(type(response.json()))

<class 'dict'>
<class 'list'>


Let's take a closer look at what got returned (a list of dicts where each element of the list is a repo).

In [10]:
sorted(repos[0].keys())  # dictionary keys of the first repo in the list of repos

['archive_url',
 'archived',
 'assignees_url',
 'blobs_url',
 'branches_url',
 'clone_url',
 'collaborators_url',
 'comments_url',
 'commits_url',
 'compare_url',
 'contents_url',
 'contributors_url',
 'created_at',
 'default_branch',
 'deployments_url',
 'description',
 'downloads_url',
 'events_url',
 'fork',
 'forks',
 'forks_count',
 'forks_url',
 'full_name',
 'git_commits_url',
 'git_refs_url',
 'git_tags_url',
 'git_url',
 'has_downloads',
 'has_issues',
 'has_pages',
 'has_projects',
 'has_wiki',
 'homepage',
 'hooks_url',
 'html_url',
 'id',
 'issue_comment_url',
 'issue_events_url',
 'issues_url',
 'keys_url',
 'labels_url',
 'language',
 'languages_url',
 'license',
 'merges_url',
 'milestones_url',
 'mirror_url',
 'name',
 'notifications_url',
 'open_issues',
 'open_issues_count',
 'owner',
 'private',
 'pulls_url',
 'pushed_at',
 'releases_url',
 'size',
 'ssh_url',
 'stargazers_count',
 'stargazers_url',
 'statuses_url',
 'subscribers_url',
 'subscription_url',
 'svn_url'

Next we will use `dateutil.parser` to convert from ISO datetime format to a datetime object.

In [11]:
repos[0]['created_at']  # ISO datetime format

'2017-02-23T21:58:30Z'

In [12]:
from collections import Counter
from dateutil.parser import parse

parse(repos[0]['created_at'])  # datetime object

datetime.datetime(2017, 2, 23, 21, 58, 30, tzinfo=tzutc())

Now, count up the number of repos created by month and by weekday using a `Counter` object from the `collections` package.

In [13]:
dates = [parse(repo["created_at"]) for repo in repos]
month_counts = Counter(date.month for date in dates)
weekday_counts = Counter(date.weekday() for date in dates)

In [14]:
# Sort by month/weekday
print("Number of repos created by month:", sorted(month_counts.items()))
print("Number of repos created by weekday:", sorted(weekday_counts.items()))

Number of repos created by month: [(1, 1), (2, 1), (3, 2), (4, 2), (6, 4), (7, 2), (8, 2), (9, 4), (10, 2), (11, 1)]
Number of repos created by weekday: [(0, 1), (1, 1), (2, 3), (3, 13), (4, 3)]


In [15]:
# Sort by counts
print(month_counts.most_common())  # In which month(s) were the most repos created?
print(weekday_counts.most_common())  # On which day(s) of the week were the most repos created?

[(9, 4), (6, 4), (3, 2), (10, 2), (8, 2), (7, 2), (4, 2), (2, 1), (11, 1), (1, 1)]
[(3, 13), (2, 3), (4, 3), (0, 1), (1, 1)]
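
The weekday keys are the integers returned by `datetime.weekday()`, where 0 is Monday. A small sketch for turning them into names with the standard library's `calendar` module follows; the counts are copied by hand from the output above so the example runs without a network request, and the names assume the default English locale.

```python
import calendar
from collections import Counter

# Counts copied from the output above; keys are datetime.weekday() values (0 = Monday).
weekday_counts = Counter({0: 1, 1: 1, 2: 3, 3: 13, 4: 3})

# Pair each weekday name with its count, most common first.
named_weekday_counts = [
    (calendar.day_name[day], count)
    for day, count in weekday_counts.most_common()
]
print(named_weekday_counts)
```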


### PyYaml Library
We will use a YAML-formatted file, which we call `config.yml`, for keeping configuration data all in one place and for keeping API keys and tokens secret. (YAML stands for "YAML Ain't Markup Language", originally "Yet Another Markup Language".)

You will find a file named `config.yml.template` in the same directory as this notebook. It contains entries like this:

```
...
planet:
    url: https://api.planet.com/data/v1
    key: {PLANET_API_KEY}
...
```

Please keep that file open in an editor and, as we go through the tutorial, replace the values in curly braces with your personal API keys.

For example, when we get to the section on the Planet API you would replace `{PLANET_API_KEY}` with a key looking something like this: `a3a64774d30c4749826b6be445489d3b` (not a real key, but you can generate one by signing up for an account).

**IMPORTANT NOTE: DO NOT COMMIT config.yml TO GITHUB!**
If you plan to commit any code from this tutorial to GitHub, ensure that `config.yml` is in the repository's `.gitignore` file.

Next, let's make a function for loading our configuration file. We can call this whenever we make a change to `config.yml` and want to re-load the file to memory.

In [16]:
import yaml

def load_config():
    """load the configuration file as a python dictionary"""
    with open("config.yml", 'r') as ymlfile:
        cfg = yaml.safe_load(ymlfile)  # safe_load parses plain data only
    return cfg
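
It is easy to forget to fill in one of the template values. Below is a minimal sketch of a lookup helper that fails with a readable message instead of sending a bad key to an API. `get_setting` is a hypothetical helper, not part of PyYAML, and `example_cfg` is a hand-written stand-in for the dictionary returned by `load_config()`. One assumption worth checking on your own file: as far as we know, YAML reads an unedited `{PLANET_API_KEY}` as a small mapping rather than a string, which is why the helper rejects any value that is not a plain string.

```python
def get_setting(cfg, section, name):
    """Return cfg[section][name], failing loudly if it is missing or unfilled.

    Hypothetical helper for this notebook; not part of PyYAML.
    """
    try:
        value = cfg[section][name]
    except (KeyError, TypeError):
        raise KeyError("config.yml has no entry for {}.{}".format(section, name)) from None
    # An unedited template value such as {PLANET_API_KEY} is not a plain string key.
    if not isinstance(value, str) or value.startswith("{"):
        raise ValueError("{}.{} still looks like a placeholder".format(section, name))
    return value


# Stand-in for load_config(); the key is the fake one quoted in the text above.
example_cfg = {"planet": {"url": "https://api.planet.com/data/v1",
                          "key": "a3a64774d30c4749826b6be445489d3b"}}
print(get_setting(example_cfg, "planet", "url"))
```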

***
## USNO Astronomical Applications Department API
http://aa.usno.navy.mil/data/docs/api.php

This API does not require any authentication credentials, so we can immediately start doing `GET` requests.

In general, an API request takes the form:
```
http://api.usno.navy.mil/<web_service>?<parameters>
```

The available web services are:
- `imagery`: synthetic images of astronomical bodies under a set of conditions
- `moon`: dates and times of a list of primary moon phases
- `rstt`: rise, set, and transit times for the Sun and Moon
- `sidtime`: Greenwich mean and apparent sidereal time, local mean and apparent sidereal time, and the Equation of the Equinoxes
- `eclipses/solar`: local circumstances for solar eclipses
- `christian`: selected Christian observances
- `jewish`: selected Jewish observances
- `islamic`: selected Islamic observances
- `jdconverter`: converts dates between the Julian/Gregorian calendar and Julian date

You can find a description of the various parameters in the docs link above.
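
To make the request form concrete, here is a sketch that assembles such a URL by hand with the standard library. `build_usno_url` is a hypothetical helper, and the parameter names (`date`, `coords`, `format`) are the ones this notebook uses for the solar eclipse service. In practice the `requests` library does this encoding for us when we pass `params=`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

USNO_BASE_URL = "http://api.usno.navy.mil"  # base URL from the request form above


def build_usno_url(web_service, **parameters):
    """Assemble <base>/<web_service>?<parameters> (hypothetical helper)."""
    return "{}/{}?{}".format(USNO_BASE_URL, web_service, urlencode(parameters))


url = build_usno_url("eclipses/solar", date="8/21/2017",
                     coords="46.67,1.48", format="json")
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Characters such as `/` and `,` are percent-encoded in the query string, which is why the printed URL looks different from the values we passed in.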

In [17]:
configuration = load_config()  # load config.yml

ECLIPSE_API_URL = configuration['usno_solar_eclipse']['url']
## NOTE: This API doesn't require a key

In [18]:
def get_local_eclipse_data(geocode):
    """Get data on the solar eclipse at a particular location."""
    query_params = {
        "date": "8/21/2017",
        "coords": geocode,
        "format": "json"
    }
    response = requests.get(
        ECLIPSE_API_URL,
        params=query_params
    )
    return response.json()

In [19]:
geocode = "46.67,1.48"
get_local_eclipse_data(geocode)

{'apiversion': '2.0.0',
 'day': 21,
 'deltaT': '69.4s',
 'description': 'Sun in Partial Eclipse at this Location',
 'duration': '0h 11m 08.6s',
 'error': False,
 'event': 'Solar Eclipse of 2017 Aug. 21',
 'height': '0m',
 'lat': '46.670000',
 'local_data': [{'altitude': '1.0',
   'azimuth': '286.3',
   'day': '21',
   'phenomenon': 'Eclipse Begins',
   'position_angle': '229.3',
   'time': '18:42:26.8',
   'vertex_angle': '187.0'},
  {'altitude': '----',
   'azimuth': '288.3',
   'day': '21',
   'phenomenon': 'Sunset',
   'time': '18:54'}],
 'lon': '1.480000',
 'magnitude': '0.158',
 'month': 8,
 'obscuration': '7.4%',
 'tz': '0',
 'year': 2017}
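
Notice that most values in the response are strings, including the duration and the obscuration. Here is a small sketch for converting those two into numbers we can compute with. Both helpers are hypothetical, and the string formats are assumed from this single response rather than from the API docs.

```python
import datetime as dt
import re


def parse_duration(text):
    """Convert a string like '0h 11m 08.6s' to a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)h (\d+)m ([\d.]+)s", text.strip())
    if match is None:
        raise ValueError("unexpected duration format: {!r}".format(text))
    hours, minutes, seconds = match.groups()
    return dt.timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def parse_percent(text):
    """Convert a string like '7.4%' to a fraction between 0 and 1."""
    return float(text.rstrip("%")) / 100.0


# Values copied from the response shown above.
duration = parse_duration("0h 11m 08.6s")
obscuration = parse_percent("7.4%")
print(duration.total_seconds(), obscuration)
```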

## Google Maps APIs

Google provides lots of useful public APIs, including their Maps APIs:

https://developers.google.com/maps/web-services/

We will use the Google Maps Geocoding API to get latitude and longitude coordinates (a geocode) for a given postcode.

For contrast, we'll use the `geopy` Python client wrapper for getting a timezone ID after getting a geocode from a postcode.

From their GitHub page: "geopy is a Python 2 and 3 client for several popular geocoding web services."

https://github.com/geopy/geopy

In [20]:
GEOCODING_API_URL = configuration['google_geocoding']['url']
GEOCODING_API_KEY = configuration['google_geocoding']['key']

In [21]:
import time


def geocode_postcode_data(postcode):
    """Use Google geocode API to get geocode from postcode."""
    print("get the geocode for postcode {}".format(postcode))
    if not postcode:
        return None
    time.sleep(1)  # avoid rate limiting if calling this function in a loop
    url = "{}/json?components=postal_code:{}&key={}".format(
        GEOCODING_API_URL,
        postcode,
        GEOCODING_API_KEY
    )
    r = requests.get(url)
    postcode_geocode = None
    if r.status_code == 200:
        results = r.json()['results']
        if len(results) > 0:
            location = results[0]['geometry']['location']
            postcode_geocode = "{},{}".format(location['lat'], location['lng'])
    return postcode_geocode

In [22]:
from geopy import geocoders


def get_timezone(geocode):
    """Use geopy client to get timezone from lat/lon."""
    # Localize the event times
    lat = float(geocode.split(',')[0])
    lon = float(geocode.split(',')[1])
    # Get timezone from lat/lon
    g = geocoders.GoogleV3()
    timezone = str(g.timezone((lat, lon)))
    return timezone
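
Throughout this section a geocode is passed around as a `"lat,lon"` string. A minimal sketch of a helper that splits it once and checks the ranges is shown below; `split_geocode` is a hypothetical name, not something provided by `geopy`.

```python
def split_geocode(geocode):
    """Split a 'lat,lon' string into a (lat, lon) tuple of floats (hypothetical helper)."""
    lat_text, lon_text = geocode.split(",")
    lat, lon = float(lat_text), float(lon_text)
    # Latitude runs from -90 to 90 degrees, longitude from -180 to 180.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("geocode out of range: {!r}".format(geocode))
    return lat, lon


print(split_geocode("46.67,1.48"))
```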

Now let's write a function to localize a datetime object to a particular timezone and format it as a more easily human-readable string.

In [23]:
import datetime as dt
import pytz


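def localize_time(time_string, timezone_id):
    """Convert a UTC 'HH:MM[:SS]' time on August 21, 2017 to a local time string."""
    time_split = time_string.split(':')
    datetime = dt.datetime(2017, 8, 21, int(time_split[0]), int(time_split[1]))
    utc_datetime = datetime.replace(tzinfo=pytz.utc)
    # e.g., '1:44 PM EST'; note that '%-I' (hour without zero padding) is not supported on Windows
    time_format = '%-I:%M %p %Z'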

    # Convert timezone_id to pytz timezone object
    timezone = pytz.timezone(timezone_id)

    # Localize timezone-aware datetime object
    localized_datetime = utc_datetime.astimezone(timezone)

    return localized_datetime.strftime(time_format)
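
To see what the conversion does without needing `pytz`, here is a sketch using only the standard library. It assumes a fixed UTC-5 offset (US Central Daylight Time) purely for illustration; a real timezone ID also tracks daylight-saving changes, which a fixed offset does not.

```python
import datetime as dt

# Assumption for illustration: a fixed UTC-5 offset labelled "CDT".
central_daylight = dt.timezone(dt.timedelta(hours=-5), name="CDT")

# Attach UTC to the eclipse-day time, then express the same instant in the local offset.
utc_time = dt.datetime(2017, 8, 21, 16, 41, tzinfo=dt.timezone.utc)
local_time = utc_time.astimezone(central_daylight)

# '%I' is zero-padded on every platform; stripping the zero avoids the platform-specific '%-I'.
formatted = local_time.strftime("%I:%M %p %Z").lstrip("0")
print(formatted)
```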

In [24]:
localize_time("16:41:01.2", "America/Chicago")  # Austin, TX is in the America/Chicago timezone

'11:41 AM CDT'

<div class="alert alert-info">
**Exercise:** Use the above functions to get and print the begin time, the end time, and the maximum obscuration for the August 21, 2017 eclipse in your zip code and your time zone.
</div>

In [None]:
## CODE HERE ##

In [None]:
MY_POSTCODE = "78731"  # a string, so postcodes with leading zeros survive intact
geocode = geocode_postcode_data(MY_POSTCODE)
print(geocode)
print("")

dict_eclipse_data = get_local_eclipse_data(geocode)

In [None]:
dict_eclipse_data

In [None]:
for local in dict_eclipse_data['local_data']:
    print(local['phenomenon'], local['time'])
    
obscuration = dict_eclipse_data['obscuration']
print("Obscuration at Maximum Eclipse: {}".format(obscuration))

In [None]:
timezone_id = get_timezone(geocode)
print(timezone_id)

In [None]:
for local in dict_eclipse_data['local_data']:
    print(local['phenomenon'], localize_time(local['time'], timezone_id))
    
obscuration = dict_eclipse_data['obscuration']
print("Obscuration at Maximum Eclipse: {}".format(obscuration))

***
## Planet Data API

Sign up for a Planet account [here](https://www.planet.com/account/#/). Log in and copy/paste your API key to your `config.yml` file.

Docs: https://www.planet.com/docs/reference/data-api/

In [None]:
# Load the needed configuration file variables
cfg = load_config()
PLANET_API_URL = cfg['planet']['url']
PLANET_API_KEY = cfg['planet']['key']

The following example is taken from here:
- https://github.com/planetlabs/notebooks/blob/master/jupyter-notebooks/data-api-tutorials/search_and_download_quickstart.ipynb

### Define an Area of Interest

In [None]:
# Stockton, CA bounding box (created via geojson.io) 
geojson_geometry = {
  "type": "Polygon",
  "coordinates": [
    [ 
      [-121.59290313720705, 37.93444993515032],
      [-121.27017974853516, 37.93444993515032],
      [-121.27017974853516, 38.065932950547484],
      [-121.59290313720705, 38.065932950547484],
      [-121.59290313720705, 37.93444993515032]
    ]
  ]
}

In [None]:
# # Austin, TX bounding box (created via geojson.io)
# geojson_geometry = {
#     "type": "Polygon",
#     "coordinates": [
#       [
#         [-97.84698486328125, 30.115433670851925],
#         [-97.63481140136719, 30.115433670851925],
#         [-97.63481140136719, 30.433281874927655],
#         [-97.84698486328125, 30.433281874927655],
#         [-97.84698486328125, 30.115433670851925]
#       ]
#     ]
# }
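
If you would rather not draw the box by hand, the same structure can be built from four numbers. This is a minimal sketch; `bounding_box_polygon` is a hypothetical helper, and the corner order simply follows the Stockton example.

```python
def bounding_box_polygon(min_lon, min_lat, max_lon, max_lat):
    """Build a GeoJSON Polygon for a lon/lat bounding box (hypothetical helper)."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # a GeoJSON ring repeats its first point to close the shape
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Same corners as the Stockton, CA box above.
stockton = bounding_box_polygon(-121.59290313720705, 37.93444993515032,
                                -121.27017974853516, 38.065932950547484)
print(stockton["coordinates"][0])
```

Note that GeoJSON coordinates are ordered longitude first, then latitude, which is the reverse of the `"lat,lon"` geocode strings used earlier in this notebook.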

### Create Filters

In [None]:
# get images that overlap with our AOI 
geometry_filter = {
  "type": "GeometryFilter",
  "field_name": "geometry",
  "config": geojson_geometry
}

# get images acquired within a date range
date_range_filter = {
  "type": "DateRangeFilter",
  "field_name": "acquired",
  "config": {
    "gte": "2016-08-31T00:00:00.000Z",
    "lte": "2016-09-01T00:00:00.000Z"
  }
}

# only get images which have <=50% cloud coverage
cloud_cover_filter = {
  "type": "RangeFilter",
  "field_name": "cloud_cover",
  "config": {
    "lte": 0.5
  }
}

# combine our geo, date, cloud filters
combined_filter = {
  "type": "AndFilter",
  "config": [geometry_filter, date_range_filter, cloud_cover_filter]
}
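
Typing timestamps by hand is error-prone, so here is a sketch of building the date filter from `datetime` objects. `make_date_range_filter` is a hypothetical helper; it assumes the datetimes are already in UTC and reproduces the timestamp format used in the filter above.

```python
import datetime as dt


def make_date_range_filter(start, end, field_name="acquired"):
    """Build a DateRangeFilter dict from two UTC datetimes (hypothetical helper)."""
    timestamp_format = "%Y-%m-%dT%H:%M:%S.000Z"  # matches the strings used above
    return {
        "type": "DateRangeFilter",
        "field_name": field_name,
        "config": {
            "gte": start.strftime(timestamp_format),
            "lte": end.strftime(timestamp_format),
        },
    }


one_day = make_date_range_filter(dt.datetime(2016, 8, 31), dt.datetime(2016, 9, 1))
print(one_day)
```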

### Searching: Items and Assets
You can learn more about item & asset types in Planet's Data API [here](https://www.planet.com/docs/reference/data-api/items-assets/).

In [None]:
import json
import requests
from requests.auth import HTTPBasicAuth


item_type = "PSScene3Band"

# API request object
search_request = {
  "interval": "day",
  "item_types": [item_type], 
  "filter": combined_filter
}

# fire off the POST request, using the base URL loaded from config.yml
search_result = requests.post(
    "{}/quick-search".format(PLANET_API_URL),
    auth=HTTPBasicAuth(PLANET_API_KEY, ''),
    json=search_request
)

print(json.dumps(search_result.json(), indent=1))

In [None]:
# extract image IDs only
image_ids = [feature['id'] for feature in search_result.json()['features']]
print(image_ids)

In [None]:
# For demo purposes, just grab the first image ID
id0 = image_ids[0]
id0_url = '{}/item-types/{}/items/{}/assets'.format(PLANET_API_URL, item_type, id0)

# Returns JSON metadata for assets in this ID. Learn more: planet.com/docs/reference/data-api/items-assets/#asset
result = \
  requests.get(
    id0_url,
    auth=HTTPBasicAuth(PLANET_API_KEY, '')
  )

# List of asset types available for this particular satellite image
print(result.json().keys())

### Activation and Downloading

In [None]:
# This is "inactive" if the "visual" asset has not yet been activated; otherwise 'active'
print(result.json()['visual']['status'])

In [None]:
# Parse out useful links
links = result.json()["visual"]["_links"]
self_link = links["_self"]
activation_link = links["activate"]

# Request activation of the 'visual' asset:
activate_result = \
  requests.get(
    activation_link,
    auth=HTTPBasicAuth(PLANET_API_KEY, '')
  )

In [None]:
activation_status_result = \
  requests.get(
    self_link,
    auth=HTTPBasicAuth(PLANET_API_KEY, '')
  )
    
print(activation_status_result.json()["status"])
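
Activation is not instant, so the status check usually has to be repeated. Below is a minimal sketch of a polling loop. `wait_until_active` is a hypothetical helper, and the sequence of statuses in the stand-in (`inactive`, `activating`, `active`) is an assumption used so the example runs without a Planet account.

```python
import time


def wait_until_active(get_status, interval_seconds=10, max_attempts=30):
    """Call get_status() until it returns 'active' or the attempts run out.

    Returns True if the asset became active, False otherwise.
    """
    for attempt in range(max_attempts):
        if get_status() == "active":
            return True
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)  # wait before asking again
    return False


# Stand-in for the real status request.
fake_statuses = iter(["inactive", "activating", "active"])
became_active = wait_until_active(lambda: next(fake_statuses), interval_seconds=0)
print(became_active)
```

In this notebook, `get_status` could be a function that repeats the `GET` request to `self_link` and returns the `"status"` field of the JSON response.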

In [None]:
# Image can be downloaded by making a GET with your Planet API key, from here:
download_link = activation_status_result.json()["location"]
print(download_link)

***
## Twitter APIs
https://developer.twitter.com/en/docs

In this case there is a well-written Python client wrapper called `twython` that can be used in lieu of the lower-level `requests` library (or the even lower-level `http.client` library).

This paragraph from "Data Science from Scratch" is on point:

> "Typically we won’t be working with APIs at this low “make the requests and parse the
responses ourselves” level. One of the benefits of using Python is that someone has
already built a library for pretty much any API you’re interested in accessing. When
they’re done well, these libraries can save you a lot of the trouble of figuring out the
hairier details of API access. (When they’re not done well, or when it turns out they’re
based on defunct versions of the corresponding APIs, they can cause you enormous
headaches.)

> Nonetheless, you’ll occasionally have to roll your own API-access library (or, more
likely, debug why someone else’s isn’t working), so it’s good to know some of the
details."


### Get Your Twitter Credentials
Sign in to your Twitter account and go to https://apps.twitter.com/. We'll go over this part together.

The following example is taken from here:
- https://github.com/joelgrus/data-science-from-scratch/blob/master/code/getting_data.py

In [None]:
configuration = load_config()

# these should be in your config.yml file
CONSUMER_KEY = configuration['twitter']['consumer_key']
CONSUMER_SECRET = configuration['twitter']['consumer_secret']
ACCESS_TOKEN = configuration['twitter']['access_token']
ACCESS_TOKEN_SECRET = configuration['twitter']['access_token_secret']

### Twitter Search API

In [None]:
from twython import Twython


def call_twitter_search_api():

    twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET)

    # search for tweets containing the phrase "data science"
    for status in twitter.search(q='"data science"')["statuses"]:
        # Python 3 strings are already Unicode, so no .encode() is needed for printing
        user = status["user"]["screen_name"]
        text = status["text"]
        print(user, ":", text)
        print("")

In [None]:
# search for tweets containing the phrase "data science"
call_twitter_search_api()

### Twitter Streaming API

In [None]:
from twython import TwythonStreamer


# appending data to a global variable is pretty poor form
# but it makes the example much simpler
tweets = [] 

class MyStreamer(TwythonStreamer):
    """our own subclass of TwythonStreamer that specifies
    how to interact with the stream"""

    def on_success(self, data):
        """what do we do when twitter sends us data?
        here data will be a Python object representing a tweet"""

        # only want to collect English-language tweets
        # .get() avoids a KeyError for stream messages that carry no 'lang' field
        if data.get('lang') == 'en':
            tweets.append(data)

        # stop when we've collected enough
        if len(tweets) >= 100:
            self.disconnect()

    def on_error(self, status_code, data):
        print(status_code, data)
        self.disconnect()

def call_twitter_streaming_api():
    stream = MyStreamer(CONSUMER_KEY, CONSUMER_SECRET, 
                        ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

    # starts consuming public statuses that contain the keyword 'data'
    stream.statuses.filter(track='data')
    
    ## if instead we wanted to start consuming a sample of *all* public statuses
    # stream.statuses.sample()
    

In [None]:
call_twitter_streaming_api()

In [None]:
len(tweets)

In [None]:
tweets[0]

<div class="alert alert-info">
**Exercise:** Find the most common hashtags in `tweets`.
</div>


In [None]:
## CODE HERE ##

In [None]:
top_hashtags = Counter(
    hashtag['text'].lower()
    for tweet in tweets
    for hashtag in tweet["entities"]["hashtags"]
)
print(top_hashtags.most_common(5))

## Thanks for Coming!

- agarwal.meghann@gmail.com
- https://www.linkedin.com/in/meghann-agarwal/