# Collecting and analyzing data beyond numbers

Welcome to the sixth week of the course. By the end of this week, you should acquire:

**Knowledge on:**
* What an Application Programming Interface is
* The various ways in which APIs can be used
* The concept of authentication and its pains
* Basic HTTP responses

**Skills on:**
* Interacting with a basic API
* Working with different types of data

In today's class we will rely on some pretty basic Python packages such as *requests*. 

In [1]:
import time  # allows us to do interesting things with time objects and the passage of time
import json  # allows us to interact with JSON objects (which is what a lot of APIs communicate in)
import requests  # allows us to make so-called HTTP requests, i.e. the same thing your browser does whenever you open Facebook
import pandas as pd  # allows us to store the collected data in DataFrames

# Basic APIs - cat facts!

In [2]:
data = requests.get('https://catfact.ninja/fact')

In [3]:
data

<Response [200]>
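The `<Response [200]>` above shows the HTTP status code of the reply; 200 means the request succeeded. As a small sketch, Python's standard `http` module can translate a few common codes into their standard phrases. The helper name `describe_status` is made up for this illustration.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description such as '200 OK' for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the module does not know
    return f"{status.value} {status.phrase}"

# A few codes you are likely to meet when working with APIs
for code in (200, 401, 404, 429):
    print(describe_status(code))
```

With *requests*, the same number is available as `data.status_code`, so it can be checked before calling `data.json()`.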

In [4]:
data.json()

{'fact': 'The earliest ancestor of the modern cat lived about 30 million years ago. Scientists called it the\xa0Proailurus, which means “first cat” in Greek. The group of animals that pet cats belong to emerged around 12 million years ago.',
 'length': 226}

In [5]:
#To get multiple facts, we can use our Python knowledge from previous weeks 
counter=0
cats=[]
while counter<30:
    data = requests.get('https://catfact.ninja/fact')
    cats.append(data.json())
    counter+=1
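The loop above sends 30 requests as fast as it can and assumes every one of them succeeds. A more cautious variant could look like the sketch below. The function names `fetch_cat_fact` and `collect` are hypothetical, and the one-second pause is an assumption for illustration, not a limit documented by the API.

```python
import time

def fetch_cat_fact():
    """Request one cat fact; return the parsed JSON, or None if the request failed."""
    import requests  # imported here so that only this function depends on requests
    response = requests.get('https://catfact.ninja/fact', timeout=10)
    if response.status_code != 200:
        return None
    return response.json()

def collect(fetch, n, pause=1.0):
    """Call fetch() n times, wait `pause` seconds after each call, keep the results that are not None."""
    results = []
    for _ in range(n):
        item = fetch()
        if item is not None:
            results.append(item)
        time.sleep(pause)
    return results

# Usage with the real API (left commented out so the sketch runs offline):
# cats = collect(fetch_cat_fact, 30)
```

Passing the fetching function in as an argument keeps the loop reusable: the same `collect` could be pointed at a different API later.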

In [7]:
cats

[{'fact': 'Blue-eyed, pure white cats are frequently deaf.', 'length': 47},
 {'fact': 'Cats can judge within 3 inches the precise location of a sound being made 1 yard away.',
  'length': 86},
 {'fact': 'Purring does not always indicate that a cat is happy and healthy - some cats will purr loudly when they are terrified or in pain.',
  'length': 129},
 {'fact': 'Cats can be taught to walk on a leash, but a lot of time and patience is required to teach them. The younger the cat is, the easier it will be for them to learn.',
  'length': 161},
 {'fact': 'Kittens who are taken along on short, trouble-free car trips to town tend to make good passengers when they get older. They get used to the sounds and motions of traveling and make less connection between the car and the visits to the vet.',
  'length': 239},
 {'fact': 'A kitten will typically weigh about 3 ounces at birth.  The typical male housecat will weigh between  7 and 9 pounds, slightly less for female housecats.',
  'length': 153

In [8]:
#What did we get?
cats_df = pd.DataFrame.from_dict(cats)

In [10]:
cats_df

Unnamed: 0,fact,length
0,"Blue-eyed, pure white cats are frequently deaf.",47
1,Cats can judge within 3 inches the precise loc...,86
2,Purring does not always indicate that a cat is...,129
3,"Cats can be taught to walk on a leash, but a l...",161
4,"Kittens who are taken along on short, trouble-...",239
5,A kitten will typically weigh about 3 ounces a...,153
6,"The oldest cat to give birth was Kitty who, at...",136
7,A cat will tremble or shiver when it is extrem...,53
8,"When a family cat died in ancient Egypt, famil...",331
9,A cat's cerebral cortex contains about twice a...,173
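The `length` column appears to be the number of characters in `fact`. A quick way to check that assumption is sketched below on two records copied from the output above; `length_matches` is a hypothetical helper name.

```python
# Two records copied from the API output shown earlier
cats_sample = [
    {'fact': 'Blue-eyed, pure white cats are frequently deaf.', 'length': 47},
    {'fact': 'Cats can judge within 3 inches the precise location of a sound being made 1 yard away.',
     'length': 86},
]

def length_matches(record):
    """True if the 'length' field equals the number of characters in 'fact'."""
    return len(record['fact']) == record['length']

print(all(length_matches(record) for record in cats_sample))
```

On the DataFrame, the same comparison could be written as `(cats_df['fact'].str.len() == cats_df['length']).all()`.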


# Basic APIs - rhyming words!

In [11]:
# Check out the docs here: https://www.datamuse.com/api/
parameter = {"rel_rhy": "bye"}  # rel_rhy asks for words that rhyme with the given word
request = requests.get('https://api.datamuse.com/words', params=parameter)

In [12]:
print(request.json())

[{'word': 'lie', 'score': 8446, 'numSyllables': 1}, {'word': 'i', 'score': 5939, 'numSyllables': 1}, {'word': 'by', 'score': 5817, 'numSyllables': 1}, {'word': 'fly', 'score': 5385, 'numSyllables': 1}, {'word': 'eye', 'score': 5205, 'numSyllables': 1}, {'word': 'hi', 'score': 3146, 'numSyllables': 1}, {'word': 'pie', 'score': 3095, 'numSyllables': 1}, {'word': 'buy', 'score': 2960, 'numSyllables': 1}, {'word': 'high', 'score': 2530, 'numSyllables': 1}, {'word': 'tie', 'score': 2494, 'numSyllables': 1}, {'word': 'apply', 'score': 2396, 'numSyllables': 2}, {'word': 'die', 'score': 2334, 'numSyllables': 1}, {'word': 'ally', 'score': 2287, 'numSyllables': 2}, {'word': 'supply', 'score': 2216, 'numSyllables': 2}, {'word': 'identify', 'score': 2126, 'numSyllables': 4}, {'word': 'dry', 'score': 2094, 'numSyllables': 1}, {'word': 'sky', 'score': 1876, 'numSyllables': 1}, {'word': 'shy', 'score': 1866, 'numSyllables': 1}, {'word': 'wry', 'score': 1789, 'numSyllables': 1}, {'word': 'alumni', 'sc
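The response is a list of dictionaries, so the list tools from previous weeks apply directly. The sketch below uses four entries copied from the output above; `words_with_syllables` is a hypothetical helper name.

```python
# A few entries copied from the Datamuse output shown above
rhymes = [
    {'word': 'lie', 'score': 8446, 'numSyllables': 1},
    {'word': 'apply', 'score': 2396, 'numSyllables': 2},
    {'word': 'identify', 'score': 2126, 'numSyllables': 4},
    {'word': 'sky', 'score': 1876, 'numSyllables': 1},
]

def words_with_syllables(results, n):
    """Return the words that have exactly n syllables."""
    return [entry['word'] for entry in results if entry['numSyllables'] == n]

one_syllable = words_with_syllables(rhymes, 1)
best = max(rhymes, key=lambda entry: entry['score'])  # entry with the highest score
print(one_syllable, best['word'])
```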

### Note that the above code is equivalent to opening https://api.datamuse.com/words?rel_rhy=bye in your browser.
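To see how a dictionary of parameters turns into such a URL, here is a minimal sketch using only the standard library. `build_url` is a hypothetical helper; *requests* does this step internally.

```python
from urllib.parse import urlencode

base_url = 'https://api.datamuse.com/words'
parameter = {"rel_rhy": "bye"}

def build_url(base, params):
    """Append the parameters to the base URL as a query string."""
    return base + '?' + urlencode(params)

url = build_url(base_url, parameter)
print(url)
```

After a real request, the URL that was actually used is available as `request.url`.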


# The Guardian API

In [13]:
# Writing our own function to query the API
# https://towardsdatascience.com/discovering-powerful-data-the-guardian-news-api-into-python-for-nlp-1829b568fb0f


def query_api(tag, page, from_date, api_key):
    """
    Query the Guardian content API for articles with a particular tag.

    tag: Guardian tag, e.g. 'money/energy'
    page: number of the results page to fetch
    from_date: earliest publication date, formatted as YYYY-MM-DD
    api_key: your personal API key
    returns: the response from the API
    """
    response = requests.get("https://content.guardianapis.com/search?tag="
                            + tag + "&from-date=" + from_date
                            + "&page=" + str(page) + "&page-size=200&api-key=" + api_key)
    return response

response = query_api('money/energy', '1', '2022-03-01', 'e723ffce-dfd5-427e-b9b8-779f5efedb02')
data = response.json()
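Unlike the cat facts API, this API requires authentication: every request must carry an API key. One common way to keep the key out of the notebook is to read it from an environment variable, sketched below. The variable name `GUARDIAN_API_KEY` and both helper names are assumptions made for this illustration.

```python
import os

def load_api_key(env_var='GUARDIAN_API_KEY'):
    """Read the API key from an environment variable (the variable name is hypothetical)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

def build_params(tag, page, from_date, api_key):
    """Collect the query parameters used by query_api in one dictionary."""
    return {
        'tag': tag,
        'from-date': from_date,
        'page': page,
        'page-size': 200,
        'api-key': api_key,
    }
```

Passing such a dictionary as `requests.get("https://content.guardianapis.com/search", params=...)` should produce a request equivalent to the string concatenation in `query_api`, with *requests* taking care of the URL encoding.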


In [16]:
data

{'response': {'status': 'ok',
  'userTier': 'developer',
  'total': 1112,
  'startIndex': 1,
  'pageSize': 200,
  'currentPage': 1,
  'pages': 6,
  'orderBy': 'newest',
  'results': [{'id': 'business/2024/feb/28/uk-power-plants-record-high-energy-prices-auction',
    'type': 'article',
    'sectionId': 'business',
    'sectionName': 'Business',
    'webPublicationDate': '2024-02-28T18:18:32Z',
    'webTitle': 'UK power plants lined up to command record high energy prices this decade',
    'webUrl': 'https://www.theguardian.com/business/2024/feb/28/uk-power-plants-record-high-energy-prices-auction',
    'apiUrl': 'https://content.guardianapis.com/business/2024/feb/28/uk-power-plants-record-high-energy-prices-auction',
    'isHosted': False,
    'pillarId': 'pillar/news',
    'pillarName': 'News'},
   {'id': 'environment/2024/feb/27/environmentally-friendly-heat-pumps-hit-slump-in-europe-says-lobby-group',
    'type': 'article',
    'sectionId': 'environment',
    'sectionName': 'Environ

In [29]:
#How do we get the data structured?

news_df = pd.DataFrame.from_dict(data['response']['results'])

In [26]:
data['response']

{'status': 'ok',
 'userTier': 'developer',
 'total': 1112,
 'startIndex': 1,
 'pageSize': 200,
 'currentPage': 1,
 'pages': 6,
 'orderBy': 'newest',
 'results': [{'id': 'business/2024/feb/28/uk-power-plants-record-high-energy-prices-auction',
   'type': 'article',
   'sectionId': 'business',
   'sectionName': 'Business',
   'webPublicationDate': '2024-02-28T18:18:32Z',
   'webTitle': 'UK power plants lined up to command record high energy prices this decade',
   'webUrl': 'https://www.theguardian.com/business/2024/feb/28/uk-power-plants-record-high-energy-prices-auction',
   'apiUrl': 'https://content.guardianapis.com/business/2024/feb/28/uk-power-plants-record-high-energy-prices-auction',
   'isHosted': False,
   'pillarId': 'pillar/news',
   'pillarName': 'News'},
  {'id': 'environment/2024/feb/27/environmentally-friendly-heat-pumps-hit-slump-in-europe-says-lobby-group',
   'type': 'article',
   'sectionId': 'environment',
   'sectionName': 'Environment',
   'webPublicationDate': '20

In [30]:
news_df

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,business/2024/feb/28/uk-power-plants-record-hi...,article,business,Business,2024-02-28T18:18:32Z,UK power plants lined up to command record hig...,https://www.theguardian.com/business/2024/feb/...,https://content.guardianapis.com/business/2024...,False,pillar/news,News
1,environment/2024/feb/27/environmentally-friend...,article,environment,Environment,2024-02-27T13:24:02Z,Environmentally friendly heat pumps hit slump ...,https://www.theguardian.com/environment/2024/f...,https://content.guardianapis.com/environment/2...,False,pillar/news,News
2,money/2024/feb/23/energy-price-cap-great-brita...,article,money,Money,2024-02-23T08:01:35Z,Energy price cap in Great Britain to fall to £...,https://www.theguardian.com/money/2024/feb/23/...,https://content.guardianapis.com/money/2024/fe...,False,pillar/lifestyle,Lifestyle
3,environment/2024/feb/22/net-zero-requires-new-...,article,environment,Environment,2024-02-22T17:35:19Z,Net zero requires new ways of thinking about h...,https://www.theguardian.com/environment/2024/f...,https://content.guardianapis.com/environment/2...,False,pillar/news,News
4,business/2024/feb/19/citizens-advice-says-size...,article,business,Business,2024-02-19T05:00:04Z,Citizens Advice says Sizewell C costs should n...,https://www.theguardian.com/business/2024/feb/...,https://content.guardianapis.com/business/2024...,False,pillar/news,News
...,...,...,...,...,...,...,...,...,...,...,...
195,news/2023/mar/31/solar-power-viable-rainy-engl...,article,news,News,2023-03-31T05:00:16Z,Solar is now viable even in rainy climes – so ...,https://www.theguardian.com/news/2023/mar/31/s...,https://content.guardianapis.com/news/2023/mar...,False,pillar/news,News
196,money/2023/mar/31/uk-national-price-hike-day-w...,article,money,Money,2023-03-31T05:00:15Z,‘UK national price hike day’: what to expect a...,https://www.theguardian.com/money/2023/mar/31/...,https://content.guardianapis.com/money/2023/ma...,False,pillar/lifestyle,Lifestyle
197,money/2023/mar/31/people-uk-april-bill-rises-c...,article,money,Money,2023-03-31T05:00:14Z,‘We live from month to month’: people in UK br...,https://www.theguardian.com/money/2023/mar/31/...,https://content.guardianapis.com/money/2023/ma...,False,pillar/lifestyle,Lifestyle
198,environment/2023/mar/28/delays-landlord-energy...,article,environment,Environment,2023-03-27T23:01:03Z,Delays to landlord energy efficiency standards...,https://www.theguardian.com/environment/2023/m...,https://content.guardianapis.com/environment/2...,False,pillar/news,News
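The response reports `'total': 1112` and `'pages': 6`, so `news_df` holds only the first page of results. A minimal sketch of collecting every page follows. `fetch_all_results` and `fake_query_page` are hypothetical names, and the fake function stands in for the real API so that the example runs offline.

```python
def fetch_all_results(query_page):
    """
    query_page(page) must return the parsed JSON of one page, shaped like `data` above.
    Returns the 'results' of every page in one list.
    """
    first = query_page(1)['response']
    results = list(first['results'])
    # The first response tells us how many pages exist in total
    for page in range(2, first['pages'] + 1):
        results.extend(query_page(page)['response']['results'])
    return results

def fake_query_page(page):
    """Stand-in for the real API: 3 pages with 2 results each."""
    return {'response': {'pages': 3,
                         'currentPage': page,
                         'results': [{'id': f'article-{page}-{i}'} for i in range(2)]}}

all_results = fetch_all_results(fake_query_page)
print(len(all_results))
```

With the real API, `query_page` could be something like `lambda page: query_api('money/energy', page, '2022-03-01', api_key).json()`, where `api_key` holds your own key.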


# Working with text

In [31]:
#recap - operations on strings

text = '  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  \n If the resident cat is bathed regularly the allergic people tolerate it better.'
print(text)

#What do we need to do?

  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  
 If the resident cat is bathed regularly the allergic people tolerate it better.


In [39]:
#recap - operations on strings

text = '  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  
If the resident cat is bathed regularly the allergic people tolerate it better.'
print(text)

#What do we need to do?

SyntaxError: EOL while scanning string literal (2676884132.py, line 3)
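The error occurs because a string in single quotes cannot continue on a new line. Two ways around it are sketched below with a shortened example text.

```python
# Option 1: keep the string on one line and write the line break as \n
text_escaped = 'First sentence.\nSecond sentence.'

# Option 2: triple quotes allow the string to span several lines
text_triple = '''First sentence.
Second sentence.'''

# Both spellings produce exactly the same string
print(text_escaped == text_triple)
```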

In [35]:
"cats" in text

True

In [38]:
text.strip().upper()

'PEOPLE WHO ARE ALLERGIC TO CATS ARE ACTUALLY ALLERGIC TO CAT SALIVA OR TO CAT DANDER.  \n IF THE RESIDENT CAT IS BATHED REGULARLY THE ALLERGIC PEOPLE TOLERATE IT BETTER.'

In [40]:
text.replace('\n',' ')

'  People who are allergic to cats are actually allergic to cat saliva or to cat dander.    If the resident cat is bathed regularly the allergic people tolerate it better.'
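Replacing `\n` with a space still leaves several spaces in a row, and the leading spaces survive as well. One possible way to collapse all of them in a single step is to split on whitespace and join the pieces again; `normalize_whitespace` is a hypothetical helper name.

```python
text = '  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  \n If the resident cat is bathed regularly the allergic people tolerate it better.'

def normalize_whitespace(raw):
    """Collapse every run of spaces, tabs and newlines into a single space."""
    # split() without arguments splits on any whitespace and drops empty pieces
    return ' '.join(raw.split())

clean = normalize_whitespace(text)
print(clean)
```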

In [43]:
# processing a larger amount of data - df with articles
news_df['titles'] = news_df['webTitle'].str.strip().str.lower()

In [45]:
news_df.head()

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName,titles
0,business/2024/feb/28/uk-power-plants-record-hi...,article,business,Business,2024-02-28T18:18:32Z,UK power plants lined up to command record hig...,https://www.theguardian.com/business/2024/feb/...,https://content.guardianapis.com/business/2024...,False,pillar/news,News,uk power plants lined up to command record hig...
1,environment/2024/feb/27/environmentally-friend...,article,environment,Environment,2024-02-27T13:24:02Z,Environmentally friendly heat pumps hit slump ...,https://www.theguardian.com/environment/2024/f...,https://content.guardianapis.com/environment/2...,False,pillar/news,News,environmentally friendly heat pumps hit slump ...
2,money/2024/feb/23/energy-price-cap-great-brita...,article,money,Money,2024-02-23T08:01:35Z,Energy price cap in Great Britain to fall to £...,https://www.theguardian.com/money/2024/feb/23/...,https://content.guardianapis.com/money/2024/fe...,False,pillar/lifestyle,Lifestyle,energy price cap in great britain to fall to £...
3,environment/2024/feb/22/net-zero-requires-new-...,article,environment,Environment,2024-02-22T17:35:19Z,Net zero requires new ways of thinking about h...,https://www.theguardian.com/environment/2024/f...,https://content.guardianapis.com/environment/2...,False,pillar/news,News,net zero requires new ways of thinking about h...
4,business/2024/feb/19/citizens-advice-says-size...,article,business,Business,2024-02-19T05:00:04Z,Citizens Advice says Sizewell C costs should n...,https://www.theguardian.com/business/2024/feb/...,https://content.guardianapis.com/business/2024...,False,pillar/news,News,citizens advice says sizewell c costs should n...


In [None]:
# writing functions with strings - write a function that classifies whether an article title mentions the UK or not



1. Define a function
2. Condition: is "uk" or "united kingdom" mentioned in the title?
3. Return 1 if True and 0 if False
4. Apply the function to the column 'titles'
5. Store the result in a new column


In [47]:
def uk_checker(title):
    if 'uk' in title.lower():
        return 1
    elif 'united kingdom' in title.lower():
        return 1
    else:
        return 0

In [48]:
uk_checker('uk power plants lined up to command')

1
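One caveat: `'uk' in title` is a substring test, so it is also true for words that merely contain those two letters, such as "ukraine". A stricter variant that only accepts whole words could be sketched with a regular expression. This is an illustration, not the function used in the rest of the notebook, and `uk_checker_strict` is a hypothetical name.

```python
import re

# \b marks a word boundary, so 'uk' must stand on its own
UK_PATTERN = re.compile(r'\b(uk|united kingdom)\b')

def uk_checker_strict(title):
    """Return 1 if 'uk' or 'united kingdom' appears as a whole word, otherwise 0."""
    return 1 if UK_PATTERN.search(title.lower()) else 0

print('uk' in 'ukraine gas supplies')             # substring test
print(uk_checker_strict('ukraine gas supplies'))  # whole-word test
print(uk_checker_strict('uk power plants lined up to command'))
```

Applying the stricter function to the `titles` column could give different counts from the ones shown below.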

In [51]:
news_df['uk_present'] = news_df['titles'].apply(uk_checker)

In [52]:
news_df.head()

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName,titles,uk_present
0,business/2024/feb/28/uk-power-plants-record-hi...,article,business,Business,2024-02-28T18:18:32Z,UK power plants lined up to command record hig...,https://www.theguardian.com/business/2024/feb/...,https://content.guardianapis.com/business/2024...,False,pillar/news,News,uk power plants lined up to command record hig...,1
1,environment/2024/feb/27/environmentally-friend...,article,environment,Environment,2024-02-27T13:24:02Z,Environmentally friendly heat pumps hit slump ...,https://www.theguardian.com/environment/2024/f...,https://content.guardianapis.com/environment/2...,False,pillar/news,News,environmentally friendly heat pumps hit slump ...,0
2,money/2024/feb/23/energy-price-cap-great-brita...,article,money,Money,2024-02-23T08:01:35Z,Energy price cap in Great Britain to fall to £...,https://www.theguardian.com/money/2024/feb/23/...,https://content.guardianapis.com/money/2024/fe...,False,pillar/lifestyle,Lifestyle,energy price cap in great britain to fall to £...,0
3,environment/2024/feb/22/net-zero-requires-new-...,article,environment,Environment,2024-02-22T17:35:19Z,Net zero requires new ways of thinking about h...,https://www.theguardian.com/environment/2024/f...,https://content.guardianapis.com/environment/2...,False,pillar/news,News,net zero requires new ways of thinking about h...,0
4,business/2024/feb/19/citizens-advice-says-size...,article,business,Business,2024-02-19T05:00:04Z,Citizens Advice says Sizewell C costs should n...,https://www.theguardian.com/business/2024/feb/...,https://content.guardianapis.com/business/2024...,False,pillar/news,News,citizens advice says sizewell c costs should n...,0


In [53]:
news_df['uk_present'].value_counts()

0    134
1     66
Name: uk_present, dtype: int64