# Collecting and analyzing data beyond numbers

Welcome to the sixth week of the course. By the end of this week, you should acquire:

**Knowledge on:**
* What an Application Programming Interface is
* The various ways in which APIs can be used
* The concept of authentication and its pains
* Basic HTTP responses

**Skills on:**
* Interacting with a basic API
* Working with different types of data

In today's class we will rely on a few basic Python packages, such as *requests*.

In [6]:
import time  # allows us to do interesting things with time objects and the passage of time
import json  # allows us to interact with JSON objects (which is what a lot of APIs communicate in)
import requests  # allows us to make so-called HTTP requests, i.e. the same thing you do whenever you open Facebook in a browser
import pandas as pd

# Basic APIs - cat facts!

In [2]:
data = requests.get('https://catfact.ninja/fact')

In [3]:
data

<Response [200]>
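
The `200` in `<Response [200]>` is the HTTP status code. Codes in the 200s signal success, codes in the 400s point to a problem with the request (for example 404, not found), and codes in the 500s point to a problem on the server. As a minimal sketch, the helper below (`describe_status` is a name made up for this illustration) uses Python's built-in `http` module to put a code into words. With *requests*, the numeric code of a response is available as `data.status_code`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code (illustrative helper)."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirection"
    elif 400 <= code < 500:
        kind = "client error"
    elif code >= 500:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {status.phrase} ({kind})"


for code in (200, 404, 429, 500):
    print(describe_status(code))
```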

In [5]:
data.json()

{'fact': 'Cats have "nine lives" thanks to a flexible spine and powerful leg and back muscles',
 'length': 83}
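
`data.json()` turns the text of the response into ordinary Python dictionaries and lists, so individual fields can be read by key. The sketch below shows the same idea offline with the built-in `json` module. The string `raw_body` is a hand-written stand-in for a response body (an assumption for illustration), built from one of the cat facts that appears in this notebook.

```python
import json

# Hand-written stand-in for the text an API would send back
raw_body = '{"fact": "Many cats love having their forehead gently stroked.", "length": 52}'

parsed = json.loads(raw_body)  # JSON text -> Python dict

print(type(parsed))
print(parsed["fact"])  # read one field by its key
print(len(parsed["fact"]) == parsed["length"])  # 'length' is the number of characters in the fact
```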

In [4]:
# To get multiple facts, we can use our Python knowledge from previous weeks
cats = []
for _ in range(30):  # ask for 30 facts, one request per fact
    data = requests.get('https://catfact.ninja/fact')
    cats.append(data.json())
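
The loop above sends its requests back to back. Many APIs limit how quickly you may call them, so it can help to pause between calls, which is where the `time` package comes in. The following is a minimal sketch, not a rule of this particular API: `collect_facts`, `fetch_one` and `pause_seconds` are names invented here, and the half-second default is an arbitrary assumption.

```python
import time


def collect_facts(fetch_one, n, pause_seconds=0.5):
    """Call fetch_one() n times, pausing between calls, and return all results."""
    facts = []
    for _ in range(n):
        facts.append(fetch_one())
        time.sleep(pause_seconds)  # wait briefly so we do not flood the server
    return facts


# Offline demonstration with a stand-in for the real request
def fake_fetch():
    return {"fact": "placeholder fact", "length": 16}


demo = collect_facts(fake_fetch, n=3, pause_seconds=0)
print(len(demo))
```

With the real API, `fetch_one` could be `lambda: requests.get('https://catfact.ninja/fact').json()`.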

In [21]:
# What did we get?
cats_df = pd.DataFrame(cats)  # one row per fact, with the columns 'fact' and 'length'

In [23]:
cats

[{'fact': 'A female cat will be pregnant for approximately 9 weeks or between 62 and 65 days from conception to delivery.',
  'length': 110},
 {'fact': 'Many cats love having their forehead gently stroked.',
  'length': 52},
 {'fact': 'People who are allergic to cats are actually allergic to cat saliva or to cat dander. If the resident cat is bathed regularly the allergic people tolerate it better.',
  'length': 165},
 {'fact': 'Cats only use their meows to talk to humans, not each other. The only time they meow to communicate with other felines is when they are kittens to signal to their mother.',
  'length': 170},
 {'fact': 'When a cat chases its prey, it keeps its head level. Dogs and humans bob their heads up and down.',
  'length': 97},
 {'fact': 'The smallest pedigreed cat is a Singapura, which can weigh just 4 lbs (1.8 kg), or about five large cans of cat food. The largest pedigreed cats are Maine Coon cats, which can weigh 25 lbs (11.3 kg), or nearly twice as much as an average

# Basic APIs - rhyming words!

In [21]:
# Check out the docs here: https://www.datamuse.com/api/
parameter = {"rel_rhy": "bye"}  # rel_rhy asks for words that rhyme with the given word
request = requests.get('https://api.datamuse.com/words', params=parameter)

In [22]:
print(request.json())

[{'word': 'lie', 'score': 8446, 'numSyllables': 1}, {'word': 'i', 'score': 5939, 'numSyllables': 1}, {'word': 'by', 'score': 5817, 'numSyllables': 1}, {'word': 'fly', 'score': 5385, 'numSyllables': 1}, {'word': 'eye', 'score': 5205, 'numSyllables': 1}, {'word': 'hi', 'score': 3146, 'numSyllables': 1}, {'word': 'pie', 'score': 3095, 'numSyllables': 1}, {'word': 'buy', 'score': 2960, 'numSyllables': 1}, {'word': 'high', 'score': 2530, 'numSyllables': 1}, {'word': 'tie', 'score': 2494, 'numSyllables': 1}, {'word': 'apply', 'score': 2396, 'numSyllables': 2}, {'word': 'die', 'score': 2334, 'numSyllables': 1}, {'word': 'ally', 'score': 2287, 'numSyllables': 2}, {'word': 'supply', 'score': 2216, 'numSyllables': 2}, {'word': 'identify', 'score': 2126, 'numSyllables': 4}, {'word': 'dry', 'score': 2094, 'numSyllables': 1}, {'word': 'sky', 'score': 1876, 'numSyllables': 1}, {'word': 'shy', 'score': 1866, 'numSyllables': 1}, {'word': 'wry', 'score': 1789, 'numSyllables': 1}, {'word': 'alumni', 'sc

### Note that the code above is equivalent to opening https://api.datamuse.com/words?rel_rhy=bye in your browser.
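
To see where that address comes from, the sketch below builds the query string by hand with the built-in `urllib.parse` module. *requests* does a comparable encoding step internally when it is given a dictionary of parameters. The variable names are illustrative.

```python
from urllib.parse import urlencode

base_url = "https://api.datamuse.com/words"
parameter = {"rel_rhy": "bye"}

# urlencode turns the dictionary into key=value pairs joined by "&"
full_url = base_url + "?" + urlencode(parameter)
print(full_url)
```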


# The Guardian API

In [12]:
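# Writing our own function to query the API
# https://towardsdatascience.com/discovering-powerful-data-the-guardian-news-api-into-python-for-nlp-1829b568fb0f


def query_api(tag, page, from_date, api_key):
    """
    Query the Guardian content API for articles with a particular tag.
    tag: tag to filter on, e.g. 'money/energy'
    page: number of the results page to fetch
    from_date: earliest publication date, formatted as YYYY-MM-DD
    api_key: your personal Guardian API key
    returns: the response from the API
    """
    params = {
        "tag": tag,
        "from-date": from_date,
        "page": page,
        "page-size": 200,  # number of results per page
        "api-key": api_key,
    }
    # requests builds and URL-encodes the query string from params
    response = requests.get("https://content.guardianapis.com/search", params=params)
    return response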

response = query_api('money/energy', '1', '2022-03-01', 'e723ffce-dfd5-427e-b9b8-779f5efedb02')
data = response.json()
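
An API key works like a password, so one of the pains of authentication is keeping it out of code that you share. A common approach is to store the key in an environment variable and read it at run time. The sketch below assumes a variable called `GUARDIAN_API_KEY`. That name and the helper `load_api_key` are choices made for this illustration, not something the Guardian API prescribes.

```python
import os


def load_api_key(env_var="GUARDIAN_API_KEY"):
    """Read an API key from an environment variable (the variable name is an assumption)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key first.")
    return key
```

The result can then be passed on, for example `query_api('money/energy', 1, '2022-03-01', load_api_key())`.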


In [13]:
data

{'response': {'status': 'ok',
  'userTier': 'developer',
  'total': 897,
  'startIndex': 1,
  'pageSize': 200,
  'currentPage': 1,
  'pages': 5,
  'orderBy': 'newest',
  'results': [{'id': 'business/2023/mar/12/british-gas-service-arvato-made-a-third-of-all-warrant-request-to-force-fit-prepay-meters',
    'type': 'article',
    'sectionId': 'business',
    'sectionName': 'Business',
    'webPublicationDate': '2023-03-12T16:12:04Z',
    'webTitle': 'British Gas debt agents made third of all applications to force-fit prepay meters',
    'webUrl': 'https://www.theguardian.com/business/2023/mar/12/british-gas-service-arvato-made-a-third-of-all-warrant-request-to-force-fit-prepay-meters',
    'apiUrl': 'https://content.guardianapis.com/business/2023/mar/12/british-gas-service-arvato-made-a-third-of-all-warrant-request-to-force-fit-prepay-meters',
    'isHosted': False,
    'pillarId': 'pillar/news',
    'pillarName': 'News'},
   {'id': 'lifeandstyle/2023/mar/09/the-closure-of-swimming-pools
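
The output reports `'total': 897` and `'pages': 5`, so a single request returns only the first 200 results. One way to collect everything is to keep requesting pages until the last one is reached. The sketch below assumes only the response layout shown above (`response`, `pages`, `results`). `fetch_all_results` and `fake_fetch_page` are invented for this illustration, and the fake function returns made-up ids so the example runs without an API key.

```python
def fetch_all_results(fetch_page):
    """Collect the 'results' of every page.

    fetch_page(page_number) must return the parsed JSON of one page.
    """
    all_results = []
    page = 1
    while True:
        body = fetch_page(page)["response"]
        all_results.extend(body["results"])
        if page >= body["pages"]:  # stop after the last page
            break
        page += 1
    return all_results


def fake_fetch_page(page):
    """Stand-in for the API: three pages with two made-up results each."""
    return {"response": {"currentPage": page, "pages": 3,
                         "results": [{"id": f"page{page}-item{i}"} for i in (1, 2)]}}


all_results = fetch_all_results(fake_fetch_page)
print(len(all_results))
```

With the real API, `fetch_page` could be `lambda p: query_api('money/energy', p, '2022-03-01', api_key).json()`, where `api_key` holds your own key.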

In [19]:
#How do we get the data structured?
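
One possible answer: the articles sit in a list of dictionaries under `data['response']['results']`, and pandas can turn such a list directly into a table. The sketch below uses `sample`, a trimmed copy of the output above that keeps only the first article and a few of its fields, so it runs without calling the API. With the real response you would pass `data['response']['results']` instead.

```python
import pandas as pd

# Trimmed copy of the API output shown above (first article, selected fields)
sample = {
    "response": {
        "status": "ok",
        "total": 897,
        "pages": 5,
        "results": [
            {
                "id": "business/2023/mar/12/british-gas-service-arvato-made-a-third-of-all-warrant-request-to-force-fit-prepay-meters",
                "type": "article",
                "sectionName": "Business",
                "webPublicationDate": "2023-03-12T16:12:04Z",
                "webTitle": "British Gas debt agents made third of all applications to force-fit prepay meters",
            }
        ],
    }
}

# Each dictionary becomes a row, each key becomes a column
articles_df = pd.DataFrame(sample["response"]["results"])

# Convert the date strings into proper datetime values
articles_df["webPublicationDate"] = pd.to_datetime(articles_df["webPublicationDate"])

print(articles_df[["webPublicationDate", "sectionName", "webTitle"]])
```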

# Working with text

In [38]:
#recap - operations on strings

text = '  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  \n If the resident cat is bathed regularly the allergic people tolerate it better.'
print(text)

#What do we need to do?

  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  
 If the resident cat is bathed regularly the allergic people tolerate it better.
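
The printed text has spaces at both ends, a double space and a line break in the middle. A minimal sketch of one way to tidy it with plain string methods follows. Other choices, such as `strip()` combined with `replace()`, would work too.

```python
text = '  People who are allergic to cats are actually allergic to cat saliva or to cat dander.  \n If the resident cat is bathed regularly the allergic people tolerate it better.'

# split() cuts on any run of whitespace (spaces, line breaks),
# and join() glues the words back together with single spaces
cleaned = " ".join(text.split())
print(cleaned)

# Split into sentences on the full stop and drop empty leftovers
sentences = [s.strip() for s in cleaned.split(".") if s.strip()]
print(sentences)

# Count how often the letters "cat" occur, ignoring upper and lower case
cat_count = cleaned.lower().count("cat")
print(cat_count)
```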


In [None]:
# processing a larger amount of data - df with articles
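
With many texts in a DataFrame, the same string operations can be applied to a whole column at once through the `.str` accessor. The sketch below uses three of the cat facts collected earlier as stand-in texts. The column names `fact_lower`, `n_words` and `mentions_dog` are invented for this illustration, and the same pattern would apply to a column of article titles such as `webTitle`.

```python
import pandas as pd

facts_df = pd.DataFrame({"fact": [
    "Many cats love having their forehead gently stroked.",
    "When a cat chases its prey, it keeps its head level. Dogs and humans bob their heads up and down.",
    "A female cat will be pregnant for approximately 9 weeks or between 62 and 65 days from conception to delivery.",
]})

# Lower-case every text in the column
facts_df["fact_lower"] = facts_df["fact"].str.lower()

# Number of words per text: split on whitespace, then count the pieces
facts_df["n_words"] = facts_df["fact"].str.split().str.len()

# True where the lower-cased text contains "dog"
facts_df["mentions_dog"] = facts_df["fact_lower"].str.contains("dog")

print(facts_df[["n_words", "mentions_dog"]])
```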

In [None]:
# writing functions with strings
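
Once a cleaning recipe works on one string, wrapping it in a function makes it reusable. The following is a minimal sketch. `clean_text` and `count_word` are illustrative names, and which steps belong in the cleaning (lower-casing, removing punctuation) depends on the analysis you have in mind.

```python
import string


def clean_text(text):
    """Collapse whitespace, lower-case the text and remove punctuation."""
    text = " ".join(text.split())
    text = text.lower()
    # str.maketrans with three arguments builds a table that deletes the given characters
    return text.translate(str.maketrans("", "", string.punctuation))


def count_word(text, word):
    """Count how often a whole word occurs in the cleaned text."""
    return clean_text(text).split().count(word.lower())


print(clean_text("  Hello,   World!\n"))
print(count_word("Cats chase cats. A cat naps.", "cats"))
```

Such a function can also be applied to a DataFrame column with `.apply(clean_text)`.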