# APIs

## Learning objectives:
- Learn what a REST API is
- Use REST APIs to obtain data

In the last notebook, we looked at scraping the web to obtain some (housing) data. In many cases, especially when we want textual data, scraping may be our only option. However, some websites offer web APIs that we can query directly for information. Data coming from an API is returned in a structured format, such as JSON, which saves us from having to parse raw HTML.

The word API keeps popping up... but what is it? And what is a REST API?

**API** stands for Application Programming Interface. When we write a program, we often need to interface with other people's code (e.g. a library). An API defines the rules we need to follow to talk to that code (e.g. which functions exist, what arguments they take and what they return).

A **REST API** allows communication over HTTP. The client sends a request, and the server sends back a response. Each request uses an HTTP method; the four most common are GET, POST, PUT, and DELETE. The one most relevant to pulling data from other services (via APIs) is **GET**: as the name implies, this is the HTTP method we use when we want to request some data.
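To make the idea of a "method" concrete, here is a minimal sketch that builds (but never sends) one request of each kind. It uses the standard library's `urllib.request.Request` only because it can construct a request offline; later in this notebook we use the `requests` library instead. The URL `https://api.example.com/items` is a hypothetical placeholder.

```python
from urllib.request import Request

URL = "https://api.example.com/items"  # hypothetical endpoint, never contacted

# Constructing a Request object does not send anything over the network
get_req = Request(URL)                              # no body, so the method defaults to GET
post_req = Request(URL, data=b'{"name": "new"}')    # a body is attached, so it defaults to POST
put_req = Request(URL, data=b'{"name": "changed"}', method="PUT")
delete_req = Request(URL, method="DELETE")

for req in (get_req, post_req, put_req, delete_req):
    print(req.get_method(), req.full_url)
```

The method is simply one part of the request the client sends; the URL tells the server *which* resource, and the method tells it *what* to do with it.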

**So how do we request data?**
Well, we need a place to request data from, and this comes in the form of an endpoint URL. An endpoint URL usually looks something along these lines:

![](images/api_url_structure.png)

Let's visit the GitHub API endpoint to see what the **response** is: https://api.github.com/users/ai-core/repos?sort=pushed&direction=desc

As we can see, the response from calling the GitHub API is JSON (here, an array of repository objects). However, this doesn't have to be the case: the developer who built an API could return any format (XML, CSV, images, etc.). For gathering data through APIs, JSON is typically the easiest to work with, so where possible, we should favour it.
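To see how the GitHub URL above breaks into the parts of an endpoint URL, we can take it apart with the standard library's `urllib.parse` module. This is an illustrative sketch; the labels in the comments (base URL, endpoint path, query parameters) are our own descriptive names.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.github.com/users/ai-core/repos?sort=pushed&direction=desc"
parts = urlsplit(url)

print(parts.scheme)           # https
print(parts.netloc)           # api.github.com         (base URL / host)
print(parts.path)             # /users/ai-core/repos   (endpoint path)
print(parse_qs(parts.query))  # {'sort': ['pushed'], 'direction': ['desc']}  (query parameters)
```

`parse_qs` returns a list for every key because a query string may legally repeat a parameter.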

### HTTP Codes
<img src="https://infidigit.b-cdn.net/wp-content/uploads/2019/12/20191227_012601_0000.png" style="width: 350px"/>
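Status codes are grouped by their first digit: 2xx means success, 4xx means the client (us) made a bad request, and 5xx means the server failed. With `requests`, the number is available as `r.status_code`, and `r.raise_for_status()` raises an exception for 4xx and 5xx responses. The helper `describe_status` below is a hypothetical name used only for illustration; it looks up the standard reason phrase with the standard library's `http.HTTPStatus`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return (reason phrase, class) for a standard HTTP status code (hypothetical helper)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # HTTPStatus(code) raises ValueError for codes it does not know
    return HTTPStatus(code).phrase, classes[code // 100]

for code in (200, 400, 404, 429, 500):
    print(code, *describe_status(code))
```

A 429 (Too Many Requests) is worth watching for when looping over an API, since it means we are being rate limited.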

Read the docs! https://api.stackexchange.com/docs

Let's collect data from StackExchange's API. Here we'll take a slightly roundabout route to pull the data we want. This is for teaching purposes: it helps us understand the structure of JSON and gives you some hands-on experience with using a REST API.

We'll be collecting the body (contents) of questions posted on StackOverflow. To do this, we'll first pull all posts within a date range. For each post that is a question, we'll make another API request to StackExchange's questions endpoint to pull the body of that question.

In [5]:
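```python
import requests

# No API key is used here; requests without a key get a smaller daily quota
ROOT_URL = "https://api.stackexchange.com"
# Date range 2020-08-01 to 2020-08-05, given as Unix timestamps (UTC)
POSTS_ENDPOINT = "/2.2/posts?fromdate=1596240000&todate=1596585600&order=desc&sort=activity&site=stackoverflow"
r = requests.get(ROOT_URL + POSTS_ENDPOINT)
```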
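Hand-writing timestamps and `&`-joined query strings is error-prone. As a sketch, the same URL can be built from a dictionary with the standard library; `to_unix` is a hypothetical helper, and the dates are the ones encoded in `POSTS_ENDPOINT` above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def to_unix(year, month, day):
    """Midnight UTC on the given date, as a Unix timestamp (hypothetical helper)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

params = {
    "fromdate": to_unix(2020, 8, 1),  # 1596240000
    "todate": to_unix(2020, 8, 5),    # 1596585600
    "order": "desc",
    "sort": "activity",
    "site": "stackoverflow",
}

# urlencode joins key=value pairs with "&" (dicts keep insertion order)
url = "https://api.stackexchange.com/2.2/posts?" + urlencode(params)
print(url)
```

With `requests`, calling `requests.get(ROOT_URL + "/2.2/posts", params=params)` performs this encoding for us.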

In [6]:
```python
r.status_code
```

200

In [7]:
```python
r.json()
```

```
{'items': [{'owner': {'reputation': 41,
    'user_id': 14045196,
    'user_type': 'registered',
    'profile_image': 'https://www.gravatar.com/avatar/b0267f35bc49691febef17386d54728c?s=128&d=identicon&r=PG&f=1',
    'display_name': 'Dashing',
    'link': 'https://stackoverflow.com/users/14045196/dashing'},
   'score': 4,
   'last_activity_date': 1597952535,
   'creation_date': 1596507109,
   'post_type': 'question',
   'post_id': 63239300,
   'content_license': 'CC BY-SA 4.0',
   'link': 'https://stackoverflow.com/q/63239300'},
  {'owner': {'reputation': 51204,
    'user_id': 2864740,
    'user_type': 'registered',
    'accept_rate': 78,
    'profile_image': 'https://www.gravatar.com/avatar/e7a05a144f218bde07b659bc98e1ca7d?s=128&d=identicon&r=PG&f=1',
    'display_name': 'user2864740',
    'link': 'https://stackoverflow.com/users/2864740/user2864740'},
   'score': 4,
   'last_edit_date': 1597952535,
   'last_activity_date': 1597952535,
   'creation_date': 1596508518,
   'post_type': 'a
```
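Once parsed, the response is just nested dictionaries and lists, so we navigate it with ordinary indexing. The sketch below uses a trimmed copy of the two owners shown above (most fields dropped for brevity). Note that only the second owner has an `accept_rate` key, so `.get()` is safer than `[]` for fields that may be missing. `r.json()` performs the `json.loads` step for us in the notebook.

```python
import json

# Trimmed copy of part of the response shown above (most fields omitted)
raw = """
{"items": [
  {"owner": {"reputation": 41, "user_id": 14045196, "display_name": "Dashing"},
   "score": 4, "post_type": "question", "post_id": 63239300},
  {"owner": {"reputation": 51204, "user_id": 2864740, "accept_rate": 78,
             "display_name": "user2864740"},
   "score": 4}
]}
"""
response = json.loads(raw)  # JSON text -> nested dicts and lists

for item in response["items"]:
    owner = item["owner"]
    # .get() returns the default instead of raising KeyError when a key is absent
    print(owner["display_name"], owner.get("accept_rate", "n/a"))
```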

In [25]:
```python
def get_questions(items_object):
    data = {"display_name": [], "profile_image_url": [], "post_id": [], "post_contents": []}

    ## Loop over the items object. For the relevant fields in the 'data' variable defined above,
    ## populate those fields IF the type of the post is a question.
    ## If the type of a post is a question, additionally make a request to the relevant API method to obtain the question body.
    ## READ READ READ the documentation (or Google it 🙄) to find out how to do so.
    ## The question body should be populated in the 'post_contents' field.
    ## Return the data object.
    for item in items_object:
        if item["post_type"] != "question":
            continue  # skip answers

        owner = item["owner"]
        # .get() in case a field is missing from the owner object
        data["display_name"].append(owner.get("display_name"))
        data["profile_image_url"].append(owner.get("profile_image"))
        data["post_id"].append(item["post_id"])

        # The built-in "withbody" filter adds the question's HTML body to the response
        question_endpoint = "/2.2/questions/{}?order=desc&sort=activity&site=stackoverflow&filter=withbody".format(item["post_id"])
        response = requests.get(ROOT_URL + question_endpoint)

        body = None
        if response.status_code == 200 and response.json()["items"]:
            body = response.json()["items"][0]["body"]
        # Always append (None on failure) so the four lists stay the same length
        data["post_contents"].append(body)

    return data
```
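`get_questions` makes one request per question, which quickly eats into the daily quota. The `/questions/{ids}` method also accepts several IDs separated by semicolons (up to 100 per request), so as an alternative we did not use above, the IDs could be sent in batches. This is a sketch: `batched_question_urls` is a hypothetical helper, and `pagesize=100` is added because otherwise only the default page size of results comes back per request.

```python
def batched_question_urls(post_ids, batch_size=100):
    """Build one /questions/{ids} URL per batch of at most `batch_size` IDs (hypothetical helper)."""
    base = ("https://api.stackexchange.com/2.2/questions/{}"
            "?order=desc&sort=activity&site=stackoverflow&filter=withbody&pagesize=100")
    urls = []
    for start in range(0, len(post_ids), batch_size):
        batch = post_ids[start:start + batch_size]
        # IDs are joined with semicolons, e.g. "63239300;63222719"
        urls.append(base.format(";".join(str(i) for i in batch)))
    return urls

for u in batched_question_urls([63239300, 63222719]):
    print(u)
```

The results of a batched call are ordered by the `sort` parameter, not by the order of the IDs, so each returned question should be matched to its post through its `question_id` field.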

In [29]:
```python
import pprint

questions = get_questions(r.json()["items"])
pprint.pprint(questions)
```

```
{'display_name': ['Dashing',
                  'Liamdale',
                  'Tinu',
                  'Uuuuuumm',
                  'user716255',
                  'Kevko',
                  'nikolifish',
                  'Tim',
                  'Tulon',
                  'Andreas',
                  'Ahmed Ghrib',
                  'Siddhesh DilipKumar',
                  'Jinkinson',
                  'Jag99',
                  'JFortYork',
                  'Rahul',
                  'Sabahat',
                  'Saad Ashraf',
                  'Somanden',
                  'Gaurav Chaudhary',
                  'Francesco Iapicca',
                  'Pixeel',
                  'xuan',
                  'JPV',
                  'Trts',
                  'Jose Maria del Olmo'],
 'post_contents': ['<p>Swift code</p>\n'
                   '<pre><code>print(&quot;1&quot;, NSObject() == NSObject())\n'
                   'print(&quot;2&quot;, ObjectIdentifier(NSObject()) == '
          

             63222719],
 'profile_image_url': ['https://www.gravatar.com/avatar/b0267f35bc49691febef17386d54728c?s=128&d=identicon&r=PG&f=1',
                       'https://www.gravatar.com/avatar/ded581240d45e256d68e6807130c2582?s=128&d=identicon&r=PG&f=1',
                       'https://i.stack.imgur.com/6wFcg.jpg?s=128&g=1',
                       'https://www.gravatar.com/avatar/dc4c77dd26dded88dd3d196be106cd94?s=128&d=identicon&r=PG&f=1',
                       'https://www.gravatar.com/avatar/4057228c332e51257f3041ab597c7c73?s=128&d=identicon&r=PG',
                       'https://lh5.googleusercontent.com/-ebmDzzMuzRQ/AAAAAAAAAAI/AAAAAAAABNM/OuhnEXZ97-k/photo.jpg?sz=128',
                       'https://www.gravatar.com/avatar/3aae252b0cef6df16e7b8c2020b050da?s=128&d=identicon&r=PG',
                       'https://i.stack.imgur.com/8htyd.jpg?s=128&g=1',
                       'https://i.stack.imgur.com/2MOY0.png?s=128&g=1',
                       'https://www.gravatar.com/ava
```

## Are we really getting all of the data we asked for? 

If we look in the docs, we'll see that each request returns only a single *page* of results: 30 items by default, and at most 100 (set with the `pagesize` parameter). We'll also see that each response contains a key called `has_more`. When `has_more` is true, we have only received part of the data, not all of it. So we need to implement a way to keep making requests, one page at a time, until `has_more` is false.

In [None]:
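```python
import time
import requests

def get_all(endpoint, max_pages=10):
    """Request page after page of `endpoint` until `has_more` is False or `max_pages` is reached.

    Assumes `endpoint` already contains a query string, so "&page=N" can be appended.
    """
    r = requests.get(endpoint).json()  # initial request (page 1)
    page = 1
    results = r["items"]

    while r["has_more"] and page < max_pages:  # while the API says there is more data
        # If the API asks us to back off, wait that many seconds before the next request
        time.sleep(r.get("backoff", 0))
        page += 1
        e = f"{endpoint}&page={page}"
        print("making request to:", e)
        r = requests.get(e).json()
        results.extend(r["items"])

    print(f"found {len(results)} results")
    return results
```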
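Separating "how to fetch one page" from "when to stop" lets us test the paging loop without touching the API. This is a minimal sketch: `paginate` and the three fake pages are hypothetical, standing in for real responses that carry `items` and `has_more`.

```python
def paginate(fetch_page, max_pages=10):
    """Collect items from pages 1, 2, ... until has_more is False or max_pages is reached."""
    results = []
    page = 1
    while page <= max_pages:
        response = fetch_page(page)
        results.extend(response["items"])
        if not response["has_more"]:
            break
        page += 1
    return results

# Hypothetical fake API with three pages, standing in for real network calls
FAKE_PAGES = {
    1: {"items": [1, 2], "has_more": True},
    2: {"items": [3, 4], "has_more": True},
    3: {"items": [5], "has_more": False},
}

print(paginate(FAKE_PAGES.__getitem__))               # [1, 2, 3, 4, 5]
print(paginate(FAKE_PAGES.__getitem__, max_pages=2))  # [1, 2, 3, 4]
```

Against the real API, `fetch_page` could be something like `lambda page: requests.get(f"{endpoint}&page={page}").json()`.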

In [33]:
```python
for dn, piu, pi, pc in zip(questions["display_name"], questions["profile_image_url"], questions["post_id"], questions["post_contents"]):
    print("Display Name:", dn)
    print("Profile Image URL:", piu)
    print("Post ID:", pi)
    print("Post Body:", pc)
    print()
```

```
Display Name: Dashing
Profile Image URL: https://www.gravatar.com/avatar/b0267f35bc49691febef17386d54728c?s=128&d=identicon&r=PG&f=1
Post ID: 63239300
Post Body: <p>Swift code</p>
<pre><code>print(&quot;1&quot;, NSObject() == NSObject())
print(&quot;2&quot;, ObjectIdentifier(NSObject()) == ObjectIdentifier(NSObject()))
let object3 = NSObject()
let object4 = NSObject()
print(&quot;3&quot;, object3, object4)
print(&quot;4&quot;, ObjectIdentifier(object3) == ObjectIdentifier(object4))
</code></pre>
<p>Console result</p>
<pre><code>1 false
2 true
3 &lt;NSObject: 0x600000d805f0&gt; &lt;NSObject: 0x600000d80610&gt;
4 false
</code></pre>
<p>ObjectIdentifier compares instances using their object identifiers and the identical-to operator <code>===</code>. Why NSObject() in print(&quot;1&quot;, ...) is two object, but in print(&quot;2&quot;, ...) is same object?</p>


Display Name: Liamdale
Profile Image URL: https://www.gravatar.com/avatar/ded581240d45e256d68e6807130c2582?s=128&d=identicon&r=PG
```
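The post bodies returned with `filter=withbody` are HTML fragments containing tags and entities such as `&quot;`. If we only need the plain text (for example, for text analysis), one option is the standard library's `html.parser`. A minimal sketch: `TextExtractor` and `html_to_text` are hypothetical names, and a dedicated HTML library would handle messy markup more robustly.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities like &quot; into the characters they stand for
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

# First lines of the body shown above
body = '<p>Swift code</p>\n<pre><code>print(&quot;1&quot;, NSObject() == NSObject())\n</code></pre>'
print(html_to_text(body))
```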