First, I'm going to save my application ID and application secret in some variables here. You can replace these values with the ones you've been provided after setting up your own app. 

In [1]:
APP_ID = ""
APP_SECRET = ""
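Hardcoding credentials in a notebook makes them easy to leak when you share the file. As an alternative to the cell above, here is a minimal sketch that reads them from environment variables. It assumes you have exported two variables before launching Jupyter; the names ``FB_APP_ID`` and ``FB_APP_SECRET`` are hypothetical, so pick whatever names you like.

```python
import os

# Hypothetical variable names; set them in your shell before starting Jupyter, e.g.
#   export FB_APP_ID=...
#   export FB_APP_SECRET=...
APP_ID = os.environ.get("FB_APP_ID", "")
APP_SECRET = os.environ.get("FB_APP_SECRET", "")

# Warn early instead of failing later when requesting an access token
if not APP_ID or not APP_SECRET:
    print("FB_APP_ID / FB_APP_SECRET are not set; fill them in before requesting a token.")
```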

Now, let's import the libraries we need. I'm also going to define a function called ``pp``, which stands for "pretty print." **All** the items that we will retrieve from Facebook arrive in a format known as JSON (JavaScript Object Notation). These objects can be complicated, and it's not always intuitive how they're organized, so we'll often print them out to inspect them visually. The ``pp`` function is just an easy way to print JSON objects as readably as possible, with indentation.

In [2]:
import facebook
import json

def pp(o):
    print(json.dumps(o, indent=4))

# Creating the Graph API Object

First, we'll create an instance of Facebook's Graph API and store it in the variable ``g``. To do this, we call ``GraphAPI``, a class provided by the ``facebook`` module.

In [3]:
g = facebook.GraphAPI()

Next, we have to provide the API instance we have just made with the credentials to access Facebook. The way we do this is a bit strange: we set the value of the API's attribute ``access_token`` to the output of its method ``get_app_access_token``. We provide our ``APP_ID`` and ``APP_SECRET`` to that method. 

In [4]:
g.access_token = g.get_app_access_token(APP_ID, APP_SECRET)

At this point, our API is set up and our credentials are in place. We can begin using the object ``g``, which represents our access to Facebook's Graph API, to pull information from the site. 

# Pulling a Specific Entity from Facebook

The first method we'll cover is called ``get_object``. Using this method, we can grab any **publicly available entity** on Facebook and pull it into our Python environment. To do so, we need a unique ID that represents that entity. Facebook entities usually have two forms of ID: a human-readable string and a numeric ID. You can usually get an entity's human-readable ID by visiting its Facebook page in a browser. For example, IU's page on Facebook is at the following URL:

``https://www.facebook.com/IndianaUniversity``

The last part - after the final slash - is the entity's string ID. We can use this as an argument to ``get_object`` to pull IU's page into our environment. 
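If you're collecting several pages, you can pull the string ID out of each URL programmatically instead of by eye. A minimal sketch using the standard library's ``urllib.parse``; the helper name ``page_id_from_url`` is hypothetical, and it simply takes the last non-empty path segment, so it ignores query strings and trailing slashes.

```python
from urllib.parse import urlparse

def page_id_from_url(url):
    """Return the last non-empty path segment of a Facebook page URL (hypothetical helper)."""
    path = urlparse(url).path              # e.g. "/IndianaUniversity"
    segments = [s for s in path.split("/") if s]
    return segments[-1] if segments else None

print(page_id_from_url("https://www.facebook.com/IndianaUniversity"))  # IndianaUniversity
```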

In [5]:
iu = g.get_object("IndianaUniversity")

So, what exactly is in this object, which we've saved in the variable ``iu``? It's a JSON object. Let's print it out and see what's in it. We'll use our handy function ``pp`` to print it with whitespace that makes it easier to read.

In [6]:
pp(iu)

{
    "name": "Indiana University",
    "id": "54210104594"
}


Okay, so it's a dictionary with two keys and two values. The key ``id`` gives us the numeric ID for IU's Facebook page (in the event that we need it), and the key ``name`` gives us the name associated with the page.

Not very helpful.

# Requesting Specific Fields with an Entity

Unlike with Twitter's API, when you request an object from Facebook's Graph API, you **also** have to specify which descriptive fields you want. Otherwise, you won't get anything beyond the basics.

[At Facebook's API documentation](https://developers.facebook.com/docs/graph-api/reference/page/), we can obtain a list of fields that are associated with an entity of type ``page``. The field ``about``, for example, gives us a short description of the page. The field ``founded`` tells us when the institution associated with the page was founded. There's a field called ``location`` that gives us the location of the entity, and there's a field called ``cover`` that gives us information about the page's cover, the image you see at the top of the page when you visit it. Let's summon the IU object again using ``get_object``, but this time, let's pass a ``fields`` parameter with a string that lists the fields we want, separated by commas.
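If you expect to change the set of fields often, it can be convenient to keep them in a Python list and build the comma-separated string from it. A small sketch; the names ``page_fields`` and ``fields_param`` are just illustrative.

```python
# Keep the requested fields in a list so they're easy to add or remove,
# then join them into the comma-separated string the fields parameter expects.
page_fields = ["id", "about", "founded", "location", "cover"]
fields_param = ",".join(page_fields)
print(fields_param)  # id,about,founded,location,cover

# With the g object from above, this matches the call in the next cell:
# iu = g.get_object("IndianaUniversity", fields=fields_param)
```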

In [7]:
iu = g.get_object("IndianaUniversity", fields="id,about,founded,location,cover")

Again, we use ``pp`` to print out the entire object, and we can see that there's a lot more information now that we specifically requested it.

In [8]:
pp(iu)

{
    "cover": {
        "offset_x": 0,
        "cover_id": "10155448242469595",
        "source": "https://scontent.xx.fbcdn.net/v/t31.0-8/s720x720/19466484_10155448242469595_1693554707084663366_o.jpg?oh=6b40da92662119f7f2f6dd33767ddb2e&oe=5A5F23C9",
        "offset_y": 36,
        "id": "10155448242469595"
    },
    "about": "Founded in 1820, Indiana University is the state's flagship university and Bloomington is its flagship campus. Located on a beautiful residential campus in a delightful small Midwestern city, IU offers the quintessential college experience.",
    "location": {
        "street": "107 S Indiana Ave",
        "country": "United States",
        "latitude": 39.168219326237,
        "zip": "47405",
        "longitude": -86.520571694297,
        "state": "IN",
        "city": "Bloomington"
    },
    "id": "54210104594",
    "founded": "1820"
}


A top-level JSON object is treated in Python as a dictionary. If we have a **key**, we can obtain the corresponding value. The keys in this case are simply the names of the fields we requested. For example, extracting the year founded from this JSON object is easy:

In [9]:
iu['founded']

'1820'
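Square-bracket lookup raises a ``KeyError`` if a field is absent, for example because you didn't request it. The dictionary method ``get`` returns a fallback value instead. The dictionary below is a trimmed copy of the ``iu`` object printed above, so the example runs without an API call.

```python
# Trimmed copy of the iu object printed above
iu_sample = {"id": "54210104594", "founded": "1820"}

print(iu_sample["founded"])                 # 1820
print(iu_sample.get("founded", "unknown"))  # 1820
print(iu_sample.get("mission", "unknown"))  # "mission" wasn't requested, so: unknown

try:
    iu_sample["mission"]
except KeyError as err:
    print("KeyError:", err)                 # KeyError: 'mission'
```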

## JSON Objects in JSON Objects

What happens if we use the ``location`` key? What do we get?

In [10]:
pp(iu['location'])

{
    "street": "107 S Indiana Ave",
    "country": "United States",
    "latitude": 39.168219326237,
    "zip": "47405",
    "longitude": -86.520571694297,
    "state": "IN",
    "city": "Bloomington"
}


Notice that IU's location is **itself** a JSON object. To clarify: ``iu`` is a JSON object. It stores several values under different keys, and you retrieve the value you want by providing the relevant key. Sometimes, the value stored under a key is **also** a JSON object, and it follows the same rules. So by typing ``iu['location']`` we get the JSON object representing IU's location. By typing ``iu['location']['street']``, we get the street value of the **location** JSON object inside the **IU** JSON object. Be prepared: when working with JSON, you'll often have to deal with this Inception style of objects-in-objects-in-objects.

In [11]:
print(iu['location']['street'])
print(iu['location']['city'])

107 S Indiana Ave
Bloomington
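When a nested key might be missing (a page without a ``location``, say), chaining square brackets raises a ``KeyError`` at the first gap. A small helper can walk the keys one at a time and return a default instead. ``get_nested`` is a hypothetical name, not part of the ``facebook`` module; the sample dictionary is a trimmed copy of the output above.

```python
def get_nested(obj, *keys, default=None):
    """Follow a chain of keys into nested dicts; return default if any key is missing."""
    current = obj
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Trimmed copy of the iu object printed earlier
iu_sample = {
    "id": "54210104594",
    "location": {"street": "107 S Indiana Ave", "city": "Bloomington", "state": "IN"},
}

print(get_nested(iu_sample, "location", "street"))             # 107 S Indiana Ave
print(get_nested(iu_sample, "location", "zip", default="n/a"))  # n/a (trimmed out above)
```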


# Getting Entities Linked to an Entity

So far, we can get an entity (in this case, IU's Facebook page) and extract pieces of information associated with it. More often, though, you'll want to explore the entities that are *linked* to the current entity. IU's founding date is just a static, single descriptive piece of information. You're probably more interested in the **posts** that are made on IU's page, or the **events** that are being advertised. Posts and events aren't *attributes* of IU's page; they are separate entities in their own right that are *associated with* IU's page.

If you have a **target entity** and you want to get other entities that are somehow associated with it, you use the API function ``get_connections``. 

The first argument is the ID of your target entity.

The second argument is the **associated class of object** that you want to retrieve.

For example, suppose your question is: which **events (associated object)** has **IU (target entity)** posted?

In [12]:
iu_events = g.get_connections("IndianaUniversity", "events")

Weirdly enough, the value we receive and store in ``iu_events`` is **not** a list or iterable. Instead, it's a single JSON object, and the list of events is actually under the key ``data``. This is just a quirk of how Facebook's API is designed. If you want to iterate through the events you've retrieved, you'll find them in ``iu_events['data']``.

In [13]:
iu_events_data = iu_events['data']
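Once you have ``iu_events['data']``, it's an ordinary Python list of dictionaries, so a plain loop or list comprehension works on it. To keep this sketch runnable offline, ``sample_events`` is a hand-trimmed stand-in for the API response: it holds one event copied from the output further down, with the nested ``place`` omitted.

```python
# Stand-in for the object returned by g.get_connections("IndianaUniversity", "events")
sample_events = {
    "data": [
        {
            "id": "579970422391946",
            "name": "This Muslim American Life lecture by Author Moustafa Bayoumi",
            "start_time": "2017-10-19T17:30:00-0400",
        },
    ],
    "paging": {},  # paging details omitted in this stand-in
}

events = sample_events["data"]  # the actual list lives under "data"
for event in events:
    print(event["start_time"], event["name"])

event_names = [event["name"] for event in events]
```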

Let's explore the first event. 

In [14]:
first_event = iu_events_data[0]
pp(first_event)

{
    "end_time": "2017-10-19T19:30:00-0400",
    "id": "579970422391946",
    "start_time": "2017-10-19T17:30:00-0400",
    "name": "This Muslim American Life lecture by Author Moustafa Bayoumi",
    "description": "Bayoumi will discuss what the War on Terror looks like from the vantage point of Muslim Americans, highlighting the profound effect that surveillance has had on how they live their lives. To be a Muslim American today often means to exist in an absurd space between exotic and dangerous, victim and villain, simply because of the assumptions people hold. Bayoumi exposes how contemporary politics, movies, novels, media experts and more have together produced a culture of fear and suspicion that not only willfully forgets the Muslim-American past, but also threatens all of our civil liberties in the present.  This Muslim American Life: Dispatches from the War on Terror was awarded the 2016 Evelyn Shakir Non-Fiction Arab American Book Award",
    "place": {
        "name": "IU 

We can see that this single event JSON object has several properties, such as ``description``, ``id``, ``name``, ``end_time``, ``start_time``, and ``place``. Look closely, and you'll notice that the property ``place`` **is itself a JSON object**. The ``place`` JSON object has a property called ``location`` that **is itself a JSON object**. More JSON inception.

In [15]:
print("What is the name of the event?")
print(first_event['name'])
print("When does the event start?")
print(first_event['start_time'])
print("What is the name of the place where the event is being held?")
print(first_event['place']['name'])
print("What is the street of the location of the place of the event (JSON Inception, 3 Layers)?")
#print(first_event['place']['location']['street'])

What is the name of the event?
This Muslim American Life lecture by Author Moustafa Bayoumi
When does the event start?
2017-10-19T17:30:00-0400
What is the name of the place where the event is being held?
IU Bloomington Hodge Hall Room 2075
What is the street of the location of the place of the event (JSON Inception, 3 Layers)?
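Timestamps such as ``start_time`` and ``end_time`` come back as strings. Python's ``datetime.strptime`` with the ``%z`` directive can parse the ``-0400`` UTC offset, which lets you compute things like an event's length. A minimal sketch using the two timestamps of the event above; ``GRAPH_TIME_FORMAT`` is just an illustrative name for the format string.

```python
from datetime import datetime

# Format of the Graph API timestamps shown above, e.g. "2017-10-19T17:30:00-0400"
GRAPH_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

start = datetime.strptime("2017-10-19T17:30:00-0400", GRAPH_TIME_FORMAT)
end = datetime.strptime("2017-10-19T19:30:00-0400", GRAPH_TIME_FORMAT)

duration = end - start
print(duration)  # 2:00:00
```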


At one point I was surprised to see that the most recent event on IU's page was from 2012. I double-checked in the browser, and that was indeed the last event posted at the time, so it looked like IU was no longer using Facebook to promote its own events. IU runs its own proprietary event management system, so a decision may have been made to use only that. The 2017 event above shows this can change, so check the dates on whatever you retrieve.

## Getting Other Linked Entities

Using the same ``get_connections`` method, you can obtain the entities that ``IndianaUniversity`` likes, the photos on the ``IndianaUniversity`` page, or the posts on the ``IndianaUniversity`` page. You just change the second argument to the method accordingly. There are **many** types of entities in the Facebook API. We can't cover them here, but you can find a comprehensive list at [Facebook's API documentation](https://developers.facebook.com/docs/graph-api/reference).

In [16]:
iu_likes = g.get_connections("IndianaUniversity", "likes")
iu_posts = g.get_connections("IndianaUniversity", "posts")

print("What is the first item that the IU page has liked?")
print(iu_likes['data'][0]['name'])
print("*"*50)
print("What's the first post on the IU page?")
pp(iu_posts['data'][0])


What is the first item that the IU page has liked?
Indiana University Police Academy
**************************************************
What's the first post on the IU page?
{
    "message": "We wish to express our deep appreciation for all the hard-working men and women who make it possible for our students to receive a quality Indiana University education. Enjoy your Labor Day, Hoosiers! You've earned it.",
    "created_time": "2017-09-04T16:07:54+0000",
    "id": "54210104594_10155696080239595"
}


In [17]:
iu_posts['data'][0]

{'created_time': '2017-09-04T16:07:54+0000',
 'id': '54210104594_10155696080239595',
 'message': "We wish to express our deep appreciation for all the hard-working men and women who make it possible for our students to receive a quality Indiana University education. Enjoy your Labor Day, Hoosiers! You've earned it."}

# Crawling the Facebook Graph: Jumping from Entity to Entity

As we discussed when exploring the Twitter API, the real power of navigating Facebook's Graph API comes from jumping among different entity types. This is how we're able to extract a thematic subset of the data that is of interest to our research question.

Let's say you're a social media coordinator for IU and you're interested in seeing **who likes the posts on the IU Facebook page**. You want to collect information about these users for research purposes. 

Our journey through the Graph would look something like this: 

* Start at the IU page node
* Get all the connections to posts (i.e., get IU's posts)
* For each retrieved post, get its connections to likes (i.e., get each post's likes)
* For each like, get the name of the user who made the like

Before this, we'd initialize an empty list to store each user's ID and name as we encounter them. Here's how such code would look in compact form (note: I'm only getting the first 10 of IU's posts just to make things run faster):

In [18]:
likers = []

iu_posts = g.get_connections("IndianaUniversity", "posts")

for post in iu_posts['data'][:10]:
    post_id = post['id'] # Get the post's ID so we can refer to it
    post_likes = g.get_connections(post_id, "likes") # For this post, we've gotten the likes
    for post_like in post_likes['data']: # Data in a collection is stored under the key 'data'
        likers.append((post_like['id'], post_like['name'])) # Get the id and name of the liker and store it  

If you take a look inside the variable ``likers``, you can see that we've successfully saved the individuals who liked the first 10 posts on IU's page (strictly speaking, only the first page of likes for each post; "Iterating through Lists" below explains how paging works). Each person is stored as a tuple, with the first value being the user's Facebook ID and the second being their name.

In [19]:
likers[:5]

[('1365446470190484', 'Maryann Listman'),
 ('10214293819408533', 'Gabriela Harsanyi'),
 ('10155190629693736', 'Debbie Gembala'),
 ('1494007207328770', 'Jack Mccracken'),
 ('10210202766534745', 'Melinda Goldbaum Stephan')]
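Someone who liked several of the 10 posts appears in ``likers`` once per post they liked. If you want to know who engages most often, ``collections.Counter`` can tally the tuples. The list below is hypothetical placeholder data (made-up IDs and names), not real output, used only so the sketch runs offline.

```python
from collections import Counter

# Hypothetical stand-in for likers: (user_id, name) tuples, one per like
sample_likers = [
    ("111", "User A"),
    ("222", "User B"),
    ("111", "User A"),
    ("333", "User C"),
    ("111", "User A"),
]

like_counts = Counter(sample_likers)
for (user_id, name), count in like_counts.most_common(2):
    print(name, user_id, count)

unique_likers = set(sample_likers)
print(len(unique_likers))  # 3
```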

# Facebook Privacy

According to Facebook's API documentation, a user entity has a very large number of fields that may be requested. Fields like ``birthday``, ``education``, and ``hometown`` are pretty self-explanatory. Let's get the ID of the first person in the list ``likers`` and download some information about them.

In [20]:
userid = likers[0][0] # The first item in the first tuple of likers is the first person's numeric ID
user = g.get_object(userid, fields="id,birthday,education,hometown")
pp(user)

{
    "id": "1365446470190484"
}


Notice that of the four fields I requested, only ID was returned. The user's birthday, education, and hometown are simply not in the returned object. Why? Facebook privacy controls.

Unless you happen to be Facebook friends with this individual, if you logged in to Facebook in a browser and searched for this person, all you'd see is their name, and probably their cover and profile photos. Everything else is marked as private: only this person's Facebook friends can see it.

The same privacy controls are in place on the API side of things. IU's Facebook page is **public**. If you like a post on a public Facebook page, **your action** is public as well. So we can mine IU's Facebook page and see who likes its posts. But if we try to find out more about a person who likes a post, we can't learn anything beyond the basics, since that information is privacy-protected.
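Because the API silently leaves out fields you aren't allowed to see rather than raising an error, it can help to compare what you asked for with what came back. A small sketch; ``user_sample`` mirrors the ``user`` object printed above.

```python
requested = "id,birthday,education,hometown"

# Mirrors the user object printed above: only the ID came back
user_sample = {"id": "1365446470190484"}

missing = [field for field in requested.split(",") if field not in user_sample]
print("Not returned:", missing)  # Not returned: ['birthday', 'education', 'hometown']
```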

For the purposes of this class, those of you who are interested in mining Facebook will be restricted to information that is publicly available. There are ways to *obtain* information behind privacy controls, but they involve making your app go "public" and having people install it and consent to certain pieces of their information being available to it. Perhaps you've installed a Facebook app at some point in the past and seen a window informing you that "such-and-such app wants access to x, y, and z." Realistically, you're going to have a hard time convincing people to open their profiles to your app for data mining purposes, and even if you convince one or two people, you no longer have a representative sample that can be properly mined by a data mining algorithm.

# Iterating through Lists


Like with Twitter, when you request a "list" or "iterable" of something, Facebook does not give it all to you at once. It provides it in "chunks," or "pages," a metaphor for the way you browse a list of objects in a browser: you are shown a limited number of objects, then have to click a "next" button to go to the "next page" of objects.

In [21]:
print(len(iu_posts['data']))

25


As you can see, there are only 25 posts returned when we ask the API to give us posts that the IU page has posted. Obviously, in total, there are more than 25. If you want to grab a certain number of posts, you have to *iterate* through the pages. 

Notice that when you request a list of items from the API, the response contains other keys in addition to ``data``. One of those keys is ``paging``.

In [22]:
pp(iu_posts['paging'])

{
    "next": "https://graph.facebook.com/v2.10/54210104594/posts?access_token=221276718403933%7C3a8-IWfe189HJhUyeEGOypo6waQ&limit=25&after=Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5QTFOREl4TURFd05EVTVORG90TnpBM05qYzBPREV5TXpnMk1ERTBNakF5T1E4TVlYQnBYM04wYjNKNVgybGtEeDAxTkRJeE1ERXdORFU1TkY4eE1ERTFOVFl4T1Rnd01UazVORFU1TlE4RWRHbHRaUVpaakUwWUFRPT0ZD",
    "cursors": {
        "before": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5QTFOREl4TURFd05EVTVORG90T0Rnd056azBOREkyTlRnek1qQTFOemd4T0E4TVlYQnBYM04wYjNKNVgybGtEeDAxTkRJeE1ERXdORFU1TkY4eE1ERTFOVFk1TmpBNE1ESXpPVFU1TlE4RWRHbHRaUVpaclhwYUFRPT0ZD",
        "after": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5QTFOREl4TURFd05EVTVORG90TnpBM05qYzBPREV5TXpnMk1ERTBNakF5T1E4TVlYQnBYM04wYjNKNVgybGtEeDAxTkRJeE1ERXdORFU1TkY4eE1ERTFOVFl4T1Rnd01UazVORFU1TlE4RWRHbHRaUVpaakUwWUFRPT0ZD"
    }
}


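The value under ``paging`` is itself a JSON object. Its ``next`` key stores an *entire URL* indicating *where the next page of the items you requested is located*; when there is an earlier page, a ``previous`` key holds that page's URL in the same way. The ``cursors`` key holds ``before`` and ``after`` markers; the ``after`` cursor, for example, reappears at the end of the ``next`` URL.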
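According to Facebook's paging documentation, the ``next`` key is left out on the last page of results, so reading it with ``get`` is safer than square brackets. A minimal sketch; ``next_page_url`` is a hypothetical helper, and the two responses are shortened stand-ins with the long cursor strings replaced by placeholders.

```python
def next_page_url(response):
    """Return the next-page URL from a Graph API list response, or None on the last page."""
    return response.get("paging", {}).get("next")

# Shortened stand-ins; the real URLs and cursors are much longer
first_page = {
    "data": [],
    "paging": {
        "next": "https://graph.facebook.com/v2.10/54210104594/posts?limit=25&after=CURSOR",
        "cursors": {"before": "BEFORE_CURSOR", "after": "CURSOR"},
    },
}
last_page = {"data": [], "paging": {"cursors": {"before": "X", "after": "Y"}}}

print(next_page_url(first_page))
print(next_page_url(last_page))  # None
```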

How do we use this URL to get to the next page? So far, we have learned the functions ``get_object`` and ``get_connections``. But these take object identifiers as their arguments. How can we pull something from the API using a URL? 

Fortunately, the ``GraphAPI`` object has a method called ``request``. *Unfortunately*, the argument that ``request`` takes is a little strange. When you look at the value you get for the "previous" and "next" page, you'll notice that it starts with this:

```
https://graph.facebook.com/v2.10
```

Basically, the ``request`` method needs everything in the ``next`` URL that comes *after* this part. It's a somewhat hamfisted way to do it, but I'm going to take the ``next`` string and remove its first 32 characters, which is the length of that prefix (as the check below confirms). This will give us the piece we need.

In [23]:
len("https://graph.facebook.com/v2.10")

32

In [24]:
next_string = iu_posts['paging']['next'][32:] # Remove the 32-character "https://graph.facebook.com/v2.10" prefix
next_page_results = g.request(next_string)
print("The first post on the SECOND page of results:")
pp(next_page_results['data'][0])

The first post on the SECOND page of results:
{
    "message": "We are incredibly grateful for this remarkably generous gift from anonymous donors that will provide financial support to students at Indiana University's Kelley School of Business and Indiana University Lilly Family School of Philanthropy.",
    "created_time": "2017-08-08T23:20:00+0000",
    "id": "54210104594_10155616417199595"
}
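Hardcoding 32 only works while the URL starts with exactly ``https://graph.facebook.com/v2.10``; a different API version such as ``v2.9`` would shift the count. A sketch of a version-agnostic alternative using ``urllib.parse.urlsplit``: it drops the scheme, host, and a leading version segment and keeps the rest, which matches what ``[32:]`` produces for the URLs above. ``graph_request_path`` is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit

def graph_request_path(url):
    """Turn a full Graph API URL into the path+query piece passed to g.request().

    Drops the scheme, host, and a leading version segment such as "v2.10".
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")  # e.g. ['', 'v2.10', '54210104594', 'posts']
    if len(segments) > 1 and re.fullmatch(r"v\d+\.\d+", segments[1]):
        segments = [""] + segments[2:]  # remove the version segment
    path = "/".join(segments)
    return path + ("?" + parts.query if parts.query else "")

url = "https://graph.facebook.com/v2.10/54210104594/posts?limit=25&after=CURSOR"
print(graph_request_path(url))             # /54210104594/posts?limit=25&after=CURSOR
print(graph_request_path(url) == url[32:])  # True
```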


Since each iterable returned to us by Facebook has a ``paging`` key with a ``next`` URL inside it (except on the last page), we can use a loop to get the first ``x`` pages of results. Here's the code.

In [25]:
# Get the first 5 pages of IU's posts

# We'll start an empty list to store our results
posts = []

# Get the first page of results the normal way.

posts_results = g.get_connections("IndianaUniversity", "posts")

for p in posts_results['data']:
    posts.append(p)
    
# This sets the next_url to the url pointing to the next page of results
next_url = posts_results['paging']['next'][32:]

# Iterate over the next 4 pages. Only 4, because we've gotten the first page, and we want a total of 5. 

for i in range(4): # range(4) yields the numbers 0, 1, 2, 3, so the loop body runs 4 times
    # Using the next_url we have, get the next page of results
    posts_results = g.request(next_url)
    # Iterate through results data and append items to our storage variable posts
    for p in posts_results['data']:
        posts.append(p)
    # The last page has no 'next' URL, so stop early if we've reached it
    if 'next' not in posts_results.get('paging', {}):
        break
    # Set next_url to this request's next page URL
    # Since we're in a loop, this code will run again
    # But we've updated the next_url variable, so it'll go to the next page
    next_url = posts_results['paging']['next'][32:]

Now we've captured 5 pages of posts: 5 requests of 25 posts each, 125 posts in total.

In [26]:
len(posts)

125
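The loop above can also be packaged as a generator that keeps following ``next`` links until it has collected ``max_items`` items or runs out of pages. In this sketch the page-fetching function is passed in as ``fetch``, so with the notebook's objects you could pass something like ``lambda url: g.request(url[32:])``; the example below uses a fake fetcher over three tiny pages instead of real API calls. ``iter_items`` and ``FAKE_PAGES`` are hypothetical names.

```python
def iter_items(first_response, fetch, max_items=None):
    """Yield items from a paged Graph API response, following paging['next'].

    first_response: the dict from the first call, e.g. g.get_connections(...).
    fetch: a function taking a next-page URL and returning the next response dict.
    """
    response = first_response
    count = 0
    while True:
        for item in response.get("data", []):
            if max_items is not None and count >= max_items:
                return
            yield item
            count += 1
        next_url = response.get("paging", {}).get("next")
        # Stop on the last page, or before fetching a page we don't need
        if not next_url or (max_items is not None and count >= max_items):
            return
        response = fetch(next_url)

# Fake pages keyed by made-up "next URLs"
FAKE_PAGES = {
    "page2": {"data": [{"id": "3"}, {"id": "4"}], "paging": {"next": "page3"}},
    "page3": {"data": [{"id": "5"}], "paging": {}},
}
first = {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}}

all_ids = [item["id"] for item in iter_items(first, FAKE_PAGES.__getitem__)]
print(all_ids)      # ['1', '2', '3', '4', '5']

first_three = [item["id"] for item in iter_items(first, FAKE_PAGES.__getitem__, max_items=3)]
print(first_three)  # ['1', '2', '3']
```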

Now we can iterate through the ``posts`` list and do whatever we need to with each post.
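Not every post is shaped the same way: some carry a ``story`` key alongside ``message``, and a post may have no ``message`` at all. A sketch that tolerates that and turns ``created_time`` into a ``datetime``. The first two entries are trimmed copies of posts printed below; the third is a hypothetical post with no message, added only to exercise the fallback.

```python
from datetime import datetime

sample_posts = [
    {"message": "We wish to express our deep appreciation ...",
     "created_time": "2017-09-04T16:07:54+0000", "id": "54210104594_10155696080239595"},
    {"message": "If there's a \"most beautiful college campus\" list, ...",
     "created_time": "2017-09-02T12:15:00+0000", "id": "54210104594_10155688838274595"},
    # Hypothetical post with only a story, to show the fallback
    {"story": "Hypothetical post with no message",
     "created_time": "2017-09-01T00:00:00+0000", "id": "hypothetical"},
]

summaries = []
for post in sample_posts:
    created = datetime.strptime(post["created_time"], "%Y-%m-%dT%H:%M:%S%z")
    # Prefer the message, fall back to the story, then to a placeholder
    text = post.get("message") or post.get("story") or "(no text)"
    summaries.append((created.date(), text))
    print(created.date(), text[:40])
```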

In [27]:
for p in posts[:5]:
    pp(p)
    print("*"*50)

{
    "message": "We wish to express our deep appreciation for all the hard-working men and women who make it possible for our students to receive a quality Indiana University education. Enjoy your Labor Day, Hoosiers! You've earned it.",
    "created_time": "2017-09-04T16:07:54+0000",
    "id": "54210104594_10155696080239595"
}
**************************************************
{
    "message": "If there's a \"most beautiful college campus\" list, you know Indiana University is on it.",
    "created_time": "2017-09-02T12:15:00+0000",
    "id": "54210104594_10155688838274595"
}
**************************************************
{
    "message": "An Indiana University education can take you anywhere! And wherever Hoosiers go, we #RaiseTheFlag to show our school spirit. These are just a few of our favorite photos from August. Share your own photos and see more at http://raisetheflag.iu.edu/",
    "story": "Indiana University added 6 new photos.",
    "created_time": "2017-09-01T12:05:00+