# Facebook Data Crawling
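In this notebook, we crawl public posts, together with their comments and reactions, from a Facebook fanpage. Instead of the official Facebook Graph API, we use the facebook-scraper library, which scrapes Facebook's public pages without an API key.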

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Install the required libraries
We use the facebook-scraper library to crawl data from Facebook, and pandas and numpy to process the results. All three are installed with pip.

In [None]:
%pip install facebook_scraper pandas numpy

Collecting facebook_scraper
  Downloading facebook_scraper-0.2.59-py3-none-any.whl (45 kB)
Collecting dateparser<2.0.0,>=1.0.0 (from facebook_scraper)
  Downloading dateparser-1.2.0-py2.py3-none-any.whl (294 kB)
Collecting demjson3<4.0.0,>=3.0.5 (from facebook_scraper)
  Downloading demjson3-3.0.6.tar.gz (131 kB)
  Preparing metadata (setup.py) ... done
Collecting requests-html<0.11.0,>=0.10.0 (from facebook_scraper)
  Downloading requests_html-0.10.0-py3-none-any.whl (13 kB)
Collecting pyque

In [None]:
from facebook_scraper import get_posts
import pandas as pd
import numpy as np

## Crawl the data using facebook_scraper
Now we can fetch the data with the facebook_scraper library. The `get_posts` function takes the fanpage name and yields the page's posts one at a time, each as a dictionary (post text, time, comment and reaction counts, and so on). We collect these dictionaries in a list and later save them to CSV files. More information about what you can do with the facebook_scraper library can be found here: https://github.com/kevinzg/facebook-scraper
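Because `get_posts` yields posts lazily, the crawl cells in this notebook amount to consuming a generator and pausing every few posts. The sketch below shows that pattern under stated assumptions: `fake_get_posts` is a hypothetical stand-in for `get_posts` (so the example runs without network access or cookies), and `collect_posts` is a hypothetical helper that stops after `max_posts` posts and sleeps every `pause_every` posts.

```python
import itertools
import time
from typing import Dict, Iterator, List


def fake_get_posts(n: int) -> Iterator[Dict]:
    """Hypothetical stand-in for facebook_scraper.get_posts: yields n post dicts."""
    for i in range(n):
        yield {"post_id": str(1000 + i), "text": f"post {i}", "comments": i % 3}


def collect_posts(posts: Iterator[Dict], max_posts: int,
                  pause_every: int = 2, pause_seconds: float = 10.0) -> List[Dict]:
    """Consume a post generator, stopping after max_posts and pausing periodically.

    The pause mirrors the time.sleep() calls in the crawl cells; it only spaces
    out requests and does not guarantee that Facebook will not block the account.
    """
    collected: List[Dict] = []
    # islice stops pulling from the generator once max_posts items are taken,
    # so the generator is not advanced past that point.
    for post in itertools.islice(posts, max_posts):
        collected.append(post)
        if len(collected) % pause_every == 0:
            time.sleep(pause_seconds)
    return collected


posts = collect_posts(fake_get_posts(10), max_posts=5, pause_seconds=0)
print(len(posts), posts[0]["post_id"])  # 5 1000
```

With the real library, the generator argument would be the `get_posts(...)` call itself; the cap is optional, since `pages=` already bounds how far the crawl goes.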

## Define variables
First we have to define some variables that we will be using throughout the notebook.
- FANPAGE_LINK: The name of the fanpage that we want to crawl, i.e. the part of the fanpage URL after `facebook.com/`. For example, the fanpage of the [Nintendo Switch](https://www.facebook.com/NintendoSwitch/) is at https://www.facebook.com/NintendoSwitch/, so its name is `NintendoSwitch`. In this notebook we crawl the Liverpool FC fanpage, so FANPAGE_LINK is `LiverpoolFC`.

- COOKIE_PATH: The path to the cookie file used to authenticate with Facebook. To obtain it, log into Facebook in your browser and export the cookies to a text file. For example, in Chromium, the extension [Get cookies.txt LOCALLY](https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid) exports them in the Netscape cookies.txt format. Save the exported file (here, on Google Drive) and use its path as the value for COOKIE_PATH. <span style="color:red; font-weight:bold">USE COOKIES FROM A FAKE ACCOUNT, OTHERWISE YOUR REAL ACCOUNT MIGHT GET BANNED.</span>


- FOLDER_PATH: The location on the mounted Google Drive where the CSV files are saved.

- PAGES_NUMBER: The number of pages of posts that `get_posts` requests from the fanpage.
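If you start from the full URL copied from the address bar, the page name (which is what the next code cell assigns to FANPAGE_LINK, e.g. `"LiverpoolFC"`) can be extracted with the standard library. A minimal sketch, assuming the URL has the form `https://www.facebook.com/<page_name>/`; `page_name_from_url` is a hypothetical helper, not part of facebook-scraper:

```python
from urllib.parse import urlparse


def page_name_from_url(url: str) -> str:
    """Return the first path segment of a Facebook page URL, e.g. 'NintendoSwitch'.

    Assumes URLs like https://www.facebook.com/<page_name>/; profile.php,
    group, or other URL shapes are not handled.
    """
    path = urlparse(url).path  # e.g. '/NintendoSwitch/'
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"No page name found in URL: {url!r}")
    return segments[0]


print(page_name_from_url("https://www.facebook.com/NintendoSwitch/"))  # NintendoSwitch
print(page_name_from_url("https://www.facebook.com/LiverpoolFC"))      # LiverpoolFC
```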

In [None]:
FANPAGE_LINK = "LiverpoolFC"  # page name from https://www.facebook.com/LiverpoolFC
FOLDER_PATH = "/content/drive/MyDrive/python"
COOKIE_PATH = "/content/drive/MyDrive/python/www.facebook.com_cookies (1).txt"

PAGES_NUMBER = 15  # Number of pages to crawl
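Before starting a long crawl, it can help to check that the file at COOKIE_PATH actually parses. A minimal sketch, assuming the file was exported in the Netscape cookies.txt format; `check_cookie_file` is a hypothetical helper built on the standard library's `http.cookiejar.MozillaCookieJar`, and the demo writes a throwaway file with dummy values instead of reading COOKIE_PATH:

```python
import http.cookiejar
import os
import tempfile


def check_cookie_file(path: str) -> list:
    """Load a Netscape-format cookies.txt file and return the sorted cookie names.

    Raises http.cookiejar.LoadError if the file is not in Netscape format.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session and expired entries too, so every line in the file is counted.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(cookie.name for cookie in jar)


# Dummy file in Netscape format (tab-separated); in the notebook, pass COOKIE_PATH.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".facebook.com\tTRUE\t/\tTRUE\t2000000000\tc_user\t123\n"
    ".facebook.com\tTRUE\t/\tTRUE\t2000000000\txs\tabc\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    demo_path = f.name

print(check_cookie_file(demo_path))  # ['c_user', 'xs']
os.remove(demo_path)
```

A `LoadError` here usually means the file was saved in another format (for example JSON) or edited by hand.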

In [None]:
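import time

post_list = []
# get_posts yields one dict per post; comments and reactions are fetched as well
for post in get_posts(FANPAGE_LINK,
                      options={"comments": True, "reactions": True, "allow_extra_requests": True},
                      extra_info=True, pages=PAGES_NUMBER, cookies=COOKIE_PATH):
    print(post)
    post_list.append(post)
    # Pause every 2 posts to reduce the risk of being rate-limited or blocked
    if len(post_list) % 2 == 0:
        time.sleep(10)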


## Convert the list of dicts to a DataFrame

Now we convert the list of dictionaries to a pandas DataFrame, with one row per post and one column per field, and save the DataFrame to a CSV file.
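`pd.DataFrame` accepts a list of dictionaries directly and uses the union of their keys as columns, filling missing entries with NaN. This matters because the post dictionaries may not all have the same keys: passing `columns=post_list[0].keys()` keeps only the first post's keys and silently drops any others. A small sketch with made-up post dicts:

```python
import pandas as pd

# Made-up posts: the second one has a key that the first one lacks.
posts = [
    {"post_id": "1", "text": "first", "comments": 3},
    {"post_id": "2", "text": "second", "comments": 0, "reaction_count": 12},
]

# Columns are the union of all keys, in first-seen order; gaps become NaN.
df_all = pd.DataFrame(posts)
print(list(df_all.columns))  # ['post_id', 'text', 'comments', 'reaction_count']

# Restricting the columns to the first dict's keys drops 'reaction_count'.
df_first = pd.DataFrame(posts, columns=list(posts[0].keys()))
print(list(df_first.columns))  # ['post_id', 'text', 'comments']
```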

In [None]:
# Build a DataFrame with one row per post; columns are the union of the posts' keys
post_df_full = pd.DataFrame(post_list)

# Save to CSV. FOLDER_PATH has no trailing "/", so this writes
# /content/drive/MyDrive/pythonLiverpoolFC.csv (the naming used by the later cells)
path = FOLDER_PATH + FANPAGE_LINK + ".csv"
post_df_full.to_csv(path, index=False)
print(path)

/content/drive/MyDrive/pythonLiverpoolFC6.csv
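`FOLDER_PATH + FANPAGE_LINK + ".csv"` concatenates strings without a separator, so with `FOLDER_PATH = "/content/drive/MyDrive/python"` the file lands directly in MyDrive under a name starting with `python`, not inside the `python` folder; the later cells read files named this way. If you would rather keep the files inside the folder, `pathlib` inserts the separator for you. A sketch (`PurePosixPath` is used so it runs anywhere; in Colab, `pathlib.Path` behaves the same and can also touch the file system). Note that the hard-coded read paths in later cells would then need the same change:

```python
from pathlib import PurePosixPath

FOLDER_PATH = "/content/drive/MyDrive/python"
FANPAGE_LINK = "LiverpoolFC"

# Plain string concatenation: no "/" between the folder and the file name.
concatenated = FOLDER_PATH + FANPAGE_LINK + ".csv"
print(concatenated)  # /content/drive/MyDrive/pythonLiverpoolFC.csv

# The / operator on paths adds the separator.
joined = PurePosixPath(FOLDER_PATH) / f"{FANPAGE_LINK}.csv"
print(joined)  # /content/drive/MyDrive/python/LiverpoolFC.csv
```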


In [None]:
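# Keep the post IDs of rows 100-499 (positional slice) to fetch those posts individually later
POST_IDS = post_df_full.iloc[100:500]['post_id'].values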

In [None]:
POST_IDS

In [None]:
postids = pd.DataFrame(POST_IDS, columns=["post_id"])

In [None]:
path=FOLDER_PATH  + "postids.csv"
postids.to_csv(path, index=False)
print(path)
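Post IDs are identifiers, not quantities, so it can be safer to keep them as strings when saving and reloading. By default `read_csv` infers numeric types, and a numeric column that also contains missing values becomes float64, which is likely why some ID-like columns in the outputs below (for example `page_id`) are shown in scientific notation. A minimal sketch using an in-memory buffer in place of the Drive file:

```python
import io

import pandas as pd

ids = pd.DataFrame({
    "post_id": ["894430725385093", "894427942052038", None],
    "text": ["a", "b", "c"],
})

buffer = io.StringIO()
ids.to_csv(buffer, index=False)

# Default parsing: numeric inference plus a missing value gives float64.
buffer.seek(0)
inferred = pd.read_csv(buffer)
print(inferred["post_id"].dtype)  # float64

# dtype=str keeps the IDs exactly as written (the missing one stays NaN).
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={"post_id": str})
print(as_text["post_id"].tolist()[:2])  # ['894430725385093', '894427942052038']
```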

In [None]:
# Read back the IDs saved above (same path as the to_csv call)
POST_IDS = pd.read_csv(FOLDER_PATH + "postids.csv")

In [None]:
# Take the 20 IDs in rows 100-119 and flatten them into a 1-D array for get_posts
id = POST_IDS[100:120].values.flatten()

In [None]:
id

array([894430725385093, 894427942052038, 894415262053306, 894407862054046,
       894402835387882, 894401828721316, 894395958721903, 894382052056627,
       894380925390073, 894375272057305, 894370892057743, 894368978724601,
       894363978725101, 894352248726274, 894346722060160, 894343502060482,
       894342355393930, 894337608727738, 894336522061180, 894334985394667])
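The cell above picks one batch of 20 IDs by hand (`[100:120]`). To work through the whole list in fixed-size batches, for example one batch per session to limit how many requests a single run makes, a small generator can produce the slices. A sketch; `batched_ids` is a hypothetical helper, and `range(45)` stands in for the saved IDs:

```python
from typing import Iterator, List, Sequence


def batched_ids(ids: Sequence, size: int) -> Iterator[List]:
    """Yield consecutive batches of at most `size` IDs, in their original order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


all_ids = list(range(45))  # stand-in for the saved post IDs
for number, batch in enumerate(batched_ids(all_ids, 20)):
    print(number, len(batch), batch[0])
# 0 20 0
# 1 20 20
# 2 5 40
```

Slicing also works on NumPy arrays, so the helper accepts the flattened array directly. On Python 3.12+, `itertools.batched` does the same for any iterable, yielding tuples.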

In [None]:
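import time

post_list1 = []
# Fetch the selected posts by ID (post_urls accepts post IDs or URLs),
# again with comments and reactions
for post in get_posts(post_urls=id,
                      options={"comments": True, "reactions": True, "allow_extra_requests": True},
                      cookies=COOKIE_PATH):
    print(post)
    post_list1.append(post)
    # Pause every 2 posts (longer than above) to reduce the risk of being blocked
    if len(post_list1) % 2 == 0:
        time.sleep(20)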


In [None]:
# One row per post; columns are the union of the posts' keys
post_df_full = pd.DataFrame(post_list1)

# Save this batch with a numeric suffix (here part 5); the parts are read back below
path = FOLDER_PATH + FANPAGE_LINK + "5.csv"
post_df_full.to_csv(path, index=False)
print(path)

In [None]:
datas1 = pd.read_csv("/content/drive/MyDrive/pythonLiverpoolFC1.csv")
datas2 = pd.read_csv("/content/drive/MyDrive/pythonLiverpoolFC2.csv")
datas3 = pd.read_csv("/content/drive/MyDrive/pythonLiverpoolFC3.csv")
datas4 = pd.read_csv("/content/drive/MyDrive/pythonLiverpoolFC4.csv")
datas5 = pd.read_csv("/content/drive/MyDrive/pythonLiverpoolFC5.csv")

In [None]:
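# Keep only posts with at least one comment ('comments' is the comment count)
final1 = datas1[datas1['comments'] > 0]
final2 = datas2[datas2['comments'] > 0]
final3 = datas3[datas3['comments'] > 0]
final4 = datas4[datas4['comments'] > 0]
final5 = datas5[datas5['comments'] > 0]
# Stack the filtered parts into one DataFrame with a fresh 0..n-1 index
final = pd.concat([final1, final2, final3, final4, final5], ignore_index=True)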

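The five read-and-filter steps above follow one pattern, so they can also be written as a loop. A sketch, assuming the part files share the naming scheme used above; `load_commented_posts` is a hypothetical helper, and the demo writes small made-up CSVs to a temporary folder instead of reading from Drive:

```python
import os
import tempfile
from typing import List

import pandas as pd


def load_commented_posts(paths: List[str]) -> pd.DataFrame:
    """Read each CSV, keep posts with at least one comment, and stack the results."""
    frames = []
    for path in paths:
        df = pd.read_csv(path)
        frames.append(df[df["comments"] > 0])  # 'comments' is the comment count
    return pd.concat(frames, ignore_index=True)


# Demo with two small, made-up part files in a temporary folder.
parts = [
    pd.DataFrame({"post_id": ["a", "b"], "comments": [0, 4]}),
    pd.DataFrame({"post_id": ["c", "d"], "comments": [2, 7]}),
]
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for i, part in enumerate(parts, start=1):
        path = os.path.join(tmp_dir, f"LiverpoolFC{i}.csv")
        part.to_csv(path, index=False)
        paths.append(path)
    combined = load_commented_posts(paths)

print(combined["post_id"].tolist())  # ['b', 'c', 'd']
```

In the notebook, the paths would be `[f"/content/drive/MyDrive/pythonLiverpoolFC{i}.csv" for i in range(1, 6)]`, matching the cell that reads the five files.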


In [None]:
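# Use the ID each post was requested with as post_id (see the post_id values below),
# then drop the now-redundant original_request_url column
final['post_id'] = final['original_request_url']
final = final.drop(columns='original_request_url')
final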


,post_url,post_id,text,post_text,shared_text,original_text,time,timestamp,image,image_lowquality,...,reaction_count,with,page_id,sharers,image_id,image_ids,video_ids,videos,was_live,fetched_time
0,https://facebook.com/story.php?story_fbid=pfbi...,901339178027581,Back in Premier League action tomorrow 🙌\n\nTr...,Back in Premier League action tomorrow 🙌\n\nTr...,,Back in Premier League action tomorrow 🙌,2023-11-24 11:50:00,1.700827e+09,https://scontent-lga3-1.xx.fbcdn.net/v/t39.308...,https://scontent-lga3-1.xx.fbcdn.net/v/t39.308...,...,31247.0,,6.792038e+10,,,[],[],[],False,2023-11-26 03:22:14.201844
1,https://facebook.com/story.php?story_fbid=pfbi...,901360151358817,Stevie G with the sublime 💫 On this day in 2007 🔴,Stevie G with the sublime 💫 On this day in 2007 🔴,,,2023-11-24 11:01:14,1.700824e+09,,https://scontent-lga3-1.xx.fbcdn.net/v/t15.525...,...,7412.0,,6.792038e+10,,,[],,,False,2023-11-26 03:24:33.043208
2,https://facebook.com/story.php?story_fbid=pfbi...,901339551360877,Mo on his magical moment against Manchester Ci...,Mo on his magical moment against Manchester Ci...,,,2023-11-24 10:01:22,1.700820e+09,,https://scontent-lga3-2.xx.fbcdn.net/v/t15.525...,...,11212.0,,6.792038e+10,,,[],,,False,2023-11-26 03:25:58.090632
3,https://facebook.com/story.php?story_fbid=pfbi...,901303768031122,Two years ago today... simply STUNNING from Th...,Two years ago today... simply STUNNING from Th...,,,2023-11-24 08:22:26,1.700814e+09,,https://scontent-lga3-2.xx.fbcdn.net/v/t15.525...,...,33732.0,,6.792038e+10,,,[],,,False,2023-11-26 03:27:30.182185
4,https://facebook.com/story.php?story_fbid=pfbi...,900881971406635,"On this day in 2019, the resilient Reds found ...","On this day in 2019, the resilient Reds found ...",,"On this day in 2019, the resilient Reds found ...",2023-11-23 12:35:05,1.700743e+09,,https://scontent-lga3-2.xx.fbcdn.net/v/t15.525...,...,8498.0,,6.792038e+10,,,[],,,False,2023-11-26 03:29:20.102784
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,https://facebook.com/story.php?story_fbid=pfbi...,895548368606662,Every angle of Diogo Jota18's fine finish from...,Every angle of Diogo Jota18's fine finish from...,,Every angle of Diogo Jota18's fine finish from...,2023-11-14 13:11:27,1.699967e+09,,https://scontent-lga3-1.xx.fbcdn.net/v/t15.525...,...,70367.0,,6.792038e+10,,,[],,,False,2023-11-28 08:02:45.413059
75,https://facebook.com/story.php?story_fbid=pfbi...,895664538595045,Liverpool Legends will take on AFC Ajax Legend...,Liverpool Legends will take on AFC Ajax Legend...,LIVERPOOLFC.COM\nLFC Legends to face AFC Ajax ...,Liverpool Legends will take on AFC Ajax Legend...,2023-11-14 16:00:41,1.699978e+09,,https://external-lga3-2.xx.fbcdn.net/emg1/v/t1...,...,11818.0,,6.792038e+10,,,[],,,False,2023-11-28 10:11:05.900207
76,https://facebook.com/story.php?story_fbid=pfbi...,895615255266640,Virgil van Dijk is optimistic we can continue ...,Virgil van Dijk is optimistic we can continue ...,LIVERPOOLFC.COM\nVirgil van Dijk: It's going w...,,2023-11-14 15:07:44,1.699974e+09,,https://external-lga3-2.xx.fbcdn.net/emg1/v/t1...,...,8939.0,,6.792038e+10,,,[],,,False,2023-11-28 10:13:17.571370
77,https://facebook.com/story.php?story_fbid=pfbi...,895538875274278,Watch the extended action from Sunday's 3-0 wi...,Watch the extended action from Sunday's 3-0 wi...,,Watch the extended action from Sunday's 3-0 wi...,2023-11-14 12:09:18,1.699964e+09,,https://scontent-lga3-1.xx.fbcdn.net/v/t15.525...,...,25232.0,,6.792038e+10,,,[],,,False,2023-11-28 10:14:17.821346


In [None]:
# Add the posts from pythonLiverpoolFC.csv, again keeping only those with comments
datas = pd.read_csv("/content/drive/MyDrive/pythonLiverpoolFC.csv")
final0 = datas[datas['comments'] > 0]
final = pd.concat([final, final0], ignore_index=True)
final



,post_url,post_id,text,post_text,shared_text,original_text,time,timestamp,image,image_lowquality,...,reaction_count,with,page_id,sharers,image_id,image_ids,video_ids,videos,was_live,fetched_time
0,https://facebook.com/story.php?story_fbid=pfbi...,901339178027581,Back in Premier League action tomorrow 🙌\n\nTr...,Back in Premier League action tomorrow 🙌\n\nTr...,,Back in Premier League action tomorrow 🙌,2023-11-24 11:50:00,1.700827e+09,https://scontent-lga3-1.xx.fbcdn.net/v/t39.308...,https://scontent-lga3-1.xx.fbcdn.net/v/t39.308...,...,31247.0,,6.792038e+10,,,[],[],[],False,2023-11-26 03:22:14.201844
1,https://facebook.com/story.php?story_fbid=pfbi...,901360151358817,Stevie G with the sublime 💫 On this day in 2007 🔴,Stevie G with the sublime 💫 On this day in 2007 🔴,,,2023-11-24 11:01:14,1.700824e+09,,https://scontent-lga3-1.xx.fbcdn.net/v/t15.525...,...,7412.0,,6.792038e+10,,,[],,,False,2023-11-26 03:24:33.043208
2,https://facebook.com/story.php?story_fbid=pfbi...,901339551360877,Mo on his magical moment against Manchester Ci...,Mo on his magical moment against Manchester Ci...,,,2023-11-24 10:01:22,1.700820e+09,,https://scontent-lga3-2.xx.fbcdn.net/v/t15.525...,...,11212.0,,6.792038e+10,,,[],,,False,2023-11-26 03:25:58.090632
3,https://facebook.com/story.php?story_fbid=pfbi...,901303768031122,Two years ago today... simply STUNNING from Th...,Two years ago today... simply STUNNING from Th...,,,2023-11-24 08:22:26,1.700814e+09,,https://scontent-lga3-2.xx.fbcdn.net/v/t15.525...,...,33732.0,,6.792038e+10,,,[],,,False,2023-11-26 03:27:30.182185
4,https://facebook.com/story.php?story_fbid=pfbi...,900881971406635,"On this day in 2019, the resilient Reds found ...","On this day in 2019, the resilient Reds found ...",,"On this day in 2019, the resilient Reds found ...",2023-11-23 12:35:05,1.700743e+09,,https://scontent-lga3-2.xx.fbcdn.net/v/t15.525...,...,8498.0,,6.792038e+10,,,[],,,False,2023-11-26 03:29:20.102784
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
116,https://facebook.com/LiverpoolFC/posts/9014187...,901418754686290,Mohamed Salah believes his and Liverpool's lev...,Mohamed Salah believes his and Liverpool's lev...,LIVERPOOLFC.COM\nMohamed Salah on Man City v L...,,2023-11-24 14:00:11,1.700834e+09,,https://external-iad3-1.xx.fbcdn.net/emg1/v/t1...,...,7547.0,,6.792038e+10,,,[],,,False,2023-11-25 15:54:51.561276
117,https://facebook.com/LiverpoolFC/posts/6825586...,6825586720887614,We’re LIVE as Jürgen Klopp previews tomorrow’s...,We’re LIVE as Jürgen Klopp previews tomorrow’s...,,We’re LIVE as Jürgen Klopp previews tomorrow’s...,2023-11-24 13:27:25,1.700832e+09,,https://scontent-iad3-1.xx.fbcdn.net/v/t15.525...,...,12418.0,"[{'name': 'AXA', 'link': 'https://facebook.com...",6.792038e+10,,,[],,,False,2023-11-25 15:55:14.544979
118,https://facebook.com/LiverpoolFC/posts/8955783...,895578365270329,,,,,2023-11-14 14:18:40,1.699972e+09,https://scontent-iad3-1.xx.fbcdn.net/v/t39.308...,https://scontent-iad3-1.xx.fbcdn.net/v/t39.308...,...,13795.0,,6.792038e+10,,8.955784e+14,['895578365270329'],,,False,2023-11-25 16:14:40.881012
119,https://facebook.com/LiverpoolFC/posts/8948211...,894821158679383,,,,,2023-11-13 08:50:45,1.699865e+09,https://m.facebook.com/photo/view_full_size/?f...,https://scontent-iad3-1.xx.fbcdn.net/v/t39.308...,...,3382.0,,6.792038e+10,,,[],,,False,2023-11-25 16:18:13.695622
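Because `final` and `final0` come from separate crawls of the same page, the same post may appear in both. If that is a concern, duplicates can be dropped on `post_id` after concatenation. A sketch on toy data, assuming `post_id` uniquely identifies a post and keeping the first occurrence. The IDs should have the same type in both frames first; otherwise an int ID and the same ID stored as a string are not treated as equal:

```python
import pandas as pd

# Toy data: post 222 appears in both crawls, once as int and once as str.
crawl_a = pd.DataFrame({"post_id": [111, 222], "comments": [5, 2]})
crawl_b = pd.DataFrame({"post_id": ["222", "333"], "comments": [3, 9]})

combined = pd.concat([crawl_a, crawl_b], ignore_index=True)

# Normalise the ID type first: 222 (int) and "222" (str) are not equal.
combined["post_id"] = combined["post_id"].astype(str)

# Keep the first occurrence of each post_id and renumber the rows.
deduped = combined.drop_duplicates(subset="post_id", keep="first").reset_index(drop=True)
print(deduped["post_id"].tolist())  # ['111', '222', '333']
```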


In [None]:
# Save the combined dataset
path = FOLDER_PATH + "final.csv"
final.to_csv(path, index=False)
print(path)

/content/drive/MyDrive/pythonfinal.csv
