# Data Requesting and Streaming

Requesting the posts of a hashtag, which this project depends on, requires the following:

- A Facebook developer account

- A business Page

- An Instagram Business account

- A custom app created through the Facebook developer interface and linked to the three items above

The code below streams the posts of a hashtag on Instagram. Before running it, generate an access token that has the following permissions:

![Picture title](docs/permissions.png)

After linking the Page to the app and the Instagram Business account to the Page, the user can obtain the Instagram account connected to the Page by running the following request with the Page access token:

https://graph.facebook.com/v17.0/me?fields=connected_instagram_account
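The request above can be sketched in Python as follows. This is a minimal illustration, not the notebook's own code: the helper names `connected_account_url` and `extract_instagram_id` are hypothetical, the sample payload is made up, and the nesting of the ID under `connected_instagram_account` is an assumption about the response shape.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com/v17.0"  # same version as the request above


def connected_account_url(page_token):
    """Build the /me request that asks for the Page's connected Instagram account."""
    query = urlencode({"fields": "connected_instagram_account", "access_token": page_token})
    return f"{GRAPH_BASE}/me?{query}"


def extract_instagram_id(payload):
    """Return the connected Instagram account ID, or None if nothing is linked.

    Assumes the ID is nested as {"connected_instagram_account": {"id": "..."}}.
    """
    account = payload.get("connected_instagram_account") or {}
    return account.get("id")


# Hypothetical response body, for illustration only.
sample = {"connected_instagram_account": {"id": "1784140000000000"}, "id": "100000000000000"}
print(connected_account_url("PAGE_TOKEN"))
print(extract_instagram_id(sample))
```

Returning None for a Page without a linked account lets the caller stop early instead of sending an empty user ID to the later requests.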

The next step, using the same API, is to look up the ID of the required hashtag. Example:

https://graph.facebook.com/v17.0/ig_hashtag_search?user_id="obtained from the last request"&q="name of the hashtag"
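A small sketch of how this query string might be assembled so that the hashtag name is URL-encoded correctly. The function name `hashtag_search_url` is hypothetical, and stripping a leading '#' rests on the assumption that the endpoint expects the bare hashtag name.

```python
from urllib.parse import urlencode


def hashtag_search_url(instagram_id, hashtag, access_token, version="v17.0"):
    """Build the ig_hashtag_search request for one hashtag name."""
    query = urlencode({
        "user_id": instagram_id,
        "q": hashtag.lstrip("#"),  # assumption: the API wants the name without '#'
        "access_token": access_token,
    })
    return f"https://graph.facebook.com/{version}/ig_hashtag_search?{query}"


print(hashtag_search_url("ENTER_INSTAGRAM_ID", "#fakenews", "ENTER_ACCESS_TOKEN"))
```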

After obtaining the hashtag ID, the recent posts are obtained with the following request:

https://graph.facebook.com/v17.0/"Hashtag ID obtained from last query"/recent_media?access_token="ENTER_ACCESS_TOKEN"&fields=id,permalink,comments_count,like_count,media_type,media_url,timestamp,caption&user_id="ENTER_INSTAGRAM_ID"&limit=50

The same query works for top media: replace "recent_media" with "top_media".
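Switching between the two edges can be captured in one helper, sketched below. The name `hashtag_media_url` and its signature are hypothetical; the field list and the limit of 50 are taken from the request shown above.

```python
from urllib.parse import urlencode

FIELDS = "id,permalink,comments_count,like_count,media_type,media_url,timestamp,caption"


def hashtag_media_url(hashtag_id, instagram_id, access_token, edge="recent_media", limit=50):
    """Build the media request for a hashtag; edge is 'recent_media' or 'top_media'."""
    if edge not in ("recent_media", "top_media"):
        raise ValueError(f"unsupported edge: {edge!r}")
    query = urlencode({
        "user_id": instagram_id,
        "fields": FIELDS,  # commas are percent-encoded as %2C
        "limit": limit,
        "access_token": access_token,
    })
    return f"https://graph.facebook.com/v17.0/{hashtag_id}/{edge}?{query}"


print(hashtag_media_url("17843857336037659", "ENTER_INSTAGRAM_ID", "ENTER_ACCESS_TOKEN", edge="top_media"))
```

Rejecting unknown edge names up front avoids spending a request on a URL the API would refuse anyway.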

In [1]:
import json
import datetime
import pandas as pd
import time
import requests

In [2]:
import requests

def getCreds() :

	creds = dict() # dictionary to hold everything
	creds['access_token'] = 'ENTER_ACCESS_TOKEN' # access token generated for the facebook app
	creds['client_secret'] = 'ENTER_CLIENT_SECRET' # client secret from facebook app
	creds['graph_domain'] = 'https://graph.facebook.com/' # base domain for api calls
	creds['graph_version'] = 'v17.0' # version of the api we are hitting
	creds['endpoint_base'] = creds['graph_domain'] + creds['graph_version'] + '/' # base endpoint with domain and version
	creds['debug'] = 'no' # debug mode for api call
	creds['page_id'] = 'ENTER_PAGE_ID' # users page id after running the "get_user_facebook_page.py"
	creds['instagram_account_id'] = 'ENTER_ACCOUNT_ID' # users instagram account id after running the "get_user_instagram_page.py"
	creds['ig_username'] = 'ENTER_IG_USERNAME' # ig username to get details

	return creds


def makeApiCall( url, endpointParams, debug = 'no' ) :
	data = requests.get( url, endpointParams ) # make get request
	response = dict() # hold response info
	response['url'] = url # url we are hitting
	response['endpoint_params'] = endpointParams #parameters for the endpoint
	response['endpoint_params_pretty'] = json.dumps( endpointParams, indent = 4 ) # pretty print for cli
	response['json_data'] = json.loads( data.content ) # response data from the api
	response['json_data_pretty'] = json.dumps( response['json_data'], indent = 4 ) # pretty print for cli
    
	return response # get and return content

def displayApiCallData( response ) :
	""" Print out to cli response from api call """

	print ("\nURL: ") # title
	print (response['url']) # display url hit
	print ("\nEndpoint Params: ") # title
	print (response['endpoint_params_pretty']) # display params passed to the endpoint
	print ("\nResponse: ") # title
	print (response['json_data_pretty']) # make look pretty for cli

In [3]:
def debugAccessToken( params ) :

	endpointParams = dict() # parameter to send to the endpoint
	endpointParams['input_token'] = params['access_token'] # token to get debug info on
	endpointParams['access_token'] = params['access_token'] # token that authorizes the debug request

	url = params['graph_domain'] + 'debug_token' # endpoint url; graph_domain already ends with '/'

	return makeApiCall( url, endpointParams, params['debug'] ) # make the api call

params = getCreds() # get creds
params['debug'] = 'yes' # set debug
response = debugAccessToken( params ) # hit the api for some data!

print ("\nData Access Expires at: ") # label
print (datetime.datetime.fromtimestamp( response['json_data']['data']['data_access_expires_at'] )) # display out when the token expires

print ("\nToken Expires at: ") # label
print (datetime.datetime.fromtimestamp( response['json_data']['data']['expires_at'] )) # display out when the token expires


Data Access Expires at: 
2023-09-21 14:45:57

Token Expires at: 
2023-08-22 14:22:45
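Because the token expires, it can help to check how much time is left before starting a long collection run. The helper below is a sketch: `seconds_until_expiry` is a hypothetical name, the sample payload is invented, and treating an `expires_at` of 0 as "never expires" is an assumption about the debug response.

```python
from datetime import datetime, timezone


def seconds_until_expiry(debug_payload, now=None):
    """Seconds left before the token expires, read from a debug_token response.

    Returns None when 'expires_at' is 0 (assumed to mean a non-expiring token).
    """
    expires_at = debug_payload["data"]["expires_at"]
    if expires_at == 0:
        return None
    now = now or datetime.now(timezone.utc)
    return expires_at - now.timestamp()


# Hypothetical payload and a fixed "now" one hour before expiry.
sample = {"data": {"expires_at": 1692714165}}
one_hour_before = datetime.fromtimestamp(1692714165 - 3600, tz=timezone.utc)
print(seconds_until_expiry(sample, now=one_hour_before))  # prints 3600.0
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to check against a known timestamp.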


In [6]:
def getLongLivedAccessToken( params ) :

	endpointParams = dict() # parameter to send to the endpoint
	endpointParams['grant_type'] = 'fb_exchange_token' # tell facebook we want to exchange token
	endpointParams['client_id'] = "ENTER_CLIENT_ID" # client id from facebook app
	endpointParams['client_secret'] = params['client_secret'] # client secret from facebook app
	endpointParams['fb_exchange_token'] = params['access_token'] # access token to get exchange for a long lived token

	url = params['endpoint_base'] + 'oauth/access_token' # endpoint url; requests appends the query string

	return makeApiCall( url, endpointParams, params['debug'] ) # make the api call

params = getCreds() # get creds
params['debug'] = 'yes' # set debug
response = getLongLivedAccessToken( params ) # hit the api for some data!

print ("\n ---- ACCESS TOKEN INFO ----\n") # section header
print ("Access Token:")  # label
print (response['json_data']['access_token']) # display access token



 ---- ACCESS TOKEN INFO ----

Access Token:
[long-lived access token redacted]


In [3]:
params = getCreds() # get creds

In [5]:
def getHashtagInfo( params, ig_hashtag ) :
    endpointParams = dict() # parameter to send to the endpoint
    endpointParams['user_id'] = params['instagram_account_id'] # user id making request
    endpointParams['q'] = ig_hashtag # hashtag name
    endpointParams['fields'] = 'id,name' # fields to get back
    endpointParams['access_token'] = params['access_token'] # access token

    url = params['endpoint_base'] + 'ig_hashtag_search' # endpoint url

    return makeApiCall( url, endpointParams, params['debug'] ) # make the api call

hashtag_id = getHashtagInfo( params, "fakenews" )['json_data']['data'][0]['id']

In [6]:
hashtag_id

'17843857336037659'

In [7]:
def getNewsPosts( params ) :

        endpointParams = dict() # parameter to send to the endpoint
        endpointParams['user_id'] = params['instagram_account_id'] # user id making request
        endpointParams['fields'] = 'id,permalink,comments_count,like_count,media_type,media_url,timestamp,caption' # fields to get back
        endpointParams['access_token'] = params['access_token'] # access token
        
        
        params['hashtag_id'] = hashtag_id # hashtag ID from the previous cell, e.g. "17843857336037659"
        params['type'] = 'recent_media' # edge to call; the fields are already set in endpointParams

        url = params['endpoint_base'] + params['hashtag_id'] + '/' + params['type'] # endpoint url

        return makeApiCall( url, endpointParams, params['debug'] ) # make the api call

params = getCreds() # get creds
response = getNewsPosts( params ) # get users media from the api


print (response['json_data']['data'][0])

{'id': '17999660677904260', 'permalink': 'https://www.instagram.com/reel/Ct3zPgsLwWm/', 'comments_count': 0, 'like_count': 0, 'media_type': 'VIDEO', 'media_url': 'https://scontent-iad3-1.cdninstagram.com/o1/v/t16/f1/m82/E64BF0CAFBFD2ECB9682F92E86C4419F_video_dashinit.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcyJ9&_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=108&vs=6420930774640143_409971131&_nc_vs=HBksFQIYT2lnX3hwdl9yZWVsc19wZXJtYW5lbnRfcHJvZC9FNjRCRjBDQUZCRkQyRUNCOTY4MkY5MkU4NkM0NDE5Rl92aWRlb19kYXNoaW5pdC5tcDQVAALIAQAVABgkR0pLSVBSWGtOWjd5eGlrTUFGM3RLeC16QXFKQmJxX0VBQUFGFQICyAEAKAAYABsBiAd1c2Vfb2lsATEVAAAmtuKkipLf7j8VAigCQzMsF0BS4hysCDEnGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHUAAA%3D%3D&ccb=9-4&oh=00_AfBkOSsZqDqD5A1mmy3YazaGmwrGqWvOP9466EsawlC7Aw&oe=6498B885&_nc_sid=1d576d&_nc_rid=1e7d8027bb', 'timestamp': '2023-06-24T11:45:19+0000', 'caption': 'FactCheck - The Viral Image Does Not Show The Wedding Ceremony Of Nirmala Sitharaman’s Daughter\n\nVerify it with factcrescendoindi

In [23]:
import time
import random

In [8]:
def postsLooper(url, count, posts):
    # Follow the 'paging.next' links one request at a time, up to 1000 requests in total.
    # A loop replaces the recursion so the call depth cannot hit Python's recursion limit.
    while url and count < 1000:
        #time.sleep(random.uniform(1, 3))

        print(count)
        print(len(posts))
        response = makeApiCall(url, None) # the URL already carries its query string
        print(response)

        json_data = response['json_data']
        if 'data' not in json_data: # error payload: stop and keep what was collected
            return posts
        print("Adding posts from batch")
        posts.extend(json_data['data'])

        # The last page has no 'next' link; its posts were added above before stopping.
        url = json_data.get('paging', {}).get('next')
        count += 1
    return posts

The next cell starts the loop that collects the posts. After each request, the loop continues with the next-page link returned in the JSON response, until no further link is provided or the request limit is reached.
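The paging logic can be tried out without network access by swapping the HTTP call for a stub. The sketch below is illustrative only: `iter_posts`, `fetch_json`, and the fake pages are hypothetical names and data, and the response shape (a `data` list plus an optional `paging.next` link) follows the loop described above.

```python
def iter_posts(first_url, fetch_json, max_pages=1000):
    """Yield posts page by page, following 'paging.next' until it disappears."""
    url = first_url
    for _ in range(max_pages):
        if not url:
            break
        page = fetch_json(url)  # fetch_json(url) must return the decoded JSON dict
        yield from page.get("data", [])
        url = page.get("paging", {}).get("next")


# Stub standing in for the real HTTP call: two fake pages keyed by URL.
fake_pages = {
    "page1": {"data": [{"id": "a"}, {"id": "b"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "c"}]},  # last page: no 'next' link
}
posts = list(iter_posts("page1", fake_pages.__getitem__))
print([p["id"] for p in posts])  # prints ['a', 'b', 'c']
```

In real use, `fetch_json` could be a thin wrapper around the notebook's request helper; injecting it as a parameter keeps the paging logic testable on its own.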

In [9]:
posts3= postsLooper('https://graph.facebook.com/v17.0/'+hashtag_id+'/top_media?access_token=ENTER_ACCESS_TOKEN&fields=id%2Cpermalink%2Ccomments_count%2Clike_count%2Cmedia_type%2Cmedia_url%2Ctimestamp%2Ccaption&user_id=17841460143570104&limit=50',0,[])

aid,#prayfortrump,#prayforpresidenttrump,#Jesus,#pray,#prophetic,#prayfordonaldjtrump,#God,#Trumpwon,#trump2024,#trumpwon,#2000mules,#gop,#fakenews,#djt,#marjorietaylorgreene, #israel #trumpets,#trump #twitter #donaldtrump #cpac, #presidenttrumpnews,#elonmusk #kevinmccarthy,#mattgaetz, #karilake"\n        },\n        {\n            "id": "18001158424884191",\n            "permalink": "https://www.instagram.com/p/Ctuz7rDMp-M/",\n            "comments_count": 220,\n            "like_count": 1647,\n            "media_type": "IMAGE",\n            "media_url": "https://scontent-iad3-2.cdninstagram.com/v/t39.30808-6/355489936_654996886668595_2416544798251008753_n.jpg?_nc_cat=103&ccb=1-7&_nc_sid=8ae9d6&_nc_ohc=ngzOy8lCXbkAX99kT8E&_nc_ht=scontent-iad3-2.cdninstagram.com&edm=APCawUEEAAAA&oh=00_AfD_JsBeT0Je25C6Z6h9LqKQwGRMrAa9oXib4o_1jRHPlA&oe=649BC004",\n            "timestamp": "2023-06-20T23:57:04+0000",\n            "caption": "PL DAS FAKE NEWS | O ministro Alexandre de Moraes, do Supremo Tr

In [11]:
post_df2=pd.DataFrame(posts3)


In [12]:
post_df2.to_csv('fakenews_posts_top2.csv')

In [13]:
len(post_df2)

48046

In [54]:
post_df2

| Unnamed: 0 | id | permalink | comments_count | media_type | media_url | timestamp | caption | like_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 17995805485951380 | https://www.instagram.com/p/CtzKB9Fqgzs/ | 27 | IMAGE | https://scontent-iad3-2.cdninstagram.com/v/t51... | 2023-06-22T16:27:07+0000 | The submarine that went missing during a missi... | |
| 1 | 18252464551088991 | https://www.instagram.com/p/CtwWtb4qH8_/ | 31 | CAROUSEL_ALBUM | | 2023-06-21T14:20:12+0000 | The submarine carrying five individuals, inclu... | |
| 2 | 17931593675701711 | https://www.instagram.com/p/CtzFYGFB0J7/ | 305 | CAROUSEL_ALBUM | | 2023-06-22T15:46:27+0000 | Swipe ⬅️ The oxygen supply of 96 hours on the ... | 6162.0 |
| 3 | 17975632772357476 | https://www.instagram.com/p/Ct0z_5oSoEV/ | 62 | CAROUSEL_ALBUM | | 2023-06-23T07:53:05+0000 | Everything you should know about Aryans»\n.\n.... | 5756.0 |
| 4 | 17990473949033476 | https://www.instagram.com/p/CtxHc42hyxK/ | 45 | CAROUSEL_ALBUM | | 2023-06-21T21:26:06+0000 | It doesn’t sound like a ruff job. A US billion... | 1564.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18739 | 18077861137366922 | https://www.instagram.com/p/CtL8d_NPt8y/ | 3 | IMAGE | https://scontent-iad3-2.cdninstagram.com/v/t51... | 2023-06-07T10:58:14+0000 | #NewsUpdate #knowledgeofself #knowledgeispower | 1092.0 |
| 18740 | 17940648821559487 | https://www.instagram.com/p/Ctvu1Qupj_-/ | 0 | CAROUSEL_ALBUM | | 2023-06-21T08:31:44+0000 | 😍 #காரைக்காலில் இன்று 9வது சர்வதேச யோகா தின நி... | 2822.0 |
| 18741 | 17985677819143795 | https://www.instagram.com/p/Ctz8mVXLr9v/ | 0 | IMAGE | https://scontent-iad3-1.cdninstagram.com/v/t51... | 2023-06-22T23:49:00+0000 | Perú tiene nueva Miss Grand y es la hermosa lu... | 10.0 |
| 18742 | 18099416200325640 | https://www.instagram.com/p/CtgOIgnMH_D/ | 1 | IMAGE | https://scontent-iad3-1.cdninstagram.com/v/t51... | 2023-06-15T07:57:24+0000 | વાવાઝોડા અસરગ્રસ્ત વિસ્તારોમાં જો મોબાઈલ નેટવર... | 815.0 |

