# Getting your data

To run any script, or to query the API in general, you need a token. One is generated every time you install the facebook.tracking.exposed extension.

You can use the test token or enter your own. Read this if you don't know how to get your token: link.

In [None]:
token = ""
print("Your token: "+token)
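
If you would rather not paste the token into a notebook you might share, one option is to read it from an environment variable. This is only a minimal sketch: the variable name `FBTREX_TOKEN` and the `load_token` helper are made up for this example and are not part of the project.

```python
import os

def load_token(env_var="FBTREX_TOKEN", default=""):
    """Return the token stored in env_var, or default if it is unset or blank."""
    # FBTREX_TOKEN is a hypothetical name chosen for this sketch.
    value = os.environ.get(env_var, "").strip()
    return value if value else default

env_token = load_token()
print("Token loaded" if env_token else "No token set; fill it in manually.")
```

If the variable is set, you could then assign `token = env_token` before calling the API.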

Import the necessary libraries. In this example we commented out the hierarchical configuration used to call scripts from the command line.

In [None]:
import pandas as pd
from src.lib import API, tools
# from src.lib.config import config
import datetime

print('Done!')

Now you can fetch the dataframe with `API.getDf()`. You need to specify `amount` and `skip`: `amount` is the number of entries to retrieve (each entry is a single post) and `skip` is how many entries to skip first. The defaults are `400` and `0`.

In [None]:
amount = 1000
skip = 0

df = API.getDf(token, 'summary', amount, skip)
print('Done!')
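
The way `amount` and `skip` work together can be sketched with a stand-in for the API call. The `fetch_page` and `fetch_all` functions below are hypothetical and simply slice a local list; they assume that `skip` counts entries from the start of the result set, which the real API may or may not do in the same order.

```python
def fetch_page(entries, amount=400, skip=0):
    """Stand-in for an API call: return `amount` entries after skipping `skip`."""
    return entries[skip:skip + amount]

def fetch_all(entries, page_size=400):
    """Collect every entry by advancing `skip` one page at a time."""
    collected = []
    skip = 0
    while True:
        page = fetch_page(entries, amount=page_size, skip=skip)
        if not page:
            break  # an empty page means there is nothing left to fetch
        collected.extend(page)
        skip += len(page)
    return collected

fake_posts = list(range(1000))  # pretend each number is one post
print(len(fetch_page(fake_posts)))            # first page with the defaults
print(len(fetch_page(fake_posts, 400, 800)))  # last, shorter page
print(len(fetch_all(fake_posts)))             # everything, page by page
```

The same loop shape should carry over to `API.getDf()` if you need more entries than a single call returns, by increasing `skip` by `amount` on each call.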

This is what the data looks like:

In [None]:
from IPython.display import display

display(df)

You can also save the data to a CSV file using:

```
df.to_csv('your_file')
```
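
To check that a saved file can be loaded back later, here is a small round-trip sketch. It uses a toy dataframe and an in-memory buffer instead of a real file; the column names are borrowed from this notebook, but the values are invented.

```python
import io
import pandas as pd

toy = pd.DataFrame(
    {"source": ["Page A", "Page B"], "LIKE": [10, 3]},
    index=pd.Index(
        ["2019-01-01 10:00:00", "2019-01-01 10:05:00"], name="impressionTime"
    ),
)

buffer = io.StringIO()  # in-memory stand-in for a file on disk
toy.to_csv(buffer)      # the index is written as the first column
buffer.seek(0)

# index_col restores the index; parse_dates=True turns it back into datetimes.
restored = pd.read_csv(buffer, index_col="impressionTime", parse_dates=True)
print(restored)
```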

## Manipulating dates

Now you can check the timeframe of the data you pulled.

In [None]:

df = tools.setDatetimeIndex(df)
maxDate = str(df.index.max())[:-6]
minDate = str(df.index.min())[:-6]
print('Information for timeframe: '+minDate+' to '+maxDate)
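
The `[:-6]` slice above presumably strips a trailing UTC offset such as `+00:00` from the printed timestamp, which would mean the index is timezone-aware (an assumption; the source does not say so). The sketch below shows the effect on a standard-library `datetime`, along with `strftime` as an alternative that does not depend on the length of the suffix.

```python
from datetime import datetime, timezone

stamp = datetime(2019, 1, 1, 12, 30, 0, tzinfo=timezone.utc)

sliced = str(stamp)[:-6]                         # drops the trailing "+00:00"
formatted = stamp.strftime("%Y-%m-%d %H:%M:%S")  # same text, no slicing

print(str(stamp))
print(sliced)
print(formatted)
```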

If needed, you can also trim the data to a shorter window, in this example the last 24 hours only.

In [None]:
# Take "now" once so both bounds refer to the same instant.
end = datetime.datetime.today()
start = end - datetime.timedelta(days=1)
df = tools.setTimeframe(df, str(start), str(end))
print('From '+str(start)+' to '+str(end)+'\n')
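
How `tools.setTimeframe` works internally is not shown here. As a rough equivalent, plain pandas can slice a dataframe with a sorted datetime index by label; the sketch below uses invented data and keeps the last 24 hours relative to the newest entry rather than to the current time.

```python
import pandas as pd

# Toy posts, one every 12 hours, indexed by time.
times = pd.to_datetime([
    "2019-01-01 00:00", "2019-01-01 12:00",
    "2019-01-02 00:00", "2019-01-02 12:00",
    "2019-01-03 00:00",
])
toy = pd.DataFrame({"post": range(5)}, index=times).sort_index()

toy_end = toy.index.max()
toy_start = toy_end - pd.Timedelta(days=1)

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
last_day = toy.loc[toy_start:toy_end]
print(last_day)
```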

## Your stats

You can get useful insights for yourself. For example, you can estimate the time you spent on Facebook during that timeframe.

In [None]:
timelines = df.timeline.unique()
total = pd.to_timedelta(0)

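for t in timelines:
    # Keep only the rows that belong to this timeline.
    ndf = tools.filter(t, df=df, what='timeline', kind='or')
    # Time span of one timeline: last entry minus first entry.
    timespent = ndf.index.max() - ndf.index.min()
    total += timespent
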
print('Total time spent on Facebook: '+str(total))
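
The same per-timeline calculation can be sketched in plain pandas with `groupby`, without the project's `tools.filter` helper. The data below is invented, and the timestamps sit in an `impressionTime` column rather than in the index, which is an assumption made to keep the example short.

```python
import pandas as pd

toy = pd.DataFrame({
    "timeline": ["a", "a", "a", "b", "b"],
    "impressionTime": pd.to_datetime([
        "2019-01-01 10:00", "2019-01-01 10:04", "2019-01-01 10:10",
        "2019-01-01 18:00", "2019-01-01 18:05",
    ]),
})

grouped = toy.groupby("timeline")["impressionTime"]
per_timeline = grouped.max() - grouped.min()  # span of each timeline
toy_total = per_timeline.sum()                # sum of all spans

print(per_timeline)
print("Total:", toy_total)
```

Note that this measures the gap between the first and last entry of each timeline, so a timeline with a single entry contributes zero time.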

Or the time spent watching ads.

In [None]:
nature = df.nature.value_counts()

# Use .get() so a missing category counts as zero instead of raising.
sponsored = nature.get('sponsored', 0)
posts = nature.sum()

# Share of sponsored posts among all posts in the timeframe.
share = sponsored / posts if posts else 0.0
print('{:.2f}% of the posts are sponsored posts.'.format(share * 100))

# Rough estimate: assume time is spread evenly across posts.
# total_seconds() includes whole days, unlike .seconds.
timeads = int(total.total_seconds() * share)
print('You spent an estimated '+str(datetime.timedelta(seconds=timeads))+' watching ads on Facebook.')
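
One detail worth knowing when turning a duration into seconds for an estimate like this: a timedelta's `.seconds` attribute holds only the part below one day, while `.total_seconds()` covers the full span. The sketch uses the standard library; `pd.Timedelta` exposes the same two members.

```python
from datetime import timedelta

span = timedelta(days=1, hours=2)

print(span.seconds)          # seconds component only, whole days excluded
print(span.total_seconds())  # full duration in seconds
```

For timeframes longer than 24 hours the two values differ, so an estimate based on `.seconds` would come out too low.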

You can also check which news sources appear most often in your feed.

In [None]:
n = 5
top = df.source.value_counts().nlargest(n)
print('Top '+str(n)+' sources of information are: \n'+top.to_string())
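
To see what `value_counts()` followed by `nlargest()` returns, here is the same chain on a small invented series of source names.

```python
import pandas as pd

sources = pd.Series(
    ["Page A", "Page B", "Page A", "Page C", "Page A", "Page B"]
)

counts = sources.value_counts()  # occurrences per source, most frequent first
top_two = counts.nlargest(2)     # keep the two highest counts

print(top_two.to_string())
```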

Of course, you can display this data graphically.

In [None]:
top.plot.pie(autopct='%.2f', fontsize=13, figsize=(6, 6))

## Experimenting with altair viz tools

Getting all posts with link

In [None]:
import altair as alt

# for the notebook only (not for JupyterLab) run this command once per session
alt.renderers.enable('notebook')

alt.Chart(df).mark_point().encode(
    x='impressionTime',
    y='LIKE',
    color='source'
).interactive()

In [None]:
alt.Chart(df).transform_calculate(
    url='https://www.facebook.com' + alt.datum.permaLink
).mark_point().encode(
    x='publicationTime:T',
    y='LIKE:Q',
    color='nature:N',
    href='url:N',
    tooltip=['source:N', 'url:N']
)
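
As an alternative to `transform_calculate`, the URL can be built in pandas before the data reaches the chart. This is a sketch on invented rows; it assumes `permaLink` holds a path starting with `/`, as the concatenation above suggests.

```python
import pandas as pd

toy = pd.DataFrame({
    "source": ["Page A", "Page B"],
    "permaLink": ["/pageA/posts/1", "/pageB/posts/2"],  # made-up paths
})

# Same concatenation as the transform_calculate step, done ahead of time.
toy["url"] = "https://www.facebook.com" + toy["permaLink"]
print(toy[["source", "url"]])
```

With the column in place, the chart could refer to `url:N` directly in `href` and `tooltip`.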
