keep track of timeline collection times and size for pseudonym/list #119

Open
1 of 5 tasks
berli0z opened this issue Feb 28, 2019 · 0 comments


We want to extract and visualize data that helps us monitor so-called "bot" Facebook profiles, such as the ones used in the Italian elections, to see when we are capturing data and whether everything is running smoothly.

  • To do this, we will write a Python script that uses the APIs and, given a user pseudonym, retrieves anonymous information about its activity (impressions collected and at what times).
  • Allow keeping track of several bots, given a list of user pseudonyms.
  • Add hierarchical configuration.
  • Visualize this via the Telegram bot and maybe through a web interface.
  • Produce a document that explains how to use it. Details about this should be included in the texts related to setting up bots for Facebook impressions collection in eu19.
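One way the hierarchical configuration and the pseudonym list could fit together is a two-level layout in which global defaults are overridden by per-pseudonym settings. The following is only a sketch under that assumption: the key names (`defaults`, `bots`, `check_every_minutes`, `min_impressions`, `notify`), the example pseudonyms, and the `merge` and `resolve_config` helpers are all hypothetical and not part of the existing project.

```python
from copy import deepcopy

# Hypothetical configuration: global defaults plus per-pseudonym overrides.
CONFIG = {
    "defaults": {
        "check_every_minutes": 60,
        "min_impressions": 10,
        "notify": {"telegram": True, "web": False},
    },
    "bots": {
        "example-pseudonym-one": {},
        "example-pseudonym-two": {"min_impressions": 25, "notify": {"web": True}},
    },
}


def merge(base, override):
    """Return a copy of `base` with `override` laid over it, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(config, pseudonym):
    """Effective settings for one pseudonym; unknown pseudonyms get the defaults."""
    return merge(config["defaults"], config["bots"].get(pseudonym, {}))


if __name__ == "__main__":
    # Iterate over the list of tracked pseudonyms and show their effective settings.
    for pseudonym in CONFIG["bots"]:
        print(pseudonym, resolve_config(CONFIG, pseudonym))
```

Keeping the overrides sparse means a new bot can be added to the list with an empty entry and still inherit every default.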

Some code:

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

date_format = "%Y-%m-%dT%H:%M:%S.%fZ"

# CSV file downloaded from the "Your Data" page
file = 'file.csv'

df = pd.read_csv(file)

# Collect each timeline id once, in order of first appearance
timelinelist = []
for timeline_id in df['timeline']:
    if timeline_id not in timelinelist:
        timelinelist.append(timeline_id)
        print(timeline_id)

# For each timeline, record its size (highest impressionOrder)
# and its start time (earliest impressionTime)
maxlist = []
dateslist = []
for timeline in timelinelist:
    df2 = df[df['timeline'] == timeline]
    maxlist.append(df2.impressionOrder.max())
    dateslist.append(datetime.strptime(df2.impressionTime.min(), date_format))

results = list(zip(dateslist, maxlist))

data = pd.DataFrame(results, columns=['time', 'impressions'])
# Keep only timelines that started today. The column holds datetimes, so compare
# against a Timestamp at midnight rather than a bare date object.
# Note: impressionTime ends in "Z" (presumably UTC) while datetime.now() is local time.
midnight = pd.Timestamp(datetime.now().date())
data = data[data['time'] > midnight]
plt.plot(data.time, data.impressions)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.gca().xaxis.set_major_locator(mdates.AutoDateLocator())
plt.yticks(np.arange(0, data.impressions.max() + 10, step=5))
plt.show()
```
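
The per-timeline loop above can also be written as a single pandas `groupby`. This is a sketch under the same assumptions as the script (columns `timeline`, `impressionOrder` and `impressionTime`, with timestamps in the format shown); the `summarize_timelines` name and the sample rows are made up for illustration.

```python
import io

import pandas as pd

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Made-up rows in the assumed CSV layout, for illustration only.
SAMPLE_CSV = """timeline,impressionOrder,impressionTime
aaa,1,2019-02-28T10:00:00.000Z
aaa,2,2019-02-28T10:00:05.000Z
bbb,1,2019-02-28T09:00:00.000Z
bbb,2,2019-02-28T09:00:03.000Z
bbb,3,2019-02-28T09:00:07.000Z
"""


def summarize_timelines(df):
    """One row per timeline: start time and number of impressions, sorted by start time."""
    df = df.assign(impressionTime=pd.to_datetime(df["impressionTime"], format=DATE_FORMAT))
    summary = (
        df.groupby("timeline", sort=False)
        .agg(time=("impressionTime", "min"), impressions=("impressionOrder", "max"))
        .reset_index()
    )
    # Sorting by time keeps a line plot from zigzagging when the CSV is not chronological.
    return summary.sort_values("time", ignore_index=True)


if __name__ == "__main__":
    print(summarize_timelines(pd.read_csv(io.StringIO(SAMPLE_CSV))))
```

The resulting frame has the same `time` and `impressions` columns as `data` in the script, so the plotting code can be reused unchanged.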
berli0z changed the title from "monitoring timeline collection for single user" to "keep track of timeline collection times and size for pseudonym/list" on Feb 28, 2019