# Ising Model Bot Detector

This notebook demonstrates how to apply the Ising model bot detector to a set of tweets.

In [1]:
import math
import datetime, time
import random
import numpy as np
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from networkClassifierHELPER import *
from sklearn import metrics



# Load Tweets into Dataframe

Read the tweets in the data file into a dataframe.  The tweets are in the file `"data/tweets_pizzagate.parquet.gz"`.

In [None]:
fname_tweets = "data/tweets_pizzagate.parquet.gz"

df_tweets = pd.read_parquet(fname_tweets)
print("Tweet dataframe loaded")

# Create Retweet Graph

Extract the source user for each retweet and create the retweet graph `Gretweet`.  The edge format is `(source,retweeter,num_retweets)`.

We save `Gretweet` to a pickle file so we don't have to rebuild it every time.

In [None]:
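import pickle

print("Building retweet network")
fname_Gretweet = "data/Gretweet_pizzagate_bot_detection.gpickle"

Gretweet = retweet_network_from_tweets_for_bot_detection(df_tweets)
# nx.write_gpickle was removed in NetworkX 3.0; pickle.dump writes the same file format
with open(fname_Gretweet, "wb") as f:
    pickle.dump(Gretweet, f, pickle.HIGHEST_PROTOCOL)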
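
For intuition about the `(source,retweeter,num_retweets)` edge format, here is a minimal sketch of how such a graph could be assembled with NetworkX. It is not the implementation of `retweet_network_from_tweets_for_bot_detection`. The function name `build_retweet_graph_sketch`, the column names `screen_name` and `retweeted_screen_name`, the `weight` edge attribute, and the toy rows are all assumptions made for illustration.

```python
import networkx as nx
import pandas as pd


def build_retweet_graph_sketch(df, source_col="retweeted_screen_name",
                               retweeter_col="screen_name"):
    """Hypothetical helper: one weighted edge per (source, retweeter) pair."""
    # Keep only retweets (assumed: the source column is null for original tweets).
    retweets = df.dropna(subset=[source_col])
    # Count how many times each retweeter retweeted each source.
    counts = (retweets.groupby([source_col, retweeter_col])
              .size()
              .reset_index(name="num_retweets"))
    G = nx.DiGraph()
    # Each row becomes a (source, retweeter, num_retweets) edge.
    G.add_weighted_edges_from(
        counts.itertuples(index=False, name=None), weight="weight")
    return G


# Toy data, made up purely for illustration.
toy = pd.DataFrame({
    "screen_name": ["bob", "bob", "carol", "alice"],
    "retweeted_screen_name": ["alice", "alice", "alice", None],
})
G_toy = build_retweet_graph_sketch(toy)
print(sorted(G_toy.edges(data="weight")))
```

Edges point from the source to the retweeter, so in the toy data `alice` has two outgoing edges, and the edge to `bob` carries a weight of 2 because `bob` retweeted `alice` twice.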

# Apply Ising Model Bot Detector

Use the function `ising_bot_detector` to calculate the bot probability of each node.  The probabilities are returned in a dictionary `bot_probability` where the key is the node screen name.

In [None]:
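import pickle

fname_Gretweet = "data/Gretweet_pizzagate_bot_detection.gpickle"
# nx.read_gpickle was removed in NetworkX 3.0; the file is a plain pickle, so pickle.load reads it
with open(fname_Gretweet, "rb") as f:
    Gretweet = pickle.load(f)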
nv = Gretweet.number_of_nodes()
ne = Gretweet.number_of_edges()

print(f"Retweet graph has {nv} nodes and {ne} edges")
print("Find bots with Ising model algorithm")
bot_probability = ising_bot_detector(Gretweet)

Find bots
	Computing graph cut
	Calculating Ising bot probabilities
Node 0


# Bot Probability Histogram

Plot a histogram of the bot probabilities.  You can then pick a threshold probability that separates the bulk of the distribution from the separate high-probability cluster.

In [None]:
df_botprob = pd.DataFrame({'screen_name': list(bot_probability.keys()),
                           'bot_probability': list(bot_probability.values())})
fig = plt.figure(figsize=(8,6))
sns.histplot(data = df_botprob, x = 'bot_probability')
plt.xlabel("Bot probability",fontsize = 14)
plt.ylabel("Frequency",fontsize = 14)
plt.grid()
plt.show()
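
Once a threshold has been read off the histogram, flagging accounts is a simple filter. The following is a minimal sketch: `flag_bots` is a hypothetical helper name, the threshold of 0.5 is an assumed value chosen only for illustration, and the toy dictionary stands in for `bot_probability`.

```python
import pandas as pd


def flag_bots(bot_probability, threshold):
    """Hypothetical helper: tabulate probabilities and add a boolean `is_bot` column."""
    df = pd.DataFrame({
        "screen_name": list(bot_probability.keys()),
        "bot_probability": list(bot_probability.values()),
    })
    # An account is flagged when its probability is at or above the threshold.
    df["is_bot"] = df["bot_probability"] >= threshold
    # Most bot-like accounts first.
    return df.sort_values("bot_probability", ascending=False).reset_index(drop=True)


# Toy values, made up for illustration; in the notebook, pass `bot_probability` itself.
toy_probability = {"user_a": 0.02, "user_b": 0.97, "user_c": 0.10}
threshold = 0.5  # assumed value; choose yours from the histogram
df_flagged = flag_bots(toy_probability, threshold)
print(df_flagged[df_flagged["is_bot"]]["screen_name"].tolist())
```

The right threshold depends on where the high-probability cluster sits in your own histogram, so it is worth re-plotting and adjusting it for each dataset.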
