# Homework - Network of Negativity 
#### Made by Emil Thorsbjerg (thorsemi)

Identify the network of negativity. The dataset also contains the attribute `LINK_SENTIMENT`, which identifies whether the sentiment of the reference (post) is positive or neutral (+1) or negative (-1). Create a subgraph with only the negative links and analyse the graph. Find:

In [241]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display
# Reading the dataset
df = pd.read_csv("soc-redditHyperlinks-body.tsv", sep="\t")

### 1) Find the two subreddits which are the most likely to express negative view on each other.

I filter the dataset to keep only negative sentiment links (`LINK_SENTIMENT = -1`) between subreddits, allowing the identification of the most negatively interacting pairs.

In [124]:
df["LINK_SENTIMENT"] = pd.to_numeric(df["LINK_SENTIMENT"], errors="coerce")
df_neg = df[df["LINK_SENTIMENT"] == -1].copy()
print(df_neg.head()) 

   SOURCE_SUBREDDIT TARGET_SUBREDDIT POST_ID            TIMESTAMP  \
1        theredlion           soccer  1u4qkd  2013-12-31 18:18:37   
34  karmaconspiracy            funny  1u6fz3  2014-01-01 12:44:19   
43         badkarma         gamesell  1u6t4g  2014-01-01 16:42:14   
53       casualiama        teenagers  1u70s8  2014-01-01 17:09:46   
55        australia           sydney  1u71zd  2014-01-01 17:24:46   

    LINK_SENTIMENT                                         PROPERTIES  
1               -1  101.0,98.0,0.742574257426,0.019801980198,0.049...  
34              -1  186.0,182.0,0.741935483871,0.0376344086022,0.0...  
43              -1  262.0,258.0,0.725190839695,0.0381679389313,0.0...  
53              -1  91.0,91.0,0.78021978022,0.032967032967,0.04395...  
55              -1  2547.0,2158.0,0.801334903808,0.0051040439733,0...  


Next, I count the number of negative sentiment links for each ordered (source, target) subreddit pair and sort the pairs to find those with the most negative links.

In [128]:
pair_counts = (
    df_neg.groupby(["SOURCE_SUBREDDIT", "TARGET_SUBREDDIT"])
    .size()
    .reset_index(name="count")
)
pair_counts = pair_counts.sort_values("count", ascending=False)
pair_counts.head(10)  

Unnamed: 0,SOURCE_SUBREDDIT,TARGET_SUBREDDIT,count
14320,writingprompts,askreddit,56
2898,circlebroke,pics,48
2274,brokehugs,christianity,47
2953,circlebroke,videos,41
11807,streetfighter,sf4,39
2793,circlebroke,askreddit,39
2881,circlebroke,news,38
2833,circlebroke,funny,35
10799,shitghazisays,gamerghazi,35
6083,ggfreeforall,kotakuinaction,34


I identify subreddit pairs with mutual negative interactions by merging each pair with its reversed counterpart and summing the negative link counts in both directions. Because the merge matches (A, B) with (B, A) and vice versa, every mutual pair appears twice in the result.

In [131]:
merged = pair_counts.merge(
    pair_counts,
    left_on=["SOURCE_SUBREDDIT", "TARGET_SUBREDDIT"],
    right_on=["TARGET_SUBREDDIT", "SOURCE_SUBREDDIT"],
    suffixes=("_AB", "_BA")
)
merged["total_count"] = merged["count_AB"] + merged["count_BA"]
merged_sorted = merged.sort_values("total_count", ascending=False)
merged_sorted.head(10)  

Unnamed: 0,SOURCE_SUBREDDIT_AB,TARGET_SUBREDDIT_AB,count_AB,SOURCE_SUBREDDIT_BA,TARGET_SUBREDDIT_BA,count_BA,total_count
0,writingprompts,askreddit,56,askreddit,writingprompts,2,58
332,askreddit,writingprompts,2,writingprompts,askreddit,56,58
948,gamerghazi,shitghazisays,1,shitghazisays,gamerghazi,35,36
1,shitghazisays,gamerghazi,35,gamerghazi,shitghazisays,1,36
2,hearthstone,hearthstonecirclejerk,31,hearthstonecirclejerk,hearthstone,3,34
160,hearthstonecirclejerk,hearthstone,3,hearthstone,hearthstonecirclejerk,31,34
10,drama,subredditdrama,14,subredditdrama,drama,19,33
4,subredditdrama,drama,19,drama,subredditdrama,14,33
3,gamerghazi,kotakuinaction,29,kotakuinaction,gamerghazi,2,31
338,kotakuinaction,gamerghazi,2,gamerghazi,kotakuinaction,29,31


Finally, I select the pair of subreddits with the highest combined count of negative links in both directions.

In [134]:
merged["total_count"] = merged["count_AB"] + merged["count_BA"]
merged_sorted = merged.sort_values("total_count", ascending=False)
most_negative_pair = merged_sorted.iloc[0] 
print(f"The most Interacting subreddits: {most_negative_pair['SOURCE_SUBREDDIT_AB']} and {most_negative_pair['TARGET_SUBREDDIT_AB']}")

The most Interacting subreddits: writingprompts and askreddit
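
Since the merged table lists every mutual pair twice, once per direction, one way to get a single row per pair is to group on an order-independent key. The sketch below uses toy counts (hypothetical values, not taken from the dataset) and the assumed helper column names `sub_a` and `sub_b`. It is an illustration of the idea, not the method used above.

```python
import pandas as pd

# Toy directed counts (hypothetical values, not from the dataset)
pair_counts = pd.DataFrame({
    "SOURCE_SUBREDDIT": ["a", "b", "a", "c"],
    "TARGET_SUBREDDIT": ["b", "a", "c", "d"],
    "count": [5, 2, 3, 1],
})

# Order-independent key: the two subreddit names sorted alphabetically
keys = [
    tuple(sorted(pair))
    for pair in zip(pair_counts["SOURCE_SUBREDDIT"], pair_counts["TARGET_SUBREDDIT"])
]
pair_counts["sub_a"] = [k[0] for k in keys]
pair_counts["sub_b"] = [k[1] for k in keys]

# Sum both directions and record how many directions were observed
undirected = (
    pair_counts.groupby(["sub_a", "sub_b"])["count"]
    .agg(total_count="sum", directions="count")
    .reset_index()
)

# Keep only pairs with negative links in both directions
mutual = undirected[undirected["directions"] == 2].sort_values(
    "total_count", ascending=False
)
print(mutual)
```

With this layout each mutual pair occupies exactly one row, so `mutual.iloc[0]` is the top pair without a mirrored duplicate directly beneath it.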


### 2) Find the "hubs of negativity" - the subreddits with highest betweenness, pagerank, ... centrality and describe what they mean for the network.

To start, I create a directed graph using only negative links to model the flow of negativity between subreddits.


In [145]:
G_neg = nx.from_pandas_edgelist(
    df_neg,
    source="SOURCE_SUBREDDIT",
    target="TARGET_SUBREDDIT",
    create_using=nx.DiGraph()
)
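
A `DiGraph` stores at most one edge per ordered pair, so repeated negative links between the same two subreddits collapse into a single edge. If the number of links should matter, one option is to aggregate the links first and keep the count as an edge attribute. The following is a minimal sketch on toy data; the names `links`, `weighted`, and `G_weighted` are assumptions made for illustration.

```python
import pandas as pd
import networkx as nx

# Toy negative links (hypothetical); the pair a -> b occurs twice
links = pd.DataFrame({
    "SOURCE_SUBREDDIT": ["a", "a", "b"],
    "TARGET_SUBREDDIT": ["b", "b", "c"],
})

# Collapse repeated links into one row per ordered pair with a count
weighted = (
    links.groupby(["SOURCE_SUBREDDIT", "TARGET_SUBREDDIT"])
    .size()
    .reset_index(name="weight")
)

# Build the directed graph and keep the count as the "weight" attribute
G_weighted = nx.from_pandas_edgelist(
    weighted,
    source="SOURCE_SUBREDDIT",
    target="TARGET_SUBREDDIT",
    edge_attr="weight",
    create_using=nx.DiGraph(),
)
print(G_weighted["a"]["b"]["weight"])
```

Note that `nx.pagerank` reads an edge attribute named `weight` by default, whereas `nx.betweenness_centrality` ignores weights unless its `weight` argument is set, and then interprets them as distances rather than link counts.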

Next, I compute key centrality measures (betweenness, PageRank, in-degree, out-degree) to identify the most central subreddits in the negative network.

In [None]:
betweenness = nx.betweenness_centrality(G_neg)
pagerank = nx.pagerank(G_neg, alpha=0.85)
in_degree = nx.in_degree_centrality(G_neg)
out_degree = nx.out_degree_centrality(G_neg)

Here I combine the results into a single table to compare the different centrality scores for each subreddit.

In [152]:
centrality_df = pd.DataFrame({
    "subreddit": list(betweenness.keys()),
    "betweenness": list(betweenness.values()),
    "pagerank": [pagerank[n] for n in betweenness.keys()],
    "in_degree": [in_degree[n] for n in betweenness.keys()],
    "out_degree": [out_degree[n] for n in betweenness.keys()],
})

I sort subreddits by PageRank to reveal the top 10 most influential hubs of negativity in the network.

In [159]:
top_hubs = centrality_df.sort_values("pagerank", ascending=False).head(10)
top_hubs

Unnamed: 0,subreddit,betweenness,pagerank,in_degree,out_degree
37,askreddit,0.062299,0.019529,0.063275,0.013032
102,worldnews,0.0,0.00852,0.027634,0.0
25,iama,0.005974,0.007527,0.020411,0.003297
68,videos,0.0,0.006267,0.024023,0.0
36,news,7.1e-05,0.005973,0.024965,0.000157
42,todayilearned,0.0,0.00561,0.025907,0.0
3,funny,0.005874,0.005557,0.022138,0.000628
237,ukraine,0.0,0.005217,0.000314,0.0
38,pics,0.0,0.004929,0.023709,0.0
54,writingprompts,0.016115,0.004911,0.012247,0.016643
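
Several subreddits in the table (for example worldnews and videos) combine a high in-degree with a betweenness and out-degree of exactly 0: they receive negative links but never send any, so they cannot sit in the middle of a directed path. The toy graph below is a sketch of that effect; the node names `sink` and `bridge` are made up for illustration.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("x", "sink"), ("y", "sink"), ("z", "sink"),  # three sources link to "sink"
    ("x", "bridge"), ("bridge", "y"),             # "bridge" relays from x to y
])

betweenness = nx.betweenness_centrality(G)
in_degree = nx.in_degree_centrality(G)
out_degree = nx.out_degree_centrality(G)

# "sink" is a pure target; "bridge" lies on the only path from x to y
for node in ["sink", "bridge"]:
    print(node, betweenness[node], in_degree[node], out_degree[node])
```

Here `sink` has the highest in-degree but zero betweenness, while `bridge` has few links yet a positive betweenness. In the negative network, the first pattern marks a common target of negativity and the second a subreddit that connects otherwise separate parts of the network.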


### 3) Identify positive subreddits - ones that never get or give a negative sentiment link.

I start by retrieving all unique subreddits present in the dataset, considering both sources and targets of interactions.

In [183]:
all_subreddits = set(df["SOURCE_SUBREDDIT"]).union(set(df["TARGET_SUBREDDIT"]))

Next, I extract all subreddits that are involved in negative interactions, either as senders or receivers.

In [186]:
neg_subreddits = set(df_neg["SOURCE_SUBREDDIT"]).union(set(df_neg["TARGET_SUBREDDIT"]))

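I identify subreddits that never engage in negative interactions by taking the set difference between all subreddits and those involved in at least one negative link. Their total number is `len(positive_subreddits)`.

In [222]:
# Subreddits that appear in the dataset but in no negative link
positive_subreddits = all_subreddits - neg_subreddits
positive_df = pd.DataFrame({"Positive Subreddits": list(positive_subreddits)})
positive_df.head(10)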

Unnamed: 0,Positive Subreddits
0,overwatchcirclejerk
1,casualbattling
2,datasets
3,clashabsinthe
4,shbteam
5,solr
6,ligamx
7,housedarsk
8,ffgm
9,nobackspace


The table shows the first 10 positive subreddits: those that appear in the dataset but are never involved in a negative sentiment link, neither as source nor as target. Because they come from a Python set, their order is arbitrary.
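
The set logic can be checked on a small example. The sketch below uses a toy DataFrame (hypothetical rows, not from the dataset) and sorts the result so that the listing is reproducible between runs.

```python
import pandas as pd

# Toy links (hypothetical): only a -> b is negative
df = pd.DataFrame({
    "SOURCE_SUBREDDIT": ["a", "a", "c"],
    "TARGET_SUBREDDIT": ["b", "c", "d"],
    "LINK_SENTIMENT": [-1, 1, 1],
})
df_neg = df[df["LINK_SENTIMENT"] == -1]

# Every subreddit that appears as source or target
all_subreddits = set(df["SOURCE_SUBREDDIT"]) | set(df["TARGET_SUBREDDIT"])
# Every subreddit that sends or receives a negative link
neg_subreddits = set(df_neg["SOURCE_SUBREDDIT"]) | set(df_neg["TARGET_SUBREDDIT"])

# Set difference, sorted for a stable order
positive_subreddits = sorted(all_subreddits - neg_subreddits)
print(len(positive_subreddits), positive_subreddits)
```

Note that `a` is excluded even though it also has a positive link, because a single negative link is enough to disqualify a subreddit.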


### 4) Find cliques of negativity (subsets of subreddits that have negative links between each other).

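I start by creating an undirected graph using only negative sentiment links to detect tightly connected negative groups. Note that this overwrites the directed `G_neg` from part 2, and that a negative link in either direction counts as an edge.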

In [229]:
G_neg = nx.from_pandas_edgelist(
    df_neg,
    source="SOURCE_SUBREDDIT",
    target="TARGET_SUBREDDIT",
    create_using=nx.Graph()  # Undirected graph, required for finding cliques
)

Here I detect the maximal cliques in the negative network, i.e. groups in which every pair of subreddits is connected by a negative sentiment link.

In [232]:
negative_cliques = list(nx.find_cliques(G_neg))

Finally, I sort the cliques by size and display the largest ones, revealing groups of subreddits that are fully connected by negative sentiment.

In [253]:
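negative_cliques.sort(key=len, reverse=True)  # largest cliques first
# One row per clique; column names assumed from the output below
clique_data = [{"Clique Number": f"Clique {i}", "Subreddits": ", ".join(c)} for i, c in enumerate(negative_cliques, 1)]
clique_df = pd.DataFrame(clique_data)
pd.set_option("display.max_colwidth", None)  # show the full member list
display(clique_df.head(10))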

Unnamed: 0,Clique Number,Subreddits
0,Clique 1,"subredditdrama, drama, askreddit, circlejerkcopypasta, circlebroke, politics, news, conspiracy, iama, undelete, the_donald"
1,Clique 2,"subredditdrama, drama, askreddit, circlejerkcopypasta, bestofoutrageculture, undelete, conspiracy, iama, the_donald, news, politics"
2,Clique 3,"subredditdrama, drama, askreddit, copypasta, circlebroke, conspiracy, iama, politics, news, undelete, the_donald"
3,Clique 4,"subredditdrama, drama, askreddit, copypasta, bestofoutrageculture, conspiracy, undelete, iama, the_donald, news, politics"
4,Clique 5,"advice, askreddit, offmychest, legaladvice, self, depression, raisedbynarcissists, suicidewatch, relationships, relationship_advice"
