# <center> Simulator (custom)</center>

This code creates synthetic Twitter datasets according to the model. We create a user graph and choose an activity pair $(\lambda_i,\mu_i)$ for each user $i$. From there we generate tweeting/retweeting events, where each user $i$ tweets with rate $\lambda_i$ and retweets from their newsfeed with rate $\mu_i$. The output consists of two `.txt` files: the adjacency list of the user graph and the list of tweets.

In [203]:
import numpy as np
from random import choice
from operator import itemgetter

Choose the output folder where the results will be written.

In [204]:
out_folder = "../Datasets/Newman/"

## Setting parameters
Choose the number of users $N$, the number of events `nb_events`, and the activity rates. The rates are given as two lists of length $N$, `Lambda` and `Mu`, where `Lambda[i]` is the posting rate of user $i$ and `Mu[i]` is their reposting rate.

In [205]:
N = 50
nb_events = 10000

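# Alternative 1: heavy-tailed rates (np.random.pareto with shape 1.3)
# Lambda = np.random.pareto(1.3, N)
# Mu = np.random.pareto(1.3, N)

# Alternative 2: constant rates
# Lambda = [0.1 for n in range(N)]
# Mu = [0.1 for n in range(N)]

# Active choice: rates drawn uniformly in [0, 1)
Lambda = np.random.random(N)
Mu = np.random.random(N)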
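
The simulation below uses `1/Lambda[i]` and `1/Mu[i]` as exponential scales, so every rate has to be strictly positive. `np.random.random` samples from $[0, 1)$ and can in principle return exactly 0. A minimal sketch of one way to guard against this, assuming a hypothetical helper `draw_rates` and an arbitrary lower bound `low` (neither is part of the original notebook):

```python
import numpy as np

def draw_rates(n, low=1e-3, high=1.0, seed=None):
    """Draw n activity rates uniformly in [low, high).

    Hypothetical helper: the lower bound `low` is an assumption chosen
    only to keep every rate strictly positive.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=n)

# Example: rates for 50 users, seeded for reproducibility
Lambda_demo = draw_rates(50, seed=0)
Mu_demo = draw_rates(50, seed=1)
```

Passing a `seed` also makes a generated dataset reproducible, which the unseeded `np.random` calls above do not guarantee.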

## 1. User graph creation

We represent the user graph with a dictionary `Followers` where `Followers[i]` is the set of followers of user $i$, i.e. the users whose newsfeeds receive the posts of $i$.

In [206]:
# example: Erdős–Rényi graph with edge probability w
w = 0.1
Followers = {i:set() for i in range(N)}
for i in range(N):
    for j in range(N):
        if j != i and np.random.random() < w:
            Followers[i].add(j)
print("Number of edges: ", sum([len(Followers[i]) for i in range(N)]))

Number of edges:  255
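
With $N = 50$ and $w = 0.1$, each of the $N(N-1) = 2450$ ordered pairs is an edge with probability $w$, so the expected number of edges is $N(N-1)w = 245$, which is in line with the count printed above. The same graph model can also be sampled without the double loop. The following is a sketch only, assuming a hypothetical function `erdos_renyi_followers` that is not part of the original notebook:

```python
import numpy as np

def erdos_renyi_followers(n, w, seed=None):
    """Sample a directed Erdős–Rényi graph as a dict of follower sets.

    Hypothetical vectorized variant of the double loop: draws an n x n
    boolean matrix and forbids self-loops.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random((n, n)) < w      # mask[i, j] is True when j follows i
    np.fill_diagonal(mask, False)      # a user never follows themselves
    return {i: set(np.flatnonzero(mask[i]).tolist()) for i in range(n)}

demo_followers = erdos_renyi_followers(50, 0.1, seed=0)
nb_edges = sum(len(s) for s in demo_followers.values())
```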


Write the adjacency list to a file. Each line `i j` means that user `j` follows user `i`.

In [207]:
# the `with` block closes the file even if a write fails
with open(out_folder + "adjList_rand+full.txt", "w") as graph_out:
    for i in Followers:
        for j in Followers[i]:
            graph_out.write("{} {}\n".format(i, j))
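
Note that a user without followers never appears as the first field of a line, so the number of users is needed to rebuild the full dictionary from the file. A minimal round-trip sketch, assuming hypothetical helpers `write_adj_list` and `read_adj_list` and a temporary file name chosen for illustration:

```python
import os
import tempfile

def write_adj_list(followers, path):
    """Write one 'i j' line per edge (same line format as the cell above)."""
    with open(path, "w") as f:
        for i in followers:
            for j in followers[i]:
                f.write("{} {}\n".format(i, j))

def read_adj_list(path, n):
    """Rebuild the follower dictionary for users 0..n-1 from an edge file."""
    followers = {i: set() for i in range(n)}
    with open(path) as f:
        for line in f:
            i, j = map(int, line.split())
            followers[i].add(j)
    return followers

# Toy graph: user 1 has no followers and is absent from the file as a source
toy = {0: {1, 2}, 1: set(), 2: {0}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "adjList_demo.txt")  # hypothetical file name
    write_adj_list(toy, path)
    restored = read_adj_list(path, n=3)
```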

## 2. Events creation

We generate a list `Events` where the $i^{th}$ entry corresponds to the $i^{th}$ event occurring on the network. Each event is described as a tuple `(twid, timestamp, userid, rtid)`, where
- `twid` is the unique id of the tweet, $\in \{1, \ldots, nb\_events\}$
- `timestamp` is the instant of occurrence (seconds since the beginning)
- `userid` is the unique id $\in \{0, \ldots, N-1\}$ of the (re)tweeting user
- `rtid` is the id of the original tweet in the case of a retweet, and -1 otherwise.
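
The notebook stores events as plain tuples. For readability, the same four fields could be given names. This is only a sketch, assuming a hypothetical `Event` class that the original code does not use:

```python
from typing import NamedTuple

class Event(NamedTuple):
    """Hypothetical named version of the (twid, timestamp, userid, rtid) tuple."""
    twid: int         # unique id of the tweet
    timestamp: float  # time of occurrence
    userid: int       # id of the (re)tweeting user
    rtid: int = -1    # id of the original tweet, -1 for an original post

    @property
    def is_retweet(self):
        return self.rtid != -1

post = Event(1, 0.102, 20)
repost = Event(4, 0.139, 22, 1)
```

Because a `NamedTuple` is still a tuple, indexing such as `e[2]` and unpacking keep working unchanged.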

In [208]:
news = {i:list() for i in range(N)} # initialization of the newsfeeds
M = 1 # newsfeeds max size
next_twid = 1 # id of the next post
time = 0 # time since the beginning
Events = list() # list of events (output)

while len(Events) < nb_events:
    
    # generate exponential variates of scale 1/lambda, 1/mu for each user
    posting_time = np.random.exponential([1/x for x in Lambda], N)
    reposting_time = np.random.exponential([1/x for x in Mu], N)
    
    # get closest posting time and reposting time ---> next event will be the closest between both
    min_post = np.min(posting_time)
    min_repost = np.min(reposting_time)
    
    # if the next event is a post
    if min_post < min_repost:
        time += min_post
        user = np.argmin(posting_time)
        new_post = (next_twid, time, user, -1) # create new post
    
    # otherwise the next event is a repost (a plain `else` also covers an exact tie,
    # so `new_post` can never be left undefined)
    else:
        time += min_repost
        user = np.argmin(reposting_time)
        if len(news[user]) == 0: # skip step if nothing to repost in the user's newsfeed
            continue
        else:
            retweeted = choice(news[user]) # choose what to retweet
            if retweeted[-1] == -1: # get original id
                rtid = retweeted[0]
            else:
                rtid = retweeted[-1]
            new_post = (next_twid, time, user, rtid) # create new_post
            
            
    # append new post to the events list and update next_twid
    Events.append(new_post)
    next_twid += 1

    # update the newsfeeds of the followers of the active user
    for j in Followers[user]:
        if len(news[j]) == M: # remove something at random if newsfeed is full
            news[j].remove(choice(news[j]))
        news[j].append(new_post) # add new post to newsfeed
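
The loop above draws $2N$ exponential variates per iteration and keeps the smallest one. The same step can be sampled in another way, using two standard facts: the minimum of independent exponential variables with rates $r_k$ is exponential with rate $\sum_k r_k$, and the minimum is attained at index $k$ with probability $r_k / \sum_k r_k$. The sketch below illustrates this alternative. It is not the notebook's method, and `next_event` is a hypothetical name:

```python
import numpy as np

def next_event(Lambda, Mu, rng):
    """Sample the waiting time, the active user and the kind of the next event.

    Hypothetical alternative to taking the minimum of 2N exponential draws.
    """
    rates = np.concatenate([Lambda, Mu])         # posting rates, then reposting rates
    total = rates.sum()
    dt = rng.exponential(1.0 / total)            # time until the next event
    k = rng.choice(len(rates), p=rates / total)  # which clock rings first
    n = len(Lambda)
    kind = "post" if k < n else "repost"
    return dt, int(k % n), kind

rng = np.random.default_rng(0)
dt, user, kind = next_event(np.array([0.5, 0.2]), np.array([0.1, 0.3]), rng)
```

This uses two random draws per event instead of $2N$, which may matter for large $N$.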

Look at the first events.

In [209]:
Events[:10]

[(1, 0.10201348508006433, 20, -1),
 (2, 0.12094884559350029, 6, -1),
 (3, 0.1288907129115605, 25, 2),
 (4, 0.13907258070105855, 22, 1),
 (5, 0.1928669727552667, 10, -1),
 (6, 0.2865124262650053, 12, -1),
 (7, 0.2901194127828487, 17, -1),
 (8, 0.3379811614389025, 47, -1),
 (9, 0.3401934575074848, 43, -1),
 (10, 0.38232000000316807, 27, 6)]

Write the events list to the file `trace_rand+full.txt` in `out_folder`. Each line is one entry of the list, formatted as `twid timestamp userid rtid`.

In [210]:
# one event per line; the `with` block closes the file automatically
with open(out_folder + "trace_rand+full.txt", "w") as out:
    for e in Events:
        out.write("{} {} {} {}\n".format(e[0], e[1], e[2], e[3]))

In [51]:
# reload a previously written trace from disk
events = list()
with open(out_folder + "trace_cst+full.txt") as trace_in:
    for line in trace_in:
        line = line.split()
        twid, ts, uid, rtid = int(line[0]), float(line[1]), int(line[2]), int(line[3])
        events.append((twid, ts, uid, rtid))

In [54]:
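# empirical rates: count posts and reposts per user, then divide by the trace duration
lambdas_ = {i:0 for i in range(N)}
mus_ = {i:0 for i in range(N)}
for e in events:
    uid, rtid = e[2], e[3]
    if rtid == -1:
        lambdas_[uid] += 1
    else:
        mus_[uid] += 1
T = events[-1][1] # timestamp of the last event, i.e. total duration of the trace
lambdas_ = {i:lambdas_[i]/T for i in lambdas_}
mus_ = {i:mus_[i]/T for i in mus_}

# compare the true reposting rates with their empirical estimates
for i in range(N):
    print(Mu[i], mus_[i])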

0.1 0.09698284162813663
0.1 0.08914584432485287
0.1 0.10873833758306228
0.1 0.10384021426850992
0.1 0.10677908825724135
0.1 0.10384021426850992
0.1 0.09894209095395758
0.1 0.12441233218962983
0.1 0.09208471831358428
0.1 0.08424772101030051
0.1 0.10775871292015181
0.1 0.09600321696522617
0.1 0.09600321696522617
0.1 0.08914584432485287
0.1 0.1116772115717937
0.1 0.09894209095395758
0.1 0.09404396763940522
0.1 0.08522734567321098
0.1 0.09894209095395758
0.1 0.09208471831358428
0.1 0.1116772115717937
0.1 0.1116772115717937
0.1 0.10188096494268899
0.1 0.09208471831358428
0.1 0.10090134027977851
0.1 0.10090134027977851
0.1 0.10188096494268899
0.1 0.0979624662910471
0.1 0.09600321696522617
0.1 0.09992171561686805
0.1 0.08326809634739003
0.1 0.08228847168447957
0.1 0.09502359230231569
0.1 0.09600321696522617
0.1 0.10873833758306228
0.1 0.11363646089761464
0.1 0.10188096494268899
0.1 0.08424772101030051
0.1 0.0646552277520911
0.1 0.11559571022343558
0.1 0.10384021426850992
0.1 0.103840214268509
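
The estimate used above is simply the number of (re)posts of each user divided by the duration of the trace. The sketch below wraps it in a function and runs it on a three-event toy trace. The name `estimate_rates` and the toy data are assumptions made for illustration, not results from the notebook. Keep in mind that the simulator skips a repost when the newsfeed is empty, so the empirical reposting rate can fall below `Mu[i]` for users whose newsfeed is often empty.

```python
import numpy as np

def estimate_rates(events, n):
    """Empirical posting and reposting rates of users 0..n-1.

    Hypothetical helper: counts events per user and divides by the
    timestamp of the last event.
    """
    posts = np.zeros(n)
    reposts = np.zeros(n)
    for twid, ts, uid, rtid in events:
        if rtid == -1:
            posts[uid] += 1
        else:
            reposts[uid] += 1
    T = events[-1][1]  # duration of the trace
    return posts / T, reposts / T

# Toy trace (made up): user 0 posts twice, user 1 retweets tweet 1 once
toy_events = [(1, 0.5, 0, -1), (2, 1.0, 1, 1), (3, 2.0, 0, -1)]
lam_hat, mu_hat = estimate_rates(toy_events, 2)
```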

In [75]:
Followers[3]

{45}