# <center> Simulator (custom)</center>

This notebook generates synthetic Twitter datasets according to the model. We create a user graph and choose an activity pair $(\lambda,\mu)$ for each user. From there we generate tweeting/retweeting events, where each user $i$ tweets with rate $\lambda_i$ and retweets from their newsfeed with rate $\mu_i$. The output consists of two `.txt` files: the adjacency list of the user graph and the list of tweets.

In [4]:
import numpy as np
from random import choice
from operator import itemgetter

Choose the output folder where the results will be written.

In [5]:
out_folder = "../Datasets/Newman/"

## Setting parameters
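Choose the number of users $N$, the number of events `nb_events`, and the activity rates. The latter are given as two lists of length $N$, `Lambda` and `Mu`, where `Lambda[i]` is the posting rate of user $i$ and `Mu[i]` is their reposting rate. The parameter `w` is the edge probability of the example graph built in the next section, so `nb_events` scales with the expected number of edges $wN(N-1)$.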

In [14]:
N = 100
w = 0.1
nb_events = 40*w*N*(N-1)

# Lambda = np.random.pareto(1.3, N)
# Mu = np.random.pareto(1.3, N)

Lambda = [0.1 for n in range(N)]
Mu = [0.1 for n in range(N)]

# Lambda = np.random.random(N)
# Mu = np.random.random(N)

## 1. User graph creation

We represent the user graph with a dictionary `Followers`, where `Followers[i]` is the set of followers of user $i$, that is, the users whose newsfeeds receive the posts of $i$.

In [15]:
# example: Erdős-Rényi graph with edge probability w
Followers = {i:set() for i in range(N)}
for i in range(N):
    for j in range(N):
        if j != i and np.random.random() < w:
            Followers[i].add(j)
print("Number of edges: ", sum([len(Followers[i]) for i in range(N)]))

Number of edges:  951
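
As a rough sanity check on this edge count: each of the $N(N-1)$ ordered pairs becomes an edge independently with probability $w$, so the count is binomial. The sketch below computes its mean and standard deviation. The helper name `expected_edges` is ours and is not part of the notebook.

```python
import math

def expected_edges(n_users, edge_prob):
    """Mean and standard deviation of the edge count of a directed
    Erdős-Rényi graph without self-loops (hypothetical helper)."""
    n_pairs = n_users * (n_users - 1)  # ordered pairs (i, j) with i != j
    mean = n_pairs * edge_prob
    std = math.sqrt(n_pairs * edge_prob * (1 - edge_prob))
    return mean, std

mean, std = expected_edges(100, 0.1)
print("expected edges: {:.0f} +/- {:.1f}".format(mean, std))
```

With $N = 100$ and $w = 0.1$ the mean is 990 and the standard deviation is about 29.8, so the 951 edges observed above lie a little over one standard deviation below the mean.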


Write the adjacency list to a file. Each line `i j` means that user `j` belongs to `Followers[i]`.

In [16]:
# the context manager closes the file even if a write fails
with open(out_folder + "adjList_scaleTest.txt", "w") as graph_out:
    for i in Followers:
        for j in Followers[i]:
            graph_out.write("{} {}\n".format(i, j))
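
To reuse a saved graph, the file can be read back into the same dictionary-of-sets layout. This is a minimal sketch that assumes the two-column, space-separated format written above; `load_adjacency` is a hypothetical helper name, not part of the notebook.

```python
from collections import defaultdict

def load_adjacency(path):
    """Read lines of the form 'i j' into a dict mapping i to a set of j
    (hypothetical helper mirroring the layout of `Followers`)."""
    followers = defaultdict(set)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            i, j = int(parts[0]), int(parts[1])
            followers[i].add(j)
    return dict(followers)
```

A call such as `load_adjacency(out_folder + "adjList_scaleTest.txt")` should then recover the edges of `Followers`. Users with an empty set write no line to the file, so they are absent from the result.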

## 2. Events creation

We generate a list `Events` where the $i^{th}$ entry corresponds to the $i^{th}$ event occurring on the network. Each event is a tuple `(twid, timestamp, userid, rtid)`, where
- `twid` is the unique id of the tweet, $\in \{1, \ldots, nb\_events\}$
- `timestamp` is the instant of occurrence (seconds since the beginning)
- `userid` is the unique id $\in \{0, \ldots, N-1\}$ of the (re)tweeting user
- `rtid` is the id of the original tweet in case of a retweet, and -1 otherwise.
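
The four fields can be made easier to read with a named tuple. The notebook itself uses plain tuples; the `Event` type and the `is_retweet` helper below are illustrative names only. A named tuple still supports positional access such as `e[0]`, so it would remain compatible with index-based code.

```python
from collections import namedtuple

# Hypothetical named version of the (twid, timestamp, userid, rtid) tuple.
Event = namedtuple("Event", ["twid", "timestamp", "userid", "rtid"])

def is_retweet(event):
    """An event is a retweet when rtid points to an original tweet."""
    return event.rtid != -1

original = Event(twid=1, timestamp=0.2, userid=50, rtid=-1)
retweet = Event(twid=3, timestamp=0.8, userid=12, rtid=1)
print(is_retweet(original), is_retweet(retweet))
```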

In [17]:
news = {i:list() for i in range(N)} # initialization of the newsfeeds
M = 1 # newsfeeds max size
next_twid = 1 # id of the next post
time = 0 # time since the beginning
Events = list() # list of events (output)

while len(Events) < nb_events:
    
    # generate exponential variates of scale 1/lambda, 1/mu for each user
    posting_time = np.random.exponential([1/x for x in Lambda], N)
    reposting_time = np.random.exponential([1/x for x in Mu], N)
    
    # get closest posting time and reposting time ---> next event will be the closest between both
    min_post = np.min(posting_time)
    min_repost = np.min(reposting_time)
    
    # if the next event is a post
    if min_post < min_repost:
        time += min_post
        user = np.argmin(posting_time)
        new_post = (next_twid, time, user, -1) # create new post
    
    # otherwise the next event is a repost (also covers an exact tie)
    else:
        time += min_repost
        user = np.argmin(reposting_time)
        if len(news[user]) == 0: # skip step if nothing to repost in the user's newsfeed
            continue
        else:
            retweeted = choice(news[user]) # choose what to retweet
            if retweeted[-1] == -1: # get original id
                rtid = retweeted[0]
            else:
                rtid = retweeted[-1]
            new_post = (next_twid, time, user, rtid) # create new_post
            
            
    # append new post to the events list and update next_twid
    Events.append(new_post)
    next_twid += 1

    # update the newsfeeds of the followers of the active user
    for j in Followers[user]:
        if len(news[j]) == M: # remove something at random if newsfeed is full
            news[j].remove(choice(news[j]))
        news[j].append(new_post) # add new post to newsfeed
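
The loop above draws $2N$ exponential variates at every step and keeps the smallest one. By a standard property of independent exponential variables, that minimum is itself exponential with rate $\sum_i (\lambda_i + \mu_i)$, and the index attaining it is chosen with probability proportional to its rate. The sketch below samples one step directly from that property. It is offered as an equivalent alternative, not as the notebook's method, and `next_event` with its arguments is a hypothetical name.

```python
import numpy as np

def next_event(Lambda, Mu, rng):
    """Sample (waiting time, user, kind) for the next event, where kind is
    'post' or 'repost' (hypothetical helper)."""
    rates = np.concatenate([np.asarray(Lambda, dtype=float),
                            np.asarray(Mu, dtype=float)])
    total = rates.sum()
    dt = rng.exponential(1.0 / total)            # time until the next event
    k = rng.choice(len(rates), p=rates / total)  # which clock fired
    n_users = len(Lambda)
    if k < n_users:
        return dt, int(k), "post"
    return dt, int(k - n_users), "repost"

rng = np.random.default_rng(0)
dt, user, kind = next_event([0.1] * 100, [0.1] * 100, rng)
```

This draws two random numbers per step instead of $2N$, which may matter for large $N$. The handling of empty newsfeeds and the newsfeed updates would stay as in the loop above.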

Look at the first ten events.

In [18]:
Events[:10]

[(1, 0.19810903208170605, 50, -1),
 (2, 0.7140163339160908, 65, -1),
 (3, 0.8047981023603267, 50, 2),
 (4, 0.8853206869417127, 44, -1),
 (5, 0.9624986267202988, 56, 2),
 (6, 0.9751831801809541, 47, -1),
 (7, 1.0221095076712632, 79, -1),
 (8, 1.0474477197348255, 12, 7),
 (9, 1.1348709504679284, 41, -1),
 (10, 1.1472306672544499, 90, 4)]
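
The generated list should satisfy a few invariants that follow from the loop: ids count up from 1, timestamps never decrease, and every `rtid` refers to an earlier original tweet. The following sketch checks them. `check_events` is a hypothetical helper, and the sample reuses the first five events of the output above with timestamps rounded.

```python
def check_events(events):
    """Return a list of problems found in a list of
    (twid, timestamp, userid, rtid) tuples; empty if none (hypothetical helper)."""
    problems = []
    originals = set()  # ids of tweets seen so far that are not retweets
    last_time = 0.0
    for pos, (twid, timestamp, userid, rtid) in enumerate(events, start=1):
        if twid != pos:
            problems.append("event {}: unexpected twid {}".format(pos, twid))
        if timestamp < last_time:
            problems.append("event {}: timestamp goes backwards".format(pos))
        last_time = timestamp
        if rtid == -1:
            originals.add(twid)
        elif rtid not in originals:
            problems.append("event {}: rtid {} is not an earlier original".format(pos, rtid))
    return problems

sample = [(1, 0.198, 50, -1), (2, 0.714, 65, -1), (3, 0.805, 50, 2),
          (4, 0.885, 44, -1), (5, 0.962, 56, 2)]
print(check_events(sample))
```

An empty list means that no invariant was violated.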

Write the events list to `trace_scaleTest.txt` in `out_folder`. Each line is one entry of the list, with the four fields separated by spaces.

In [19]:
# the context manager closes the file even if a write fails
with open(out_folder + "trace_scaleTest.txt", "w") as out:
    for e in Events:
        out.write("{} {} {} {}\n".format(e[0], e[1], e[2], e[3]))
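
For later analysis, the trace can be parsed back into typed tuples. This is a minimal sketch that assumes the four-column, space-separated format written above; `load_trace` is a hypothetical helper name, not part of the notebook.

```python
def load_trace(path):
    """Read lines 'twid timestamp userid rtid' into a list of
    (int, float, int, int) tuples (hypothetical helper)."""
    events = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            twid, timestamp, userid, rtid = fields
            events.append((int(twid), float(timestamp), int(userid), int(rtid)))
    return events
```

A call such as `load_trace(out_folder + "trace_scaleTest.txt")` should then return a list with the same entries as `Events`.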