# <center> Simulator </center>

This notebook creates synthetic Twitter datasets according to the model. We create a user graph and choose an activity pair $(\lambda_i,\mu_i)$ for each user $i$. From there we generate tweet and retweet events, where each user $i$ tweets with rate $\lambda_i$ and retweets from their newsfeed with rate $\mu_i$. The output consists of two `.txt` files: the adjacency list of the user graph and the list of tweets.

In [1]:
import numpy as np
import random
from random import choice  # used unqualified in the event-generation loop
from operator import itemgetter

Choose the output folder where the results will be written.

In [2]:
out_folder = "./"

## Setting parameters
Choose the number of users $N$, the number of events `nb_events`, and the activity rates. The latter are given as two lists of length $N$, `Lambda` and `Mu`, where `Lambda[i]` is the posting rate of user $i$ and `Mu[i]` is their reposting rate.

In [3]:
N = 50
nb_events = 10000

Lambda = np.random.random(N)
Mu = np.random.random(N)

## 1. User graph creation

We represent the user graph with a dictionary `Followers` where `Followers[i]` is the set of followers of user $i$, i.e. the users whose newsfeeds receive the posts of $i$.

In [4]:
# example: directed Erdős-Rényi graph with edge probability w
w = 0.1
Followers = {i:set() for i in range(N)}
for i in range(N):
    for j in range(N):
        if j != i and np.random.random() < w:
            Followers[i].add(j)
print("Number of edges: ", sum([len(Followers[i]) for i in range(N)]))

Number of edges:  230


Write the adjacency list to the file `out_folder/adjList.txt`. Each line has the form `i j`, where `j` belongs to `Followers[i]`.

In [5]:
# the context manager closes the file even if a write fails
with open(out_folder + "adjList.txt", "w") as graph_out:
    for i in Followers:
        for j in Followers[i]:
            graph_out.write("{} {}\n".format(i, j))
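
To reuse the graph in another script, the file can be parsed back into the same dictionary-of-sets layout. The following is a minimal sketch: `parse_adj_lines` is a hypothetical helper (not part of this notebook) and it assumes each line holds two space-separated integer ids, as written above.

```python
def parse_adj_lines(lines, n_users):
    """Rebuild a {user: set of neighbours} dict from 'i j' lines."""
    graph = {i: set() for i in range(n_users)}
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        i, j = map(int, line.split())
        graph[i].add(j)
    return graph

# toy input in the same format as adjList.txt (illustrative values)
sample_lines = ["0 1\n", "0 2\n", "2 1\n"]
sample_graph = parse_adj_lines(sample_lines, 3)
```

With the real file, an open file handle such as `open(out_folder + "adjList.txt")` could be passed as `lines`, since iterating over a file yields its lines.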

## 2. Timestamps creation

This section creates a list of the events that will occur on the network. Each entry has the form `userid timestamp event_type`, where
- `userid` is the unique id $\in \{0, \ldots, N-1\}$ of the (re)tweeting user
- `timestamp` is the instant of occurrence (seconds since the beginning)
- `event_type` is a string that indicates whether the event is an original post (`'post'`) or a repost (`'repost'`).

There are 3 functions:
- `exponential_timestamps` creates events with exponential inter-arrival times, i.e. the activity of any user follows a Poisson process
- `hyperexp_timestamps` creates events with hyper-exponential inter-arrival times
- `constant_timestamps` creates events with constant inter-arrival times.

For each one, a maximum number of events must be specified to control the size of the output list. Events are generated user by user, first posts then reposts, and the merged list is sorted by timestamp and truncated to `nb_events` entries.

### 2.1 Exponential timestamps
The waiting time between two posts (resp. reposts) from user $i$ has density $\lambda_i e^{-\lambda_i x}$ (resp. $\mu_i e^{-\mu_i x}$) for $x \geq 0$.

In [None]:
def exponential_timestamps(lambdas, mus, nb_events):
    
    # init
    N = len(lambdas)
    events = list()
    
    # generate user by user
    for j in range(N):
        lambd, mu = lambdas[j], mus[j]
        
        # first posts
        if lambd > 0:
            time = 0
            for n in range(nb_events):
                dice = random.expovariate(lambdas[j])
                time += dice
                events.append((j, time, 'post'))
            
        # then reposts
        if mu > 0:
            time = 0
            for n in range(nb_events):
                dice = random.expovariate(mus[j])
                time += dice
                events.append((j, time, 'repost'))
        
    # end
    return sorted(events, key=itemgetter(1))[:nb_events]
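
As a quick sanity check of the Poisson assumption, the empirical mean of the inter-arrival times should be close to $1/\lambda_i$. The sketch below draws samples with `random.expovariate`; the rate, seed and sample size are arbitrary illustrative values, not parameters of the notebook.

```python
import random

random.seed(0)            # fixed seed, only to make the sketch reproducible
rate = 2.0                # illustrative posting rate lambda_i
n_samples = 100_000

# inter-arrival times of a Poisson process with the given rate
gaps = [random.expovariate(rate) for _ in range(n_samples)]
empirical_mean = sum(gaps) / n_samples
expected_mean = 1 / rate  # mean of an exponential distribution with this rate
```

The two means should agree up to sampling noise, which shrinks as `n_samples` grows.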

### 2.2 Hyper-exponential timestamps
The waiting time before the next post from user $i$ is distributed as follows:
- with probability $p_i$ it is exponential with parameter $\lambda_i^{(1)}$
- with probability $1 - p_i$ it is exponential with parameter $\lambda_i^{(2)}$

Note that if we set $p_i=10/11$, $\lambda_i^{(1)} = 10 \lambda_i$ and $\lambda_i^{(2)} = 0.1 \lambda_i$, then the inter-arrival times have the same mean $1/\lambda_i$ as with an exponential distribution of parameter $\lambda_i$, since $p_i/\lambda_i^{(1)} + (1-p_i)/\lambda_i^{(2)} = 1/\lambda_i$.
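
The mean-matching claim can be checked numerically. The sketch below compares the analytic mean of the two-phase mixture with a simulated one; `hyperexp_mean` and `hyperexp_sample` are hypothetical helpers, and the base rate, seed and sample size are illustrative assumptions.

```python
import random

def hyperexp_mean(p, rate1, rate2):
    """Mean of a two-phase hyper-exponential distribution."""
    return p / rate1 + (1 - p) / rate2

def hyperexp_sample(p, rate1, rate2):
    """Draw one waiting time: phase 1 with probability p, else phase 2."""
    rate = rate1 if random.random() < p else rate2
    return random.expovariate(rate)

lam = 0.5                                   # illustrative base rate lambda_i
p, rate1, rate2 = 10 / 11, 10 * lam, 0.1 * lam

analytic_mean = hyperexp_mean(p, rate1, rate2)  # equals 1 / lam up to rounding

random.seed(1)                              # fixed seed for reproducibility
n_samples = 200_000
empirical_mean = sum(
    hyperexp_sample(p, rate1, rate2) for _ in range(n_samples)
) / n_samples
```

The mean is preserved but the variance is much larger than in the exponential case, so the empirical mean converges more slowly.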

Reposts behave similarly, with $q_i$, $\mu_i^{(1)}$ and $\mu_i^{(2)}$ in place of $p_i$, $\lambda_i^{(1)}$ and $\lambda_i^{(2)}$.

<b> Warning! </b> If $\lambda_i^{(1)} > 0$ we require $\lambda_i^{(2)} > 0$, and vice versa. The same holds for $\mu$.

In [None]:
def hyperexp_timestamps(lambdas1, lambdas2, mus1, mus2, p, q, nb_events):
    
    # init
    N = len(lambdas1)
    events = list()
    
    # generate user by user
    for j in range(N):
        # per-user parameters (do not overwrite the lists p and q)
        p_j, q_j = p[j], q[j]
        lambd1, lambd2, mu1, mu2 = lambdas1[j], lambdas2[j], mus1[j], mus2[j]
        
        # first posts
        if lambd1 > 0:
            time = 0
            for n in range(nb_events):
                dice = random.random()
                if dice < p_j:
                    time += random.expovariate(lambd1)
                else:
                    time += random.expovariate(lambd2)
                events.append((j, time, 'post'))
            
        # then reposts
        if mu1 > 0:
            time = 0
            for n in range(nb_events):
                dice = random.random()
                if dice < q_j:
                    time += random.expovariate(mu1)
                else:
                    time += random.expovariate(mu2)
                events.append((j, time, 'repost'))
        
    # end
    return sorted(events, key=itemgetter(1))[:nb_events]

### 2.3 Constant timestamps
Here we just have to choose an inter-arrival time for each user and it will always be the same. `inter_post` is the list of inter-posting times and `inter_repost` the list of inter-reposting times. Setting one of those to zero makes the user never take the corresponding action (post or repost).

In [None]:
def constant_timestamps(inter_post, inter_repost, nb_events):
    
    # init
    N = len(inter_post)
    events = list()
    
    # generate user by user
    for j in range(N):
        t,s = inter_post[j], inter_repost[j]
        
        # first posts
        if t > 0:
            time = 0
            for n in range(nb_events):
                time += t
                events.append((j, time, 'post'))
            
        # then reposts
        if s > 0:
            time = 0
            for n in range(nb_events):
                time += s
                events.append((j, time, 'repost'))
        
    # end
    return sorted(events, key=itemgetter(1))[:nb_events]
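
To illustrate how the per-user streams are merged, here is a small self-contained sketch with two users. `constant_events` is a hypothetical compact variant written for this example (it multiplies instead of accumulating, which gives the same timestamps for integer inputs), and the inter-arrival values are arbitrary.

```python
from operator import itemgetter

def constant_events(inter_post, inter_repost, nb_events):
    """Merge constant-rate post and repost streams, keeping the earliest events."""
    events = []
    for user, (t, s) in enumerate(zip(inter_post, inter_repost)):
        if t > 0:  # a zero interval means the user never posts
            events.extend((user, t * (n + 1), 'post') for n in range(nb_events))
        if s > 0:  # a zero interval means the user never reposts
            events.extend((user, s * (n + 1), 'repost') for n in range(nb_events))
    # stable sort by timestamp, then truncate to the requested size
    return sorted(events, key=itemgetter(1))[:nb_events]

# user 0 posts every 2 seconds, user 1 reposts every 3 seconds
demo = constant_events([2, 0], [0, 3], 4)
```

Here `demo` is `[(0, 2, 'post'), (1, 3, 'repost'), (0, 4, 'post'), (0, 6, 'post')]`: the tie at time 6 is resolved in favour of user 0 because the sort is stable and user 0's events were generated first.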

We generate a list `Events` whose $i^{th}$ entry corresponds to the $i^{th}$ event occurring on the network. The cell below draws exponential waiting times directly from `Lambda` and `Mu`. Each event is described as a tuple `(twid, timestamp, userid, rtid)`, where
- `twid` is the unique id of the tweet, $\in \{1, \ldots, nb\_events\}$
- `timestamp` is the instant of occurrence (seconds since the beginning)
- `userid` is the unique id $\in \{0, \ldots, N-1\}$ of the (re)tweeting user
- `rtid` is the id of the original tweet in case of a retweet, and is set to -1 otherwise.

In [6]:
news = {i:list() for i in range(N)} # initialization of the newsfeeds
M = 1 # newsfeeds max size
next_twid = 1 # id of the next post
time = 0 # time since the beginning
Events = list() # list of events (output)

while len(Events) < nb_events:
    
    # generate exponential variates of scale 1/lambda, 1/mu for each user
    posting_time = np.random.exponential([1/x for x in Lambda], N)
    reposting_time = np.random.exponential([1/x for x in Mu], N)
    
    # get closest posting time and reposting time ---> next event will be the closest between both
    min_post = np.min(posting_time)
    min_repost = np.min(reposting_time)
    
    # if the next event is a post
    if min_post < min_repost:
        time += min_post
        user = np.argmin(posting_time)
        new_post = (next_twid, time, user, -1) # create new post
    
    # otherwise the next event is a repost (else also covers an exact tie)
    else:
        time += min_repost
        user = np.argmin(reposting_time)
        if len(news[user]) == 0: # skip step if nothing to repost in the user's newsfeed
            continue
        else:
            retweeted = choice(news[user]) # choose what to retweet
            if retweeted[-1] == -1: # get original id
                rtid = retweeted[0]
            else:
                rtid = retweeted[-1]
            new_post = (next_twid, time, user, rtid) # create new_post
            
            
    # append new post to the events list and update next_twid
    Events.append(new_post)
    next_twid += 1

    # update newsfeeds for followers of active user
    for j in Followers[user]:
        if len(news[j]) == M: # remove something at random if newsfeed is full
            news[j].remove(choice(news[j]))
        news[j].append(new_post) # add new post to newsfeed
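
The loop above redraws every user's clock at each step, which is valid because exponential waiting times are memoryless. An equivalent formulation (often called the Gillespie algorithm) draws a single waiting time with the total rate and then picks the acting clock with probability proportional to its rate. The sketch below shows that alternative; it is not the method used in this notebook, and `next_event` and the toy rates are hypothetical.

```python
import numpy as np

def next_event(post_rates, repost_rates, rng):
    """Sample (waiting_time, user, kind) for the next event on the network."""
    rates = np.concatenate([post_rates, repost_rates])
    total = rates.sum()
    wait = rng.exponential(1 / total)            # time until any clock rings
    k = rng.choice(len(rates), p=rates / total)  # index of the clock that rang
    n = len(post_rates)
    kind = "post" if k < n else "repost"
    return wait, int(k % n), kind

rng = np.random.default_rng(0)                   # seeded generator (illustrative)
post_rates = np.array([1.0, 0.5, 0.0])           # toy values of Lambda
repost_rates = np.array([0.2, 0.0, 0.3])         # toy values of Mu
wait, user, kind = next_event(post_rates, repost_rates, rng)
```

This variant needs two random draws per event instead of $2N$, and a user with rate zero is never selected for the corresponding action.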

Look at the first events.

In [7]:
Events[:10]

[(1, 0.019944872312488128, 2, -1),
 (2, 0.06049742454360553, 28, -1),
 (3, 0.07310548686901136, 36, -1),
 (4, 0.12442863488915959, 28, -1),
 (5, 0.12514990876846757, 41, -1),
 (6, 0.13585355226380755, 41, 1),
 (7, 0.13700234596857155, 20, 3),
 (8, 0.14718610545218702, 28, -1),
 (9, 0.18037391661276286, 45, -1),
 (10, 0.18128343169803307, 47, -1)]

Write the events list to `out_folder/trace.txt`. Each line is one entry of the list, in the order `twid timestamp userid rtid`.

In [8]:
# the context manager closes the file even if a write fails
with open(out_folder + "trace.txt", "w") as out:
    for e in Events:
        out.write("{} {} {} {}\n".format(e[0], e[1], e[2], e[3]))
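
A downstream script can read the trace back into tuples of the same shape. The following is a minimal sketch: `parse_trace_lines` is a hypothetical helper, and it assumes four space-separated fields per line, as written above.

```python
def parse_trace_lines(lines):
    """Parse 'twid timestamp userid rtid' lines into typed tuples."""
    events = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:  # skip blank or malformed lines
            continue
        twid, timestamp, userid, rtid = parts
        events.append((int(twid), float(timestamp), int(userid), int(rtid)))
    return events

# toy input in the same format as trace.txt (illustrative values)
sample_trace = ["1 0.0199 2 -1\n", "6 0.1358 41 1\n"]
parsed = parse_trace_lines(sample_trace)
n_reposts = sum(1 for e in parsed if e[3] != -1)  # entries with an original tweet id
```

With the real file, an open handle such as `open(out_folder + "trace.txt")` could be passed as `lines`.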