
Closeness centrality running without interruption #64

Closed
sopeKhadim opened this issue Apr 25, 2020 · 9 comments

Comments

@sopeKhadim

sopeKhadim commented Apr 25, 2020

Hi,
I am using Teneto to calculate closeness centrality. My code has been running for more than two days without finishing. What could the problem be?
Resources

  • Processor(s): 32
  • Threads per core: 2
  • Cores per socket: 8
  • Model: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
  • Memory: 130 GB

Code:

import pandas as pd
import teneto as tn
import numpy as np
from teneto import TemporalNetwork

filename = 'data/Sociopatterns/primaryschool.csv'
df_PS = pd.read_csv(filename, names = ['t', 'i', 'j', 'Ci', 'Cj'], header=None, delimiter='\t')
df_PS_Day = df_PS[ df_PS['t'] < 63000] 
df_PS_final = df_PS_Day[[ 't', 'i', 'j']]
df_PS_final
t i j
0 31220 1558 1567
1 31220 1560 1570
2 31220 1567 1574
3 31220 1632 1818
4 31220 1632 1866
... ... ... ...
60618 62300 1719 1859
60619 62300 1720 1843
60620 62300 1739 1872
60621 62300 1767 1787
60622 62300 1800 1820

60623 rows × 3 columns

# The time resolution between two signals is 20 seconds.
# This function relabels the nodes and normalizes the timestamps.
def contact_graph(data, cfg=None, resolution=0): 
    
    nodes = pd.concat([data['i'], data['j']]) 
    nodes = nodes.sort_values()
    nodes = nodes.unique()
    nb_nodes = len(nodes)
    
    ## changes the label of each node to be the index of np.array
    keys = nodes
    values = list(range(nb_nodes))
    nodeslabels = dict(zip(keys, values))
   
    if resolution == 0 :
        times = data['t']
        times = times.sort_values()
        times = times.unique()
        n_times = len(times)
    else:
        times = range(min(data['t']), max(data['t'])+resolution, resolution)
        n_times = len(times)
    
    ## changes the time counter 
    keys_times = times
    items_times = list(range(n_times))
    timesCounter  = dict(zip(keys_times,items_times ))          
    
    df = data.copy()
    frame =pd.DataFrame()
    frame['i'] = df['i'].map(nodeslabels)
    frame['j'] = df['j'].map(nodeslabels)
    frame['t'] = df['t'].map(timesCounter)
                  
    contacts = frame[['i', 'j', 't']].to_numpy()
    contacts = contacts[contacts[:, 2].argsort()]
    
    return contacts
contacts_PS = contact_graph(df_PS_final, resolution=20)
contacts_PS[:50]
array([[ 58,  63,   0],
       [ 59,  64,   0],
       [ 63,  66,   0],
       [ 85, 185,   0],
       [ 85, 209,   0],
       [101, 114,   0],
       [186, 194,   0],
       [186, 209,   0],
       [186, 194,   1],
       [183, 189,   1],
       [141, 187,   1],
       [101, 114,   1],
       [ 85, 185,   1],
       [ 63,  66,   1],
       [ 58,  63,   1],
       [ 85, 209,   1],
       [101, 114,   2],
       [179, 191,   2],
       [179, 181,   2],
       [159, 168,   2],
       [141, 187,   2],
       [ 85, 209,   2],
       [ 58,  62,   2],
       [ 62,  66,   2],
       [ 62,  63,   2],
       [ 59,  64,   2],
       [ 58,  63,   2],
       [ 63,  66,   2],
       [ 85, 186,   3],
       [179, 181,   3],
       [150, 152,   3],
       [101, 114,   3],
       [ 85, 185,   3],
       [ 85, 209,   3],
       [ 62,  66,   3],
       [ 62,  63,   3],
       [ 58,  63,   3],
       [ 58,  62,   3],
       [ 36,  51,   3],
       [ 63,  66,   3],
       [183, 189,   4],
       [150, 153,   4],
       [101, 114,   4],
       [ 85, 209,   4],
       [ 63,  66,   4],
       [ 58,  63,   4],
       [ 17,  39,   4],
       [ 80,  87,   4],
       [162, 164,   5],
       [158, 169,   5]])
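As an aside, the row-by-row tuple construction in contact_graph above is itself slow for large frames. A vectorized sketch of the same relabelling (my own, not part of Teneto; equivalent when every timestamp is a multiple of the resolution) could look like:

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same (t, i, j) layout as the data above.
df = pd.DataFrame({'t': [31220, 31220, 31240],
                   'i': [1558, 1560, 1558],
                   'j': [1567, 1570, 1570]})

def contact_graph_vectorized(data, resolution=20):
    """Same relabelling as contact_graph, without the Python loop."""
    # Map node labels to consecutive integers shared by both endpoints.
    nodes = np.unique(pd.concat([data['i'], data['j']]))
    i = np.searchsorted(nodes, data['i'])
    j = np.searchsorted(nodes, data['j'])
    # Convert timestamps to frame indices at the given resolution.
    t = (data['t'] - data['t'].min()) // resolution
    contacts = np.column_stack([i, j, t])
    # Stable sort preserves the input order within each time step.
    return contacts[contacts[:, 2].argsort(kind='stable')]

print(contact_graph_vectorized(df))
```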
tnetHS = TemporalNetwork(from_edgelist=list(contacts_PS))
tnetHS.network.head()
i j t
0 58 63 0
1 59 64 0
2 63 66 0
3 85 185 0
4 85 209 0
closeness = tn.networkmeasures.shortest_temporal_path(tnetHS)
@wiheto
Owner

wiheto commented Apr 26, 2020

Hi,

Unfortunately, shortest_temporal_paths is quite slow in its current implementation, especially for larger networks (the runtime grows exponentially with size). I've only really used it on networks of about 200 nodes and 400 time points. Increasing its speed is on the todo list.
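For reference, single-source earliest-arrival ("foremost") temporal paths can be computed in one pass over time-ordered contacts. This is a minimal sketch of that standard algorithm, not Teneto's implementation, assuming undirected contacts as in the data above:

```python
import numpy as np

def earliest_arrival(contacts, n_nodes, source):
    """Earliest arrival time at every node from `source`, given
    undirected contacts (u, v, t) sorted by time t."""
    arrival = np.full(n_nodes, np.inf)
    arrival[source] = -np.inf  # the source is reachable before any contact
    for u, v, t in contacts:
        # A contact at time t extends a path if one endpoint was
        # already reached at or before t.
        if arrival[u] <= t < arrival[v]:
            arrival[v] = t
        elif arrival[v] <= t < arrival[u]:
            arrival[u] = t
    return arrival

# Tiny example: node 2 is first reached at t=1 via 0 -> 1 -> 2.
print(earliest_arrival([(0, 1, 0), (1, 2, 1), (0, 2, 5)], 3, 0))
```

Since each contact is touched once, this scales linearly in the number of contacts per source node.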

@wiheto
Owner

wiheto commented Apr 26, 2020

So the code should finish eventually, but I can't say how long it will take.

@sopeKhadim
Author

Thank you for your response.
Can we hope that you will soon improve the performance of the shortest-path code for large temporal networks?
I think other measures, such as betweenness, depend on it.
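For what it's worth, once pairwise temporal distances are available the centrality step itself is cheap. Under one common definition (inverse mean temporal distance; Teneto's own temporal_closeness_centrality may differ in details), closeness is just:

```python
import numpy as np

# Hypothetical matrix of pairwise temporal distances D[u, v]
# (e.g. derived from a shortest temporal path routine).
D = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])

n = D.shape[0]
# Closeness of node u: (n - 1) divided by the sum of distances from u.
closeness = (n - 1) / D.sum(axis=1)
print(closeness)
```

So the bottleneck really is the shortest temporal path computation, not the closeness formula.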

@wiheto
Owner

wiheto commented Apr 30, 2020

Can we hope that you will soon improve the performance of the shortest-path code for large temporal networks?

I'll bump it up the priority queue, because it is something I've wanted to fix for some time.

@zhao-snoe

Have you solved your problem? At the moment I have run into the same problem in my calculations.

@zhao-snoe

In your article, I saw that you used more than 200 nodes and more than 400 time points. How long did it take to calculate the closeness centrality? That may give a useful time reference for my current calculation: I have 60 nodes, 51 networks, and 10% sparsity, and my code has been running for more than four days without stopping.

@wiheto
Owner

wiheto commented Dec 23, 2020

Development on the toolbox stalled this year for other reasons.

The original article used an older version of this function, which was quick but could not handle larger networks. The speed issue was introduced into the shortest temporal paths function when the rest of the toolbox was modified to handle larger networks.

You can use an older version of the code, which should be quicker; I think it can be found here: https://github.com/wiheto/teneto/releases/tag/0.3.5

Until then: despite being a high priority when developing this toolbox, this is currently only something I do in my spare time due to other obligations. I cannot give a time estimate for when I can push the next version, in which the shortest path estimation will be substantially quicker.

@zhao-snoe

zhao-snoe commented Dec 24, 2020 via email

@wiheto
Owner

wiheto commented Feb 17, 2021

So now I have time to solve this issue. I'm gathering all the issues regarding speed into one issue. Follow #74 for more info about this.

@wiheto wiheto closed this as completed Feb 17, 2021