<b> This notebook converts a mailing list (or a set of mailing lists) into a network of interactions</b>

What it does:
-it creates a network of interactions between senders and receivers of emails on one or more mailing lists
-it generates a .gexf file that can be imported into Gephi for visualization and analysis

Parameters to set:
-it can look in one or more mailing lists, depending on how many urls are set in the 'urls' variable; networks are aggregated across mailing lists
-it can filter the network by date; set the variables 'date_from' and 'date_to' to a date frame consistent with the data
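
To make the idea of an interaction network concrete, here is a minimal sketch on made-up data. It assumes that a reply creates a directed edge from the replier to the author of the message being answered. This is not BigBang's own implementation (the notebook relies on graph.messages_to_interaction_graph for that); the helper name toy_interaction_graph and the sample messages are hypothetical.

```python
import networkx as nx
import pandas as pd

# Made-up messages, indexed by Message-ID like the archive data in this notebook
messages = pd.DataFrame(
    {
        "From": ["alice", "bob", "carol", "alice"],
        "In-Reply-To": [None, "<1>", "<1>", "<2>"],
    },
    index=["<1>", "<2>", "<3>", "<4>"],
)

def toy_interaction_graph(df):
    """Hypothetical helper: one weighted edge per (replier, original sender) pair."""
    g = nx.DiGraph()
    for _, row in df.iterrows():
        g.add_node(row["From"])
        parent = row["In-Reply-To"]
        # Only link replies whose parent message is present in the data
        if pd.notna(parent) and parent in df.index:
            target = df.loc[parent, "From"]
            if g.has_edge(row["From"], target):
                g[row["From"]][target]["weight"] += 1
            else:
                g.add_edge(row["From"], target, weight=1)
    return g

toy = toy_interaction_graph(messages)
print(sorted(toy.edges(data="weight")))
```

Replies whose parent message is missing from the data simply produce no edge in this sketch; how the real library treats them may differ.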


In [1]:
%matplotlib inline



In [2]:
from bigbang.archive import Archive
import bigbang.parse as parse
import bigbang.graph as graph
import bigbang.mailman as mailman
import bigbang.process as process
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
from pprint import pprint as pp
import pytz

In [6]:
#Insert the list of urls (one or more) from which to gather the data
#e.g. urls = ["http://mm.icann.org/pipermail/cc-humanrights/",
#             "http://mm.icann.org/pipermail/wp4/",
#             "http://mm.icann.org/pipermail/ge/"]

        
urls = ["http://mm.icann.org/pipermail/africann-brussels.csv"]


archives = [mailman.open_list_archives(url) for url in urls]
archives_merged = pd.concat(archives)
archives_data = Archive(archives_merged).data

Opening 0 archive files




MissingDataException: 'No messages in archives under http://mm.icann.org/pipermail/africann-brussels.csv. Did you run the collect_mail.py script?'

In [18]:
from bigbang.archive import load

arx = load("/Users/yunkangyang/bigbang/archives/http:/mm.icann.org/pipermail/africann-brussels.csv")

In [17]:
arx.data

Message-ID,From,Subject,Date,In-Reply-To,References,Body
<C81AE567.2641E%staff@atlarge.icann.org>,staff at atlarge.icann.org (ICANN At-Large Staff),[Africann-brussels] Welcome to the mailing lis...,2010-05-20 11:11:03,,,"Dear All,\n\nWe have created a mailing list fo..."
<C81B4A83.14D07%staff@atlarge.icann.org>,staff at atlarge.icann.org (At-Large Staff),[Africann-brussels] Meeting Invitation / AfrIC...,2010-05-20 18:22:27,<C8088211.185D6%staff@atlarge.icann.org>,,"Dear All,\n\nPlease note that the next call to..."
<C820054D.14FCF%staff@atlarge.icann.org>,staff at atlarge.icann.org (At-Large Staff),[Africann-brussels] URGENT DOODLE / Next Joint...,2010-05-24 08:28:29,,,"Dear All,\n\nFurther to Friday?s call, please ..."
<C8208927.150BA%staff@atlarge.icann.org>,staff at atlarge.icann.org (At-Large Staff),[Africann-brussels] DOODLE Canx / Next Joint A...,2010-05-24 17:51:03,<C820054D.14FCF%staff@atlarge.icann.org>,,"Dear All,\n\nPlease note that this Doodle poll..."
<C821B652.151FC%staff@atlarge.icann.org>,staff at atlarge.icann.org (At-Large Staff),[Africann-brussels] DOODLE / next Joint AfrICA...,2010-05-25 16:16:02,,,"Dear All,\n\nPlease complete the attached Dood..."
<AANLkTimP1aYMifooVuyAs6nz9FZ5BBE374Izkr1k_-r_@mail.gmail.com>,fsylla at gmail.com (Fatimata Seye Sylla),[Africann-brussels] DOODLE / next Joint AfrICA...,2010-05-25 17:49:47,<C821B652.151FC%staff@atlarge.icann.org>,<C821B652.151FC%staff@atlarge.icann.org>,"Dear All,\n\nI will not be available to attend..."
<C822ACC8.15302%gisella.gruber-white@icann.org>,Gisella.Gruber-White at icann.org (Gisella Gru...,[Africann-brussels] REMINDER - DOODLE / next J...,2010-05-26 09:47:34,<C821B652.151FC%staff@atlarge.icann.org>,,"Dear All,\n\nPlease complete the attached Dood..."
<AANLkTimuVXjMc6NIFdTV8hmYtFdRPUkeI9ftY84ZcgtX@mail.gmail.com>,capdasiege at gmail.com (CAPDA CAPDA),[Africann-brussels] REMINDER - DOODLE / next J...,2010-05-26 11:38:16,<C822ACC8.15302%gisella.gruber-white@icann.org>,<C821B652.151FC%staff@atlarge.icann.org>\n\t<C...,"Bonjour,\n\nJe serais joignable au +237 777539..."
<C8235234.15494%staff@atlarge.icann.org>,staff at atlarge.icann.org (At-Large Staff),[Africann-brussels] Meeting Invitation / AfrIC...,2010-05-26 21:33:24,<C81B4A83.14D07%staff@atlarge.icann.org>,,"Dear All,\n\nPlease note that the next call to..."
<C8249687.155E5%staff@atlarge.icann.org>,staff at atlarge.icann.org (At-Large Staff),[Africann-brussels] REMINDER / AfrICANN/AFRALO...,2010-05-27 20:37:09,<C8235234.15494%staff@atlarge.icann.org>,,"Dear All,\n\nPlease note that the next call of..."


Set a valid date frame for building the network. 

In [None]:
#The oldest and most recent dates across the mailing lists are displayed, so you won't set an invalid date frame
print(archives_data['Date'].min())
print(archives_data['Date'].max())

In [None]:
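#set the date frame
#(pd.datetime was an alias for datetime.datetime and is gone from recent pandas releases)
from datetime import datetime
date_from = datetime(2015,11,1,tzinfo=pytz.utc)
date_to = datetime(2016,1,1,tzinfo=pytz.utc)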
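
One thing that may be worth checking before filtering: date_from and date_to carry tzinfo=pytz.utc, and pandas generally refuses to order-compare timezone-aware values against a timezone-naive column. Whether the 'Date' column in a given archive is timezone-aware is not something the output above confirms, so the sketch below is only an illustration on made-up timestamps; treating naive stamps as UTC is an assumption.

```python
import pandas as pd

# Made-up, timezone-naive timestamps
dates = pd.Series(pd.to_datetime(['2010-05-20 11:11:03', '2010-05-27 20:37:09']))
print(dates.dt.tz)  # None means the column is timezone-naive

if dates.dt.tz is None:
    # Assumption: the naive timestamps are meant as UTC
    dates = dates.dt.tz_localize('UTC')

cutoff = pd.Timestamp('2010-05-25', tz='UTC')
after_cutoff = (dates > cutoff).tolist()
print(after_cutoff)
```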

Filter data according to date frame and export to .gexf file

In [None]:
def filter_by_date(df,d_from,d_to):
    return df[(df['Date'] > d_from) & (df['Date'] < d_to)]
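
As a quick check of how this filter behaves, the sketch below applies the same function to a made-up three-row frame. Both comparisons are strict, so a message stamped exactly at d_from or d_to is dropped. The sample data and the names toy and kept are hypothetical.

```python
import pandas as pd

def filter_by_date(df, d_from, d_to):
    return df[(df['Date'] > d_from) & (df['Date'] < d_to)]

# Made-up messages: two sit exactly on the boundaries, one falls inside
toy = pd.DataFrame({
    'From': ['a', 'b', 'c'],
    'Date': pd.to_datetime(['2015-11-01', '2015-12-15', '2016-01-01'], utc=True),
})

kept = filter_by_date(toy,
                      pd.Timestamp('2015-11-01', tz='UTC'),
                      pd.Timestamp('2016-01-01', tz='UTC'))
print(kept['From'].tolist())
```

If the boundary dates should be included, using >= and <= (or Series.between) would be one way to do that.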

In [None]:
#create filtered network
archives_data_filtered = filter_by_date(archives_data, date_from, date_to)
network = graph.messages_to_interaction_graph(archives_data_filtered)

In [None]:
#export the network in a format that you can open in Gephi. 

#insert a valid path and file name (e.g. path = 'c:/bigbang/network.gexf')
path = '/users/yunkangyang/network_for_gephi.gexf'

nx.write_gexf(network, path)
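
Before opening the file in Gephi, it can be reassuring to read it back and confirm that the nodes and edges survived the export. The sketch below does that round trip on a made-up graph written to a temporary directory; the graph contents and the name toy_path are hypothetical, and the same nx.read_gexf call could be pointed at the path used above.

```python
import os
import tempfile
import networkx as nx

# Made-up directed graph standing in for the real network
toy = nx.DiGraph()
toy.add_edge('bob', 'alice', weight=2)
toy.add_edge('carol', 'alice', weight=1)

# Write to a temporary location so nothing real is overwritten
toy_path = os.path.join(tempfile.mkdtemp(), 'toy_network.gexf')
nx.write_gexf(toy, toy_path)

# Read the file back and compare basic counts
reloaded = nx.read_gexf(toy_path)
print(reloaded.number_of_nodes(), reloaded.number_of_edges())
```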
    
    