# EIT 2018 Dataset

In [1]:
import pandas as pd
import numpy as np

In [2]:
# `sheet_name` replaces the deprecated `sheetname` keyword in current pandas
eit_participants = pd.read_excel('eit2018.xlsx', sheet_name = 'Participants')
eit_marketplace = pd.read_excel('eit2018.xlsx', sheet_name = 'Marketplace')
eit_meetings = pd.read_excel('eit2018.xlsx', sheet_name = 'Meetings')

In [3]:
eit_participants.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 295 entries, 0 to 294
Data columns (total 16 columns):
Activated                                                                                                                      295 non-null bool
Email                                                                                                                          295 non-null object
First Name                                                                                                                     295 non-null object
Last Name                                                                                                                      295 non-null object
Job description                                                                                                                295 non-null object
Organisation                                                                                                                   295 non-null object
Organisation Ty

In [4]:
eit_marketplace.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 356 entries, 0 to 355
Data columns (total 6 columns):
ID                  356 non-null int64
Participant ID      356 non-null int64
Opportunity Type    356 non-null object
Title               356 non-null object
Description         356 non-null object
Update At           356 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(3)
memory usage: 16.8+ KB


In [5]:
eit_meetings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1344 entries, 0 to 1343
Data columns (total 19 columns):
ID                    1344 non-null int64
Host ID               1344 non-null int64
Host Name             1344 non-null object
Host Organisation     1344 non-null object
Host Country          1344 non-null object
Guest ID              1344 non-null int64
Guest Name            1344 non-null object
Guest Organisation    1344 non-null object
Guest Country         1344 non-null object
Status                1344 non-null object
Date                  829 non-null datetime64[ns]
Time                  829 non-null datetime64[ns]
Table/Booth           829 non-null float64
Host Rating           0 non-null float64
Host Comments         0 non-null float64
Guest Rating          0 non-null float64
Guest Comments        0 non-null float64
Updated At            1344 non-null datetime64[ns]
Created At            1344 non-null datetime64[ns]
dtypes: datetime64[ns](4), float64(5), int64(3), object(7

## Observations/Notes
**Visual**
1. Sheets *Feedback* and *Payments* are empty and unusable.
2. Sheet *Participants* holds the network/survey data and has to be the primary dataset for generating insight.
3. Sheet *Marketplace* has some nominal data sorted by Participant ID, with no names.
4. Sheet *Meetings* contains the ID-to-participant-name keys.
  - A network map of Hosts to Guests could yield insight.
5. The names in *Meetings* do not match the structure of the names in *Participants*, so information in *Marketplace* and *Meetings* cannot be reliably merged with *Participants*.
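
A minimal sketch of how the name mismatch in point 5 could be quantified before ruling out a merge. The helper `normalise_name` and the toy frames are hypothetical; the real check would use `eit_participants` and the `Host Name`/`Guest Name` columns of `eit_meetings`. It only evens out case, accents and whitespace, so it may not cover whatever structural difference the sheets actually have.

```python
import unicodedata

import pandas as pd


def normalise_name(name):
    """Hypothetical helper: lower-case, strip accents, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", str(name))
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


# Toy stand-ins for the two sheets (invented names, not from the dataset).
participants_demo = pd.DataFrame({"First Name": ["Ana", "Jörg"],
                                  "Last Name": ["LÓPEZ", "Müller"]})
meetings_demo = pd.DataFrame({"Host Name": ["ana  lopez", "Jorg Muller", "Someone Else"]})

full_names = participants_demo["First Name"] + " " + participants_demo["Last Name"]
participant_keys = set(full_names.map(normalise_name))
host_keys = set(meetings_demo["Host Name"].map(normalise_name))

matched = participant_keys & host_keys          # names found in both sheets
unmatched_hosts = host_keys - participant_keys  # hosts with no participant row
print(len(matched), "matched,", len(unmatched_hosts), "unmatched")
```

If the unmatched share stays high after normalisation, the sheets are better analysed separately, as noted above.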

**Notes**
1. *Participants* data shows both networking opportunities AND the existing network. The metrics from this analysis would therefore show POIs in present and future situations. The next step would be to find what the POIs have in common and relate it back to the population in general.
2. *Meetings* shows the Host-to-Guest relationship, but this is a heterogeneous relationship.
3. Social Network Analysis metrics are most effective at creating meaningful insight when the people/nodes are homogeneously distributed, not heterogeneously distributed.
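
One common way to get a homogeneous (person-to-person) network out of a heterogeneous one is a one-mode projection: two people are linked when they share a target. The sketch below is not part of the original analysis; it uses an invented toy edge list in the same `Source`/`Target` layout as the Gephi exports, and the `Weight` column is an assumption.

```python
import pandas as pd

# Toy two-mode edge list (person -> attribute), invented for illustration.
two_mode = pd.DataFrame({
    "Source": ["A", "B", "C", "A"],
    "Target": ["Poland", "Poland", "Spain", "Spain"],
})

# Pair every two people that share a Target, keeping each unordered pair once.
pairs = two_mode.merge(two_mode, on="Target", suffixes=("_a", "_b"))
pairs = pairs[pairs["Source_a"] < pairs["Source_b"]]

# Weight = number of targets the two people have in common.
one_mode = (pairs.groupby(["Source_a", "Source_b"]).size()
                 .reset_index(name="Weight")
                 .rename(columns={"Source_a": "Source", "Source_b": "Target"}))
print(one_mode)
```

The projected table can be exported as an undirected, weighted edge list, on which centrality metrics compare like with like.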

**Potential Visualizations**
1. Who you want to meet
2. Who you know
3. Institutions you want to meet
4. Pillars
  a. Total Pillar
  b. Main Pillar
  c. Sub Pillar
5. Country
6. Areas of Activity
7. Organisation and Organisation Type

**Issues**
1. *Marketplace* and *Meetings* have participant and ID pairs, but these do not coincide with the *Participants* sheet, which contains the bulk of the data from which to gather insight.
2. Out of 295 *Participants*, ~56 filled out the survey.
```
From the list of people below, with whom do you have a reliable relationship or have worked together in the last 18 months?    58 non-null object
Which participants listed below would you be most interested in meeting?                                                       51 non-null object
Which insitutions listed below would you be most interested in meeting?                                                        53 non-null object
Please describe shortly the idea for the project you would like to submit (250 characters).                                    54 non-null object
```
  - However, those who filled out the survey named more than 311 connections. This issue stems from inconsistent name entry.
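
A rough sketch of how inconsistent name entry could be surfaced, assuming the free-text names differ mostly by case, spacing and small spelling variations. The list `named` is invented; in the notebook it would come from splitting the survey columns. The `0.9` cutoff is an arbitrary starting point, and flagged pairs still need a manual check.

```python
from difflib import get_close_matches

# Toy list of names as they might be typed into the survey (invented).
named = ["Ana Lopez", "Ana  Lopez", "ana lopez",
         "Jorg Muller", "Jorg Mueller", "Li Wei"]

# Cheap normalisation first: case and whitespace.
cleaned = sorted({" ".join(n.casefold().split()) for n in named})

# Then flag remaining near-duplicates for manual review.
suspects = {}
for name in cleaned:
    others = [c for c in cleaned if c != name]
    close = get_close_matches(name, others, n=3, cutoff=0.9)
    if close:
        suspects[name] = close

print(len(named), "raw names ->", len(cleaned), "after normalisation")
print(suspects)
```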

## Wrangling

### Rename Columns

In [6]:
eit_participants.columns

Index(['Activated', 'Email', 'First Name', 'Last Name', 'Job description',
       'Organisation', 'Organisation Type', 'Areas of Activity', 'Country',
       'City', 'Website', "Select your Pillars' sessions",
       'From the list of people below, with whom do you have a reliable relationship or have worked together in the last 18 months?',
       'Which participants listed below would you be most interested in meeting?',
       'Which insitutions listed below would you be most interested in meeting?',
       'Please describe shortly the idea for the project you would like to submit (250 characters).'],
      dtype='object')

In [7]:
eit_participants = eit_participants.rename(columns = {"Select your Pillars' sessions": 'Pillars',
                                   "From the list of people below, with whom do you have a reliable relationship or have worked together in the last 18 months?": 'Reliable Relationship',
                                   "Which participants listed below would you be most interested in meeting?":'w2m_person',
                                   "Which insitutions listed below would you be most interested in meeting?": 'w2m_institution',
                                   "Please describe shortly the idea for the project you would like to submit (250 characters).": 'project'})

In [8]:
eit_participants.columns

Index(['Activated', 'Email', 'First Name', 'Last Name', 'Job description',
       'Organisation', 'Organisation Type', 'Areas of Activity', 'Country',
       'City', 'Website', 'Pillars', 'Reliable Relationship', 'w2m_person',
       'w2m_institution', 'project'],
      dtype='object')

### Merge First and Last Names

In [9]:
eit_participants.insert(4, 'Name', eit_participants[['First Name', 'Last Name']].apply(lambda x:' '.join(x), axis = 1))

In [10]:
eit_participants['Name'].head()

0      Magda Krakowiak
1      Mikolaj Gurdala
2       Marco Pugliese
3      Sabine Schumann
4    Joanna Baranowska
Name: Name, dtype: object

In [11]:
eit_participants.columns

Index(['Activated', 'Email', 'First Name', 'Last Name', 'Name',
       'Job description', 'Organisation', 'Organisation Type',
       'Areas of Activity', 'Country', 'City', 'Website', 'Pillars',
       'Reliable Relationship', 'w2m_person', 'w2m_institution', 'project'],
      dtype='object')

### Expand Pillars Column

In [12]:
set([j.strip() for i in eit_participants['Pillars'].dropna() for j in i.split(',')])

{'ACCELERATOR - Bootcamp programs',
 'ACCELERATOR - Entrepreneurship programs',
 'ACCELERATOR - GoGlobal Programmes',
 'ACCELERATOR - Living Labs & Test Beds',
 'ACCELERATOR - Mentoring and Coaching network',
 'CAMPUS - Citizen Engagement Activities',
 'CAMPUS - E-Labs',
 'CAMPUS - Fellowship Programmes',
 'CAMPUS - Innovation Day/Student Competition',
 'CAMPUS - Innovative Education (incl. WE Health)',
 'CAMPUS - Professionals and Executive Training',
 'INNOVATION - Focus Areas',
 'INNOVATION - group matchmaking'}

In [13]:
set([j.split(' - ')[0].strip() for i in eit_participants['Pillars'].dropna() for j in i.split(',')])

{'ACCELERATOR', 'CAMPUS', 'INNOVATION'}

In [14]:
set([j.split(' - ')[-1].strip() for i in eit_participants['Pillars'].dropna() for j in i.split(',')])

{'Bootcamp programs',
 'Citizen Engagement Activities',
 'E-Labs',
 'Entrepreneurship programs',
 'Fellowship Programmes',
 'Focus Areas',
 'GoGlobal Programmes',
 'Innovation Day/Student Competition',
 'Innovative Education (incl. WE Health)',
 'Living Labs & Test Beds',
 'Mentoring and Coaching network',
 'Professionals and Executive Training',
 'group matchmaking'}

In [15]:
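l = []
for i in eit_participants['Pillars']:
    if pd.notna(i):  # participants who selected no pillar have NaN here
        l2 = []
        for j in i.split(','):
            # keep the part before ' - ', e.g. 'CAMPUS' from 'CAMPUS - E-Labs'
            l2.append(j.strip().split(' - ')[0])
        l.append(list(set(l2)))  # de-duplicate repeated main pillars
    else:
        l.append(['No Pillar'])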
eit_participants.insert(13, 'Main Pillar', l)
eit_participants.columns

Index(['Activated', 'Email', 'First Name', 'Last Name', 'Name',
       'Job description', 'Organisation', 'Organisation Type',
       'Areas of Activity', 'Country', 'City', 'Website', 'Pillars',
       'Main Pillar', 'Reliable Relationship', 'w2m_person', 'w2m_institution',
       'project'],
      dtype='object')

In [16]:
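l = []
for i in eit_participants['Pillars']:
    if pd.notna(i):  # participants who selected no pillar have NaN here
        l2 = []
        for j in i.split(','):
            # keep the part after ' - ', e.g. 'E-Labs' from 'CAMPUS - E-Labs'
            l2.append(j.strip().split(' - ')[-1])
        l.append(list(set(l2)))  # de-duplicate repeated sub-pillars
    else:
        l.append(['No Sub Pillar'])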
eit_participants.insert(14, 'Sub Pillar', l)
eit_participants.columns

Index(['Activated', 'Email', 'First Name', 'Last Name', 'Name',
       'Job description', 'Organisation', 'Organisation Type',
       'Areas of Activity', 'Country', 'City', 'Website', 'Pillars',
       'Main Pillar', 'Sub Pillar', 'Reliable Relationship', 'w2m_person',
       'w2m_institution', 'project'],
      dtype='object')

In [17]:
eit_participants[['Pillars', 'Main Pillar', 'Sub Pillar']].head()

| | Pillars | Main Pillar | Sub Pillar |
|---|---|---|---|
| 0 | NaN | [No Pillar] | [No Sub Pillar] |
| 1 | INNOVATION - Focus Areas, INNOVATION - group m... | [INNOVATION] | [Focus Areas, group matchmaking] |
| 2 | ACCELERATOR - Bootcamp programs, ACCELERATOR -... | [INNOVATION, ACCELERATOR] | [Bootcamp programs, Mentoring and Coaching net... |
| 3 | INNOVATION - Focus Areas | [INNOVATION] | [Focus Areas] |
| 4 | ACCELERATOR - Bootcamp programs, CAMPUS - E-La... | [CAMPUS, ACCELERATOR, INNOVATION] | [Bootcamp programs, group matchmaking, E-Labs] |


### Areas of Activity

In [18]:
set([j for i in eit_participants['Areas of Activity'].dropna() for j in i.split(',')])

{'Active Ageing / Overcoming Functional Loss',
 'Active Ageing / Workplace Interventions',
 'Ageing with a Healthy Brain',
 'Bringing Care Home',
 'Business Development',
 'Chronic Disease management',
 'Design / R&D / Engineering',
 'Diagnostics',
 'Education',
 'Improving Healthcare Systems',
 'Leveraging Enabling Technologies and Exploiting Big Data',
 'Leveraging Talents and Education',
 'Metabolic Health',
 'Mobility and Independence Throughout Life',
 'Motivate Active Personal Lifestyles',
 'Personalized Oncology and integrated Cancer Care',
 'Prevention',
 'Production',
 'Promote Healthy Living / Lifestyle Intervention',
 'Removing Barriers to Innovation',
 'Sales & Distribution',
 'Self-Management of Health',
 'Service / Maintenance / Supply',
 'Sustainable Continuum of Care to Support Active Living in Europe',
 'Testing & Analysis',
 'Therapy',
 'Treating and Managing Chronic Diseases',
 'Value from Data in Clinical and SubClinical Settings',
 'Well being'}

### Country

In [19]:
set(eit_participants['Country'])

{'Belgium',
 'Denmark',
 'Estonia',
 'France',
 'Germany',
 'Hungary',
 'Ireland',
 'Italy',
 'Netherlands',
 'Poland',
 'Portugal',
 'Spain',
 'Sweden',
 'Switzerland',
 'United Kingdom'}

### Organisation and Organisation Type

In [20]:
set(eit_participants['Organisation Type'])

{'Association/Agency',
 'Authority/Government',
 'Company',
 'EIT Health staff',
 'Other',
 'R&D Institution',
 'University'}

## Create Network Data for Gephi

### Who you know

In [21]:
net_dict = {'Source': [],
            'Target' : []}
for index, row in eit_participants.dropna(subset = ['Reliable Relationship']).iterrows():
    for cons in row['Reliable Relationship'].split(','):
        net_dict['Source'].append(row['Name'])
        net_dict['Target'].append(cons.strip())
wyn_edges = pd.DataFrame(net_dict)
wyn_nodes = pd.DataFrame({'Id' : list(set(wyn_edges['Source']) | set(wyn_edges['Target'])),
                          'Label' : list(set(wyn_edges['Source']) | set(wyn_edges['Target']))})
wyn_edges.to_csv('wyn_edges.csv', index = False)
wyn_nodes.to_csv('wyn_nodes.csv', index = False)

![](https://gyazo.com/ccab4c765d8ec61985a444c5c07b809b.png)

### Who you want to meet

In [22]:
net_dict = {'Source': [],
            'Target' : []}
for index, row in eit_participants.dropna(subset = ['w2m_person']).iterrows():
    for cons in row['w2m_person'].split(','):
        net_dict['Source'].append(row['Name'])
        net_dict['Target'].append(cons.strip())
w2m_edges = pd.DataFrame(net_dict)
w2m_nodes = pd.DataFrame({'Id' : list(set(w2m_edges['Source']) | set(w2m_edges['Target'])),
                          'Label' : list(set(w2m_edges['Source']) | set(w2m_edges['Target']))})
w2m_edges.to_csv('w2m_edges.csv', index = False)
w2m_nodes.to_csv('w2m_nodes.csv', index = False)                         

![](https://gyazo.com/8e0473b632a31f785e85e35fdd44a736.png)

**Notes**
1. Clustering of non-human networks represents closely related subjects of importance that may benefit from merging.

### Want to meet (Institution)

In [23]:
net_dict = {'Source': [],
            'Target' : []}
for index, row in eit_participants.dropna(subset = ['w2m_institution']).iterrows():
    for cons in row['w2m_institution'].split(','):
        net_dict['Source'].append(row['Name'])
        net_dict['Target'].append(cons.strip())
w2m_institution_edges = pd.DataFrame(net_dict)
w2m_institution_nodes = pd.DataFrame({'Id' : list(set(w2m_institution_edges['Source']) | set(w2m_institution_edges['Target'])),
                          'Label' : list(set(w2m_institution_edges['Source']) | set(w2m_institution_edges['Target']))})
w2m_institution_nodes['Class'] = ['Institution' if i in w2m_institution_edges['Target'].tolist() else 'Person' for i in w2m_institution_nodes['Id']]
w2m_institution_edges.to_csv('w2m_institution_edges.csv', index = False)
w2m_institution_nodes.to_csv('w2m_institution_nodes.csv', index = False)                         

### Areas of Activity

In [24]:
net_dict = {'Source': [],
            'Target' : []}
for index, row in eit_participants.dropna(subset = ['Areas of Activity']).iterrows():
    for cons in row['Areas of Activity'].split(','):
        net_dict['Source'].append(row['Name'])
        net_dict['Target'].append(cons.strip())
activities_edges = pd.DataFrame(net_dict)
activities_nodes = pd.DataFrame({'Id' : list(set(activities_edges['Source']) | set(activities_edges['Target'])),
                          'Label' : list(set(activities_edges['Source']) | set(activities_edges['Target']))})
activities_nodes['Class'] = ['Activity' if i in activities_edges['Target'].tolist() else 'Person' for i in activities_nodes['Id']]
activities_edges.to_csv('activities_edges.csv', index = False)
activities_nodes.to_csv('activities_nodes.csv', index = False)                         

### Pillars

For this network, each person is connected to their sub-pillars, but the sub-pillar nodes are colored by main pillar.

In [25]:
net_dict = {'Source': [],
            'Target' : []}
for index, row in eit_participants.dropna(subset = ['Sub Pillar']).iterrows():
    for cons in row['Sub Pillar']:
        net_dict['Source'].append(row['Name'])
        net_dict['Target'].append(cons.strip())
pillars_edges = pd.DataFrame(net_dict)
pillars_nodes = pd.DataFrame({'Id' : list(set(pillars_edges['Source']) | set(pillars_edges['Target'])),
                          'Label' : list(set(pillars_edges['Source']) | set(pillars_edges['Target']))})
pillar_key = list(set([j.strip() for i in eit_participants['Pillars'].dropna() for j in i.split(',')]))
pillar_dict = {}
for i in pillar_key:
    mp, sp = i.split(' - ')
    pillar_dict[sp] = mp
pillar_dict
pillars_nodes['Class'] = [pillar_dict[i] if i in pillar_dict else 'Person' for i in pillars_nodes['Id'].tolist()]
pillars_edges.to_csv('pillar_edges.csv', index = False)
pillars_nodes.to_csv('pillar_nodes.csv', index = False)

### Country

In [26]:
country_edges = eit_participants[['Name', 'Country']].rename(columns = {'Name' : 'Source', 'Country' : 'Target'})
country_nodes = pd.DataFrame({'Id' : list(set(country_edges['Source'])|set(country_edges['Target'])),
                              'Label' : list(set(country_edges['Source'])|set(country_edges['Target']))
                             })
country_nodes['Class'] = ['Country' if i in country_edges['Target'].tolist() else 'Person' for i in country_nodes['Id']]
country_edges.to_csv('country_edges.csv', index = False)
country_nodes.to_csv('country_nodes.csv', index = False)

### Organisation and Organisation Type

In [27]:
org_type_edges = eit_participants[['Name', 'Organisation Type']].rename(columns = {'Name' : 'Source', 'Organisation Type' : 'Target'})
org_type_nodes = pd.DataFrame({'Id' : list(set(org_type_edges['Source'])|set(org_type_edges['Target'])),
                              'Label' : list(set(org_type_edges['Source'])|set(org_type_edges['Target']))
                             })
org_type_nodes['Class'] = ['Organisation Type' if i in org_type_edges['Target'].tolist() else 'Person' for i in org_type_nodes['Id']]
org_type_edges.to_csv('org_type_edges.csv', index = False)
org_type_nodes.to_csv('org_type_nodes.csv', index = False)
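
The cells in this section repeat one pattern: split a comma-separated column into edges, collect the nodes, label the target nodes with a class, and write two CSVs. A possible refactoring is sketched below. `build_gephi_tables` is a hypothetical helper, not something the notebook defines, and it relies on `DataFrame.explode`, which needs pandas 0.25 or later. It is demonstrated on an invented toy frame and covers only string columns, so the *Pillars* cell, which iterates over lists, would need a small variation.

```python
import pandas as pd


def build_gephi_tables(df, source_col, target_col, target_class, sep=","):
    """Hypothetical helper: return (edges, nodes) for a Gephi import."""
    # Keep only rows that have a value to split, then one row per split value.
    rows = df.dropna(subset=[target_col])[[source_col, target_col]].copy()
    rows[target_col] = rows[target_col].astype(str).str.split(sep)
    rows = rows.explode(target_col)

    edges = pd.DataFrame({"Source": rows[source_col].to_numpy(),
                          "Target": rows[target_col].str.strip().to_numpy()})

    # Every distinct name on either side becomes a node; targets get their own class.
    target_set = set(edges["Target"])
    ids = sorted(set(edges["Source"]) | target_set)
    nodes = pd.DataFrame({"Id": ids, "Label": ids})
    nodes["Class"] = [target_class if i in target_set else "Person" for i in ids]
    return edges, nodes


# Toy data in the shape of the Participants sheet (invented values).
demo = pd.DataFrame({"Name": ["A", "B", "C"],
                     "Areas of Activity": ["Therapy, Diagnostics", None, "Therapy"]})
edges, nodes = build_gephi_tables(demo, "Name", "Areas of Activity", "Activity")
print(edges)
print(nodes)
```

Each call would then be followed by the same two `to_csv(..., index = False)` lines used in the cells above.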

## Personality

In [30]:
eitpro = pd.read_excel('eitprofile.xlsx', header = 1)

In [31]:
eitpro.head()

Unnamed: 0,Event,User Type,Full Name,First Name,Last Name,Email,Industry,Company,Title,Level,...,Connections,Experience,Skills,Groups,Warmth,Linkedin_Summary_Tags,Linkedin_Headline_Tags,Personality Type,Personality summary,Key words/phrases
0,EIT Health 2018,Attendees,Adèle Tellez,Adèle,Tellez,adele.tellez@cea.fr,,CEA-List,Health Program Manager,,...,500,y,0.0,6,3.6,,CEA List,Is,"Adèle resists formal structure, has big ideas,...","Colorful, descriptive words|How can I help|How..."
1,EIT Health 2018,Attendees,Adrienne Perves,Adrienne,Perves,adrienne.perves@cea.fr,,CEATech - Leti-Health,Deputy Director LETI-Health,,...,500,y,14.0,6,3.8,"Experienced Project Management, International ...","Deputy Head, l'énergie atomique, énergies alte...",Ds,Adrienne may come off as too direct or blunt s...,...what your team thinks|ASAP|Focus on a singl...
2,EIT Health 2018,Attendees,Agnieszka Oćwieja,Agnieszka,Oćwieja,agnieszka.ocwieja@comarch.com,,Comarch Healthcare,Business Solutions Consultant,,...,153,y,15.0,6,3.5,,"Stanislawa Staszica, Górniczo-Hutnicza im, Kra...",Id,"Agnieszka is a decisive, creative influencer: ...","Bold claims|Casual greetings (""Hey"")|Colorful,..."
3,EIT Health 2018,Attendees,Aine Phelan,Aine,Phelan,aine@isax.ie,,ISAX,Head of Consumer Insight & Marketing,,...,500,y,50.0,1,5.0,"Ireland Smart Ageing, Business Model Transform...","Consumer Insight & Marketing, Head, ISAX",Di,Aine is an extremely direct communicator and m...,ASAP|Blunt language|Focus on a single point|I ...
4,EIT Health 2018,Attendees,Alain PUJOL,Alain,PUJOL,alain.pujol@angelssante.fr,,Angels Sante - Angels for Health,Board Member International Relations,,...,500,n,22.0,1,2.3,,,Ds,Alain may come off as too direct or blunt some...,...what your team thinks|ASAP|Focus on a singl...


In [35]:
eitpro = eitpro.dropna(subset = ['Collated Skills', 'Personality Type'])[['Full Name', 'Collated Skills', 'Personality Type']]

In [37]:
eitpro.head()

| | Full Name | Collated Skills | Personality Type |
|---|---|---|---|
| 1 | Adrienne Perves | Project Management,R&D,Innovation Management,I... | Ds |
| 2 | Agnieszka Oćwieja | Matlab,Biomedical Engineering,English,Microsof... | Id |
| 3 | Aine Phelan | Strategy,Management,Marketing Strategy,Marketi... | Di |
| 4 | Alain PUJOL | Change Management,Pharmaceutical Industry,Stra... | Ds |
| 5 | Albena Saynova | International Relations,Public Policy,Microsof... | Cs |


In [45]:
net_dict = {'Source' : [],
            'Target' : []}
for index, row in eitpro.iterrows():
    for skill in row['Collated Skills'].split(','):
        net_dict['Source'].append(row['Full Name'])
        net_dict['Target'].append(skill)
nodes = list(set(net_dict['Source'])|set(net_dict['Target']))
li_skills_nodes = pd.DataFrame({'Id' : nodes,
                                'Label' : nodes})
# Build the edge table first: the Class column below depends on it
li_skills_edges = pd.DataFrame(net_dict)
li_skills_nodes['Class'] = ['Skill' if i in li_skills_edges['Target'].tolist() else 'Persons' for i in li_skills_nodes['Id'].tolist()]
li_skills_nodes.to_csv('li_skills_nodes.csv', index = False)
li_skills_edges.to_csv('li_skills_edges.csv', index = False)

In [47]:
eitpro.head()

| | Full Name | Collated Skills | Personality Type |
|---|---|---|---|
| 1 | Adrienne Perves | Project Management,R&D,Innovation Management,I... | Ds |
| 2 | Agnieszka Oćwieja | Matlab,Biomedical Engineering,English,Microsof... | Id |
| 3 | Aine Phelan | Strategy,Management,Marketing Strategy,Marketi... | Di |
| 4 | Alain PUJOL | Change Management,Pharmaceutical Industry,Stra... | Ds |
| 5 | Albena Saynova | International Relations,Public Policy,Microsof... | Cs |


In [53]:
per_dict = {'d' : 'Dominance',
            'i' : 'Influence',
            'c' : 'Conscientiousness',
            's' : 'Steadiness'}

In [63]:
def pers(x):
    return '|'.join([per_dict[i] for i in x.lower()])
eitpro['Personality Type'] = eitpro['Personality Type'].apply(pers)
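
As a quick check of the mapping, the sketch below expands a two-letter code such as `Ds` into its labels. `pers_safe` is a hypothetical variant of `pers`; the `'Unknown'` fallback for letters outside D/I/S/C is an assumption, added so that an unexpected code does not raise a `KeyError`.

```python
per_dict = {'d': 'Dominance',
            'i': 'Influence',
            'c': 'Conscientiousness',
            's': 'Steadiness'}


def pers_safe(code):
    """Hypothetical variant of pers(): unknown letters map to 'Unknown'."""
    return '|'.join(per_dict.get(ch, 'Unknown') for ch in str(code).strip().lower())


print(pers_safe('Ds'))  # Dominance|Steadiness
```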

In [65]:
net_dict = {'Source' : [],
            'Target' : []}
for index, row in eitpro.iterrows():
    net_dict['Source'].append(row['Full Name'])
    net_dict['Target'].append(row['Personality Type'])
nodes = list(set(net_dict['Source'])|set(net_dict['Target']))
personality_nodes = pd.DataFrame({'Id' : nodes,
                                'Label' : nodes})
personality_edges = pd.DataFrame(net_dict)
personality_nodes['Class'] = ['Personality' if i in personality_edges['Target'].tolist() else 'Persons' for i in personality_nodes['Id'].tolist()]
personality_nodes.to_csv('personality_nodes.csv', index = False)
personality_edges.to_csv('personality_edges.csv', index = False)