## Basic Ego Networks

An ego network is a social network viewed from the perspective of a single node. Ego networks provide an organized, easily visualized way to analyze the social interactions between people, groups, and organizations. We choose a central node, the *ego*, find the nodes connected to it, the *alters*, and then analyze the resulting graph of connections, for example by measuring interaction levels or connection density.
  
Below is a list of some terms relevant to ego networks.  
  
**Ego**: The focal point of a network, effectively the "center" of a given graph of interactions. Note that any node in a graph can serve as an ego, so a graph has as many possible egos as it has nodes. The ego is the "base" node from which the ego network is constructed. For example, in a group of friends where each person is a node, we construct the ego network for person A by looking at all of person A's friends. Person B may have different friends, so the network constructed for person B may differ from person A's; the term "ego" tells us whose network we are referring to.  
**Alter**: A node connected to the ego. In the example above, the alters are the ego's friends.  
**Neighborhood**: The network formed by the ego and all alters within some maximum distance of it, where distance is the length of the shortest path from the ego. Unless stated otherwise, a neighborhood has a distance of 1, meaning every alter is directly connected to the ego.  
**N-Step Neighborhood**: A neighborhood with a maximum distance of N, used when that distance is greater than 1.  
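
The terms above can be illustrated with a small breadth-first search. This is only a sketch: the function name `n_step_neighborhood`, its parameters, and the `friends` graph are made up for illustration and are not part of the tweet dataset used later.

```python
from collections import deque

def n_step_neighborhood(adjacency, ego, max_distance=1):
    """Return {alter: distance} for every alter within max_distance of ego."""
    distances = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand past the neighborhood boundary
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[ego]  # the ego itself is not an alter
    return distances

# Hypothetical friendship graph (undirected, stored as adjacency sets).
friends = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

one_step = n_step_neighborhood(friends, "A")
two_step = n_step_neighborhood(friends, "A", max_distance=2)
print(sorted(one_step))  # ['B', 'C']
print(sorted(two_step))  # ['B', 'C', 'D']
```

Choosing a different ego, such as `"B"`, yields a different neighborhood from the same graph, which is why the ego always has to be named.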
  
  





In [1]:
import csv
import sqlite3
import re

def load_twitter_data_sqlite3(conn):
    """Load tweets.csv into a new `tweets` table on the given connection."""
    def insertData(vals):
        # One "?" placeholder per column keeps the values parameterized.
        placeholders = ",".join("?" * len(vals))
        conn.execute("INSERT INTO tweets VALUES (%s)" % placeholders, tuple(vals))
    
    with open("tweets.csv", "r") as f:
        reader = csv.reader(f)
        # next(reader) works on both Python 2 and Python 3.
        colNames = next(reader)
        firstLine = next(reader)

        # Infer each column's type from the first data row.
        columns = []
        for name, value in zip(colNames, firstLine):
            try:
                int(value)
                t = "integer"
            except ValueError:
                t = "text"
            columns.append(name + " " + t)

        conn.execute("CREATE TABLE tweets (%s)" % ",".join(columns))
        insertData(firstLine)
        for line in reader:
            insertData(line)
        # Commit once after all rows are inserted instead of once per row.
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.text_factory = str
load_twitter_data_sqlite3(conn)


# Ego Networks on Twitter Users

For simplicity, we will reuse the tweet dataset that was handed out in homework 2.

We will define the ego network of a Twitter user as the user, or ego, together with all other Twitter users, or alters, who interact with the ego. Normally, we would only count interactions between the ego and a potential alter that happened recently, but since our tweet dataset only contains around 5000 tweets, we will not apply a recency limit and will account for all tweets and interactions.

We will now write a function that finds the alters of an ego and builds a graph from them, annotated with some additional information. A common piece of data to associate with each alter in an ego network is the number of interactions of each type. In our case, the types could be retweets, mentions, and favorites. However, since we do not have information on who favorited a tweet, we will exclude favorites and only count retweets and mentions.
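
The retweet-versus-mention rule can be seen in isolation on a couple of made-up tweet texts. This is a minimal sketch: `classify_interaction` and `MENTION_PATTERN` are illustrative names, and the rule (text starting with "RT" is a retweet, anything else is a mention) is the same simple assumption the function in the next cell makes.

```python
import re

# Matches "@name" and captures the name without the "@".
MENTION_PATTERN = re.compile(r"@(\w+)")

def classify_interaction(text):
    """Return (kind, mentioned_users) for one tweet's text."""
    kind = "RTs" if text.startswith("RT") else "mentions"
    return kind, MENTION_PATTERN.findall(text)

# Made-up tweet texts, for illustration only.
retweet = classify_interaction("RT @alice: big news today")
mention = classify_interaction("Thanks @bob and @carol for coming!")
print(retweet)  # ('RTs', ['alice'])
print(mention)  # ('mentions', ['bob', 'carol'])
```

Note that this rule is naive: any tweet whose text merely begins with the letters "RT" is treated as a retweet.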

In [2]:
def get_alters_and_interactions(conn,ego):
    #We will get any tweet mentioning the ego either as a retweet or mention.
    # Pass the pattern as a parameter instead of formatting it into the SQL string.
    query = "SELECT * FROM tweets WHERE text LIKE ?"
    interactions = conn.execute(query, ("%@" + ego + "%",)).fetchall()
    
    graph = {}
    
    for interaction in interactions:
        screen_name = interaction[0]
        text = interaction[4]
        isRetweet = (text[:2] == "RT")
        
        # We'll skip any cases where the ego mentions themselves for whatever reason.
        if screen_name == ego:
            continue
            
        if screen_name not in graph:
            graph[screen_name] = {"RTs": 0, "mentions": 0}
        if isRetweet:
            graph[screen_name]["RTs"] += 1
        else:
            graph[screen_name]["mentions"] += 1
    
    # We will then add any interactions that the ego made with other people
    query = "SELECT * FROM tweets WHERE screen_name = ?"
    tweets = conn.execute(query, (ego,)).fetchall()
    
    regex = re.compile(r"@\w+")
    for tweet in tweets:
        text = tweet[4]
        isRetweet = (text[:2] == "RT")
        # Build a list (not a lazy filter object) so it can be indexed below.
        finder = [m for m in regex.findall(text) if m[1:] != ego]
        
        if finder:
            user = finder[0][1:]
            if user not in graph:
                graph[user] = {"RTs": 0, "mentions": 0}
            if isRetweet:
                graph[user]["RTs"] += 1
            else:
                graph[user]["mentions"] += 1
            if len(finder) > 1:
                for user in finder[1:]:
                    user = user[1:]
                    if user not in graph:
                        graph[user] = {"RTs": 0, "mentions": 0}
                    graph[user]["mentions"] += 1
    return graph

ego = "realDonaldTrump"
ego_network = get_alters_and_interactions(conn, ego)

for k,v in ego_network.items():
    print("")
    print("@%s <-> @%s" % (ego,k))
    print("--- Retweets: %d" % v["RTs"])
    print("--- Mentions: %d" % v["mentions"])

@realDonaldTrump <-> @LaraLeaTrump
--- Retweets: 1
--- Mentions: 3

@realDonaldTrump <-> @parscale
--- Retweets: 2
--- Mentions: 0

@realDonaldTrump <-> @AnnCLauer
--- Retweets: 1
--- Mentions: 0

@realDonaldTrump <-> @TrumpDoonbeg
--- Retweets: 1
--- Mentions: 0

@realDonaldTrump <-> @DonaldJTrumpJr
--- Retweets: 4
--- Mentions: 0

@realDonaldTrump <-> @TiffanyATrump
--- Retweets: 3
--- Mentions: 0

@realDonaldTrump <-> @FoxNews
--- Retweets: 0
--- Mentions: 7

@realDonaldTrump <-> @MrsVanessaTrump
--- Retweets: 2
--- Mentions: 0

@realDonaldTrump <-> @GenFlynn
--- Retweets: 0
--- Mentions: 1

@realDonaldTrump <-> @JoefromDoonbeg
--- Retweets: 3
--- Mentions: 2

@realDonaldTrump <-> @seanhannity
--- Retweets: 0
--- Mentions: 5

@realDonaldTrump <-> @livelasvegas
--- Retweets: 2
--- Mentions: 0

@realDonaldTrump <-> @EricTrump
--- Retweets: 2
--- Mentions: 1

@realDonaldTrump <-> @DanScavino
--- Retweets: 4
--- Mentions: 5

@realDonaldTrump <-> @mdamelincourt
--- Retweets: 0
--- Men

Now that we can construct a list of alters along with some basic information about their interactions, we can perform some analysis on the dataset. For example, we could expand the neighborhood to an n-step neighborhood, or use the ego networks of unknown users to infer basic information such as their political affiliation.

## N-Step Neighborhoods and Interaction Levels

Let's take this a step further by constructing a 2-step neighborhood and applying a weight to each edge based on interaction levels. For this demonstration, we'll give a retweet a weight of 1 and a mention a weight of 2. We'll also normalize the weights to values between 0 and 1.
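
As a worked example of the weighting, here are three alters copied from the output above, scored with a retweet weight of 1 and a mention weight of 2 and then divided by the largest score. This is a simplified sketch: it uses only the counts from the single ego network printed earlier, so the numbers will not match the 2-step output further down, and the names `interaction_weight`, `raw`, and `normalized` are illustrative.

```python
RETWEET_WEIGHT = 1
MENTION_WEIGHT = 2

def interaction_weight(counts):
    """Weighted sum of retweets and mentions for one alter."""
    return counts["RTs"] * RETWEET_WEIGHT + counts["mentions"] * MENTION_WEIGHT

# Counts taken from the printed ego network for @realDonaldTrump.
counts = {
    "DanScavino": {"RTs": 4, "mentions": 5},
    "FoxNews": {"RTs": 0, "mentions": 7},
    "DonaldJTrumpJr": {"RTs": 4, "mentions": 0},
}

raw = {name: interaction_weight(c) for name, c in counts.items()}
max_weight = max(raw.values())
# Dividing by the largest weight maps every value into the range (0, 1].
normalized = {name: w / float(max_weight) for name, w in raw.items()}

for name in sorted(normalized):
    print(name, raw[name], round(normalized[name], 3))
```

Because mentions count double, @FoxNews (7 mentions) ends up with the same raw weight of 14 as @DanScavino (4 retweets and 5 mentions).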

In [3]:

def construct2step(centerEgo):
    egos = {}
    altersAndInteractions = get_alters_and_interactions(conn, centerEgo)
    egos[centerEgo] = altersAndInteractions
    
    for screenName in altersAndInteractions.keys():
        egos[screenName] = None
    for screenName in egos.keys():
        if screenName != centerEgo:
            egos[screenName] = get_alters_and_interactions(conn,screenName)
    
    #Let's then construct a matrix from the 2-step neighborhood and giving each edge a weight then normalize it
    matrix = {}
    def sumWeight(interactions):
        # Simply sums the interaction weight for retweets and mentions
        retweetWeight = 1
        mentionWeight = 2
        return interactions["RTs"]*retweetWeight + interactions["mentions"]*mentionWeight
    
    for screenName,alters in egos.items():
        if screenName not in matrix:
            matrix[screenName] = {}
        
        for alter,interactions in alters.items():
            matrix[screenName][alter] = sumWeight(interactions)
            
    sumWeightedMatrix = {}
    maxWeight = 0

    # Interactions from user A to B might not equal those from B to A, so for
    # simplicity we add the two together. We only do this for users inside the
    # neighborhood.
    for ego,alters in matrix.items():
        for alter, weight in alters.items():
            if alter in matrix:
                # Use a name other than sumWeight so the helper above is not shadowed.
                combinedWeight = weight + matrix[alter].get(ego, 0)
                if maxWeight < combinedWeight:
                    maxWeight = combinedWeight
                
                if alter not in sumWeightedMatrix:
                    sumWeightedMatrix[alter] = {}
                if ego not in sumWeightedMatrix:
                    sumWeightedMatrix[ego] = {}

                sumWeightedMatrix[alter][ego] = combinedWeight
                sumWeightedMatrix[ego][alter] = combinedWeight
    
    # Normalize every edge by the largest combined weight.
    for ego,alters in sumWeightedMatrix.items():
        for alter, weight in alters.items():
            normalized = weight/float(maxWeight)
            sumWeightedMatrix[ego][alter] = normalized
            print("%s --> %s  :  %s" % (ego, alter, normalized))
    return sumWeightedMatrix
            
        
normalizedWeightMatrix = construct2step("realDonaldTrump")



LaraLeaTrump --> TeamTrump  :  0.0588235294118
LaraLeaTrump --> DonaldJTrumpJr  :  0.0392156862745
LaraLeaTrump --> KatrinaPierson  :  0.0588235294118
LaraLeaTrump --> DiamondandSilk  :  0.0980392156863
LaraLeaTrump --> TiffanyATrump  :  0.0392156862745
LaraLeaTrump --> MrsVanessaTrump  :  0.0392156862745
LaraLeaTrump --> realDonaldTrump  :  0.274509803922
LaraLeaTrump --> trumpwinery  :  0.117647058824
LaraLeaTrump --> LGlick1  :  0.078431372549
parscale --> DonaldJTrumpJr  :  0.0392156862745
parscale --> IngrahamAngle  :  0.0392156862745
parscale --> realDonaldTrump  :  0.117647058824
AnnCLauer --> MattLaDolcetta  :  0.0392156862745
AnnCLauer --> Trump  :  0.078431372549
AnnCLauer --> EricTrump  :  0.078431372549
AnnCLauer --> realDonaldTrump  :  0.0588235294118
TrumpDoonbeg --> Scottmarr8  :  0.627450980392
TrumpDoonbeg --> BrendanAMurphy  :  0.509803921569
TrumpDoonbeg --> Trump  :  0.509803921569
TrumpDoonbeg --> realDonaldTrump  :  0.0588235294118
TrumpDoonbeg --> JoefromDoonbeg 

To explain the above algorithm: we first get the alters of the central ego, that is, the screen name passed to the function. We then treat each of those alters as a "sub ego" and fetch its own alters and interaction counts. From there, we construct a final matrix in which the A-to-B and B-to-A weights are summed, restricted to users inside the neighborhood. Note that `get_alters_and_interactions` already counts both tweets that mention the ego and tweets the ego wrote, so for most pairs this sum counts the same interactions twice. Finally, we divide every weight by the largest combined weight so that all values fall between 0 and 1.
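
The combine-and-normalize step can be tried on a toy input. The sketch below is a Python 3 restatement of that step under the same assumptions as the cell above; `symmetrize_and_normalize` and the `directed` weights are hypothetical and do not come from the tweet data.

```python
def symmetrize_and_normalize(directed):
    """Sum A->B and B->A weights, then divide by the largest sum."""
    combined = {}
    for source, targets in directed.items():
        for target, weight in targets.items():
            if target not in directed:
                continue  # keep only pairs inside the neighborhood
            total = weight + directed[target].get(source, 0)
            combined.setdefault(source, {})[target] = total
            combined.setdefault(target, {})[source] = total

    max_weight = max(
        (w for targets in combined.values() for w in targets.values()),
        default=0,
    )
    if max_weight == 0:
        return combined  # no edges, nothing to normalize
    return {
        source: {t: w / max_weight for t, w in targets.items()}
        for source, targets in combined.items()
    }

# Hypothetical directed weights: A->B is 3, B->A is 5, A->C is 1.
directed = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 5},
    "C": {},
}
result = symmetrize_and_normalize(directed)
print(result["A"]["B"], result["B"]["A"])  # 1.0 1.0
print(result["A"]["C"], result["C"]["A"])  # 0.125 0.125
```

The A-B edge sums to 8, the largest value, so it normalizes to 1.0, while the one-sided A-C edge keeps its weight of 1 and normalizes to 0.125.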

# Visualizing Results

Finally, let's visualize our small example ego network with the networkx library. Here, we're simply visualizing the data we've constructed.

In [4]:
import networkx as nx
import matplotlib.pyplot as plt

def makeGraph(matrix):
    g = nx.Graph()
    nodes = set()
    edges = set()
    for ego,alters in matrix.items():
        nodes.add(ego)
        for alter, weight in alters.items():
            edges.add((ego,alter,weight))
    
    for node in nodes:
        g.add_node(node)
    for edge in edges:
        fromNode,toNode,weight = edge
        g.add_edge(fromNode,toNode, weight=weight)
    
    position = nx.spring_layout(g)
    nx.draw_networkx_nodes(g, position, nodelist = nodes, node_color = (0.1, 0.4, 0.8, 0.8))
    nx.draw_networkx_edges(g, position)
    edge_labels=dict([((u,v,),d['weight'])
                             for u,v,d in g.edges(data=True)])
    nx.draw_networkx_edge_labels(g,position,edge_labels=edge_labels)
    nx.draw_networkx_labels(g, position)
    plt.show()
    
makeGraph(normalizedWeightMatrix)




If run on your local machine, the above code will output the following graph:
    
![](https://puu.sh/s4yYo/63715b8245.png)

The darker the line, the higher the interaction level we have calculated.


To reiterate, this graph shows the interaction levels between one individual, in this case Donald Trump's Twitter account, and the accounts that immediately surround it. We constructed a 2-step neighborhood around Trump's account from the tweet database we received in homework 2, and from it calculated the interaction levels between Trump and the accounts that he tweets with or at.
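
Connection density, mentioned at the start as one possible analysis, can be computed directly from a nested dict shaped like `normalizedWeightMatrix`. The sketch below assumes an undirected graph and uses a small made-up matrix rather than the real output; the function name `density` is illustrative.

```python
def density(matrix):
    """Fraction of possible undirected edges that are actually present."""
    nodes = set(matrix)
    for alters in matrix.values():
        nodes.update(alters)

    # frozenset makes (a, b) and (b, a) count as the same edge.
    edges = set()
    for node, alters in matrix.items():
        for alter in alters:
            if alter != node:
                edges.add(frozenset((node, alter)))

    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0

# Made-up normalized weights in the same nested-dict shape.
toy_matrix = {
    "ego": {"x": 1.0, "y": 0.5, "z": 0.2},
    "x": {"ego": 1.0, "y": 0.3},
    "y": {"ego": 0.5, "x": 0.3},
    "z": {"ego": 0.2},
}
toy_density = density(toy_matrix)
print(round(toy_density, 3))  # 0.667
```

Here 4 of the 6 possible edges among the 4 nodes exist, giving a density of about 0.667.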



# Wrap Up

While the example displayed in this tutorial with ego networks is relatively simple and naive in many of its assumptions, ego networks can act as a useful method of surfacing data of interest. Ego networks are often used to make sense of small and large groups and their interactions with each other. 

One case where I have used ego networks was in constructing a strategy for dispersing information through a social network. Ego networks were vital for deciding where to "plant" information, using the heuristic that people with high interaction levels are likely to share it.
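
One simple way to apply that heuristic is to rank users by their total interaction weight. This is only a sketch of the idea, not the strategy actually used in that project; `rank_by_interaction` and the weights are made up for illustration.

```python
def rank_by_interaction(matrix):
    """Order users from highest to lowest total edge weight."""
    totals = {node: sum(alters.values()) for node, alters in matrix.items()}
    # Sort by descending total, breaking ties alphabetically.
    return sorted(totals, key=lambda node: (-totals[node], node))

# Made-up interaction weights in the nested-dict shape used in this tutorial.
weights = {
    "ego": {"x": 4, "y": 2, "z": 1},
    "x": {"ego": 4, "y": 1},
    "y": {"ego": 2, "x": 1},
    "z": {"ego": 1},
}
ranking = rank_by_interaction(weights)
print(ranking)  # ['ego', 'x', 'y', 'z']
```

Under this heuristic, the first few users in the ranking would be the candidates for "planting" information.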

Another use of ego networks is categorizing a person, group, or demographic about which little is known, based purely on the elements that surround them. For example, one might not have much information on a certain individual, but by generating an ego network around them, one might be able to make an educated guess at their political affiliation, financial wealth, or other categorical attributes.
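
A minimal sketch of such an educated guess is a weighted vote over the alters whose category is already known. Everything here is hypothetical: the function `guess_label`, the user names, the weights, and the labels `group1` and `group2` are placeholders, and a real classifier would need far more care.

```python
from collections import defaultdict

def guess_label(matrix, ego, known_labels):
    """Return the label with the most interaction weight among ego's alters."""
    votes = defaultdict(float)
    for alter, weight in matrix.get(ego, {}).items():
        label = known_labels.get(alter)
        if label is not None:
            votes[label] += weight  # alters without a known label are ignored
    if not votes:
        return None
    # Iterating in sorted order makes ties resolve alphabetically.
    return max(sorted(votes), key=votes.get)

# Made-up ego network and labels.
network = {"unknown_user": {"a": 0.9, "b": 0.2, "c": 0.4, "d": 0.5}}
labels = {"a": "group1", "b": "group2", "c": "group2"}

guess = guess_label(network, "unknown_user", labels)
print(guess)  # group1
```

Alter `a` alone outweighs `b` and `c` combined, so the guess is `group1`; `d` has no known label and does not vote.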

