# TransNet: Translation-Based Network Representation Learning for Social Relation Extraction

## Overview

In this notebook, we use TransNet, a Network Representation Learning (NRL) model, to learn representations of a social network and predict the labels on its social relations. We present the task of Social Relation Extraction (SRE), which is designed to extract relations between social network vertices.

In TransNet, we focus on the problem of incorporating the rich relation information on edges into NRL.

<a href="https://www.flickr.com/photos/150924720@N04/35463916440/" title="framework"><img src="https://farm5.staticflickr.com/4266/35463916440_7444da7c8c.jpg" width="500" height="403" alt="framework"></a>

As shown in the figure above, TransNet consists of two critical components: the translation part and the edge representation part. In the following sections, we first show how to implement the translation mechanism, then introduce how to construct the edge representations, and finally give the overall objective function of TransNet.

### Read dataset

Before constructing TransNet, we first read the dataset. To keep this notebook simple, we use `read_data_sets()` to load the data. This function returns an object that generates training batches and testing batches. We use a sub-network of ArnetMiner, which has 1000 nodes and about 400 kinds of labels on relations.

In [1]:
import numpy as np
import random
import tensorflow as tf
from input_data import read_data_sets

aminer = read_data_sets()
entity_total = aminer.entity_total
tag_total = aminer.tag_total

Here we set some parameters. We will explain them below.

In [12]:
# Parameters
learning_rate = 0.001
warm_up_epochs = 20
epochs = 66
batch_size = 100
eval_batch_size = 2000
display_step = 5

gamma = 1 # margin
alpha = 0.5
l2_lambda = 0.001 # regularizer weight
beta = 50.0
keep_prob = 0.5 # dropout keep probability
rep_size = 64 # representation size

hits_k = [1,5,10]

## Translation Mechanism

Motivated by translation mechanisms in word representations [Mikolov *et al*., 2013] and knowledge representations [Bordes *et al*., 2013], we assume that the interactions between vertices in social networks can also be portrayed as translations in the representation space.

Specifically, for each edge $e = (u, v)$ and its corresponding label set $l$, the representation of vertex $v$ is expected to be close to the representation of vertex $u$ plus the representation of edge $e$. As each vertex plays two roles in TransNet, head vertex and tail vertex, we introduce two vectors $\mathbf{v}$ and $\mathbf{v}'$ for each vertex $v$, corresponding to its head representation and tail representation. The translation mechanism among $u$, $v$ and $e$ can then be formalized as
$$\mathbf{u} + \mathbf{l} \approx \mathbf{v}'$$
Note that $\mathbf{l}$ is the edge representation obtained from $l$, which will be introduced in the second part.
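
As a rough illustration of the translation idea, the following NumPy sketch compares the L1 distance between a translated head and two candidate tails. The vectors and the helper `l1_distance` are made up for this example and are not part of the TransNet code.

```python
import numpy as np

# Toy 3-dimensional vectors; all values are invented for illustration.
u_head = np.array([0.2, -0.1, 0.5])   # head representation of vertex u
l_edge = np.array([0.3, 0.4, -0.2])   # edge representation l
v_tail = np.array([0.5, 0.3, 0.3])    # tail representation v' of vertex v
w_tail = np.array([-0.6, 0.9, 0.0])   # tail representation of an unrelated vertex

def l1_distance(head, edge, tail):
    """L1 distance between the translated head (head + edge) and the tail."""
    return float(np.sum(np.abs(head + edge - tail)))

d_true = l1_distance(u_head, l_edge, v_tail)    # close to 0: u + l matches v'
d_false = l1_distance(u_head, l_edge, w_tail)   # clearly larger
print(d_true, d_false)
```

A well-trained model should give a small distance for edges that exist with label set $l$ and a larger one for corrupted triples.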

In [13]:
# input
# pos_h, pos_t, neg_h, neg_t are vertices' ID
pos_h = tf.placeholder(tf.int32, [None])
pos_t = tf.placeholder(tf.int32, [None])
pos_r = tf.placeholder(tf.float32, [None, tag_total])
pos_br = tf.placeholder(tf.float32, [None, tag_total])

neg_h = tf.placeholder(tf.int32, [None])
neg_t = tf.placeholder(tf.int32, [None])
neg_r = tf.placeholder(tf.float32, [None, tag_total])
neg_br = tf.placeholder(tf.float32, [None, tag_total])

Here we use the ID of each vertex to look up its head representation and tail representation. These embeddings are stored in `node_lookup` in the next code block.

In [14]:
#embedding
node_lookup = {
    'int_embeddings': tf.Variable(tf.random_normal([entity_total, rep_size])),
    'adv_embeddings': tf.Variable(tf.random_normal([entity_total, rep_size])),
}

def lookup(pos_head, pos_tail, neg_head, neg_tail, tables):
    # Heads are read from 'int_embeddings' and tails from 'adv_embeddings';
    # every looked-up vector is L2-normalized along the embedding axis.
    pos_head_e = tf.nn.l2_normalize(
        tf.nn.embedding_lookup(tables['int_embeddings'], pos_head), 1)
    pos_tail_e = tf.nn.l2_normalize(
        tf.nn.embedding_lookup(tables['adv_embeddings'], pos_tail), 1)
    neg_head_e = tf.nn.l2_normalize(
        tf.nn.embedding_lookup(tables['int_embeddings'], neg_head), 1)
    neg_tail_e = tf.nn.l2_normalize(
        tf.nn.embedding_lookup(tables['adv_embeddings'], neg_tail), 1)
    return pos_head_e, pos_tail_e, neg_head_e, neg_tail_e

# pos_h_e, pos_t_e, neg_h_e, neg_t_e are representations of vertices
pos_h_e, pos_t_e, neg_h_e, neg_t_e = lookup(pos_h, pos_t,
                                           neg_h, neg_t, node_lookup)

## Edge Representation Construction

As shown in the figure above, we employ a deep autoencoder to construct the edge representations. The encoder consists of several non-linear transformation layers that map the label set into a low-dimensional representation space. The reconstruction process of the decoder, in turn, makes the representation preserve all the label information.

To simplify the model, we use a single hidden layer and take its output as the representation of the label set. The network thus has an input layer, a hidden layer and a reconstruction layer. At the hidden layer, we choose tanh as the activation function so that the output lies between -1 and 1. At the reconstruction layer, we choose sigmoid as the activation function so that the output lies between 0 and 1, because the original label vectors contain only the values 0 and 1.

We store the weights and biases of the autoencoder in `relation_weights` and `relation_biases`. To prevent overfitting, we also apply dropout when generating the edge representations.

In [15]:
# autoencoder
relation_weights = {
    'encoder_w': tf.Variable(tf.random_normal([tag_total, rep_size])),
    'decoder_w': tf.Variable(tf.random_normal([rep_size, tag_total])),
}
relation_biases = {
    'encoder_b': tf.Variable(tf.random_normal([rep_size])),
    'decoder_b': tf.Variable(tf.random_normal([tag_total])),
}

def autoencoder(W,B,x):
    rep = tf.nn.dropout(
        tf.nn.tanh(tf.matmul(x, W['encoder_w'])+B['encoder_b']), keep_prob)
    decode_x = tf.nn.sigmoid(
        tf.matmul(rep, W['decoder_w'])+B['decoder_b'])
    return rep, decode_x

# xxx_rep refers to the value of the hidden layer
# xxx_dec refers to the value of the reconstruction layer
pos_r_rep, pos_r_dec = autoencoder(relation_weights, relation_biases, pos_r)
neg_r_rep, neg_r_dec = autoencoder(relation_weights, relation_biases, neg_r)

## Overall Architecture

### Reconstruction loss

The autoencoder aims to minimize the distance between the inputs and the reconstructed outputs. The reconstruction loss is

$$L_{rec} = \|s - \hat{s}\|_1$$

Here, we adopt the L1-norm to measure the reconstruction distance. However, because the input vector is sparse, the number of zero elements in $s$ is much larger than the number of non-zero elements. The autoencoder will therefore tend to reconstruct the zero elements rather than the non-zero ones, which is incompatible with our purpose. We therefore assign different weights to different elements and redefine the loss function as follows:

$$L_{ae} = \|(s-\hat{s})\odot x\|_1$$

where $x$ is a weight vector and $\odot$ denotes the Hadamard product. For $x=\{x_i\}_{i=1}^{|T|}$, $x_i=1$ when $s_i=0$ and $x_i=\beta > 1$ otherwise. Here we set $\beta = 50.0$.
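
The weighting can be sketched in NumPy as follows. How `input_data` actually builds the weight vectors is not shown in this notebook, so the helpers `weight_vector` and `weighted_l1` and the toy label vector are assumptions made purely for illustration.

```python
import numpy as np

beta = 50.0  # same value as in the parameter cell

def weight_vector(s, beta):
    """x_i = 1 where s_i == 0 and x_i = beta where s_i is non-zero."""
    return np.where(s == 0, 1.0, beta)

def weighted_l1(s, s_hat, beta):
    """L1 norm of the reconstruction error after elementwise weighting by x."""
    return float(np.sum(np.abs((s - s_hat) * weight_vector(s, beta))))

s = np.array([0.0, 1.0, 0.0, 0.0])       # one active label out of four
s_hat = np.array([0.1, 0.6, 0.0, 0.2])   # a made-up reconstruction

x = weight_vector(s, beta)                     # weights 1 for zeros, beta for the active label
loss_plain = float(np.sum(np.abs(s - s_hat)))  # unweighted L1 error
loss_weighted = weighted_l1(s, s_hat, beta)    # error on the active label dominates
print(x, loss_plain, loss_weighted)
```

With the weights applied, the error on the single active label contributes far more to the loss than the errors on the three zero entries combined.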

Looking back at the input code block, `pos_br` and `neg_br` are the weight vectors $x$ described here. To obtain a well-trained autoencoder before training the translation model, we add a warm-up phase that trains the autoencoder alone, using the label vectors from the dataset as training data.

With the deep autoencoder, the edge representation not only retains the critical information of the corresponding labels, but can also be used to predict the relation (labels) between two vertices.

To prevent overfitting, we define an L2-norm regularizer as:

$$ L_{reg} = \sum_{i=1}^{K}(||W^{(i)}||^2_2+||b^{(i)}||^2_2)$$

Note that $K$ is the number of autoencoder layers.
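
For reference, `tf.nn.l2_loss(t)` returns `sum(t ** 2) / 2`, so a regularizer built from it equals half of $L_{reg}$ as written above; the constant factor can be absorbed into the regularizer weight. The NumPy sketch below shows both quantities on toy parameters. The helper names and values are illustrative assumptions, not part of the TransNet code.

```python
import numpy as np

def l2_reg(params):
    """Sum of squared L2 norms over all weight matrices and bias vectors."""
    return float(sum(np.sum(p ** 2) for p in params))

def half_l2_reg(params):
    """Same quantity with the 1/2 factor that tf.nn.l2_loss applies to each tensor."""
    return 0.5 * l2_reg(params)

W = np.array([[1.0, -2.0], [0.5, 0.0]])   # toy weight matrix
b = np.array([3.0, -1.0])                 # toy bias vector

reg = l2_reg([W, b])            # 1 + 4 + 0.25 + 0 + 9 + 1
half_reg = half_l2_reg([W, b])
print(reg, half_reg)
```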

In [16]:
# loss
# L2-norm regularizer
relation_ae_l2_loss = tf.nn.l2_loss(relation_weights['encoder_w'])+\
                        tf.nn.l2_loss(relation_weights['decoder_w'])+\
                        tf.nn.l2_loss(relation_biases['encoder_b'])+\
                        tf.nn.l2_loss(relation_biases['decoder_b'])
# L1-norm to measure the reconstruction distance(reconstruction loss)
relation_loss = tf.reduce_sum(tf.abs(tf.multiply(pos_r_dec-pos_r, pos_br)))+\
                tf.reduce_sum(tf.abs(tf.multiply(neg_r_dec-neg_r, neg_br)))
# warm-up reconstruction loss
relation_pos_r_loss = tf.reduce_sum(tf.abs(tf.multiply(pos_r_dec-pos_r, pos_br))) +\
                        l2_lambda*relation_ae_l2_loss


### Translation loss

We employ a distance function $d(\mathbf{u}+\mathbf{l}, \mathbf{v}')$ to estimate how well a triple $(u,v,l)$ satisfies the translation. In practice, we simply adopt the $L_1$-norm. With the above definitions, for each $(u,v,l)$ and its negative sample $(\hat{u}, \hat{v}, \hat{l})$, the translation part of TransNet aims to minimize the hinge loss
$$L_{trans}=\max(\gamma + d(\mathbf{u}+\mathbf{l},\mathbf{v}')-d(\hat{\mathbf{u}}+\hat{\mathbf{l}}, \hat{\mathbf{v}}'), 0)$$
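
A small NumPy sketch of this margin-based loss on precomputed distances may help. The distance values are invented and `hinge_loss` is a hypothetical helper; it sums over the batch as the TensorFlow cell does.

```python
import numpy as np

gamma = 1.0  # margin, as in the parameter cell

def hinge_loss(pos_dist, neg_dist, gamma):
    """Sum over the batch of max(gamma + d_pos - d_neg, 0)."""
    return float(np.sum(np.maximum(gamma + pos_dist - neg_dist, 0.0)))

# Made-up L1 distances for three positive triples and their negative samples.
pos_dist = np.array([0.2, 1.5, 0.9])
neg_dist = np.array([2.0, 1.0, 1.4])

per_pair = np.maximum(gamma + pos_dist - neg_dist, 0.0)
total = hinge_loss(pos_dist, neg_dist, gamma)
print(per_pair, total)
```

The first pair contributes nothing because its negative sample is already farther away than the positive one by more than the margin; the other two pairs are penalized.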

In [17]:
# positive distance
pos = tf.reduce_sum(tf.abs(pos_h_e + pos_r_rep - pos_t_e), 1, keep_dims = True)
# negative distance
neg = tf.reduce_sum(tf.abs(neg_h_e + neg_r_rep - neg_t_e), 1, keep_dims = True)
trans_loss = tf.reduce_sum(tf.maximum(pos - neg + gamma, 0))

### Overall loss

To preserve the translation mechanism among vertex and edge representations, as well as the reconstruction ability of edge representations, we combine the objectives into a unified NRL model, TransNet. For each $(u,v,l)$ and its negative sample $(\hat{u}, \hat{v}, \hat{l})$, TransNet jointly optimizes the following objective:

$$L = L_{trans} + \alpha[L_{ae}(l)+L_{ae}(\hat{l})]+\eta L_{reg}$$

Here $\alpha$ and $\eta$ are hyperparameters that weight the autoencoder loss and the regularizer; in the code they correspond to `alpha` and `l2_lambda`.

In [18]:
loss = trans_loss+alpha*relation_loss+l2_lambda*relation_ae_l2_loss
# warm-up optimizer
relation_optimizer = tf.train.AdamOptimizer(learning_rate).minimize(relation_pos_r_loss)
# overall loss optimizer
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)
# Initializing the variables
init = tf.global_variables_initializer()

## Prediction and Evaluation

With the learnt vertex representations and the edge autoencoder, TransNet is capable of predicting the labels on the edges. We can get the approximate edge representation through $\mathbf{l}=\mathbf{v}'-\mathbf{u}$. We then decode the edge representation $\mathbf{l}$ with the decoder part to obtain the predicted label vector $\hat{s}$. A larger weight $\hat{s}_i$ indicates that label $t_i$ is more likely to be in $l$.
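
The prediction step can be sketched in NumPy as follows. The decoder weights, the vertex vectors and the helper `predict_labels` are toy assumptions for illustration; in the notebook the decoder parameters come from the trained autoencoder.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_labels(u_head, v_tail, decoder_w, decoder_b, k):
    """Decode l = v' - u and return the scores and the indices of the k highest ones."""
    l_approx = v_tail - u_head
    s_hat = sigmoid(np.dot(l_approx, decoder_w) + decoder_b)
    top_k = np.argsort(-s_hat)[:k]   # label indices sorted by descending score
    return s_hat, top_k

# Toy setting: representation size 2 and 3 possible labels.
decoder_w = np.array([[2.0, -1.0, 0.0],
                      [0.0, 1.0, -2.0]])
decoder_b = np.zeros(3)
u_head = np.array([0.1, 0.3])
v_tail = np.array([0.9, 0.2])

s_hat, top_k = predict_labels(u_head, v_tail, decoder_w, decoder_b, k=2)
print(s_hat, top_k)
```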

We employ *hits@k* and *MeanRank* [Bordes *et al.*, 2013] as evaluation metrics. Here, *MeanRank* is the mean of the predicted ranks of all annotated labels, while *hits@k* is the proportion of correct labels ranked in the top $k$. Note that these metrics under-estimate models that rank other correct labels in the same label set high. Such labels can be filtered out before ranking, but to keep the notebook simple we do not filter them here.

Here we set $k=1,5,10$.
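
To make the two metrics concrete, here is a minimal NumPy sketch for a single edge in the unfiltered setting. The scores, the label vector and the helper names are made up for this example.

```python
import numpy as np

def ranks_of_true_labels(s_hat, s):
    """1-based rank of every annotated label when labels are sorted by descending score."""
    order = np.argsort(-s_hat)                    # label indices, best first
    rank = np.empty(len(s_hat), dtype=int)
    rank[order] = np.arange(1, len(s_hat) + 1)    # rank[i] is the rank of label i
    return rank[s > 0]

def hits_at_k(ranks, k):
    """Proportion of annotated labels whose rank is at most k."""
    return float(np.mean(ranks <= k))

s_hat = np.array([0.10, 0.80, 0.30, 0.60, 0.05])  # predicted scores for 5 labels
s = np.array([0, 1, 1, 0, 0])                     # labels 1 and 2 are annotated

ranks = ranks_of_true_labels(s_hat, s)
mean_rank = float(np.mean(ranks))
print(ranks, mean_rank, hits_at_k(ranks, 1), hits_at_k(ranks, 3))
```

Label 2 is ranked third because the unannotated label 3 scores higher, so only one of the two annotated labels counts as a hit at $k=1$.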

In [19]:
# evaluate
relation_sum = tf.reduce_sum(pos_r)
pos_r_minus = pos_t_e - pos_h_e
# pos_r_minus_dec is the predicted label vector hat s
pos_r_minus_dec = tf.nn.sigmoid(
    tf.matmul(pos_r_minus, relation_weights['decoder_w'])+relation_biases['decoder_b'])
hits = []
for k in hits_k:
    # find the indices of top k labels
    topk_indices = tf.nn.top_k(pos_r_minus_dec, k=k).indices
    # transform indices to 0,1 vectors whose size is total label size
    pred = tf.reduce_sum(tf.one_hot(topk_indices, tag_total), 1)
    # compare preds and original relation labels to get correct numbers
    hits.append(tf.reduce_sum(tf.multiply(pred, pos_r)))
    

### Launch the graph

In [20]:
sess = tf.Session()
sess.run(init)
total_batch = int(aminer.train.num_examples / batch_size)
test_total_batch = int(aminer.test.num_examples / eval_batch_size)

# initialize relation
print "Starting warm-up relation training"
for epoch in range(warm_up_epochs):
    # loop over all batches
    sum_loss = 0.0
    for i in range(total_batch):
        vecs, bs = aminer.train.next_autoencoder_batch(batch_size, beta)
        _, cur_loss = sess.run([relation_optimizer, relation_pos_r_loss],
                              feed_dict={pos_r: vecs, pos_br: bs})
        sum_loss += cur_loss
    print 'Warm-up relation epoch: ', epoch, 'sum of loss', sum_loss

for epoch in range(epochs):
    sum_loss = 0.0
    for i in range(total_batch):
        pos_h_batch, pos_t_batch, pos_r_batch, pos_b_batch,\
        neg_h_batch, neg_t_batch, neg_r_batch, neg_b_batch = aminer.train.next_batch(batch_size, beta)
        _, cur_loss = sess.run([optimizer, loss],
                               feed_dict={pos_h: pos_h_batch, pos_t: pos_t_batch,
                                         pos_r: pos_r_batch, pos_br: pos_b_batch,
                                         neg_h: neg_h_batch, neg_t: neg_t_batch,
                                         neg_r: neg_r_batch, neg_br: neg_b_batch})
        sum_loss += cur_loss
    print 'Train TransNet epoch: ', epoch, 'sum of loss', sum_loss
    if epoch % display_step == 0:
        print 'Evaluating...'
        hits_ = [0]*len(hits_k)
        all_count = 0.0
        for i in range(test_total_batch):
            pos_h_batch, pos_t_batch, pos_r_batch = aminer.test.next_test_batch(eval_batch_size)
            cur_hits, cur_sum = sess.run([hits, relation_sum],
                                        feed_dict={pos_h: pos_h_batch,
                                                  pos_t: pos_t_batch,
                                                  pos_r: pos_r_batch})
            hits_ = list(map(lambda x: x[0]+x[1], zip(hits_, cur_hits)))
            all_count +=cur_sum
        r = [hit/all_count for hit in hits_]
        for j, hit in enumerate(hits_k):
            print 'Hits@{:2d}: {}'.format(hit, r[j])


Starting warm-up relation training
Warm-up relation epoch:  0 sum of loss 5207895.86523
Warm-up relation epoch:  1 sum of loss 3475534.05664
Warm-up relation epoch:  2 sum of loss 2433169.40234
Warm-up relation epoch:  3 sum of loss 1828799.39893
Warm-up relation epoch:  4 sum of loss 1482831.60449
Warm-up relation epoch:  5 sum of loss 1267588.05176
Warm-up relation epoch:  6 sum of loss 1096929.78125
Warm-up relation epoch:  7 sum of loss 974408.318604
Warm-up relation epoch:  8 sum of loss 876401.463135
Warm-up relation epoch:  9 sum of loss 788701.8125
Warm-up relation epoch:  10 sum of loss 727057.040039
Warm-up relation epoch:  11 sum of loss 675831.755371
Warm-up relation epoch:  12 sum of loss 632049.754517
Warm-up relation epoch:  13 sum of loss 589350.636597
Warm-up relation epoch:  14 sum of loss 555669.37439
Warm-up relation epoch:  15 sum of loss 521739.079102
Warm-up relation epoch:  16 sum of loss 490935.845703
Warm-up relation epoch:  17 sum of loss 465631.240845
Warm-u