## Building the concordance matrix

Suppose we have 200 companies and have constructed a market vector and a technology vector for each of them. To simulate this case, we use NumPy to randomly generate the market and technology vectors for these companies.

**Note:** M and T are lists containing 200 items each, and each item is a 300-dimensional vector. Company_ids holds a unique identifier for each company (for simplicity, we generate the integers 0 to 199 with the **range** function).

In [16]:
import numpy as np

M = np.random.random((200, 300))
T = np.random.random((200, 300))

Company_ids = range(200)

M = list(M)
T = list(T)

M_dict = dict(zip(Company_ids, M))
T_dict = dict(zip(Company_ids, T))

print('The length of M is :', len(M))
print('The length of T is :', len(T))

The length of M is : 200
The length of T is : 200


In [17]:
print('The market vector for the 1st company: ', M_dict[Company_ids[0]])
print('The technology vector for the 1st company: ', T_dict[Company_ids[0]])

The market vector for the 1st company:  [4.14119885e-01 8.14707724e-01 6.55967345e-01 3.77237501e-01
 9.79273426e-01 9.63288582e-01 7.66990194e-01 6.74640872e-01
 7.80924022e-01 9.73029324e-01 5.88517334e-01 3.91016730e-02
 3.01780178e-01 5.84168299e-01 4.17947776e-01 3.19350547e-01
 3.60474206e-01 1.85149974e-01 3.21713689e-01 1.07509597e-01
 1.88005271e-01 6.14792377e-01 9.90431454e-01 7.16647647e-01
 2.33173114e-01 9.03261748e-01 3.70825422e-01 2.20217737e-01
 8.50122151e-01 4.57710383e-01 2.49403374e-01 2.13756121e-01
 7.43894673e-01 2.85588032e-01 9.56516845e-01 2.21813243e-02
 6.66949389e-03 8.33781832e-01 6.13170218e-01 8.94071388e-01
 1.69366744e-01 7.31489159e-02 2.75569870e-01 7.78758713e-01
 9.87762735e-01 8.90797455e-01 8.59343375e-01 5.19138177e-01
 5.72995073e-01 7.52481540e-01 2.51458464e-01 7.25372261e-01
 5.87796963e-01 1.01631766e-01 8.88869950e-02 3.19329991e-01
 5.61488068e-02 7.25663629e-01 6.40863417e-01 3.37013836e-01
 7.96715719e-02 2.50792251e-01 7.47884557e-01

Next, we use PyTorch, a widely used deep learning package, to estimate the concordance matrix. Patent2Product trains the concordance matrix $W$, which converts **technology vectors** to **market vectors**, such that $m = Wt$.
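
The relation $m = Wt$ cannot hold exactly for every company, so training looks for the $W$ that makes it hold as closely as possible. As a sketch of the objective, assuming $n = 200$ companies, $d = 300$ dimensions, and the default mean reduction of `nn.MSELoss` (the symbols $n$, $d$, $i$, and $\hat{W}$ are notation introduced here for illustration):

$$\hat{W} = \arg\min_{W \in \mathbb{R}^{d \times d}} \; \frac{1}{n} \sum_{i=1}^{n} \frac{1}{d} \lVert W t_i - m_i \rVert_2^2$$

Here $t_i$ and $m_i$ are the technology and market vectors of company $i$. The inner term is the per-company mean squared error, and the outer average corresponds to the per-epoch loss printed during training.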

**Patent2Product**

In [36]:
import torch
import torch.nn as nn
import torch.optim as optim

class Patent2Product(nn.Module):
    def __init__(self, hidden_size=300):
        super(Patent2Product, self).__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)
        
    def forward(self, patent_emb):
        return self.W(patent_emb)


model = Patent2Product()
optimizer = optim.Adam(model.parameters(), lr=2e-6, eps=1e-8)
loss = nn.MSELoss()


for epoch in range(90):
    print('##########' + str(epoch) + '##########')
    loss_values = 0
    n = len(Company_ids)
    for idx in Company_ids:
    
        patent_rep = T_dict[idx] #rep -> representation
        product_rep = M_dict[idx]
    
        patent_rep = torch.tensor(patent_rep)
        product_rep = torch.tensor(product_rep)
    
        pred_product_rep = model(patent_rep.float())
    
        output = loss(pred_product_rep, product_rep.float())
    
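        # Reset gradients first; PyTorch accumulates them across backward() calls,
        # which likely explains the loss rising after epoch 8 in the output logged below.
        optimizer.zero_grad()
        output.backward()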
        optimizer.step()
        loss_values += output.item()
    print('loss": ', loss_values/n)
    

##########0##########
loss":  0.40203569412231444
##########1##########
loss":  0.32211015120148656
##########2##########
loss":  0.2582720617949963
##########3##########
loss":  0.210417812243104
##########4##########
loss":  0.17619522608816623
##########5##########
loss":  0.15317783549427985
##########6##########
loss":  0.13926016755402087
##########7##########
loss":  0.1326439692080021
##########8##########
loss":  0.13163605596870184
##########9##########
loss":  0.13470597714185714
##########10##########
loss":  0.14039706587791442
##########11##########
loss":  0.14728813104331492
##########12##########
loss":  0.1542786018550396
##########13##########
loss":  0.16074289724230767
##########14##########
loss":  0.1664618057012558
##########15##########
loss":  0.17146515615284444
##########16##########
loss":  0.17579629838466646
##########17##########
loss":  0.17933218620717525
##########18##########
loss":  0.18177206993103026
##########19##########
loss":  0.18272810913622

**Product2Patent**

Product2Patent trains the reverse mapping, which converts **market vectors** to **technology vectors**, such that $t = Wm$.

In [38]:
import torch
import torch.nn as nn
import torch.optim as optim

class Product2Patent(nn.Module):
    def __init__(self, hidden_size=300):
        super(Product2Patent, self).__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)
        
    def forward(self, product_emb):
        return self.W(product_emb)


model = Product2Patent()
optimizer = optim.Adam(model.parameters(), lr=2e-6, eps=1e-8)
loss = nn.MSELoss()


for epoch in range(90):
    print('##########' + str(epoch) + '##########')
    loss_values = 0
    n = len(Company_ids)
    for idx in Company_ids:
    
        patent_rep = T_dict[idx] #rep -> representation
        product_rep = M_dict[idx]
    
        patent_rep = torch.tensor(patent_rep)
        product_rep = torch.tensor(product_rep)
    
        pred_patent_rep = model(product_rep.float())
    
        output = loss(pred_patent_rep, patent_rep.float())
    
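        # Reset gradients first; PyTorch accumulates them across backward() calls,
        # which likely explains the loss rising after epoch 8 in the output logged below.
        optimizer.zero_grad()
        output.backward()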
        optimizer.step()
        loss_values += output.item()
    print('loss": ', loss_values/n)

##########0##########
loss":  0.4069836397469044
##########1##########
loss":  0.3267474329471588
##########2##########
loss":  0.2627543643862009
##########3##########
loss":  0.2149646704643965
##########4##########
loss":  0.18085583582520484
##########5##########
loss":  0.15809414960443974
##########6##########
loss":  0.14453076124191283
##########7##########
loss":  0.13786610692739487
##########8##########
loss":  0.1361248792335391
##########9##########
loss":  0.13788298938423396
##########10##########
loss":  0.14194734521210195
##########11##########
loss":  0.14733989078551532
##########12##########
loss":  0.15335064630955458
##########13##########
loss":  0.15937602568417789
##########14##########
loss":  0.16488921631127595
##########15##########
loss":  0.16955834299325942
##########16##########
loss":  0.17323443289846183
##########17##########
loss":  0.17586947567760944
##########18##########
loss":  0.17746883533895016
##########19##########
loss":  0.1781368339061
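
The two training loops above share one pattern, which can be sketched more compactly as a single batched loop. The version below is a minimal sketch rather than the code used to produce the logs: it uses random stand-in tensors, processes all 200 companies in each step, and calls `optimizer.zero_grad()` before each backward pass so that gradients do not accumulate across steps. The names `T_all`, `M_all`, `linear`, and `criterion`, the seed, the learning rate of `1e-4`, and the 100 epochs are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumed seed, only for repeatability of the sketch

# Stand-in data with the shapes used in this section: 200 companies, 300 dimensions.
T_all = torch.rand(200, 300)  # technology vectors, one row per company
M_all = torch.rand(200, 300)  # market vectors, one row per company

# A bias-free linear layer plays the role of the concordance matrix W.
linear = nn.Linear(300, 300, bias=False)
optimizer = optim.Adam(linear.parameters(), lr=1e-4)  # assumed learning rate
criterion = nn.MSELoss()

losses = []
for epoch in range(100):
    optimizer.zero_grad()                 # clear gradients from the previous step
    pred = linear(T_all)                  # predict all market vectors at once
    loss_value = criterion(pred, M_all)   # mean squared error over all entries
    loss_value.backward()                 # compute gradients for this step only
    optimizer.step()                      # update W
    losses.append(loss_value.item())

print('first loss:', losses[0])
print('last loss: ', losses[-1])
```

Swapping `T_all` and `M_all` gives the Product2Patent direction. Because every step sees the full data set and starts from cleared gradients, the recorded loss would be expected to fall over the run rather than turn upward.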

To inspect the trained matrix, convert it to a NumPy array:

**Note: `model` refers to the Product2Patent model trained in the previous cell, so $W$ maps market vectors to technology vectors.**

In [44]:
W = model.W.weight.detach().numpy()
W

array([[-0.00984566, -0.01518547,  0.02941971, ...,  0.01251339,
         0.01823718, -0.00455915],
       [ 0.04339917, -0.03286355,  0.02020139, ..., -0.02475692,
        -0.01887817, -0.03694284],
       [ 0.0175809 , -0.01054933, -0.00878088, ..., -0.00897275,
        -0.01198722,  0.03364225],
       ...,
       [ 0.02596598,  0.06618389,  0.01389562, ...,  0.01874409,
         0.03300765,  0.01045165],
       [ 0.02547042,  0.05104292,  0.04433365, ..., -0.05745807,
        -0.018433  ,  0.02381825],
       [-0.01368219,  0.01941934,  0.02370436, ...,  0.06494612,
         0.00156734,  0.01528808]], dtype=float32)
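
Once the matrix is available as a NumPy array, it can be applied without PyTorch. `nn.Linear` stores its weight with shape (output size, input size), so a single vector is mapped with `W @ x` and a whole stack of row vectors with `X @ W.T`. The sketch below illustrates this with random stand-ins; `W_demo` and `X_demo` are hypothetical names and values, not the trained matrix or the real vectors.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed for repeatability

# Stand-ins: a 300 x 300 matrix in place of the trained W, and 200 input vectors.
W_demo = rng.normal(scale=0.03, size=(300, 300)).astype(np.float32)
X_demo = rng.random((200, 300)).astype(np.float32)

# One vector at a time: y = W x.
y_single = W_demo @ X_demo[0]

# All vectors at once: row i of the result equals W_demo @ X_demo[i].
Y_batch = X_demo @ W_demo.T

print(y_single.shape)  # (300,)
print(Y_batch.shape)   # (200, 300)
```

The batched form avoids a Python loop over companies and gives the same result row by row, up to floating-point rounding.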

If you are familiar with PyTorch, you can also use the trained model directly, for example:

In [50]:
m = M_dict[0]
t = T_dict[0]

m = torch.tensor(m)
t_est = model(m.float())

print('The estimated technology vector: ', t_est)
print('The true technology vector: ', t)

The estimated technology vector:  tensor([ 0.6946,  0.1388,  0.5107,  0.3835,  0.4541,  0.4891,  0.5140,  0.6047,
         0.3468,  0.3065,  0.6402,  0.3535,  0.5290,  0.6260,  0.2615,  0.3217,
         0.3615,  0.4065,  0.6230,  0.7885,  0.6999,  0.3399,  0.3948,  0.5525,
         0.7400,  0.6989,  0.5550,  0.5431,  0.4952,  0.1844,  0.2025,  0.5989,
         0.7192,  0.5874,  0.3918,  0.3840,  0.5281,  0.8388,  0.5468,  0.6795,
         0.3342,  0.4795,  0.4885,  0.3230,  0.8537,  0.6447,  0.4012,  0.4875,
         0.6578,  0.6398,  0.2538,  0.4650,  0.4954,  0.6666,  0.5745,  0.6280,
         0.6229,  0.5106,  0.6595,  0.5387,  0.6717,  0.2463,  0.3253,  0.3853,
         0.3710,  0.3988,  0.5328,  0.5303,  0.3543,  0.4153,  0.6800,  0.4707,
         0.5667,  0.7581,  0.5241,  0.5138,  0.2660,  0.6601,  0.1587,  0.6814,
         0.7396,  0.3213,  0.3678,  0.7020,  0.5474,  0.3194,  0.6759,  0.7029,
         0.4347,  0.3614,  0.4995,  0.6427,  0.3803,  0.5238,  0.5263,  0.3667,
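
Printing two 300-dimensional vectors side by side makes them hard to compare by eye. One possible way to summarize the agreement is a pair of scalar scores, sketched below with NumPy. The helper names `mse` and `cosine_similarity` and the synthetic `t_true` and `t_pred` vectors are assumptions for illustration; in the notebook, the inputs would be `t` and `t_est.detach().numpy()`.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two vectors of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic example: a "true" vector and a noisy estimate of it.
rng = np.random.default_rng(0)
t_true = rng.random(300)
t_pred = t_true + rng.normal(scale=0.1, size=300)

print('MSE:', mse(t_pred, t_true))
print('cosine similarity:', cosine_similarity(t_pred, t_true))
```

A lower MSE and a cosine similarity closer to 1 both indicate a closer match. The two can disagree when the estimate has the right direction but the wrong scale.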
      

**Caveat: we did not split the data into training and test sets in this toy example because the vectors are randomly generated. In real cases, the data should be split into training and test sets.**
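
A held-out split along the lines of the caveat above could look like the following sketch. It is not part of the original workflow: the 80/20 ratio, the seed, and the names `T_demo`, `M_demo`, `W_fit`, and `mse` are assumptions, and the matrix is fitted with NumPy's closed-form least squares (`np.linalg.lstsq`) instead of the PyTorch loop, purely to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed for repeatability

# Stand-in data: 200 companies, 300-dimensional technology and market vectors.
T_demo = rng.random((200, 300))
M_demo = rng.random((200, 300))

# Shuffle company indices and hold out 20% for testing (assumed ratio).
perm = rng.permutation(200)
train_idx, test_idx = perm[:160], perm[160:]

# Fit on the training companies only.
# lstsq solves T_train @ X ~= M_train, and X is the transpose of W in m = W t.
X, *_ = np.linalg.lstsq(T_demo[train_idx], M_demo[train_idx], rcond=None)
W_fit = X.T

def mse(W, T_rows, M_rows):
    """Mean squared error of the predictions W t against the market vectors."""
    return float(np.mean((T_rows @ W.T - M_rows) ** 2))

train_mse = mse(W_fit, T_demo[train_idx], M_demo[train_idx])
test_mse = mse(W_fit, T_demo[test_idx], M_demo[test_idx])

print('train MSE:', train_mse)
print('test MSE: ', test_mse)
```

With fewer training companies than dimensions, a least-squares fit can reproduce the training vectors almost exactly, so the training error alone says little; the error on the held-out companies is the more informative number.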