# Preparations

### Imports

In [1]:
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm, eigh
import pandas as pd

### Read Data

In [6]:
example1 = './data/example1.dat'
test = './data/test.dat'

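# Each line of the file holds one edge as "fromNode,toNode"
with open(example1) as file:
  lines = [line.rstrip().split(',') for line in file]

# Convert to integers and shift the 1-based node IDs to 0-based indices
lines = np.array(lines, dtype=int) - 1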
lines

array([[  0,   1],
       [  0,   2],
       [  0,   3],
       ...,
       [215, 234],
       [212, 234],
       [240, 234]])

# K-eigenvector Algorithm

## 1. Form Affinity Matrix A

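As far as I understand, for an unweighted graph the affinity matrix is the same thing as the adjacency matrix: `A[i][j] = 1` if there is an edge between nodes `i` and `j`, and `0` otherwise. See https://math.stackexchange.com/questions/3275579/the-similarity-matrix-of-graph-laplacian-matrix-has-different-names-whats-the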

In [7]:
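# Node indices are 0-based, so the matrix needs maxID + 1 rows and columns
maxID = lines.max()

A = np.zeros(shape=(maxID + 1, maxID + 1))

# Mark every edge listed in the file; only the listed direction is set here,
# so A is symmetric only if the file contains each edge in both directions
for edge in lines:
  fromNode = edge[0]
  toNode = edge[1]

  A[fromNode][toNode] = 1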

A

array([[0., 1., 1., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
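The same construction can be sketched without the explicit loop. This is only an illustration: `build_adjacency`, its `symmetric` flag, and the toy edge list are hypothetical names and data, and mirroring each edge assumes the graph is undirected.

```python
import numpy as np

def build_adjacency(edges, symmetric=True):
    """Build a 0/1 adjacency matrix from an array of 0-based (from, to) pairs."""
    edges = np.asarray(edges, dtype=int)
    n = edges.max() + 1                      # indices are 0-based
    A = np.zeros((n, n))
    A[edges[:, 0], edges[:, 1]] = 1          # listed direction
    if symmetric:
        # Assumption: the graph is undirected, so mirror every edge
        A[edges[:, 1], edges[:, 0]] = 1
    return A

# Toy edge list (made up for illustration)
toy_edges = [[0, 1], [0, 2], [2, 3]]
A_toy = build_adjacency(toy_edges)
```

With `symmetric=True` the result is symmetric even if the input lists each edge only once, which the later symmetry check relies on.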

## 2. Construct Laplacian Matrix L

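This builds L = D^(-1/2) A D^(-1/2), where D is the diagonal degree matrix of A. Strictly speaking, the normalized Laplacian is I - D^(-1/2) A D^(-1/2); the matrix built here is the normalized affinity matrix, which has the same eigenvectors with the eigenvalue order reversed.

I hope this is constructed correctly, but please double check.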


In [12]:
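# Degree matrix: D[i, i] is the sum of row i of A (the degree of node i)
D = np.diag(A.sum(axis=1))
print(D)
# L = D^(-1/2) A D^(-1/2); inv(D) requires every node to have degree > 0
D_inv_sqrt = sqrtm(inv(D))
L = D_inv_sqrt.dot(A).dot(D_inv_sqrt)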

def check_symmetric(a, rtol=1e-05, atol=1e-08):
    return np.allclose(a, a.T, rtol=rtol, atol=atol)

check_symmetric(L)

[[7. 0. 0. ... 0. 0. 0.]
 [0. 8. 0. ... 0. 0. 0.]
 [0. 0. 9. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 2. 0. 0.]
 [0. 0. 0. ... 0. 5. 0.]
 [0. 0. 0. ... 0. 0. 2.]]


True
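Because D is diagonal, D^(-1/2) can also be computed directly from the degrees, without `inv` or `sqrtm`. The following is a minimal sketch of that idea; `normalized_affinity` and the toy matrix are hypothetical, and raising an error for isolated nodes is just one possible way to handle a zero degree.

```python
import numpy as np

def normalized_affinity(A):
    """Return D^(-1/2) A D^(-1/2) for a square affinity matrix A."""
    degrees = A.sum(axis=1)
    if np.any(degrees == 0):
        # Assumption: isolated nodes are treated as an error here
        raise ValueError("a node with degree 0 makes D^(-1/2) undefined")
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy graph: node 0 is connected to nodes 1 and 2
A_toy = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]], dtype=float)
L_toy = normalized_affinity(A_toy)
```

Entry `L_toy[i, j]` equals `A_toy[i, j] / sqrt(d_i * d_j)`, so a symmetric `A` always yields a symmetric `L`.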

## 3. Form Matrix X of the k largest eigenvectors of L

`eigh` returns the eigenvectors as columns (each column is one eigenvector), if I understand it correctly. I have not explicitly chosen orthogonal vectors for repeated eigenvalues -> `TODO`! (`eigh` should already return an orthonormal set for a symmetric matrix, but this needs checking.)

As far as I can tell, the "k `largest` eigenvectors" are the eigenvectors that correspond to the k largest eigenvalues, which is what I have picked here.

In [41]:
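k = 5
# eigh returns eigenvalues in ascending order, so the largest ones sit at the end.
# NOTE: the index range is inclusive on both ends, so (len(L) - k - 1, len(L) - 1)
# selects k + 1 = 6 eigenvectors; use len(L) - k as the lower bound for exactly k.
# Newer SciPy versions deprecate `eigvals` in favor of `subset_by_index`.
eigenVals, eigenVecs = eigh(L, eigvals=(len(L) - k - 1, len(L) - 1))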
X = eigenVecs
X

array([[ 0.        ,  0.        ,  0.        , -0.0867576 ,  0.        ,
         0.        ],
       [ 0.        ,  0.        ,  0.        , -0.09274778,  0.        ,
         0.        ],
       [ 0.        ,  0.        ,  0.        , -0.09837388,  0.        ,
         0.        ],
       ...,
       [-0.28481427,  0.        ,  0.        ,  0.        , -0.08543577,
         0.        ],
       [ 0.04531353,  0.        ,  0.        ,  0.        , -0.13508581,
         0.        ],
       [ 0.00518985,  0.        ,  0.        ,  0.        , -0.08543577,
         0.        ]])
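To make the selection of exactly k eigenvectors concrete, here is a small sketch that uses `numpy.linalg.eigh` and slices off the last k columns. The helper `top_k_eigenvectors` and the diagonal toy matrix are made up for illustration; SciPy's `eigh` with `subset_by_index=(n - k, n - 1)` should select the same eigenpairs.

```python
import numpy as np

def top_k_eigenvectors(L, k):
    """Return the k largest eigenvalues of symmetric L and their eigenvectors."""
    # eigh returns eigenvalues in ascending order and eigenvectors as columns
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return eigenvalues[-k:], eigenvectors[:, -k:]

# Toy symmetric matrix with known eigenvalues 1, 2, 3, 5
L_toy = np.diag([3.0, 1.0, 2.0, 5.0])
vals, X_toy = top_k_eigenvectors(L_toy, k=2)
```

`X_toy` has one row per node and exactly `k` columns, and its columns are orthonormal.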

## 4. Form Matrix Y by renormalizing X

I believe the renormalization factor (the Euclidean norm of each row of X) is calculated correctly, so that every row of Y has unit length, but this should be double-checked!

In [39]:
Y = np.zeros(shape=X.shape)
for i in range(Y.shape[0]):
  # Euclidean norm of row i of X
  renormalizationFactor = np.sqrt(np.sum(np.square(X[i])))
  # Divide every entry of the row by that norm
  for j in range(Y.shape[1]):
    Y[i][j] = X[i][j] / renormalizationFactor

len(Y)

241
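The row renormalization can also be written in vectorized form, which doubles as a quick way to check the loop above. This is a sketch under assumptions: `renormalize_rows` is a hypothetical helper, and leaving all-zero rows untouched is a choice made here, not something the algorithm prescribes.

```python
import numpy as np

def renormalize_rows(X):
    """Scale every row of X to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Assumption: an all-zero row stays all-zero instead of dividing by 0
    norms[norms == 0] = 1.0
    return X / norms

# Toy input (made up for illustration)
X_toy = np.array([[3.0, 4.0],
                  [0.0, 2.0],
                  [0.0, 0.0]])
Y_toy = renormalize_rows(X_toy)
```

For the notebook's data, `np.allclose(Y, renormalize_rows(X))` would be one way to confirm that both versions agree.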

## 5. Run K-means clustering algorithm on Y

K-means returns 241 labels, one per node. `colors[i]` is the cluster (color) assigned to node `i`, if I understand it correctly. Since the node IDs were shifted by 1 when the data was read, index `i` corresponds to the original node ID `i + 1`.

In [47]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=k)
kmeans.fit(Y)

colors = kmeans.labels_

print("Clusters:", colors)

Clusters: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 4 4 4 0 0 0 4 4 4 4 0 0]


## 6. Assign original points to clusters

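As far as I can tell, original node `i` is assigned to cluster `j` if and only if row `i` of Y was assigned to cluster `j`. The labels in `colors` from step 5 therefore already are the assignment; what remains is to map the 0-based indices back to the original node IDs (index + 1).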
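A minimal sketch of this step, assuming that node `i` simply inherits the label K-means gave to row `i` of `Y`, and that the original node IDs are the 0-based indices plus 1 (as in the read-data cell). The function `nodes_by_cluster`, its `id_offset` parameter, and the toy labels are hypothetical.

```python
from collections import defaultdict

def nodes_by_cluster(labels, id_offset=1):
    """Group original node IDs by the cluster label of their row in Y."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        # Row `index` of Y belongs to node `index + id_offset` in the input file
        clusters[int(label)].append(index + id_offset)
    return dict(clusters)

# Toy labels (made up); in the notebook this would be `colors`
toy_labels = [1, 1, 0, 2, 0]
assignment = nodes_by_cluster(toy_labels)
```

Calling `nodes_by_cluster(colors)` in the notebook would give a dictionary from cluster label to the list of original node IDs in that cluster.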