# Non-Linearity Index
## Author: Tom Kerby
## Date updated: November 2021
___
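This function estimates the intrinsic dimension of a data set with the TwoNN estimator, fits a PCA with that many components (rounded down to a whole number), and returns the fraction of the total variance those components explain. A value close to 1 means a linear subspace of the intrinsic dimension captures almost all of the variance, so the data is close to linear. A value closer to 0 means the data is strongly nonlinear.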
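
In symbols, the index can be written as follows. The notation is introduced here and is not taken from skdim or scikit-learn: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_D$ are the eigenvalues of the sample covariance of the $D$-feature data $X$, and $\hat d$ is its TwoNN intrinsic-dimension estimate. The index is the share of variance held by the top $\lfloor \hat d \rfloor$ principal components, which is what summing PCA's explained-variance ratios computes:

$$
\mathrm{NLI}(X) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{D} \lambda_j}, \qquad k = \lfloor \hat d \rfloor
$$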

In [13]:
import skdim
import numpy as np
from sklearn.decomposition import PCA

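def get_nonlinearity_index(data):
    # Estimate the intrinsic dimension with the TwoNN estimator
    int_dim = skdim.id.TwoNN().fit(data)
    # Round down to whole components, kept within PCA's valid range [1, min(n_samples, n_features)]
    n_components = int(np.floor(int_dim.dimension_))
    n_components = min(max(n_components, 1), min(np.shape(data)))
    # Fraction of the total variance captured by that many principal components
    pca = PCA(n_components=n_components).fit(data)
    return float(np.sum(pca.explained_variance_ratio_))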
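
To make the two steps concrete without skdim or scikit-learn, here is a minimal NumPy-only sketch; it is an illustration, not the notebook's implementation. The intrinsic dimension comes from a simplified, closed-form maximum-likelihood version of TwoNN built on the ratio of each point's second- to first-nearest-neighbour distance. skdim's `TwoNN` is not guaranteed to compute its estimate this way, so results may differ. The explained-variance ratio comes from an SVD of the centred data rather than `sklearn.decomposition.PCA`. The names `twonn_mle`, `explained_variance_ratio` and `nonlinearity_index_sketch` are hypothetical, and the sketch assumes no duplicate points and data small enough for a full N x N distance matrix.

```python
import numpy as np


def twonn_mle(X):
    """Simplified TwoNN intrinsic-dimension estimate (assumed closed-form MLE variant).

    Assumes no duplicate points (a zero nearest-neighbour distance breaks the ratio)
    and builds the full N x N distance matrix, so memory grows as O(N^2 * D).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude each point's zero distance to itself
    r = np.sort(dist, axis=1)[:, :2]           # r[:, 0] = 1st-NN distance, r[:, 1] = 2nd-NN distance
    mu = r[:, 1] / r[:, 0]                     # TwoNN ratio: Pareto with shape d under the model
    return len(mu) / np.sum(np.log(mu))        # maximum-likelihood estimate of d


def explained_variance_ratio(X, k):
    """Share of the total variance captured by the top-k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # PCA works on centred data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    var = s ** 2                               # proportional to the covariance eigenvalues
    return float(var[:k].sum() / var.sum())


def nonlinearity_index_sketch(X):
    """Hypothetical NumPy-only counterpart of get_nonlinearity_index."""
    d_hat = twonn_mle(X)
    # Round down, but never request fewer than 1 or more than min(n_samples, n_features) components
    k = min(max(int(np.floor(d_hat)), 1), min(np.shape(X)))
    return explained_variance_ratio(X, k)


# Usage: a straight line and a circle are both 1-dimensional, but only the line is linear
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 1000)
line = np.outer(t, [1.0, 2.0, -1.0])           # 1000 points on a straight line in 3-D
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
circle = np.column_stack([np.cos(theta), np.sin(theta)])  # 1000 points on the unit circle
print(nonlinearity_index_sketch(line), nonlinearity_index_sketch(circle))
```

The line should score essentially 1, because one principal component holds all of its variance. The circle is curved, so its single leading component captures only about half of the variance and its index should come out near 0.5. Clamping k to at least 1 keeps an estimate just below 1 from requesting zero components.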

In [14]:
from keras.datasets import mnist

(train_X, train_y), (test_X, test_y) = mnist.load_data()
# Flatten each 28x28 test image into a 784-dimensional vector
test_X = test_X.reshape(10000,784)
get_nonlinearity_index(test_X)

0.5566058383867205

In [15]:
import tensorflow as tf

(train_X, train_y), (test_X, test_y) = tf.keras.datasets.fashion_mnist.load_data()
# Flatten each 28x28 test image into a 784-dimensional vector
test_X = test_X.reshape(10000,784)
get_nonlinearity_index(test_X)

0.7588318658094958

In [16]:
# 10,000 points sampled from a 1000-dimensional unit ball
data = skdim.datasets.hyperBall(n=10000, d=1000, radius=1, random_state=0)
get_nonlinearity_index(data)

0.3108437654271855

In [17]:
# skdim benchmark: 4,000 Swiss-roll points combined with 2,000 points on a 3-sphere
data = skdim.datasets.swissRoll3Sph(n_swiss=4000, n_sphere=2000, h=2, random_state=0)
get_nonlinearity_index(data)

0.7592734981253877