In [1]:
import numpy as np
import pandas as pd
import theano
import theanets # autoencoders
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from scipy import misc
import math

In [2]:
def crop(img):
    # Trim the chart margins: 200 px off the top, 131 px off the bottom, 100 px off each side
    top, bottom, left, right = 200, -131, 100, -100
    return img[top:bottom, left:right]
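Negative slice bounds count back from the far edge, so `crop` returns an image of (H - 331) rows by (W - 200) columns. Working backwards from the `(649, 780)` shape the notebook prints later, the source charts would be 980 x 980 pixels. That size is an inference from the arithmetic, not something the notebook states. A quick check on a blank stand-in image:

```python
import numpy as np

def crop(img):
    # Same offsets as the notebook's crop(): 200 px off the top, 131 off the bottom,
    # 100 off each side
    top, bottom, left, right = 200, -131, 100, -100
    return img[top:bottom, left:right]

# Hypothetical 980 x 980 RGB canvas, the size implied by the (649, 780) result
dummy = np.zeros((980, 980, 3), dtype=np.uint8)
print(crop(dummy).shape)  # (649, 780, 3)
```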

In [3]:
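def distance(c1, c2):
    # Euclidean distance between two RGB triples; cast to int so uint8 pixel values can't wrap around
    (r1, g1, b1) = (int(v) for v in c1)
    (r2, g2, b2) = (int(v) for v in c2)
    return math.sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2)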
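Images read from 8-bit PNGs typically come back as `uint8` arrays, and unsigned 8-bit arithmetic wraps modulo 256 instead of going negative. Depending on the NumPy version, mixing those values with Python ints in a color distance can silently produce wrong distances. The sketch below shows the wrap-around on two hypothetical pixels and a distance helper (`rgb_distance`, a hypothetical name) that widens to float first:

```python
import numpy as np

# Two hypothetical pixels stored the way image readers usually return them: unsigned 8-bit
a = np.array([10, 200, 0], dtype=np.uint8)
b = np.array([20, 100, 255], dtype=np.uint8)

wrapped = a - b                       # uint8 arithmetic wraps modulo 256: 246, 100, 1
safe = a.astype(int) - b.astype(int)  # widen first: -10, 100, -255

def rgb_distance(c1, c2):
    """Euclidean RGB distance computed in float, so uint8 inputs cannot overflow."""
    d = np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)
    return float(np.sqrt(d @ d))

print(wrapped, safe, rgb_distance(a, b))
```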

In [4]:
hit_map = {(255, 255, 255): 0,  # white: background, no hit
           (0, 0, 0): 0,        # black: home run, not defensible
           (255, 0, 0): 1,      # red: line drive ("laser")
           (0, 255, 0): 2,      # green: ground ball
           (0, 0, 255): 3,      # blue: fly ball
           (160, 32, 240): 4}   # purple: blooper

colors = hit_map.keys()

def norm_color(rgb):
    """Map a pixel to its hit-type label, snapping off-palette pixels (e.g. anti-aliased edges) to the nearest palette color."""
    rgb_key = tuple(rgb)
    if rgb_key in colors:
        return hit_map[rgb_key]
    nearest = min(colors, key=lambda color: distance(color, rgb))
    return hit_map[nearest]
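`norm_color` is called once per pixel from Python, roughly half a million calls for a 649 x 780 chart. The same nearest-palette lookup can be sketched for a whole `(H, W, 3)` image at once with NumPy broadcasting. `quantize` is a hypothetical helper; it compares squared distances, since taking the square root does not change which color is nearest, and `argmin` breaks ties by palette order just as the first match from `hit_map` would. The intermediate `(H, W, 6, 3)` array costs about 70 MB for a full chart, which is usually acceptable:

```python
import numpy as np

# Palette and labels copied from hit_map, in the same order
palette = np.array([(255, 255, 255), (0, 0, 0), (255, 0, 0),
                    (0, 255, 0), (0, 0, 255), (160, 32, 240)], dtype=float)
labels = np.array([0, 0, 1, 2, 3, 4])

def quantize(img):
    """Map every pixel of an (H, W, 3) image to the label of its nearest palette color."""
    pixels = np.asarray(img, dtype=float)          # widen to avoid uint8 wrap-around
    diff = pixels[:, :, None, :] - palette         # (H, W, 1, 3) - (6, 3) -> (H, W, 6, 3)
    dist2 = (diff ** 2).sum(axis=-1)               # squared Euclidean distance to each color
    return labels[np.argmin(dist2, axis=-1)]       # (H, W) array of hit labels

img = np.array([[[255, 0, 0], [250, 5, 5]],
                [[158, 30, 238], [255, 255, 255]]], dtype=np.uint8)
print(quantize(img))
```

Applied to the cropped chart, `quantize(crop(...))` would produce the same integer array that the list comprehension plus `np.array` builds below.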

In [5]:
mb = crop(misc.imread('charts/pros/Mookie_Betts.png', mode='RGB'))

In [6]:
mb = [[norm_color(rgb) for rgb in row] for row in mb]

In [7]:
mb = np.array(mb)

In [8]:
mb.shape

(649, 780)

## One Hidden Layer

With only one hidden layer and linear activations, an autoencoder trained on squared error learns the same subspace as Principal Component Analysis (PCA): its hidden units span the top principal components. An autoencoder behaves quite differently once non-linearity is involved, and this case study is very much non-linear (the hidden layer below uses ReLU units). A non-linear autoencoder can therefore pick up latent factors that PCA, being a linear method, cannot represent.
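The linear baseline in that comparison is PCA, which can be sketched in NumPy via the SVD. Projecting onto the top-k right singular vectors plays the role of a linear encoder, and multiplying by the transpose is the matching decoder. `pca_reconstruct` is a hypothetical helper and the data is random stand-in data; with `mb` and `k=78` it would give the linear counterpart of the 78-unit hidden layer below:

```python
import numpy as np

def pca_reconstruct(X, k):
    """Project X onto its top-k principal components and map back (rank-k reconstruction)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                 # (n_features, k): linear "encoder"
    codes = Xc @ W               # (n_samples, k): k-dimensional representation
    return codes @ W.T + mean    # linear "decoder"

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
errors = [np.mean((X - pca_reconstruct(X, k)) ** 2) for k in (5, 10, 30)]
print(errors)  # error shrinks as k grows; at k = 30 (full rank) it is ~0
```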

In [11]:
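# 780 inputs (one per pixel column of the cropped chart) -> 78 ReLU hidden units -> 780 outputs;
# each of the 649 pixel rows is one training example
model = theanets.Autoencoder([780, (78, 'relu'), 780])
# input_noise corrupts the inputs during training, making this a denoising autoencoder (dAE);
# hidden_l1 penalizes hidden-unit activations to encourage a sparse code
dAE_model = model.train([mb], algo='rmsprop', input_noise=0.1, hidden_l1=.001, sparsity=0.9, num_updates=1000)
mb_dAE = model.encode(mb)  # hidden-layer representation of every row
mb_dAE = np.asarray(mb_dAE, 'float32')
mb_dAE.shape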
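The training log shows theanets handing optimization to the `downhill` package's RMSProp. To make the mechanics of a one-hidden-layer denoising autoencoder concrete, here is a plain-NumPy sketch. It is not the theanets implementation: it swaps RMSProp for full-batch gradient descent, assumes `input_noise` means additive Gaussian noise with that standard deviation, leaves out the `hidden_l1` and `sparsity` settings, and trains on random stand-in data. `train_denoising_ae` is a hypothetical name:

```python
import numpy as np

def train_denoising_ae(X, n_hidden, noise=0.1, lr=0.01, epochs=300, seed=0):
    """Toy denoising autoencoder: ReLU hidden layer, linear output, squared-error loss,
    trained with plain full-batch gradient descent (not RMSProp)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Xn = X + rng.normal(scale=noise, size=X.shape)  # corrupt the input with Gaussian noise...
        Z = Xn @ W1 + b1
        H = np.maximum(Z, 0)                             # ReLU hidden code
        Y = H @ W2 + b2                                  # linear reconstruction
        err = Y - X                                      # ...but score against the clean input
        losses.append(np.sum(err ** 2) / n)              # squared error per example
        # Backpropagate the loss through both layers before updating anything
        dY = 2 * err / n
        dW2, db2 = H.T @ dY, dY.sum(axis=0)
        dZ = (dY @ W2.T) * (Z > 0)
        dW1, db1 = Xn.T @ dZ, dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(A):
        """Hidden-layer representation, analogous to model.encode()."""
        return np.maximum(A @ W1 + b1, 0)

    return encode, losses

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))   # stand-in for mb: 100 rows x 20 columns
encode, losses = train_denoising_ae(X, n_hidden=10)
print(losses[0], losses[-1], encode(X).shape)
```

The key design point of a denoising autoencoder is visible in the loop: the network sees the noisy `Xn` but is scored against the clean `X`, so it cannot simply copy its input.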

valid: 21 of 21 mini-batches from (649, 780)
train: 21 of 21 mini-batches from (649, 780)
downhill: compiling evaluation function
downhill: compiling RMSProp optimizer
downhill: setting: rms_halflife = 14
downhill: setting: rms_regularizer = 1e-08
downhill: setting: patience = 5
downhill: setting: validate_every = 10
downhill: setting: min_improvement = 0
downhill: setting: max_gradient_norm = 0
downhill: setting: max_gradient_elem = 0
downhill: setting: learning_rate = TensorConstant{0.0001}
downhill: setting: momentum = 0
downhill: setting: nesterov = False
downhill: validation 0 loss=2.059793 err=2.059152 *
downhill: RMSProp 1 loss=1.859053 err=1.858411
downhill: RMSProp 2 loss=1.530177 err=1.529347
downhill: RMSProp 3 loss=1.195426 err=1.194333
downhill: RMSProp 4 loss=0.972947 err=0.971652
downhill: RMSProp 5 loss=0.870457 err=0.869055
downhill: RMSProp 6 loss=0.823238 err=0.821782
downhill: RMSProp 7 loss=0.795459 err=0.793993
downhill: RMSProp 8 loss=0.778789 err=0.777316
downhi

(649, 78)

In [27]:
deep_features = model.find(1, 'w')  # encoder weight matrix of the hidden layer (layer 1), a Theano shared variable

In [30]:
print(deep_features.get_value())

[[ -1.62856655e-02   1.55826058e-02  -3.10872431e-03 ...,   3.99920489e-03
   -1.08326664e-02   6.62070141e-03]
 [ -4.09289000e-03  -7.36113827e-03   8.72672402e-03 ...,  -5.29473354e-03
   -5.70536946e-03   3.59003003e-02]
 [ -2.38483094e-02  -1.13283730e-02  -1.80755110e-03 ...,  -1.34162216e-03
   -1.23714148e-02  -2.40278328e-02]
 ..., 
 [  1.66781311e-02  -1.28154663e-02  -5.38442855e-02 ...,  -5.99715937e-03
   -1.96369397e-02   1.77364597e-02]
 [  1.25076353e-02   1.24950525e-02   2.78020631e-03 ...,  -1.14334099e-02
    4.08178309e-03  -1.21067084e-03]
 [ -2.24594425e-03  -1.62810792e-03  -1.68194820e-02 ...,   7.49590166e-05
   -9.45500708e-03   1.44203926e-02]]
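This weight matrix is not the encoded data itself (that is `mb_dAE`); it describes what each hidden unit responds to. One way to start reading it is to list, for a given hidden unit, the inputs with the largest absolute weights. Because each input here is one pixel column of the cropped chart, those indices map back to horizontal positions on the chart. The sketch assumes the matrix is laid out as (inputs, hidden units), i.e. (780, 78), and uses a random stand-in with three planted strong weights; `top_inputs` is a hypothetical helper:

```python
import numpy as np

def top_inputs(W, unit, n=5):
    """Indices of the n input columns with the largest |weight| into one hidden unit.

    W is assumed to be laid out (n_inputs, n_hidden), e.g. (780, 78) here.
    """
    order = np.argsort(np.abs(W[:, unit]))[::-1]  # largest magnitude first
    return order[:n]

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(780, 78))   # random stand-in for deep_features.get_value()
W[[120, 400, 650], 3] = [0.5, -0.4, 0.3]     # plant three strong connections into unit 3
print(top_inputs(W, unit=3, n=3))            # [120 400 650]
```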


## Deeper Jeeper Creepers

Adding more hidden layers is where the "black box" mystique of deep learners comes into play. How they arrive at their deep features, or what those features even mean, is at best a deep, dark mystery. With some devoted research we can make heads or tails of it. We could even build a more sophisticated deep learner to figure it out, but then again, we wouldn't know how it reached its conclusions. This is the "devil in the details" dilemma: they are always one step ahead of us.
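Mechanically, "more hidden layers" means composing more non-linear maps: each layer's output becomes the next layer's input, which is why the final features are hard to trace back to individual pixels. A forward-pass sketch of a stacked ReLU encoder, with untrained random weights; the 256-unit intermediate width and the `deep_encode` name are hypothetical choices for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def deep_encode(X, layers):
    """Pass X through a stack of (W, b) ReLU layers and return the final code."""
    H = X
    for W, b in layers:
        H = relu(H @ W + b)   # each layer re-encodes the previous layer's code
    return H

rng = np.random.default_rng(0)
sizes = [780, 256, 78]                       # hypothetical encoder widths
layers = [(rng.normal(scale=0.05, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
X = rng.normal(size=(10, 780))               # stand-in for 10 rows of mb
print(deep_encode(X, layers).shape)          # (10, 78)
```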