# Lex Fridman: Karpathy Lecture

![](https://cs.stanford.edu/people/karpathy/me2.jpg)

Andrej Karpathy

Example NN (AlexNet, 2012):

```
input: 227x227x3

Conv layer: 96 11x11 w/stride of 4
output: (227-11)//4+1 = 55 => 55x55x96
parameters to train: (11*11*3)*96 = 34,848 ≈ 35k (weights only; biases add 96 more)

Pool layer: 3x3 w/stride of 2
output: (55-3)//2+1 = 27 => 27x27x96
parameters to train: 0

...
```
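Both calculations above follow general rules. A conv or pooling layer produces `(W - F + 2P) // S + 1` outputs along each spatial dimension for input width `W`, filter size `F`, padding `P` and stride `S`, and a conv layer has `F*F*C_in` weights per filter. The following minimal Python sketch reproduces the AlexNet numbers. It assumes zero padding, as in the notes, and counts weights only unless `bias=True`. The helper names `conv_output_size` and `conv_params` are our own.

```python
def conv_output_size(in_size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension:
    (W - F + 2P) // S + 1."""
    return (in_size - kernel + 2 * padding) // stride + 1


def conv_params(kernel, in_channels, num_filters, bias=False):
    """Trainable parameters of a conv layer: each filter is a
    kernel x kernel x in_channels weight volume, plus one optional bias."""
    weights = kernel * kernel * in_channels * num_filters
    return weights + (num_filters if bias else 0)


# AlexNet CONV1: 96 filters of 11x11, stride 4, on a 227x227x3 input
conv1_size = conv_output_size(227, kernel=11, stride=4)
conv1_params = conv_params(11, 3, 96)

# POOL1: 3x3 max pooling, stride 2; pooling has no trainable parameters
pool1_size = conv_output_size(conv1_size, kernel=3, stride=2)

print(f"CONV1 -> {conv1_size}x{conv1_size}x96, {conv1_params} weights")
print(f"POOL1 -> {pool1_size}x{pool1_size}x96, 0 parameters")
```

This prints `55x55x96` with 34848 weights (the "35k" above) and `27x27x96` for the pool. Passing `bias=True` adds one bias per filter, for 34,944 parameters.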

Four things that held deep learning back and keep getting better:

1. Increasing amount of data available
1. Compute: GPU speedup over CPU
1. Algorithms
    - Deeper: adding more layers
    - Fancy regularization (dropout)
    - Fancy nonlinearity (ReLU)
1. Infrastructure: CUDA supporting efficient matrix ops
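Both "fancy" algorithmic pieces are small in code. Below is a minimal NumPy sketch of each. It uses the *inverted* dropout variant common in modern frameworks, where survivors are scaled by `1/(1 - p)` during training so that nothing changes at test time. The original AlexNet instead multiplied outputs by 0.5 at test time. The function names are illustrative.

```python
import numpy as np


def relu(x):
    """ReLU nonlinearity: elementwise max(0, x)."""
    return np.maximum(x, 0.0)


def dropout(x, p_drop, rng, training=True):
    """Inverted dropout. In training, zero each unit independently with
    probability p_drop and rescale the survivors by 1 / (1 - p_drop), so the
    expected activation matches test time, where x passes through unchanged."""
    if not training or p_drop == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= p_drop  # True with probability 1 - p_drop
    return x * keep_mask / (1.0 - p_drop)


rng = np.random.default_rng(0)
hidden = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))  # negatives become 0
# Training: each unit is zeroed with probability 0.5, survivors are doubled
train_out = dropout(hidden, p_drop=0.5, rng=rng)
# Test time: activations pass through unchanged
test_out = dropout(hidden, p_drop=0.5, rng=rng, training=False)
```

Because of the rescaling, the average training-time activation matches the test-time one, so inference code needs no special case.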

Architectures

- AlexNet 2012
    - 8 layers
- ZFNet 2013
- GoogLeNet 2014
    - 5M parameters
    - low memory footprint
- VGGNet 2014 (runner-up to GoogLeNet in ILSVRC 2014 classification)
    - simple and uniform
    - 138M parameters
- ResNet 2015
    - ultra-deep, 152 layers
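    - residual (skip) connections
    - performance keeps improving as more layers are added, unlike plain deep nets
    - the addition in each skip connection distributes the gradient equally to both branches during backpropagation
    - reduce or remove pooling layers, since pooling discards information the network could otherwise learn from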
- 2016
    - shallower but wider networks (more channels) can also work well; depth is *not* the only lever
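The claim that "additions distribute gradients" can be made concrete. In a residual block `y = x + F(x)`, the backward pass through the addition copies the upstream gradient to both inputs, so `x` always receives it directly through the identity path, however small F's gradient is. The toy NumPy sketch below assumes a fully connected `F(x) = W2 @ relu(W1 @ x)` instead of the convolutions and batch norm of a real ResNet block, and it omits the ReLU that ResNet applies after the addition. The function names are hypothetical.

```python
import numpy as np


def residual_forward(x, W1, W2):
    """Toy residual block y = x + F(x), with F(x) = W2 @ relu(W1 @ x).
    Returns y and a cache of intermediates for the backward pass."""
    h_pre = W1 @ x
    h = np.maximum(h_pre, 0.0)  # ReLU
    f = W2 @ h                  # residual branch F(x)
    y = x + f                   # skip connection: elementwise addition
    return y, (h_pre,)


def residual_backward(dy, cache, W1, W2):
    """Gradient of a scalar loss w.r.t. x, given dy = dLoss/dy.
    The addition node copies dy unchanged to both of its inputs."""
    (h_pre,) = cache
    dx_skip = dy                # identity path: gradient passes straight through
    dh = W2.T @ dy              # back through W2
    dh_pre = dh * (h_pre > 0)   # back through ReLU
    dx_branch = W1.T @ dh_pre   # back through W1
    return dx_skip + dx_branch


rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d))
W2 = np.zeros((d, d))           # a "dead" residual branch: F(x) is always 0
y, cache = residual_forward(x, W1, W2)
dx = residual_backward(np.ones(d), cache, W1, W2)
print(dx)                       # gradient still reaches x intact via the skip path
```

In a plain block `y = F(x)`, the same zero `W2` would make `dx` all zeros. The skip path is what keeps gradients flowing through very deep stacks.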

Frameworks

1. Keras is a high-level API on top of TensorFlow
1. TensorFlow itself gives more control, at the cost of more complexity
1. Torch is lightweight and also a good choice

## References

- Lex Fridman: [Deep Learning for Computer Vision (Andrej Karpathy, OpenAI)](https://www.youtube.com/watch?v=u6aEYuemt0M)
- [Keras.io](https://keras.io/)
- [arxiv-sanity](https://arxiv-sanity.com) or [arxiv-sanity-lite](https://arxiv-sanity-lite.com/)
- [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image


def findmax(decoded):
    """Return the (wordnet_id, label, score) tuple with the highest score for the
    first image in a decode_predictions() result, which looks like:

    [[('n02504458', 'African_elephant', 0.88681036), ('n01871265', 'tusker', 0.04044406),
      ('n02504013', 'Indian_elephant', 0.034975458), ('n03743016', 'megalith', 0.014869816),
      ('n01704323', 'triceratops', 0.0076157046)]]
    """
    return max(decoded[0], key=lambda item: item[2])


# ResNet50 with weights pretrained on ImageNet (downloaded on first use)
model = ResNet50(weights="imagenet")

# Load a local image at the 224x224 resolution ResNet50 expects, add a batch axis
img = image.load_img("tiger.jpg", target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)

# Map the 1000 class scores to the top-5 (wordnet_id, label, score) tuples
dec = decode_predictions(preds)
ans = findmax(dec)
print(f"All possibilities: {dec} \n")
print(f"Highest prediction: {ans[1]} at {ans[2]*100:.0f}%")
```

```
All possibilities: [[('n02504458', 'African_elephant', 0.88681036), ('n01871265', 'tusker', 0.04044406), ('n02504013', 'Indian_elephant', 0.034975458), ('n03743016', 'megalith', 0.014869816), ('n01704323', 'triceratops', 0.0076157046)]] 

Highest prediction: African_elephant at 89%
```
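The `preprocess_input` call hides a detail worth knowing. For ResNet50, Keras uses "caffe"-style preprocessing, which flips channels from RGB to BGR and subtracts the per-channel ImageNet mean without scaling pixels to [0, 1]. The NumPy sketch below approximates that behaviour for a channels-last batch. It is not the library's own code, and `caffe_style_preprocess` is a hypothetical name.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by Keras "caffe" preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])


def caffe_style_preprocess(batch_rgb):
    """Approximate resnet50.preprocess_input for an (N, H, W, 3) RGB batch in
    [0, 255]: reverse the channel axis (RGB -> BGR), then zero-center each
    channel by subtracting its ImageNet mean. No division or scaling is applied."""
    bgr = batch_rgb[..., ::-1].astype(np.float64)
    return bgr - IMAGENET_BGR_MEAN


# A single 224x224 pure-red image (R=255, G=B=0)
red = np.zeros((1, 224, 224, 3))
red[..., 0] = 255.0
out = caffe_style_preprocess(red)
print(out[0, 0, 0])  # one BGR pixel: blue and green go negative, red becomes 255 - 123.68
```

The pretrained weights expect inputs in this form. Skipping the step, or scaling pixels to [0, 1] instead, gives the network inputs unlike those it was trained on.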
