# General Notes


## Jupyter notes
To attach to an existing Python kernel, run this command at the console:
```
$ jupyter console --existing
```
But when you exit, the kernel stops too! This is probably not what you want, so exit like this:
```
exit(keep_kernel=True)
```
Ctrl-D also works: it asks whether you want to keep the kernel alive, and does as asked.


To get information about the kernel, run the following magic command in the notebook:

In [1]:
%connect_info

{
  "shell_port": 45977,
  "iopub_port": 49741,
  "stdin_port": 33997,
  "control_port": 45215,
  "hb_port": 33623,
  "ip": "127.0.0.1",
  "key": "2a788679-908fa47b81655e5a37c3b625",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": ""
}

Paste the above JSON into a file, and connect with:
    $> jupyter <app> --existing <file>
or, if you are local, you can connect with just:
    $> jupyter <app> --existing kernel-45b0e02a-94d2-4cac-8ca4-c1076bd666ee.json
or even just:
    $> jupyter <app> --existing
if this is the most recent Jupyter kernel you have started.
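
The JSON above is everything a client needs to find the kernel. As a rough sketch of how the ports fit together, this turns the connection info into one address per channel. The helper name `kernel_endpoints` is made up for this note, and the `transport://ip:port` form assumes the `tcp` transport shown above.

```python
import json

# Connection info as printed by %connect_info above.
CONNECTION_INFO = """
{
  "shell_port": 45977,
  "iopub_port": 49741,
  "stdin_port": 33997,
  "control_port": 45215,
  "hb_port": 33623,
  "ip": "127.0.0.1",
  "key": "2a788679-908fa47b81655e5a37c3b625",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": ""
}
"""


def kernel_endpoints(info):
    """Map each channel name (shell, iopub, ...) to its address.

    Hypothetical helper for illustration; assumes a tcp transport.
    """
    base = "%s://%s" % (info["transport"], info["ip"])
    return {
        name[:-len("_port")]: "%s:%d" % (base, port)
        for name, port in info.items()
        if name.endswith("_port")
    }


endpoints = kernel_endpoints(json.loads(CONNECTION_INFO))
for channel, url in sorted(endpoints.items()):
    print(channel, url)
```

Each of the five ports is a separate channel to the same kernel, which is why the file has to be copied whole when connecting from another machine.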


Jupyter is modal (yay, vi gets the last laugh) and supports a lot of commands in command mode. Here are some that I found useful:

 * *Esc*: Go to command mode.
 * *p*: Show the command (P)rompt, where you can type the name of a command (not all commands are mapped to keystrokes).
 * *m*: Change cell to (M)arkdown mode.
 * *y*: Change cell to code or P(Y)thon mode.
 * *a*/*b*: Add cell (A)bove/add cell (B)elow.
 * *Shift-O*: Toggle scr(O)lling for the output, so you can expand the full output.
 * *Enter*: Go to edit mode.
 * *j*/*k*: Usual vi down/up movement.
 * *h*: Show (H)elp.




## Interesting machine learning models

Read about [GPT-3](https://www.jesuisundev.com/en/gpt-3-the-gigantic-artificial-intelligence/). There is much more [information about GPT-3](https://www.gwern.net/GPT-3#william-shakespeare) as well.

# Music based learning

Links for training on music, extracting music features, or generating music.

[Extracting music features](https://towardsdatascience.com/extract-features-of-music-75a3f9bc265d)

# Web data based learning

[Getting stock information](https://medium.com/@andy.m9627/the-ultimate-guide-to-stock-market-apis-for-2020-1de6f55adbb). [This company](https://finnhub.io/) has a great free product for getting open/high/low/close information over a time period.

[Scraping the web for arbitrary information](https://github.com/alirezamika/autoscraper)


[An example of using Yahoo's stock api](https://github.com/sombandy/stock-market/blob/master/stock_performance.ipynb)

# Git notes

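This is how you configure git:
```
# Cache credentials in memory (the default timeout is 15 minutes).
git config --global credential.helper cache
# Same helper with a very long timeout; this replaces the setting above.
git config --global credential.helper 'cache --timeout=9999999999'
git config --global user.email "vikram@eggwall.com"
```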


# Dimensionality


In [None]:
import numpy as np

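# Estimate the mean Euclidean distance between two random points in the
# unit hypercube, for increasing numbers of dimensions.
# Warning: this is slow. The last case draws 1,000,000 pairs of
# 1,000,000-dimensional points.
d = {}
runs = 1000000

for dimensions in (1, 2, 3, 4, 10, 100, 1000, 1000000):
    sum_distance = 0.0

    for i in range(runs):
        a = np.random.rand(dimensions)
        b = np.random.rand(dimensions)
        sum_distance += np.linalg.norm(a - b)

    mean_distance = sum_distance / runs
    print("Mean distance in dimension %d is %f" % (dimensions, mean_distance))
    d[str(dimensions)] = mean_distance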

Mean distance in dimension 1 is 0.333250
Mean distance in dimension 2 is 0.521256
Mean distance in dimension 3 is 0.661650
Mean distance in dimension 4 is 0.777662
Mean distance in dimension 10 is 1.267396
Mean distance in dimension 100 is 4.075047
Mean distance in dimension 1000 is 12.907584


In [9]:
d

{'1': 0.3332496908264852,
 '2': 0.5212556131792213,
 '3': 0.6616503043959507,
 '4': 0.7776623425731862,
 '10': 1.2673959751112913,
 '100': 4.075047152565091,
 '1000': 12.907583942040752,
 '1000000': 408.24854896803464}
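
The numbers grow like the square root of the dimension. For two independent uniform values on [0, 1], the expected squared difference is 1/6, so the expected squared distance in d dimensions is d/6, and for large d the mean distance should be close to sqrt(d/6). A quick check against the measured values above (the `measured` dictionary is copied from the output; the sqrt(d/6) estimate is my addition, not part of the original run):

```python
import math

# Mean distances copied from the output of the simulation above.
measured = {
    1: 0.3332496908264852,
    2: 0.5212556131792213,
    3: 0.6616503043959507,
    4: 0.7776623425731862,
    10: 1.2673959751112913,
    100: 4.075047152565091,
    1000: 12.907583942040752,
    1000000: 408.24854896803464,
}

for dims, mean_distance in measured.items():
    # sqrt(E[distance^2]); an upper bound on the true mean distance.
    estimate = math.sqrt(dims / 6.0)
    print("d=%d measured=%f sqrt(d/6)=%f ratio=%f"
          % (dims, mean_distance, estimate, mean_distance / estimate))
```

The estimate overshoots in low dimensions, since the square root of a mean is at least the mean of the square roots, and the gap closes as the dimension grows.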

# Tensorflow

Testing that it exists, now that I've compiled it from source!

I needed to compile TensorFlow because the pre-packaged binaries use AVX instructions,
which my old machine doesn't support.

In [25]:
import tensorflow as tf
print("Tensorflow version = ", tf.__version__)
print("Keras version = ", tf.keras.__version__)

import pydot


Tensorflow version =  2.3.0
Keras version =  2.4.0


In [1]:
import matplotlib.cm as cm
from matplotlib.image import imread
# import matplotlib as mpl
import matplotlib.pyplot as plt
# import mpl_toolkits.mplot3d.axes3d as p3

import numpy as np

from sklearn.cluster import KMeans
from sklearn.datasets import fetch_california_housing
from sklearn.datasets import make_blobs

from sklearn.metrics import accuracy_score
from sklearn.metrics import silhouette_samples
from sklearn.metrics import silhouette_score

from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
from tensorflow import keras


In [2]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
print(X_train_full.shape)
print(X_train_full.dtype)

(60000, 28, 28)
uint8


In [3]:
# Hold out the first 5000 images for validation and scale pixels to [0, 1].
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000]        , y_train_full[5000:]
X_test = X_test / 255.0

In [4]:
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
             optimizer="sgd",
             metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid),
                    verbose=0)

In [30]:
keras.utils.plot_model(model)

('Failed to import pydot. You must `pip install pydot` and install graphviz (https://graphviz.gitlab.io/download/), ', 'for `pydotprint` to work.')


# Edge TPU

Edge TPU devices don't run full TensorFlow; they only run TensorFlow Lite (tflite). So we need to create tflite models rather than normal TF models.

[This page talks about creating TFLite models](https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf1.ipynb). The [TFLite converter class](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter) is the one that creates the converted model. Let's save our model so that we can practice with it later.

I need to save the model, run a converter, and then load the result onto the Edge TPU device.

In [11]:
saved_model = 'saved_models/fashion.h5'
model.save(saved_model)

converter = tf.lite.TFLiteConverter.from_keras_model(model)

converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('saved_models/fashion_viki_test.tflite', 'wb') as f:
    f.write(tflite_model)

INFO:tensorflow:Assets written to: /tmp/tmphhyeosbg/assets




And then we need to convert the tflite models to Edge TPU models. WTH, folks. This ought to be simpler. These instructions come from [the Coral page](https://coral.ai/docs/edgetpu/compiler/#system-requirements):

```
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

sudo apt-get update

sudo apt-get install edgetpu-compiler

edgetpu_compiler model.tflite
```

But of course, that doesn't work (were you expecting it to? You naive creature).

For the compiler to accept the model, we have to quantize both the weights and the activation values. To quantize the activations, you have to [provide a representative dataset, as this colab notebook points out](https://colab.research.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf1.ipynb#scrollTo=w9ydAmHGHUZl&line=2&uniqifier=1). There [is some information on how to do this](https://www.tensorflow.org/lite/performance/post_training_quantization), but it is relatively slim on detail.
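
Quantizing here means mapping each float to an 8-bit integer through a scale and a zero point, with real_value ≈ scale * (quantized - zero_point). The representative dataset is what lets the converter pick a sensible range for the activations. A minimal numpy sketch of the idea follows. The helper names and the simple min/max range choice are assumptions for illustration, not what the TFLite converter does internally.

```python
import numpy as np


def choose_params(samples, num_levels=256):
    """Pick a scale and zero point covering the observed range (hypothetical helper)."""
    lo = min(float(samples.min()), 0.0)  # the range must include 0.0
    hi = max(float(samples.max()), 0.0)
    scale = (hi - lo) / (num_levels - 1)
    if scale == 0.0:
        scale = 1.0  # all-zero input; any scale works
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map floats to uint8 values."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


def dequantize(q, scale, zero_point):
    """Map uint8 values back to approximate floats."""
    return scale * (q.astype(np.float32) - zero_point)


# Stand-in for calibration data: pixel values already scaled to [0, 1].
calibration = np.linspace(0.0, 1.0, 11, dtype=np.float32)

scale, zero_point = choose_params(calibration)
q = quantize(calibration, scale, zero_point)
restored = dequantize(q, scale, zero_point)

print("scale:", scale, "zero point:", zero_point)
print("largest round-trip error:", np.abs(restored - calibration).max())
```

The round-trip error is bounded by half the scale, so the calibration samples should cover the range the model will actually see.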

In [21]:
saved_model = 'saved_models/fashion.h5'
model.save(saved_model)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
def representative_dataset_gen():
    num_calibration_steps = 5
    for p in range(num_calibration_steps):
        # Each sample needs a leading batch dimension, so slice out a
        # (1, 28, 28) array rather than indexing a single (28, 28) image.
        sample = X_train[p:p + 1]
        sample = tf.cast(sample, tf.float32)
        yield [sample]


# Set the representative dataset for the converter so we can quantize the activations
converter.representative_dataset = representative_dataset_gen

converter.optimizations = [tf.lite.Optimize.DEFAULT]

# This ensures that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Set the input and output tensors to uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# Turn off MLIR
converter.experimental_new_converter = False

tflite_model = converter.convert()

with open('saved_models/fashion_viki_test.tflite', 'wb') as f:
    f.write(tflite_model)

INFO:tensorflow:Assets written to: /tmp/tmpiujw_zaq/assets




Good lord. This is messy beyond belief. The [edgetpu-compiler itself crashes sometimes](https://github.com/google-coral/edgetpu/issues/168). No useful error messages, no information on how to provide it input or what format it is looking for. How the hell do people develop for this?

[Someone else's view on how to get it working](https://towardsdatascience.com/solutions-to-issues-with-edge-tpu-32374310e732). This thing is a certified loony-town.

# Other notes
Links for Google Cloud courses:

 * [The single video](https://www.coursera.org/lecture/gcp-fundamentals/why-choose-google-cloud-platform-vXwU1)
 * The full course: [You have to select single courses to view the content for free](https://www.coursera.org/specializations/gcp-architecture?action=enroll&authType=google&completeMode=existingCourseraAccount#courses)

[These are someone's notes on this exam](https://medium.com/@sathishvj/notes-from-my-google-cloud-professional-cloud-architect-exam-bbc4299ac30)

# Python notes

Python is an interesting language, and the list/tuple/nparray distinction takes a while to get used to. Here are some lessons from what I have picked up.


Allocating large amounts of memory: 

In [23]:
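# A small float32 array: 4 elements, 16 bytes of data.
s = np.array([40000, 100000, 10029100, 202002], dtype=np.float32)
# A zero-filled buffer of 5,120,000,000 bytes (about 5 GB).
s = bytearray(51200000 * 100)
# Rebind the name so the big buffer can be freed.
s = ''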
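
To see how much memory each of these takes, `nbytes` on the array and `len` on the bytearray are enough. A small sketch with a much smaller buffer than the one above, so it is safe to run anywhere (the 1 MiB size and the variable names are just for illustration):

```python
import numpy as np

values = np.array([40000, 100000, 10029100, 202002], dtype=np.float32)
# 4 elements at 4 bytes each.
print("array data bytes:", values.nbytes)

buffer = bytearray(1024 * 1024)  # zero-filled, 1 MiB
buffer_size = len(buffer)
print("bytearray bytes:", buffer_size)

# Rebinding the name drops the last reference, so the buffer can be freed.
buffer = None
```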

Getting help on a specific method:

In [13]:
np.array?
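
The `?` suffix only works in IPython and Jupyter. In a plain Python script, the same docstring is reachable through the standard `help` function or `inspect.getdoc`. A minimal sketch:

```python
import inspect

import numpy as np

# Fetch the docstring that `np.array?` would display.
doc = inspect.getdoc(np.array)

# Print only the first few lines; help(np.array) pages through all of it.
for line in doc.splitlines()[:5]:
    print(line)
```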

Looking up values:

In [None]:
a

In [None]:
import numpy as np

In [None]:
%connect_info