<a href="https://colab.research.google.com/github/thunguyen177/Big-data-resources/blob/master/cheat_every_thing_sheet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- **Download & extract data**:
use `tf.keras.utils.get_file()`, or the `tensorflow_datasets` download manager:
```
import tensorflow_datasets as tfds

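# DownloadManager must be instantiated; the download_dir path here is an assumed example
dl_manager = tfds.download.DownloadManager(download_dir='/tmp/tfds_downloads')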
train_dir = dl_manager.download_and_extract('https://abc.org/train.tar.gz')
```

 Parallel download: list -> list
```
image_files = dl_manager.download(
    ['https://a.org/1.jpg', 'https://a.org/2.jpg', ...])
```
 Parallel download: dict -> dict
```
data_dirs = dl_manager.download_and_extract({
   'train': 'https://abc.org/train.zip',
   'test': 'https://abc.org/test.zip',
})
data_dirs['train']
data_dirs['test']
```
- **Preprocessing data**: build a shuffled, batched `tf.data.Dataset` from in-memory arrays:

`train_df = tf.data.Dataset.from_tensor_slices((Xtrain, ytrain)).shuffle(TRAIN_BUF).batch(BATCH_SIZE)`
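As a minimal sketch of what the pipeline above yields, the toy arrays and constants below are hypothetical stand-ins for `Xtrain`, `ytrain`, `TRAIN_BUF` and `BATCH_SIZE`. With 10 samples and a batch size of 4, the last batch is smaller than the others.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 10 samples with 3 features each.
Xtrain = np.arange(30, dtype=np.float32).reshape(10, 3)
ytrain = np.arange(10, dtype=np.int32)

TRAIN_BUF = 10   # a shuffle buffer at least as large as the dataset gives a full shuffle
BATCH_SIZE = 4

train_ds = (tf.data.Dataset.from_tensor_slices((Xtrain, ytrain))
            .shuffle(TRAIN_BUF)
            .batch(BATCH_SIZE))

# Collect the feature-batch shapes to see how the data was split.
batch_shapes = [tuple(x.shape) for x, _ in train_ds]
print(batch_shapes)
```

Passing `drop_remainder=True` to `batch()` would drop the final short batch if a fixed batch shape is needed.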


- **Checkpoint**:

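```
import os
import tensorflow as tf

def scheduler(epoch):
  # Learning rate of 0.001 for the first 40 epochs, then 0.0005
  if epoch < 40:
    return 0.001
  else:
    return 0.0005

lr_callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
checkpoint_path = "cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights whenever the training loss improves
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 monitor='loss',
                                                 save_best_only=True,
                                                 save_weights_only=True,
                                                 verbose=1)

# Stop when val_loss has not improved for 100 epochs and restore the best weights.
# It gets its own name so that it does not overwrite cp_callback.
es_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                               patience=100,
                                               restore_best_weights=True)

history = model.fit(X_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHES,
                    validation_split=0.2,  # assumption: val_loss needs validation data
                    callbacks=[lr_callback, cp_callback, es_callback])
```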
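
For a version that runs end to end, here is a minimal sketch on a hypothetical toy regression problem. The data, the model, the short schedule and the `cp.weights.h5` file name are all assumptions made for illustration; the suffix is chosen because recent Keras versions expect it for weight files.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical toy regression data, only to make the sketch runnable.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(64, 3)).astype("float32")
y_train = X_train.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

def scheduler(epoch, lr):
    # Step schedule with a small threshold so the change happens in a short run.
    return 0.001 if epoch < 2 else 0.0005

# Assumed location and file name for the saved weights.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "cp.weights.h5")

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(scheduler),
    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                       monitor="val_loss",
                                       save_best_only=True,
                                       save_weights_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                     patience=2,
                                     restore_best_weights=True),
]

history = model.fit(X_train, y_train,
                    batch_size=16,
                    epochs=4,
                    validation_split=0.25,  # provides the val_loss that the callbacks monitor
                    callbacks=callbacks,
                    verbose=0)
print(sorted(history.history.keys()))
```

Each callback has its own entry in the list, so none of them replaces another.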
- **Loading weights from checkpoints**:
```
checkpoint_path = "drive/My Drive/Colab Notebooks/checkpoints_conv/cp.ckpt"
model.load_weights(checkpoint_path)
```
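
`load_weights` expects a model with the same architecture as the one that was saved. A minimal round-trip sketch, assuming a hypothetical `build_model` helper and a temporary file name chosen for illustration:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def build_model():
    # Hypothetical architecture; it must match between saving and loading.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

x = np.ones((2, 3), dtype="float32")

# Save the weights of one model instance.
model_a = build_model()
before = model_a(x).numpy()
weights_path = os.path.join(tempfile.mkdtemp(), "cp.weights.h5")  # assumed file name
model_a.save_weights(weights_path)

# A freshly built model has different random weights until they are loaded.
model_b = build_model()
model_b.load_weights(weights_path)
after = model_b(x).numpy()

print(np.allclose(before, after))
```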

## Functions:

- `@tf.function`: decorator that compiles a Python function into a TensorFlow graph.

- `tf.map_fn`: applies a function to each element unpacked from a tensor along dimension 0. **Can be very slow**, since it runs as a loop over the elements.

- `tf.vectorized_map`: parallel map over the tensors unpacked from `elems` along dimension 0. It does not support index mapping; for index mapping, use Python's built-in `map()` instead.
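
A minimal sketch comparing the three functions on a small tensor. The tensor and the `square_sum` helper are hypothetical; both mapping functions apply the same per-row computation and should give the same values.

```python
import tensorflow as tf

@tf.function
def square_sum(x):
    # Traced into a graph on the first call, then reused for same-shaped inputs.
    return tf.reduce_sum(x * x)

elems = tf.constant([[1.0, 2.0], [3.0, 4.0]])

total = square_sum(elems)

# Apply the same function to each row (dimension 0) in two ways.
per_row_map = tf.map_fn(lambda row: tf.reduce_sum(row * row), elems)
per_row_vec = tf.vectorized_map(lambda row: tf.reduce_sum(row * row), elems)

print(float(total), per_row_map.numpy(), per_row_vec.numpy())
```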

## Gradient

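- **Gradient clipping**: 

`tf.clip_by_value`: clips each element to a given range

`tf.clip_by_norm`: rescales one tensor so that its L2 norm does not exceed a limit

`tf.clip_by_global_norm`: rescales a list of tensors by their joint norm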

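```
with tf.GradientTape() as tape:
  loss = compute_loss(model, x, y)
gradients = tape.gradient(loss, model.trainable_variables)
# Alternative: rescale all gradients together so that their global norm is at most 5.0
# gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
# Clip each element to [-1, 1]; a symmetric range keeps the sign of negative gradients
gradients = [tf.clip_by_value(gr, -1.0, 1.0) for gr in gradients]
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```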
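
To see how the three clipping functions differ, here is a minimal sketch on hand-picked tensors (the values are hypothetical and chosen so the results are easy to check by hand):

```python
import numpy as np
import tensorflow as tf

# Element-wise clipping: every entry is forced into [-1, 1].
g = tf.constant([-2.0, 0.5, 3.0])
by_value = tf.clip_by_value(g, -1.0, 1.0)

# Norm clipping: the vector [3, 4] has L2 norm 5 and is rescaled to norm 1.
v = tf.constant([3.0, 4.0])
by_norm = tf.clip_by_norm(v, 1.0)

# Global-norm clipping: the joint norm of both tensors is 5,
# so each one is scaled by 2.5 / 5 = 0.5.
grads = [tf.constant([3.0]), tf.constant([4.0])]
clipped, global_norm = tf.clip_by_global_norm(grads, 2.5)

print(by_value.numpy(), by_norm.numpy(),
      [c.numpy() for c in clipped], float(global_norm))
```

Global-norm clipping preserves the direction of the overall gradient, while element-wise clipping can change it.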


## Colab

- **Download data directly**:

```
# raw=true is important so you download the file rather than the webpage.
!wget https://github.com/snagcliffs/PDE-FIND/blob/master/Datasets/kuramoto_sivishinky.mat?raw=true
# rename the file
!mv kuramoto_sivishinky.mat\?raw\=true kuramoto_sivishinky.mat
# For zipped files
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00433/Flowmeters.zip
!unzip Flowmeters.zip
```

- **Connect to Google Drive**:
```
from google.colab import drive
drive.mount('/content/drive')
```
- **Upload file**:
```
from google.colab import files
uploaded = files.upload()
```
- **Download files**:
```
from google.colab import files
files.download('checkpoint')
files.download('cp.ckpt.data-00000-of-00001') 
files.download('cp.ckpt.index')
```



## TensorFlow Addons
```
import tensorflow_addons as tfa
```

## Pandas
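- `pop()` takes a column label, so it also works with integer column labels (for example, when the data was read without a header row). Ex: `df.pop(48)` removes the column labeled `48` from `df` and returns it.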
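
A minimal sketch with a hypothetical three-column frame. `pop()` removes the column in place and returns it as a Series; for named columns, a position can be turned into a label through `df.columns`.

```python
import pandas as pd

# Without explicit column names, pandas labels the columns 0, 1, 2.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

popped = df.pop(1)             # remove the column labeled 1 and return it
remaining = list(df.columns)   # the frame itself has been modified

# With named columns, look the label up by position first.
named = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
second = named.pop(named.columns[1])

print(popped.tolist(), remaining, second.tolist(), list(named.columns))
```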

## Set random seed

```
import os
import random

import numpy as np
import tensorflow as tf

seed_value = 0

# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
os.environ['PYTHONHASHSEED'] = str(seed_value)

# 2. Set Python's built-in pseudo-random generator to a fixed value
random.seed(seed_value)

# 3. Set NumPy's pseudo-random generator to a fixed value
np.random.seed(seed_value)

# 4. Set TensorFlow's global random seed
tf.random.set_seed(seed_value)
```
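
One way to check that the seeds take effect is to seed, draw, seed again and compare the draws. The sketch below wraps the steps in a hypothetical `set_all_seeds` helper; `draw` is likewise only an illustration.

```python
import os
import random

import numpy as np
import tensorflow as tf

def set_all_seeds(seed_value):
    """Hypothetical helper: seed Python, NumPy and TensorFlow in one call."""
    os.environ["PYTHONHASHSEED"] = str(seed_value)
    random.seed(seed_value)
    np.random.seed(seed_value)
    tf.random.set_seed(seed_value)

def draw():
    # One sample from each of the three generators.
    return (random.random(),
            float(np.random.rand()),
            float(tf.random.uniform([])))

set_all_seeds(0)
first = draw()
set_all_seeds(0)
second = draw()

print(first == second)
```

Setting `PYTHONHASHSEED` from inside a running interpreter only affects processes started afterwards, so for full hash reproducibility it would need to be set before Python starts.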


    <footer id="main-footer">
        <hr color = "black" width="400px">
            		Copyright &copy; 2019 Ellie Nguyen. All rights reserved.
	</footer>

