# Notebook Title <a class="tocSkip"></a>

Make some notes here for the thought process, steps taken, TODOs.


---

# Imports

## Import deps
Map all dependencies under categories, for easier tracking / readability.

Also check: https://github.com/xR86/ml-stuff/tree/master/scripts
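
Before running the import cells, it can help to confirm that every dependency is actually installed. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the `deps` list simply mirrors the top-level modules imported in this notebook.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

# Top-level module names used by the import cells below
# (the python-louvain package installs as `community`).
deps = ["numpy", "pandas", "matplotlib", "seaborn", "plotly", "tqdm", "networkx", "community"]
print(missing_modules(deps))  # an empty list means everything is importable
```

Checking with `find_spec` avoids importing the heavy libraries just to learn that one of them is absent.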

In [1]:
# BASE ------------------------------------
from datetime import datetime as dt
nb_start = dt.now()

# Be mindful when you have this activated.
# import warnings
# warnings.filterwarnings('ignore')

import json
from pathlib import Path

from time import sleep

# Display libs
from IPython.display import display, HTML

from tqdm import tqdm
from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated
tqdm.pandas()

SEED = 24

In [2]:
%%time

# ETL ------------------------------------
import numpy as np
import pandas as pd

# VIZ ------------------------------------
import matplotlib.cm as cm
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

import plotly.express as px
import plotly.io as pio
from plotly.tools import mpl_to_plotly

CPU times: user 847 ms, sys: 75.8 ms, total: 923 ms
Wall time: 928 ms


In [3]:
# NETWORK ANALYSIS ------------------------------------
import networkx as nx
import community as community_louvain  # provided by the python-louvain package

## Inserts for Jupyter
Any kind of IPython/Jupyter related stuff:
- classes that leverage the <kbd>display</kbd> module,
- javascript inserts,
- HTML/CSS inserts to be reused in multiple displays for highlighting,
- [table of contents](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/toc2) marking, etc.

In [4]:
# https://stackoverflow.com/a/37124230
import uuid
from IPython.display import display_javascript, display_html, display
import json

class RenderJSON(object):
    def __init__(self, json_data):
        # Serialize both dicts and lists; strings are assumed to be valid JSON already
        if isinstance(json_data, (dict, list)):
            self.json_str = json.dumps(json_data)
        else:
            self.json_str = json_data
        self.uuid = str(uuid.uuid4())

    def _ipython_display_(self):
        display_html('<div id="{}" style="height: 100%; width:100%;"></div>'.format(self.uuid), raw=True)
        display_javascript("""
        require(["https://rawgit.com/caldwell/renderjson/master/renderjson.js"], function() {
        document.getElementById('%s').appendChild(renderjson(%s))
        });
        """ % (self.uuid, self.json_str), raw=True)

In [5]:
%%javascript
/*Increase timeout to load properly*/
var rto = 120;
console.log('[Custom]: Increase require() timeout to', rto, 'seconds.');
window.requirejs.config({waitSeconds: rto});

<IPython.core.display.Javascript object>

In [6]:
%%html

<style>
    /* font for TODO */
    @import url('https://fonts.googleapis.com/css?family=Oswald&display=swap');
    
    .hl {
        padding: 0.25rem 0.3rem;
        border-radius: 5px;
    }
    /* used: https://www.color-hex.com/color-palette/87453 */
    .hl.hl-yellow  { background-color: rgba(204,246,43,0.5); /*#fdef41;*/ }
    .hl.hl-orange  { background-color: rgba(255,150,42,0.5); }
    .hl.hl-magenta { background-color: rgba(244,73,211,0.5); }
    .hl.hl-blue    { background-color: rgba(80,127,255,0.5); }
    .hl.hl-violet  { background-color: rgba(149,47,255,0.5); }
    
    .todo {
        font-family: 'Oswald', sans-serif;
        font-size: 2rem;
    }
    
    input.checkmark {
        height: 1.5rem;
        margin-right: 0.5rem;
    }
                    
    kbd.cr {
        padding: 2px 3px;
        background-color: red;
        color: #FFF;
        border-radius: 5px;
    }

    kbd.xmltag {
        background-color: #ff8c8c;
        color: #FFF;
    }
    kbd.xmltag.xmltag--subnode {
        background-color: #9f8cff;
        color: #FFF;
    }
    kbd.xmltag.xmltag--subsubnode {
        background-color: #de8cff;
        color: #FFF;
    }
</style>

<!-- ========================================== -->
<h3 style="margin-top:1rem; margin-bottom:2rem"> Examples: </h3>
    
<div>Highlighted text in:
    <span class="hl hl-yellow">yellow</span>,
    <span class="hl hl-orange">orange</span>,
    <span class="hl hl-magenta">magenta</span>,
    <span class="hl hl-blue">blue</span>,
    <span class="hl hl-violet">violet</span>.
</div>

<br/>

<div class="todo">TODO</div>  
<input class="checkmark" type="checkbox" checked="checked" disabled>Finished TODO text.  
<input class="checkmark" type="checkbox" disabled>TODO text.

<br/><br/>

Tags: <kbd class="cr">CR</kbd> (CR for Camera-Ready, graphs/sections that are important)

## Import data
Check that the data exists at the expected path and that its size fits in the available RAM.  
Feel free to have subheadings for multiple datasets, data descriptions etc.
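
The existence and size checks described above could be sketched as follows. This is a minimal example under stated assumptions: `check_dataset` and its `max_bytes` budget are hypothetical names, `data/raw/sample.csv` is a placeholder path, and the on-disk size is only a rough proxy, since a parsed DataFrame often takes more memory than the raw file.

```python
from pathlib import Path

def check_dataset(path, max_bytes=1024**3):
    """Return (exists, size_in_bytes, fits_budget) for a single data file."""
    p = Path(path)
    if not p.is_file():
        return False, 0, False
    size = p.stat().st_size
    return True, size, size <= max_bytes

# 'data/raw/sample.csv' is a placeholder path, not a file shipped with this template.
exists, size, fits = check_dataset("data/raw/sample.csv")
print(f"exists={exists}, size={size / 1024**2:.1f} MiB, fits budget={fits}")
```

Set `max_bytes` to a fraction of the machine's memory rather than the full amount, to leave headroom for intermediate copies.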

In [7]:
%%bash
ls -l tests/

total 24
-rwxrwxrwx 1 1000 1000  854 Mar 13  2018 README.md
drwxrwxrwx 2 1000 1000 4096 Jul 14  2018 __pycache__
-rwxrwxrwx 1 1000 1000 1416 Nov 23  2017 pytest_sample.py
-rwxrwxrwx 1 1000 1000  283 Dec 26  2017 spec_project_config.md
-rwxrwxrwx 1 1000 1000 1372 Mar 13  2018 test_git.py
-rwxrwxrwx 1 1000 1000 2615 Mar 13  2018 test_github_community.py


In [8]:
%%time
# df = pd.read_csv()
# df.info()

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.72 µs


In [9]:
# df.head()

---

# Main
Choose either a flat structure (h1 headings with #) or a nested one, whichever fits the content: not too deeply nested, not too cluttered.

## Use the bash, Luke

In [10]:
%%bash
ls -l tests/

total 24
-rwxrwxrwx 1 1000 1000  854 Mar 13  2018 README.md
drwxrwxrwx 2 1000 1000 4096 Jul 14  2018 __pycache__
-rwxrwxrwx 1 1000 1000 1416 Nov 23  2017 pytest_sample.py
-rwxrwxrwx 1 1000 1000  283 Dec 26  2017 spec_project_config.md
-rwxrwxrwx 1 1000 1000 1372 Mar 13  2018 test_git.py
-rwxrwxrwx 1 1000 1000 2615 Mar 13  2018 test_github_community.py


In [11]:
%%bash
ls data/raw | wc -l | xargs printf '%s files'
# -s prints a single summary line instead of one line per subdirectory
du -sh data/raw | cut -f1 | xargs printf ', total of %s'

ls data/raw/ | head -n 4 | xargs printf '\n\t%s'
ls data/raw/ | tail -n 4 | xargs printf '\n\t%s'

0 files, total of 
	
	

ls: cannot access 'data/raw': No such file or directory
du: cannot access 'data/raw': No such file or directory
ls: cannot access 'data/raw/': No such file or directory
ls: cannot access 'data/raw/': No such file or directory
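
As the output above shows, the bash pipeline prints an incomplete summary and several errors when the directory is missing. A Python alternative that handles that case explicitly might look like the sketch below; `summarize_dir` is a hypothetical helper, and unlike `du` it counts only the files directly inside the directory (no recursion).

```python
from pathlib import Path

def summarize_dir(path):
    """Return (file_count, total_bytes) for files directly inside `path`, or None if it is missing."""
    p = Path(path)
    if not p.is_dir():
        return None
    files = [f for f in p.iterdir() if f.is_file()]
    return len(files), sum(f.stat().st_size for f in files)

summary = summarize_dir("data/raw")
if summary is None:
    print("data/raw not found")
else:
    count, total = summary
    print(f"{count} files, total of {total / 1024**2:.1f} MiB")
```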


## Make use of IPython stuff

F-strings ([>= Python 3.6](https://www.python.org/dev/peps/pep-0498/)) can be combined with IPython's <kbd>display</kbd> module for fun and profit.

In [12]:
tm = (dt.now() - nb_start).total_seconds()

display(HTML(f'Started notebook <span class="hl hl-yellow">{tm:.0f}s</span> ago.'))
# If you use a format specifier, don't put a space after it
# display(HTML(f'{ tm:.0f}'))  # works
# display(HTML(f'{ tm:.0f }')) # breaks
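
To avoid repeating the `<span>` markup in every f-string, the highlight classes from the HTML/CSS insert can be wrapped in a small helper. This is a sketch only: `hl` and `HL_COLORS` are hypothetical names, and the color list assumes the `.hl-*` classes defined in the `%%html` cell earlier.

```python
from html import escape

# Colors for which a .hl-<color> class is assumed to exist in the notebook's CSS.
HL_COLORS = {"yellow", "orange", "magenta", "blue", "violet"}

def hl(text, color="yellow"):
    """Wrap text in a <span> using the .hl / .hl-<color> classes."""
    if color not in HL_COLORS:
        raise ValueError(f"unknown highlight color: {color}")
    # Escape the text so that characters like < or & are not parsed as HTML.
    return f'<span class="hl hl-{color}">{escape(str(text))}</span>'

snippet = f"Loaded {hl(1234, 'blue')} rows in {hl('2.5s')}."
print(snippet)
```

In a notebook, the resulting string can be rendered with `display(HTML(snippet))`.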

For deeply nested JSON or dictionaries, it's best to use something interactive like RenderJSON.

In [13]:
RenderJSON({
    'a': {
        'c': 0
    },
    'b': 1
})
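
RenderJSON depends on RequireJS and on loading an external script, so it may not render offline or in a static export. A non-interactive fallback could be sketched as below; `json_to_pre` is a hypothetical helper that only pretty-prints the data into an escaped `<pre>` block.

```python
import json
from html import escape

def json_to_pre(data, indent=2):
    """Return an HTML <pre> block containing pretty-printed, escaped JSON."""
    text = json.dumps(data, indent=indent, sort_keys=True)
    return f"<pre>{escape(text)}</pre>"

print(json_to_pre({"a": {"c": 0}, "b": 1}))
```

In a notebook, pass the returned string to `display(HTML(...))`.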

Use [slides](https://github.com/damianavila/RISE) and decouple declaration from display: define the content in one cell and render it in the next (if you're not using ["Hide codecell inputs"](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/hide_input_all)).

In [14]:
slide_1 = HTML("""
<h3>Lex Fridman<br/><br/>
Deep Learning Basics: Introduction and Overview<br/>&nbsp;</h3>
<iframe width="560" height="315" src="https://www.youtube.com/embed/O5xeyoRL95U" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
""")

In [15]:
slide_1

Use [tqdm](https://github.com/tqdm/tqdm) for every run that takes more than a couple of seconds and can be tracked by some iterator.

In [16]:
# For most simple stuff, use tqdm
files = list(range(10))
for file in tqdm_notebook(files):
    sleep(0.1)





HBox(children=(FloatProgress(value=0.0, max=10.0), HTML(value='')))




In [17]:
# When you need more control over the progress bar, 
# use decoupled tqdm
with tqdm_notebook(total=len(files)) as pbar:
    for file in files:
        sleep(0.1)
        pbar.update(1)





HBox(children=(FloatProgress(value=0.0, max=10.0), HTML(value='')))




In [18]:
nb_end = dt.now()
print('Time elapsed: %s' % (nb_end - nb_start))

Time elapsed: 0:00:05.598512


In [19]:
'Time elapsed: %.2f minutes' % (
    (nb_end - nb_start).total_seconds() / 60
)

'Time elapsed: 0.09 minutes'
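
The cells above time the notebook as a whole. To time an individual step, a small context manager is one option. The sketch below uses only the standard library; `timed` and `timings` are hypothetical names, and the summation is a stand-in workload.

```python
from contextlib import contextmanager
from time import perf_counter

@contextmanager
def timed(label, results=None):
    """Print the wall-clock time of the enclosed block; optionally store it in `results`."""
    start = perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so partial runs are still reported.
        elapsed = perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f}s")

timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))
```

Collecting the values in a dictionary makes it easy to report all step timings together at the end of the notebook.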

---

# Bibliography
Keep this section for code tricks, further reading, next steps.  
Keep one empty cell of each type (markdown and code) here, set to the "Skip" slide type (<kbd>View</kbd> -> <kbd>Cell Toolbar</kbd> -> <kbd>Slideshow</kbd>), so they can be copy-pasted (<kbd>C</kbd>, <kbd>V</kbd>) wherever a new cell is needed.

Useful tricks:
+ https://stackoverflow.com/questions/18873066/pretty-json-formatting-in-ipython-notebook/37124230#37124230


Useful reading:
+ https://www.semanticscholar.org/paper/Reproducible-Research-Environments-with-Repo2Docker-Forde-Head/7c015f96c0545e2b68866769c082a46362381774

---