<a href="https://colab.research.google.com/github/vmtang11/ids705_ml_ta/blob/main/session1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IDS705 Machine Learning: Session 1

## Agenda

### Environments
- Local
- Duke Containers
- Google Colab

### Jupyter basics
- Keyboard shortcuts
- When to use markdown vs. commented code vs. print statements
- Markdown basics: headers, bold/italics, lists
- Resources:
  - Kyle’s YouTube video: https://www.youtube.com/watch?v=IMdfXGHzz5g
  - Cheatsheet: https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf

### LaTeX
- Inserting equations
- Fractions
- Subscript and superscript
- Greek letters
- Integrals and sums
- Matrices
- Resources: https://www.math.ubc.ca/~pwalls/math-python/jupyter/latex/ 

### Submission tips
- Make a copy before starting
- Run all code from the top before submission
- Do not wait until the last minute to export to PDF
- Look at the PDF before submission

***




# Environments for Jupyter


## Local: Jupyter Notebook
- Run `jupyter notebook` in a terminal

__Pro__: easy to access; directly linked with local file systems

__Con__: may not have GPU

__Best For__: most of the individual assignments

## Duke Containers: Jupyter Lab
- https://vm-manage.oit.duke.edu/containers
- GPUscavenger

__Pro__: attached semi-permanent storage (reserved for the whole semester); decent GPU; isolated environment

__Con__: files must be transferred by upload/download or by setting up GitHub; shared GPU resources

__Best For__: standalone environment for all IDS705 related work

## Google Colab: Jupyter Notebook

__Pro__: good GPU, even better with Pro account; able to share notebooks

__Con__: temporary storage (link Google Drive for permanent storage, or clone the GitHub repo every time); no bash terminal

__Best For__: team project that involves everyone working collaboratively on the same notebook
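
One way to work around Colab's temporary storage is sketched below. It assumes the standard `google.colab.drive.mount` call and the usual `MyDrive` folder that appears under the mount point. The function name `get_storage_dir` and the `ids705` folder name are hypothetical, chosen only for illustration. Outside Colab the function simply falls back to the current working directory, so the same notebook can run locally.

```python
from pathlib import Path


def get_storage_dir(mount_point="/content/drive", folder="ids705"):
    """Return a directory for saving files that should outlive the session.

    Inside Colab, mount Google Drive and use a folder on it (hypothetical
    folder name). Anywhere else, use the current working directory.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path.cwd()

    drive.mount(mount_point)  # asks for authorization the first time
    storage = Path(mount_point) / "MyDrive" / folder
    storage.mkdir(parents=True, exist_ok=True)
    return storage


storage_dir = get_storage_dir()
print(f"Saving files under: {storage_dir}")
```

Anything written under `storage_dir` in Colab should still be in your Drive after the runtime is recycled.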

## GitHub Repos
- __sp2021__: https://github.com/kylebradbury/ids705
- __sp2020__: https://github.com/kylebradbury/ids705_sp2020

git clone commands: <br>
`git clone https://github.com/kylebradbury/ids705.git` <br>
`git clone https://github.com/kylebradbury/ids705_sp2020.git`


# Jupyter Basics



## Keyboard Shortcuts
- __Esc + a__: insert cell above
- __Esc + b__: insert cell below 
- __Esc + dd__: delete cell (does not work on Colab)
- __Esc + m__: change to markdown cell
- __Ctrl + ]__: indent
- __Ctrl + [__: dedent

- __Enter__: enter edit mode
- __Shift + Enter__: run cell, select below

Check python function documentation:
- __Shift + Tab__ within brackets (Jupyter notebook on local or remote)
- __Place cursor__ within brackets (Colab)
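
If the tooltip shortcut is unavailable, the same information can usually be pulled up in code. The sketch below uses only the standard library (`inspect` and the built-in `help`). The helper name `first_doc_line` is hypothetical, and `json.dumps` is just a convenient function to inspect.

```python
import inspect
import json


def first_doc_line(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ""


# Summary line and call signature of a function.
print(first_doc_line(json.dumps))
print(inspect.signature(json.dumps))

# For the full documentation, run this in a notebook cell:
# help(json.dumps)
```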

## Markdown vs. Commented Code vs. Print Statements
- __Markdown__: explanations, answers to questions, etc.
- __Commented code__: keeping track of what you're doing as you write code, which also helps us follow your work
- __Print statements__: incorporate variable values in written answer.

```python
# For example, we don't want to have to find your answers to questions in commented code like this.
# It is difficult to find when you have lots of code.

# But if you're explaining something like...

# import necessary packages
import pandas as pd

# then that's ok! This can help tell us what you're doing.
```

- __Printing__: Sometimes you may have numerical values that are complex or that change every time you run the code. You do not want to copy and paste them into a markdown block each time, or, even worse, forget to update them. In this case, it's often best to `print` your answer.

```python
ans = 60 * 60 * 24
print("There are {} seconds in a day.".format(ans))
```

Output:

```
There are 86400 seconds in a day.
```
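
The same idea extends to computed results that need rounding. The sketch below uses an f-string with a format specifier. The labels and predictions are made-up values used only to illustrate the formatting, not results from any assignment.

```python
# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

# Count matching positions and convert to a fraction.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# ":.3f" rounds the printed value to three decimal places.
message = (
    f"The model classified {correct} of {len(y_true)} samples correctly "
    f"(accuracy = {accuracy:.3f})."
)
print(message)
```

Because the sentence is built from the variables, it updates automatically whenever the cell is rerun.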


## Markdown basics

- __Headers__: Use `#` to create headers. The number of `#` symbols sets the header level: more symbols give a smaller header. (Click into the cell to see the formatting)

Example:
# Header 1: Title
## Header 2: Subtitle
### Header 3: Sub-subtitle

- __Emphasizing stuff__: If you want to __bold__ or _italicize_ text, you can do so by wrapping the text in `__` or `**` for bold and `_` or `*` for italics. 

This is helpful when you're explaining answers and need to show that you actually answered a specific part of the question. 

Example: The AUC was __0.89__.

- __Lists__: You can create unordered (bulleted) lists using `-` or ordered (numbered) lists with `1.`, `2.`, etc.

Example: <br>
Models tested:
- linear regression
- k-nearest neighbors
- random forest

Insights:
1. blah
2. blah blah
3. blah blah blah

# LaTeX

- __Inserting equations__

**In line** <br>
You can insert equations in line by wrapping equations in single dollar signs: `$`. <br>
Example: The distribution $f(x)$ blah blah.

**New line** <br>
If you want your equation(s) to show up on their own line, you can wrap it in double dollar signs: `$$`. <br>
Example: The distribution $$f(x)$$ blah blah.

**Longer Equations** <br>
For longer equations, such as when there are multiple parts or you need to show work in the format of several equations, it can be helpful to use this format:

```
\begin{equation}
\end{equation}
```

Example:
\begin{equation}
\begin{split}
a & = b + c \\
d & = e + f + g
\end{split}
\end{equation}

Formatting:
- Use `&` before each equal sign to align them, inside `\begin{split}` and `\end{split}`. This makes it easier for us to read.
- Use `\\` to enter a new line within the equation formatting. 
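
As a further sketch of the same pattern, a derivation that runs over several lines can leave the left-hand side empty and start each continuation line with `&`, so every equal sign still lines up. The algebra below is only an illustration and is not taken from the assignments:

```latex
\begin{equation}
\begin{split}
(a + b)^2 & = (a + b)(a + b) \\
& = a^2 + ab + ba + b^2 \\
& = a^2 + 2ab + b^2
\end{split}
\end{equation}
```

Paste the block into a markdown cell (without the code fence) to see it rendered.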

- **Fractions**: `\frac{numerator}{denominator}` <br>
Example: $\frac{1}{3}$

- **Brackets around fractions**: `\left(...\right)` or `\left[...\right]`. This can also be used around integrals or summations. <br>
Example: $\left(\frac{1}{3}\right)$

- **Subscript**: `_` <br>
Example: $x_1$

- **Superscript**: `^` <br>
Example: $x^2$

- **Greek letters**: Use a backslash `\` followed by the name of the letter, e.g. `\alpha`. Generally, capitalizing the first letter of the name renders the uppercase Greek letter, e.g. `\Sigma`. <br>
Examples: <br>
Lowercase alpha: $\alpha$ <br>
Uppercase sigma: $\Sigma$

- **Infinity**: `\infty` <br>
Example: $\infty$ or $-\infty$

- **Integrals**: `\int`. To add limits `\int_{lower}^{upper}` (braces are needed whenever a limit is more than one character) <br>
Example: $\int$ or $\int_a^b$

- **Sums**: `\sum`. To add limits `\sum_{lower}^{upper}` <br>
Example: $\sum$ or $\sum_{n=0}^{\infty}$

- **Matrices**: `matrix` for no brackets, `pmatrix` for round brackets `()`, `bmatrix` for square brackets `[]`

```
No Brackets
\begin{matrix} 
\end{matrix}

Round Brackets
\begin{pmatrix} 
\end{pmatrix}

Square Brackets
\begin{bmatrix} 
\end{bmatrix}
```

Example: <br>
No brackets: $\begin{matrix} a & b \\ c & d \end{matrix}$ <br> <br>
Round brackets: $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ <br> <br>
Square brackets: $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$
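
The pieces above can be combined in a single equation. The sketch below shows two illustrative cases. The first is the mean squared error, which uses a fraction, a sum with limits, subscripts, a superscript, and sized brackets. The second is a matrix-vector product. The `\hat{}` accent and `\text{}` command are not covered in this session and are assumed to be available in your renderer (they are in standard Jupyter).

```latex
\begin{equation}
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\end{equation}

\begin{equation}
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}
\end{equation}
```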

## LaTeX Tips
- Work in sections. It is easier to get smaller parts to render properly. When working with larger equations, it is difficult to know which part is causing an error. Split it up into smaller parts, make sure they each render properly, then piece them together. 
- Do fractions separately then paste into the larger equation.
- Work one line at a time.
- If you don't know how to write something in LaTeX, there are a lot of resources on Google.

Example: <br>
\begin{equation}
\int_0^{10} \frac{x^2 + 5}{\alpha_1 - 10} - x^2
\end{equation}

**Part 1:** Fraction <br>
$\frac{x^2 + 5}{\alpha_1 - 10}$ <br> 
<br>
**Part 2:** Integral <br>
$\int_0^{10}$ <br>
<br>
**Part 3:** Put it together <br>
$\int_0^{10} \frac{x^2 + 5}{\alpha_1 - 10} - x^2$

# Submission Tips

- __MAKE A COPY BEFORE STARTING__. If something goes wrong, you can refer back to the original copy. This also avoids a file name conflict with the question copy and prevents git from overwriting your answers.
- __RUN ALL CODE FROM THE TOP__. This helps us make sure that your code works properly and does not depend on a cell you've deleted. You can do this by restarting the kernel and running all cells.
- __DO NOT WAIT UNTIL THE LAST MINUTE TO EXPORT TO PDF__. LaTeX errors may not always show up in Jupyter, but they can cause the PDF export to fail. Leave sufficient time before submission to debug.
- __LOOK AT YOUR PDF__. If you can't read it, we can't either. Make sure all text and code appear in full and graphs are not cut off.
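
A rough way to sanity-check the "run all code from the top" tip is sketched below. It relies on the fact that a saved `.ipynb` file is JSON in which each code cell records an `execution_count`. After a clean restart-and-run-all, those counts should read 1, 2, 3, ... from top to bottom. The function names are hypothetical, empty code cells are skipped on the assumption that they are not assigned a count, and the check is a heuristic rather than a guarantee that the notebook is correct.

```python
import json


def code_cells_ran_in_order(notebook):
    """Return True if non-empty code cells have execution counts 1, 2, 3, ..."""
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be stored as one string or as a list of lines.
        source = "".join(cell.get("source", ""))
        if source.strip():
            counts.append(cell.get("execution_count"))
    return counts == list(range(1, len(counts) + 1))


def check_notebook_file(path):
    """Load a saved .ipynb file and apply the check above."""
    with open(path, encoding="utf-8") as f:
        return code_cells_ran_in_order(json.load(f))


# Minimal in-memory notebooks, made up for illustration.
fresh_run = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 2, "source": ["print(x)"]},
]}
stale_run = {"cells": [
    {"cell_type": "code", "execution_count": 3, "source": ["x = 1"]},
    {"cell_type": "code", "execution_count": 1, "source": ["print(x)"]},
]}

print(code_cells_ran_in_order(fresh_run))  # True
print(code_cells_ran_in_order(stale_run))  # False
```

To check a saved file, call `check_notebook_file` with the path to your notebook after saving it.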

***