<h1>Tutorial: Fancy Tools for Exploring Data Science with Python</h1>

*To open in Colab, click the badge below!*

<a href="https://colab.research.google.com/github/teboozas/python_tutorial_for_data_science/blob/master/Eng/Tutorial_Ch1(Colaboratory).ipynb" target="_parent"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Colaboratory


## 1.1 What is Google Colaboratory (Colab)?

**Colab is a hosted Jupyter notebook environment that runs in the cloud and is integrated with Google Drive**

Jupyter Notebook is a widely used interactive development environment in which you can run code and check its output in the same place, and document your work with Markdown syntax. Because it works well with Python and related packages, Colab is also becoming one of the most popular platforms for data science / machine learning education with Python.

Because Colab runs in the cloud and stores notebooks in Google Drive, installation and configuration are basically unnecessary. Colab also provides fairly powerful CPUs / GPUs for free, so it is arguably a suitable platform for data science / machine learning education.

## 1.2 Basics of Jupyter Notebook

First, let's go over the basics of the Jupyter Notebook environment, including its file format and components.

<h4>1) Jupyter notebook file format: .ipynb</h4>

Documents produced with Jupyter Notebook are stored in the `.ipynb` format. The extension reflects the fact that Jupyter Notebook (and Project Jupyter) grew out of the IPython project. The `.ipynb` format is different from an ordinary Python script (the `.py` format).

Thus, although it is possible to open and work with a `.py` script within the Jupyter Notebook environment, the opposite is not: a plain Python interpreter cannot run an `.ipynb` file directly. If you want to save a notebook as a Python script, follow the procedure below:
```
Colab menu bar → File → Download .py
```
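
As a rough illustration of why the two formats differ: an `.ipynb` file is a JSON document whose cells are stored in a `cells` list, and exporting to `.py` essentially keeps the code cells. The sketch below is not Colab's actual exporter. It uses only the standard library, and the function name `notebook_to_script` and the tiny in-memory notebook are made up for illustration.

```python
import json


def notebook_to_script(notebook: dict) -> str:
    """Join the source of every code cell into one script (hypothetical helper)."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip text (Markdown) cells
        source = cell.get("source", "")
        # In the notebook format, "source" may be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# A minimal notebook, written as the JSON text an .ipynb file would contain.
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "print(x + 1)"]},
    ],
})

script = notebook_to_script(json.loads(raw))
print(script)
```

Only the code cell survives the conversion; the Markdown cell is dropped, which is one reason a notebook and its exported script are not interchangeable.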

<h4>2) Components of a notebook document: cells</h4>

Notebooks (`.ipynb` files) consist of *cells*. Cells fall into two categories: *text cells*, which document the notebook, and *code cells*, in which you write and execute code.

* **text cells**

    Text cells document the notebook using Markdown syntax. With Markdown, users can write technical documents much more easily than on many other platforms, which is one reason many researchers and developers use the Jupyter Notebook environment.

    Text cells also support LaTeX syntax (rendered by the MathJax engine), so users can write a formula in the middle of a sentence (like $y=ax+b$) or insert a displayed formula as below:
    $$
    f(x;\mu,\sigma^2)=\cfrac{1}{\sqrt{2\pi \sigma^2}}\exp{\left[ -\cfrac{(x-\mu)^2}{2\sigma^2} \right]},\ -\infty < \mu < \infty,\ \sigma > 0
    $$
    
    More on Markdown syntax is available via the links at the end of this document.

* **code cells**

    Code cells are cells in which users can write code and run it immediately.

    Code cells in the same document share state, so objects and variables defined in one cell can be used in other code cells once that cell has been run. This reduces redundant code and keeps the whole document consistent.

    At the time of writing, Jupyter Notebook in Colab supports only the Python language (both versions 2 and 3) and comes with frequently used data science / machine learning packages such as NumPy, Pandas, Scikit-learn, and TensorFlow. Beyond these, users can install most existing Python packages with the Python package installer (`pip`).
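
To see which of these packages are actually available in a given runtime, one option is to ask Python's import system rather than assume. The following is a minimal sketch using the standard-library `importlib`. The helper name `is_installed` and the list of names checked are illustrative assumptions, and the result depends on the runtime you are connected to.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# These are import names, not pip names: scikit-learn is imported as "sklearn".
for name in ["numpy", "pandas", "sklearn", "tensorflow"]:
    status = "available" if is_installed(name) else "missing (try: pip install)"
    print(f"{name:12s} {status}")
```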

In [0]:
# This is an example of code cell.
# Try "Ctrl+Enter" (or Cmd+Enter) for code execution!
# You can see the result instantly right below this code cell.

print("hello world!")

## 1.3 Configurations for Colab and Jupyter Notebook environment

In this section, we will briefly look at how to create / open notebooks, configure the runtime, and explore the runtime environment, including the pre-installed Python packages.

<h4>1) Create / open `.ipynb` format files in Colab</h4>

* **To create a new notebook document in Colab**

    > **`go to Google Drive → 'New' → 'More' → 'Google Colaboratory'`**<br>
    > ※ If the `Google Colaboratory` option doesn't appear, click the 'Connect more apps' tab and search for 'Google Colaboratory' to connect it.

    > **`Colab menu bar → 'File' → 'New Python 3 notebook'`**<br>
    > ※ Support for Python version 2 is scheduled to end in 2020, so Python version 3 is recommended for further work.

    Notebooks created with the second method are stored in a folder named 'Colab Notebooks' in your personal Google Drive.

* **To open existing `.ipynb` notebooks in Colab**
    > **`go to Google Drive → click the file (→ 'Open with Google Colaboratory')`**

    > **`Colab menu bar → 'File' → 'Open notebook...'`**

    You can open `.ipynb` notebooks stored not only in your own Google Drive or uploaded from your computer, but also in public GitHub repositories.

<h4>2) Runtime settings</h4>
The term 'runtime' can be understood as the virtual machine and computational resources that, in our case, Colab provides. With a runtime, we can write and execute Jupyter Notebook documents without separately installing Python, Jupyter, and related packages. A Colab runtime also provides computing resources such as CPU, RAM, and even a GPU with relatively high performance.

Connecting to a runtime is as simple as executing any code cell. Users can also set the runtime type before use; here, the Python version and the hardware accelerator (None/GPU/TPU) are selectable.
* **To set up runtime type**
> **`Colab menu bar → 'Runtime' → 'Change runtime type'`**

You can also use your own computational resources through the `Connect to local runtime` option, but this is not covered in this session.

<h4>3) Looking into runtime environment</h4>
A runtime can roughly be thought of as a personal computer, so it has a folder tree structure just like a PC. The code cells below check the folder structure of the runtime environment.


`!pwd` checks the current folder; `pwd` stands for 'print working directory'.

`!ls` lists the contents of a folder; `ls` is short for 'list'.

`!` is the character that marks a command as a 'shell command'. (More on shell commands is available [here](https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html))

In [0]:
# 'pwd' shell command prints a path of current working directory(folder)
!pwd

/content


In [0]:
# 'ls' shell command outputs a list of files/folders in current directory
!ls 

sample_data


In [0]:
# the '/' argument refers to the 'root' directory of the runtime machine.
# note that 'content' (the current working directory) is a subfolder of the root.
!ls /

bin	 datalab  home	 lib64	opt   run   swift		tmp    var
boot	 dev	  lib	 media	proc  sbin  sys			tools
content  etc	  lib32  mnt	root  srv   tensorflow-2.0.0b1	usr
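
The same information can also be obtained from Python itself, without shell commands. This is a small sketch using the standard-library `os` module. The output depends on the machine it runs on (in Colab, the working directory is typically `/content`, as shown above).

```python
import os

# Equivalent of `!pwd`: the current working directory.
cwd = os.getcwd()
print(cwd)

# Equivalent of `!ls`: names of files/folders in the current directory.
print(sorted(os.listdir(cwd)))

# Equivalent of `!ls /`: contents of the root directory.
root = os.path.abspath(os.sep)
print(sorted(os.listdir(root)))
```

Because these calls return ordinary Python strings and lists, the results can be filtered or reused in later cells, which is harder with the printed output of a shell command.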


The main shell commands are stored in the `bin` folder.

In [0]:
# run this code cell to check shell commands stored in runtime
!ls /bin

You can access the folder where Python (version 3) packages are installed with a shell command and the path below. Packages installed with the `!pip install` shell command are stored in this directory.

(The '`!pip install package_name`' command installs an external Python package named '`package_name`'. Most packages needed for data science / machine learning are pre-installed in the Colab environment, but you can use this shell command to install any other package you need.)

In [0]:
# listing pre-installed Python(version 3) packages in Colab
!ls /usr/local/lib/python3.6/dist-packages
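
The path above contains a Python version number (`python3.6`), so it may not exist on a runtime with a different Python version. A hedged alternative is to ask Python where third-party packages go and which ones are installed. The sketch below uses the standard-library `sysconfig` and `importlib.metadata` (Python 3.8+). The directory it reports is not guaranteed to match the hard-coded path above.

```python
import sysconfig
from importlib import metadata

# Directory where pure-Python third-party packages are installed
# for the interpreter running this cell.
packages_dir = sysconfig.get_paths()["purelib"]
print(packages_dir)

# Names and versions of installed distributions, as seen by this interpreter.
# Missing metadata fields are replaced by empty strings so sorting never fails.
installed = sorted(
    (dist.metadata["Name"] or "", dist.version or "")
    for dist in metadata.distributions()
)
for name, version in installed[:10]:  # show only the first ten
    print(name, version)
```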

<h4>※ Specs of runtime provided by Colab (run code cells below)</h4>

In [0]:
# OS check
!cat /etc/issue.net

In [0]:
# CPU spec
!head /proc/cpuinfo

In [0]:
# Memory spec
!head -n 3 /proc/meminfo

In [0]:
# disk spec
!df -h

In [0]:
# GPU spec
# Change the runtime's hardware accelerator to GPU before running this cell
!nvidia-smi
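
Some of these checks can also be done from Python with the standard library, which is handy when the results are needed as values rather than printed text. A minimal sketch follows. It only checks whether the `nvidia-smi` tool is on the `PATH` rather than querying the GPU, and all numbers depend on the runtime.

```python
import os
import platform
import shutil

# OS / kernel description of the machine running this cell.
print("Platform :", platform.platform())

# Number of logical CPU cores visible to Python.
print("CPU cores:", os.cpu_count())

# Disk capacity of the root file system, converted from bytes to GiB.
usage = shutil.disk_usage(os.path.abspath(os.sep))
gib = 1024 ** 3
print(f"Disk     : {usage.used / gib:.1f} GiB used of {usage.total / gib:.1f} GiB")

# This only tells us whether the tool exists on PATH, not whether a GPU works.
has_nvidia_smi = shutil.which("nvidia-smi") is not None
print("nvidia-smi found:", has_nvidia_smi)
```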

## 1.4 Working with Google Drive on Colab


Colab runs on a virtual machine and uses Google Drive for storage, so you basically have to mount Google Drive and access it to handle notebooks and files.

Colab users usually upload most of the files they need to Google Drive. Here, we briefly look at how to mount (connect) Google Drive in Colab and set the working directory.

<h4>1) Mounting Google Drive as storage</h4>
To save and access files stored in Google Drive, you first have to mount Google Drive in Colab (Google Drive is not mounted automatically when the runtime starts). This is the simplest way to mount Google Drive.

* **To mount Google Drive**
> **`Open the left panel (click the` ![left_panel](https://github.com/teboozas/python_tutorial_for_data_science/blob/master/left_panel.png?raw=True) `button on the upper-left side of the screen) → 'Files' → 'MOUNT DRIVE'`**<br>**`→ execute the code cell that appears → click the URL → (select a Google account to use) → 'Allow' → copy the code`**<br>**`→ paste the code into the blank under the message 'Enter your authorization code:' → press Enter`**

Or just execute the code cell below (it does exactly the same thing as the process above).

After finishing this process and clicking the 'REFRESH' button in the left panel, you can see that a `drive` folder has been newly created. This is your Google Drive storage, and you can directly import or edit notebooks stored in Google Drive from Colab.

In [0]:
from google.colab import drive
drive.mount('/content/drive')
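
If the same notebook might also be run outside Colab (for example in a local Jupyter installation), the `google.colab` import fails. One way to guard against that is sketched below. The function name `mount_drive_if_colab` is a hypothetical helper, not part of Colab.

```python
def mount_drive_if_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        # This module is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        print("Not running in Colab; skipping Google Drive mount.")
        return False
    drive.mount(mount_point)  # starts the authorization flow if needed
    return True


mounted = mount_drive_if_colab()
print("Drive mounted:", mounted)
```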

<h4>2) Changing working directory(folder)</h4>

If you want to work in a specific directory (folder) in Google Drive, use the 'magic command' `%cd`.

* **To change working directory**
> `%cd directory_path`

You can just run the code cell below after changing `your_own_dir` to the folder name or subpath you want.

In [0]:
%cd ./drive/'My Drive'/'your_own_dir'
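
The working directory can also be changed from plain Python, which makes it easy to check first that the folder exists (for instance, Drive may not be mounted yet or the folder name may be misspelled). A small sketch is shown below. The helper `change_dir` is hypothetical, and a temporary folder stands in for the Drive folder so that the example runs anywhere.

```python
import os
import tempfile


def change_dir(path: str) -> str:
    """Change the working directory if `path` exists; return the new directory."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"No such folder: {path!r} (is Drive mounted?)")
    os.chdir(path)
    return os.getcwd()


original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # In Colab, `tmp` would be replaced by a path under /content/drive.
    print(change_dir(tmp))
    change_dir(original_dir)  # go back before the temporary folder is deleted
```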

※ *Magic commands are used when you need to control the system from the Python kernel or to run commands in other languages (Java, R, etc.).
<br>Some shell commands are also available as magic commands, and magic commands are prefixed with the `%` symbol.*

※ *The authorization process for mounting Drive is required every time you connect to a new runtime. It can be understood as a kind of log-in sequence.*

## Reference (Colaboratory)

* [Colab official introduction page](https://colab.research.google.com/notebooks/welcome.ipynb) - More details on Colab are described on this page.
* [Basic guide to GitHub-style Markdown syntax](https://help.github.com/en/articles/basic-writing-and-formatting-syntax) - It may be useful if you are not familiar with Markdown.
* [More on magic commands](https://nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb) - Explanations of the magic commands provided in Jupyter Notebook (IPython environment).