<table class="m01-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/yy/netsci-course/blob/master/m01-getready/Python%20Introduction%20Assignment.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a href="https://github.com/yy/netsci-course/blob/master/m01-getready/Python%20Introduction%20Assignment.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View on Github</a>
  </td>
</table>

# Setting up Python environments

## Why?

Setting up and maintaining Python environments can be painful! The Python project has not been particularly good at dependency management. Python is easy to pick up and start coding in, but that ease probably contributes to the messiness of the Python ecosystem. There are lots of users and use cases. For use cases where performance is vital (e.g., machine learning and scientific computing), people have had to write performance-critical code in C/C++/Fortran and wrap it in Python, because Python is a slow, interpreted language. This is a good thing, but it makes dependency management even more complicated.

<img src="https://yyiki.s3.us-east-2.amazonaws.com/public/imgs/xkcd-my_python_environment_so_degraded.png" width="400px" />

(Python Environment, by Randall Munroe, https://xkcd.com/1987/)

Although learning how to set up and manage Python environments is not the most exciting thing, it is often a critical skill for a data scientist. You may be lucky enough to work with a dedicated DevOps team that takes care of this for you, or you may be in a situation where you can exclusively use cloud-based services like Google Colab, but you may also be in a situation where you have to set up your own environment, to use cutting-edge packages or due to the constraints of your organization! 

In such situations, your success may depend less on how well you write code or understand your data and statistical methods, and more on how well you can set up your environments, manage dependencies, and install critical packages. Learning this can also help you understand how your computer and Python work under the hood. Therefore, I encourage you to learn how to set up and manage Python environments and packages.

At the same time, struggling to learn how to manage Python environments on top of completing the weekly assignments may be too much, especially if you are new to programming. So, while I encourage everyone to learn the basics of Python environment management, it will not be a requirement. Please do feel free to use Google Colab or other cloud-based services.

## A general principle: use virtual environments!

First, this principle is not just for Python; it applies to any programming language or software environment.

Imagine a data scientist, Alice, who works on two very different projects. In one project, she is working on a machine learning model that requires a cutting-edge package, which in turn depends on the most recent versions of other foundational packages (e.g., the latest version of `numpy`). In another project, she is debugging and maintaining a legacy codebase that breaks if she uses a recent version of `numpy`. What should she do in this situation?
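
One answer, following the principle in this section's title, is to give each project its own virtual environment so that each can pin its own `numpy` version. Below is a minimal sketch using the standard-library `venv` module. The helper `create_project_envs` and the project names `ml-model` and `legacy-code` are hypothetical, made up for illustration, and the environments are created in a temporary directory without `pip` so that the sketch runs quickly and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_project_envs(base_dir, project_names):
    """Create one isolated virtual environment per project under base_dir."""
    env_dirs = {}
    for name in project_names:
        env_dir = Path(base_dir) / name / ".venv"
        # with_pip=False keeps this sketch fast and offline;
        # in practice you would likely pass with_pip=True.
        venv.create(env_dir, with_pip=False)
        env_dirs[name] = env_dir
    return env_dirs


# Hypothetical project names for Alice's two projects.
base = tempfile.mkdtemp()
envs = create_project_envs(base, ["ml-model", "legacy-code"])
for name, path in envs.items():
    print(name, "->", path)
```

Each environment has its own `pyvenv.cfg` and its own package directory, so installing a new `numpy` into one of them does not touch the other.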




## Anaconda

- [Download Anaconda](https://www.anaconda.com/products/distribution)

Anaconda is one of the de-facto standard Python distributions for data science. It comes with a lot of packages pre-installed and allows you to easily install scientific packages that are not written in Python. 

It also comes with a package manager called `conda` that allows you to create multiple Python environments (with potentially different Python versions). 

[Anaconda](https://www.anaconda.com/products/distribution) is free to download and use. It is usually the easiest way to install and maintain the Python packages you need for data analysis and visualization, regardless of your platform.


After installing it, you can keep it up to date with `conda`:

```sh
conda update conda
conda update anaconda
```

With `conda`, you can also install many Python packages. For instance:

```sh
conda install pandas
```


1. If you have not already, download and install Anaconda (Python 3) on your laptop. If you are more comfortable installing Python packages yourself with `pip`, you can certainly do that too.

1. Run `jupyter notebook` from the terminal. What do you see? Can you create a new notebook?

1. Run the "Hello world!" code below in the notebook you created.

```python
print('Hello world!')
```

1. Read the following tutorials and run any code that you do not understand

* https://docs.python.org/3.7/tutorial/introduction.html 
* https://docs.python.org/3.7/tutorial/controlflow.html

1. Import `networkx`.

```python
import networkx as nx
```

It should already be installed with Anaconda. If it is not, you will get an error message; in that case, install it from the command line with:

   ```sh
   conda install networkx
   ```

If running the import command produces no output, then you should be good to continue.

1. Rename your notebook to `pysetup_lastname_firstname` and submit it to Canvas in the Python Setup assignment section.
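
The import check in the `networkx` step can also be done without triggering an error message. The sketch below uses the standard-library `importlib.util.find_spec` to ask whether a top-level module can be found. The helper name `is_installed` is hypothetical, not part of any library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "networkx" depends on your environment.
for name in ["json", "networkx"]:
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `networkx` is reported as missing, install it as described in the `networkx` step and run the check again.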

# Installing Python for data analysis and visualization

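## Anaconda

Anaconda, described in the section above, is usually the easiest way to install and maintain the packages you need.
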
## Without Anaconda

If you use Mac or Linux and do not want to use Anaconda, you can install packages with `pip`, Python's package installer. Install Python using [homebrew][brew], [pyenv](https://github.com/pyenv/pyenv), or the [official Python download][python-download], then use `pip` (or `pip3`) to install the necessary packages. You can run

     pip3 install numpy scipy networkx jupyter jupyterlab ipython pandas matplotlib seaborn bokeh scikit-learn

to install most of the packages that you will use for data analysis and visualization.
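
After running the `pip3 install` command above, you may want to confirm what was actually installed. The following is a minimal sketch, assuming Python 3.8 or later, that looks up each package's version with the standard-library `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as given to pip in the command above.
PACKAGES = [
    "numpy", "scipy", "networkx", "jupyter", "jupyterlab", "ipython",
    "pandas", "matplotlib", "seaborn", "bokeh", "scikit-learn",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Note that this looks up the name used with `pip`, which can differ from the name used in `import` statements (for example, `scikit-learn` is imported as `sklearn`).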

## JupyterLab 

Once you have JupyterLab (or Jupyter Notebook) installed, you can simply run

    jupyter lab

or

    jupyter notebook

in the shell, or use the launcher (Anaconda creates a shortcut) to start it. A browser window will appear and show the Jupyter interface. From here, you can create your own notebooks and open existing ones.

# Cloud options

There are also a variety of cloud-based options for running Jupyter notebooks. Probably the best option is Google Colaboratory (Colab), which gives you a virtual machine and a Jupyter notebook. With Colab, you are less likely to suffer from dependency issues, and you can work whenever you have access to the web.

## Google Colaboratory

- https://colab.research.google.com/notebook


[conda]: https://www.anaconda.com/products/distribution
[python-download]: https://www.python.org/downloads/
[brew]: http://brew.sh/
