# Create your data-science environment

In this exercise, we ask you to create your own data-science environment from scratch! By the end of it, you will be able to start a Jupyter Notebook and fit a model (or do any other data-related work you need) while saving your files on your computer.

We provide guidelines to help you get there, but never forget that there is more than one way to do it. Your container may differ from your neighbour's, and that is okay.

Last but not least, this exercise is meant to make you practice Docker by building your container from a basic image. Some of you will find it easier to pull already existing images, and you would be right: there are really good images available out there. Still, we ask you to follow the instructions in order to get the most out of learning Docker.

## First part

In this first part, we are going to play a little bit with Docker.

1) Create a new folder with a `Dockerfile`.

2) Find a Python base image to start with on <a href="https://hub.docker.com/" target="_blank">Docker Hub</a>. Start filling your `Dockerfile` with the first instruction `FROM ...`.

You can start with version `3.9.0` or `3.8.0`.
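If you want to check that a tag exists before writing your `FROM` line, a quick sketch (assuming Docker is installed and the `3.9.0` tag is still published on Docker Hub) is to pull the image and ask its interpreter for its version:

```shell
# Download the official Python image with the 3.9.0 tag
docker pull python:3.9.0

# Start a throwaway container (--rm deletes it on exit) and print the Python version
docker run --rm python:3.9.0 python --version
```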

3) Create a `requirements.txt` next to the `Dockerfile`. This simple text file can be used by `pip` (the Python package manager) to bulk install packages.

Here is an example of what you could put inside:

```text
jupyterlab
pandas
numpy
scikit-learn
matplotlib
seaborn
```

Make sure to include at least `jupyterlab`!

4) Update the `Dockerfile` so that it installs all the libraries listed in `requirements.txt`. Then build your image. Do not forget to give it a name; it is good practice.
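<details>
    <summary>
        Stuck?
    </summary>

A minimal sketch, assuming your `Dockerfile` copies `requirements.txt` into the image (`COPY requirements.txt .`) and installs it with `RUN pip install -r requirements.txt`. The image name `my-ds-env` is a hypothetical choice; use your own:

```shell
# Run from the folder that contains the Dockerfile and requirements.txt.
# -t names (tags) the image; the final "." is the build context.
docker build -t my-ds-env .

# List images with that name to check that the build succeeded
docker images my-ds-env
```

</details>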

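5) Run it. What happens? Nothing? That is expected! We have not set our own `CMD` in the `Dockerfile`, so the container runs the base image's default command (the Python interpreter), which exits immediately because its standard input is not attached. We have not passed a command on the command line either. Find out how to run `/bin/bash` inside your container!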

To save yourself some headaches in the next steps, we strongly advise you to add the argument `-p 8888:8888`, which publishes port `8888` of the container on port `8888` of your computer. This is the port Jupyter Lab uses by default to communicate with your browser.

<details>
    <summary>
        Stuck?
    </summary>
    You can add the <code>-it</code> flags (interactive mode with a terminal attached).
</details>
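<details>
    <summary>
        Still stuck?
    </summary>

A sketch of the full command, assuming you tagged your image `my-ds-env` (a hypothetical name):

```shell
# -i keeps STDIN open and -t allocates a terminal, so bash stays interactive;
# -p publishes container port 8888 on port 8888 of your computer;
# /bin/bash overrides the image's default command.
docker run -it -p 8888:8888 my-ds-env /bin/bash
```

</details>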

6) You should now be in your container's terminal. Create a directory called `notebooks` and `cd` into it.
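Inside the container this is plain shell. A small sketch; `mkdir -p` is used so the command does not fail if the folder already exists:

```shell
# Create the folder (no error if it already exists) and move into it
mkdir -p notebooks && cd notebooks
```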

7) Finally, let's run Jupyter Lab!

<details>
    <summary>
        Stuck?
    </summary>
    To do this, the base command is <code>jupyter lab</code>. However, if you run it as-is, it won't work.
    <details>
        <summary>
            Stuck?
        </summary>
        You need to specify the <code>--ip=*</code> flag (so the server listens on all network interfaces and the published port can reach it) as well as <code>--port=8888</code>.
        <details>
            <summary>
                Stuck?
            </summary>
            And finally, add the <code>--allow-root</code> flag, since you are running as root inside the container.
        </details>
    </details>
</details>
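<details>
    <summary>
        Still stuck?
    </summary>

Putting the hints together gives a command like the sketch below. It assumes you are still in the `notebooks` folder inside the container, and it uses `--ip=0.0.0.0` (all IPv4 interfaces), a common alternative to `--ip=*` that cannot be expanded by the shell as a glob pattern:

```shell
# Listen on all interfaces so the port published with -p reaches the server,
# serve on port 8888, and allow running as root (the container's default user)
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root
```

</details>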

You should see output similar to the following:

```shell
[W 14:03:59.578 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 14:03:59.623 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 14:03:59.623 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 14:03:59.630 NotebookApp] Serving notebooks from local directory: /opt/notebooks
[I 14:03:59.632 NotebookApp] The Jupyter Notebook is running at:
[I 14:03:59.635 NotebookApp] http://ced8d85f8136:8888/?token=c59a323f085e6a889b398704e48948de9d752a4987afa721
[I 14:03:59.637 NotebookApp]  or http://127.0.0.1:8888/?token=c59a323f085e6a889b398704e48948de9d752a4987afa721
[I 14:03:59.639 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 14:03:59.651 NotebookApp] No web browser found: could not locate runnable browser.
[C 14:03:59.652 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/nbserver-44-open.html
    Or copy and paste one of these URLs:
        http://ced8d85f8136:8888/?token=c59a323f085e6a889b398704e48948de9d752a4987afa721
     or http://127.0.0.1:8888/?token=c59a323f085e6a889b398704e48948de9d752a4987afa721
```

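Copy the URL that starts with `http://127.0.0.1:8888` and paste it into your browser (the other one uses the container's internal hostname, which your computer cannot resolve). It should open Jupyter Lab!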

## Second part

Wouldn't it be useful to start Jupyter Lab inside your Docker container with just one command? Yes!

1) Update your `Dockerfile` to add the appropriate instructions.

<details>
    <summary>
        Stuck?
    </summary>
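    You can use something like this:

```dockerfile
FROM python:3.9.0

# Install the Python libraries listed in requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt

# Default working directory: notebooks are served from here
WORKDIR /notebooks

# Document the port Jupyter Lab listens on (you still need -p when running)
EXPOSE 8888

# Exec form: runs Jupyter directly instead of through /bin/sh -c
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--allow-root", "--no-browser"]
```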
</details>

2) Build and run your newly updated image.

<details>
    <summary>
        Stuck?
    </summary>
    You still need to publish the port when running the container using <code>-p 8888:8888</code>.
</details>
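<details>
    <summary>
        Still stuck?
    </summary>

A minimal sketch, assuming the `Dockerfile` from the previous hint and the hypothetical image name `my-ds-env`. No `/bin/bash` is needed this time, because the `CMD` instruction starts Jupyter Lab:

```shell
# Rebuild the image so it picks up the new WORKDIR, EXPOSE and CMD instructions
docker build -t my-ds-env .

# Start a container; -p publishes Jupyter's port on your computer
docker run -p 8888:8888 my-ds-env
```

</details>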

3) You can play with it a bit. Once you are done playing, find a way to stop your container.
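<details>
    <summary>
        Stuck?
    </summary>

One way to do it, sketched under the assumption that your image is called `my-ds-env` (a hypothetical name): open another terminal, find the container, and stop it.

```shell
# List running containers with their IDs, images and names
docker ps

# Stop every running container started from the my-ds-env image
docker stop $(docker ps -q --filter ancestor=my-ds-env)
```

</details>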

4) We have one thing left to do: mount a folder from your computer into the container so our notebooks are saved on our computer. To do so, you can run your container with the `-v` parameter. Can you figure out how to fill it in?

<details>
    <summary>
        Stuck?
    </summary>
    The volume should be specified like this: <code>-v SOURCE:TARGET</code>, where SOURCE is the folder on your computer you want to share and TARGET is the path where SOURCE appears inside the container. Use an absolute path for SOURCE (for example, one built with <code>$(pwd)</code>).
</details>
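<details>
    <summary>
        Still stuck?
    </summary>

A sketch for Linux or macOS shells, assuming the image is called `my-ds-env` (hypothetical) and uses `WORKDIR /notebooks` as in the earlier hint. It shares a `notebooks` folder from the current directory:

```shell
# Create the folder on your computer if it does not exist yet
mkdir -p notebooks

# Bind-mount it onto /notebooks in the container; $(pwd) gives an absolute path
docker run -p 8888:8888 -v "$(pwd)/notebooks":/notebooks my-ds-env
```

Notebooks you save in Jupyter Lab now land in `notebooks/` on your computer and survive the container being removed.
</details>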

🎇 Congratulations! You have created a full data-science environment that you could run anywhere, even on a remote server if you would like to.

If you wish to continue, here are some suggestions for follow-up:

- Push your `Dockerfile` to a GitHub repository where you store all your environments
- Add more libraries
- Put your container on a remote server