# Achieving 100% reproducibility

**2023.03.21, 2023.03.28**

In this hands-on session, we will continue to use the test repository you just created and make it 100% reproducible with the help of [mybinder.org](https://mybinder.org/).

## Goals

1. Add some executable code to your test repository, commit the changes, and push them to GitHub
2. Binderize the repository (i.e., make it runnable in the cloud)

## [MyBinder.org](https://mybinder.org/)

![myBinder](https://mybinder.org/static/logo.svg)
By Project Jupyter Contributors, [BSD-3-Clause license](https://github.com/jupyterhub/binderhub/blob/main/LICENSE)

[MyBinder.org](https://mybinder.org/) is an online service that allows users to share their code together with its computing environment. It reads the content, along with the files specifying the computing environment, from a supported repository (e.g., on GitHub), and then initializes a server with the necessary packages installed. Users can then access the fully executable *in-situ* content through a Jupyter-based GUI.

The technology behind MyBinder.org is called the **BinderHub**. You can find more details about it [here](https://mybinder.readthedocs.io/en/latest/introduction.html). Using the same technology, it is possible to create many independent online services that are similar to MyBinder.org.

BinderHub itself (the software behind services such as MyBinder.org) focuses on building a [Docker image](https://jfrog.com/knowledge-base/a-beginners-guide-to-understanding-and-building-docker-images/) from a repository. It then relies on JupyterHub (the software behind services such as Callysto Hub) to create the pod instance that serves the Docker image.

![BinderHub workflow](https://binderhub.readthedocs.io/en/latest/_images/architecture.png) 
By Project Jupyter Contributors, [BSD-3-Clause license](https://github.com/jupyterhub/binderhub/blob/main/LICENSE)

## Goal 1 procedure

[Callysto Hub portal](https://hub.callysto.ca/)

Go to the test repository you created in the [previous session](H2-FAIRdata). Create a new file named `scipy_test.py` with the following content:

```python
#### This sample code was auto-generated using Bing's ChatGPT AI (GPT-4).

import numpy as np
from scipy import optimize

# Define the function that we want to fit
def test_func(x, a, b):
    return a * np.sin(b * x)

# Generate some data with noise to fit
x_data = np.linspace(0, 4 * np.pi, 100)
y_data = 3.0 * np.sin(1.5 * x_data) + 0.5 * np.random.normal(size=100)

# Fit the data with the function
params, params_covariance = optimize.curve_fit(test_func, x_data, y_data, p0=[2, 2])

# Print the results
print(params)
```

You can try to execute the file from the terminal by typing `python scipy_test.py` and see how it goes. When this script runs, Python will try to load the `numpy` and `scipy` libraries, which are already installed on Callysto, so you should see the fitted parameters printed on the screen.
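Because the noise in `y_data` comes from `np.random.normal` without a fixed seed, the printed parameters will differ slightly every time the script runs. If you want bit-for-bit identical output, a minimal sketch is to draw the noise from a seeded generator. This assumes NumPy 1.17 or later (for `np.random.default_rng`); the seed value 42 is an arbitrary choice, and `sine_model` and `fit_noisy_sine` are hypothetical names for a renamed copy of the code above:

```python
import numpy as np
from scipy import optimize


def sine_model(x, a, b):
    """Same model as test_func in scipy_test.py: a * sin(b * x)."""
    return a * np.sin(b * x)


def fit_noisy_sine(seed):
    """Generate noisy sine data from a seeded generator and fit it."""
    rng = np.random.default_rng(seed)  # seeded generator: same noise for the same seed
    x_data = np.linspace(0, 4 * np.pi, 100)
    y_data = 3.0 * np.sin(1.5 * x_data) + 0.5 * rng.normal(size=100)
    params, _ = optimize.curve_fit(sine_model, x_data, y_data, p0=[2, 2])
    return params


if __name__ == "__main__":
    first = fit_noisy_sine(seed=42)
    second = fit_noisy_sine(seed=42)
    print(first)
    # Same seed, same data, same fit: the two results should match exactly
    print("identical:", np.array_equal(first, second))
```

Any fixed integer works as a seed; what matters for reproducibility is that it is written into the code rather than left to chance at run time.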

Now create a new Jupyter Notebook file named `ipyleaflet.ipynb` with the following content:

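```python
#### This sample code was modified from ipyleaflet's demo examples.
from ipyleaflet import Map, basemaps, basemap_to_tiles, Marker

center = (24.9677, 121.1870)  # (latitude, longitude) of the map center

m = Map(
    basemap=basemap_to_tiles(basemaps.OpenStreetMap.Mapnik),
    center=center,
    zoom=15,
)
marker = Marker(location=center, draggable=False)
m.add_layer(marker);  # the trailing semicolon suppresses this line's output
m  # the last expression in a cell is displayed, which renders the map
```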

You can try to run this Notebook, but it will most likely report an error because the `ipyleaflet` package is not pre-installed on Callysto. We have to install it manually, and here's one of the many methods:

```
!pip install ipyleaflet
```

Now you can restart the kernel and run the Notebook again to see the correct output. (*Note: you will need to install ipyleaflet every time you start a Callysto Hub session.*)
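If you would rather not retype the install command in every session, one option is to install the package from Python only when it is missing. This is a sketch under my own assumptions, not a Callysto feature: the helper name `ensure_package` is hypothetical, and it assumes the kernel's Python environment has `pip` available. It checks for the module with `importlib.util.find_spec` and runs pip through `sys.executable`, so the package is installed into the same environment as the running kernel:

```python
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Install a package with pip only if its module cannot be found.

    Returns True if an installation was run, False if the module was
    already available. pip_name covers packages whose pip name differs
    from the name used in the import statement.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # already installed in this kernel's environment
    # Run pip with the kernel's own interpreter so the package lands
    # in the environment the notebook is actually using.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    return True


# Hypothetical usage in the first cell of ipyleaflet.ipynb:
# ensure_package("ipyleaflet")
```

After a fresh install you may still need to restart the kernel, as noted above, before the map widget displays correctly.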

When you are done adding these files, commit the changes and push them to GitHub.

## Goal 2 procedure 

To binderize the repository, you need to add another file called `environment.yml` with the following content:

```yaml
name: binder-test

channels:
  - conda-forge
  - defaults
dependencies:
  - numpy
  - scipy
  - ipyleaflet
```

This file follows the [**YAML**](https://en.wikipedia.org/wiki/YAML) format and tells **conda** (one of the package managers Binder supports) which packages to install. Based on the code we added above, we will need `numpy`, `scipy`, and `ipyleaflet`. We will also need Python to execute the code, but listing Python here is optional because conda will install it as a dependency of all three packages.
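To double-check that `environment.yml` lists everything your code imports, you can collect the imported module names with Python's built-in `ast` module and compare them with the declared dependencies. This is a rough sketch: the `declared` set is copied by hand from the file above, only plain Python source can be parsed (for a notebook, that means its code cells without `!` shell commands), and import names do not always match conda package names (e.g., `sklearn` is installed as `scikit-learn`). Standard-library modules such as `os` would also show up as "missing" and can be ignored, so treat the result as a hint rather than a verdict:

```python
import ast


def top_level_imports(source):
    """Return the set of top-level module names imported by Python source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        # Skip relative imports (level > 0); they refer to local files
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


# Dependencies copied by hand from environment.yml (python itself is omitted)
declared = {"numpy", "scipy", "ipyleaflet"}

# Import lines from scipy_test.py and ipyleaflet.ipynb
code = """
import numpy as np
from scipy import optimize
from ipyleaflet import Map, basemaps, basemap_to_tiles, Marker
"""

missing = top_level_imports(code) - declared
print("missing from environment.yml:", missing or "none")
```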

Commit the change and push it to GitHub. Then go to [MyBinder.org](https://mybinder.org/) and enter the repository URL to launch a server with all the reproducible content from your repository:

![Binder](figs/Binder-scr1.png)

MyBinder will take some time to build the Docker image based on `environment.yml` and the other files in the repository. When it is finished, you should see a JupyterLab (or Jupyter Notebook tree view) GUI with all the content (specifically `scipy_test.py` and `ipyleaflet.ipynb`) ready to be executed!
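Instead of typing the repository URL into the form every time, you can also share a direct launch link. To my knowledge, the links generated by the MyBinder.org form follow the pattern `https://mybinder.org/v2/gh/<user>/<repo>/<ref>`, optionally with a `labpath` query parameter naming the file to open in JupyterLab. The sketch below builds such a link and the matching Markdown badge; `your-username` and `your-test-repo` are placeholders, and the helper names `binder_url` and `binder_badge` are hypothetical:

```python
from urllib.parse import quote


def binder_url(user, repo, ref="HEAD", labpath=None):
    """Build a MyBinder launch URL for a public GitHub repository."""
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    if labpath:
        # Ask JupyterLab to open this file once the server has started
        url += "?labpath=" + quote(labpath)
    return url


def binder_badge(url):
    """Return Markdown for the 'launch binder' badge pointing at url."""
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({url})"


url = binder_url("your-username", "your-test-repo", labpath="ipyleaflet.ipynb")
print(url)
print(binder_badge(url))
```

Pasting the badge into your repository's README gives visitors a one-click way to launch the same reproducible environment.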

## More resources

If you want to learn Python, there are tons of websites with a cloud-executable environment so that you can code in your browser. In addition to [Callysto's learning modules](https://www.callysto.ca/learning-modules/), executable-in-situ Python tutorials (free and paid plans) include, but are not limited to, [Brilliant](https://brilliant.org/), [Codecademy](https://www.codecademy.com/), and [DataCamp](https://www.datacamp.com/). You might also want to check out the unfinished *Learning Python with Jupyter* book at https://learnpythonwithjupyter.com/, which comes with downloadable Jupyter Notebook exercises. (*Disclaimer: I have no conflict of interest with any of the companies/organizations mentioned here.*)