# PyWren RISECamp, 2017

Welcome to the hands-on tutorial for PyWren.

This tutorial consists of a set of exercises that will have you working directly with PyWren:
- basic exercises that introduce you to PyWren APIs (covered in this notebook)
- data analysis on a Wikipedia dataset (see analyze-wikipedia.ipynb)
- matrix multiplication with PyWren (see matrix.ipynb)
- hyperparameter optimization (see hyperparameter-optimization.ipynb)




## Introduction to PyWren

(You can find solutions for this notebook at:
https://github.com/ucbrise/risecamp/tree/master/pywren/solution/pywren-intro-solution.ipynb)

For this tutorial, we have already installed PyWren in the Docker container where this Jupyter notebook is running.
PyWren includes a command line tool for basic setup tasks such as creating AWS IAM roles, configuring the PyWren environment, and deploying or updating Lambda functions. We have already done that setup for you.

Before we go into the exercises, let's use the command line tool to test if PyWren works properly. 

**Run the cell below (select the cell, click Cell -> Run Cells or use Ctrl + Enter).**

If PyWren is correctly installed, you should see ***`function returned: Hello world`*** after a few seconds.

In [None]:
!pywren test_function

The above command essentially invokes a PyWren task that executes on AWS Lambda. The task simply returns `Hello world` back to our PyWren host. We'll show you how to do exactly that in a minute.

First and foremost, let's create a PyWren **Executor** that we will use throughout this notebook.

In [None]:
import pywren
pwex = pywren.default_executor()

## 1. call_async() -- PyWren's single invocation API

A PyWren Executor exposes two main APIs for remote execution, the first one being ***call_async()***, which runs a single PyWren task on AWS Lambda. `call_async()` takes two parameters: a user-provided function and a parameter to pass to that function.

Once called, it returns a ***future*** object that allows you to query the task status, get ***result()***, etc.
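
The future pattern is not specific to PyWren. As a rough local analogy (not PyWren's implementation), Python's standard `concurrent.futures` module exposes the same idea: submitting a function returns a future right away, and `result()` blocks until the value is ready. The function `greet` and the pool below are illustrative names only, and everything runs in a local thread instead of on AWS Lambda.

```python
from concurrent.futures import ThreadPoolExecutor

def greet(name):
    # stand-in for a user-defined function
    return "Hello " + name

with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns a future immediately, much like call_async()
    local_future = pool.submit(greet, "world")
    # result() blocks until the task has finished, then returns its value
    greeting = local_future.result()

# after result() has returned, the future reports that it is done
print(greeting, local_future.done())
```

Calling `result()` a second time on the same future does not rerun the task; it returns the stored value.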

***Exercise:*** Modify the code below to get the correct result from an execution.

In [None]:
# this is the user-defined function that we will pass to call_async()
def hello_world(param):
    if param == 42:
        return "Hello world!"

# once called, PyWren executes the user function with the parameter on AWS Lambda and returns a future object
future = pwex.call_async(hello_world, None)
# we can call result() to fetch output of execution
print(future.result())
# check if result is correct
assert future.result() == 'Hello world!'

## 2. map() -- parallel execution in the cloud
The above example executes a single function once in the cloud. This is pretty neat, but PyWren really shines when we want to run a function many times in parallel.
To do this, we can use the PyWren executor's second API: ***map()***. `map()` allows users to call a function over multiple parameters, just like the `map()` Python API.

In Python, the `map()` API applies the same function to each item in an iterable. The returned object can then be passed to `list()` or `set()` to obtain the results. For example: 

In [None]:
param_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
def square(param):
    return param * param

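# in Python 3, map() returns a lazy iterator, so wrap it in list() to materialize the results
results_with_python_map = list(map(square, param_list))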
print(results_with_python_map)

PyWren Executor's `map()` API is not much different, except now the passed function runs on a cloud service.

***Exercise:*** Update the code below so the results with PyWren are the same as above.

In [None]:
futures = pwex.map()
results_with_pywren_map = [f.result() for f in futures]

print(results_with_pywren_map)

assert results_with_pywren_map == results_with_python_map

One caveat above is that `result()` is called serially, one future at a time. This can be inefficient with a large number of parallel tasks. In PyWren, we provide a convenient API ***pywren.wait()*** to wait on all tasks to finish. For example:

In [None]:
pywren.wait(futures)

results_with_pywren_map = [f.result() for f in futures]

assert results_with_pywren_map == results_with_python_map

Because the tasks behind these futures have already finished, the above code should return immediately.

We also have ***pywren.get_all_results()***, a convenience that combines `wait()` and fetching the results in a single call.

In [None]:
results_with_pywren_map = pywren.get_all_results(futures)
assert results_with_pywren_map == results_with_python_map
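
The wait-then-collect pattern can also be tried locally. The sketch below uses the standard `concurrent.futures` module as a stand-in (an assumption for illustration only; `pywren.wait()` has its own signature and return values). It blocks once until every future has finished, then collects the results in submission order. The name `square_local` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def square_local(param):
    return param * param

with ThreadPoolExecutor(max_workers=4) as pool:
    local_futures = [pool.submit(square_local, p) for p in range(10)]
    # block once until every task has finished
    done, not_done = wait(local_futures, return_when=ALL_COMPLETED)
    # all futures are complete, so each result() call returns without waiting
    local_results = [f.result() for f in local_futures]

print(local_results)
```

Collecting from the original list, not from the `done` set, keeps the results in the same order as the inputs.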

## 3. Multiple jobs

Putting things together, we can use `map()` to execute a function over an iterable of parameters in parallel in the cloud.
Then we can call `pywren.get_all_results()` to fetch all results.
Because `map()` returns as soon as all tasks have been invoked, we can switch to other work before calling `pywren.get_all_results()`, which blocks until the results are ready. We could even invoke another PyWren job.

In the exercise below, we want to verify the distributive law of matrix-vector multiplication, i.e. A(x+y) = Ax + Ay. To do that, we invoke two PyWren jobs, one computing 50 instances of A(x+y) and the other computing 50 instances of Ax + Ay. Because we pass the same random seeds to both jobs, they generate the same A, x, and y, so by the distributive law their results should match up to floating-point rounding.
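
Before running the full-size jobs, the identity can be checked locally on a small instance. This is a minimal sketch with sizes chosen only for illustration (4 x 8 in place of 1024 x 131072). It also shows why the comparison uses a tolerance: the two sides are computed with different sequences of floating-point operations, so they are not guaranteed to be bit-for-bit identical.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed so the run is repeatable
A = rng.normal(0, 1, (4, 8))     # small sizes chosen only for illustration
x = rng.normal(0, 1, 8)
y = rng.normal(0, 1, 8)

left = np.dot(A, x + y)               # A(x + y)
right = np.dot(A, x) + np.dot(A, y)   # Ax + Ay

# compare with a tolerance, since the two sides may differ by rounding error
print(np.allclose(left, right))
```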

In [None]:
import numpy as np

def multiply_1(seed):
    np.random.seed(seed)
    A = np.random.normal(0, 1, (1024, 131072))
    x = np.random.normal(0, 1, 131072)
    y = np.random.normal(0, 1, 131072)
    return np.dot(A, x+y)

def multiply_2(seed):
    np.random.seed(seed)
    A = np.random.normal(0, 1, (1024, 131072))
    x = np.random.normal(0, 1, 131072)
    y = np.random.normal(0, 1, 131072)
    return np.dot(A, x) + np.dot(A, y)

futures_1 = pwex.map(multiply_1, range(50))
futures_2 = pwex.map(multiply_2, range(50))

results_1 = pywren.get_all_results(futures_1)
results_2 = pywren.get_all_results(futures_2)

assert np.all(np.isclose(np.stack(results_1), np.stack(results_2)))

## 4. Visualization and Debugging
You have probably been wondering where time is spent during a PyWren job. Here we provide a method to plot the execution graph of a PyWren job for you. Let's use the futures from the matrix multiplication exercise as an example.

In [None]:
# load the plotting method
from training import plot_pywren_execution

In [None]:
plot_pywren_execution(futures_1 + futures_2)

You can see that the tasks are submitted in two batches. Each batch belongs to one PyWren job. You can also see that both jobs are indeed running in parallel! 

This concludes our introductory exercises. Now it is time to try out more challenging tasks with PyWren!