<div align="center">
  <img src="http://vlpavlov.org/Pythagoras-Logo3.svg"><br>
</div>

# Introduction to Swarming: Using Pythagoras in P2P Mode

This tutorial explains the core Pythagoras constructs, which let you parallelize your Python programs and execute them in the cloud with just a few extra lines of code. This can significantly speed up computationally expensive calculations.

Pythagoras supports many alternative deployment models. One of them is P2P (***peer-to-peer***) deployment, which lets you parallelize program execution by simply launching the program simultaneously on a swarm of local computers (e.g. desktops and laptops in your office or dormitory). This tutorial explains how to use Pythagoras with the P2P backend.

### Initial Setup

First, let's install and import Pythagoras:

In [1]:
!pip install pythagoras --quiet

In [2]:
from pythagoras import *

### Hello, P2P World! 

There is only one key class that we need to use in a P2P Pythagoras program:
    
* **SharedStorage_P2P_Cloud**: objects of this class represent a virtual cloud; they synchronize execution of multiple instances of your code via a shared folder (e.g. a Dropbox or NFS folder).

In [3]:
# my_project is the name of a shared folder. You can run the same program on multiple computers.
# If they all have a shared folder (e.g. shared via Google Drive), 
# then concurrently running instances of your program 
# will be able to coordinate work with each other through this folder.

my_cloud = SharedStorage_P2P_Cloud(shared_dir_name = "my_project")

The **SharedStorage_P2P_Cloud.add_pure_function** decorator registers your function with a Pythagoras cloud. Once registered, a function gets a few new capabilities, which we discuss below.

Not every function can be added to a Pythagoras cloud. There are three key requirements:
* a function must be [pure](https://en.wikipedia.org/wiki/Pure_function): fully deterministic, with no side effects, and with an output value that depends solely on its input values;
* a function must not raise exceptions;
* a function may accept only keyword parameters; positional parameters are forbidden.
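
The keyword-only requirement can be checked with the standard library alone. The sketch below is only an illustration: the helper `accepts_only_keyword_arguments` and the two sample functions are hypothetical names, not part of Pythagoras, and nothing here shows how Pythagoras itself validates functions.

```python
import inspect


def accepts_only_keyword_arguments(func) -> bool:
    """Return True if every parameter of func is keyword-only."""
    params = inspect.signature(func).parameters.values()
    return all(p.kind is inspect.Parameter.KEYWORD_ONLY for p in params)


def keyword_only(*, x: int, y: int = 0) -> int:
    # The bare * forces callers to write keyword_only(x=1, y=2).
    return x + y


def positional_allowed(x: int, y: int = 0) -> int:
    # Callers may write positional_allowed(1, 2), so this fails the check.
    return x + y


print(accepts_only_keyword_arguments(keyword_only))        # True
print(accepts_only_keyword_arguments(positional_allowed))  # False
```

Placing a bare `*` at the start of the parameter list, as `very_slow_function` does in the next cell, is the usual way to make every parameter keyword-only.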

In [4]:
@my_cloud.add_pure_function
def very_slow_function(*, important_parameter:int):
    """     >>>>>       THIS FUNCTION RUNS FOR ABOUT AN HOUR       <<<<<     """
    return important_parameter**2

Turning a regular function into a cloud-hosted function brings several benefits. The most important are (1) cloud-based memoization (caching), (2) cloud-based remote execution, and (3) cloud-based parallelization. Let's take a closer look at two of them, memoization and parallelization:

### Cloud-based Memoization

The first time we run a slow function with a specific combination of input arguments, Pythagoras stores the function output in a cache. The next time we run the function with exactly the same input arguments, there is no need to actually execute it; the output is retrieved from the cache.

The cache is cloud-based. This means we can run the function once on any computer (either local or cloud-based) and then reuse the cached output on any other computer.
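
To make the idea concrete, here is a minimal sketch of folder-backed memoization, assuming results are stored as pickle files keyed by a hash of the function name and its keyword arguments. This is not Pythagoras' implementation; `folder_memoize`, `slow_square`, and `call_log` are hypothetical names, and a temporary directory stands in for a folder shared between computers.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def folder_memoize(cache_dir):
    """Cache results of a keyword-only function as files in cache_dir."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            # Build a stable key from the function name and sorted arguments.
            key_bytes = pickle.dumps((func.__name__, sorted(kwargs.items())))
            path = cache_dir / (hashlib.sha256(key_bytes).hexdigest() + ".pkl")
            if path.exists():
                # Cache hit: skip execution and load the stored output.
                return pickle.loads(path.read_bytes())
            result = func(**kwargs)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


call_log = []  # records real executions; used only to demonstrate cache hits
demo_dir = tempfile.mkdtemp()  # stand-in for a folder shared between computers


@folder_memoize(demo_dir)
def slow_square(*, important_parameter: int) -> int:
    call_log.append(important_parameter)
    return important_parameter ** 2


first = slow_square(important_parameter=22)   # executes the function
second = slow_square(important_parameter=22)  # served from the folder cache
print(first, second, call_log)  # 484 484 [22]
```

Because the cache lives in files rather than in process memory, any other process that sees the same folder would get the cached value too, which is the property the tutorial relies on.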

In [5]:
# The first execution is very slow: about an hour

very_slow_function( important_parameter=22 ) 

484

In [6]:
# The second execution is very fast: a fraction of a second
    
very_slow_function( important_parameter=22 )

484

In [7]:
# If the function was executed in the past on another computer with important_parameter=99,
# the execution will now be very fast.
#
# However, if the function was never executed on this or another computer with important_parameter=99,
# then this execution will be very slow. All subsequent executions with important_parameter=99
# will be fast.

very_slow_function( important_parameter=99 ) 

9801

### Cloud-based Parallelization

Pythagoras makes it possible to seamlessly parallelize loops that execute the same function with different combinations of input values.

In [8]:
# The first time we execute this code, it will take about 10 hours to run. 
# Of course, all the subsequent executions will be very fast because of memoization.
# But what if we wanted to speed up even the very first execution? 

results = []

for i in range(10):
    results.append(   very_slow_function( important_parameter=i )   )
    
results   

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [9]:
# Here we use a list comprehension to illustrate exactly the same scenario as above:
# The first execution will take 10 hours; subsequent executions will be fast.
# But we do not want to wait 10 hours even the first time we run this code.

[   very_slow_function( important_parameter=i ) for i in range(100, 110)   ]

[10000, 10201, 10404, 10609, 10816, 11025, 11236, 11449, 11664, 11881]

In [10]:
# Pythagoras offers a solution. The example below shows how to 
# simultaneously launch multiple instances of a function in the cloud.
# Calculations will be done in parallel.

# Depending on how many computers you have in your swarm (P2P cloud), the execution will take
# less time even for the first run, when the outputs are not cached yet.
# For example, if you have 2 computers in your swarm, the code below will take about 5 hours to run instead of 10.
# If you have 4 computers in your swarm, the code below will take between 2 and 3 hours to run.

very_slow_function.sync_parallel(   kw_args( important_parameter=i ) for i in range(2000,2010)   )

[4000000,
 4004001,
 4008004,
 4012009,
 4016016,
 4020025,
 4024036,
 4028049,
 4032064,
 4036081]

In [11]:
# Of course, all the outputs are stored in the cache. 
# When we run the same code for the second time (no matter if on this or on another computer),
# it will only take a fraction of a second to execute.

very_slow_function.sync_parallel(   kw_args( important_parameter=i ) for i in range(2000,2010)   )

[4000000,
 4004001,
 4008004,
 4012009,
 4016016,
 4020025,
 4024036,
 4028049,
 4032064,
 4036081]

The prefix *sync* in .sync_parallel(...) means that the remote execution is synchronous: the local program waits until all remote functions complete and send back their results. Only then does the execution flow on the local computer resume.
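
The blocking behaviour can be illustrated on a single machine with the standard library. In this sketch, threads stand in for the computers of the swarm; `sync_parallel_sketch` and `slow_square` are hypothetical names and do not reflect how Pythagoras implements `.sync_parallel(...)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(*, important_parameter: int) -> int:
    time.sleep(0.05)  # stands in for a long computation
    return important_parameter ** 2


def sync_parallel_sketch(func, list_of_kwargs, max_workers=4):
    """Run func once per kwargs dict in parallel; block until all finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **kwargs) for kwargs in list_of_kwargs]
        # .result() blocks, so this returns only after every call completes,
        # and the outputs come back in the same order as the inputs.
        return [future.result() for future in futures]


outputs = sync_parallel_sketch(
    slow_square, [dict(important_parameter=i) for i in range(2000, 2010)]
)
print(outputs[:3])  # [4000000, 4004001, 4008004]
```

The caller sees an ordinary list only after every task is done, which is what makes a synchronous call a drop-in replacement for a sequential loop.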

Alternatively, it is possible to initiate parallel remote execution asynchronously. There is an .async_parallel(...) construct for that scenario, which we do not discuss in this tutorial.

### Deployment and Execution

To deploy and execute this notebook on a small group of computers, follow this simple process:
1. Use one of the popular file-sharing services to create a shared folder;
2. Make sure the file-sharing service agent is installed on all computers in your swarm,
and the folder that you created is shared across all the computers;
3. Make sure the *shared_dir_name* parameter of the SharedStorage_P2P_Cloud constructor
points to the actual name of the shared folder on each computer in the swarm;
4. Launch this notebook on each of the computers in the swarm, and select the menu item "Kernel / Restart & Run All".

That's it. All the instances of your code, running on different computers, will automatically synchronize and distribute work via the shared folder.
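
One way to picture coordination through a folder is a "claim file" scheme, in which each worker marks a task as taken by creating a marker file exclusively. The sketch below is an assumption made for illustration only, not a description of how Pythagoras distributes work; `try_claim`, the `.claimed` suffix, and the worker names are all hypothetical. Synced folders such as Dropbox or Google Drive may not guarantee exclusive creation across machines, so a real system has to tolerate occasional duplicate work, which is harmless for pure functions.

```python
import os
import tempfile
from pathlib import Path


def try_claim(shared_dir, task_id: str, worker: str) -> bool:
    """Try to claim a task by creating its marker file exclusively."""
    marker = Path(shared_dir) / f"{task_id}.claimed"
    try:
        # O_EXCL makes creation fail if the marker already exists.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as marker_file:
        marker_file.write(worker)
    return True


shared = tempfile.mkdtemp()  # stand-in for the shared folder
tasks = [f"task_{i}" for i in range(4)]
claimed = {"worker_a": [], "worker_b": []}

# Two simulated workers take turns: worker_a scans the task list forward,
# worker_b scans it backward, and each keeps only the tasks it claims first.
for task_a, task_b in zip(tasks, tasks[::-1]):
    if try_claim(shared, task_a, "worker_a"):
        claimed["worker_a"].append(task_a)
    if try_claim(shared, task_b, "worker_b"):
        claimed["worker_b"].append(task_b)

print(claimed)
# {'worker_a': ['task_0', 'task_1'], 'worker_b': ['task_3', 'task_2']}
```

Each task ends up with exactly one owner, so the two workers split the list without talking to each other directly.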

### Summary

By adding a simple decorator in front of your Python function, you can turn it into serverless code. Another line of code replaces sequential loops with a parallel execution engine that distributes execution over a swarm of local computers. This is a very inexpensive yet efficient way to speed up complex computational tasks, such as multi-fold cross-validation, grid search for hyperparameter optimization, or feature selection algorithms.

Pythagoras caches function outputs. Such memoized functions run only once; all subsequent calls on any computer skip function execution and return previously computed values. This makes complex distributed algorithms cheap to rerun and easy to resume if they are interrupted.

### Conclusion

Pythagoras democratizes access to serverless compute for data scientists and other engineers who need to use Python for computationally expensive calculations. It makes engineers' lives simpler, while allowing them to solve more complex problems faster and with smaller budgets.

Pythagoras supports many alternative deployment options. P2P deployment offers a zero-cost DIY virtual cloud 
that allows small teams to significantly speed up their work by using local desktops and laptops for distributed computing. 