<div align="center">
  <img src="http://vlpavlov.org/Pythagoras-Logo3.svg"><br>
</div>

# Pythagoras
## Introduction

This tutorial explains the core Pythagoras constructs, which allow everyone to easily parallelize their code and execute it in the cloud with just a few lines of extra code.

### Initial setup

First, let's install and import Pythagoras:

In [1]:
!pip install pythagoras --quiet

In [2]:
from pythagoras import *

### Hello, World! 

We need two key classes to create a basic Pythagoras program:
    
* **ServerlessCloud**: objects of this class are responsible for the actual connection to the cloud (AWS, GCP, Azure, etc.)
* **CloudModule**: objects of this class can store parallelized versions of your functions and execute them in the cloud

In [3]:
my_cloud = ServerlessCloud("some parameters")
my_cloud_module = CloudModule(cloud=my_cloud)

In [4]:
@my_cloud_module.add
def very_slow_function(*,important_parameter:int):
    """This function runs for about an hour"""
    return important_parameter**2

@my_cloud_module.add
def another_slow_function(*,best_ever_parameter:int):
    """This function runs for about an hour"""
    return best_ever_parameter**3

Once we have added all our slow functions to the module, we need to "finalize" it. This action pushes all registered functions to the cloud, so that they can benefit from seamless access to serverless compute.

In [5]:
my_cloud_module.finalize()

There are three main benefits of turning a regular function into a cloud-hosted function:
* Cloud-based memoization
* Cloud-based execution
* Cloud-based parallelization

Let's take a closer look:

### Cloud-based memoization

The first time we run a slow function with a specific combination of input values, Pythagoras stores the function output in a cache. The next time we run the function with exactly the same input values, there is no need to actually execute it: the output is retrieved from the cache.

The cache is cloud-based. This means that we can run the function once on any computer (either local or cloud-based) and then reuse the cached output on any other computer.

In [6]:
# First execution is very slow: over an hour
very_slow_function( important_parameter=2 ) 

4

In [7]:
# Second execution is very fast: a fraction of a second
very_slow_function( important_parameter=2 )

4

In [8]:
# If the function was executed with important_parameter=10 on another computer in the past,
# this execution will be very fast.
#
# However, if the function has never been executed with important_parameter=10 on this or
# any other computer, this execution will be very slow. All subsequent executions with
# important_parameter=10 will be fast.

very_slow_function( important_parameter=10 ) 

100
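The caching behavior described in this section can be illustrated with a minimal local sketch. This is not how Pythagoras is implemented: the `memoize_by_kwargs` decorator, the in-process `_cache` dictionary (a stand-in for the cloud cache), and `slow_square` are hypothetical names introduced only for illustration.

```python
import functools
import hashlib
import json

# Stand-in for the cloud cache: a plain in-process dict (assumption for illustration).
_cache = {}

def memoize_by_kwargs(func):
    """Cache results keyed by the function name plus its keyword arguments."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        # Serialize the arguments deterministically so equal inputs give equal keys.
        payload = json.dumps([func.__name__, kwargs], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in _cache:
            _cache[key] = func(**kwargs)  # first call: actually execute
        return _cache[key]                # later calls: return the stored output
    return wrapper

calls = {"count": 0}  # counts real executions of the function body

@memoize_by_kwargs
def slow_square(*, important_parameter: int) -> int:
    calls["count"] += 1
    return important_parameter ** 2

first = slow_square(important_parameter=2)   # executes the function body
second = slow_square(important_parameter=2)  # served from the cache
```

After both calls, `calls["count"]` is 1, because the second call never reaches the function body. A cloud-hosted cache extends the same idea across machines by keeping the key-to-output mapping in shared storage.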

### Cloud-based execution

In [9]:
# When we execute a function with a new combination of input parameters, it runs locally
very_slow_function( important_parameter=-8 )

64

In [10]:
# We can explicitly instruct a function to be executed in the cloud.
# If we have a slow computer, remote execution will be faster.
very_slow_function.remote( important_parameter=12345 )

152399025

In [11]:
# If the output of the function for a specific combination of inputs is available in the cache,
# no actual function execution will happen. The output will simply be retrieved from the cache.

very_slow_function.remote( important_parameter=12345 )

152399025

In [12]:
# If the function was executed with important_parameter=2 on a local or remote computer in the past,
# this execution will be very fast.
#
# However, if the function has never been executed with important_parameter=2 on this or
# any other computer, this execution will be slow. All subsequent executions with
# important_parameter=2 will be fast.

In [13]:
very_slow_function.remote( important_parameter=2 )

4

### Cloud-based parallelization

In [14]:
results = []
for i in range(5):
    results.append(   very_slow_function( important_parameter=i )   )
results

[0, 1, 4, 9, 16]

In [15]:
[   very_slow_function( important_parameter=i ) for i in range(5)   ]

[0, 1, 4, 9, 16]

In [16]:
very_slow_function.parallel(   remote( important_parameter=i ) for i in range(5)   )

[4, 0, 16, 1, 9]
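The output above is not in input order, which is consistent with results being collected as the individual calls finish. As a rough local analogy (an assumption, not the Pythagoras implementation), the same pattern can be sketched with the standard library's `concurrent.futures`; `run_parallel` and `slow_square` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(*, important_parameter: int) -> int:
    """Local stand-in for a slow cloud-hosted function."""
    return important_parameter ** 2

def run_parallel(func, kwargs_list, max_workers=5):
    """Run func once per kwargs dict and collect results in completion order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **kwargs) for kwargs in kwargs_list]
        # as_completed yields futures as they finish, not in submission order.
        return [future.result() for future in as_completed(futures)]

results = run_parallel(
    slow_square,
    [{"important_parameter": i} for i in range(5)],
)
```

Here `results` contains the values 0, 1, 4, 9, and 16, but their order may differ from run to run. If input order matters, `pool.map` or indexing the futures by their inputs would preserve it.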

### Summary of key capabilities

By adding a simple decorator in front of your Python function, you can turn it into serverless code that can run both locally and remotely. Another line of code replaces sequential loops with a parallel execution engine that simultaneously launches hundreds of serverless functions in the cloud. This is a good fit for complex computational tasks, such as multi-fold cross-validation, grid search for hyperparameter optimization, or feature selection algorithms.

For pure functions (fully deterministic functions with no side effects, whose output values depend solely on input values), Pythagoras provides cloud storage to cache function outputs. Such memoized functions run only once; all subsequent calls on any computer skip function execution and return previously computed values. This makes complex distributed algorithms cheap to rerun and easy to resume if they are interrupted.

Cloud storage is partially replicated on local computers, which allows Python scripts and notebooks to access stored values very quickly. Each piece of data is associated with its hash, which serves as a key to access the data. When some data (e.g., a large Pandas DataFrame) must be passed as an input to a serverless function, under the hood Pythagoras pushes the data to the cloud storage and passes only its hash to the function. This approach reduces the traffic associated with launching new instances of serverless functions in the cloud and significantly speeds up the process.
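The idea of passing a hash instead of the data itself can be sketched with a small hash-addressed store. This is only an illustration under stated assumptions: `HashAddressedStore`, its in-memory dictionary (standing in for cloud storage), and `worker` are hypothetical names, and the serialization and hashing scheme Pythagoras actually uses may differ.

```python
import hashlib
import pickle

class HashAddressedStore:
    """In-memory stand-in for cloud storage, keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, value) -> str:
        # Serialize the value and use the SHA-256 of the bytes as its key.
        blob = pickle.dumps(value)
        key = hashlib.sha256(blob).hexdigest()
        self._blobs[key] = blob
        return key

    def get(self, key: str):
        return pickle.loads(self._blobs[key])

store = HashAddressedStore()

big_input = list(range(100_000))  # stands in for a large DataFrame
key = store.put(big_input)        # push the data once, keep only its hash

def worker(input_key: str) -> int:
    """Receives only the short hash and fetches the data from the store."""
    data = store.get(input_key)
    return sum(data)

total = worker(key)
```

The worker is launched with a 64-character key rather than the full payload, and storing the same object again returns the same key, so identical data is kept only once.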

The typical scenario of working with Pythagoras is to parallelize Python code using backend compute infrastructure provided by a major cloud vendor. We are currently working on a reference implementation for AWS, with plans to integrate with GCP and Azure later. As an alternative, Pythagoras offers a simple P2P model, in which serverless code can be parallelized over a distributed swarm of workstations, on-premise servers, and even laptops. This model is a good solution for resource-constrained teams and educational projects.