# Big Data Platform
## Assignment 3: Serverless

**The goal of this assignment is to:**
- Understand and practice the details of serverless computing

**Instructions:**
- Students will form teams of two people each, and submit a single homework for each team.
- The same score for the homework will be given to each member of your team.
- Your solution is in the form of a Jupyter notebook file (with extension ipynb).
- Images/Graphs/Tables should be submitted inside the notebook.
- The notebook should be runnable and properly documented. 
- Please answer all the questions and include all your code.
- You are expected to submit clear, Pythonic code.
- You can change function signatures/definitions.

**Submission:**
- Submit the homework via Moodle by uploading the following files individually (not as a Zip archive):
    - Jupyter Notebook
    - 2 Log files
    - Additional local scripts
- The homework needs to be entirely in English.
- The deadline for submission is on Moodle.
- Late submission won't be allowed.

  
- If two groups submit identical code, both groups will receive a zero.
- Some groups might be selected randomly to present their code.

**Requirements:**  
- Python 3.6 should be used.  
- Implement the algorithms yourself, using only basic Python libraries (such as numpy, pandas, etc.).

<br><br><br><br>

**Grading:**
- Q0 - 10 points - Setup
- Q1 - 40 points - Serverless MapReduceEngine
- Q2 - 20 points - MapReduce job to calculate inverted index
- Q3 - 30 points - Shuffle

`Total: 100`

<br><br>

In [49]:
# !pip install --quiet zipfile36
# !pip install names
# !pip install numpy
# !pip install scipy
# !pip install pandas
# !pip install lithops
# !pip install ibm-cos-sdk

In [50]:
import ibm_boto3
from ibm_botocore.client import Config, ClientError

from lithops import FunctionExecutor
from lithops import Storage

# general
import os
import time
import logging
import threading  # you can use easier threading packages
from threading import Thread
import random
import warnings

# ml
import numpy as np
import scipy as sp
import pandas as pd

# visual
# import seaborn as sns
# import matplotlib.pyplot as plt

# notebook
from IPython.display import display

#random last names
import names

#SQL
import sqlite3
from sqlite3 import Error

In [51]:
random.seed(123)

# Question 0
## Setup

1. Navigate to IBM Cloud and open a trial account; no credit card is required.
2. Choose the IBM Cloud Object Storage service from the catalog.
3. Create a new bucket in IBM Cloud Object Storage.
4. Create HMAC credentials (access key and secret key) for the bucket.
5. Choose the IBM Cloud Functions service from the catalog and create a service instance.


#### Lithops setup
1. Using `git`, install the master branch of the Lithops project from https://github.com/lithops-cloud/lithops
2. Follow the Lithops documentation and configure Lithops against IBM Cloud Functions and IBM Cloud Object Storage.
3. Set the Lithops log level to DEBUG.
4. Run the Hello World example using the Futures API and verify that everything works properly.


#### IBM Cloud Object Storage setup
1. Upload all the input CSV files that you used in homework 2 into the bucket you created in IBM Cloud Object Storage


<br><br><br>

In [18]:
def hello(name, number):
    return f'hello {name} {number}'


def test():
    with FunctionExecutor() as fexec:
        fut = fexec.call_async(hello, ('World', 1))
        print(fut.result())

In [4]:
test()

2022-01-08 18:51:57,210 [INFO] lithops.config -- Lithops v2.5.8
2022-01-08 18:51:57,211 [DEBUG] lithops.config -- Loading configuration from /Users/rludan/git/BigDataHW3/.lithops_config
2022-01-08 18:51:57,214 [DEBUG] lithops.config -- Loading Serverless backend module: ibm_cf
2022-01-08 18:51:57,260 [DEBUG] lithops.config -- Loading Storage backend module: ibm_cos
2022-01-08 18:51:57,333 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Creating IBM COS client
2022-01-08 18:51:57,334 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2022-01-08 18:51:57,335 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Using access_key and secret_key
2022-01-08 18:51:57,658 [INFO] lithops.storage.backends.ibm_cos.ibm_cos -- IBM COS client created - Region: eu-de
2022-01-08 18:51:57,659 [DEBUG] lithops.serverless.backends.ibm_cf.ibm_cf -- Creating IBM Cloud Functions client
2022-01-08 18:51:57,660 [DEBUG] lithops.ser

hello World 1


2022-01-08 18:52:02,191 [DEBUG] lithops.invokers -- ExecutorID 842866-0 - Async invoker 1 finished
2022-01-08 18:52:02,191 [DEBUG] lithops.invokers -- ExecutorID 842866-0 - Async invoker 0 finished


In [52]:
DB_FILE_NAME='mydb.db'
TEMP_FOLDER='./mapreducetemp'
FINAL_FOLDER='./mapreducefinal'
NUM_OF_RECORDS = 10
TEMP_RESULTS_TBL='temp_results'
BUCKET_NAME = "cloud-object-storage-8k-cos-standard-7nq"

# Question 1
## Serverless MapReduceEngine

Modify the MapReduceEngine from homework 2 into a MapReduceServerlessEngine in which map and reduce tasks are executed as serverless actions instead of local threads. In particular:
1. Deploy all map tasks as serverless actions by using Lithops against IBM Cloud Functions.
2. Collect the results from all map tasks, store them in the same SQLite database you used in MapReduceEngine, and reuse the same code for the sort and shuffle phase.
3. Deploy the reduce tasks by using Lithops against IBM Cloud Functions. Instead of persisting the results of the reduce tasks, return them to the MapReduceServerlessEngine and proceed with the same workflow as in MapReduceEngine.
4. Return the results of the reduce tasks to the user.

**Please attach:**  
A text file with all log messages that Lithops printed to the console during the execution. Make sure the log level is set to DEBUG.
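
One way to produce the requested text file is to attach a file handler to the parent logger. The sketch below uses only the standard `logging` module and assumes that Lithops emits its messages through loggers named `lithops.*`, as the logger names in the console output suggest. The helper `attach_debug_log_file` and the file name `q1_lithops.log` are hypothetical, and the snippet has not been verified against Lithops' own logger setup, which may replace handlers when an executor is created.

```python
import logging
import os
import tempfile


def attach_debug_log_file(path, logger_name='lithops'):
    """Send every DEBUG-and-above record of `logger_name` to `path`."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path, mode='w', encoding='utf-8')
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s [%(levelname)s] %(name)s -- %(message)s'))
    logger.addHandler(handler)
    return handler


# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), 'q1_lithops.log')
handler = attach_debug_log_file(log_path)

# Stand-in for a message that the library would emit during a job.
logging.getLogger('lithops.demo').debug('map phase started')

# Detach and flush the handler once the job has finished.
logging.getLogger('lithops').removeHandler(handler)
handler.close()

with open(log_path, encoding='utf-8') as log_file:
    log_text = log_file.read()
print(log_text)
```

Records from child loggers such as `lithops.demo` propagate to the parent `lithops` logger, which is why a single handler on the parent is enough in this sketch.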

#### Code:

In [35]:
input_data = []

def seeder(number):
    firstname = ['John', 'Dana', 'Scott', 'Marc', 'Steven', 'Michael', 'Albert', 'Johanna']
    city = ['NewYork', 'Haifa', 'Munchen', 'London', 'PaloAlto',  'TelAviv', 'Kiev', 'Hamburg']
    secondname = []
    for i in range(10):
        rand_name = names.get_last_name()
        secondname.append(rand_name)
    df = pd.DataFrame()
    df["firstname"] = np.random.choice(firstname, NUM_OF_RECORDS)
    df["secondname"] = np.random.choice(secondname, NUM_OF_RECORDS)
    df["city"] = np.random.choice(city, NUM_OF_RECORDS)
    #     df["id"] = df.index + 1
    curr_file_name = str('MyCSV%s.csv' % number)
    df.to_csv(curr_file_name, index=False)
    print("finished creating MyCSV%s.csv" % number)
    input_data.append('./'+ curr_file_name)
for i in range(20):
    seeder(i)

finished creating MyCSV0.csv
finished creating MyCSV1.csv
finished creating MyCSV2.csv
finished creating MyCSV3.csv
finished creating MyCSV4.csv
finished creating MyCSV5.csv
finished creating MyCSV6.csv
finished creating MyCSV7.csv
finished creating MyCSV8.csv
finished creating MyCSV9.csv
finished creating MyCSV10.csv
finished creating MyCSV11.csv
finished creating MyCSV12.csv
finished creating MyCSV13.csv
finished creating MyCSV14.csv
finished creating MyCSV15.csv
finished creating MyCSV16.csv
finished creating MyCSV17.csv
finished creating MyCSV18.csv
finished creating MyCSV19.csv


In [36]:
try:
    os.mkdir(TEMP_FOLDER)
    os.mkdir(FINAL_FOLDER)
except Exception as e:
    print(f"folder(s) already exist(s): {e}")

folder(s) already exist(s): [Errno 17] File exists: './mapreducetemp'


In [53]:
sql_create_temp_results_table = """CREATE TABLE IF NOT EXISTS temp_results (
                                    key text,
                                    value text
                                    ); """

In [54]:
sql_group_by_key = """SELECT key, GROUP_CONCAT(value)
                      FROM temp_results GROUP BY key ORDER BY (key);"""

In [55]:
sql_drop_all_tables = """DROP TABLE temp_results;"""

In [56]:
def drop_temp_tables(conn):
    try:
        c = conn.cursor()
        c.execute(sql_drop_all_tables)
    except Error as e:
        print(e)

In [57]:
def create_connection(db_file):
    """ create a database connection to a SQLite database """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        print(sqlite3.version)
    except Error as e:
        print(e)
    return conn

In [58]:
def create_table(conn, create_table_sql):
    """ create a table from the create_table_sql statement
    :param conn: Connection object
    :param create_table_sql: a CREATE TABLE statement
    :return:
    """
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except Error as e:
        print(e)

In [59]:
def get_grouped_values(conn):
    cur = conn.cursor()
    cur.execute(sql_group_by_key)

    rows = cur.fetchall()

    return rows
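
To make the shape of the rows returned by `get_grouped_values` concrete, here is a small self-contained sketch that runs the same `GROUP BY` / `GROUP_CONCAT` query against an in-memory database. The sample rows are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE temp_results (key text, value text)')
conn.executemany(
    'INSERT INTO temp_results VALUES (?, ?)',
    [('John', 'MyCSV0.csv'), ('Dana', 'MyCSV0.csv'), ('John', 'MyCSV1.csv')],
)

# Same query shape as sql_group_by_key: one row per key, values joined by ','.
rows = conn.execute(
    'SELECT key, GROUP_CONCAT(value) FROM temp_results '
    'GROUP BY key ORDER BY key'
).fetchall()
conn.close()

# SQLite does not guarantee the concatenation order inside a group,
# so split and sort before comparing or displaying.
grouped = {key: sorted(values.split(',')) for key, values in rows}
print(grouped)
```

Each row is a `(key, 'doc1,doc2,...')` tuple, which is the pair the reduce function later receives.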

In [67]:
class MapReduceServerlessEngine:
    def execute(self, input_data, map_function, reduce_function, params):
        curr_map = 0

        for map_document_path in input_data:
            with FunctionExecutor() as fexec:
                with open(map_document_path, 'r') as curr_file:
                    fut = fexec.call_async(func=map_function, data=(curr_file, params['column'], map_document_path))
                    map_result = fut.result()
                    print(map_result)

                    if map_result is not None:
                        map_result_df = pd.DataFrame(map_result, columns=["key", "value"])
                        map_result_df.to_csv(TEMP_FOLDER + '/part-tmp-%s.csv' % str(curr_map), index=False, header=True)
            curr_map += 1

        for temp_file_name in os.scandir(TEMP_FOLDER):
            csv_df = pd.read_csv(temp_file_name.path)
            csv_df.to_sql(TEMP_RESULTS_TBL, connection, if_exists='append', index=False)

        grouped_values = get_grouped_values(connection)

        curr_reduce = 0
        for reduce_value in grouped_values:
            with FunctionExecutor() as fexec:
                fut = fexec.call_async(reduce_function, (reduce_value[0], reduce_value[1]))
                reduce_result = fut.result()
                print(reduce_result)

                if reduce_result is not None:
                    result_df = pd.DataFrame(reduce_result, columns=["values"])
                    result_df.to_csv(FINAL_FOLDER + '/part-%s-final.csv' % str(curr_reduce), index=False, header=True)
            curr_reduce += 1

        return "MapReduce Completed"
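
The loops in `execute` create one executor per input and block on `fut.result()` before moving on, so the tasks run one after another. A common alternative is to submit every task first and collect the results afterwards. The sketch below illustrates that fan-out/fan-in pattern with `ThreadPoolExecutor` as a purely local stand-in for serverless invocations; `demo_map` and `run_map_phase` are hypothetical names. Lithops' `FunctionExecutor.map` together with `get_result` follows the same idea, but it is not exercised here.

```python
from concurrent.futures import ThreadPoolExecutor


def demo_map(doc_name, rows, column):
    """Emit one (value, document) pair per row (hypothetical map task)."""
    return [(row[column], doc_name) for row in rows]


def run_map_phase(inputs, map_function, column):
    # ThreadPoolExecutor is only a local stand-in for serverless invocations.
    with ThreadPoolExecutor() as pool:
        # Fan out: submit every task before waiting on any of them.
        futures = [pool.submit(map_function, name, rows, column)
                   for name, rows in inputs.items()]
        # Fan in: collect the results in submission order.
        return [fut.result() for fut in futures]


# Invented sample input: document name -> list of rows.
inputs = {
    'a.csv': [('John', 'Haifa'), ('Dana', 'London')],
    'b.csv': [('John', 'Kiev')],
}
map_results = run_map_phase(inputs, demo_map, column=0)
print(map_results)
```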


In [68]:
def inverted_map(document_buffer, column_index, document_name):
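    # NOTE: skiprows=1 discards the header line, so pandas treats the first
    # data row as the header; each file yields NUM_OF_RECORDS - 1 values.
    values = pd.read_csv(filepath_or_buffer=document_buffer, usecols=[column_index], skiprows=1)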

    return [(x[0], document_name) for x in values.to_records(index=False)]

In [69]:
def inverted_reduce(value, documents):
    ret_val = [value]
    temp_set = set(documents.split(','))
    ret_val.extend(temp_set)

    return ret_val
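
Because `inverted_reduce` builds its output from a `set`, the order of the documents after the key can differ between runs. The following variant is a minimal sketch that sorts the de-duplicated documents so the output is reproducible; the name `inverted_reduce_sorted` is hypothetical and is not used elsewhere in the notebook.

```python
def inverted_reduce_sorted(value, documents):
    """Return [value, doc1, doc2, ...] with duplicates removed and sorted."""
    unique_documents = sorted(set(documents.split(',')))
    return [value, *unique_documents]


# 'documents' has the comma-joined shape produced by GROUP_CONCAT.
result = inverted_reduce_sorted(
    'John', './MyCSV1.csv,./MyCSV0.csv,./MyCSV1.csv')
print(result)
```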

In [70]:
connection = create_connection(DB_FILE_NAME)
if connection is not None:
    # create temp_results table
    create_table(connection, sql_create_temp_results_table)
else:
    print("Error! cannot create the database connection.")

2.6.0


In [71]:
mapreduce = MapReduceServerlessEngine()
status = mapreduce.execute(input_data, inverted_map, inverted_reduce, params={'column': 0})
print(status)

[('Dana', './MyCSV0.csv'), ('Steven', './MyCSV0.csv'), ('John', './MyCSV0.csv'), ('Scott', './MyCSV0.csv'), ('Albert', './MyCSV0.csv'), ('Johanna', './MyCSV0.csv'), ('Scott', './MyCSV0.csv'), ('Steven', './MyCSV0.csv'), ('Johanna', './MyCSV0.csv')]
[('Steven', './MyCSV1.csv'), ('Albert', './MyCSV1.csv'), ('Marc', './MyCSV1.csv'), ('Johanna', './MyCSV1.csv'), ('Albert', './MyCSV1.csv'), ('Albert', './MyCSV1.csv'), ('Albert', './MyCSV1.csv'), ('Scott', './MyCSV1.csv'), ('Scott', './MyCSV1.csv')]
[('Johanna', './MyCSV2.csv'), ('Albert', './MyCSV2.csv'), ('Steven', './MyCSV2.csv'), ('John', './MyCSV2.csv'), ('Michael', './MyCSV2.csv'), ('Albert', './MyCSV2.csv'), ('Steven', './MyCSV2.csv'), ('Johanna', './MyCSV2.csv'), ('Johanna', './MyCSV2.csv')]
[('Michael', './MyCSV3.csv'), ('Dana', './MyCSV3.csv'), ('Michael', './MyCSV3.csv'), ('Steven', './MyCSV3.csv'), ('Steven', './MyCSV3.csv'), ('Michael', './MyCSV3.csv'), ('Steven', './MyCSV3.csv'), ('John', './MyCSV3.csv'), ('John', './MyCSV3.csv

In [72]:
for file_name in os.scandir(TEMP_FOLDER):
    os.remove(file_name.path)
drop_temp_tables(connection)

# Question 2
## Submit MapReduce job to calculate inverted index
1. Use input_data: `cos://bucket/<path to CSV data>`
2. Submit a MapReduce job with the same map and reduce functions you used in homework 2, as follows:

    `mapreduce = MapReduceServerlessEngine()`  
    `results = mapreduce.execute(input_data, inverted_map, inverted_index)`   
    `print(results)`
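
Input paths of the form `cos://bucket/<path to CSV data>` need to be split into a bucket name and an object key before they can be passed to a storage client. The helper below is one possible sketch using only the standard library; `split_cos_uri` and the bucket name `my-bucket` are hypothetical and appear only for illustration.

```python
from urllib.parse import urlparse


def split_cos_uri(uri):
    """Split 'cos://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != 'cos' or not parsed.netloc:
        raise ValueError(f'not a cos:// URI: {uri!r}')
    # The bucket is the authority part; the key is the path without
    # its leading slash.
    return parsed.netloc, parsed.path.lstrip('/')


bucket, key = split_cos_uri('cos://my-bucket/input/MyCsv0.csv')
print(bucket, key)
```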

**Please attach:**  
A text file with all log messages that Lithops printed to the console during the execution. Make sure the log level is set to DEBUG.

#### Code:

In [34]:
# Delete all result files in the final folder
for file_name in os.scandir(FINAL_FOLDER):
    os.remove(file_name.path)

# Delete all .csv or .db files in the current directory
for file_name in os.scandir('.'):
    name, extension = os.path.splitext(file_name)
    if extension == '.csv' or extension == '.db':
        os.remove(file_name)

In [None]:
def seeder(number):
    firstname = ['John', 'Dana', 'Scott', 'Marc', 'Steven', 'Michael', 'Albert', 'Johanna']
    city = ['NewYork', 'Haifa', 'Munchen', 'London', 'PaloAlto',  'TelAviv', 'Kiev', 'Hamburg']
    secondname = []
    for i in range(10):
        rand_name = names.get_last_name()
        secondname.append(rand_name)
    df = pd.DataFrame()
    df["firstname"] = np.random.choice(firstname, NUM_OF_RECORDS)
    df["secondname"] = np.random.choice(secondname, NUM_OF_RECORDS)
    df["city"] = np.random.choice(city, NUM_OF_RECORDS)

    # Writing the generated DataFrame to a csv file
    df.to_csv('./MyCsv%s.csv' % number, index=False)

    abspath = os.path.abspath('./MyCsv%s.csv' % number)
    key_name = os.path.basename(abspath)

    # Uploading the csv file to the bucket
    upload_csv_to_bucket(abspath, key_name)


def create_cos_client():
    global cos
    # Constants for IBM COS values
    COS_ENDPOINT = "https://s3.eu-de.cloud-object-storage.appdomain.cloud"
    COS_API_KEY_ID = ""
    COS_INSTANCE_CRN = "crn:v1:bluemix:public:cloud-object-storage:global:a/a306dcbad13145068a4e3898eb5928f5:a155c1bc-bddb-4006-9826-1f72f5fe243b:bucket:cloud-object-storage-8k-cos-standard-7nq"
    # Create client
    cos = ibm_boto3.client("s3",
                           ibm_api_key_id=COS_API_KEY_ID,
                           ibm_service_instance_id=COS_INSTANCE_CRN,
                           endpoint_url=COS_ENDPOINT,
                           config=Config(signature_version="oauth")
                           )

create_cos_client()

for i in range(20):
    seeder(i)

In [5]:
def upload_csv_to_bucket(path, key_name):
    try:
        cos.upload_file(Filename=path, Bucket=("%s" % BUCKET_NAME), Key='input/' + str(key_name))
        print(f"Uploaded: {path}")
    except Exception as e:
        print("Unable to upload file: {0}".format(e))

Uploaded: /Users/rludan/git/BigDataHW3/MyCsv0.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv1.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv2.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv3.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv4.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv5.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv6.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv7.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv8.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv9.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv10.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv11.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv12.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv13.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv14.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv15.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv16.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv17.csv
Uploaded: /Users/rludan/git/BigDataHW3/MyCsv18.csv
Uploaded: /Users/rludan/git/BigDataHW3/My

In [103]:
# exist_ok=True creates each folder independently and skips ones that already exist
os.makedirs(TEMP_FOLDER, exist_ok=True)
os.makedirs(FINAL_FOLDER, exist_ok=True)

In [87]:
sql_create_temp_results_table = """CREATE TABLE IF NOT EXISTS temp_results (
                                    key text,
                                    value text
                                    ); """

In [88]:
sql_group_by_key = """SELECT key, GROUP_CONCAT(value)
                      FROM temp_results GROUP BY key ORDER BY (key);"""

In [89]:
sql_drop_all_tables = """DROP TABLE temp_results;"""

In [90]:
def drop_temp_tables(conn):
    try:
        c = conn.cursor()
        c.execute(sql_drop_all_tables)
    except Error as e:
        print(e)

In [91]:
def create_connection(db_file):
    """ create a database connection to a SQLite database """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
        print(sqlite3.version)
    except Error as e:
        print(e)
    return conn

In [92]:
def create_table(conn, create_table_sql):
    """ create a table from the create_table_sql statement
    :param conn: Connection object
    :param create_table_sql: a CREATE TABLE statement
    :return:
    """
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except Error as e:
        print(e)

In [93]:
def get_grouped_values(conn):
    cur = conn.cursor()
    cur.execute(sql_group_by_key)

    rows = cur.fetchall()

    return rows

In [97]:
class MapReduceServerlessEngine():
    def execute(self, input_data, map_function, reduce_function, params):
        curr_map = 0

        for input_key in input_data:
            with FunctionExecutor() as fexec:
                fut = fexec.call_async(func=map_function, data=(input_key, params['column'])) # {"key":key, "col":0}
                map_result = fut.result()
                print(map_result)

                if map_result is not None:
                    map_result_df = pd.DataFrame(map_result, columns=["key", "value"])
                    map_result_df.to_csv(TEMP_FOLDER + '/part-tmp-%s.csv' % str(curr_map), index=False, header=True)
            curr_map += 1

        for temp_file_name in os.scandir(TEMP_FOLDER):
            csv_df = pd.read_csv(temp_file_name.path)
            csv_df.to_sql(TEMP_RESULTS_TBL, connection, if_exists='append', index=False)

        grouped_values = get_grouped_values(connection)

        curr_reduce = 0
        for reduce_value in grouped_values:
            with FunctionExecutor() as fexec:
                fut = fexec.call_async(reduce_function, (reduce_value[0], reduce_value[1]))
                reduce_result = fut.result()
                print(reduce_result)

                if reduce_result is not None:
                    result_df = pd.DataFrame(reduce_result, columns=["values"])
                    result_df.to_csv(FINAL_FOLDER + '/part-%s-final.csv' % curr_reduce, index=False, header=True)
            curr_reduce += 1

        return "MapReduce Completed"

In [98]:
def inverted_map(key, col):
    storage = Storage()
    buffer = storage.get_object(BUCKET_NAME, key, stream=True)

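    # NOTE: skiprows=1 discards the header line, so pandas treats the first
    # data row as the header and that row is missing from the result.
    values = pd.read_csv(filepath_or_buffer=buffer, usecols=[col], skiprows=1)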

    return [(x[0], key) for x in values.to_records(index=False)]
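
The map function above reads a column directly from a streamed object. The sketch below shows the same idea with the standard `csv` module, skipping only the header row so that every data row is kept. `io.BytesIO` stands in for the streamed object body; whether the object returned by the storage client can be wrapped in `io.TextIOWrapper` in the same way is an assumption, and `column_values` is a hypothetical helper.

```python
import csv
import io


def column_values(stream, col):
    """Read one column from a binary CSV stream, skipping the header row."""
    text = io.TextIOWrapper(stream, encoding='utf-8', newline='')
    reader = csv.reader(text)
    next(reader, None)  # skip the header row only
    return [row[col] for row in reader if row]


# io.BytesIO stands in for the object body returned by the storage client.
fake_object = io.BytesIO(
    b'firstname,secondname,city\nJohn,Smith,Haifa\nDana,Lee,Kiev\n')
values = column_values(fake_object, 0)
print(values)
```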

In [99]:
def inverted_reduce(value, documents):
    ret_val = [value]
    temp_set = set(documents.split(','))
    ret_val.extend(temp_set)

    return ret_val

In [100]:
storage = Storage()
input_data =  storage.list_keys(BUCKET_NAME, prefix='input/')

print(input_data)

2022-01-10 01:04:37,851 [DEBUG] lithops.config -- Loading configuration from /Users/rludan/git/BigDataHW3/.lithops_config
2022-01-10 01:04:37,860 [DEBUG] lithops.config -- Loading Storage backend module: ibm_cos
2022-01-10 01:04:37,861 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Creating IBM COS client
2022-01-10 01:04:37,861 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2022-01-10 01:04:37,862 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Using access_key and secret_key
2022-01-10 01:04:37,872 [INFO] lithops.storage.backends.ibm_cos.ibm_cos -- IBM COS client created - Region: eu-de


['input/MyCsv0.csv', 'input/MyCsv1.csv', 'input/MyCsv10.csv', 'input/MyCsv11.csv', 'input/MyCsv12.csv', 'input/MyCsv13.csv', 'input/MyCsv14.csv', 'input/MyCsv15.csv', 'input/MyCsv16.csv', 'input/MyCsv17.csv', 'input/MyCsv18.csv', 'input/MyCsv19.csv', 'input/MyCsv2.csv', 'input/MyCsv3.csv', 'input/MyCsv4.csv', 'input/MyCsv5.csv', 'input/MyCsv6.csv', 'input/MyCsv7.csv', 'input/MyCsv8.csv', 'input/MyCsv9.csv']


In [111]:
connection = create_connection(DB_FILE_NAME)

2.6.0


In [112]:
if connection is not None:
    # create temp_results table
    create_table(connection, sql_create_temp_results_table)
else:
    print("Error! cannot create the database connection.")

In [113]:
mapreduce = MapReduceServerlessEngine()
status = mapreduce.execute(input_data, inverted_map, inverted_reduce, params={'column':0})
print(status)

2022-01-10 01:24:54,599 [INFO] lithops.config -- Lithops v2.5.8
2022-01-10 01:24:54,601 [DEBUG] lithops.config -- Loading configuration from /Users/rludan/git/BigDataHW3/.lithops_config
2022-01-10 01:24:54,609 [DEBUG] lithops.config -- Loading Serverless backend module: ibm_cf
2022-01-10 01:24:54,610 [DEBUG] lithops.config -- Loading Storage backend module: ibm_cos
2022-01-10 01:24:54,610 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Creating IBM COS client
2022-01-10 01:24:54,611 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2022-01-10 01:24:54,612 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- Using access_key and secret_key
2022-01-10 01:24:54,620 [INFO] lithops.storage.backends.ibm_cos.ibm_cos -- IBM COS client created - Region: eu-de
2022-01-10 01:24:54,621 [DEBUG] lithops.serverless.backends.ibm_cf.ibm_cf -- Creating IBM Cloud Functions client
2022-01-10 01:24:54,622 [DEBUG] lithops.ser

[('Michael', 'input/MyCsv0.csv'), ('Steven', 'input/MyCsv0.csv'), ('Albert', 'input/MyCsv0.csv'), ('Johanna', 'input/MyCsv0.csv'), ('Marc', 'input/MyCsv0.csv'), ('Dana', 'input/MyCsv0.csv'), ('Marc', 'input/MyCsv0.csv'), ('Steven', 'input/MyCsv0.csv'), ('Steven', 'input/MyCsv0.csv')]


2022-01-10 01:24:57,223 [DEBUG] lithops.monitor -- ExecutorID 9092d4-329 - Storage job monitor finished
2022-01-10 01:24:57,606 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-330/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:24:57,607 [DEBUG] lithops.job.job -- ExecutorID 9092d4-330 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:24:57,607 [INFO] lithops.invokers -- ExecutorID 9092d4-330 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:24:57,608 [DEBUG] lithops.invokers -- ExecutorID 9092d4-330 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:24:57,610 [DEBUG] lithops.invokers -- ExecutorID 9092d4-330 - Async invoker 0 started
2022-01-10 01:24:57,611 [DEBUG] lithops.invokers -- ExecutorID 9092d4-330 - Async invoker 1 started
2022-01-10 01:24:57,611 [DEBUG] lithops.invokers -- ExecutorID 9092d4-330 | Job

[('Johanna', 'input/MyCsv1.csv'), ('Johanna', 'input/MyCsv1.csv'), ('Marc', 'input/MyCsv1.csv'), ('John', 'input/MyCsv1.csv'), ('Michael', 'input/MyCsv1.csv'), ('Michael', 'input/MyCsv1.csv'), ('Michael', 'input/MyCsv1.csv'), ('Marc', 'input/MyCsv1.csv'), ('Johanna', 'input/MyCsv1.csv')]


2022-01-10 01:25:01,084 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-331/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:01,085 [DEBUG] lithops.job.job -- ExecutorID 9092d4-331 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:01,085 [INFO] lithops.invokers -- ExecutorID 9092d4-331 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:01,086 [DEBUG] lithops.invokers -- ExecutorID 9092d4-331 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:01,087 [DEBUG] lithops.invokers -- ExecutorID 9092d4-331 - Async invoker 0 started
2022-01-10 01:25:01,088 [DEBUG] lithops.invokers -- ExecutorID 9092d4-331 - Async invoker 1 started
2022-01-10 01:25:01,088 [DEBUG] lithops.invokers -- ExecutorID 9092d4-331 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:01,092 [INFO] li

[('Scott', 'input/MyCsv10.csv'), ('Michael', 'input/MyCsv10.csv'), ('Scott', 'input/MyCsv10.csv'), ('Scott', 'input/MyCsv10.csv'), ('John', 'input/MyCsv10.csv'), ('Johanna', 'input/MyCsv10.csv'), ('Marc', 'input/MyCsv10.csv'), ('Michael', 'input/MyCsv10.csv'), ('Michael', 'input/MyCsv10.csv')]


2022-01-10 01:25:04,511 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-332/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:04,512 [DEBUG] lithops.job.job -- ExecutorID 9092d4-332 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:04,513 [INFO] lithops.invokers -- ExecutorID 9092d4-332 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:04,514 [DEBUG] lithops.invokers -- ExecutorID 9092d4-332 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:04,515 [DEBUG] lithops.invokers -- ExecutorID 9092d4-332 - Async invoker 0 started
2022-01-10 01:25:04,516 [DEBUG] lithops.invokers -- ExecutorID 9092d4-332 - Async invoker 1 started
2022-01-10 01:25:04,516 [DEBUG] lithops.invokers -- ExecutorID 9092d4-332 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:04,519 [INFO] li

[('Albert', 'input/MyCsv11.csv'), ('John', 'input/MyCsv11.csv'), ('Albert', 'input/MyCsv11.csv'), ('Scott', 'input/MyCsv11.csv'), ('Steven', 'input/MyCsv11.csv'), ('Albert', 'input/MyCsv11.csv'), ('Michael', 'input/MyCsv11.csv'), ('Albert', 'input/MyCsv11.csv'), ('Albert', 'input/MyCsv11.csv')]


2022-01-10 01:25:07,884 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-333/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:07,885 [DEBUG] lithops.job.job -- ExecutorID 9092d4-333 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:07,886 [INFO] lithops.invokers -- ExecutorID 9092d4-333 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:07,886 [DEBUG] lithops.invokers -- ExecutorID 9092d4-333 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:07,887 [DEBUG] lithops.invokers -- ExecutorID 9092d4-333 - Async invoker 0 started
2022-01-10 01:25:07,888 [DEBUG] lithops.invokers -- ExecutorID 9092d4-333 - Async invoker 1 started
2022-01-10 01:25:07,889 [DEBUG] lithops.invokers -- ExecutorID 9092d4-333 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:07,891 [INFO] li

[('Steven', 'input/MyCsv12.csv'), ('Marc', 'input/MyCsv12.csv'), ('John', 'input/MyCsv12.csv'), ('Dana', 'input/MyCsv12.csv'), ('John', 'input/MyCsv12.csv'), ('Dana', 'input/MyCsv12.csv'), ('Scott', 'input/MyCsv12.csv'), ('Scott', 'input/MyCsv12.csv'), ('Albert', 'input/MyCsv12.csv')]


2022-01-10 01:25:09,898 [DEBUG] lithops.monitor -- ExecutorID 9092d4-333 - Storage job monitor finished
2022-01-10 01:25:10,139 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-334/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:10,140 [DEBUG] lithops.job.job -- ExecutorID 9092d4-334 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:10,140 [INFO] lithops.invokers -- ExecutorID 9092d4-334 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:10,141 [DEBUG] lithops.invokers -- ExecutorID 9092d4-334 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:10,142 [DEBUG] lithops.invokers -- ExecutorID 9092d4-334 - Async invoker 0 started
2022-01-10 01:25:10,143 [DEBUG] lithops.invokers -- ExecutorID 9092d4-334 - Async invoker 1 started
2022-01-10 01:25:10,143 [DEBUG] lithops.invokers -- ExecutorID 9092d4-334 | Job

[('Scott', 'input/MyCsv13.csv'), ('Dana', 'input/MyCsv13.csv'), ('Albert', 'input/MyCsv13.csv'), ('Marc', 'input/MyCsv13.csv'), ('John', 'input/MyCsv13.csv'), ('Dana', 'input/MyCsv13.csv'), ('Johanna', 'input/MyCsv13.csv'), ('Dana', 'input/MyCsv13.csv'), ('Marc', 'input/MyCsv13.csv')]


2022-01-10 01:25:13,788 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-335/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:13,788 [DEBUG] lithops.job.job -- ExecutorID 9092d4-335 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:13,789 [INFO] lithops.invokers -- ExecutorID 9092d4-335 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:13,790 [DEBUG] lithops.invokers -- ExecutorID 9092d4-335 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:13,791 [DEBUG] lithops.invokers -- ExecutorID 9092d4-335 - Async invoker 0 started
2022-01-10 01:25:13,792 [DEBUG] lithops.invokers -- ExecutorID 9092d4-335 - Async invoker 1 started
2022-01-10 01:25:13,792 [DEBUG] lithops.invokers -- ExecutorID 9092d4-335 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:13,796 [INFO] li

[('Michael', 'input/MyCsv14.csv'), ('Marc', 'input/MyCsv14.csv'), ('John', 'input/MyCsv14.csv'), ('Marc', 'input/MyCsv14.csv'), ('Scott', 'input/MyCsv14.csv'), ('Scott', 'input/MyCsv14.csv'), ('Michael', 'input/MyCsv14.csv'), ('Albert', 'input/MyCsv14.csv'), ('Marc', 'input/MyCsv14.csv')]


2022-01-10 01:25:15,803 [DEBUG] lithops.monitor -- ExecutorID 9092d4-335 - Storage job monitor finished
2022-01-10 01:25:16,047 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-336/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:16,047 [DEBUG] lithops.job.job -- ExecutorID 9092d4-336 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:16,048 [INFO] lithops.invokers -- ExecutorID 9092d4-336 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:16,049 [DEBUG] lithops.invokers -- ExecutorID 9092d4-336 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:16,050 [DEBUG] lithops.invokers -- ExecutorID 9092d4-336 - Async invoker 0 started
2022-01-10 01:25:16,051 [DEBUG] lithops.invokers -- ExecutorID 9092d4-336 - Async invoker 1 started
2022-01-10 01:25:16,051 [DEBUG] lithops.invokers -- ExecutorID 9092d4-336 | Job

[('Michael', 'input/MyCsv15.csv'), ('Dana', 'input/MyCsv15.csv'), ('Michael', 'input/MyCsv15.csv'), ('Marc', 'input/MyCsv15.csv'), ('Albert', 'input/MyCsv15.csv'), ('Scott', 'input/MyCsv15.csv'), ('Steven', 'input/MyCsv15.csv'), ('Dana', 'input/MyCsv15.csv'), ('Dana', 'input/MyCsv15.csv')]


2022-01-10 01:25:19,465 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-337/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:19,466 [DEBUG] lithops.job.job -- ExecutorID 9092d4-337 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:19,466 [INFO] lithops.invokers -- ExecutorID 9092d4-337 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:19,467 [DEBUG] lithops.invokers -- ExecutorID 9092d4-337 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:19,468 [DEBUG] lithops.invokers -- ExecutorID 9092d4-337 - Async invoker 0 started
2022-01-10 01:25:19,469 [DEBUG] lithops.invokers -- ExecutorID 9092d4-337 - Async invoker 1 started
2022-01-10 01:25:19,469 [DEBUG] lithops.invokers -- ExecutorID 9092d4-337 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:19,472 [INFO] li

[('Johanna', 'input/MyCsv16.csv'), ('Steven', 'input/MyCsv16.csv'), ('Michael', 'input/MyCsv16.csv'), ('Michael', 'input/MyCsv16.csv'), ('Scott', 'input/MyCsv16.csv'), ('John', 'input/MyCsv16.csv'), ('Marc', 'input/MyCsv16.csv'), ('John', 'input/MyCsv16.csv'), ('Albert', 'input/MyCsv16.csv')]


2022-01-10 01:25:21,477 [DEBUG] lithops.monitor -- ExecutorID 9092d4-337 - Storage job monitor finished
2022-01-10 01:25:21,853 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-338/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:21,854 [DEBUG] lithops.job.job -- ExecutorID 9092d4-338 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:21,854 [INFO] lithops.invokers -- ExecutorID 9092d4-338 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:21,855 [DEBUG] lithops.invokers -- ExecutorID 9092d4-338 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:21,856 [DEBUG] lithops.invokers -- ExecutorID 9092d4-338 - Async invoker 0 started
2022-01-10 01:25:21,857 [DEBUG] lithops.invokers -- ExecutorID 9092d4-338 - Async invoker 1 started
2022-01-10 01:25:21,857 [DEBUG] lithops.invokers -- ExecutorID 9092d4-338 | Job

[('Albert', 'input/MyCsv17.csv'), ('Johanna', 'input/MyCsv17.csv'), ('Dana', 'input/MyCsv17.csv'), ('Scott', 'input/MyCsv17.csv'), ('Michael', 'input/MyCsv17.csv'), ('John', 'input/MyCsv17.csv'), ('Scott', 'input/MyCsv17.csv'), ('John', 'input/MyCsv17.csv'), ('Steven', 'input/MyCsv17.csv')]


2022-01-10 01:25:25,347 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-339/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:25,348 [DEBUG] lithops.job.job -- ExecutorID 9092d4-339 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:25,348 [INFO] lithops.invokers -- ExecutorID 9092d4-339 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:25,349 [DEBUG] lithops.invokers -- ExecutorID 9092d4-339 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:25,350 [DEBUG] lithops.invokers -- ExecutorID 9092d4-339 - Async invoker 0 started
2022-01-10 01:25:25,351 [DEBUG] lithops.invokers -- ExecutorID 9092d4-339 - Async invoker 1 started
2022-01-10 01:25:25,351 [DEBUG] lithops.invokers -- ExecutorID 9092d4-339 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:25,354 [INFO] li

[('John', 'input/MyCsv18.csv'), ('Steven', 'input/MyCsv18.csv'), ('John', 'input/MyCsv18.csv'), ('Steven', 'input/MyCsv18.csv'), ('Dana', 'input/MyCsv18.csv'), ('Albert', 'input/MyCsv18.csv'), ('John', 'input/MyCsv18.csv'), ('John', 'input/MyCsv18.csv'), ('Michael', 'input/MyCsv18.csv')]


2022-01-10 01:25:28,678 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-340/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:28,679 [DEBUG] lithops.job.job -- ExecutorID 9092d4-340 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:28,679 [INFO] lithops.invokers -- ExecutorID 9092d4-340 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:28,680 [DEBUG] lithops.invokers -- ExecutorID 9092d4-340 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:28,681 [DEBUG] lithops.invokers -- ExecutorID 9092d4-340 - Async invoker 0 started
2022-01-10 01:25:28,682 [DEBUG] lithops.invokers -- ExecutorID 9092d4-340 - Async invoker 1 started
2022-01-10 01:25:28,682 [DEBUG] lithops.invokers -- ExecutorID 9092d4-340 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:28,685 [INFO] li

[('Steven', 'input/MyCsv19.csv'), ('Steven', 'input/MyCsv19.csv'), ('John', 'input/MyCsv19.csv'), ('John', 'input/MyCsv19.csv'), ('John', 'input/MyCsv19.csv'), ('Scott', 'input/MyCsv19.csv'), ('Albert', 'input/MyCsv19.csv'), ('John', 'input/MyCsv19.csv'), ('Marc', 'input/MyCsv19.csv')]


2022-01-10 01:25:30,693 [DEBUG] lithops.monitor -- ExecutorID 9092d4-340 - Storage job monitor finished
2022-01-10 01:25:30,855 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-341/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:30,855 [DEBUG] lithops.job.job -- ExecutorID 9092d4-341 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:30,856 [INFO] lithops.invokers -- ExecutorID 9092d4-341 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:30,856 [DEBUG] lithops.invokers -- ExecutorID 9092d4-341 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:30,857 [DEBUG] lithops.invokers -- ExecutorID 9092d4-341 - Async invoker 0 started
2022-01-10 01:25:30,858 [DEBUG] lithops.invokers -- ExecutorID 9092d4-341 - Async invoker 1 started
2022-01-10 01:25:30,859 [DEBUG] lithops.invokers -- ExecutorID 9092d4-341 | Job

[('Albert', 'input/MyCsv2.csv'), ('Marc', 'input/MyCsv2.csv'), ('Albert', 'input/MyCsv2.csv'), ('Johanna', 'input/MyCsv2.csv'), ('Scott', 'input/MyCsv2.csv'), ('Johanna', 'input/MyCsv2.csv'), ('John', 'input/MyCsv2.csv'), ('John', 'input/MyCsv2.csv'), ('Michael', 'input/MyCsv2.csv')]


2022-01-10 01:25:34,287 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-342/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:34,288 [DEBUG] lithops.job.job -- ExecutorID 9092d4-342 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:34,289 [INFO] lithops.invokers -- ExecutorID 9092d4-342 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:34,290 [DEBUG] lithops.invokers -- ExecutorID 9092d4-342 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:34,291 [DEBUG] lithops.invokers -- ExecutorID 9092d4-342 - Async invoker 0 started
2022-01-10 01:25:34,292 [DEBUG] lithops.invokers -- ExecutorID 9092d4-342 - Async invoker 1 started
2022-01-10 01:25:34,293 [DEBUG] lithops.invokers -- ExecutorID 9092d4-342 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:34,296 [INFO] li

[('Johanna', 'input/MyCsv3.csv'), ('John', 'input/MyCsv3.csv'), ('Johanna', 'input/MyCsv3.csv'), ('Michael', 'input/MyCsv3.csv'), ('Marc', 'input/MyCsv3.csv'), ('Scott', 'input/MyCsv3.csv'), ('Dana', 'input/MyCsv3.csv'), ('Johanna', 'input/MyCsv3.csv'), ('Scott', 'input/MyCsv3.csv')]


2022-01-10 01:25:36,304 [DEBUG] lithops.monitor -- ExecutorID 9092d4-342 - Storage job monitor finished
2022-01-10 01:25:36,800 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-343/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:36,801 [DEBUG] lithops.job.job -- ExecutorID 9092d4-343 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:36,801 [INFO] lithops.invokers -- ExecutorID 9092d4-343 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:36,802 [DEBUG] lithops.invokers -- ExecutorID 9092d4-343 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:36,803 [DEBUG] lithops.invokers -- ExecutorID 9092d4-343 - Async invoker 0 started
2022-01-10 01:25:36,804 [DEBUG] lithops.invokers -- ExecutorID 9092d4-343 - Async invoker 1 started
2022-01-10 01:25:36,804 [DEBUG] lithops.invokers -- ExecutorID 9092d4-343 | Job

[('Scott', 'input/MyCsv4.csv'), ('Johanna', 'input/MyCsv4.csv'), ('Johanna', 'input/MyCsv4.csv'), ('Steven', 'input/MyCsv4.csv'), ('John', 'input/MyCsv4.csv'), ('Steven', 'input/MyCsv4.csv'), ('Johanna', 'input/MyCsv4.csv'), ('Albert', 'input/MyCsv4.csv'), ('Dana', 'input/MyCsv4.csv')]


2022-01-10 01:25:40,254 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-344/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:40,255 [DEBUG] lithops.job.job -- ExecutorID 9092d4-344 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:40,255 [INFO] lithops.invokers -- ExecutorID 9092d4-344 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:40,256 [DEBUG] lithops.invokers -- ExecutorID 9092d4-344 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:40,258 [DEBUG] lithops.invokers -- ExecutorID 9092d4-344 - Async invoker 0 started
2022-01-10 01:25:40,259 [DEBUG] lithops.invokers -- ExecutorID 9092d4-344 - Async invoker 1 started
2022-01-10 01:25:40,260 [DEBUG] lithops.invokers -- ExecutorID 9092d4-344 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:40,263 [INFO] li

[('Dana', 'input/MyCsv5.csv'), ('Albert', 'input/MyCsv5.csv'), ('Scott', 'input/MyCsv5.csv'), ('John', 'input/MyCsv5.csv'), ('Johanna', 'input/MyCsv5.csv'), ('Steven', 'input/MyCsv5.csv'), ('Johanna', 'input/MyCsv5.csv'), ('Steven', 'input/MyCsv5.csv'), ('Scott', 'input/MyCsv5.csv')]


2022-01-10 01:25:42,267 [DEBUG] lithops.monitor -- ExecutorID 9092d4-344 - Storage job monitor finished
2022-01-10 01:25:42,759 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-345/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:42,761 [DEBUG] lithops.job.job -- ExecutorID 9092d4-345 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:42,761 [INFO] lithops.invokers -- ExecutorID 9092d4-345 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:42,762 [DEBUG] lithops.invokers -- ExecutorID 9092d4-345 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:42,764 [DEBUG] lithops.invokers -- ExecutorID 9092d4-345 - Async invoker 0 started
2022-01-10 01:25:42,765 [DEBUG] lithops.invokers -- ExecutorID 9092d4-345 - Async invoker 1 started
2022-01-10 01:25:42,765 [DEBUG] lithops.invokers -- ExecutorID 9092d4-345 | Job

[('Michael', 'input/MyCsv6.csv'), ('Dana', 'input/MyCsv6.csv'), ('Marc', 'input/MyCsv6.csv'), ('Marc', 'input/MyCsv6.csv'), ('Steven', 'input/MyCsv6.csv'), ('Dana', 'input/MyCsv6.csv'), ('Michael', 'input/MyCsv6.csv'), ('Steven', 'input/MyCsv6.csv'), ('Dana', 'input/MyCsv6.csv')]


2022-01-10 01:25:46,206 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-346/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:46,207 [DEBUG] lithops.job.job -- ExecutorID 9092d4-346 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:46,208 [INFO] lithops.invokers -- ExecutorID 9092d4-346 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:46,208 [DEBUG] lithops.invokers -- ExecutorID 9092d4-346 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:46,209 [DEBUG] lithops.invokers -- ExecutorID 9092d4-346 - Async invoker 0 started
2022-01-10 01:25:46,210 [DEBUG] lithops.invokers -- ExecutorID 9092d4-346 - Async invoker 1 started
2022-01-10 01:25:46,210 [DEBUG] lithops.invokers -- ExecutorID 9092d4-346 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:46,213 [INFO] li

[('Steven', 'input/MyCsv7.csv'), ('Michael', 'input/MyCsv7.csv'), ('Marc', 'input/MyCsv7.csv'), ('Scott', 'input/MyCsv7.csv'), ('Scott', 'input/MyCsv7.csv'), ('Marc', 'input/MyCsv7.csv'), ('John', 'input/MyCsv7.csv'), ('Dana', 'input/MyCsv7.csv'), ('Albert', 'input/MyCsv7.csv')]


2022-01-10 01:25:49,593 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-347/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:49,594 [DEBUG] lithops.job.job -- ExecutorID 9092d4-347 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:49,594 [INFO] lithops.invokers -- ExecutorID 9092d4-347 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:49,595 [DEBUG] lithops.invokers -- ExecutorID 9092d4-347 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:49,596 [DEBUG] lithops.invokers -- ExecutorID 9092d4-347 - Async invoker 0 started
2022-01-10 01:25:49,596 [DEBUG] lithops.invokers -- ExecutorID 9092d4-347 - Async invoker 1 started
2022-01-10 01:25:49,597 [DEBUG] lithops.invokers -- ExecutorID 9092d4-347 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:49,600 [INFO] li

[('Dana', 'input/MyCsv8.csv'), ('Marc', 'input/MyCsv8.csv'), ('Marc', 'input/MyCsv8.csv'), ('Steven', 'input/MyCsv8.csv'), ('Johanna', 'input/MyCsv8.csv'), ('Johanna', 'input/MyCsv8.csv'), ('Scott', 'input/MyCsv8.csv'), ('Marc', 'input/MyCsv8.csv'), ('John', 'input/MyCsv8.csv')]


2022-01-10 01:25:52,940 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-348/42728dfd788128508c1faec1b29b27a9.func.pickle - Size: 1.0KiB - OK
2022-01-10 01:25:52,941 [DEBUG] lithops.job.job -- ExecutorID 9092d4-348 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:52,941 [INFO] lithops.invokers -- ExecutorID 9092d4-348 | JobID A000 - Starting function invocation: inverted_map() - Total: 1 activations
2022-01-10 01:25:52,942 [DEBUG] lithops.invokers -- ExecutorID 9092d4-348 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:52,943 [DEBUG] lithops.invokers -- ExecutorID 9092d4-348 - Async invoker 0 started
2022-01-10 01:25:52,944 [DEBUG] lithops.invokers -- ExecutorID 9092d4-348 - Async invoker 1 started
2022-01-10 01:25:52,944 [DEBUG] lithops.invokers -- ExecutorID 9092d4-348 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:52,948 [INFO] li

[('Marc', 'input/MyCsv9.csv'), ('Albert', 'input/MyCsv9.csv'), ('John', 'input/MyCsv9.csv'), ('John', 'input/MyCsv9.csv'), ('Albert', 'input/MyCsv9.csv'), ('John', 'input/MyCsv9.csv'), ('Michael', 'input/MyCsv9.csv'), ('John', 'input/MyCsv9.csv'), ('Dana', 'input/MyCsv9.csv')]


2022-01-10 01:25:54,954 [DEBUG] lithops.monitor -- ExecutorID 9092d4-348 - Storage job monitor finished
2022-01-10 01:25:55,419 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-349/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:25:55,420 [DEBUG] lithops.job.job -- ExecutorID 9092d4-349 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:55,421 [INFO] lithops.invokers -- ExecutorID 9092d4-349 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:25:55,422 [DEBUG] lithops.invokers -- ExecutorID 9092d4-349 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:55,423 [DEBUG] lithops.invokers -- ExecutorID 9092d4-349 - Async invoker 0 started
2022-01-10 01:25:55,424 [DEBUG] lithops.invokers -- ExecutorID 9092d4-349 - Async invoker 1 started
2022-01-10 01:25:55,425 [DEBUG] lithops.invokers -- ExecutorID 9092d4-349 | 

['Albert', 'input/MyCsv12.csv', 'input/MyCsv7.csv', 'input/MyCsv19.csv', 'input/MyCsv13.csv', 'input/MyCsv2.csv', 'input/MyCsv4.csv', 'input/MyCsv5.csv', 'input/MyCsv14.csv', 'input/MyCsv9.csv', 'input/MyCsv16.csv', 'input/MyCsv0.csv', 'input/MyCsv18.csv', 'input/MyCsv17.csv', 'input/MyCsv11.csv', 'input/MyCsv15.csv']


2022-01-10 01:25:58,745 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-350/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:25:58,746 [DEBUG] lithops.job.job -- ExecutorID 9092d4-350 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:25:58,747 [INFO] lithops.invokers -- ExecutorID 9092d4-350 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:25:58,748 [DEBUG] lithops.invokers -- ExecutorID 9092d4-350 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:25:58,749 [DEBUG] lithops.invokers -- ExecutorID 9092d4-350 - Async invoker 0 started
2022-01-10 01:25:58,750 [DEBUG] lithops.invokers -- ExecutorID 9092d4-350 - Async invoker 1 started
2022-01-10 01:25:58,750 [DEBUG] lithops.invokers -- ExecutorID 9092d4-350 | JobID A000 - Free workers: 1200 - Going to run 1 activations in 1 workers
2022-01-10 01:25:58,753 [INFO]

['Dana', 'input/MyCsv12.csv', 'input/MyCsv7.csv', 'input/MyCsv6.csv', 'input/MyCsv17.csv', 'input/MyCsv3.csv', 'input/MyCsv8.csv', 'input/MyCsv13.csv', 'input/MyCsv4.csv', 'input/MyCsv9.csv', 'input/MyCsv0.csv', 'input/MyCsv5.csv', 'input/MyCsv18.csv', 'input/MyCsv15.csv']


2022-01-10 01:26:00,758 [DEBUG] lithops.monitor -- ExecutorID 9092d4-350 - Storage job monitor finished
2022-01-10 01:26:01,050 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-351/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:26:01,051 [DEBUG] lithops.job.job -- ExecutorID 9092d4-351 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:26:01,051 [INFO] lithops.invokers -- ExecutorID 9092d4-351 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:26:01,052 [DEBUG] lithops.invokers -- ExecutorID 9092d4-351 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:26:01,053 [DEBUG] lithops.invokers -- ExecutorID 9092d4-351 - Async invoker 0 started
2022-01-10 01:26:01,054 [DEBUG] lithops.invokers -- ExecutorID 9092d4-351 - Async invoker 1 started
2022-01-10 01:26:01,055 [DEBUG] lithops.invokers -- ExecutorID 9092d4-351 | 

['Johanna', 'input/MyCsv8.csv', 'input/MyCsv3.csv', 'input/MyCsv10.csv', 'input/MyCsv13.csv', 'input/MyCsv2.csv', 'input/MyCsv4.csv', 'input/MyCsv16.csv', 'input/MyCsv0.csv', 'input/MyCsv5.csv', 'input/MyCsv17.csv', 'input/MyCsv1.csv']


2022-01-10 01:26:03,064 [DEBUG] lithops.monitor -- ExecutorID 9092d4-351 - Storage job monitor finished
2022-01-10 01:26:03,332 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-352/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:26:03,333 [DEBUG] lithops.job.job -- ExecutorID 9092d4-352 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:26:03,334 [INFO] lithops.invokers -- ExecutorID 9092d4-352 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:26:03,334 [DEBUG] lithops.invokers -- ExecutorID 9092d4-352 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:26:03,335 [DEBUG] lithops.invokers -- ExecutorID 9092d4-352 - Async invoker 0 started
2022-01-10 01:26:03,336 [DEBUG] lithops.invokers -- ExecutorID 9092d4-352 - Async invoker 1 started
2022-01-10 01:26:03,336 [DEBUG] lithops.invokers -- ExecutorID 9092d4-352 | 

['John', 'input/MyCsv12.csv', 'input/MyCsv7.csv', 'input/MyCsv8.csv', 'input/MyCsv3.csv', 'input/MyCsv10.csv', 'input/MyCsv19.csv', 'input/MyCsv2.csv', 'input/MyCsv13.csv', 'input/MyCsv4.csv', 'input/MyCsv5.csv', 'input/MyCsv14.csv', 'input/MyCsv9.csv', 'input/MyCsv16.csv', 'input/MyCsv18.csv', 'input/MyCsv17.csv', 'input/MyCsv11.csv', 'input/MyCsv1.csv']


2022-01-10 01:26:05,345 [DEBUG] lithops.monitor -- ExecutorID 9092d4-352 - Storage job monitor finished
2022-01-10 01:26:05,763 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-353/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:26:05,764 [DEBUG] lithops.job.job -- ExecutorID 9092d4-353 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:26:05,765 [INFO] lithops.invokers -- ExecutorID 9092d4-353 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:26:05,766 [DEBUG] lithops.invokers -- ExecutorID 9092d4-353 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:26:05,767 [DEBUG] lithops.invokers -- ExecutorID 9092d4-353 - Async invoker 0 started
2022-01-10 01:26:05,769 [DEBUG] lithops.invokers -- ExecutorID 9092d4-353 - Async invoker 1 started
2022-01-10 01:26:05,770 [DEBUG] lithops.invokers -- ExecutorID 9092d4-353 | 

['Marc', 'input/MyCsv12.csv', 'input/MyCsv7.csv', 'input/MyCsv6.csv', 'input/MyCsv8.csv', 'input/MyCsv19.csv', 'input/MyCsv3.csv', 'input/MyCsv10.csv', 'input/MyCsv2.csv', 'input/MyCsv13.csv', 'input/MyCsv14.csv', 'input/MyCsv9.csv', 'input/MyCsv16.csv', 'input/MyCsv0.csv', 'input/MyCsv1.csv', 'input/MyCsv15.csv']


2022-01-10 01:26:07,779 [DEBUG] lithops.monitor -- ExecutorID 9092d4-353 - Storage job monitor finished
2022-01-10 01:26:07,966 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-354/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:26:07,967 [DEBUG] lithops.job.job -- ExecutorID 9092d4-354 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:26:07,967 [INFO] lithops.invokers -- ExecutorID 9092d4-354 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:26:07,968 [DEBUG] lithops.invokers -- ExecutorID 9092d4-354 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:26:07,969 [DEBUG] lithops.invokers -- ExecutorID 9092d4-354 - Async invoker 0 started
2022-01-10 01:26:07,970 [DEBUG] lithops.invokers -- ExecutorID 9092d4-354 - Async invoker 1 started
2022-01-10 01:26:07,971 [DEBUG] lithops.invokers -- ExecutorID 9092d4-354 | 

['Michael', 'input/MyCsv7.csv', 'input/MyCsv6.csv', 'input/MyCsv3.csv', 'input/MyCsv10.csv', 'input/MyCsv2.csv', 'input/MyCsv14.csv', 'input/MyCsv9.csv', 'input/MyCsv16.csv', 'input/MyCsv0.csv', 'input/MyCsv18.csv', 'input/MyCsv17.csv', 'input/MyCsv11.csv', 'input/MyCsv1.csv', 'input/MyCsv15.csv']


2022-01-10 01:26:09,980 [DEBUG] lithops.monitor -- ExecutorID 9092d4-354 - Storage job monitor finished
2022-01-10 01:26:10,161 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-355/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:26:10,162 [DEBUG] lithops.job.job -- ExecutorID 9092d4-355 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:26:10,163 [INFO] lithops.invokers -- ExecutorID 9092d4-355 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:26:10,164 [DEBUG] lithops.invokers -- ExecutorID 9092d4-355 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:26:10,165 [DEBUG] lithops.invokers -- ExecutorID 9092d4-355 - Async invoker 0 started
2022-01-10 01:26:10,167 [DEBUG] lithops.invokers -- ExecutorID 9092d4-355 - Async invoker 1 started
2022-01-10 01:26:10,167 [DEBUG] lithops.invokers -- ExecutorID 9092d4-355 | 

['Scott', 'input/MyCsv12.csv', 'input/MyCsv7.csv', 'input/MyCsv8.csv', 'input/MyCsv3.csv', 'input/MyCsv19.csv', 'input/MyCsv10.csv', 'input/MyCsv2.csv', 'input/MyCsv13.csv', 'input/MyCsv4.csv', 'input/MyCsv14.csv', 'input/MyCsv16.csv', 'input/MyCsv5.csv', 'input/MyCsv17.csv', 'input/MyCsv11.csv', 'input/MyCsv15.csv']


2022-01-10 01:26:12,174 [DEBUG] lithops.monitor -- ExecutorID 9092d4-355 - Storage job monitor finished
2022-01-10 01:26:12,448 [DEBUG] lithops.storage.backends.ibm_cos.ibm_cos -- PUT Object lithops.jobs/9092d4-356/6f58df03b50aebe5addeb861ef5aa93c.func.pickle - Size: 655.0B - OK
2022-01-10 01:26:12,449 [DEBUG] lithops.job.job -- ExecutorID 9092d4-356 | JobID A000 - Data per activation is < 8.0KiB. Passing data through invocation payload
2022-01-10 01:26:12,449 [INFO] lithops.invokers -- ExecutorID 9092d4-356 | JobID A000 - Starting function invocation: inverted_reduce() - Total: 1 activations
2022-01-10 01:26:12,450 [DEBUG] lithops.invokers -- ExecutorID 9092d4-356 | JobID A000 - Worker processes: 1 - Chunksize: 1
2022-01-10 01:26:12,451 [DEBUG] lithops.invokers -- ExecutorID 9092d4-356 - Async invoker 0 started
2022-01-10 01:26:12,452 [DEBUG] lithops.invokers -- ExecutorID 9092d4-356 - Async invoker 1 started
2022-01-10 01:26:12,453 [DEBUG] lithops.invokers -- ExecutorID 9092d4-356 | 

['Steven', 'input/MyCsv12.csv', 'input/MyCsv7.csv', 'input/MyCsv6.csv', 'input/MyCsv17.csv', 'input/MyCsv19.csv', 'input/MyCsv8.csv', 'input/MyCsv4.csv', 'input/MyCsv16.csv', 'input/MyCsv0.csv', 'input/MyCsv5.csv', 'input/MyCsv18.csv', 'input/MyCsv11.csv', 'input/MyCsv15.csv']
MapReduce Completed


2022-01-10 01:26:14,083 [DEBUG] lithops.invokers -- ExecutorID 9092d4-356 - Async invoker 1 finished
2022-01-10 01:26:14,083 [DEBUG] lithops.invokers -- ExecutorID 9092d4-356 - Async invoker 0 finished


In [107]:
# Delete every file left in the temporary folder
for entry in os.scandir(TEMP_FOLDER):
    os.remove(entry.path)
# Drop the temporary tables from the SQLite database
drop_temp_tables(connection)

attempt to write a readonly database


# Question 3
## Shuffle

MapReduceServerlessEngine deploys both map and reduce tasks as serverless invocations.   
However, once the map stage completes, the results are transferred from the map tasks to the SQLite database located on the client machine (a laptop in your case). The client then performs a local shuffle and invokes the reduce tasks, passing them the relevant parameters.
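
To make the client-side shuffle concrete, here is a minimal sketch of the grouping step described above. It is an illustration under assumptions, not the engine's actual implementation: the function name `local_shuffle`, the table name `temp_results`, and the in-memory database are all hypothetical, and the map outputs are assumed to be lists of `(key, value)` tuples like the ones printed in the logs.

```python
import sqlite3
from itertools import groupby


def local_shuffle(map_outputs):
    """Group (key, value) pairs from all map tasks by key on the client.

    map_outputs: iterable of lists of (key, value) tuples, one list per map task.
    Returns a list of (key, [values]) tuples sorted by key.
    """
    # Assumption: an in-memory database stands in for the client-side SQLite file.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE temp_results (key TEXT, value TEXT)")
        # Every map result travels to the client before it is inserted here.
        for pairs in map_outputs:
            conn.executemany("INSERT INTO temp_results VALUES (?, ?)", pairs)
        # Sorting by key lets groupby collect each key's values in one pass.
        rows = conn.execute(
            "SELECT key, value FROM temp_results ORDER BY key, value"
        )
        return [
            (key, [value for _, value in group])
            for key, group in groupby(rows, key=lambda row: row[0])
        ]
    finally:
        conn.close()
```

Each `(key, [values])` tuple returned by this sketch would become the input of one reduce invocation. Note that all intermediate data passes through the client, which is the behaviour the questions below ask you to analyse.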

(To support your answers, feel free to use examples, images, etc.)
<br><br>

**1. Explain why this approach is inefficient, and describe the pros and cons of such an architecture in general. In a broader scope, you may assume that MapReduceServerlessEngine is executed on a powerful machine and not just a laptop.**

\<your answer here>

<br><br>
**2. Suggest how you can improve the shuffle so that intermediate data is not downloaded to the client at all and the shuffle is performed in the cloud as well. Explain the pros and cons of the approaches you suggest.**


\<your answer here>

<br><br>
**3. Can you make the shuffle itself serverless?**


\<your answer here>

<br><br><br><br>
Good Luck :) 