# Parallel Processing of Data Using MapReduce
This notebook will enable you to understand how to analyze data in parallel using the map and reduce functions of MapReduce.

Please note that the `map` function used in this notebook is Python's built-in `map`, which applies a function to each chunk one after another in a single process; it is not a distributed map. A real MapReduce framework like Hadoop or Spark requires additional configuration and is normally not applied to data this small. As a result, you may find that runtimes do not differ much between the parallel processing notebooks.
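As a minimal illustration of the two building blocks this notebook relies on, the sketch below runs Python's built-in `map` and `functools.reduce` on a toy list. The function `square` and the sample data are assumptions made up for this example and are not part of the notebook's code.

```python
from functools import reduce

def square(x):
    # Hypothetical per-item work standing in for the "map" step.
    return x * x

numbers = [1, 2, 3, 4]

# map applies square to each item, one after another, in this process.
squares = list(map(square, numbers))

# reduce folds the mapped results into a single value, left to right.
total = reduce(lambda acc, value: acc + value, squares)

print(squares)
print(total)
```

Nothing here is distributed: the same interpreter handles every item in turn, which is why the timing on small inputs tends to look similar to a plain loop.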

In [None]:
import time
from functools import reduce
import sys
import math

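def breakDoc(text,nToBreakInto):
    # Split text into nToBreakInto consecutive chunks of (nearly) equal size.
    textList=[]
    fLength = len(text)
    # Chunk size in characters, rounded up so that no characters are left over.
    nCharsInEach = int(math.ceil(float(fLength)/nToBreakInto))
    for i in range(nToBreakInto):
        startIndex=i*nCharsInEach
        endIndex=(i+1)*nCharsInEach
        # Slicing past the end of a string is safe, so the final (possibly
        # shorter or empty) chunk needs no special case.
        textList.append(text[startIndex:endIndex])
    return textList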

def loadText():
    textList=[]
    condition=True
    while condition:
        text=input('Please Enter the Text You Want to Encipher: ')
        if text=='stop':
            condition=False
        else:
            textList.append(text)
    return textList

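def cipher(text,key):
    # Caesar cipher: shift every ASCII letter by key positions and leave
    # all other characters unchanged.
    import string
    stri=""
    for ch in text:
        if ch not in string.ascii_letters:
            stri+=ch
        else:
            # Shift within the letter's own alphabet so that uppercase letters
            # never wrap into the lowercase range; the modulo also handles
            # negative keys and keys larger than 26.
            base = ord('A') if ch.isupper() else ord('a')
            stri+=chr((ord(ch) - base + key) % 26 + base)
    return stri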

def CCMapReduce(text,key,nToBreakInto):
    # Time only the split, map, and reduce steps (CPU time, not wall-clock time).
    start = time.process_time()
    # Split: break the document into chunks.
    textList=breakDoc(text,nToBreakInto)
    # Map: encipher each chunk with the same key.
    encodedList=list(map(cipher,textList,[key]*len(textList)))
    # Reduce: concatenate the enciphered chunks back into one string.
    encodedText=reduce(lambda x,y:x+y,encodedList)
    stop=time.process_time()
    print("Runtime: ",(stop-start),"seconds")
    return encodedText

def loadDocument():
    filename=input('Please Enter the Name of the File You Want to Encipher: ')
    with open(filename) as f:
        text=f.read()
    return text

## Encrypt one document with MapReduce
The cell below breaks a document into several chunks, encrypts each chunk separately, and joins the results into one document. It uses the divide-and-conquer strategy: splitting the data, processing the pieces, and joining the results. When you run the cell, it prints the runtime of the function.
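The split, process, and join steps can be sketched on a short string as follows. This is a simplified stand-in, not the notebook's own implementation: the helper names `split_into_chunks` and `shift_letters` are hypothetical, and the toy cipher only shifts lowercase ASCII letters to keep the example short.

```python
from functools import reduce

def split_into_chunks(text, n_chunks):
    # Ceiling division so every character lands in exactly one chunk.
    size = -(-len(text) // n_chunks)
    return [text[i * size:(i + 1) * size] for i in range(n_chunks)]

def shift_letters(chunk, key):
    # Toy Caesar shift restricted to lowercase ASCII letters, for brevity.
    out = []
    for ch in chunk:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

# Split: three chunks of at most four characters each.
chunks = split_into_chunks("hello world", 3)

# Process: apply the same shift key to every chunk.
encoded = list(map(shift_letters, chunks, [3] * len(chunks)))

# Join: concatenate the processed chunks in their original order.
joined = reduce(lambda left, right: left + right, encoded)

print(chunks)
print(joined)
```

Passing `[3] * len(chunks)` as a second iterable mirrors how a fixed key can be supplied to every call when `map` is given a two-argument function.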

Please use the text file called "merge.txt". It includes three novels, _Pride and Prejudice_, _Jane Eyre_ and _Crime and Punishment_.

In [None]:
text=loadDocument()
nToBreakInto=int(input("Please Enter the Number of Chunks: "))
key=int(input("Please Enter Shift Key: "))
encodedText=CCMapReduce(text,key,nToBreakInto)

**Print the encrypted document**

In [None]:
print(encodedText)

Copy and paste the two cells above and vary the value for the shift key and the number of pieces in which to divide the dataset.

**Question**: How does the run time vary with different values of the shift key? You need to keep the number of pieces constant to answer this question.  

**Question**: How does the run time vary with different values for the number of pieces? You need to keep the value of the shift key constant to answer this question.

**Question**: What is the speedup for a shift key of 5 and the use of 3 pieces? Show the equation you are using to calculate the speedup.

**Question**: For similar values for the number of chunks and shift keys, how does the run time using MapReduce compare to the run time from the Parallel Processing Notebook? 

**Note**: You may reuse the copied and pasted cells to rerun the experiment (you only need to copy and paste once).

**Question**: Discuss whether encrypting files is an embarrassingly parallel problem, and explain why or why not.

## Parallelism and Critical Paths

a. Describe a problem where a MapReduce approach would make processing more efficient.

b. Describe a problem where parallel processing would only help in some steps.