Basic MapReduce Understanding

SQL

The sample query below aggregates over groups. It contains a nested query, and the outer query uses the inner query's output.

Query to find the percentage of executions from each county.

SELECT
  county,
  100.0 * COUNT(*) / (SELECT COUNT(*) FROM executions)
    AS percentage
FROM executions
GROUP BY county
ORDER BY percentage DESC

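This is MapReduce in SQL. MapReduce is a well-known programming paradigm that views a computation as occurring in a "map" step and a "reduce" step. Here, GROUP BY plays the map role by assigning each row to a county, and COUNT(*) plays the reduce role by collapsing each group into a single number.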
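To make the correspondence concrete, here is a minimal Python sketch of the same percentage calculation written as an explicit map step followed by a reduce step. The rows in `executions` and the helper `add_pair` are hypothetical stand-ins invented for illustration; they are not taken from the real table.

```python
from collections import Counter
from functools import reduce

# Hypothetical rows standing in for the `executions` table (made-up data).
executions = [
    {"county": "County A"},
    {"county": "County B"},
    {"county": "County A"},
    {"county": "County C"},
]

# Map step: emit one (county, 1) pair per row.
pairs = map(lambda row: (row["county"], 1), executions)

# Reduce step: fold the pairs into per-county counts (GROUP BY + COUNT(*)).
def add_pair(counts, pair):
    county, n = pair
    counts[county] += n
    return counts

counts = reduce(add_pair, pairs, Counter())

# The total row count plays the role of the inner SELECT COUNT(*).
total = sum(counts.values())

# Equivalent of ORDER BY percentage DESC.
percentages = sorted(
    ((county, 100.0 * n / total) for county, n in counts.items()),
    key=lambda item: item[1],
    reverse=True,
)
print(percentages)
```

With this sample data, "County A" accounts for half of the rows and therefore sorts first.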

Ref : Nested Query - SelectStar

Python

map()

The map() function executes a specified function for each item in an iterable. Each item is passed to the function as an argument.

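data = [1, 2, 3, 4, 5, 6]
# In Python 3, map() returns a lazy iterator, so wrap it in list() to see the values.
mapped_result = list(map(lambda x: x * 2, data))
print(mapped_result)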

Output

[2, 4, 6, 8, 10, 12]
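One detail worth seeing in action: in Python 3, map() returns a lazy iterator rather than a list, so values are computed only as they are consumed, and the iterator can be consumed only once. The following is a small sketch of that behaviour; the variable names are chosen here for illustration.

```python
data = [1, 2, 3, 4, 5, 6]

doubled = map(lambda x: x * 2, data)  # lazy iterator, nothing computed yet
first = next(doubled)                 # computes only the first item
rest = list(doubled)                  # consumes the remaining items
again = list(doubled)                 # the iterator is now exhausted

# A list comprehension is the eager equivalent and can be reused freely.
comprehension = [x * 2 for x in data]
print(first, rest, again, comprehension)
```

If the mapped values are needed more than once, storing them with list() or using a comprehension avoids the empty second pass.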

Ref : Map Function - W3Schools

reduce()

A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.

import numpy as np

x = np.arange(1, 6)
np.add.reduce(x)

Output

15
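The "repeatedly applies" part of the definition can be made visible without NumPy. The sketch below, using only the standard library, shows the same sum three ways: as an explicit loop, with functools.reduce, and with itertools.accumulate, which keeps every intermediate result. The variable names are illustrative.

```python
from functools import reduce
from itertools import accumulate
import operator

values = [1, 2, 3, 4, 5]

# Explicit loop: combine the running result with the next element each time.
running = values[0]
for v in values[1:]:
    running = operator.add(running, v)

# reduce() performs the same fold and returns only the final result.
total = reduce(operator.add, values)

# accumulate() performs the same fold but yields every intermediate result.
steps = list(accumulate(values, operator.add))

print(running, total, steps)
```

The last element of `steps` is the value that reduce returns; the earlier elements are the partial sums that reduce discards.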

Ref : Reduce - DataScienceHandbook

Combining map and reduce

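from functools import reduce  # reduce() is not a built-in in Python 3

data = [1, 2, 3, 4, 5, 6]
mapped_result = map(lambda x: x * 2, data)  # map step: double each item

final_result = reduce(lambda x, y: x + y, mapped_result)  # reduce step: sum the doubled items
print(final_result)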

Output

42

Ref : Simple explanation of MapReduce - Stackoverflow

More References