Code for the Data Lake Talk
.gitignore
README.md
cp_file.sh
mapreduce.py
spark_wordcount.py

Creating a Local Data Lake

Companion code and slides for my talk on whether you really need Hadoop. A lot of processing can be done in memory on a reasonably modern laptop.

For example, you can run MapReduce with PyPy.
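
To give a feel for what that can look like, here is a minimal sketch of an in-memory MapReduce word count in plain Python. It is not the repository's `mapreduce.py`; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample input are assumptions made for illustration. It uses only the standard library, so it should run under either CPython or PyPy.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce entirely in memory."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read lines from a file.
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(sample))
```

Saved under a file name of your choice, the same script can be started with `python3` or `pypy3`; nothing in it depends on a cluster or on Hadoop.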