corelogic-on-greatlakes

Documentation, how-tos, and example code for using Great Lakes to process U-M CoreLogic data.

Overview

This repository demonstrates a workflow for processing CoreLogic data on the Great Lakes (GL) cluster at the University of Michigan.

The repository is organized into several directories, each demonstrating one step of the workflow:

  • [intro-to-corelogic-data]: describes the CoreLogic data and how to get access to it at the University of Michigan
  • [running-jupyter-spark-gl-ondemand]: describes how to start a Jupyter + Spark notebook in an Open OnDemand session on the Great Lakes (GL) cluster
  • [processing-corelogic-using-pyspark]: demonstrates how the CoreLogic data can be read, explored, filtered, and written out using PySpark (see the sketch after this list)
  • [github-and-greatlakes]: explains how to clone from and commit to a GitHub repository from a home directory on the Great Lakes (GL) cluster
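
The PySpark step follows the usual read → explore → filter → write pattern. Below is a minimal sketch of that pattern; the input path, delimiter, and column name (`SITUS_STATE`) are illustrative placeholders, not the actual CoreLogic file layout — see [processing-corelogic-using-pyspark] for the real details.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("corelogic-demo").getOrCreate()

# Read a delimited extract into a DataFrame
# (pipe delimiter and path are assumptions for illustration).
df = (
    spark.read
    .option("header", True)
    .option("delimiter", "|")
    .csv("/path/to/corelogic/extract")  # placeholder path
)

# Explore: inspect the schema and a small sample of rows.
df.printSchema()
df.show(5)

# Filter: keep rows for one state (hypothetical column name).
mi = df.filter(df["SITUS_STATE"] == "MI")

# Write the filtered subset out as Parquet for faster reuse.
mi.write.mode("overwrite").parquet("/path/to/output/corelogic_mi.parquet")
```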
