hadoop_pip

Manage AWS EMR clusters running Spark PIP

Helper script to send steps to EMR machines. Main usage:

usage: run_pip.py [-h] --config [CONFIG [CONFIG ...]]
run_pip.py: error: argument --config/-c is required
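
The usage message above is consistent with an argparse parser along the following lines. This is a minimal sketch reconstructed from the usage text alone, not the actual parser in run_pip.py. The function name `build_parser`, the help text, and the config file names are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: only the flag names and the "required"
    # behaviour are taken from the usage message; everything else is assumed.
    parser = argparse.ArgumentParser(prog="run_pip.py")
    parser.add_argument(
        "--config", "-c",
        nargs="*",       # zero or more config paths may follow the flag
        required=True,   # omitting the flag produces the "is required" error
        help="one or more job config files",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
# The file names are placeholders, not files shipped with this repository.
args = build_parser().parse_args(["--config", "job_a.cfg", "job_b.cfg"])
print(args.config)
```

With `nargs="*"`, every value after `--config` is collected into a single list, which matches the "config (or list of configs)" behaviour described here.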

Pass a config (or a list of configs) to run_pip.py. For each job, it writes an application.properties file to S3.
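
As a rough illustration of the step above, a job's settings could be rendered as `key=value` lines before being uploaded. This is only a sketch: the helper `render_properties`, the setting names, and the bucket paths are hypothetical, and the real keys are whatever the project's configs define. The upload to S3 is left out.

```python
def render_properties(settings):
    """Render a dict as the text of a Java-style .properties file."""
    lines = []
    for key in sorted(settings):  # sorted so the output is deterministic
        lines.append("{}={}".format(key, settings[key]))
    return "\n".join(lines) + "\n"


# Hypothetical job settings, used only to show the output shape.
example_job = {
    "points": "s3://example-bucket/points.csv",
    "polygons": "s3://example-bucket/polygons.csv",
}
properties_text = render_properties(example_job)
print(properties_text)
```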

It then starts an EMR cluster and passes each application.properties file to process_job.py to kick off the corresponding job.
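
One way to picture the hand-off to process_job.py is as a list of EMR step definitions, one per properties file. The sketch below only builds the dictionaries in the shape that boto3's EMR `add_job_flow_steps` call accepts. The helper `build_step`, the step names, the `python process_job.py <uri>` command line, and the S3 paths are all assumptions, not the project's actual invocation.

```python
def build_step(properties_uri, name):
    """Build one EMR step dict that runs process_job.py on a properties file."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # assumed: keep going if one job fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Assumed command line; the real arguments may differ.
            "Args": ["python", "process_job.py", properties_uri],
        },
    }


# Hypothetical S3 locations of two application.properties files.
properties_uris = [
    "s3://example-bucket/jobs/job_a/application.properties",
    "s3://example-bucket/jobs/job_b/application.properties",
]
steps = [
    build_step(uri, "pip-job-{}".format(index))
    for index, uri in enumerate(properties_uris)
]
print(len(steps), steps[0]["HadoopJarStep"]["Args"])
```

A list like `steps` could then be submitted with `add_job_flow_steps(JobFlowId=..., Steps=steps)` on a boto3 EMR client once the cluster is running.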

When called from other Python code (by importing the function run from run_pip.py), it can also return the CSV point-in-polygon outputs from these jobs.
