This is a simple set of bash functions for manipulating Amazon Elastic MapReduce (EMR) clusters.
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
You must install the AWS Command Line Interface.
You must then set up a credentials file with your default EMR settings and set EMR_HOME to the directory that contains that file.
export EMR_HOME=/path/to/credentialsFileFolder
Finally, you must source the setenv.sh file:
. setenv.sh
Setting EMR_CRED_JSON will allow you to override the credentials.json file.
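The override described above could be resolved along these lines. This is only a sketch under assumptions: the helper name emr_cred_path is hypothetical, EMR_CRED_JSON is assumed to hold a path to an alternate credentials file, and the default is assumed to be credentials.json inside EMR_HOME. The real scripts may do this differently.

```bash
# Hypothetical helper, not part of the original scripts.
# Prints the credentials file to use: EMR_CRED_JSON when it is set and
# non-empty, otherwise credentials.json inside EMR_HOME (assumed default).
emr_cred_path() {
  printf '%s\n' "${EMR_CRED_JSON:-${EMR_HOME}/credentials.json}"
}
```

For example, running `EMR_CRED_JSON=/path/to/other.json emr_cred_path` would print the alternate path without touching the default file.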
To find an existing cluster:
emrlist
To attach to a cluster, using a flow id:
emrset <flow id>
To get the current flow id:
emrset
To log in remotely to the master node of the current flow id:
emrlogin
To log in remotely with just the IP address:
emrlogin <ip address>
Note that most commands will take a flow id or an IP address as an argument, which overrides the default flow id set using emrset.
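A minimal sketch of how such an override might be resolved is shown here. The function name emr_target and the variable EMR_FLOW_ID are assumptions made for illustration; they are not taken from the actual scripts. An explicit argument wins over the stored default, and an argument made up only of digits and dots is treated as an IPv4 address.

```bash
# Hypothetical sketch; emr_target and EMR_FLOW_ID are illustrative names.
# Prints "ip:<addr>" or "flow:<id>" for the target a command should use.
emr_target() {
  # An explicit argument overrides the default flow id.
  local arg="${1:-${EMR_FLOW_ID:-}}"
  if [ -z "$arg" ]; then
    echo "no flow id set; run emrset <flow id> first" >&2
    return 1
  fi
  case "$arg" in
    *[!0-9.]*) echo "flow:$arg" ;;  # contains other characters: a flow id
    *.*.*.*)   echo "ip:$arg" ;;    # only digits and dots: an IPv4 address
    *)         echo "flow:$arg" ;;
  esac
}
```

The address check is deliberately loose. It only separates the two kinds of argument and does not validate the address itself.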
This is shorthand for calling from the shell.
emr <some args>
When you start a flow on EMR, you will be given a flow id. Use emrset to set the flow id used by many of the other commands:
emrset <flow id>
Calling emrset without the id returns the current flow id.
emrlist returns all job flows created in the last 2 days.
Will return the current master node on the EMR cluster.
emrlogin logs in remotely to the master node.
Will return the current status of a given running flow.
Will terminate your remote EMR cluster.
emrscreen launches screen on the master node. Screen must already be installed. If a screen instance is already running, this command attaches to it automatically.
emrtail automatically tails the logs of the given step in the current flow:
emrtail 2
Without a step number, a list of available steps will be displayed.
Will create a local SOCKS proxy to the master node. This is useful for accessing the JobTracker and NameNode. For this to work best, install FoxyProxy in Firefox.
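For reference, a SOCKS proxy of this kind is typically opened with ssh dynamic port forwarding. The sketch below only builds and prints such a command rather than running it. The function name emr_proxy_cmd, the default local port 8157, and the hadoop login user are assumptions for illustration; the actual command may use different values.

```bash
# Hypothetical sketch: emr_proxy_cmd is not a function from this project.
# Prints an ssh command that would open a local SOCKS proxy to the host.
emr_proxy_cmd() {
  local host="$1"
  local port="${2:-8157}"  # assumed default local SOCKS port
  # -N: run no remote command; -D: local dynamic (SOCKS) port forwarding
  printf 'ssh -N -D %s hadoop@%s\n' "$port" "$host"
}
```

With the proxy running, the browser extension would be pointed at localhost on the chosen port.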
emrscp copies a given file to the remote master node using scp:
emrscp my-hadoop-app.jar
This is useful if you leave your EMR cluster running and want to manually spawn jobs from emrlogin or emrscreen.
emrconf copies all conf/*-site.xml files from the master node into the given directory using scp:
emrconf local-conf
This is useful if you leave your EMR cluster running on an AWS VPC and wish to run Hadoop jobs from a local shell.
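One way to use the copied files from a local shell is to point Hadoop's HADOOP_CONF_DIR environment variable at the directory. The helper below is a sketch only: use_emr_conf is a hypothetical name and is not provided by these scripts. It refuses to switch when the directory holds no *-site.xml files.

```bash
# Hypothetical helper, not part of the original scripts.
# Points HADOOP_CONF_DIR at a directory of copied *-site.xml files.
use_emr_conf() {
  local dir="$1"
  # Expand the glob into this function's positional parameters.
  set -- "$dir"/*-site.xml
  # With no match, $1 is the unexpanded pattern and does not exist.
  if [ ! -e "$1" ]; then
    echo "no *-site.xml files found in $dir" >&2
    return 1
  fi
  # Store an absolute path so later directory changes do not break it.
  HADOOP_CONF_DIR="$(cd "$dir" && pwd)"
  export HADOOP_CONF_DIR
}
```

After `use_emr_conf local-conf`, a locally installed hadoop command should read the cluster's settings, assuming the local machine can reach the cluster over the network.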