H(ad)oopla!

A Python script to fetch the output of failed Python Hadoop streaming jobs. It scrapes the Hadoop web interface, picks a random failed mapper task and a random failed reducer task, and prints their output with code highlighting for easy reading.
    $ doopla -h
    Usage:
      doopla [<jobid>]
      doopla -h | --help
      doopla --version

    Options:
      -h --help     Show this screen.
      --version     Show version.
- Automatically get the last failed job for a user
- Code highlighting via
Two options for installing:
pip install doopla
git clone and setup.py:
    git clone email@example.com:trustyou/doopla.git
    cd doopla
    python setup.py install
To use doopla, please create a file in your home directory called .doopla and add:
    [main]
    hadoop_version: <HADOOP_VERSION>  # either 1 or 2 - defaults to 2
    hadoop_user: <HADOOP_USER>
    hadoop_url: <HADOOP_URL>  # For Hadoop 2.x use the Job history URL
    http_user: <USER>
    http_password: <THE_PASSWORD>
Replace HADOOP_URL with the HTTP URL of your Hadoop web interface, HADOOP_USER with your Hadoop user (or the one you want to check), and THE_PASSWORD with the HTTP password you normally use to log in to the web interface.
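As a rough illustration of how a file in this format can be read, here is a minimal sketch using Python 3's standard ``configparser``. It is not doopla's actual loading code. The function names ``load_settings`` and ``load_settings_file`` and all sample values (user ``alice``, host ``hadoop.example.com``) are hypothetical. Inline ``#`` comments are only stripped because ``inline_comment_prefixes`` is set explicitly; whether doopla itself tolerates them is an assumption.

```python
import configparser
from pathlib import Path

# Hypothetical sample values; only the key names come from the README.
SAMPLE = """\
[main]
hadoop_version: 2  # either 1 or 2 - defaults to 2
hadoop_user: alice
hadoop_url: https://hadoop.example.com:19888  # For Hadoop 2.x use the Job history URL
http_user: alice
http_password: secret
"""


def load_settings(text):
    """Parse .doopla-style text and return the [main] settings as a dict."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip trailing "# ..." comments
        interpolation=None,              # keep '%' in passwords literal
    )
    parser.read_string(text)
    main = parser["main"]
    return {
        # Fall back to 2 when the key is missing, as the README describes.
        "hadoop_version": main.getint("hadoop_version", fallback=2),
        "hadoop_user": main.get("hadoop_user"),
        "hadoop_url": main.get("hadoop_url"),
        "http_user": main.get("http_user"),
        "http_password": main.get("http_password"),
    }


def load_settings_file(path=None):
    """Read settings from the given path, or ~/.doopla when none is given."""
    path = Path(path) if path else Path.home() / ".doopla"
    return load_settings(path.read_text())


if __name__ == "__main__":
    print(load_settings(SAMPLE))
```

Note that with inline comments enabled, a value containing a space followed by ``#`` would be cut off at that point, so a password of that shape would need different handling.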
Then it is simply a matter of executing doopla with no arguments.
It will search for the most recently failed job and get the output.
If you want to get the output of a specific job, pass its job ID:

    $ doopla JOB_ID
You can also add 2>/dev/null if you want to suppress the HTTPS certificate warnings.
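If you call doopla from another Python script rather than from a shell, the same effect as ``2>/dev/null`` can be had by discarding standard error. The sketch below assumes ``doopla`` is installed and on the ``PATH``. The helper names ``build_command`` and ``fetch_failed_output`` are hypothetical and not part of doopla.

```python
import subprocess


def build_command(job_id=None):
    """Build the doopla argument list; with no job ID, doopla picks the latest failed job."""
    cmd = ["doopla"]
    if job_id:
        cmd.append(job_id)
    return cmd


def fetch_failed_output(job_id=None):
    """Run doopla and return its standard output as text.

    stderr is discarded, which mirrors appending 2>/dev/null in a shell.
    """
    result = subprocess.run(
        build_command(job_id),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # decode output to str
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Discarding stderr also hides genuine error messages, so it is worth dropping the redirection when debugging a failing call.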
This is a 4-hour hack written while skipping lunch and waiting for a job to finish, so it is in alpha stage and full of bugs. Feel free to create pull requests if you see something that can be improved.
- Python >= 2.6 or >= 3.3
MIT licensed. See the bundled `LICENSE <https://github.com/mfcabrera/doopla/blob/master/LICENSE>`_ file for more details.