
Build Cloudera Spark with Various Extras

A Vagrant setup on a CentOS 7 machine for quickly building and rebuilding Cloudera's Apache Spark from https://github.com/cloudera/spark.

Requirements

Make sure that you have Vagrant and a virtualization provider that Vagrant supports installed, along with git to clone this repository.

Get Started

Clone this git repository to your local workstation:

git clone https://github.com/teamclairvoyant/vagrant-sparkbuilder.git
cd vagrant-sparkbuilder

Start the Vagrant instance:

vagrant up
vagrant ssh

Inside the instance, change to the spark directory:

cd spark

Check out the branch/tag that corresponds to the target CDH version, then build Spark with the Hive Thriftserver while excluding dependencies that are shipped as part of CDH:

git checkout cdh5.7.0-release
patch -p0 </vagrant/undelete.patch
./make-distribution.sh -DskipTests \
  -Dhadoop.version=2.6.0-cdh5.7.0 \
  -Phadoop-2.6 \
  -Pyarn \
  -Phive -Phive-thriftserver \
  -Pflume-provided \
  -Phadoop-provided \
  -Phbase-provided \
  -Phive-provided \
  -Pparquet-provided
git checkout -- make-distribution.sh

Copy the resulting distribution back to your local workstation:

rsync -av dist/ /vagrant/dist-cdh5.7.0-nodeps
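
The git tag, the Hadoop version, and the output directory in the commands above all embed the same CDH version, so deriving them from one value helps keep them consistent. The following is a minimal sketch of that idea. The helper names (`cdh_tag`, `hadoop_version`, `dist_dir`) are hypothetical and not part of this repository, and the `2.6.0` Hadoop prefix is assumed from the examples in this README; other CDH releases may need a different value.

```shell
# Hypothetical helpers (not shipped with this repository).

# Git tag for a CDH version, e.g. 5.7.0 -> cdh5.7.0-release
cdh_tag() { printf 'cdh%s-release\n' "$1"; }

# Hadoop version string; the 2.6.0 prefix is an assumption taken from
# the CDH 5.5.2 and 5.7.0 examples in this README.
hadoop_version() { printf '2.6.0-cdh%s\n' "$1"; }

# Destination directory under /vagrant; the optional second argument
# is appended as a suffix such as "nodeps".
dist_dir() { printf '/vagrant/dist-cdh%s%s\n' "$1" "${2:+-$2}"; }

# Possible use inside the spark directory:
#   CDH=5.7.0
#   git checkout "$(cdh_tag "$CDH")"
#   rsync -av dist/ "$(dist_dir "$CDH" nodeps)"
```

Only the version string needs to change when targeting a different release, provided the assumptions above still hold for that release.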

More Examples

Check out the branch/tag that corresponds to the target CDH version, then build Spark with the Hive Thriftserver:

git checkout cdh5.5.2-release
patch -p0 </vagrant/undelete.patch
./make-distribution.sh -DskipTests \
  -Dhadoop.version=2.6.0-cdh5.5.2 \
  -Phadoop-2.6 \
  -Pyarn \
  -Phive -Phive-thriftserver
git checkout -- make-distribution.sh
rsync -av dist/ /vagrant/dist-cdh5.5.2

Check out the branch/tag that corresponds to the target CDH version, then build Spark with the Hive Thriftserver while excluding dependencies that are shipped as part of CDH:

git checkout cdh5.5.2-release
patch -p0 </vagrant/undelete.patch
./make-distribution.sh -DskipTests \
  -Dhadoop.version=2.6.0-cdh5.5.2 \
  -Phadoop-2.6 \
  -Pyarn \
  -Phive -Phive-thriftserver \
  -Pflume-provided \
  -Phadoop-provided \
  -Phbase-provided \
  -Phive-provided \
  -Pparquet-provided
git checkout -- make-distribution.sh
rsync -av dist/ /vagrant/dist-cdh5.5.2-nodeps
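
The two Hive Thriftserver builds differ only in the five `-P...-provided` profiles. As a rough sketch, the argument list could be assembled by a small function so that the difference is explicit. `build_args` is a hypothetical helper, not something this repository provides, and it assumes the `2.6.0` Hadoop prefix used in the examples above.

```shell
# Hypothetical helper: print one make-distribution.sh argument per line.
# Usage: build_args CDH_VERSION [nodeps]
# The function body runs in a subshell so its variables do not leak.
build_args() (
  cdh=$1
  mode=${2:-full}
  # Arguments common to both Hive Thriftserver builds in this README.
  set -- -DskipTests "-Dhadoop.version=2.6.0-cdh${cdh}" \
    -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver
  # "nodeps" adds the profiles that exclude dependencies shipped with CDH.
  if [ "$mode" = nodeps ]; then
    for p in flume hadoop hbase hive parquet; do
      set -- "$@" "-P${p}-provided"
    done
  fi
  printf '%s\n' "$@"
)
```

Because none of the arguments contain spaces, the output can be passed straight through, for example `./make-distribution.sh $(build_args 5.5.2 nodeps)`.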

Check out the branch/tag that corresponds to the target CDH version, then build Spark with the SparkR bits while excluding dependencies that are shipped as part of CDH:

sudo yum -y -e1 -d1 install epel-release
sudo yum -y -e1 -d1 install R
git checkout cdh5.7.0-release
patch -p0 </vagrant/undelete.patch
./make-distribution.sh -DskipTests \
  -Dhadoop.version=2.6.0-cdh5.7.0 \
  -Phadoop-2.6 \
  -Pyarn \
  -Psparkr \
  -Pflume-provided \
  -Phadoop-provided \
  -Phbase-provided \
  -Phive-provided \
  -Pparquet-provided
git checkout -- make-distribution.sh
rsync -av dist/ /vagrant/dist-cdh5.7.0-nodeps-R
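
The SparkR build depends on R being installed first, so it can be worth confirming that the command is on the PATH before starting a long build. A minimal sketch follows; `require_cmd` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper: succeed if the named command is on the PATH,
# otherwise print a message to stderr and return non-zero.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required command: $1" >&2
  return 1
}

# Possible use before the SparkR build:
#   require_cmd R && ./make-distribution.sh ... -Psparkr ...
```

Chaining with `&&` means the build only starts when the check passes.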

License

Copyright (C) 2016 Clairvoyant, LLC.

Licensed under the Apache License, Version 2.0.
