DEPLOY.md
Interactive Apache Spark with JupyterLab and Apache Livy on Mesosphere DC/OS

Prerequisites

Spin up a DC/OS 1.10 Stable cluster with 9 private agents and 1 public agent

Deploy Marathon-LB

dcos package install --yes marathon-lb

Note: If deploying Marathon-LB on Mesosphere DC/OS Enterprise, please use a service account and set up the appropriate ACLs as documented in Provisioning Marathon-LB

Ref: Deploy Marathon-LB on Mesosphere DC/OS Enterprise

Deploy Apache Livy

dcos marathon app add https://github.com/vishnu2kmohan/livy-dcos-docker/raw/master/livy-marathon.json

Note: The default livy.conf may be modified and rehosted on a webserver to suit your specific needs. Modify the uri to point to its location on your webserver.
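As an illustration, a rehosted livy.conf would typically override only a few properties. The values below are placeholders for your own cluster, not the defaults shipped with the image:

```
# Hypothetical overrides -- adjust for your cluster
livy.server.port = 8998
livy.server.session.timeout = 1h
livy.spark.master = mesos://zk://master.mesos:2181/mesos
```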

Deploy JupyterLab

Note: This JupyterLab setup has BeakerX and sparkmagic preinstalled.

curl -O https://raw.githubusercontent.com/vishnu2kmohan/beakerx-dcos-docker/master/beakerx-sparkmagic-marathon.json

Edit and set the value of the HAPROXY_0_VHOST label to the hostname (or ideally, a unique CNAME) of the loadbalancer fronting the public agent(s) where Marathon-LB is installed.
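For example, after editing, the labels stanza of the downloaded app definition might look like the fragment below (jupyter.example.com is a placeholder for your own hostname or CNAME; the app definition typically also carries the HAPROXY_GROUP label so Marathon-LB picks it up):

```json
"labels": {
  "HAPROXY_GROUP": "external",
  "HAPROXY_0_VHOST": "jupyter.example.com"
}
```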

dcos marathon app add beakerx-sparkmagic-marathon.json

Note: The default sparkmagic config.json may be modified and rehosted on a webserver to suit your specific needs. Modify the uri to point to its location on your webserver.
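For reference, the portion of sparkmagic's config.json that points a kernel at Livy looks like the fragment below; the url shown is a placeholder for your Livy endpoint, not necessarily the value in the default config:

```json
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://livy.marathon.l4lb.thisdcos.directory:8998",
    "auth": "None"
  }
}
```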

Connect to JupyterLab

Point your web browser to the VHOST that was specified.

The default password is set to jupyter if you deployed the app to /beakerx using the default Marathon app definition.

If you modified and deployed the app into a folder, e.g., /foo/bar/beakerx, the auto-configured password will be jupyter-foo-bar.
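The naming convention above can be expressed as a small sketch (the helper name derived_password is made up for illustration):

```python
def derived_password(app_id):
    """Derive the auto-configured JupyterLab password from a Marathon app id.

    The folder components of the app id (everything except the app name
    itself) are appended to "jupyter", joined with hyphens.
    """
    folders = app_id.strip("/").split("/")[:-1]
    return "-".join(["jupyter"] + folders)

print(derived_password("/beakerx"))          # jupyter
print(derived_password("/foo/bar/beakerx"))  # jupyter-foo-bar
```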

Ref: Jupyter Notebook Password Provisioning

Start a PySpark3 Notebook from the JupyterLab Launcher and paste the following code into a cell

SparkPi

from random import random
from operator import add

partitions = 10
n = 100000 * partitions

def f(_):
    # Sample a point uniformly in the 2x2 square and test whether it
    # falls inside the unit circle
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))

Press Ctrl-Enter to execute the code in the cell. This triggers sparkmagic to communicate with Apache Livy, whose livy.conf has been configured to spawn Spark executors on your Mesosphere DC/OS cluster.
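If you want to sanity-check the estimator before running it on the cluster, the same Monte Carlo logic runs locally without Spark; here a plain sum stands in for the parallelize/map/reduce pipeline:

```python
from random import random, seed

def in_unit_circle(_):
    # Sample a point uniformly in the 2x2 square centered at the origin
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

seed(0)  # fixed seed so the example is reproducible
n = 100000
count = sum(map(in_unit_circle, range(n)))
pi_estimate = 4.0 * count / n
print("Pi is roughly %f" % pi_estimate)
```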

References