From bb55a0ff9d7f926596ae276a954ee324ceac621d Mon Sep 17 00:00:00 2001
From: Lee Yang
Date: Fri, 23 Feb 2018 14:26:57 -0800
Subject: [PATCH] add instructions for starting up the notebook

---
 examples/mnist/TFOS_pipeline.ipynb | 56 ++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/examples/mnist/TFOS_pipeline.ipynb b/examples/mnist/TFOS_pipeline.ipynb
index 6a65f1fb..5f8c0801 100644
--- a/examples/mnist/TFOS_pipeline.ipynb
+++ b/examples/mnist/TFOS_pipeline.ipynb
@@ -21,6 +21,44 @@
     "In addition, there is a new [dfutil](https://yahoo.github.io/TensorFlowOnSpark/tensorflowonspark.dfutil.html) module which provides helper functions to convert from TensorFlow TFRecords to Spark DataFrames and vice versa.\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Start a Spark Standalone Cluster\n",
+    "\n",
+    "First, in a terminal/shell window, start a single-machine Spark Standalone Cluster with three workers:\n",
+    "```\n",
+    "export MASTER=spark://$(hostname):7077\n",
+    "export SPARK_WORKER_INSTANCES=3\n",
+    "export CORES_PER_WORKER=1\n",
+    "export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))\n",
+    "${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Launch the Spark Jupyter Notebook\n",
+    "\n",
+    "Now, in the same window, launch a PySpark Jupyter notebook:\n",
+    "```\n",
+    "cd ${TFoS_HOME}/examples/mnist\n",
+    "PYSPARK_DRIVER_PYTHON=\"jupyter\" \\\n",
+    "PYSPARK_DRIVER_PYTHON_OPTS=\"notebook --ip=`hostname`\" \\\n",
+    "pyspark --master ${MASTER} \\\n",
+    "--conf spark.cores.max=${TOTAL_CORES} \\\n",
+    "--conf spark.task.cpus=${CORES_PER_WORKER} \\\n",
+    "--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist_pipeline.py \\\n",
+    "--conf spark.executorEnv.JAVA_HOME=\"$JAVA_HOME\"\n",
+    "```\n",
+    "\n",
+    "This should open a Jupyter file browser pointing at the directory where this notebook is hosted.\n",
+    "Click on the TFOS_pipeline.ipynb file and begin executing the cells of the notebook."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -293,7 +331,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call to save the output to disk."
+    "Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call below to save the output to disk."
    ]
   },
   {
@@ -316,6 +354,20 @@
     "print(subprocess.check_output([\"ls\", \"-l\", output]))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Shutdown\n",
+    "\n",
+    "In your terminal/shell window, you can type `<ctrl-C>` to exit the Notebook server.\n",
+    "\n",
+    "Then, stop the Standalone Cluster via:\n",
+    "```\n",
+    "${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh\n",
+    "```"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -340,7 +392,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython2",
-   "version": "2.7.12"
+   "version": "2.7.13"
   }
  },
  "nbformat": 4,
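
As a quick sanity check after the "Start a Spark Standalone Cluster" step above, one way to confirm that all three workers registered with the master is to poll the standalone master's JSON status endpoint (served by the master web UI, port 8080 by default). This is a hedged sketch, not part of the patch; the `localhost` hostname and the exact response fields are assumptions to verify against your Spark version:

```python
# Hedged sketch: query the standalone master's JSON status page (default port
# 8080) to confirm the three SPARK_WORKER_INSTANCES workers have registered.
# Written for the Python 2 environment this notebook targets.
import json
import urllib2  # on Python 3, use urllib.request instead

# Assumption: the master web UI runs on this host at its default port.
state = json.load(urllib2.urlopen("http://localhost:8080/json"))

print("master status: %s" % state["status"])
print("workers registered: %d" % len(state["workers"]))
print("total cores: %d" % state["cores"])
```

If the worker count is less than three, check the worker logs under `${SPARK_HOME}/logs` before launching the notebook.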
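
The markdown cell this patch amends (the `TFModel.transform()` note) relies on Spark's lazy-evaluation model: transformations only build an execution plan, and nothing runs until an action is invoked. A minimal self-contained PySpark sketch of that behavior, using an invented DataFrame and a hypothetical output path rather than the notebook's actual data:

```python
# Transformations only record a plan; an action triggers execution.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("lazy-demo").getOrCreate()

# Hypothetical data standing in for the notebook's inference output DataFrame.
df = spark.createDataFrame([(1, 7.0), (2, 3.0)], ["id", "prediction"])

# A "transformation": Spark records the filter in the plan, but nothing runs yet.
preds = df.filter(df.prediction > 5.0)

# An "action": only now does Spark schedule and execute the job,
# writing the filtered rows as JSON to the (hypothetical) path.
preds.write.json("/tmp/lazy_demo_output")

spark.stop()
```

This is why the notebook appears to "finish" the transform step instantly: the real work happens at the `write.json` call.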