56 changes: 54 additions & 2 deletions examples/mnist/TFOS_pipeline.ipynb
@@ -21,6 +21,44 @@
"In addition, there is a new [dfutil](https://yahoo.github.io/TensorFlowOnSpark/tensorflowonspark.dfutil.html) module which provides helper functions to convert from TensorFlow TFRecords to Spark DataFrames and vice versa.\n"
]
},
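{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a minimal sketch of that round-trip using the documented `dfutil` helpers (the paths here are placeholders, and `sc` is the active SparkContext):\n",
"```python\n",
"from tensorflowonspark import dfutil\n",
"\n",
"# TFRecords -> Spark DataFrame (schema is inferred from the record features)\n",
"df = dfutil.loadTFRecords(sc, \"mnist/tfr/train\")\n",
"\n",
"# Spark DataFrame -> TFRecords\n",
"dfutil.saveAsTFRecords(df, \"mnist/tfr/train_copy\")\n",
"```"
]
},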
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start a Spark Standalone Cluster\n",
"\n",
"First, in a terminal/shell window, start a single-machine Spark Standalone Cluster with three workers:\n",
"```\n",
"export MASTER=spark://$(hostname):7077\n",
"export SPARK_WORKER_INSTANCES=3\n",
"export CORES_PER_WORKER=1\n",
"export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES})) \n",
"${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}\n",
"```"
]
},
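{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, confirm that the master and worker JVMs are running (using `jps` from the JDK; you should see one `Master` and three `Worker` entries):\n",
"```\n",
"jps | grep -E \"Master|Worker\"\n",
"```"
]
},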
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Launch the Spark Jupyter Notebook\n",
"\n",
"Now, in the same window, launch a Pyspark Jupyter notebook:\n",
"```\n",
"cd ${TFoS_HOME}/examples/mnist\n",
"PYSPARK_DRIVER_PYTHON=\"jupyter\" \\\n",
"PYSPARK_DRIVER_PYTHON_OPTS=\"notebook --ip=`hostname`\" \\\n",
"pyspark --master ${MASTER} \\\n",
"--conf spark.cores.max=${TOTAL_CORES} \\\n",
"--conf spark.task.cpus=${CORES_PER_WORKER} \\\n",
"--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist_pipeline.py \\\n",
"--conf spark.executorEnv.JAVA_HOME=\"$JAVA_HOME\"\n",
"```\n",
"\n",
"This should open a Jupyter browser pointing to the directory where this notebook is hosted.\n",
"Click on the TFOS_pipeline.ipynb file, and begin executing the steps of the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -293,7 +331,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call to save the output to disk."
"Now, invoke the `TFModel.transform()` method and save the output DataFrame. **Note**: Spark \"transformations\" are \"lazy\" by design, so no actual inferencing will occur until an \"action\" is invoked on the output DataFrame `preds`, which in this case is the `write.json` call below to save the output to disk."
]
},
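{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, a minimal sketch of this lazy-then-action pattern (the variable names are placeholders for the objects defined in the surrounding cells):\n",
"```python\n",
"preds = model.transform(df)    # \"transformation\": only builds the execution plan\n",
"preds.write.json(output)       # \"action\": triggers the actual inferencing\n",
"```"
]
},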
{
@@ -316,6 +354,20 @@
"print(subprocess.check_output([\"ls\", \"-l\", output]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Shutdown\n",
"\n",
"In your terminal/shell window, you can type `<ctrl-C>` to exit the Notebook server.\n",
"\n",
"Then, stop the Standalone Cluster via:\n",
"```\n",
"${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -340,7 +392,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
"version": "2.7.13"
}
},
"nbformat": 4,