
Adds support for deploying PySpark models #196

Merged
Corey-Zumar merged 22 commits into ucbrise:develop from dcrankshaw:deploy_pyspark_models on Jun 5, 2017

Conversation

@dcrankshaw
Contributor

You can see how it works by looking at the integration tests I added in integration-tests/.

This should wait for #192 before being merged.

Note that after this gets merged, we need to create an automated build for clipper/pyspark-container on DockerHub.

@dcrankshaw dcrankshaw force-pushed the deploy_pyspark_models branch from 69af1ad to a84d2c4 on June 4, 2017 02:05
@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/382/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/383/

@dcrankshaw
Contributor Author

jenkins test this please

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/384/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/385/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/386/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/387/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/388/

cd $DIR
if [ -z ${SPARK_HOME+x} ]; then
    echo "Downloading Spark"
    curl -o spark.tgz https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz
Contributor

should we pull vars for these versions?

import findspark
findspark.init()
import pyspark
from pyspark import SparkConf, SparkContext
Contributor

Is SparkConf used?

Contributor Author

removed

import findspark
findspark.init()
import pyspark
from pyspark import SparkConf, SparkContext
Contributor
@feynmanliang feynmanliang Jun 4, 2017

Is SparkConf used?

Contributor Author

removed

COPY containers/python/python_container_conda_deps.txt /lib/

RUN curl -o /spark.tgz https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz \
&& cd / && tar zxf /spark.tgz && mv /spark-2.1.1-bin-hadoop2.7 /spark \
Contributor
@feynmanliang feynmanliang Jun 4, 2017

Should we extract vars for these versions and envsubst or BUILDARGS them in?

Contributor
@Corey-Zumar Corey-Zumar left a comment

Preliminary style review + questions. Will attempt to train and deploy PySpark models after #192 is merged and we rebase on develop.

if local_path != remote_path:
if os.path.isdir(local_path):
self._copytree(local_path, remote_path)
# self._copytree(local_path, remote_path)
Contributor

Remove commented code

Contributor Author

done

# where dst may or may not exist. We cannot use
# shutil.copytree() alone because it stipulates that
# dst cannot already exist
def _copytree(self, src, dst, symlinks=False, ignore=None):
Contributor

What is the rationale behind deleting this method? Did you run into an issue with it?

Contributor Author

Yeah it wasn't working for recursive copying of directories.
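For context, `shutil.copytree` stipulates that the destination must not already exist, which is why a merge-style helper is needed here. A minimal sketch of such a helper (the name `copytree_merge` and its exact structure are illustrative, not the PR's actual implementation):

```python
import os
import shutil

def copytree_merge(src, dst):
    """Recursively copy the contents of src into dst.

    Unlike shutil.copytree, this tolerates (and merges into) an
    already-existing destination directory tree.
    """
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        if os.path.isdir(s):
            # Recurse so nested directories are merged, not rejected.
            copytree_merge(s, d)
        else:
            shutil.copy2(s, d)
```

On Python 3.8+, `shutil.copytree(src, dst, dirs_exist_ok=True)` achieves the same effect directly.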

else:
print(
"Warning: Anaconda environment was either not found or exporting the environment "
"failed. Your function will still be serialized deployed, but may fail due to "
Contributor

I think the second sentence should read Your function will still be serialized and deployed, but may fail due to... (missing and)

Contributor Author

Ahh good

# Remove temp files
shutil.rmtree(serialization_dir)

return deploy_result
Contributor

Add a Returns section to documentation with type and explanation of deploy_result

Contributor Author

Done
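For reference, the requested Returns section might look like the following sketch. The function name, signature, and the assumption that `deploy_result` is a boolean success flag are all illustrative, not taken from the PR:

```python
def deploy_pyspark_model(name, version, spark_model, predict_func):
    """Deploy a PySpark model to Clipper.  (Signature is illustrative.)

    Returns
    -------
    bool
        True if the model was serialized and deployed successfully,
        False otherwise.
    """
    deploy_result = True  # placeholder for the real deployment logic
    return deploy_result
```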


return deploy_result

def deploy_predict_function(self,
Contributor

This method should also have Returns documentation

Contributor Author

Done

spark_model_path)

def predict_ints(self, inputs):
if self.input_type != rpc.INPUT_TYPE_INTS:
Contributor

Why are we performing an additional check on the validity of the input type? Lines 270-278 of rpc.py should take care of this validation.

Contributor Author

Oh I copied this code from the python_container.py code. I didn't realize it was redundant. I'll remove it.

Contributor

Got it. Can you remove these checks from python_container as well?

Contributor Author

Yeah done

preds = self.predict_func(self.spark, self.model, inputs)
return [str(p) for p in preds]

def _log_incorrect_input_type(self, input_type):
Contributor

Remove this if input type validation is redundant (see comment above regarding preexisting validation in rpc.py)

PORT_RANGE = [34256, 40000]


def find_unbound_port():
Contributor

This doesn't have to be done now, but we should eventually create a reusable module that we can import this functionality from because it's also defined in many_apps_many_models.py in the same directory.

Contributor Author

Agreed. Filed #201.
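The helper being discussed can be sketched as a simple bind-probe over the configured range; this is an assumption about how it works, not the PR's exact code:

```python
import random
import socket

PORT_RANGE = [34256, 40000]

def find_unbound_port():
    """Return a port in PORT_RANGE that is not currently bound on localhost."""
    while True:
        port = random.randint(PORT_RANGE[0], PORT_RANGE[1])
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Binding succeeds only if nothing else holds the port.
            sock.bind(("127.0.0.1", port))
            return port
        except socket.error:
            continue  # port in use; try another
        finally:
            sock.close()
```

Note the inherent race: the port is released before the caller binds it, so a concurrent process could grab it in between, which is acceptable for test scripts.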


def parseData(line, obj, pos_label):
fields = line.strip().split(',')
# return LabeledPoint(obj(int(fields[0]), pos_label), [float(v)/255.0 for v in fields[1:]])
Contributor

Remove commented code.

Contributor Author

Done
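The function under review parses a comma-separated record (MNIST-style: label first, then pixel values) into a labeled example. A cleaned-up, pure-Python sketch of the visible hunk; the `label_fn` parameter name and the `(label, features)` return pair are illustrative, and the real code wraps the pair in a pyspark `LabeledPoint`:

```python
def parse_data(line, label_fn, pos_label):
    """Parse a comma-separated record: field 0 is the raw class label,
    the remaining fields are pixel intensities scaled from [0, 255]
    to [0, 1].

    label_fn maps (raw_label, pos_label) to the training label,
    e.g. 1.0/0.0 for a binary classifier.
    """
    fields = line.strip().split(',')
    label = label_fn(int(fields[0]), pos_label)
    features = [float(v) / 255.0 for v in fields[1:]]
    return label, features
```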


vol = "{model_repo}/{name}/{version}".format(
model_repo=MODEL_REPO, name=name, version=version)
print("Vol is: %s" % vol)
Contributor

Consider removing debug print statement.

Contributor Author

done

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/397/

@dcrankshaw
Contributor Author

jenkins test this please

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/398/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/399/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/400/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/401/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/402/

Contributor
@Corey-Zumar Corey-Zumar left a comment

Just a couple minor comments. This works well!

correct_input_type = rpc.input_type_to_string(self.input_type)
print(
"Attempted to use prediction function for input type {incorrect_input_type}.\
This model-container was configured accept data for input type {correct_input_type}"
Contributor

Nit: Missing "to". This should be

Attempted to use prediction function for input type {incorrect_input_type}, but this model-container was configured to accept data for input type {correct_input_type}


CMD ["/container/pyspark_container_entry.sh"]

# vim: set filetype=dockerfile:
Contributor

Remove commented code

Contributor Author

This actually isn't commented code; it's a vim modeline directive: http://vim.wikia.com/wiki/Modeline_magic

Contributor
@Corey-Zumar Corey-Zumar left a comment

LGTM

@Corey-Zumar Corey-Zumar merged commit 946fc1d into ucbrise:develop Jun 5, 2017
@dcrankshaw dcrankshaw mentioned this pull request Jun 8, 2017