[CLIPPER-113] Implement model active version tracking and updates #115

dcrankshaw · 2017-04-07T03:19:13Z

This PR associates a current_model_version with each model and makes applications specify candidate model names but not versions. Instead, the current version of a model is deployed. A model's current version is automatically updated when a newer version of that model is deployed. I also implemented a set_model_version REST call in the management frontend that allows users to manually set the current model version to any version that has previously been deployed. This allows for model rollback and roll-forward.
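
A minimal sketch of the behavior described above, under stated assumptions: `ModelVersionTracker` and every method name other than `set_model_version` are hypothetical, and state is kept in in-memory dicts rather than in Clipper's actual store. It only illustrates that deploying a version makes it current and that `set_model_version` accepts only versions that were deployed earlier.

```python
class ModelVersionTracker:
    """Hypothetical in-memory stand-in for per-model version state."""

    def __init__(self):
        self._deployed = {}  # model name -> set of versions deployed so far
        self._current = {}   # model name -> current version

    def deploy(self, model_name, version):
        # Deploying a version makes it the current one (assumed behavior).
        self._deployed.setdefault(model_name, set()).add(version)
        self._current[model_name] = version

    def set_model_version(self, model_name, version):
        # Roll back or forward, but only to a version deployed before.
        if version not in self._deployed.get(model_name, set()):
            return False  # leave the current version untouched
        self._current[model_name] = version
        return True

    def current_version(self, model_name):
        # Returns None for a model that was never deployed.
        return self._current.get(model_name)


tracker = ModelVersionTracker()
tracker.deploy("m", 1)
tracker.deploy("m", 2)                           # current version is now 2
rolled_back = tracker.set_model_version("m", 1)  # rollback succeeds
rejected = tracker.set_model_version("m", 11)    # never deployed, so rejected
```

In this sketch a rejected call leaves the current version at whatever it was before the call.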

I removed the feedback section of the selection policy, and the tutorial wording still needs a little bit of work.

dcrankshaw · 2017-04-07T03:20:19Z

Note that in order to test this with the revised tutorial, you will need to build the query_frontend and management_frontend containers locally. The ones deployed to DockerHub are from the develop branch.

AmplabJenkins · 2017-04-07T03:23:42Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/127/
Test FAILed.

dcrankshaw · 2017-04-07T03:26:32Z

@nishadsingh1 Can you make a version of the deploy sklearn model image from the tutorial for the TensorFlow model? In other words, delete the sklearn box from this image. With this new model versioning PR, the TF model replaces the sklearn model and I would like to illustrate that.

Also, what did you use to make those images? Can you add whatever PowerPoint/Google Slides file you used to the Clipper Google Drive?

dcrankshaw · 2017-04-07T03:26:55Z

jenkins test this please

AmplabJenkins · 2017-04-07T03:41:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/128/
Test PASSed.

Corey-Zumar

Tests pass and functionality seems to be mostly complete. Comments address a couple of concerns involving behavior with deploy_model() in conjunction with set_model_version. Additionally:

Line 491 of clipper_manager.py should refer to Clipper.deploy_model() instead of Clipper.add_model().
What’s the intent of the “Warning” image in the tutorial_part_2 notebook? Seems unclear to me.

Corey-Zumar · 2017-04-10T00:01:57Z

src/frontends/src/query_frontend.hpp

-                                      policy, latency_slo_micros, input_type);
+        std::vector<VersionedModelId> versioned_models;
+        {
+          std::unique_lock<std::mutex> l(current_model_versions_mutex_);


Do we need to hold this lock for the entire query processing and response creation procedure?

Nope. The lock is in a nested block so it will go out of scope and be released on line 171 before the prediction is made.
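
The scoping pattern in this reply can be illustrated with a rough Python analogue. The PR's code is C++; `snapshot_then_predict`, `slow_predict`, and the dict contents below are hypothetical. The `with` block plays the role of the nested C++ block: the lock is held only while the current versions are copied and is released before the slow prediction step runs.

```python
import threading

current_model_versions = {"m": 2}  # hypothetical shared state
current_model_versions_lock = threading.Lock()


def slow_predict(versioned_models):
    # Stand-in for the prediction; records whether the lock is still held.
    return {
        "models": versioned_models,
        "lock_held": current_model_versions_lock.locked(),
    }


def snapshot_then_predict(model_names):
    with current_model_versions_lock:
        # Copy what is needed while holding the lock.
        versioned_models = [
            (name, current_model_versions[name]) for name in model_names
        ]
    # The lock was released when the with block ended.
    return slow_predict(versioned_models)


result = snapshot_then_predict(["m"])
```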

Corey-Zumar · 2017-04-10T00:02:58Z

src/frontends/src/query_frontend.hpp

-                                               models, policy, input_type);
+        std::vector<VersionedModelId> versioned_models;
+        {
+          std::unique_lock<std::mutex> l(current_model_versions_mutex_);


As above, do we need to hold this lock for the entire update procedure or just until the versioned_models vector is populated? If the answer is the latter, releasing the lock earlier may avoid resource contention between the update and predict functions.

The lock goes out of scope on line 210.

Corey-Zumar · 2017-04-10T00:05:17Z

src/libclipper/include/clipper/redis.hpp

+/**
+* Subscribes to changes in model versions.
+*
+* The


Nit: check documentation formatting

Corey-Zumar · 2017-04-10T00:06:18Z

src/libclipper/src/metrics.cpp

@@ -1,3 +1,4 @@
+#include <algorithm>


Is this include necessary? (Maybe gcc is throwing an error where clang isn't?)

std::sort, which we use in this file, is defined in <algorithm>. I prefer to explicitly include the header dependencies for a file as much as possible, and I noticed this one was missing.

Corey-Zumar · 2017-04-10T00:17:38Z

src/libclipper/src/redis.cpp

+    std::string key = gen_model_current_version_key(model_name);
+    auto result = send_cmd_with_reply<string>(redis, {"GET", key});
+    if (result) {
+      return std::stoi(*result);


Should we verify that result is a nonnegative integer?

Yeah good idea.
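
One way such a check could look, sketched in Python rather than the C++ of the diff. `parse_model_version` is a hypothetical helper, and returning None for invalid input is an assumption, not the behavior the PR settled on.

```python
import re


def parse_model_version(raw):
    """Return raw as an int, or None if it is not a plain nonnegative integer."""
    if raw is None:
        return None  # key not present in the store
    if re.fullmatch(r"[0-9]+", raw) is None:
        return None  # rejects "-1", "abc", "", " 4", "4.0"
    return int(raw)
```

Matching the whole string against digits only is stricter than calling `int()` directly, which would also accept signs and surrounding whitespace.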

Corey-Zumar · 2017-04-10T00:22:11Z

src/libclipper/src/redis.cpp

                                std::move(callback));
 }

 void subscribe_to_container_changes(
    Subscriber& subscriber,
    std::function<void(const std::string&, const std::string&)> callback) {
-  subscribe_to_keyspace_changes(REDIS_CONTAINER_DB_NUM, subscriber,
+  subscribe_to_keyspace_changes(REDIS_CONTAINER_DB_NUM, "", subscriber,


Nit: Defining const std::string NO_PREFIX = ""; and using it in place of "" would be clearer.

Hmm, I actually disagree here. Defining NO_PREFIX as a constant seems like treating the empty prefix as a special case. I'll define a local variable std::string prefix = "" in each method instead.
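
For context on what a prefix argument can mean here: Redis publishes keyspace notifications on channels named `__keyspace@<db>__:<key>`, so a pattern subscription can be narrowed to keys that share a prefix. The sketch below is an assumption about how such a pattern could be built, not Clipper's implementation. `keyspace_pattern` is a hypothetical helper, and the database numbers and prefix in the example calls are made up.

```python
def keyspace_pattern(db_num, prefix=""):
    # Pattern suitable for PSUBSCRIBE. The prefix is used verbatim
    # (glob characters in it are not escaped); an empty prefix
    # matches every key in the database.
    return "__keyspace@%d__:%s*" % (db_num, prefix)


all_keys = keyspace_pattern(3)
some_keys = keyspace_pattern(2, "example_prefix:")
```

Seen this way, the empty string is an ordinary prefix value rather than a special case.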

Corey-Zumar · 2017-04-10T01:34:47Z

management/clipper_manager.py

@@ -555,6 +557,37 @@ def inspect_instance(self):
            s = r.text
        return s

+    def set_model_version(model_name, model_version, num_containers=0):


We should make this the only point through which model rollback / roll-forward can be achieved. Currently, we can also call deploy_model() while specifying a previously added version of a model. However, deploy_model() has no checks in place to ensure that the model being added is actually equivalent to the old model corresponding to the given version. Hypothetically, version x of a model can therefore change over time, and this seems to defeat the purpose of versioning.

We can fix this by adding checks to deploy_model() that fail if a model is deployed with a previously-specified version.

Yeah this is a really good point.

I added checks to the management frontend's add_application and add_model methods to reject duplicate adds.
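
A rough Python sketch of the kind of duplicate check described here. The real checks live in the C++ management frontend; this `add_model` signature, the dict-based store, and `DuplicateModelVersionError` are all hypothetical.

```python
class DuplicateModelVersionError(Exception):
    """Raised when a (model name, version) pair is added twice."""


def add_model(models, model_name, model_version, model_data):
    # models maps (name, version) -> data; a stand-in for the real store.
    key = (model_name, model_version)
    if key in models:
        raise DuplicateModelVersionError(
            "model %r version %r already exists" % key
        )
    models[key] = model_data


models = {}
add_model(models, "m", 1, "weights-v1")
try:
    add_model(models, "m", 1, "different-weights")
    duplicate_rejected = False
except DuplicateModelVersionError:
    duplicate_rejected = True
```

Because the second add is refused, the data stored under version 1 cannot silently change.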

Corey-Zumar · 2017-04-10T02:01:31Z

src/management/src/management_frontend_tests.cpp

+                        container_name, model_path4));
+
+  ASSERT_FALSE(rh_.set_model_version("m", 11));
+  ASSERT_EQ(get_current_model_version(*redis_, "m"), -1);


Wouldn't we expect get_current_model_version() to return 4, corresponding to the last successful call to add_model() with model4? Is attempting to set an invalid model version supposed to change the model version to -1? If so, we should consistently maintain the previous version in the face of an update failure, if possible.

I was adding the model directly to Redis, so set_model_version wasn't being called. I fixed this by calling the management frontend's RequestHandler::add_model with JSON, and it should now reflect the expected behavior.

Corey-Zumar · 2017-04-10T02:42:36Z

management/clipper_manager.py

@@ -555,6 +557,37 @@ def inspect_instance(self):
            s = r.text
        return s

+    def set_model_version(model_name, model_version, num_containers=0):


Missing self in parameter list
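
A small demonstration of why this matters. `Manager` is a hypothetical class; only the method signature is copied from the diff. With two arguments the call does not fail. Instead, Python passes the instance as the first positional argument, so every value is bound one parameter later than intended.

```python
class Manager:
    # Signature copied from the diff above, without self.
    def set_model_version(model_name, model_version, num_containers=0):
        return (model_name, model_version, num_containers)


manager = Manager()

# Two arguments: no error, but the instance is bound to model_name,
# "m" to model_version, and 2 to num_containers.
shifted = manager.set_model_version("m", 2)

# Three arguments: the instance plus three values is one too many.
try:
    manager.set_model_version("m", 2, 1)
    raised = False
except TypeError:
    raised = True
```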

dcrankshaw · 2017-04-12T01:12:26Z

@Corey-Zumar I addressed your comments. I also added a check to prevent duplicate adding of applications or model versions. This fixes #112.

AmplabJenkins · 2017-04-12T01:19:10Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/136/
Test FAILed.

AmplabJenkins · 2017-04-12T01:39:10Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/137/
Test FAILed.

Corey-Zumar

Awesome, LGTM. Will merge after jenkins is able to build successfully.

dcrankshaw · 2017-04-12T15:32:38Z

@Corey-Zumar BTW the warning in the tutorial is there because @nishadsingh1 said he accidentally ran cleanup too early the first time he ran the tutorial. The idea was to have a visual separation between the tutorial code and the cleanup. It's been there for a while; I just moved its location slightly.

AmplabJenkins · 2017-04-12T15:40:06Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/140/
Test PASSed.

dcrankshaw · 2017-04-12T15:57:53Z

@Corey-Zumar Hold off on merging this until the tutorial is updated. Should be sometime today.

dcrankshaw · 2017-04-12T16:24:10Z

@Corey-Zumar Tutorial is updated. This PR is good to go once Jenkins passes it.

AmplabJenkins · 2017-04-12T16:34:23Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/141/
Test PASSed.

dcrankshaw added 6 commits April 6, 2017 14:48

changed applications to take unversioned models

154659d

added set_model_version to clipper_manager.py

af044f1

don't try to start containers for externally managed models

d1cd1ee

code compiles

14d6b7e

unit tests pass

ae678c1

updated tutorial and removed selection policy section

e6fde27

dcrankshaw added the status: needs review label Apr 7, 2017

dcrankshaw self-assigned this Apr 7, 2017

dcrankshaw requested a review from Corey-Zumar April 7, 2017 03:19

dcrankshaw mentioned this pull request Apr 7, 2017

[CLIPPER-105] Refactor selection policy to support single-model applications with default output #89

Merged

Corey-Zumar requested changes Apr 10, 2017

Corey-Zumar removed the status: needs review label Apr 10, 2017

dcrankshaw added 3 commits April 11, 2017 17:02

addressed code review comments

154ae7c

added checks for duplicate add_model and add_application calls

60e82f7

test for successful model publish before deploying model data

d38d5be

dcrankshaw added the status: needs review label Apr 12, 2017

Fixed unittests

eb47529

Corey-Zumar approved these changes Apr 12, 2017

fixed double cleanup call in unittests.sh causing non-zero exit code

336dbf0

dcrankshaw mentioned this pull request Apr 12, 2017

[CLIPPER-107][CLIPPER-109] Return JSON-formatted responses to prediction queries #116

Merged

Updated tutorial to reflect tf model as new version of sklearn model

f3cd57b

Corey-Zumar merged commit 1648285 into ucbrise:develop Apr 12, 2017

dcrankshaw mentioned this pull request Apr 17, 2017

safe handling of duplicate register_application requests #112

Closed

dcrankshaw deleted the model_versioning branch May 17, 2017 06:18
