Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy model causes docker container to exit #149

Closed
rmdort opened this issue May 10, 2017 · 6 comments
Closed

Deploy model causes docker container to exit #149

rmdort opened this issue May 10, 2017 · 6 comments
Assignees

Comments

@rmdort
Copy link
Contributor

rmdort commented May 10, 2017

To reproduce this

  1. Deploy any model with version no as 1
  2. Try the /predict api to do a simple call
  3. Deploy version 2 of the same model
  4. Try /predict

query_frontend container exits with code 139

clipper/query_frontend:latest        "/clipper/release/..."   38 minutes ago      Exited (139) 26 seconds ago

This is the container log before exiting

[15:17:44.212][info]     [CLIPPER] Adding new container - model: captcha_predict_model, version: 6, ID: 1, input_type: strings
[15:17:44.212][info]  [TASKEXE...] Created queue for new model: captcha_predict_model : 6
[15:17:44.224][info]       [REDIS] Successfully issued command "HMSET captcha_predict_model,6,0 model_id captcha_predict_model:6 model_name captcha_predict_model model_version 6 model_replica_id 0 zmq_connection_id 1 batch_size 1 input_type strings"
[15:17:46.600][info]         [RPC] Found message to receive
[15:17:49.217][info]         [RPC] Found message to receive
[15:17:51.609][info]         [RPC] Found message to receive
[15:17:52.517][info]       [REDIS] Successfully issued command "GET captcha_predict:0:0"
[15:17:52.517][info]  [QUERYPR...] Found 1 tasks
[15:17:52.525][info]         [RPC] Found message to receive

Is there any other log i can look at whats happening. I am using the NoopContainer

@rmdort rmdort changed the title Deploy model cases docker container to exit Deploy model causes docker container to exit May 10, 2017
@dcrankshaw
Copy link
Contributor

Confirming that I can reproduce this. I was able to reproduce the error while running under lldb. Attaching the stacktrace here for future reference.

Process 12228 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x240)
    frame #0: 0x00007fffde6b9400 libsystem_pthread.dylib`pthread_mutex_lock
libsystem_pthread.dylib`pthread_mutex_lock:
->  0x7fffde6b9400 <+0>:  cmpq   $0x4d55545a, (%rdi)       ; imm = 0x4D55545A
    0x7fffde6b9407 <+7>:  jne    0x7fffde6b9461            ; <+97>
    0x7fffde6b9409 <+9>:  leaq   0x1f(%rdi), %r8
    0x7fffde6b940d <+13>: andq   $-0x8, %r8
(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x240)
  * frame #0: 0x00007fffde6b9400 libsystem_pthread.dylib`pthread_mutex_lock
    frame #1: 0x00000001000ef3ad query_frontend`boost::mutex::lock() [inlined] boost::posix::pthread_mutex_lock(m=0x0000000000000240) at mutex.hpp:62
    frame #2: 0x00000001000ef3a4 query_frontend`boost::mutex::lock(this=0x0000000000000240) at mutex.hpp:116
    frame #3: 0x00000001000ef230 query_frontend`boost::unique_lock<boost::mutex>::lock(this=0x0000700006509a30) at lock_types.hpp:346
    frame #4: 0x00000001000ef147 query_frontend`boost::unique_lock<boost::mutex>::unique_lock(this=0x0000700006509a30, m_=0x0000000000000240) at lock_types.hpp:124
    frame #5: 0x00000001000eed3d query_frontend`boost::unique_lock<boost::mutex>::unique_lock(this=0x0000700006509a30, m_=0x0000000000000240) at lock_types.hpp:123
    frame #6: 0x0000000100166f07 query_frontend`boost::shared_mutex::lock(this=0x0000000000000238) at shared_mutex.hpp:289
    frame #7: 0x0000000100166e90 query_frontend`boost::unique_lock<boost::shared_mutex>::lock(this=0x0000700006509bd8) at lock_types.hpp:346
    frame #8: 0x0000000100166da7 query_frontend`boost::unique_lock<boost::shared_mutex>::unique_lock(this=0x0000700006509bd8, m_=0x0000000000000238) at lock_types.hpp:124
    frame #9: 0x0000000100166d2d query_frontend`boost::unique_lock<boost::shared_mutex>::unique_lock(this=0x0000700006509bd8, m_=0x0000000000000238) at lock_types.hpp:123
    frame #10: 0x000000010021f366 query_frontend`clipper::ModelContainer::update_throughput(this=0x0000000000000000, batch_size=1, total_latency_micros=1075) at containers.cpp:35
    frame #11: 0x0000000100157410 query_frontend`clipper::TaskExecutor::process_completed_message(this=0x00007fff5fbfd5f0, completed_msg=0x000070000650a0f8, deserialized_output="15.1846112726", current_time=0x000070000650a158, cur_model_metric=optional<clipper::ModelMetrics> @ 0x000070000650a3c8) at task_executor.hpp:496
    frame #12: 0x0000000100155a4d query_frontend`clipper::TaskExecutor::on_response_recv(this=0x00007fff5fbfd5f0, response=clipper::rpc::RPCResponse @ 0x000070000650a520) at task_executor.hpp:478
    frame #13: 0x0000000100155188 query_frontend`clipper::TaskExecutor::TaskExecutor(this=0x0000000100d04918, response=clipper::rpc::RPCResponse @ 0x000070000650a6c8)::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)::operator()(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >) const at task_executor.hpp:239
    frame #14: 0x0000000100154e9d query_frontend`_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7clipper12TaskExecutorC1EvEUlNS_4pairIKiNS_6vectorIhNS_9allocatorIhEEEEEEE_SB_EEEvDpOT_ [inlined] std::__1::__invoke<clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >(__f=0x0000000100d04918, __args=0x000070000650a970)::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)&>(fp)(std::__1::forward<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >(fp0))), clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)&&&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&&) at __functional_base:416
    frame #15: 0x0000000100154cc0 query_frontend`_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7clipper12TaskExecutorC1EvEUlNS_4pairIKiNS_6vectorIhNS_9allocatorIhEEEEEEE_SB_EEEvDpOT_(__args=0x0000000100d04918, __args=0x000070000650a970) at __functional_base:468
    frame #16: 0x0000000100154b59 query_frontend`std::__1::__function::__func<clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >), std::__1::allocator<clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>::operator(this=0x0000000100d04910, __arg=0x000070000650a970)(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&&) at functional:1437
    frame #17: 0x00000001001fe821 query_frontend`std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>::operator(this=0x0000000100d04910, __arg=pair<const int, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > @ 0x000070000650a970)(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >) const at functional:1817
    frame #18: 0x00000001001fdf8d query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run() [inlined] decltype(__f=0x0000000100d04910, __args=0x0000000100d04940)>&>(fp)(std::__1::forward<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>(fp0))) std::__1::__invoke<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>(std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&&&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&&&) at __functional_base:416
    frame #19: 0x00000001001fdf08 query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run() [inlined] std::__1::__bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<>, __is_valid_bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<> >::value>::type std::__1::__apply_functor<std::__1::function<void (__f=0x0000000100d04910, __bound_args=0x0000000100d04940, __args=0x000070000650a910)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, 0ul, std::__1::tuple<> >(std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) at functional:2097
    frame #20: 0x00000001001fded1 query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run() [inlined] std::__1::__bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<>, __is_valid_bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<> >::value>::type std::__1::__bind<std::__1::function<void (this=0x0000000100d04900)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>::operator()<>() at functional:2160
    frame #21: 0x00000001001fde96 query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run(this=0x0000000100d047b0) at future.hpp:3179
    frame #22: 0x0000000100152cf5 query_frontend`boost::detail::task_base_shared_state<void ()>::run(this=0x0000000100d047b0) at future.hpp:2765
    frame #23: 0x0000000100152bdc query_frontend`boost::packaged_task<void ()>::operator(this=0x0000000100d04cc8)() at future.hpp:3551
    frame #24: 0x0000000100152ac9 query_frontend`clipper::ThreadPool::ThreadTask<boost::packaged_task<void ()> >::execute(this=0x0000000100d04cc0) at threadpool.hpp:151
    frame #25: 0x0000000100149518 query_frontend`clipper::ThreadPool::worker(this=0x00000001002f2740) at threadpool.hpp:226
    frame #26: 0x000000010014b064 query_frontend`void* std::__1::__thread_proxy<std::__1::tuple<void (clipper::ThreadPool::*)(), clipper::ThreadPool*> >(void*) [inlined] decltype(__f=0x0000000100b03b70, __a0=0x0000000100b03b80)).*fp(std::__1::forward<>(fp1))) std::__1::__invoke<void (clipper::ThreadPool::*)(), clipper::ThreadPool*, void>(void (clipper::ThreadPool::*&&)(), clipper::ThreadPool*&&) at __functional_base:383
    frame #27: 0x000000010014afe5 query_frontend`void* std::__1::__thread_proxy<std::__1::tuple<void (clipper::ThreadPool::*)(), clipper::ThreadPool*> >(void*) [inlined] void std::__1::__thread_execute<void (clipper::ThreadPool::*)(), clipper::ThreadPool*, 1ul>(__t=0x0000000100b03b70)(), clipper::ThreadPool*>&, std::__1::__tuple_indices<1ul>) at thread:347
    frame #28: 0x000000010014afbd query_frontend`void* std::__1::__thread_proxy<std::__1::tuple<void (clipper::ThreadPool::*)(), clipper::ThreadPool*> >(__vp=0x0000000100b03b70) at thread:357
    frame #29: 0x00007fffde6bb9af libsystem_pthread.dylib`_pthread_body + 180
    frame #30: 0x00007fffde6bb8fb libsystem_pthread.dylib`_pthread_start + 286
    frame #31: 0x00007fffde6bb101 libsystem_pthread.dylib`thread_start + 13

And the full log:

(lldb) target create "debug/src/frontends/query_frontend"
Current executable set to 'debug/src/frontends/query_frontend' (x86_64).
(lldb) run
Process 12228 launched: '/Users/crankshaw/code/amplab/model-serving/clipper-contributor-sandbox/clipper/debug/src/frontends/query_frontend' (x86_64)
[13:19:00.381][info]       [REDIS] Successfully issued command "SELECT 1"
[13:19:00.381][info]     [STATEDB] Persistent state DB created
[13:19:00.381][info]  [TASKEXE...] TaskExecutor started
[13:19:00.381][info]         [RPC] RPC thread started at address:
[13:19:00.382][info]       [REDIS] Successfully issued command "CONFIG SET notify-keyspace-events AKE"
[13:19:00.382][info]       [REDIS] SUBSCRIPTION STRING: __keyspace@3__:*
[13:19:00.383][info]      [TIMERS] Starting timer thread
[13:19:00.383][info]      [TIMERS] Timer thread started
[13:19:00.383][info]  [QUERYPR...] Query Processor started
[13:19:00.383][info]      [TIMERS] In timer event loop
[13:19:00.384][info]       [REDIS] SUBSCRIPTION STRING: __keyspace@5__:*
[13:19:00.384][info]       [REDIS] SUBSCRIPTION STRING: __keyspace@6__:CURRENT_MODEL_VERSION:*
[13:19:00.384][info]       [REDIS] Successfully issued command "SELECT 5"
[13:19:00.384][info]       [REDIS] Successfully issued command "KEYS *"
[13:19:00.384][info]       [REDIS] Successfully issued command "SELECT 5"
[13:19:00.384][info]       [REDIS] Successfully issued command "HGETALL dbg_app"
[13:19:00.384][info]       [REDIS] 	 candidate_model_names: dbg_model
[13:19:00.384][info]       [REDIS] 	 input_type: doubles
[13:19:00.384][info]       [REDIS] 	 policy: DefaultOutputSelectionPolicy
[13:19:00.384][info]       [REDIS] 	 default_output: -4.7
[13:19:00.384][info]       [REDIS] 	 latency_slo_micros: 40000
[13:19:00.385][info]       [REDIS] Successfully issued command "SET dbg_app:0:0 {"y_hat":"-4.7"}"
[13:19:00.385][info]       [REDIS] Successfully issued command "SELECT 5"
[13:19:00.385][info]       [REDIS] Successfully issued command "HGETALL test_app"
[13:19:00.385][info]       [REDIS] 	 candidate_model_names: test_model
[13:19:00.385][info]       [REDIS] 	 input_type: strings
[13:19:00.385][info]       [REDIS] 	 policy: DefaultOutputSelectionPolicy
[13:19:00.385][info]       [REDIS] 	 default_output: -4.7
[13:19:00.385][info]       [REDIS] 	 latency_slo_micros: 40000
[13:19:00.385][info]       [REDIS] Successfully issued command "SET test_app:0:0 {"y_hat":"-4.7"}"
[13:19:00.385][info]  [QUERYFR...] Found 2 existing applications registered in Clipper: dbg_app,test_app.
[13:19:00.385][info]       [REDIS] Successfully issued command "SELECT 2"
[13:19:00.385][info]       [REDIS] Successfully issued command "KEYS *"
[13:19:00.386][info]       [REDIS] Successfully issued command "SELECT 6"
[13:19:00.386][info]       [REDIS] Successfully issued command "GET CURRENT_MODEL_VERSION:dbg_model"
[13:19:00.386][info]       [REDIS] Successfully issued command "SELECT 6"
[13:19:00.386][info]       [REDIS] Successfully issued command "GET CURRENT_MODEL_VERSION:test_model"
[13:19:00.386][info]  [QUERYFR...] Found 2 models deployed to Clipper: dbg_model@v2,test_model@v1.
[13:19:00.414][info]         [RPC] Found message to receive
[13:19:00.414][info]         [RPC] Found message to receive
[13:19:00.414][info]         [RPC] New container connected
[13:19:00.414][info]         [RPC] Container added
[13:19:00.415][info]       [REDIS] Successfully issued command "SELECT 3"
[13:19:00.415][info]       [REDIS] MESSAGE: hset
[13:19:00.415][info]       [REDIS] Successfully issued command "SELECT 3"
[13:19:00.415][info]       [REDIS] Successfully issued command "HMSET dbg_model,2,0 model_id dbg_model:2 model_name dbg_model model_version 2 model_replica_id 0 zmq_connection_id 0 batch_size 1 input_type doubles"
[13:19:00.415][info]       [REDIS] Successfully issued command "HGETALL dbg_model,2,0"
[13:19:00.415][info]       [REDIS] 	 model_id: dbg_model:2
[13:19:00.415][info]       [REDIS] 	 model_name: dbg_model
[13:19:00.415][info]       [REDIS] 	 model_version: 2
[13:19:00.415][info]       [REDIS] 	 model_replica_id: 0
[13:19:00.415][info]       [REDIS] 	 zmq_connection_id: 0
[13:19:00.415][info]       [REDIS] 	 batch_size: 1
[13:19:00.415][info]       [REDIS] 	 input_type: doubles
[13:19:00.415][info]     [CLIPPER] Adding new container - model: dbg_model, version: 2, ID: 0, input_type: doubles
[13:19:00.417][info]  [TASKEXE...] Created queue for new model: dbg_model : 2
[13:19:00.441][info]         [RPC] Found message to receive
[13:19:00.442][info]         [RPC] Found message to receive
[13:19:00.443][info]         [RPC] Found message to receive
[13:19:00.443][info]         [RPC] Found message to receive
[13:19:00.443][info]         [RPC] Found message to receive
[13:19:05.418][info]         [RPC] Found message to receive
[13:19:06.804][info]       [REDIS] Successfully issued command "GET dbg_app:0:0"
[13:19:06.804][info]  [QUERYPR...] Found 1 tasks
[13:19:06.806][info]         [RPC] Found message to receive
[13:19:09.322][info]       [REDIS] Successfully issued command "GET dbg_app:0:0"
[13:19:09.322][info]  [QUERYPR...] Found 1 tasks
[13:19:09.324][info]         [RPC] Found message to receive
[13:19:10.112][info]       [REDIS] Successfully issued command "GET dbg_app:0:0"
[13:19:10.112][info]  [QUERYPR...] Found 1 tasks
[13:19:10.114][info]         [RPC] Found message to receive
[13:19:10.789][info]       [REDIS] Successfully issued command "GET dbg_app:0:0"
[13:19:10.789][info]  [QUERYPR...] Found 1 tasks
[13:19:10.794][info]         [RPC] Found message to receive
[13:19:24.669][info]       [REDIS] MESSAGE: set
[13:19:24.669][info]  [QUERYFR...] MODEL VERSION CHANGE DETECTED. Key: dbg_model, event_type: set
[13:19:24.669][info]       [REDIS] Successfully issued command "SELECT 6"
[13:19:24.669][info]       [REDIS] Successfully issued command "GET CURRENT_MODEL_VERSION:dbg_model"
[13:19:24.668][info]   [MGMTFRNTD] Add model POST request
[13:19:24.668][info]       [REDIS] Successfully issued command "SELECT 2"
[13:19:24.668][info]       [REDIS] Successfully issued command "HGETALL dbg_model:3"
[13:19:24.668][info]       [REDIS] Successfully issued command "SELECT 2"
[13:19:24.668][info]       [REDIS] Successfully issued command "HMSET dbg_model:3 model_name dbg_model model_version 3 load 0.000000 input_type doubles labels x container_name EXTERNAL model_data_path EXTERNAL"
[13:19:24.668][info]       [REDIS] Successfully issued command "SELECT 2"
[13:19:24.668][info]       [REDIS] Successfully issued command "KEYS dbg_model:*"
[13:19:24.669][info]       [REDIS] Successfully issued command "SELECT 6"
[13:19:24.669][info]       [REDIS] Successfully issued command "SET CURRENT_MODEL_VERSION:dbg_model 3"
[13:19:31.829][info]         [RPC] Found message to receive
[13:19:31.830][info]         [RPC] Found message to receive
[13:19:31.830][info]         [RPC] New container connected
[13:19:31.830][info]         [RPC] Container added
[13:19:31.830][info]       [REDIS] Successfully issued command "SELECT 3"
[13:19:31.830][info]       [REDIS] MESSAGE: hset
[13:19:31.830][info]       [REDIS] Successfully issued command "SELECT 3"
[13:19:31.831][info]       [REDIS] Successfully issued command "HGETALL dbg_model,3,0"
[13:19:31.831][info]       [REDIS] 	 model_id: dbg_model:3
[13:19:31.831][info]       [REDIS] 	 model_name: dbg_model
[13:19:31.831][info]       [REDIS] 	 model_version: 3
[13:19:31.831][info]       [REDIS] 	 model_replica_id: 0
[13:19:31.831][info]       [REDIS] 	 zmq_connection_id: 1
[13:19:31.831][info]       [REDIS] 	 batch_size: 1
[13:19:31.831][info]       [REDIS] 	 input_type: doubles
[13:19:31.831][info]     [CLIPPER] Adding new container - model: dbg_model, version: 3, ID: 1, input_type: doubles
[13:19:31.831][info]  [TASKEXE...] Created queue for new model: dbg_model : 3
[13:19:31.831][info]       [REDIS] Successfully issued command "HMSET dbg_model,3,0 model_id dbg_model:3 model_name dbg_model model_version 3 model_replica_id 0 zmq_connection_id 1 batch_size 1 input_type doubles"
[13:19:36.832][info]         [RPC] Found message to receive
Process 12228 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x240)
    frame #0: 0x00007fffde6b9400 libsystem_pthread.dylib`pthread_mutex_lock
libsystem_pthread.dylib`pthread_mutex_lock:
->  0x7fffde6b9400 <+0>:  cmpq   $0x4d55545a, (%rdi)       ; imm = 0x4D55545A
    0x7fffde6b9407 <+7>:  jne    0x7fffde6b9461            ; <+97>
    0x7fffde6b9409 <+9>:  leaq   0x1f(%rdi), %r8
    0x7fffde6b940d <+13>: andq   $-0x8, %r8
(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x240)
  * frame #0: 0x00007fffde6b9400 libsystem_pthread.dylib`pthread_mutex_lock
    frame #1: 0x00000001000ef3ad query_frontend`boost::mutex::lock() [inlined] boost::posix::pthread_mutex_lock(m=0x0000000000000240) at mutex.hpp:62
    frame #2: 0x00000001000ef3a4 query_frontend`boost::mutex::lock(this=0x0000000000000240) at mutex.hpp:116
    frame #3: 0x00000001000ef230 query_frontend`boost::unique_lock<boost::mutex>::lock(this=0x0000700006509a30) at lock_types.hpp:346
    frame #4: 0x00000001000ef147 query_frontend`boost::unique_lock<boost::mutex>::unique_lock(this=0x0000700006509a30, m_=0x0000000000000240) at lock_types.hpp:124
    frame #5: 0x00000001000eed3d query_frontend`boost::unique_lock<boost::mutex>::unique_lock(this=0x0000700006509a30, m_=0x0000000000000240) at lock_types.hpp:123
    frame #6: 0x0000000100166f07 query_frontend`boost::shared_mutex::lock(this=0x0000000000000238) at shared_mutex.hpp:289
    frame #7: 0x0000000100166e90 query_frontend`boost::unique_lock<boost::shared_mutex>::lock(this=0x0000700006509bd8) at lock_types.hpp:346
    frame #8: 0x0000000100166da7 query_frontend`boost::unique_lock<boost::shared_mutex>::unique_lock(this=0x0000700006509bd8, m_=0x0000000000000238) at lock_types.hpp:124
    frame #9: 0x0000000100166d2d query_frontend`boost::unique_lock<boost::shared_mutex>::unique_lock(this=0x0000700006509bd8, m_=0x0000000000000238) at lock_types.hpp:123
    frame #10: 0x000000010021f366 query_frontend`clipper::ModelContainer::update_throughput(this=0x0000000000000000, batch_size=1, total_latency_micros=1075) at containers.cpp:35
    frame #11: 0x0000000100157410 query_frontend`clipper::TaskExecutor::process_completed_message(this=0x00007fff5fbfd5f0, completed_msg=0x000070000650a0f8, deserialized_output="15.1846112726", current_time=0x000070000650a158, cur_model_metric=optional<clipper::ModelMetrics> @ 0x000070000650a3c8) at task_executor.hpp:496
    frame #12: 0x0000000100155a4d query_frontend`clipper::TaskExecutor::on_response_recv(this=0x00007fff5fbfd5f0, response=clipper::rpc::RPCResponse @ 0x000070000650a520) at task_executor.hpp:478
    frame #13: 0x0000000100155188 query_frontend`clipper::TaskExecutor::TaskExecutor(this=0x0000000100d04918, response=clipper::rpc::RPCResponse @ 0x000070000650a6c8)::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)::operator()(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >) const at task_executor.hpp:239
    frame #14: 0x0000000100154e9d query_frontend`_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7clipper12TaskExecutorC1EvEUlNS_4pairIKiNS_6vectorIhNS_9allocatorIhEEEEEEE_SB_EEEvDpOT_ [inlined] std::__1::__invoke<clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >(__f=0x0000000100d04918, __args=0x000070000650a970)::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)&>(fp)(std::__1::forward<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >(fp0))), clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)&&&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&&) at __functional_base:416
    frame #15: 0x0000000100154cc0 query_frontend`_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7clipper12TaskExecutorC1EvEUlNS_4pairIKiNS_6vectorIhNS_9allocatorIhEEEEEEE_SB_EEEvDpOT_(__args=0x0000000100d04918, __args=0x000070000650a970) at __functional_base:468
    frame #16: 0x0000000100154b59 query_frontend`std::__1::__function::__func<clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >), std::__1::allocator<clipper::TaskExecutor::TaskExecutor()::'lambda'(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>::operator(this=0x0000000100d04910, __arg=0x000070000650a970)(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&&) at functional:1437
    frame #17: 0x00000001001fe821 query_frontend`std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>::operator(this=0x0000000100d04910, __arg=pair<const int, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > @ 0x000070000650a970)(std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >) const at functional:1817
    frame #18: 0x00000001001fdf8d query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run() [inlined] decltype(__f=0x0000000100d04910, __args=0x0000000100d04940)>&>(fp)(std::__1::forward<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>(fp0))) std::__1::__invoke<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>(std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&&&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&&&) at __functional_base:416
    frame #19: 0x00000001001fdf08 query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run() [inlined] std::__1::__bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<>, __is_valid_bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<> >::value>::type std::__1::__apply_functor<std::__1::function<void (__f=0x0000000100d04910, __bound_args=0x0000000100d04940, __args=0x000070000650a910)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, 0ul, std::__1::tuple<> >(std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) at functional:2097
    frame #20: 0x00000001001fded1 query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run() [inlined] std::__1::__bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<>, __is_valid_bind_return<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>, std::__1::tuple<std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > > >, std::__1::tuple<> >::value>::type std::__1::__bind<std::__1::function<void (this=0x0000000100d04900)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>::operator()<>() at functional:2160
    frame #21: 0x00000001001fde96 query_frontend`boost::detail::task_shared_state<std::__1::__bind<std::__1::function<void (std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >)>&, std::__1::pair<int const, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > >&>, void ()>::do_run(this=0x0000000100d047b0) at future.hpp:3179
    frame #22: 0x0000000100152cf5 query_frontend`boost::detail::task_base_shared_state<void ()>::run(this=0x0000000100d047b0) at future.hpp:2765
    frame #23: 0x0000000100152bdc query_frontend`boost::packaged_task<void ()>::operator(this=0x0000000100d04cc8)() at future.hpp:3551
    frame #24: 0x0000000100152ac9 query_frontend`clipper::ThreadPool::ThreadTask<boost::packaged_task<void ()> >::execute(this=0x0000000100d04cc0) at threadpool.hpp:151
    frame #25: 0x0000000100149518 query_frontend`clipper::ThreadPool::worker(this=0x00000001002f2740) at threadpool.hpp:226
    frame #26: 0x000000010014b064 query_frontend`void* std::__1::__thread_proxy<std::__1::tuple<void (clipper::ThreadPool::*)(), clipper::ThreadPool*> >(void*) [inlined] decltype(__f=0x0000000100b03b70, __a0=0x0000000100b03b80)).*fp(std::__1::forward<>(fp1))) std::__1::__invoke<void (clipper::ThreadPool::*)(), clipper::ThreadPool*, void>(void (clipper::ThreadPool::*&&)(), clipper::ThreadPool*&&) at __functional_base:383
    frame #27: 0x000000010014afe5 query_frontend`void* std::__1::__thread_proxy<std::__1::tuple<void (clipper::ThreadPool::*)(), clipper::ThreadPool*> >(void*) [inlined] void std::__1::__thread_execute<void (clipper::ThreadPool::*)(), clipper::ThreadPool*, 1ul>(__t=0x0000000100b03b70)(), clipper::ThreadPool*>&, std::__1::__tuple_indices<1ul>) at thread:347
    frame #28: 0x000000010014afbd query_frontend`void* std::__1::__thread_proxy<std::__1::tuple<void (clipper::ThreadPool::*)(), clipper::ThreadPool*> >(__vp=0x0000000100b03b70) at thread:357
    frame #29: 0x00007fffde6bb9af libsystem_pthread.dylib`_pthread_body + 180
    frame #30: 0x00007fffde6bb8fb libsystem_pthread.dylib`_pthread_start + 286
    frame #31: 0x00007fffde6bb101 libsystem_pthread.dylib`thread_start + 13

@dcrankshaw dcrankshaw self-assigned this May 11, 2017
@rmdort
Copy link
Contributor Author

rmdort commented May 12, 2017

  • Containers also die even when a model for a new project is deployed.
  • And also when /predict is called on the first project

We have also noticed when is a model is being deployed (ie when a docker container is pulled), the output of /predict is default. Shouldnt it get the right predictions from the old container?

Actually clipper currently cant handle multiple projects. /predict on the 2nd project now returns default. And when i call it on the 1st project, the container dies. Very very strange.

@dcrankshaw
Copy link
Contributor

I've found the issue. I'll be submitting a fix in a few minutes.

@dcrankshaw
Copy link
Contributor

Let me pull that statement back a little bit. I've found the problem that causes the query_frontend container to die when a new model is deployed. It may be related to #148 as well (where Clipper can't handle to applications), but I have yet to verify that. I will look into that next.

As far as the problem where Clipper returns default when a new model is deployed but the container hasn't connect yet, this is a separate issue. It's not exactly a bug, just less than desirable behavior. I'm going to file a separate issue that explains the problem.

@dcrankshaw
Copy link
Contributor

Filed #152 to track the default predictions issue

@rmdort
Copy link
Contributor Author

rmdort commented May 12, 2017

Thank you for the clarifications. Clearer now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants