triton pytorch backend malloc coredump #4778

Closed
jackzhou121 opened this issue Aug 17, 2022 · 7 comments
Labels: bug (Something isn't working)

Comments

@jackzhou121

Description
I use the Triton SDK to do TorchScript model inference. I run two processes with NVIDIA MPS, and sometimes one of the two processes fails with a coredump. I used gdb to debug the problem; here is the backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f1df8ef823b in malloc () from /usr/lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f1c817fe000 (LWP 126))]
(gdb) bt
#0 0x00007f1df8ef823b in malloc () from /usr/lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f1df9266b39 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007f1df1a21c1c in void std::vector<torch::jit::Value*, std::allocator<torch::jit::Value*> >::emplace_back<torch::jit::Value*>(torch::jit::Value*&&) ()
from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#3 0x00007f1df1acffee in torch::jit::Node::addOutput() () from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#4 0x00007f1df1ad76c5 in torch::jit::Block::cloneFrom(torch::jit::Block*, std::function<torch::jit::Value* (torch::jit::Value*)>) () from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#5 0x00007f1df1ad7f84 in torch::jit::Graph::copy() () from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#6 0x00007f1df19a8724 in torch::jit::GraphFunction::get_executor() () from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#7 0x00007f1df19a579e in torch::jit::GraphFunction::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) () from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#8 0x00007f1df19a5c5e in torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) ()
from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#9 0x00007f1df19b84bb in torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) const ()
from /opt/tritonserver/backends/pytorch/libtorch_cpu.so
#10 0x00007f1d40126f5d in triton::backend::pytorch::ModelInstanceState::Execute(std::vector<TRITONBACKEND_Response*, std::allocator<TRITONBACKEND_Response*> >, unsigned int, std::vector<c10::IValue, std::allocator<c10::IValue> >, std::vector<at::Tensor, std::allocator<at::Tensor> >) () from /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
#11 0x00007f1d4012d255 in triton::backend::pytorch::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int) ()
from /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
#12 0x00007f1d4012eaa4 in TRITONBACKEND_ModelInstanceExecute () from /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
#13 0x00007f1df80b0faa in nvidia::inferenceserver::TritonModelInstance::Execute(std::vector<TRITONBACKEND_Request*, std::allocator<TRITONBACKEND_Request*> >&) ()
from /opt/tritonserver/lib/libtritonserver.so
#14 0x00007f1df80b1857 in nvidia::inferenceserver::TritonModelInstance::Schedule(std::vector<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> >, std::allocator<std::unique_ptr<nvidia::inferenceserver::InferenceRequest, std::default_delete<nvidia::inferenceserver::InferenceRequest> > > >&&, std::function<void ()> const&) () from /opt/tritonserver/lib/libtritonserver.so
#15 0x00007f1df7f5ccc1 in nvidia::inferenceserver::Payload::Execute(bool*) () from /opt/tritonserver/lib/libtritonserver.so
#16 0x00007f1df80ab4f7 in nvidia::inferenceserver::TritonModelInstance::TritonBackendThread::BackendThread(int, int) () from /opt/tritonserver/lib/libtritonserver.so
#17 0x00007f1df9292de4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007f1df9d0c609 in start_thread (arg=) at pthread_create.c:477
#19 0x00007f1df8f7d163 in clone () from /usr/lib/x86_64-linux-gnu/libc.so.6

Triton Information
I use the NGC Triton container 21.11 (Triton version 2.16, PyTorch 1.11.0a0+b6df043).
NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.5

Are you using the Triton container or did you build it yourself?
Yes, I use the NGC Triton container 21.11.

To Reproduce
Start NVIDIA MPS and run two processes.
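
For concreteness, a rough launcher sketch of that setup, assuming the two processes are two tritonserver instances sharing the GPUs through MPS (consistent with the libtritonserver.so frames in the backtrace); the /models path and the port numbers are placeholders, not values taken from this issue:

import subprocess
import time

# Start the NVIDIA MPS control daemon (skip this step if a daemon is already running).
subprocess.run(["nvidia-cuda-mps-control", "-d"], check=True)

# Launch two independent tritonserver processes that will share the GPUs via MPS.
# "/models" and the port numbers below are placeholders.
servers = [
    subprocess.Popen([
        "tritonserver",
        "--model-repository=/models",
        f"--http-port={8000 + 10 * i}",
        f"--grpc-port={8001 + 10 * i}",
        f"--metrics-port={8002 + 10 * i}",
    ])
    for i in range(2)
]

# Crude wait for model load; polling each server's /v2/health/ready endpoint is more robust.
time.sleep(30)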

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
instance_group [
  {
    count: 8
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]

parameters: {
  key: "DISABLE_OPTIMIZED_EXECUTION"
  value: {
    string_value: "true"
  }
}

parameters: {
  key: "ENABLE_NVFUSER"
  value: {
    string_value: "true"
  }
}

parameters: {
  key: "INFERENCE_MODE"
  value: {
    string_value: "true"
  }
}

input [
  {
    name: "xwithouttone__0"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ -1 ]
  },
  {
    name: "tone__1"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ -1 ]
  },
  {
    name: "prosodyx__2"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ -1 ]
  },
  {
    name: "emotionid__3"
    data_type: TYPE_INT64
    format: FORMAT_NONE
    dims: [ -1 ]
  },
  {
    name: "emotionlevel__4"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ -1 ]
  },
  {
    name: "alpha__5"
    data_type: TYPE_FP32
    format: FORMAT_NONE
    dims: [ -1 ]
  }
]

output [
  {
    name: "output__0"
    data_type: TYPE_FP16
    dims: [ 1, -1, 256 ]
  }
]

default_model_filename: "20220804_novel_f25_fastspeech_tensorRT_a30.plan"

optimization {
  input_pinned_memory {
    enable: true
  },
  output_pinned_memory {
    enable: true
  }
}
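
For reference, a minimal client sketch matching the inputs and output declared above; it assumes the tritonclient Python package, a server reachable at localhost:8000, and a model name of "fastspeech" with a sequence length of 16, all of which are assumptions rather than values taken from this issue:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

SEQ_LEN = 16  # arbitrary length for the variable-length [-1] inputs (assumption)
MODEL_NAME = "fastspeech"  # hypothetical model name, not taken from this issue

inputs = []
for name, triton_dtype, np_dtype in [
    ("xwithouttone__0", "INT64", np.int64),
    ("tone__1", "INT64", np.int64),
    ("prosodyx__2", "INT64", np.int64),
    ("emotionid__3", "INT64", np.int64),
    ("emotionlevel__4", "FP32", np.float32),
    ("alpha__5", "FP32", np.float32),
]:
    inp = httpclient.InferInput(name, [SEQ_LEN], triton_dtype)
    inp.set_data_from_numpy(np.zeros(SEQ_LEN, dtype=np_dtype))  # dummy warmup data
    inputs.append(inp)

outputs = [httpclient.InferRequestedOutput("output__0")]
result = client.infer(MODEL_NAME, inputs, outputs=outputs)
print(result.as_numpy("output__0").shape)  # declared output dims are [1, -1, 256]

Sending a few such requests to each server right after startup matches the "two processes warming up models" pattern described later in the thread.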
Expected behavior
A clear and concise description of what you expected to happen.

@jackzhou121
Author

platform: "pytorch_libtorch"

@dyastremsky
Contributor

Thanks for providing the config and backtrace. Can you run this model in the latest container (22.07) and report the results?

@dyastremsky added the bug label on Aug 17, 2022
@jackzhou121
Author

The problem is gone when running the model in the latest container (22.07).

@jackzhou121
Author

Why does Triton container 21.11 sometimes crash when I start two processes and warm up the models?

@dyastremsky
Contributor

I'd need the backtrace and run commands to start looking into why, and possibly the model to see if I can reproduce the bug.

Is it happening in the latest container? It's possible there was a concurrency bug that's been fixed in the last 9 months/versions.

@jackzhou121
Author

jackzhou121 commented Aug 29, 2022 via email

@dyastremsky
Contributor

Great. Do you still need this looked into? The backtrace can help you locate where the error is happening so you can debug it. If it's already been fixed, it may not make sense to look into it further (we only patch future releases).
