Thread safety question about python grpcclient and server #2616

Closed
LightToYang opened this issue Mar 12, 2021 · 10 comments

@LightToYang

LightToYang commented Mar 12, 2021

I ran the grpcclient infer() method in a multi-threaded application (FastAPI), and sometimes the output results are identical for different input images.
The error always occurs between adjacent inputs.

For example:

0001.jpg ==> 0001 result
0002.jpg ==> 0002 result  (same)
0003.jpg ==> 0003 result
0004.jpg ==> 0002 result  (same)

I read in #1856 that the python grpcclient infer() is thread safe, so what is wrong with my application?

@LightToYang
Author

LightToYang commented Mar 12, 2021

I use nvcr.io/nvidia/tritonserver:20.10-py3, does it contain the fix from #1427?

LightToYang changed the title from "the output results are sometimes same when inputting different images" to "Thread safety question about python grpcclient and server" on Mar 12, 2021
@CoderHam
Contributor

> I use nvcr.io/nvidia/tritonserver:20.10-py3, does it contain the fix from #1427?

Yes it does contain the fix from that PR. Can you share a minimal example of your client to repro the same?

@LightToYang
Author

LightToYang commented Mar 15, 2021

@CoderHam
I found that I am actually using the python client, which is not related to the cpp client fixed in #1427.
Here is a minimal example of my client that reproduces the issue.

I use the following code to get the 512-d face feature:

import numpy as np
import tritonclient.grpc

# Shared global client; all threads call infer() on this one instance
# (the server URL here is a placeholder)
triton_client = tritonclient.grpc.InferenceServerClient(url='localhost:8001')

def get_embedding(img_path):
    with open(img_path, "rb") as f:
        img = f.read()
        img_bytes = np.frombuffer(img, dtype=np.uint8)[None, :]
        results = pure_feature_infer(img_bytes)

        embedding = results['embedding'][0]
        norm_embedding = embedding / np.sqrt(np.dot(embedding, embedding))
        return norm_embedding

def pure_feature_infer(
    image,
    max_length=64000,
    model_name='Feature',
    input_names=['DALI_INPUT'],
    output_names=['embedding']
):
    # Pad each encoded image buffer to a fixed length so it matches the model input shape
    image_post = image.copy()
    image_post = list(map(lambda img, ml=max_length: np.pad(img, (0, ml - img.shape[0])), image_post))
    image_post = np.stack(image_post)

    input_shape = [1, max_length]
    inputs = []
    for input_name in input_names:
        inputs.append(tritonclient.grpc.InferInput(input_name, input_shape, "UINT8"))
    inputs[0].set_data_from_numpy(image_post)
    outputs = []
    for output_name in output_names:
        outputs.append(tritonclient.grpc.InferRequestedOutput(output_name))

    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs
    )
    output_results = {}
    for output_name in output_names:
        output_results[output_name] = results.as_numpy(output_name)
    return output_results

@LightToYang
Author

LightToYang commented Mar 15, 2021

Then I use a thread pool to simulate the high-concurrency situation:

import os
from concurrent.futures import ThreadPoolExecutor, as_completed

# img_path_list: list of image file paths (built with glob below)
thread_pool = ThreadPoolExecutor(20)
all_task = []
embedding_list = []
for img_path in img_path_list:
    filepath, tmpfilename = os.path.split(img_path)
    shotname, extension = os.path.splitext(tmpfilename)
    # print(filepath, tmpfilename, shotname, extension)

    # Submit one inference per image to the thread pool
    all_task.append(thread_pool.submit(get_embedding, img_path))

for future in as_completed(all_task):
    norm_embedding = future.result()
    embedding_list.append(norm_embedding)

@LightToYang
Author

LightToYang commented Mar 15, 2021

I compare each face feature with all the face features:

def check_all_data(embedding_array):
    def np_cosine(x, y):
        return np.inner(x, y) * 0.5 + 0.5

    total_num = 0
    unmatch_num = 0

    for i, embedding in enumerate(embedding_array):
        # Each embedding should be most similar to itself; a different argmax
        # index means a duplicated embedding exists earlier in the array
        sim = np_cosine(embedding, embedding_array)
        index = np.argmax(sim)
        total_num += 1
        if i != index:
            unmatch_num += 1
    print(f'{unmatch_num}/{total_num}')

embedding_array = np.array(embedding_list, dtype=np.float32)
check_all_data(embedding_array)

However, I get a lot of duplicated 512-d embeddings:

embedding_array: (11190, 512)
unmatch_num/total_num: 233/11190

I think it is related to a thread-safety issue somewhere in triton, because everything is correct when running with a single thread:

import glob

img_path_list = glob.glob(f'{dir_path}/*jpg')
embedding_list = []
for i, img_path in enumerate(img_path_list):
    norm_embedding = get_embedding(img_path)
    embedding_list.append(norm_embedding)

embedding_array: (11190, 512)
unmatch_num/total_num: 0/11190

@LightToYang
Author

from concurrent.futures import ProcessPoolExecutor, as_completed

with ProcessPoolExecutor(max_workers=10) as executor:
    futures = []
    for img_path in img_path_list:
        job = executor.submit(get_embedding, img_path)
        futures.append(job)
    for job in as_completed(futures):
        try:
            norm_embedding = job.result()
            embedding_list.append(norm_embedding)
        except Exception as e:
            print(e)

I replaced the thread pool with the process pool above and got results like:

(11190, 512)
69/11190

Does that mean the duplicated return values come from the server rather than the client? @tanmayv25
By the way, with the above process pool code, I sometimes get a Segmentation fault (core dumped) error.
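
(A minimal sketch of one way to rule out client sharing across worker processes, assuming a Triton gRPC endpoint at localhost:8001: create the client inside each worker via the pool initializer rather than inheriting the parent's global client after fork.)

from concurrent.futures import ProcessPoolExecutor
import tritonclient.grpc

triton_client = None  # re-created in every worker process by _init_worker

def _init_worker(url='localhost:8001'):
    # Runs once per worker process, so each process gets its own gRPC channel
    # instead of sharing the parent's client after fork. The URL is a placeholder.
    global triton_client
    triton_client = tritonclient.grpc.InferenceServerClient(url=url)

executor = ProcessPoolExecutor(max_workers=10, initializer=_init_worker)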

@LightToYang
Author

This is my config.pbtxt; I use the DALI, TensorRT and ONNX backends for pre-processing, the network and post-processing, respectively.
Could something be wrong with one of these backends?

name: "Feature"
platform: "ensemble"
max_batch_size: 0
input [
  {
    name: "DALI_INPUT"
    data_type: TYPE_UINT8
    dims: [1, -1]
  }
]
output [
  {
    name: "embedding",
    data_type: TYPE_FP32,
    dims: [1, 512],
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "Feature-Preprocess"
      model_version: 1
      input_map {
        key: "DALI_INPUT"
        value: "DALI_INPUT"
      }
      output_map {
        key: "DALI_OUTPUT"
        value: "DALI_OUTPUT"
      }
    },
    {
      model_name: "Feature-Net"
      model_version: 1
      input_map {
        key: "DALI_OUTPUT"
        value: "DALI_OUTPUT"
      }
      output_map {
        key: "fc1"
        value: "fc1"
      }
    },
    {
      model_name: "Feature-Post"
      model_version: 1
      input_map {
        key: "fc1"
        value: "fc1"
      }
      output_map {
        key: "embedding"
        value: "embedding"
      }
    }
  ]
}


@banasraf

Hello @LightToYang, you mentioned that sometimes you get Segmentation fault. Does it happen on the client side, or the server side? Also, could you try creating a separate triton client instance for each process/thread to make sure that the thread-safety of the grpc client isn't a problem here?
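
For reference, a minimal sketch of that suggestion with one client per thread via threading.local (the server URL is a placeholder):

import threading
import tritonclient.grpc

_tls = threading.local()

def get_client(url='localhost:8001'):
    # Lazily create one InferenceServerClient per thread and reuse it,
    # instead of sharing a single global client across the whole thread pool.
    if not hasattr(_tls, 'client'):
        _tls.client = tritonclient.grpc.InferenceServerClient(url=url)
    return _tls.client

pure_feature_infer() would then call get_client().infer(...) instead of the shared triton_client.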

@deadeyegoodwin
Contributor

Closing. Reopen with additional information if the issue is not resolved.
