Thread safety question about python grpcclient and server #2616

Closed
LightToYang opened this issue Mar 12, 2021 · 10 comments

@LightToYang

LightToYang commented Mar 12, 2021

I ran the grpcclient infer() method in a multi-threaded application (FastAPI), and sometimes the output results are identical for different input images.
The error always occurs between adjacent inputs.

For example:

0001.jpg ==> 0001 result
0002.jpg ==> 0002 result  (same)
0003.jpg ==> 0003 result
0004.jpg ==> 0002 result  (same)

I read in #1856 that the python grpcclient infer() is thread safe, so what is wrong with my application?

@LightToYang
Author

LightToYang commented Mar 12, 2021

I use nvcr.io/nvidia/tritonserver:20.10-py3, does it contain the fix from #1427?

LightToYang changed the title from "the output results are sometimes same when inputting different images" to "Thread safety question about python grpcclient and server" on Mar 12, 2021
@CoderHam
Contributor

> I use nvcr.io/nvidia/tritonserver:20.10-py3, does it contain the fix from #1427?

Yes it does contain the fix from that PR. Can you share a minimal example of your client to repro the same?

@LightToYang
Author

LightToYang commented Mar 15, 2021

@CoderHam
I found that I am actually using the python client, which is not related to the cpp client fixed in #1427.
Here is a minimal example of my client that reproduces the issue.

I use the following code to get the 512-d face feature:

import numpy as np
import tritonclient.grpc

# Shared global client; all threads call infer() on this one instance
# (the server URL here is a placeholder)
triton_client = tritonclient.grpc.InferenceServerClient(url='localhost:8001')

def get_embedding(img_path):
    with open(img_path, "rb") as f:
        img = f.read()
        img_bytes = np.frombuffer(img, dtype=np.uint8)[None, :]
        results = pure_feature_infer(img_bytes)

        embedding = results['embedding'][0]
        norm_embedding = embedding / np.sqrt(np.dot(embedding, embedding))
        return norm_embedding

def pure_feature_infer(
    image,
    max_length=64000,
    model_name='Feature',
    input_names=['DALI_INPUT'],
    output_names=['embedding']
):
    # Pad each encoded image buffer to a fixed length so it matches the model input shape
    image_post = image.copy()
    image_post = list(map(lambda img, ml=max_length: np.pad(img, (0, ml - img.shape[0])), image_post))
    image_post = np.stack(image_post)

    input_shape = [1, max_length]
    inputs = []
    for input_name in input_names:
        inputs.append(tritonclient.grpc.InferInput(input_name, input_shape, "UINT8"))
    inputs[0].set_data_from_numpy(image_post)
    outputs = []
    for output_name in output_names:
        outputs.append(tritonclient.grpc.InferRequestedOutput(output_name))

    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs
    )
    output_results = {}
    for output_name in output_names:
        output_results[output_name] = results.as_numpy(output_name)
    return output_results

@LightToYang
Author

LightToYang commented Mar 15, 2021

Then I use a thread pool to simulate the high-concurrency situation:

import os
from concurrent.futures import ThreadPoolExecutor, as_completed

# img_path_list: list of image file paths (built with glob below)
thread_pool = ThreadPoolExecutor(20)
all_task = []
embedding_list = []
for img_path in img_path_list:
    filepath, tmpfilename = os.path.split(img_path)
    shotname, extension = os.path.splitext(tmpfilename)
    # print(filepath, tmpfilename, shotname, extension)

    # Submit one inference per image to the thread pool
    all_task.append(thread_pool.submit(get_embedding, img_path))

for future in as_completed(all_task):
    norm_embedding = future.result()
    embedding_list.append(norm_embedding)

@LightToYang
Author

LightToYang commented Mar 15, 2021

I compare each face feature with all the face features:

def check_all_data(embedding_array):
    def np_cosine(x, y):
        return np.inner(x, y) * 0.5 + 0.5

    total_num = 0
    unmatch_num = 0

    for i, embedding in enumerate(embedding_array):
        # Each embedding should be most similar to itself; a different argmax
        # index means a duplicated embedding exists earlier in the array
        sim = np_cosine(embedding, embedding_array)
        index = np.argmax(sim)
        total_num += 1
        if i != index:
            unmatch_num += 1
    print(f'{unmatch_num}/{total_num}')

embedding_array = np.array(embedding_list, dtype=np.float32)
check_all_data(embedding_array)

However, I get a lot of duplicated 512-d embeddings:

embedding_array: (11190, 512)
unmatch_num/total_num: 233/11190

I think it is related to a thread-safety issue somewhere in triton, because everything is correct when running with a single thread:

import glob

img_path_list = glob.glob(f'{dir_path}/*jpg')
embedding_list = []
for i, img_path in enumerate(img_path_list):
    norm_embedding = get_embedding(img_path)
    embedding_list.append(norm_embedding)

embedding_array: (11190, 512)
unmatch_num/total_num: 0/11190

@LightToYang
Author

from concurrent.futures import ProcessPoolExecutor, as_completed

with ProcessPoolExecutor(max_workers=10) as executor:
    futures = []
    for img_path in img_path_list:
        job = executor.submit(get_embedding, img_path)
        futures.append(job)
    for job in as_completed(futures):
        try:
            norm_embedding = job.result()
            embedding_list.append(norm_embedding)
        except Exception as e:
            print(e)

I replaced the thread pool with the process pool above and got results like:

(11190, 512)
69/11190

Does that mean the duplicated return values come from the server rather than the client? @tanmayv25
By the way, with the above process pool code, I sometimes get a Segmentation fault (core dumped) error.
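
(A minimal sketch of one way to rule out client sharing across worker processes, assuming a Triton gRPC endpoint at localhost:8001: create the client inside each worker via the pool initializer rather than inheriting the parent's global client after fork.)

from concurrent.futures import ProcessPoolExecutor
import tritonclient.grpc

triton_client = None  # re-created in every worker process by _init_worker

def _init_worker(url='localhost:8001'):
    # Runs once per worker process, so each process gets its own gRPC channel
    # instead of sharing the parent's client after fork. The URL is a placeholder.
    global triton_client
    triton_client = tritonclient.grpc.InferenceServerClient(url=url)

executor = ProcessPoolExecutor(max_workers=10, initializer=_init_worker)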

@LightToYang
Author

This is my config.pbtxt; I use the DALI, TensorRT and ONNX backends for pre-processing, the network and post-processing, respectively.
Could something be wrong with one of these backends?

name: "Feature"
platform: "ensemble"
max_batch_size: 0
input [
  {
    name: "DALI_INPUT"
    data_type: TYPE_UINT8
    dims: [1, -1]
  }
]
output [
  {
    name: "embedding",
    data_type: TYPE_FP32,
    dims: [1, 512],
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "Feature-Preprocess"
      model_version: 1
      input_map {
        key: "DALI_INPUT"
        value: "DALI_INPUT"
      }
      output_map {
        key: "DALI_OUTPUT"
        value: "DALI_OUTPUT"
      }
    },
    {
      model_name: "Feature-Net"
      model_version: 1
      input_map {
        key: "DALI_OUTPUT"
        value: "DALI_OUTPUT"
      }
      output_map {
        key: "fc1"
        value: "fc1"
      }
    },
    {
      model_name: "Feature-Post"
      model_version: 1
      input_map {
        key: "fc1"
        value: "fc1"
      }
      output_map {
        key: "embedding"
        value: "embedding"
      }
    }
  ]
}


@banasraf

Hello @LightToYang, you mentioned that sometimes you get Segmentation fault. Does it happen on the client side, or the server side? Also, could you try creating a separate triton client instance for each process/thread to make sure that the thread-safety of the grpc client isn't a problem here?
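
For reference, a minimal sketch of that suggestion with one client per thread via threading.local (the server URL is a placeholder):

import threading
import tritonclient.grpc

_tls = threading.local()

def get_client(url='localhost:8001'):
    # Lazily create one InferenceServerClient per thread and reuse it,
    # instead of sharing a single global client across the whole thread pool.
    if not hasattr(_tls, 'client'):
        _tls.client = tritonclient.grpc.InferenceServerClient(url=url)
    return _tls.client

pure_feature_infer() would then call get_client().infer(...) instead of the shared triton_client.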

@deadeyegoodwin
Contributor

Closing. Reopen with additional information if the issue is not resolved.
