Raw binary data order of multiple inputs batch request #1337

Closed
jfdsr opened this issue Apr 19, 2020 · 1 comment

Comments

jfdsr commented Apr 19, 2020

My ONNX model configuration file is:

name: "bert_mtdnn_onnx"
platform: "onnxruntime_onnx"
default_model_filename: "bert.onnx"
max_batch_size: 512
input {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: 512
  }
input {
    name: "input_mask"
    data_type: TYPE_INT64
    dims: 512
  }
input {
    name: "segment_ids"
    data_type: TYPE_INT64
    dims: 512
  }
output {
    name: "explainer_2"
    data_type: TYPE_FP32
    dims: 2
  }
output {
    name: "hedging_0"
    data_type: TYPE_FP32
    dims: 2
  }
output {
    name: "memoryloss_3"
    data_type: TYPE_FP32
    dims: 2
  }
output {
    name: "sentiment_1"
    data_type: TYPE_FP32
    dims: 2
  }
instance_group [
   {
      kind: KIND_GPU
      count: 1
      gpus: 0  
   }
]

My question: when I send a batched request to the server (code below), what order does the server expect the raw binary data to be in? Is it interleaved per batch item, i.e. input_ids[0] + segment_ids[0] + input_mask[0] + input_ids[1] + segment_ids[1] + input_mask[1] + ... (the commented-out code), or grouped per input, i.e. all input_ids, then all segment_ids, then all input_mask (the code under "# or Order of batch inputs #2")?
I'm asking because my outputs currently don't match my inputs: only the first item of the batch produces the correct corresponding output; the remaining items of the batch don't match the expected outputs.

    batch_size = len(input_ids.cpu().numpy().tolist())
    input_ids = tensor_to_numpy_and_padd(input_ids)         # input token padded with 0's if necessary
    segment_ids = tensor_to_numpy_and_padd(segment_ids)  # this is always a 0 array
    input_mask = tensor_to_numpy_and_padd(input_mask)       # always 1 array padded with 0's

    # Order of batch inputs #1
    # x = b''
    # for i in range(len(input_ids)):
    #     x = x + input_ids[i] + segment_ids[i] + input_mask[i]
    # data = x

    # or Order of batch inputs #2?
    x = b''
    for i in range(len(input_ids)):
        x = x + input_ids[i]
    for i in range(len(segment_ids)):
        x = x + segment_ids[i]
    for i in range(len(input_mask)):
        x = x + input_mask[i]
    data = x

    inference_server_root = "http://{}:8000/api/infer/bert_mtdnn_onnx".format(inference_server_container_name)
    r = requests.post(
        url=inference_server_root,
        headers={
            'NV-InferRequest': 'batch_size: ' + str(batch_size) + ' input [{ name: "input_ids" }, { name: "segment_ids"}, { name: "input_mask"}] output [{ name: "hedging_0" cls { count: 2 } }, { name: "sentiment_1" cls { count: 2 } }, { name: "explainer_2" cls { count: 2 } }, { name: "memoryloss_3" cls { count: 2 } }]',
            'Content-Type': 'application/octet-stream'
        },
        data=data
    )


def tensor_to_numpy_and_padd(tens_variable):
    tens_variable = tens_variable.cpu().numpy().tolist()

    batch_list = []
    for i in tens_variable:
        out = [0] * max_length_bert  # pad to max_length_bert
        out[:len(i)] = i
        out = np.asarray(out, dtype=np.int64)  # dtype for all variables is INT64 ***
        out = out.tobytes()
        batch_list.append(out)

    return batch_list

Thanks.

@jfdsr changed the title from "Raw binary data order of multiple inputs batch request + Response dimensions index order" to "Raw binary data order of multiple inputs batch request" on Apr 19, 2020
GuanLuo (Contributor) commented Apr 20, 2020

The input tensor values are communicated in the body of the HTTP POST request as raw binary, in the order in which the inputs are listed in the request header. See detail.
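
As a sketch of what that implies for the request above (the variable and function names below are illustrative, not from the issue): with the header listing input_ids, segment_ids, input_mask in that order, the body would carry all input_ids bytes for the batch first, then all segment_ids, then all input_mask.

    # Sketch of the body layout implied by the header order
    # input [{ name: "input_ids" }, { name: "segment_ids" }, { name: "input_mask" }]
    import numpy as np

    def build_body(ids_batch, seg_batch, mask_batch):
        # each argument: list of np.int64 arrays, one per batch item, already padded to 512
        body = b''
        for inp in (ids_batch, seg_batch, mask_batch):  # inputs in header order
            for item in inp:                            # whole batch for each input
                body += np.asarray(item, dtype=np.int64).tobytes()
        return body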

Have you tried using our provided HTTP client library to see whether the results are correct?
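
A rough sketch with the legacy tensorrtserver Python client of that era (exact argument names may differ from the installed version); the client handles the raw-binary ordering itself, so it is a good way to cross-check the hand-built request:

    # Sketch only: ids_list, seg_list, mask_list are lists of np.int64 arrays of
    # shape (512,), one per batch item, as produced by tensor_to_numpy_and_padd
    # before the .tobytes() call.
    from tensorrtserver.api import InferContext, ProtocolType

    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "bert_mtdnn_onnx")
    result = ctx.run(
        {"input_ids": ids_list,
         "segment_ids": seg_list,
         "input_mask": mask_list},
        {"hedging_0": (InferContext.ResultFormat.CLASS, 2),
         "sentiment_1": (InferContext.ResultFormat.CLASS, 2),
         "explainer_2": (InferContext.ResultFormat.CLASS, 2),
         "memoryloss_3": (InferContext.ResultFormat.CLASS, 2)},
        batch_size)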
