Performance #1382
Hello @aciddelgado, thanks for your response.
```python
def generate_response(user_tokens, static_tokens, max_tokens=150, temperature=0.2):
    ...

def get_static_tokens(static_prompt):
    ...

user_tokens = list(tokenizer.encode(text))
```

The time between the log statements placed immediately above and below `generator.append_tokens(all_tokens)` is around 1-1.7 seconds.
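For context, a minimal sketch of how that measurement can be taken, assuming `generator` is an `og.Generator` and `all_tokens` is the combined token list from the snippets in this thread:

```python
import time

# Log statements around the call being measured, as described above.
start = time.perf_counter()
generator.append_tokens(all_tokens)  # prefill: the whole prompt is processed here
print(f"append_tokens took {time.perf_counter() - start:.3f}s "
      f"for {len(all_tokens)} tokens")
```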
Hello @aciddelgado, thank you.
Hello @naveen-kurra, see onnxruntime-genai/examples/python/model-qa.py, lines 14 to 19 at commit e613206.
The other two things I can think of that may be influencing performance are the length of the list of tokens you are appending and the max length you are using; it would be useful to know those two values.
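To make those two numbers concrete, a small sketch of how they could be logged; `all_tokens` comes from the snippets in this thread, and `params` is assumed to be an `og.GeneratorParams`:

```python
# The two values that drive prefill cost:
print(f"tokens being appended: {len(all_tokens)}")

# max_length is a search option set on the generator params, e.g.:
params.set_search_options(max_length=10000)  # 10000 is the value reported below
```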
Hello @aciddelgado,

```python
config = og.Config(model_config_path)
execution_provider = "cuda" if torch.cuda.is_available() else "cpu"

def initialize_model():
    global model, tokenizer, generator, params, done_listening
    ...
```

I also have a few test cases for length of tokens (LoT), and the max_length is 10000. I look forward to your response, and I really appreciate your time. Thank you.
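The body of `initialize_model` is not shown above; a hypothetical completion under the onnxruntime-genai Python API (the placeholder path, the provider-selection calls, and every name not in the snippet above are assumptions) might look like this:

```python
import onnxruntime_genai as og

model_config_path = "path/to/phi-3.5-onnx"  # placeholder path
execution_provider = "cuda"                 # or "cpu", as selected above

model = tokenizer = generator = params = None
done_listening = False  # application state from the snippet above, stubbed here

def initialize_model():
    global model, tokenizer, generator, params, done_listening
    config = og.Config(model_config_path)
    if execution_provider != "cpu":
        config.clear_providers()
        config.append_provider(execution_provider)
    model = og.Model(config)
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=10000)  # the max_length reported above
    generator = og.Generator(model, params)
```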
Hello Community,
I'm using onnxruntime-genai to build my application with Phi-3.5 ONNX.
I have noticed that the `generator.append_tokens()` call takes around 1-1.5 s to process. I want to learn how to solve this issue, as every millisecond of latency matters for my application. I'm using Python. Please let me know if you need any further logs to diagnose this; I'm happy to provide them.
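Not an answer from the maintainers, but since the cost of `append_tokens` grows with the number of tokens appended, one pattern consistent with the `static_tokens`/`user_tokens` split shown in the snippets above is to append the static prompt once at startup and only append the new user tokens per request; a hypothetical sketch:

```python
# Hypothetical: pay the static-prompt prefill once, then append only new tokens.
static_tokens = list(tokenizer.encode(static_prompt))
generator.append_tokens(static_tokens)  # one-time cost at initialization

def generate_response(user_tokens, max_tokens=150):
    generator.append_tokens(user_tokens)  # only the short, new part of the prompt
    new_tokens = []
    while not generator.is_done() and len(new_tokens) < max_tokens:
        generator.generate_next_token()
        new_tokens.append(generator.get_next_tokens()[0])
    return tokenizer.decode(new_tokens)
```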