
Description
I have a fine-tuned Llama 2 7B Chat model that I am deploying to an endpoint using the DJL container. After deployment, when I tested the model, the output quality had degraded: the model seems to echo the same answer for several different questions.
Before switching to the DJL container I was using the TGI container, and the model worked absolutely fine there.
I understand the two containers may handle inference differently, but is there a way to override the inference code? Roughly along the lines of the sketch below.
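For clarity, this is the kind of override I have in mind: a minimal sketch of a custom handler for DJL Serving's Python engine. The `model.py` entry point, the `handle(inputs)` contract, the property names, and the generation parameters below are my assumptions from the DJL Serving docs, not something I have verified against this container.

```python
# model.py - minimal sketch of a custom DJL Serving (Python engine) handler.
# Assumes the fine-tuned weights are resolvable from the model directory or
# option.model_id; property names and defaults are assumptions.
import torch
from djl_python import Input, Output
from transformers import AutoModelForCausalLM, AutoTokenizer

model = None
tokenizer = None


def load_model(properties):
    global model, tokenizer
    model_location = properties.get("model_id") or properties.get("model_dir", ".")
    tokenizer = AutoTokenizer.from_pretrained(model_location)
    model = AutoModelForCausalLM.from_pretrained(
        model_location, torch_dtype=torch.float16, device_map="auto"
    )


def handle(inputs: Input):
    if model is None:
        load_model(inputs.get_properties())
    if inputs.is_empty():
        # Warm-up / ping request from the serving runtime
        return None

    data = inputs.get_as_json()
    prompt = data["inputs"]
    params = data.get("parameters", {})

    tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
    generated = model.generate(
        **tokens,
        max_new_tokens=params.get("max_new_tokens", 128),
        do_sample=params.get("do_sample", False),
    )
    # Return only the newly generated tokens so the prompt is not echoed back
    new_tokens = generated[0][tokens["input_ids"].shape[-1]:]
    answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return Output().add_as_json({"generated_text": answer})
```

My understanding is that the container would pick this up via something like `option.entryPoint=model.py` in `serving.properties`, but please correct me if that is not the supported mechanism.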
This is the sample prompt template I am using with the model:
"[INST] <>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"
The model was fine-tuned on this exact prompt format, so inference needs to preserve the format so that the model comprehends it and returns the answer.
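For completeness, this is roughly how I build the prompt and call the endpoint. The boto3/SageMaker-style invocation, the endpoint name, and the `{"inputs": ..., "parameters": ...}` payload shape are assumptions on my side; the important part is that the full template reaches the container unchanged.

```python
# Sketch of prompt construction and endpoint invocation.
# Endpoint name is a placeholder; response shape depends on the container.
import json
import boto3

PROMPT_TEMPLATE = (
    "[INST] <<SYS>>\n"
    "Respond only with the answer and do not provide any explanation or additional text. "
    "If you don't know the answer to a question, please answer with 'I dont know'. "
    "Answer should be as short as possible.\n"
    "<</SYS>>\n"
    "Below context is text extracted from a medical document. "
    "Answer the question asked based on the context given.\n"
    "Context: {text}\n"
    "Question: {question} [/INST]"
)


def ask(text: str, question: str, endpoint_name: str = "my-llama2-endpoint"):
    prompt = PROMPT_TEMPLATE.format(text=text, question=question)
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 64, "do_sample": False},
    }
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```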
Any resources/suggestions would be really helpful.