
Llama 2 7b chat model output quality is low #2093

@ghost

Description

I have a fine-tuned Llama 2 7B chat model which I am deploying to an endpoint using the DJL container. After deploying, when I tested the model, the output quality had degraded (the model seems to echo the same answer for several of the questions asked).

Before the DJL container, I was using the TGI container and the model worked absolutely fine.
I understand the two containers may run inference differently, but is there a way to override the inference code?
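Something like the following is what I have in mind for overriding the handler: a minimal sketch assuming DJL Serving's Python engine (`engine=Python` in `serving.properties`) and its `model.py` entry point with `djl_python`'s `Input`/`Output` contract. The model-loading details, parameter names, and payload shape here are my assumptions, not a verified config:

```python
# model.py -- minimal sketch of a custom DJL Serving handler.
# Assumes the djl_python handle(Input) -> Output entry-point contract and a
# HuggingFace-format model in the model directory; payload keys ("inputs",
# "parameters") are assumed, not taken from DJL docs.
from djl_python import Input, Output
from transformers import AutoModelForCausalLM, AutoTokenizer

model = None
tokenizer = None


def load_model(properties):
    global model, tokenizer
    # "model_dir" is assumed to point at the unpacked model artifacts.
    model_dir = properties.get("model_dir")
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")


def handle(inputs: Input) -> Output:
    if model is None:
        load_model(inputs.get_properties())
    if inputs.is_empty():
        # DJL sends an empty request at startup to trigger model loading.
        return None
    payload = inputs.get_as_json()
    prompt = payload["inputs"]  # already in the [INST] ... [/INST] format
    params = payload.get("parameters", {})
    ids = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **ids, max_new_tokens=params.get("max_new_tokens", 128)
    )
    # Strip the prompt tokens so only the answer is returned.
    answer = tokenizer.decode(
        output_ids[0][ids["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return Output().add_as_json({"generated_text": answer})
```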
Following is the sample prompt that I am using to prompt the model:
"[INST] <>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"

The model was fine-tuned on the prompt above, so inference has to feed it this exact format for the model to comprehend the prompt and give the answer.
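On the prompt side, a hypothetical client-side helper could fill the template before invoking the endpoint, so the deployed model always sees the exact format it was trained on. `SYSTEM` and `build_prompt` are illustrative names; only the template text itself comes from above:

```python
# Fill the fine-tuning template client-side so the request body already
# carries the full [INST]/<<SYS>> prompt the model was trained on.
SYSTEM = (
    "Respond only with the answer and do not provide any explanation or "
    "additional text. If you don't know the answer to a question, please "
    "answer with 'I dont know'. Answer should be as short as possible."
)


def build_prompt(text: str, question: str) -> str:
    return (
        f"[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n"
        "Below context is text extracted from a medical document. "
        "Answer the question asked based on the context given.\n"
        f"Context: {text}\nQuestion: {question} [/INST]"
    )
```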

Any resources/suggestions would be really helpful.
