
[Feature]: vLLM support for function calling in Mistral-7B-Instruct-v0.3 #5156

Closed
javierquin opened this issue May 31, 2024 · 2 comments

Comments

@javierquin

🚀 The feature, motivation and pitch

Mistral-7B-Instruct-v0.3 implements function calling. This is a powerful tool, and Mistral AI is one of the first to support it in a "small" model.

from mistral_inference.model import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool


# mistral_models_path is the directory containing the downloaded
# Mistral-7B-Instruct-v0.3 weights and tokenizer.model.v3
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

vLLM has the best inference speed, but I cannot find a way to use function calling with it (a sketch of the interface I am looking for is below).
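For reference, this is roughly what the requested workflow would look like through vLLM's OpenAI-compatible server, passing the same tool schema via the standard OpenAI client. This is only a sketch of the desired behaviour: at the time of this issue vLLM does not parse Mistral's tool-call output, so the tools field is accepted but no structured tool calls are returned.

# Sketch only, assuming the model is already being served, e.g. with:
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mistral-7B-Instruct-v0.3
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Same tool schema as in the mistral_inference example above
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use.",
                    },
                },
                "required": ["location", "format"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "What's the weather like today in Paris?"}],
    tools=tools,
    temperature=0.0,
)

# With full support, a structured tool call would appear in
# response.choices[0].message.tool_calls; without it, only raw text is returned.
print(response.choices[0].message)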

Alternatives

No response

Additional context

No response

@hmellor
Collaborator

hmellor commented May 31, 2024

Duplicate of #1869

hmellor marked this as a duplicate of #1869 on May 31, 2024
hmellor closed this as not planned (duplicate) on May 31, 2024
@K-Mistele
Contributor

@hmellor @javierquin Please see #5649; limited support for Mistral-7B-Instruct-v0.3 is WIP and ready for testing. A couple of caveats still exist, but these are being worked out :)
