# Debug Functionary

[functionary](https://github.com/MeetKai/functionary)

[functionary-small-v2.4-GGUF](https://huggingface.co/meetkai/functionary-small-v2.4-GGUF)

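The key to tool calling is the function definition: it lets the LLM map the user request to a function description, after which the LLM extracts the parameter values and constructs the concrete function call.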

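The LLM needs two capabilities: mapping the request to a function name (which relies on semantic matching against the function definition) and obtaining the corresponding parameter values (the LLM's own ability, which relies on semantic understanding of the function parameters).
On the engineering side, the main work is writing the function descriptions and parameter descriptions so that the LLM can match them semantically.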
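
As a minimal sketch of where that description work ends up, the helper below builds one OpenAI-style tool entry from a function description and per-parameter descriptions. `make_tool` is a hypothetical helper introduced here for illustration, and the description wording is only an example; the resulting dictionary has the same shape as the `tools` entries used later in this notebook.

```python
import json


def make_tool(name, description, parameters, required):
    """Build one OpenAI-style tool entry (hypothetical helper, not a library API).

    `parameters` maps a parameter name to a (json_type, description) pair.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    pname: {"type": ptype, "description": pdesc}
                    for pname, (ptype, pdesc) in parameters.items()
                },
                "required": list(required),
            },
        },
    }


# The function description and the parameter description are the two
# places where the text for semantic matching is written.
weather_tool = make_tool(
    name="get_current_weather",
    description="Get the current weather for a given city",
    parameters={
        "location": ("string", "The city and state, e.g., San Francisco, CA"),
    },
    required=["location"],
)

print(json.dumps(weather_tool, indent=2))
```

Keeping the descriptions as explicit arguments makes them easy to review and revise when the model picks the wrong function or fills a parameter incorrectly.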

## Model

In [18]:
from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer
FUNCTION_CALLING_MODELS_DIR = "/opt/local/llm_models/GGUF/function-calling"
# ---------- functionary-small-v2.4-GGUF ----------
FUNCTIONARY_MODELS_DIR=f"{FUNCTION_CALLING_MODELS_DIR}/meetkai/functionary-small-v2.4-GGUF"
model_path=f"{FUNCTIONARY_MODELS_DIR}/functionary-small-v2.4.Q8_0.gguf"
# ---------- functionary-medium-v2.4-GGUF ----------
# FUNCTIONARY_MODELS_DIR=f"{FUNCTION_CALLING_MODELS_DIR}/meetkai/functionary-medium-v2.4-GGUF"
# model_path=f"{FUNCTIONARY_MODELS_DIR}/functionary-medium-v2.4.Q4_0.gguf"

llm = Llama(
    model_path=model_path,
    chat_format="functionary-v2",
    tokenizer=LlamaHFTokenizer.from_pretrained(FUNCTIONARY_MODELS_DIR),
    n_gpu_layers=-1
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from /opt/local/llm_models/GGUF/function-calling/meetkai/functionary-small-v2.4-GGUF/functionary-small-v2.4.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 32004
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                          llama.block_count u32 

## llama.cpp Inference

In [22]:

messages = [
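    # {"role": "user", "content": "what's the weather like in Hanoi?"}
    # {"role": "user", "content": "今天北京天气如何？"}  # "What's the weather like in Beijing today?"
    {"role": "user", "content": "福州武器装备情况如何？"}  # "What is the state of weapons and equipment in Fuzhou?"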
]
tools = [ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_weapon_stats",
            "description": "Get the weapon stats",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

result = llm.create_chat_completion(
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

print(result["choices"][0]["message"])

Llama.generate: prefix-match hit

llama_print_timings:        load time =    9983.66 ms
llama_print_timings:      sample time =       0.84 ms /     8 runs   (    0.10 ms per token,  9535.16 tokens per second)
llama_print_timings: prompt eval time =     365.51 ms /   119 tokens (    3.07 ms per token,   325.57 tokens per second)
llama_print_timings:        eval time =     175.13 ms /     7 runs   (   25.02 ms per token,    39.97 tokens per second)
llama_print_timings:       total time =     566.01 ms /   126 tokens
from_string grammar:
space ::= space_1 
space_1 ::= [ ] | 
string ::= ["] string_5 ["] space 
string_3 ::= [^"\] | [\] string_4 
string_4 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] 
string_5 ::= string_3 string_5 | 
root ::= [{] space ["] [l] [o] [c] [a] [t] [i] [o] [n] ["] space [:] space string [}] space 

Llama.generate: prefix-match hit


space ::= " "?
string ::=  "\"" (
        [^"\\] |
        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* "\"" space 
root ::= "{" space "\"location\"" space ":" space string "}" space



llama_print_timings:        load time =    9983.66 ms
llama_print_timings:      sample time =      34.00 ms /     9 runs   (    3.78 ms per token,   264.72 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =     224.84 ms /     9 runs   (   24.98 ms per token,    40.03 tokens per second)
llama_print_timings:       total time =     283.72 ms /    10 tokens
Llama.generate: prefix-match hit

llama_print_timings:        load time =    9983.66 ms
llama_print_timings:      sample time =       0.09 ms /     1 runs   (    0.09 ms per token, 11494.25 tokens per second)
llama_print_timings: prompt eval time =      69.53 ms /     7 tokens (    9.93 ms per token,   100.67 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =      72.26 ms /     8 

{'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'call_CY3RhcPcDGkbu7m44TzwAuiU', 'type': 'function', 'function': {'name': 'get_weapon_stats', 'arguments': '{"location": "福州"}'}}], 'function_call': {'name': 'get_weapon_stats', 'arguments': '{"location": "福州"}'}}


In [23]:
result

{'id': 'chatcmpl-e5223824-35de-42c0-a276-af1fab6bdc8f',
 'object': 'chat.completion',
 'created': 1713348541,
 'model': '/opt/local/llm_models/GGUF/function-calling/meetkai/functionary-small-v2.4-GGUF/functionary-small-v2.4.Q8_0.gguf',
 'choices': [{'index': 0,
   'logprobs': None,
   'message': {'role': 'assistant',
    'content': None,
    'tool_calls': [{'id': 'call_CY3RhcPcDGkbu7m44TzwAuiU',
      'type': 'function',
      'function': {'name': 'get_weapon_stats',
       'arguments': '{"location": "福州"}'}}],
    'function_call': {'name': 'get_weapon_stats',
     'arguments': '{"location": "福州"}'}},
   'finish_reason': 'tool_calls'}],
 'usage': {'prompt_tokens': 204, 'completion_tokens': 1, 'total_tokens': 205}}
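
The model only returns the function name and a JSON string of arguments; running the function is left to the caller. The sketch below shows one way to do that, assuming a local registry of Python callables. The two handler functions and `dispatch_tool_calls` are hypothetical placeholders (the notebook defines only the schemas), and the `tool_call_id` field follows the OpenAI message convention rather than anything shown above.

```python
import json


# Hypothetical local implementations; replace with real lookups.
def get_current_weather(location):
    return {"location": location, "weather": "unknown"}


def get_weapon_stats(location):
    return {"location": location, "stats": "unknown"}


TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
    "get_weapon_stats": get_weapon_stats,
}


def dispatch_tool_calls(message, registry):
    """Run each tool call in an assistant message and return tool-role messages."""
    tool_messages = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = registry.get(fn["name"])
        if handler is None:
            raise KeyError(f"model requested an unknown tool: {fn['name']}")
        kwargs = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = handler(**kwargs)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "name": fn["name"],
            "content": json.dumps(output, ensure_ascii=False),
        })
    return tool_messages


# Assistant message copied from the output above (legacy "function_call" key omitted).
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_CY3RhcPcDGkbu7m44TzwAuiU",
        "type": "function",
        "function": {
            "name": "get_weapon_stats",
            "arguments": '{"location": "福州"}',
        },
    }],
}

tool_messages = dispatch_tool_calls(message, TOOL_REGISTRY)
print(tool_messages)
```

The returned messages can be appended to `messages`, together with the assistant message, for a follow-up call so that the model can phrase a final answer from the tool output.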

## Model Fine-tuning

Can the model be adapted to a specific domain?

https://github.com/MeetKai/functionary/tree/main/functionary/train

### Chat Template

From functionary-small-v2.4-GGUF/tokenizer_config.json

In [None]:
"{% for message in messages %}\n{% if message['role'] == 'user' or message['role'] == 'system' %}\n{{ '<|from|>' + message['role'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}{% elif message['role'] == 'tool' %}\n{{ '<|from|>' + message['name'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}{% else %}\n{% set contain_content='no'%}\n{% if message['content'] is not none %}\n{{ '<|from|>assistant\n<|recipient|>all\n<|content|>' + message['content'] }}{% set contain_content='yes'%}\n{% endif %}\n{% if 'tool_calls' in message and message['tool_calls'] is not none %}\n{% for tool_call in message['tool_calls'] %}\n{% set prompt='<|from|>assistant\n<|recipient|>' + tool_call['function']['name'] + '\n<|content|>' + tool_call['function']['arguments'] %}\n{% if loop.index == 1 and contain_content == \"no\" %}\n{{ prompt }}{% else %}\n{{ '\n' + prompt}}{% endif %}\n{% endfor %}\n{% endif %}\n{{ '<|stop|>\n' }}{% endif %}\n{% endfor %}\n{% if add_generation_prompt %}{{ '<|from|>assistant\n<|recipient|>' }}{% endif %}",

In [None]:
{% for message in messages %}
    {% if message['role'] == 'user' or message['role'] == 'system' %}
        {{ '<|from|>' + message['role'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}
    {% elif message['role'] == 'tool' %}
        {{ '<|from|>' + message['name'] + '\n<|recipient|>all\n<|content|>' + message['content'] + '\n' }}
    {% else %}
        {% set contain_content='no'%}
        {% if message['content'] is not none %}
            {{ '<|from|>assistant\n<|recipient|>all\n<|content|>' + message['content'] }}
            {% set contain_content='yes'%}
        {% endif %}
        {% if 'tool_calls' in message and message['tool_calls'] is not none %}
            {% for tool_call in message['tool_calls'] %}
                {% set prompt='<|from|>assistant\n<|recipient|>' + tool_call['function']['name'] + '\n<|content|>' + tool_call['function']['arguments'] %}
                {% if loop.index == 1 and contain_content == "no" %}
                    {{ prompt }}
                {% else %}
                    {{ '\n' + prompt}}
                {% endif %}
            {% endfor %}
        {% endif %}
        {{ '<|stop|>\n' }}
    {% endif %}
{% endfor %}
{% if add_generation_prompt %}
    {{ '<|from|>assistant\n<|recipient|>' }}
{% endif %}
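
To make the prompt layout easier to inspect, the following is a plain-Python sketch that mirrors the branches of the template above. `render_functionary_v2` is a hypothetical helper, not part of functionary or llama-cpp-python, and it assumes that whitespace between block tags is trimmed (as with Jinja's `trim_blocks` and `lstrip_blocks` options), so only the text inside `{{ ... }}` reaches the prompt.

```python
def render_functionary_v2(messages, add_generation_prompt=True):
    """Plain-Python mirror of the chat template (hypothetical helper)."""
    parts = []
    for message in messages:
        role = message["role"]
        if role in ("user", "system"):
            parts.append(
                f"<|from|>{role}\n<|recipient|>all\n<|content|>{message['content']}\n"
            )
        elif role == "tool":
            # Tool results are attributed to the function name, not to "tool".
            parts.append(
                f"<|from|>{message['name']}\n<|recipient|>all\n<|content|>{message['content']}\n"
            )
        else:  # assistant turn
            has_content = message.get("content") is not None
            if has_content:
                parts.append(
                    "<|from|>assistant\n<|recipient|>all\n<|content|>" + message["content"]
                )
            for index, tool_call in enumerate(message.get("tool_calls") or []):
                fn = tool_call["function"]
                prompt = (
                    f"<|from|>assistant\n<|recipient|>{fn['name']}\n"
                    f"<|content|>{fn['arguments']}"
                )
                # Only the first segment of a turn is emitted without a leading newline.
                parts.append(prompt if index == 0 and not has_content else "\n" + prompt)
            parts.append("<|stop|>\n")
    if add_generation_prompt:
        parts.append("<|from|>assistant\n<|recipient|>")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "福州武器装备情况如何？"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "type": "function",
            "function": {"name": "get_weapon_stats", "arguments": '{"location": "福州"}'},
        }],
    },
]

generation_prompt = render_functionary_v2(conversation[:1])
full_history = render_functionary_v2(conversation, add_generation_prompt=False)
print(generation_prompt)
print(full_history)
```

Because the generation prompt ends at `<|recipient|>`, the first thing the model emits is the recipient: `all` for a plain reply, or a function name for a tool call.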