This repository provides a Hermes-style tool-call parser for Llama models served with vLLM. It adds a buffering function that prevents fragmentation of the "<tool_call>" tag, which Llama's tokenizer splits into the tokens "<", "tool", "_call", ">" when the model is trained or prompted in the format defined by NousResearch/Hermes-Function-Calling without adding special tokens. You can check the original Hermes parser at the link below.
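To illustrate the buffering idea, here is a minimal sketch (not the actual plugin code; `TOOL_CALL_TAG` and `flush_safe` are hypothetical names introduced for this example). During streaming, any trailing text that could still turn out to be the start of "<tool_call>" is held back instead of being emitted, so the tag is only ever handled as a whole once it is fully assembled:

```python
# Hypothetical sketch of streaming-safe buffering for a tag that the
# tokenizer may split across several tokens ("<", "tool", "_call", ">").

TOOL_CALL_TAG = "<tool_call>"

def flush_safe(buffer: str) -> tuple[str, str]:
    """Split `buffer` into (text safe to emit now, text to keep buffering).

    Text is withheld only while it is still a possible prefix of the
    "<tool_call>" tag; everything else is released immediately.
    """
    # Find the longest suffix of `buffer` that is a prefix of the tag.
    for i in range(min(len(buffer), len(TOOL_CALL_TAG)), 0, -1):
        if TOOL_CALL_TAG.startswith(buffer[-i:]):
            return buffer[:-i], buffer[-i:]
    return buffer, ""

# Simulate a fragmented token stream from the model:
tokens = ["Sure, ", "<", "tool", "_call", ">", '{"name": "f"}']
buf, emitted = "", []
for tok in tokens:
    buf += tok
    if TOOL_CALL_TAG in buf:
        # Full tag assembled: the real parser would now switch into
        # tool-call extraction mode; here we just drop the tag.
        before, _, after = buf.partition(TOOL_CALL_TAG)
        emitted.append(before)
        buf = after
        continue
    safe, buf = flush_safe(safe_buffer := buf) if False else flush_safe(buf)
    emitted.append(safe)
```

Without this buffering, a streaming client would receive the literal fragments "<", "tool", "_call", ">" as ordinary content before the parser could recognize the tag.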
Example launch command:

```shell
vllm serve meta-llama/Llama-3.2-3B-Instruct \
  --enable-auto-tool-choice --tool-call-parser llama_hermes \
  --tool-parser-plugin <<this_cloned_repo_path>>/lh_tool_parser.py \
  --port 4000 --enable-lora \
  --lora-modules tool=morsmordre/m-3b-v1-iteration-00-sf-xlam-10
```