Releases: deepset-ai/haystack
v2.15.0
⭐️ Highlights
Parallel Tool Calling for Faster Agents
- `ToolInvoker` now processes all tool calls passed to `run` or `run_async` in parallel using an internal `ThreadPoolExecutor`. This improves performance by reducing the time spent on sequential tool invocations.
- This parallel execution capability enables `ToolInvoker` to batch and process multiple tool calls concurrently, allowing Agents to run complex pipelines efficiently with decreased latency.
- You no longer need to pass an `async_executor`. `ToolInvoker` manages its own executor, configurable via the `max_workers` parameter in `__init__`.
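A minimal sketch of the parallel invocation path (the toy tools and the `max_workers` value are illustrative, not from the release notes):

```python
from haystack.components.tools import ToolInvoker
from haystack.dataclasses import ChatMessage, ToolCall
from haystack.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

# max_workers sizes the internal ThreadPoolExecutor;
# both tool calls below are invoked concurrently.
invoker = ToolInvoker(tools=[add, multiply], max_workers=4)

message = ChatMessage.from_assistant(
    tool_calls=[
        ToolCall(tool_name="add", arguments={"a": 1, "b": 2}),
        ToolCall(tool_name="multiply", arguments={"a": 3, "b": 4}),
    ]
)
print(invoker.run(messages=[message])["tool_messages"])
```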
Introducing LLMMessagesRouter
The new `LLMMessagesRouter` component classifies and routes incoming `ChatMessage` objects to different connections using a generative LLM. This component can be used with general-purpose LLMs and with specialized LLMs for moderation, like Llama Guard.
Usage example:
```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.routers.llm_messages_router import LLMMessagesRouter
from haystack.dataclasses import ChatMessage

chat_generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "meta-llama/Llama-Guard-4-12B", "provider": "groq"},
)

router = LLMMessagesRouter(
    chat_generator=chat_generator,
    output_names=["unsafe", "safe"],
    output_patterns=["unsafe", "safe"],
)

print(router.run([ChatMessage.from_user("How to rob a bank?")]))
```
New HuggingFaceTEIRanker Component
HuggingFaceTEIRanker enables end-to-end reranking via the Text Embeddings Inference (TEI) API. It supports both self-hosted TEI services and Hugging Face Inference Endpoints, giving you flexible, high-quality reranking out of the box.
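As a sketch, assuming a TEI reranking model served locally (the endpoint URL, model choice, and `top_k` value here are assumptions, not part of the release notes):

```python
from haystack import Document
from haystack.components.rankers import HuggingFaceTEIRanker

# Assumes a TEI reranker (e.g. BAAI/bge-reranker-base) is already serving
# at this URL, for example via the official TEI Docker image.
ranker = HuggingFaceTEIRanker(url="http://localhost:8080", top_k=2)

docs = [
    Document(content="Berlin is the capital of Germany."),
    Document(content="Paris is the capital of France."),
]
result = ranker.run(query="What is the capital of Germany?", documents=docs)
print(result["documents"])
```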
🚀 New Features
- Added a `ComponentInfo` dataclass to Haystack to store information about a component. It is passed to `StreamingChunk` so we can tell which component a stream is coming from.
- Pass the `component_info` to the `StreamingChunk` in `OpenAIChatGenerator`, `AzureOpenAIChatGenerator`, `HuggingFaceAPIChatGenerator`, `HuggingFaceGenerator`, `HuggingFaceLocalGenerator` and `HuggingFaceLocalChatGenerator`.
- Added the `enable_streaming_callback_passthrough` parameter to the `__init__`, `run` and `run_async` methods of `ToolInvoker`. If set to `True`, the `ToolInvoker` will try to pass the `streaming_callback` function to a tool's invoke method, but only if the tool's invoke method has `streaming_callback` in its signature.
- Added a dedicated `finish_reason` field to the `StreamingChunk` class to improve type safety and enable sophisticated streaming UI logic (see the first sketch after this list). The field uses a `FinishReason` type alias with the standard values "stop", "length", "tool_calls" and "content_filter", plus the Haystack-specific value "tool_call_results" (used by `ToolInvoker` to indicate tool execution completion).
- Updated the `ToolInvoker` component to use the new `finish_reason` field when streaming tool results. The component now sets `finish_reason="tool_call_results"` in the final streaming chunk to indicate that tool execution has completed, while maintaining backward compatibility by also setting the value in `meta["finish_reason"]`.
- Added a `raise_on_failure` boolean parameter to `OpenAIDocumentEmbedder` and `AzureOpenAIDocumentEmbedder`. If set to `True`, the component raises an exception when there is an error with the API request. It is set to `False` by default, so the previous behavior of logging an exception and continuing is still the default.
- Added `AsyncHFTokenStreamingHandler` for async streaming support in `HuggingFaceLocalChatGenerator`.
- For `HuggingFaceAPIGenerator` and `HuggingFaceAPIChatGenerator`, all additional key-value pairs passed in `api_params` are now passed to the initialization of the underlying inference clients. This allows passing additional parameters to the clients, like `timeout`, `headers`, `provider`, etc. This means we can now easily specify a different inference provider by passing the `provider` key in `api_params` (see the second sketch after this list).
- Updated `StreamingChunk` to add the fields `tool_calls`, `tool_call_result`, `index` and `start` to make it easier to format the stream in a streaming callback.
  - Added a new dataclass `ToolCallDelta` for the `StreamingChunk.tool_calls` field to reflect that the arguments can be a string delta.
  - Updated the `print_streaming_chunk` and `_convert_streaming_chunks_to_chat_message` utility methods to use these new fields. This especially improves the formatting when using `print_streaming_chunk` with Agent.
  - Updated `OpenAIGenerator`, `OpenAIChatGenerator`, `HuggingFaceAPIGenerator`, `HuggingFaceAPIChatGenerator`, `HuggingFaceLocalGenerator` and `HuggingFaceLocalChatGenerator` to follow the new dataclasses.
  - Updated `ToolInvoker` to follow the `StreamingChunk` dataclass.
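First, a minimal sketch of a custom streaming callback that uses the new `finish_reason` field; the callback shape and field values follow the notes above, while the printed labels are illustrative:

```python
from haystack.dataclasses import StreamingChunk

def on_chunk(chunk: StreamingChunk) -> None:
    # Print text deltas as they arrive.
    if chunk.content:
        print(chunk.content, end="", flush=True)
    # The dedicated finish_reason field replaces digging into chunk.meta.
    if chunk.finish_reason == "tool_call_results":
        print("\n[tool execution completed]")
    elif chunk.finish_reason == "stop":
        print("\n[generation finished]")
```

Second, a sketch of selecting a different inference provider through `api_params`; extra keys are forwarded to the underlying inference client, and the model name here is illustrative:

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator

generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
        "provider": "groq",  # forwarded to the underlying client
        "timeout": 30,       # likewise forwarded
    },
)
```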
⚡️ Enhancement Notes
- Added a new `deserialize_component_inplace` function to handle generic component deserialization that works with any component type.
- Made doc-parser a core dependency, since `ComponentTool`, which uses it, is one of the core `Tool` components.
- Made the `PipelineBase().validate_input` method public so users can use it with the confidence that it won't receive breaking changes without warning. This method is useful for checking that all required connections in a pipeline have a connection, and it is automatically called in the run method of Pipeline. It is being exposed as public for users who would like to call it before runtime to validate the pipeline.
- For component-run Datadog tracing, set the span resource name to the component name instead of the operation name.
- Added a `trust_remote_code` parameter to the `SentenceTransformersSimilarityRanker` component. When set to `True`, this enables execution of custom models and scripts hosted on the Hugging Face Hub.
- Added a new parameter `require_tool_call_ids` to `ChatMessage.to_openai_dict_format`. The default is `True`, for compatibility with OpenAI's Chat API: if the `id` field is missing in a Tool Call, an error is raised. Using `False` is useful for shallow OpenAI-compatible APIs, where the `id` field is not required (see the sketch after this list).
- Haystack's core modules are now "type complete", meaning that all function parameters and return types are explicitly annotated. This increases the usefulness of the newly added `py.typed` marker and sidesteps differences in type inference between the various type checker implementations.
- `HuggingFaceAPIChatGenerator` now uses the util method `_convert_streaming_chunks_to_chat_message`. This helps keep the conversion of `StreamingChunks` into a final `ChatMessage` consistent.
- If only system messages are provided as input, a warning is logged indicating that this is likely unintended and that user messages should probably be provided as well.
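A small sketch of the new parameter; the tool call below deliberately omits the `id` field, as shallow OpenAI-compatible APIs may do:

```python
from haystack.dataclasses import ChatMessage, ToolCall

# A tool call with no id, as some OpenAI-compatible servers return them.
message = ChatMessage.from_assistant(
    tool_calls=[ToolCall(tool_name="weather", arguments={"city": "Berlin"})]
)

# The default require_tool_call_ids=True would raise here because id is missing.
print(message.to_openai_dict_format(require_tool_call_ids=False))
```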
⚠️ Deprecation Notes
- The `async_executor` parameter in `ToolInvoker` is deprecated in favor of the `max_workers` parameter and will be removed in Haystack 2.16.0. You can use the `max_workers` parameter to control the number of threads used for parallel tool calling.
🐛 Bug Fixes
- Fixed the `to_dict` and `from_dict` of `ToolInvoker` to properly serialize the `streaming_callback` init parameter.
- Fixed a bug where, if `raise_on_failure=False` and an error occurred mid-batch, the following embeddings would be paired with the wrong documents.
- Fixed the component_invoker used by `ComponentTool` to work when a dataclass like `ChatMessage` is directly passed to `component_tool.invoke(...)`. Previously this would either cause an error or silently skip your input.
- Fixed a bug in the `LLMMetadataExtractor` that occurred when processing `Document` objects with `None` or empty string content. The component now gracefully handles these cases by marking such documents as failed and providing an appropriate error message in their metadata, without attempting an LLM call.
- `RecursiveDocumentSplitter` now generates a unique `Document.id` for every chunk. The meta fields (`split_id`, `parent_id`, etc.) are populated before `Document` creation, so the hash used for `id` generation is always unique.
- In `ConditionalRouter`, fixed the `to_dict` and `from_dict` methods to properly handle the case when `output_type` is a `List` of types or a `List` of strings. This occurs when a user specifies a route in `ConditionalRouter` to have multiple outputs.
- Fixed serialization of `GeneratedAnswer` when `ChatMessage` objects are nested in `meta`.
- Fixed the serialization of `ComponentTool` and `Tool` when specifying `outputs_to_string`. Previously an error occurred on deserialization right after serializing if `outputs_to_string` was not None.
- When calling `set_output_types`, we now also check that the decorator `@component.output_types` is not present on the `run_async` method of a `Component`. Previously we only checked that the `Component.run` method did not possess the decorator.
- Fixed type comparison in schema validation by replacing `is not` with `!=` when checking the type `List[ChatMessage]`. This prevents false mismatches due to Python's `is` operator comparing object identity instead of equality.
- Re-exported symbols in `__init__.py` files. This ensures that short imports like `from haystack.components.builders import ChatPromptBuilder` work equivalently to `from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder`, without causing errors or warnings in mypy/Pylance.
- The `SuperComponent` class can now correctly serialize and deserialize a `SuperComponent` based on an async pipeline. Previously, the `SuperComponent` class always assumed the und...
v2.15.0-rc1
v2.14.3
Bug Fixes
- In `ConditionalRouter`, fixed the `to_dict` and `from_dict` methods to properly handle the case when `output_type` is a List of types or a List of strings. This occurs when a user specifies a route in ConditionalRouter to have multiple outputs.
- Fixed the serialization of `ComponentTool` and `Tool` when specifying `outputs_to_string`. Previously an error occurred on deserialization right after serializing if `outputs_to_string` was not None.
v2.14.3-rc1
v2.14.2
Bug Fixes
- Fixed a bug in `OpenAIDocumentEmbedder` and `AzureOpenAIDocumentEmbedder` where, if an OpenAI API error occurred mid-batch, the following embeddings would be paired with the wrong documents.
New Features
- Added a `raise_on_failure` boolean parameter to `OpenAIDocumentEmbedder` and `AzureOpenAIDocumentEmbedder`. If set to `True`, the component raises an exception when there is an error with the API request. It is set to `False` by default, so the previous behavior of logging an exception and continuing is still the default.
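A minimal sketch of the stricter mode; it assumes `OPENAI_API_KEY` is set in the environment, and the model name is illustrative:

```python
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder

# With raise_on_failure=True, an API error aborts the run instead of being
# logged and skipped.
embedder = OpenAIDocumentEmbedder(
    model="text-embedding-3-small",  # illustrative model name
    raise_on_failure=True,
)
result = embedder.run(documents=[Document(content="Haystack is an LLM framework.")])
print(len(result["documents"][0].embedding))
```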
v2.14.2-rc1
v2.14.1
Bug Fixes
- Fixed a mypy issue in the `OpenAIChatGenerator` and its handling of stream responses. This issue only occurs with mypy >= 1.16.0.
- Fixed type comparison in schema validation by replacing `is not` with `!=` when checking the type `List[ChatMessage]`. This prevents false mismatches due to Python's `is` operator comparing object identity instead of equality.
v2.14.1-rc1
v2.14.0
⭐️ Highlights
Enhancements for Complex Agentic Systems
We've improved agent workflows with better message handling and streaming support. The Agent component now returns a `last_message` output for quick access to the final message, and can use a `streaming_callback` to emit tool results in real time. You can use the updated `print_streaming_chunk` or write your own callback function to enable ToolCall details during streaming.
```python
from haystack.components.websearch import SerperDevWebSearch
from haystack.components.agents import Agent
from haystack.components.generators.utils import print_streaming_chunk
from haystack.tools import ComponentTool
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

web_search = ComponentTool(name="web_search", component=SerperDevWebSearch(top_k=5))
wiki_search = ComponentTool(
    name="wiki_search",
    component=SerperDevWebSearch(top_k=5, allowed_domains=["https://www.wikipedia.org/"]),
)

research_agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    system_prompt="""
    You are a research agent that can find information on the web or specifically on Wikipedia.
    Use the wiki_search tool if you need facts and the web_search tool for the latest news on topics.
    Use one tool at a time; use the other tool if the retrieved information is not enough.
    Summarize the retrieved information before returning a response to the user.
    """,
    tools=[web_search, wiki_search],
    streaming_callback=print_streaming_chunk,
)

result = research_agent.run(messages=[ChatMessage.from_user("Can you tell me about Florence Nightingale's life?")])
```
Enabling streaming with the `print_streaming_chunk` function looks like this:
```
[TOOL CALL]
Tool: wiki_search
Arguments: {"query":"Florence Nightingale"}

[TOOL RESULT]
{'documents': [{'title': 'List of schools in Nottinghamshire', 'link': 'https://www.wikipedia.org/wiki/List_of_schools_in_Nottinghamshire', 'position': 1, 'id': 'a6d0fe00f1e0cd06324f80fb926ba647878fb7bee8182de59a932500aeb54a5b', 'content': 'The Florence Nightingale Academy, Eastwood; The Flying High Academy, Mansfield; Forest Glade Primary School, Sutton-in-Ashfield; Forest Town Primary School ...', 'blob': None, 'score': None, 'embedding': None, 'sparse_embedding': None}], 'links': ['https://www.wikipedia.org/wiki/List_of_schools_in_Nottinghamshire']}
...
```
Print the `last_message`:
```python
print("Final Answer:", result["last_message"].text)
```
```
>>> Final Answer: Florence Nightingale (1820-1910) was a pioneering figure in nursing and is often hailed as the founder of modern nursing. She was born...
```
Additionally, `AnswerBuilder` stores all generated messages in the `all_messages` meta field of `GeneratedAnswer` and supports a new `last_message_only` mode for lightweight flows where only the final message needs to be processed.
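A small sketch of the new mode; the query and replies are illustrative:

```python
from haystack.components.builders import AnswerBuilder
from haystack.dataclasses import ChatMessage

# last_message_only=True builds the answer from the final reply only.
builder = AnswerBuilder(last_message_only=True)
replies = [
    ChatMessage.from_assistant("Let me check that..."),
    ChatMessage.from_assistant("Paris is the capital of France."),
]
result = builder.run(query="What is the capital of France?", replies=replies)

answer = result["answers"][0]
print(answer.data)                  # text of the last reply
print(answer.meta["all_messages"])  # every generated message is preserved
```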
Visualizing Pipelines with SuperComponents
We extended `pipeline.draw()` and `pipeline.show()`, which save pipeline diagrams to image files or display them in Jupyter notebooks. You can now pass `super_component_expansion=True` to expand any SuperComponents and draw more detailed visualizations.
Here is an example with a pipeline containing the `MultiFileConverter` and `DocumentPreprocessor` SuperComponents. After installing the dependencies that the `MultiFileConverter` needs for all supported file formats via `pip install haystack-ai pypdf markdown-it-py mdit_plain trafilatura python-pptx python-docx jq openpyxl tabulate pandas`, you can run:
```python
from pathlib import Path

from haystack import Pipeline
from haystack.components.converters import MultiFileConverter
from haystack.components.preprocessors import DocumentPreprocessor
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

pipeline = Pipeline()
pipeline.add_component("converter", MultiFileConverter())
pipeline.add_component("preprocessor", DocumentPreprocessor())
pipeline.add_component("writer", DocumentWriter(document_store=document_store))
pipeline.connect("converter", "preprocessor")
pipeline.connect("preprocessor", "writer")

# Expanded pipeline that shows all components
path = Path("expanded_pipeline.png")
pipeline.draw(path=path, super_component_expansion=True)

# Original pipeline
path = Path("original_pipeline.png")
pipeline.draw(path=path)
```
SentenceTransformersSimilarityRanker with PyTorch, ONNX, and OpenVINO
We added a new `SentenceTransformersSimilarityRanker` component that uses the Sentence Transformers library to rank documents based on their semantic similarity to the query. This component replaces the legacy `TransformersSimilarityRanker` component, which may be deprecated in a future release, with removal following a deprecation period. The `SentenceTransformersSimilarityRanker` also allows choosing different inference backends: PyTorch, ONNX, and OpenVINO. For example, after installing `sentence-transformers>=4.1.0`, you can run:
```python
from haystack import Document
from haystack.components.rankers import SentenceTransformersSimilarityRanker
from haystack.utils.device import ComponentDevice

onnx_ranker = SentenceTransformersSimilarityRanker(
    model="sentence-transformers/all-MiniLM-L6-v2",
    token=None,
    device=ComponentDevice.from_str("cpu"),
    backend="onnx",
)
onnx_ranker.warm_up()

docs = [Document(content="Berlin"), Document(content="Sarajevo")]
output = onnx_ranker.run(query="City in Germany", documents=docs)
ranked_docs = output["documents"]
```
⬆️ Upgrade Notes
- We've added a `py.typed` file to Haystack to enable type information to be used by downstream projects, in line with PEP 561. This means Haystack's type hints will now be visible to type checkers in projects that depend on it. Haystack is primarily type checked using mypy (not pyright) and, despite our efforts, some type information can be incomplete or unreliable. If you use static type checking in your own project, you may notice some changes: previously, Haystack's types were effectively treated as `Any`, but now actual type information will be available and enforced. We'll continue improving typing with the next release.
- The deprecated `deserialize_tools_inplace` utility function has been removed. Use `deserialize_tools_or_toolset_inplace` instead, importing it as follows: `from haystack.tools import deserialize_tools_or_toolset_inplace`.
🚀 New Features
- Added a `run_async` method to the `ToolInvoker` class to allow asynchronous tool invocations.
- Agent can now stream tool results with the `run_async` method as well.
- Introduced `serialize_value` and `deserialize_value` utility methods for consistent value (de)serialization across modules.
- Moved the `State` class to the `agents.state` module and added serialization and deserialization capabilities.
- Added support for multiple outputs in `ConditionalRouter`.
- Implemented JSON-safe serialization for OpenAI usage data by converting token counts and details (like `CompletionTokensDetails` and `PromptTokensDetails`) into plain dictionaries.
- Added a new `SentenceTransformersSimilarityRanker` component that uses the Sentence Transformers library to rank documents based on their semantic similarity to the query. This component is a replacement for the legacy `TransformersSimilarityRanker` component, which may be deprecated in a future release, with removal following after a deprecation period. The `SentenceTransformersSimilarityRanker` also allows choosing different inference backends: PyTorch, ONNX, and OpenVINO. To use the `SentenceTransformersSimilarityRanker`, you need to install `sentence-transformers>=4.1.0`.
- Added a `streaming_callback` parameter to `ToolInvoker` to enable streaming of tool results (see the sketch after this list). Note that the tool result is emitted only after the tool execution completes and is not streamed incrementally.
- Updated `print_streaming_chunk` to print ToolCall information if it is present in the chunk's metadata.
- Updated `Agent` to forward the `streaming_callback` to `ToolInvoker` to emit tool results during tool invocation.
- Enhanced SuperComponent's type compatibility check to return the detected common type between two input types.
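A minimal sketch of streaming tool results from `ToolInvoker` directly, reusing the `print_streaming_chunk` utility; the toy tool is illustrative:

```python
from haystack.components.generators.utils import print_streaming_chunk
from haystack.components.tools import ToolInvoker
from haystack.dataclasses import ChatMessage, ToolCall
from haystack.tools import tool

@tool
def greet(name: str) -> str:
    """Return a greeting for the given name."""
    return f"Hello, {name}!"

# The callback receives the tool result as a streaming chunk once the
# tool finishes; results are not streamed incrementally.
invoker = ToolInvoker(tools=[greet], streaming_callback=print_streaming_chunk)

message = ChatMessage.from_assistant(
    tool_calls=[ToolCall(tool_name="greet", arguments={"name": "Ada"})]
)
invoker.run(messages=[message])
```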
⚡️ Enhancement Notes
- When using `HuggingFaceAPIChatGenerator` with streaming, the returned `ChatMessage` now contains the number of prompt tokens and completion tokens in its metadata. Internally, the `HuggingFaceAPIChatGenerator` requests an additional streaming chunk that contains usage data. It then processes the usage streaming chunk to add usage metadata to the returned `ChatMessage`.
- We now have a Protocol for `TextEmbedder`. The protocol makes it easier to create custom components or SuperComponents that expect any `TextEmbedder` as an init parameter.
- We added a `Component` signature validation method that details the mismatches between the `run` and `run_async` method signatures. This allows a user to debug custom components easily.
- Enhanced the `AnswerBuilder` component with two agent-friendly features:
  - All generated messages are now stored in the `meta` field of the `GeneratedAnswer` objects under an `all_messages` key, improving traceability and debugging capabilities.
  - Added a new `last_message_only` parameter that, when set to `True`, processes only the last message in the replies while still preserving the complete co...
v2.14.0-rc2