<a href="https://colab.research.google.com/github/vblagoje/notebooks/blob/main/haystack2x-experiments/chat_hf_local_summarize_graham.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip uninstall -y haystack-ai llmx transformers

Found existing installation: haystack-ai 2.0.0b4
Uninstalling haystack-ai-2.0.0b4:
  Successfully uninstalled haystack-ai-2.0.0b4


In [3]:
!pip install -q accelerate git+https://github.com/huggingface/transformers.git git+https://github.com/deepset-ai/haystack.git@hf_chat_support

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for transformers (pyproject.toml) ... done


In [5]:
import torch

from haystack.components.generators.utils import default_streaming_callback
from haystack.components.builders import DynamicChatPromptBuilder
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline

In [6]:
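# Download the page; user_agents sets the User-Agent header sent with the request
lcf = LinkContentFetcher(user_agents=["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"])
# Keep only the main article text of the HTML page
html_converter = HTMLToDocument(extractor_type="ArticleExtractor")

# Generic question-answering template (not used below; the pipeline runs with template_prefix)
template = """Given the information below: \n
            {% for document in documents %}
                {{ document.content }}
            {% endfor %}
            Answer question: {{ query }}. \n Answer:"""

# "documents" is declared as a runtime variable so the builder accepts the converter's output
prompt_builder = DynamicChatPromptBuilder(runtime_variables=["documents"])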

In [7]:
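# Run Zephyr-7B locally; device_map="auto" lets accelerate place the weights on the available GPU(s)/CPU.
# default_streaming_callback prints tokens to stdout as they are generated.
llm = HuggingFaceLocalChatGenerator(model="HuggingFaceH4/zephyr-7b-alpha",
                                    huggingface_pipeline_kwargs={"device_map": "auto"},
                                    streaming_callback=default_streaming_callback)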
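`default_streaming_callback` prints each chunk as the model produces it, which is why the summary appears token by token. If you also want to keep the streamed text, a callback with the same single-argument shape can accumulate the chunks. The sketch below is hypothetical: it defines a local `StreamingChunk` stand-in instead of importing Haystack's, and assumes only that each chunk exposes a `content` string.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for Haystack's StreamingChunk; only `content` is used.
@dataclass
class StreamingChunk:
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def make_collecting_callback(buffer: List[str]) -> Callable[[StreamingChunk], None]:
    """Return a callback that prints each chunk and also appends it to `buffer`."""
    def callback(chunk: StreamingChunk) -> None:
        print(chunk.content, end="", flush=True)  # show the stream as it arrives
        buffer.append(chunk.content)              # keep it for later use
    return callback


# Simulate a generator emitting three chunks.
collected: List[str] = []
cb = make_collecting_callback(collected)
for piece in ["The main ", "takeaways ", "are:"]:
    cb(StreamingChunk(content=piece))
print()

full_text = "".join(collected)
```

In the notebook you would pass `streaming_callback=make_collecting_callback(collected)` in place of `default_streaming_callback`, assuming the installed Haystack build hands the callback chunks with a `content` attribute.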

In [8]:
pipe = Pipeline()
pipe.add_component("fetcher", lcf)
pipe.add_component("converter", html_converter)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)


# Data flow: fetched byte streams -> Documents -> rendered chat messages -> LLM
pipe.connect("fetcher.streams", "converter.sources")
pipe.connect("converter.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")
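The three `connect` calls form a linear graph, so there is only one order in which the components can run. As a conceptual sketch (not Haystack's actual scheduler), the same ordering can be derived from the connection list with the standard library's `graphlib`; the component names are the ones registered above.

```python
from graphlib import TopologicalSorter

# The connections made above, as (sender, receiver) component pairs.
connections = [
    ("fetcher", "converter"),         # fetcher.streams -> converter.sources
    ("converter", "prompt_builder"),  # converter.documents -> prompt_builder.documents
    ("prompt_builder", "llm"),        # prompt_builder.prompt -> llm.messages
]

# TopologicalSorter expects a mapping of node -> set of predecessor nodes.
graph = {}
for sender, receiver in connections:
    graph.setdefault(receiver, set()).add(sender)
    graph.setdefault(sender, set())

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['fetcher', 'converter', 'prompt_builder', 'llm']
```

Because the graph is a simple chain, the order is unique; branching pipelines would admit several valid orders.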

In [9]:
# User message template rendered by prompt_builder
template_prefix = """Given the article below: \n
            {% for document in documents %}
                {{ document.content }}
            {% endfor %}
            {{prompt_suffix}}
            """

messages = [ChatMessage.from_user(template_prefix)]
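`template_prefix` is filled from two places: `documents` is declared as a runtime variable on `DynamicChatPromptBuilder`, so it arrives from the converter through the pipeline connection, while `prompt_suffix` is supplied through `template_variables` when the pipeline runs. The stdlib-only sketch below approximates the resulting user message; it does not use Jinja2, ignores its exact whitespace handling, and the two document strings are made up for illustration.

```python
from typing import List


def render_prompt(document_contents: List[str], prompt_suffix: str) -> str:
    """Approximate the text template_prefix renders to (whitespace simplified)."""
    body = "\n".join(document_contents)  # stands in for the {% for document in documents %} loop
    return f"Given the article below: \n\n{body}\n{prompt_suffix}\n"


prompt = render_prompt(
    ["Knowledge grows fractally.", "Look for gaps at the frontier."],  # hypothetical documents
    "Summarize the main takeaways and learnings",                      # value from template_variables
)
print(prompt)
```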


In [12]:
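# Flat inputs: each key is routed to the component input of the same name
# (urls -> fetcher; prompt_source, template_variables -> prompt_builder; generation_kwargs -> llm)
result = pipe.run(data={"urls": ["https://www.paulgraham.com/getideas.html"],
                        "prompt_source": messages,
                        "template_variables": {"prompt_suffix": "Summarize the main takeaways and learnings"},
                        # prompt lookup decoding: draft up to 10 tokens copied from the input
                        "generation_kwargs": {"prompt_lookup_num_tokens": 10}})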

The main takeaways and learnings from the article are:

1. To get new ideas, look for anomalies or gaps in knowledge, especially at the frontiers of knowledge.
2. Knowledge grows fractally, meaning that its edges may appear smooth from a distance, but up close, there are gaps and anomalies that seem obvious and inexplicable.
3. Exploring these gaps can yield whole new fractal buds, which can lead to significant discoveries and advancements.

In summary, the article suggests that noticing anomalies and gaps in knowledge is a key strategy for generating new ideas and driving innovation.
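The `generation_kwargs` entry `prompt_lookup_num_tokens` enables prompt lookup decoding in transformers. Instead of a separate draft model, candidate tokens are copied from earlier in the sequence whenever its last few tokens already occurred there, and the model then checks those candidates in a single forward pass. This suits summarization, where the output often reuses phrases from the article. The function below is a simplified, hypothetical sketch of the candidate-lookup step only; its name and the default `max_ngram_size=2` are illustrative assumptions, and words stand in for token IDs.

```python
from typing import List, Optional, Sequence


def prompt_lookup_candidates(
    tokens: Sequence[str],
    max_ngram_size: int = 2,
    num_pred_tokens: int = 10,
) -> Optional[List[str]]:
    """Propose draft tokens by finding the sequence's trailing n-gram earlier in it."""
    seq = list(tokens)
    for n in range(max_ngram_size, 0, -1):  # try longer n-grams first
        if len(seq) <= n:
            continue
        tail = seq[-n:]
        # Scan earlier positions only; the tail's own position is excluded.
        for start in range(len(seq) - n):
            if seq[start:start + n] == tail:
                # Copy up to num_pred_tokens tokens that followed the earlier match.
                return seq[start + n:start + n + num_pred_tokens]
    return None  # no match: fall back to ordinary one-token-at-a-time decoding


# The sequence ends with "knowledge grows", which also appears at the start,
# so the words that followed it there are proposed as the draft.
tokens = "knowledge grows fractally and its edges look smooth . so knowledge grows".split()
print(prompt_lookup_candidates(tokens, max_ngram_size=2, num_pred_tokens=3))
# ['fractally', 'and', 'its']
```

Roughly speaking, the model then keeps the longest prefix of the draft that agrees with its own predictions, so a wrong guess costs little; `prompt_lookup_num_tokens=10` in the run above plays the role of `num_pred_tokens` here.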
