[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/dspy/2.Writing-Blog-Posts-with-DSPy.ipynb)

# Writing Blog Posts with DSPy Programs

### In this tutorial, we will generate blog posts from questions!

Using the Weaviate blog posts as background information, we will generate 8-paragraph blog posts from questions such as:

`What is ref2vec and how can I use it to build e-commerce applications with Weaviate?`

### We will do this with a 4-layer DSPy program

```python
class BlogPostWriter(dspy.Module):
    def __init__(self):
        super().__init__()
        self.question_to_blog_outline = dspy.ChainOfThought(Question2BlogOutline)
        self.topic_to_paragraph = dspy.ChainOfThought(Topic2Paragraph)
        self.proof_reader = dspy.ChainOfThought(ProofReader)
        self.title_generator = dspy.ChainOfThought(TitleGenerator)
    
    def forward(self, question):
        contexts = dspy.Retrieve(k=5)(question).passages
        contexts = "".join(contexts)
        # The Signature's input field is named `contexts`, so the keyword must match.
        raw_blog_outline = self.question_to_blog_outline(question=question, contexts=contexts).blog_outline
        blog_outline = raw_blog_outline.split(",")  # TODO: add a type hint in an expanded Signature
        blog = ""
        for topic in blog_outline:
            topic_contexts = dspy.Retrieve(k=5)(topic).passages
            topic_contexts = "".join(topic_contexts)
            blog += self.topic_to_paragraph(topic=topic, contexts=topic_contexts).paragraph
            blog += "\n \n"
        blog = self.proof_reader(blog_post=blog).proofread_blog_post
        title = self.title_generator(blog_outline=raw_blog_outline).title
        final_blog = f"{title} \n \n {blog}"
        return dspy.Prediction(blog=final_blog)
```

## You can follow along with this demo in this [video](https://youtu.be/ickqCzFxWj0?feature=shared)!

## Load Data into Weaviate
**You need a running Weaviate cluster with data**:
1. Learn about the installation options [here](https://weaviate.io/developers/weaviate/installation)
2. Import your data:
    1. You can follow the `Weaviate-Import.ipynb` notebook to load in the Weaviate blogs (recipes/integrations/dspy/Weaviate-Import.ipynb)
    2. Or follow this [Quickstart Guide](https://weaviate.io/developers/weaviate/quickstart)

## Installation

In [2]:
# Setup DSPy
# Connect to Weaviate Retriever and configure LLM
import dspy
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate

gpt4 = dspy.OpenAI(model="gpt-4", max_tokens=2000, model_type="chat")

weaviate_client = weaviate.Client("http://localhost:8080")
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)
dspy.settings.configure(lm=gpt4, rm=retriever_model)



### Steps to Generate a Blog Post
1. Outline
2. Topic to paragraph
3. Review
4. Title

## Question2BlogOutline Signature and Demo

In [3]:
class Question2BlogOutline(dspy.Signature):
    """
    Your task is to write a blog post that will help answer the given question. 
    Please use the contexts to evaluate the structure of the blog post.
    """
    
    question = dspy.InputField()
    contexts = dspy.InputField()
    blog_outline = dspy.OutputField(desc="A comma separated list of topics.")

In [5]:
toy_question = "How does Hybrid Search in Weaviate work?"
toy_context = dspy.Retrieve(k=5)(toy_question).passages
toy_context = "".join(toy_context)
dspy.ChainOfThought(Question2BlogOutline)(question=toy_question, contexts=toy_context).blog_outline

"Introduction to Hybrid Search in Weaviate, The Working of Hybrid Search, Understanding Dense and Sparse Vectors, Fusion Algorithms in Weaviate's Hybrid Search, Example of a Hybrid Search, Combining Results of Vector and Keyword Searches."
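
The outline comes back as one string, and the sample above uses `, ` between topics and ends with a period, so a bare `split(",")` would leave leading spaces and a trailing `.` on some topics. Below is a minimal sketch of a cleanup step; `parse_outline` is a hypothetical helper, not part of DSPy or the original notebook, and it still assumes that no topic contains a comma of its own.

```python
def parse_outline(raw_outline: str) -> list[str]:
    """Split a comma-separated outline into clean topic strings."""
    topics = []
    for piece in raw_outline.split(","):
        # Trim whitespace and a trailing period left over from the LM output.
        topic = piece.strip().rstrip(".").strip()
        if topic:  # skip empty pieces produced by stray commas
            topics.append(topic)
    return topics


raw = (
    "Introduction to Hybrid Search in Weaviate, The Working of Hybrid Search, "
    "Understanding Dense and Sparse Vectors, Fusion Algorithms in Weaviate's Hybrid Search, "
    "Example of a Hybrid Search, Combining Results of Vector and Keyword Searches."
)
topics = parse_outline(raw)
print(len(topics))
print(topics[-1])
```

Each cleaned topic could then be used as the retrieval query and as the `topic` input for the paragraph step.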

In [6]:
gpt4.inspect_history(n=1)





Your task is to write a blog post that will help answer the given question. Please use the contexts to evaluate the structure of the blog post.

---

Follow the following format.

Question: ${question}

Contexts: ${contexts}

Reasoning: Let's think step by step in order to ${produce the blog_outline}. We ...

Blog Outline: A comma separated list of topics.

---

Question: How does Hybrid Search in Weaviate work?

Contexts: ### Further resources - [How-to: Hybrid search](/developers/weaviate/search/hybrid) - [API References: Hybrid search](/developers/weaviate/api/graphql/search-operators#hybrid)But do you know **how** hybrid search combines these results? And that Weaviate recently added a new algorithm for how this is done? In this post, we’ll dive into exactly the world of hybrid search to discuss how it works, how results are produced, the algorithms used, and more. So let’s get into it! :::info - Vector search and keyword search are also known as dense vector search and sparse 
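
In the prompt above, the retrieved passages appear to run together (for example `...#hybrid)But do you know...`) because they are concatenated with `"".join`. Here is a minimal sketch of joining them with a visible separator and a numeric label instead; `join_passages` is a hypothetical helper, and the separator and label format are assumptions rather than anything DSPy requires.

```python
def join_passages(passages, separator="\n\n"):
    """Join retrieved passages, numbering each one and separating them visibly."""
    labeled = [f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)]
    return separator.join(labeled)


# Shortened excerpts of two passages from the prompt above.
passages = [
    "- [API References: Hybrid search](/developers/weaviate/api/graphql/search-operators#hybrid)",
    "But do you know **how** hybrid search combines these results?",
]
print(join_passages(passages))
```

Keeping passage boundaries explicit may make it easier for the LM to tell where one source ends and the next begins.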

## Topic to Paragraph

In [7]:
class Topic2Paragraph(dspy.Signature):
    """
    Your task is to write a paragraph that explains a topic based on the retrieved contexts.
    """
    
    topic = dspy.InputField(desc="A topic to write a paragraph about based on the information in the contexts.")
    contexts = dspy.InputField(desc="contains relevant information about the topic.")
    paragraph = dspy.OutputField()

In [8]:
toy_topic = "How does Hybrid Search in Weaviate work?"
toy_topic_context = dspy.Retrieve(k=5)(toy_topic).passages
toy_topic_context = "".join(toy_topic_context)
dspy.ChainOfThought(Topic2Paragraph)(topic=toy_topic, contexts=toy_topic_context).paragraph

'Hybrid search in Weaviate is a sophisticated technique that combines multiple search algorithms to enhance the accuracy and relevance of search results. It amalgamates the best features of keyword-based search algorithms and vector search techniques, providing a more effective search experience. In Weaviate, hybrid search uses sparse and dense vectors to encapsulate the meaning and context of search queries and documents. It performs a vector search to find the most similar objects to the vector of your query, and a keyword search, which ranks results based on the frequency of the query terms. The results of these two searches are then combined. Weaviate offers two fusion algorithms for this purpose: `rankedFusion` and `relativeScoreFusion`. This hybrid approach ensures a comprehensive and accurate search result, leveraging the strengths of different algorithms.'

In [9]:
gpt4.inspect_history(n=1)





Your task is to write a paragraph that explains a topic based on the retrieved contexts.

---

Follow the following format.

Topic: A topic to write a paragraph about based on the information in the contexts.

Contexts: contains relevant information about the topic.

Reasoning: Let's think step by step in order to ${produce the paragraph}. We ...

Paragraph: ${paragraph}

---

Topic: How does Hybrid Search in Weaviate work?

Contexts: ### Further resources - [How-to: Hybrid search](/developers/weaviate/search/hybrid) - [API References: Hybrid search](/developers/weaviate/api/graphql/search-operators#hybrid)But do you know **how** hybrid search combines these results? And that Weaviate recently added a new algorithm for how this is done? In this post, we’ll dive into exactly the world of hybrid search to discuss how it works, how results are produced, the algorithms used, and more. So let’s get into it! :::info - Vector search and keyword search are also known as dense vector search

## Proof Reader Signature

In [10]:
class ProofReader(dspy.Signature):
    """
    Proofread a blog post and output a more well written version of the original post.
    """
    
    blog_post = dspy.InputField()
    proofread_blog_post = dspy.OutputField()    

## Title Generator

In [11]:
class TitleGenerator(dspy.Signature):
    """
    Write a title for a blog post given a description of the topics the blog covers as input.
    """
    
    blog_outline = dspy.InputField()
    title = dspy.OutputField()

## Piecing it all together

In [12]:
class BlogPostWriter(dspy.Module):
    def __init__(self):
        super().__init__()
        self.question_to_blog_outline = dspy.ChainOfThought(Question2BlogOutline)
        self.topic_to_paragraph = dspy.ChainOfThought(Topic2Paragraph)
        self.proof_reader = dspy.ChainOfThought(ProofReader)
        self.title_generator = dspy.ChainOfThought(TitleGenerator)
    
    def forward(self, question):
        contexts = dspy.Retrieve(k=5)(question).passages
        contexts = "".join(contexts)
        # The Signature's input field is named `contexts`, so the keyword must match.
        raw_blog_outline = self.question_to_blog_outline(question=question, contexts=contexts).blog_outline
        blog_outline = raw_blog_outline.split(",")  # TODO: add a type hint in an expanded Signature
        blog = ""
        for topic in blog_outline:
            topic_contexts = dspy.Retrieve(k=5)(topic).passages
            topic_contexts = "".join(topic_contexts)
            blog += self.topic_to_paragraph(topic=topic, contexts=topic_contexts).paragraph
            blog += "\n \n"
        blog = self.proof_reader(blog_post=blog).proofread_blog_post
        title = self.title_generator(blog_outline=raw_blog_outline).title
        final_blog = f"{title} \n \n {blog}"
        return dspy.Prediction(blog=final_blog)

In [13]:
hybrid_search_topic = "How does Hybrid Search in Weaviate work?"
hybrid_search_blog = BlogPostWriter()(hybrid_search_topic)
print(hybrid_search_blog.blog)

"Exploring Hybrid Search in Weaviate: A Deep Dive into Machine Learning and Its Benefits" 
 
 Weaviate is an open-source, cloud-native, modular, real-time vector database designed to scale machine learning models. Unique in its ability to store not only data but also its automatically derived context, Weaviate creates a network of knowledge or a data graph. Its modularity allows for a wide range of applications, and it is agnostic of how vectors are generated. The Weaviate community offers various resources for learning and collaboration, including a YouTube channel, a blog, a dedicated Hacktoberfest channel in their community forum, and a Slack channel. They also provide a quickstart guide and other documentation for those interested in using Weaviate.

One of Weaviate's key features is its hybrid search capability. This powerful technique combines multiple search algorithms to enhance the accuracy and relevance of search results. It leverages the strengths of both keyword-based searc

In [14]:
ecommerce_topic = "e-Commerce reimagined with Weaviate"
ecommerce_blog = BlogPostWriter()(ecommerce_topic)
print(ecommerce_blog.blog)

"Leveraging Weaviate for e-Commerce: Capabilities, Benefits, and Future Prospects" 
 
 Weaviate is an open-source, cloud-native, modular, real-time vector database designed to scale machine learning models. It uniquely stores not only data but also its automatically derived context, including linguistic information and relations to other concepts, resulting in a knowledge network or data graph. Weaviate's modularity allows it to cover a wide range of applications and it is agnostic of how vectors are generated. 

The Weaviate community offers various resources for learning and collaboration, including a YouTube channel, a blog, a dedicated Hacktoberfest channel in their community forum, and a Slack channel. They also provide a quickstart guide and other documentation for those interested in using Weaviate. 

Weaviate combines the speed and capabilities of Approximate Nearest Neighbors (ANN) algorithms with the features of a traditional database, such as backups, real-time queries, pers

## BlogRater Program

In [15]:
class BlogRater(dspy.Signature):
    """
    Rate a blog post on a scale of 1 to 5 on how well-written it is.
    """
    
    blog = dspy.InputField(desc="a blog post")
    rating = dspy.OutputField(desc="a quality rating on a scale of 1 to 5. IMPORTANT!! ONLY OUTPUT THE RATING AS A FLOAT VALUE AND NOTHING ELSE!")

class MetricProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.rater = dspy.ChainOfThought(BlogRater)
    
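    def forward(self, gold, pred, trace=None):
        # The rating depends only on the generated blog; `gold` and `trace`
        # are accepted to match the metric signature but are not used.
        blog = pred.blog
        return float(self.rater(blog=blog).rating)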

def MetricWrapper(gold, pred, trace=None):
    return MetricProgram()(gold=gold, pred=pred)
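
The metric calls `float(...)` directly on the LM output, which would raise a `ValueError` if the model ever answered with something like `Rating: 4.5` despite the instruction in the field description. The sketch below shows one way to make the conversion more forgiving; `parse_rating` is a hypothetical helper, and taking the first number in the string and clamping it to the 1 to 5 range are assumptions about what the rater is likely to produce.

```python
import re


def parse_rating(raw_rating: str, low: float = 1.0, high: float = 5.0) -> float:
    """Extract the first number in the LM output and clamp it to [low, high]."""
    match = re.search(r"\d+(?:\.\d+)?", raw_rating)
    if match is None:
        raise ValueError(f"no numeric rating found in {raw_rating!r}")
    return min(high, max(low, float(match.group())))


print(parse_rating("4.5"))
print(parse_rating("Rating: 4 out of 5"))
```

Inside `MetricProgram.forward`, the last line could then pass the rater's `rating` string through this helper instead of calling `float` on it.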

In [16]:
MetricProgram()(gold=dspy.Example(question=hybrid_search_topic).with_inputs("question"), pred=hybrid_search_blog)

4.5

In [17]:
MetricProgram()(gold=dspy.Example(question=ecommerce_topic).with_inputs("question"), pred=ecommerce_blog)

4.5

In [18]:
gpt4.inspect_history(n=4)





Proofread a blog post and output a more well written version of the original post.

---

Follow the following format.

Blog Post: ${blog_post}
Reasoning: Let's think step by step in order to ${produce the proofread_blog_post}. We ...
Proofread Blog Post: ${proofread_blog_post}

---

Blog Post: Weaviate is an open-source, cloud-native, modular, real-time vector database designed to scale machine learning models. It is unique in its ability to not only store data but also its automatically derived context, which includes linguistic information and relations to other concepts. This results in a network of knowledge or a data graph. Weaviate's modularity allows it to cover a wide range of applications and it is agnostic of how vectors are generated. The Weaviate community offers various resources for learning and collaboration, including a YouTube channel, a blog, and a dedicated Hacktoberfest channel in their community forum and Slack channel. They also provide a quickstart guide and 

In [19]:
trainset = [
    dspy.Example(question="What is multi-tenancy?").with_inputs("question"),
    dspy.Example(question="What is product quantization?").with_inputs("question"),
    dspy.Example(question="How does asynchronous indexing work in Weaviate?").with_inputs("question"),
    dspy.Example(question="What are cross encoders?").with_inputs("question")
]
testset = [
    dspy.Example(question="What is a vector database?").with_inputs("question"),
    dspy.Example(question="What is HNSW?").with_inputs("question"),    
    dspy.Example(question="What can I build with Cohere and Weaviate?").with_inputs("question"),
    dspy.Example(question="What is a race condition in database systems?").with_inputs("question"),
    dspy.Example(question="How do inverted indexes work?").with_inputs("question")
]

## Evaluate Uncompiled

In [20]:
from dspy.evaluate.evaluate import Evaluate

evaluate = Evaluate(devset=testset, num_threads=1, display_progress=True, display_table=5)

evaluate(BlogPostWriter(), metric=MetricWrapper)

Average Metric: 22.5 / 5  (450.0): 100%|██████████| 5/5 [02:53<00:00, 34.69s/it]


Average Metric: 22.5 / 5  (450.0%)


| | question | blog | MetricWrapper |
| --- | --- | --- | --- |
| 0 | What is a vector database? | "Vector Databases: Understanding and Applying Vector Data" A vector database is a specialized type of database that indexes, stores, and provides access to both structured... | 4.8 |
| 1 | What is HNSW? | "Understanding HNSW: From Small-World Phenomenon to Practical Applications" Hierarchical Navigable Small World (HNSW) is a type of vector index that is primarily used for vector... | 4.5 |
| 2 | What can I build with Cohere and Weaviate? | "Exploring Cohere and Weaviate: Features, Applications, and Building Your Own" Cohere is an AI platform specializing in Large Language Models (LLMs) for text generation and... | 4.2 |
| 3 | What is a race condition in database systems? | "Understanding and Preventing Race Conditions in Database Systems: A Comprehensive Guide" Race conditions are a type of bug that can occur in parallel computing, where... | 4.5 |
| 4 | How do inverted indexes work? | "Inverted Indexes: Understanding Their Importance, Construction, and Applications" An inverted index is a critical component in databases, particularly used to power searches and provide efficient... | 4.5 |


450.0
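
The `450.0%` figure looks odd because the reported value appears to be the sum of the metric scores divided by the number of examples, times 100, and our metric returns ratings from 1 to 5 rather than values between 0 and 1. Read that way, it corresponds to a mean rating of 4.5. The sketch below reproduces the arithmetic from the table scores and shows one possible rescaling to a 0 to 1 range; the `(rating - 1) / 4` mapping is our own assumption, not something the notebook or DSPy prescribes.

```python
# Per-example ratings copied from the MetricWrapper column above.
scores = [4.8, 4.5, 4.2, 4.5, 4.5]

total = sum(scores)
mean_rating = total / len(scores)
print(round(total, 1), round(mean_rating, 2))

# Hypothetical rescaling: map the 1-5 rating scale onto 0-1
# so that the reported percentage stays between 0 and 100.
normalized = [(score - 1) / 4 for score in scores]
percent = 100 * sum(normalized) / len(normalized)
print(round(percent, 1))
```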

## Compile

In [21]:
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=MetricWrapper, max_bootstrapped_demos=1, max_rounds=1)

# Compile a fresh BlogPostWriter against the training questions
compiled_blog_writer = teleprompter.compile(BlogPostWriter(), trainset=trainset)

 25%|███████████▎                                 | 1/4 [00:01<00:03,  1.27s/it]

Bootstrapped 1 full traces after 2 examples in round 0.
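
One thing to keep in mind when bootstrapping with a numeric metric: as far as we can tell, the bootstrapper passes a non-`None` `trace` to the metric and, depending on the DSPy version and its threshold settings, may treat any truthy return value as a successful trace. With a 1 to 5 rating, that would let even a low-rated blog become a demonstration. A common pattern is to return a pass/fail decision when `trace` is set and the raw score otherwise. The sketch below illustrates this with a stand-in rater; `make_bootstrap_metric`, `fake_rater`, and the threshold of 4.0 are all hypothetical.

```python
def make_bootstrap_metric(rate_fn, threshold=4.0):
    """Wrap a numeric rater so it yields pass/fail while bootstrapping."""

    def metric(gold, pred, trace=None):
        score = rate_fn(gold, pred)
        if trace is not None:
            # Bootstrapping: only keep traces that meet the quality bar.
            return score >= threshold
        # Evaluation: report the raw numeric score.
        return score

    return metric


# Stand-in rater for illustration; in the notebook this role is played by MetricProgram.
def fake_rater(gold, pred):
    return 3.5


metric = make_bootstrap_metric(fake_rater, threshold=4.0)
print(metric(None, None))            # evaluation mode, raw score
print(metric(None, None, trace=[]))  # bootstrapping mode, pass/fail
```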





## Save Compiled Program

In [22]:
# Save the program
compiled_blog_writer.save("compiled_blog_writer.json")

## Load Compiled Program

In [23]:
# Load the program
compiled_blog_writer = BlogPostWriter()
compiled_blog_writer.load("compiled_blog_writer.json")

## LGTM Test

In [24]:
print(compiled_blog_writer("What is a Self-Driving Database?").blog)

"Exploring Self-Driving Databases: AI Integration, Benefits, and Their Role in IT Automation" 
 
 Self-driving databases represent the third wave of AI-first database technology. These databases, which store data indexed by machine learning models, are capable of brilliantly answering queries posed in natural language. However, their capabilities extend beyond text searches, as they can also be used to search anything from images to DNA. Much of the software involved in these databases is open source, allowing for transparency and customization to meet specific user needs. Self-driving databases are characterized by their consistency, resiliency, and scalability, making them a key component in the future of database technology. These databases use AI and machine learning to process, store, and search through data. AI can also generate code snippets based on user requirements, creating a feedback loop with tests to improve the code over time. However, the use of AI and machine learning 

In [25]:
gpt4.inspect_history(n=10)





Write a title for a blog post given a description of the topics the blog covers as input.

---

Follow the following format.

Blog Outline: ${blog_outline}
Reasoning: Let's think step by step in order to ${produce the title}. We ...
Title: ${title}

---

Blog Outline: Definition of Multi-Tenancy, Benefits of Multi-Tenancy, Comparison between Multi-Tenancy and Single-Tenancy, Technical Aspects of Multi-Tenancy, Examples of Multi-Tenancy in Real-World Applications.
Reasoning: Let's think step by step in order to produce the title. We are discussing the concept of Multi-Tenancy, its benefits, and how it compares to Single-Tenancy. We are also delving into its technical aspects and providing real-world examples. Therefore, the title should reflect all these aspects.
Title: "Understanding Multi-Tenancy: Benefits, Comparison with Single-Tenancy, Technical Insights, and Real-World Examples"







Rate a blog post on a scale of 1 to 5 on how well-written it is.

---

Follow the f

## Evaluate Compiled

In [26]:
evaluate(compiled_blog_writer, metric=MetricWrapper)

Average Metric: 22.8 / 5  (456.0): 100%|██████████| 5/5 [07:23<00:00, 88.71s/it]

Average Metric: 22.8 / 5  (456.0%)





| | question | blog | MetricWrapper |
| --- | --- | --- | --- |
| 0 | What is a vector database? | "Exploring Vector Databases: Understanding Vector Data, GIS Applications, Supported Tasks, Technical Aspects, and Spatial Query Examples" A vector database is a modern type of database... | 4.5 |
| 1 | What is HNSW? | "Exploring HNSW: Understanding the Small-World Network Model, Technical Insights, and Applications in High-Dimensional Data" Hierarchical Navigable Small World (HNSW) is a type of vector index... | 4.8 |
| 2 | What can I build with Cohere and Weaviate? | "Harnessing the Power of Cohere and Weaviate: Benefits, Application Examples, and a Beginner's Guide" --- Blog Outline: Introduction to Veganism, Health Benefits of Veganism, Environmental... | 4.5 |
| 3 | What is a race condition in database systems? | "Race Conditions in Database Systems: Understanding, Problems, Prevention Techniques, and Real-World Examples" A race condition is a situation in a system where the outcome is... | 4.5 |
| 4 | How do inverted indexes work? | "Demystifying Inverted Index: Understanding Its Name, Functioning, Benefits, and Real-World Applications" --- Blog Outline: Introduction to Machine Learning, Types of Machine Learning, How Machine Learning... | 4.5 |


456.0
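
Two of the compiled outputs above contain text such as `--- Blog Outline: Introduction to Veganism, ...` right after the title, which suggests the model kept writing in the prompt format past the field it was asked for. A minimal sketch of a post-processing guard follows; `strip_prompt_leakage` is a hypothetical helper, and cutting at the first `---` is an assumption that would also truncate a blog that legitimately uses `---` as a horizontal rule.

```python
def strip_prompt_leakage(text: str, delimiter: str = "---") -> str:
    """Keep only the text before the first prompt-style delimiter."""
    head, _, _ = text.partition(delimiter)
    return head.strip()


leaky_title = (
    '"Demystifying Inverted Index: Understanding Its Name, Functioning, Benefits, '
    'and Real-World Applications" --- Blog Outline: Introduction to Machine Learning, '
    "Types of Machine Learning"
)
print(strip_prompt_leakage(leaky_title))
```

Applying a guard like this to the `title` output before building `final_blog` would keep the leaked outline out of the post, though it does not address why the leak happens in the first place.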