# Unit 3

# Creating Your Own Signature in DSPy

Welcome to our third lesson in the DSPy Programming course! In the previous lesson, we explored how to work with **language models** in DSPy. These models are the computational engine that powers everything in DSPy.

Today, we'll build on that foundation by diving into **signatures**, which are the way we define the expected input and output behavior of our language model tasks. Think of signatures as contracts that specify what information goes into a task and what should come out. They're a critical part of DSPy's approach to structured programming with language models.

While we briefly mentioned signatures in our introduction to DSPy, now we'll learn how to create custom signatures tailored to your specific needs. This is where DSPy really starts to shine compared to traditional prompt engineering. Instead of writing lengthy prompt templates, you'll define clear input and output specifications that DSPy will use to generate appropriate prompts behind the scenes.

Signatures in DSPy serve several important purposes:

  * They provide a structured way to interact with language models.
  * They make your code more readable and maintainable.
  * They enable DSPy's optimization capabilities.
  * They allow for modular composition of complex AI systems.

By the end of this lesson, you'll be able to create both simple string-based signatures and more complex class-based signatures for a variety of tasks. This knowledge will form the foundation for building sophisticated DSPy modules and programs in future lessons.

Let's start by looking at the simplest way to define signatures in DSPy.

-----

## String-Based Signatures

The most straightforward way to create a signature in DSPy is by using a **string format**. This concise syntax is perfect for simple tasks where you need to quickly define the relationship between inputs and outputs.

String-based signatures follow this general pattern:

```
"input_name1, input_name2, ... -> output_name1, output_name2, ..."
```

The arrow (`->`) separates inputs from outputs, making it clear what goes in and what comes out. Let's look at a basic example:

```python
import dspy

# Initialize a language model (assuming you've already set this up)
# dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Create a simple signature for sentiment classification
classify = dspy.Predict('sentence -> sentiment: bool')
```

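In this example, we've wrapped a signature in `dspy.Predict`, DSPy's basic module for running a signature against the configured language model. The signature takes a `sentence` as input and produces a `sentiment` as output. The `: bool` part specifies that the output should be a boolean value (`True` or `False`).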

Now let's use this signature to classify the sentiment of a sentence:

```python
message = "it's a charming and often affecting journey."  # example from the SST-2 dataset

# Call the classifier with our input
result = classify(sentence=message)

# Access the output
print(result.sentiment)  # Output might be: True
```

When we run this code, the language model should judge the sentence to be positive, so `result.sentiment` holds `True`. DSPy handles the prompt engineering behind the scenes: it instructs the language model to perform sentiment classification and parses the reply into a Python boolean.

Let's try another example with a different task — summarization:

```python
# Create a signature for summarization
summarize = dspy.Predict('document -> summary')
```

In this case, our signature simply specifies that we want to transform a `document` into a `summary`. Since we didn't specify a type for the summary, it defaults to a string.

The string-based signature format is powerful in its simplicity. It allows you to quickly define tasks without a lot of boilerplate code. However, as your tasks become more complex, you might need more control over the inputs and outputs, which brings us to our next topic.

-----

### Working with Multiple Parameters

Many real-world tasks require multiple inputs or produce multiple outputs. DSPy's signature system handles this elegantly, allowing you to define complex parameter relationships.

To specify multiple inputs in a string-based signature, simply separate them with commas:

```python
# Retrieval-Augmented Question Answering
qa = dspy.Predict("context: list[str], question: str -> answer: str")
```

This signature defines a question-answering task that takes two inputs:

  * `context`: A list of strings containing relevant information
  * `question`: A string representing the question to be answered

It produces a single output:

  * `answer`: A string containing the response to the question

Notice how we can specify types for both inputs and outputs. In this case, we're telling DSPy that `context` should be a list of strings, while `question` and `answer` are individual strings.
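
To make the notation concrete, here is a small standard-library sketch of how such a string breaks down into named, typed fields. It is an illustration only, not DSPy's own parser: `split_fields` and `read_signature` are hypothetical helpers, and the types are kept as plain text rather than resolved to Python types.

```python
def split_fields(side: str) -> dict[str, str]:
    """Split one side of a signature string into {name: type_text}."""
    fields, depth, current = {}, 0, ""
    for ch in side + ",":
        # Only split on commas that are not inside brackets, e.g. dict[str, int].
        if ch == "," and depth == 0:
            if current.strip():
                name, _, type_text = current.partition(":")
                # Fields without an explicit type are treated as strings.
                fields[name.strip()] = type_text.strip() or "str"
            current = ""
            continue
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        current += ch
    return fields


def read_signature(spec: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (inputs, outputs) for a string such as 'a, b: int -> c'."""
    inputs, arrow, outputs = spec.partition("->")
    if not arrow:
        raise ValueError("a signature string needs '->' between inputs and outputs")
    return split_fields(inputs), split_fields(outputs)


inputs, outputs = read_signature("context: list[str], question: str -> answer: str")
print(inputs)   # {'context': 'list[str]', 'question': 'str'}
print(outputs)  # {'answer': 'str'}
```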

Similarly, you can define signatures with multiple outputs:

```python
# Multiple-Choice Question Answering with Reasoning
mcqa = dspy.Predict("question, choices: list[str] -> reasoning: str, selection: int")
```

This signature defines a multiple-choice question-answering task with:

  * **Inputs**: A `question` and a list of `choices`
  * **Outputs**: The `reasoning` behind the answer and the `selection` (an integer representing the chosen option)

When you call a module with multiple inputs, you need to provide all of them:

```python
# Example usage of the QA signature
contexts = ["Einstein developed the theory of relativity.",
            "The theory of relativity revolutionized physics."]
question = "What did Einstein develop?"

response = qa(context=contexts, question=question)
print(response.answer)  # Output might be: "Einstein developed the theory of relativity."
```

Similarly, when a signature produces multiple outputs, you can access each one individually:

```python
# Example usage of the MCQA signature
question = "Which planet is closest to the sun?"
choices = ["Earth", "Venus", "Mercury", "Mars"]

response = mcqa(question=question, choices=choices)
print(f"Reasoning: {response.reasoning}")
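# Assumes `selection` is a zero-based index into `choices`; the signature alone does not guarantee this
print(f"Selected option: {choices[response.selection]}")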
```

This might produce output like:

```
Reasoning: The planet closest to the sun in our solar system is Mercury, followed by Venus, then Earth, and then Mars.
Selected option: Mercury
```
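
Because `selection` is only declared as an `int`, nothing in the signature says whether the model counts options from 0 or from 1, or prevents an out-of-range value. A defensive sketch, assuming you want zero-based indexing, is to check the value before using it. The helper name `pick_choice` is hypothetical.

```python
def pick_choice(choices: list[str], selection: int) -> str:
    """Return the chosen option, assuming `selection` is a zero-based index."""
    if not 0 <= selection < len(choices):
        raise ValueError(
            f"selection {selection} is outside the valid range 0..{len(choices) - 1}"
        )
    return choices[selection]


choices = ["Earth", "Venus", "Mercury", "Mars"]
print(pick_choice(choices, 2))  # Mercury

try:
    # For instance, a model that counted from 1 and picked the last option.
    pick_choice(choices, 4)
except ValueError as err:
    print(f"Rejected: {err}")
```

In the example above you would call `pick_choice(choices, response.selection)` instead of indexing directly.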

The ability to work with multiple parameters makes DSPy signatures extremely versatile. You can model complex tasks with interdependent inputs and outputs, all while maintaining a clean and readable syntax.

However, as your signatures become more complex, you might find the string-based format limiting. For more advanced use cases, DSPy provides a class-based approach to signature definition.

-----

## Class-Based Signature Definition

While string-based signatures are convenient for simple tasks, **class-based signatures** offer more control and expressiveness for complex scenarios. They allow you to:

  * Add detailed documentation
  * Provide field descriptions
  * Enforce stricter type constraints
  * Create more complex input/output structures

To create a class-based signature, you define a Python class that inherits from `dspy.Signature`. Here's a basic example:

```python
from typing import Literal

class Emotion(dspy.Signature):
    """Classify emotion."""
    sentence: str = dspy.InputField()
    sentiment: Literal['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'] = dspy.OutputField()
```

This signature defines an emotion classification task. Let's break down its components:

  * The class inherits from `dspy.Signature`.
  * The **docstring** provides a brief description of the task.
  * `sentence` is defined as an input field of type `str`.
  * `sentiment` is defined as an output field with a specific set of allowed values.

The `Literal` type from Python's `typing` module is particularly useful here. It restricts the declared output to one of the listed values, which steers the language model's response toward our expected categories.

Now let's use this signature:

```python
sentence = "i started feeling a little vulnerable when the giant spotlight started blinding me"

classify = dspy.Predict(Emotion)

result = classify(sentence=sentence)
print(result.sentiment)  # Output might be: "fear"
```
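
If downstream code or a test needs the list of allowed labels, it can be recovered from the `Literal` type itself with `typing.get_args`. The sketch below uses only the standard library. `EmotionLabel` and `is_valid_emotion` are illustrative names, not part of DSPy.

```python
from typing import Literal, get_args

# Same label set as the Emotion signature above.
EmotionLabel = Literal['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']


def is_valid_emotion(label: str) -> bool:
    """Check a label against the values listed in EmotionLabel."""
    return label in get_args(EmotionLabel)


print(get_args(EmotionLabel))
print(is_valid_emotion("fear"))     # True
print(is_valid_emotion("nervous"))  # False
```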

Class-based signatures also allow you to provide more detailed descriptions for each field using the `desc` parameter:

```python
class QuestionAnswering(dspy.Signature):
    """Answer questions based on the provided context."""
    context: str = dspy.InputField(desc="Relevant information to answer the question")
    question: str = dspy.InputField(desc="The question to be answered")
    answer: str = dspy.OutputField(desc="A concise answer based solely on the provided context")
```

These descriptions serve two important purposes:

1.  They provide documentation for developers using your signature.
2.  They give the language model more guidance about what each field represents.

The `desc` parameter is particularly valuable for output fields, as it helps steer the language model toward generating responses that match your expectations.

-----

### Advanced Type Annotations

One of the most powerful features of DSPy signatures is their support for Python's **type system**. By leveraging type annotations, you can create highly structured inputs and outputs that guide the language model's responses.

Let's explore some advanced type annotations with a complex example:

```python
from typing import TypedDict, Optional

class Entity(TypedDict):
    name: str
    type: str
    description: Optional[str]

class ExtractEntities(dspy.Signature):
    """Extract named entities from text."""
    text: str = dspy.InputField()
    entities: list[Entity] = dspy.OutputField(desc="List of entities found in the text")
```

This signature uses a `TypedDict` to define a structured entity representation, with fields for the entity's name, type, and an optional description.

Python's type system provides a rich vocabulary for expressing complex data structures, and DSPy leverages this to create highly structured interactions with language models. By using appropriate type annotations, you can guide the model to produce outputs that match your expected format, making it easier to integrate language models into larger applications.
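
To picture the shape this annotation asks for, the standard-library sketch below builds a hand-written `entities` value and runs a shallow key check against the `TypedDict`. The sample data and the `looks_like_entity` helper are hypothetical; they are not model output and not a DSPy feature.

```python
from typing import Optional, TypedDict


class Entity(TypedDict):
    name: str
    type: str
    description: Optional[str]


def looks_like_entity(item: object) -> bool:
    """Shallow check: a dict with exactly the keys declared on Entity."""
    return isinstance(item, dict) and set(item) == set(Entity.__annotations__)


# Hypothetical value for `entities`, written by hand rather than produced by a model.
sample_entities = [
    {"name": "Einstein", "type": "PERSON", "description": "Physicist"},
    {"name": "theory of relativity", "type": "CONCEPT", "description": None},
]

print(all(looks_like_entity(e) for e in sample_entities))  # True
print(looks_like_entity({"name": "Einstein"}))             # False
```

Note that `Optional[str]` allows `None` as a value, but the `description` key itself is still expected to be present.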

-----

## Summary and Practice Preview

In this lesson, we've explored how to create custom signatures in DSPy, which are essential for defining the input/output behavior of your language model tasks. We've covered both string-based and class-based approaches, as well as advanced type annotations for more complex scenarios.

Here are the key takeaways:

  * **String-based signatures** provide a concise syntax for simple tasks, using the arrow (`->`) to separate inputs from outputs.
  * You can work with **multiple parameters** by separating them with commas and specifying types where needed.
  * **Class-based signatures** offer more control and expressiveness, allowing you to add documentation and field descriptions.
  * **Advanced type annotations** enable you to create highly structured inputs and outputs, guiding the language model's responses.

These concepts build directly on the language model foundation we covered in the previous lesson. The LMs we learned to initialize and configure are the computational engine that powers these signatures, turning your structured specifications into natural language interactions.

In the upcoming practice exercises, you'll have the opportunity to apply these concepts by creating various types of signatures for different tasks. You'll experiment with both string-based and class-based approaches, and you'll see how different type annotations affect the language model's responses.

As you work through these exercises, remember that effective signature design is about finding the right balance between flexibility and constraint. Too little structure might lead to unpredictable outputs, while too much might unnecessarily limit the language model's capabilities.

In our next lesson, we'll build on this foundation by exploring **DSPy modules**, which allow you to compose multiple signatures into more complex AI systems. You'll learn how to use built-in modules like `ChainOfThought` and `ReAct`, and how to create your own custom modules for specific tasks.

For now, focus on mastering signature creation, as it's the building block for everything else we'll do in DSPy. The more comfortable you become with defining clear input/output specifications, the more effectively you'll be able to harness the power of language models in your applications.

# Your First Sentiment Classifier Signature

Now that you've learned about string-based signatures in DSPy, it's time to put that knowledge into practice! In this exercise, you'll create your first signature for a sentiment classification task.

You'll need to define a signature that takes a "sentence" as input and returns a "sentiment" as a boolean value (True for positive sentiment, False for negative sentiment). The code is already set up to test your signature with both positive and negative examples.

This hands-on experience with a simple string-based signature will help you understand the fundamental building blocks of DSPy programming before we move on to more complex signatures in future exercises.

```python
import dspy
import os

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# TODO: Define a signature for sentiment classification that takes a sentence as input
# and returns a sentiment as a boolean value (True for positive, False for negative)

# Test with a positive example
positive_sentence = "it's a charming and often affecting journey."
result_positive = classify(sentence=positive_sentence)
print(f"Sentence: '{positive_sentence}'")
print(f"Sentiment: {result_positive.sentiment}")

# Test with a negative example
negative_sentence = "the film is a huge disappointment with poor acting and a weak plot."
result_negative = classify(sentence=negative_sentence)
print(f"Sentence: '{negative_sentence}'")
print(f"Sentiment: {result_negative.sentiment}")
```

Here is the completed code, which defines the classifier with a string-based signature and runs it on both examples.

```python
import dspy
import os

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# Define a signature for sentiment classification that takes a sentence as input
# and returns a sentiment as a boolean value (True for positive, False for negative)
classify = dspy.Predict("sentence -> sentiment: bool")

# Test with a positive example
positive_sentence = "it's a charming and often affecting journey."
result_positive = classify(sentence=positive_sentence)
print(f"Sentence: '{positive_sentence}'")
print(f"Sentiment: {result_positive.sentiment}")

# Test with a negative example
negative_sentence = "the film is a huge disappointment with poor acting and a weak plot."
result_negative = classify(sentence=negative_sentence)
print(f"Sentence: '{negative_sentence}'")
print(f"Sentiment: {result_negative.sentiment}")
```

# Create Your First Summarization Signature

Now that you've learned about string-based signatures in DSPy, let's put that knowledge into practice! In this exercise, you'll create your first signature for a text summarization task.

String-based signatures are a simple yet powerful way to define what goes into and comes out of your language model tasks. The arrow (`->`) is the key element that separates inputs from outputs.

Your task is to:

1.  Create a signature that takes a "document" as input and produces a "summary" as output.
2.  Use this signature with the provided example document.
3.  Print the resulting summary and compare its length to the original.

This exercise will help you understand how DSPy signatures work as contracts between your code and language models. Once you master this basic pattern, you'll be ready to create more complex signatures for advanced tasks.

```python
import dspy
import os

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# Create a string-based signature for summarization
# TODO: Define a signature that takes a document as input and returns a summary as output

# Example document to summarize
document = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."

# TODO: Use the signature to generate a summary

# TODO: Print the summary

```

To create a summarization signature in DSPy, you'll use the `dspy.Predict` class with a string that defines the input and output fields. The `document` will be the input, and the `summary` will be the output.

Here is the completed code that defines the signature, uses it to summarize the provided text, and prints the result.

```python
import dspy
import os

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# Create a string-based signature for summarization
summarize = dspy.Predict("document -> summary")

# Example document to summarize
document = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."

# Use the signature to generate a summary
result = summarize(document=document)

# Print the summary
print("Original Document:")
print(document)
print("\nGenerated Summary:")
print(result.summary)
```
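
The task also asks you to compare the summary's length to the original. One simple way, sketched here with a hypothetical `length_report` helper and stand-in strings, is to compare word counts.

```python
def length_report(document: str, summary: str) -> str:
    """Compare word counts of a document and its summary."""
    doc_words = len(document.split())
    summary_words = len(summary.split())
    ratio = summary_words / doc_words if doc_words else 0.0
    return f"{doc_words} words -> {summary_words} words ({ratio:.0%} of the original)"


# Stand-in strings; in the exercise, pass `document` and `result.summary` instead.
example_document = "Lee had two loan spells in League One last term, with Blackpool and then Colchester United."
example_summary = "Lee was loaned out twice last season."
print(length_report(example_document, example_summary))
```

Word counts are a rough measure; character counts via `len(document)` and `len(result.summary)` would work just as well for this exercise.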

# Building a Multi-Input Question Answering Signature

Now that you've created your first basic signature for summarization, let's take it up a notch! In this exercise, you'll work with multiple inputs by creating a signature for a retrieval-augmented question-answering system.

Many real-world AI applications need to answer questions based on specific information sources. Your signature will help define this relationship clearly.

Your task is to:

1.  Create a signature that takes two inputs: a list of context passages and a question.
2.  Ensure your signature specifies the proper types (list of strings for context, string for question).
3.  Define a single output (answer) as a string.
4.  Test your signature with the provided example data about the Great Wall of China.

Remember that in string-based signatures with multiple inputs, you separate them with commas, and the arrow (`->`) marks where inputs end and outputs begin.

This exercise will help you build more sophisticated AI systems that can process multiple pieces of information at once — a key skill for creating practical applications with DSPy.

```python
import dspy
import os

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# TODO: Create a string-based signature for retrieval-augmented question answering
# The signature should take a context (list of strings) and a question (string) as inputs
# and return an answer (string) as output

# Example context passages
contexts = [
    "The Great Wall of China is a series of fortifications that were built across the historical northern borders of ancient Chinese states and Imperial China as protection against various nomadic groups from the Eurasian Steppe.",
    "Construction began in the 7th century BC and the walls were built and rebuilt over many centuries. The most well-known sections were built by the Ming dynasty (1368–1644).",
    "The Great Wall is approximately 13,171 miles (21,196 kilometers) long, although not all sections are still visible today."
]

# Example question
question = "When was the Great Wall of China built?"

# TODO: Use the signature to generate an answer

# TODO: Print the answer
```

To create a signature for this multi-input question-answering task, you'll define two inputs, `context` and `question`, and one output, `answer`. Remember to specify the type for each.

Here is the completed code that defines the signature, uses it to answer the question based on the provided context, and prints the result.

```python
import dspy
import os

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# Create a string-based signature for retrieval-augmented question answering
# The signature should take a context (list of strings) and a question (string) as inputs
# and return an answer (string) as output
qa_signature = dspy.Predict("context: list[str], question: str -> answer: str")

# Example context passages
contexts = [
    "The Great Wall of China is a series of fortifications that were built across the historical northern borders of ancient Chinese states and Imperial China as protection against various nomadic groups from the Eurasian Steppe.",
    "Construction began in the 7th century BC and the walls were built and rebuilt over many centuries. The most well-known sections were built by the Ming dynasty (1368–1644).",
    "The Great Wall is approximately 13,171 miles (21,196 kilometers) long, although not all sections are still visible today."
]

# Example question
question = "When was the Great Wall of China built?"

# Use the signature to generate an answer
response = qa_signature(context=contexts, question=question)

# Print the answer
print(response.answer)
```

# Creating Emotion Classifiers with Class Signatures

You've mastered string-based signatures, both simple and with multiple inputs. Now it's time to explore the more powerful class-based signatures in DSPy!

In this exercise, you'll create an emotion classifier that can identify specific feelings in text. Class-based signatures give you more control and clarity than string-based ones.

Your task is to:

1.  Create an `Emotion` class that inherits from `dspy.Signature`.
2.  Write a clear docstring explaining what your signature does.
3.  Define an input field for the sentence to analyze.
4.  Define an output field that uses `Literal` to limit emotions to specific options.
5.  Add helpful descriptions to both fields using the `desc` parameter.
6.  Test your signature with the provided example.

This exercise will show you how class-based signatures make your code more self-documenting and help guide language models to produce the outputs you need. These skills will be essential as you build more complex AI systems in DSPy.

```python
import dspy
import os
from typing import Literal

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# TODO: Create a class-based signature for emotion classification
# The class should inherit from dspy.Signature and include a descriptive docstring

# TODO: Define input field for the sentence and output field for the sentiment
# Use Literal type to constrain the output to specific emotions
# Add helpful descriptions to both fields using the desc parameter

# Example sentence to classify
sentence = "i started feeling a little vulnerable when the giant spotlight started blinding me"

# TODO: Create a predictor using your Emotion signature

# TODO: Use the predictor to classify the sentence

# TODO: Print the result

```



# Building a Fact Checking Signature

Excellent work with class-based signatures so far! After creating emotion classifiers, let's apply your skills to a practical fact-checking task.

In this exercise, you'll build a citation faithfulness checker that verifies whether claims are supported by evidence. This is a perfect use case for class-based signatures with complex outputs.

Your task is to:

1.  Create a `CheckCitationFaithfulness` class that inherits from `dspy.Signature`.
2.  Define two input fields: `context` (trusted facts) and `text` (claim to verify).
3.  Create two output fields: a boolean for faithfulness and a dictionary for evidence.
4.  Use proper type annotations, especially for the structured evidence dictionary.
5.  Add helpful descriptions to guide the language model.

This signature will be valuable for applications like research assistants, fact-checkers, and educational tools that need to verify information against trusted sources. The structured output format makes it easy to understand not just whether a claim is supported, but exactly which evidence supports or contradicts it.

```python
import dspy
import os
from typing import Dict, List

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# TODO: Create a class-based signature for citation faithfulness checking
# The class should inherit from dspy.Signature and include a descriptive docstring
# explaining that it verifies if claims are supported by context

# TODO: Define two input fields:
# - context (string with description indicating facts are trusted)
# - text (string for the claim to verify)

# TODO: Define two output fields:
# - faithfulness (boolean indicating if claim is supported)
# - evidence (dictionary mapping strings to lists of strings for supporting evidence)

# Example context and claim
context = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."

text = "Lee scored 3 goals for Colchester United."

# TODO: Create a predictor using your signature

# TODO: Use the predictor to check the claim

# TODO: Print the results showing both the faithfulness assessment and the evidence
```

To create a class-based signature for citation faithfulness, you'll define a class inheriting from `dspy.Signature`. You'll define two inputs, `context` and `text`, and two outputs, `faithfulness` and `evidence`, with appropriate type annotations and descriptions to guide the language model.

Here is the completed code that defines the signature, uses it to check the claim, and prints the results.

```python
import dspy
import os
from typing import Dict, List

# Configure a language model
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'], api_base=os.environ['OPENAI_BASE_URL'])
dspy.configure(lm=lm)

# Create a class-based signature for citation faithfulness checking
class CheckCitationFaithfulness(dspy.Signature):
    """Verifies whether a given claim is supported by the provided context."""

    # Define two input fields
    context: str = dspy.InputField(desc="A trusted source of facts.")
    text: str = dspy.InputField(desc="The claim to be verified.")

    # Define two output fields
    faithfulness: bool = dspy.OutputField(desc="Is the claim supported by the context?")
    evidence: Dict[str, List[str]] = dspy.OutputField(desc="A dictionary of supporting evidence. Keys are categories, values are lists of quotes from the context.")

# Example context and claim
context = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."

text = "Lee scored 3 goals for Colchester United."

# Create a predictor using your signature
check_faithfulness = dspy.Predict(CheckCitationFaithfulness)

# Use the predictor to check the claim
result = check_faithfulness(context=context, text=text)

# Print the results showing both the faithfulness assessment and the evidence
print(f"Claim: '{text}'")
print(f"Is faithful to the context? {result.faithfulness}")
print(f"Evidence: {result.evidence}")
```
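
Printing the raw dictionary works, but a structured `evidence` value is easier to read when each category and quote gets its own line. The sketch below uses a hypothetical `format_evidence` helper and a hand-written sample value that matches the declared `Dict[str, List[str]]` type; it is not actual model output.

```python
def format_evidence(evidence: dict[str, list[str]]) -> str:
    """Render an evidence mapping as an indented, human-readable block."""
    lines = []
    for category, quotes in evidence.items():
        lines.append(f"{category}:")
        lines.extend(f'  - "{quote}"' for quote in quotes)
    return "\n".join(lines)


# Hypothetical evidence value, written by hand to match the declared type.
sample_evidence = {
    "goals": ["He scored twice for the U's but was unable to save them from relegation."],
}
print(format_evidence(sample_evidence))
```

In the exercise you would call `format_evidence(result.evidence)` after the prediction.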