<a href="https://colab.research.google.com/github/tirtharajdash/CS-F407_Artificial-Intelligence/blob/main/Lab1_Tranducers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lab: Transducer with Agents for Question Answering

In this lab, you will design and implement a transducer-based question answering (QA) system composed of two cooperating agents:

*   Question Transducer (QT) – reformulates and normalizes user questions.

*   Answering Transducer (AT) – reformulates, grounds, and generates answers based on the processed question.

Each transducer is modeled with one or more ***agents***, each of which maps an input representation to an output representation, potentially changing its structure, intent, or level of abstraction.

What is a Transducer?

A transducer is a system that transforms one sequence or representation into another. In NLP, transducers often map:

*   Raw text → structured text
*   Ambiguous queries → canonical queries
*   Draft answers → refined answers
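
As a minimal sketch of the "ambiguous queries → canonical queries" case, a transducer can be viewed as a function from one representation to another. The `Transducer` alias and the `canonicalize_query` helper are illustrative names assumed here, not part of the lab code.

```python
from typing import Callable

# A transducer is modeled here as any function from one string to another.
Transducer = Callable[[str], str]


def canonicalize_query(raw: str) -> str:
    """Toy transducer: collapse whitespace, capitalize, and end with '?'."""
    text = " ".join(raw.split())   # collapse runs of whitespace
    text = text.rstrip("?!. ")     # drop trailing punctuation and spaces
    return text[:1].upper() + text[1:] + "?"


normalize: Transducer = canonicalize_query
print(normalize("  who   played batman "))  # Who played batman?
```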



**System Architecture**

User Question →
[ Question Transducer ]
→
Reformulated Question
→
[ Answering Transducer ]
→
Final Answer
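
The architecture above is a composition of two stages. A rough sketch, assuming each stage is a plain function (the stand-in functions and the `compose` helper are hypothetical and only mark what each stage would do):

```python
def question_transducer(question: str) -> str:
    """Stand-in QT: marks the question as reformulated."""
    return f"[reformulated] {question}"


def answering_transducer(question: str) -> str:
    """Stand-in AT: returns a placeholder answer for the question."""
    return f"[answer to] {question}"


def compose(*stages):
    """Chain stages so the output of one becomes the input of the next."""
    def pipeline(value):
        for stage in stages:
            value = stage(value)
        return value
    return pipeline


qa_system = compose(question_transducer, answering_transducer)
print(qa_system("Who played Batman?"))
# [answer to] [reformulated] Who played Batman?
```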

Question Transducer (QT) with one agent:

Role

The QT agent transforms a raw user question into a form that is easier to answer.

Responsibilities:
*  Remove ambiguity
*  Expand abbreviations
*  Resolve references ("this", "that")
*  Convert to a canonical or explicit form
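
Two of these responsibilities (expanding abbreviations and resolving references) can be illustrated with a small rule-based sketch. The lookup tables and the `topic` argument are assumptions made for illustration; a real QT agent would more likely rely on a language model and conversation context.

```python
import re

# Hypothetical lookup tables used only for this illustration.
ABBREVIATIONS = {"nn": "neural network", "lr": "learning rate"}
PRONOUNS = {"it", "this", "that"}


def reformulate(question: str, topic: str) -> str:
    """Expand known abbreviations and replace vague pronouns with `topic`."""
    def replace(match):
        word = match.group(0)
        lower = word.lower()
        if lower in ABBREVIATIONS:
            return ABBREVIATIONS[lower]
        if lower in PRONOUNS:
            return topic
        return word

    # Visit every alphabetic word and rewrite it if a rule applies.
    return re.sub(r"[A-Za-z]+", replace, question)


print(reformulate("Why does it fail sometimes?",
                  "the neural network training process"))
# Why does the neural network training process fail sometimes?
```

This naive rule replaces every "that", including uses where it is not a reference, so it is only a starting point.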

Example

Input Question:
"Why does it fail sometimes?"

QT Output:
"Why does the neural network training process fail to converge in some runs?"



Design Constraints:
The QT must not answer the question, and it must preserve the original intent.
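
One way to check these constraints automatically is with crude heuristics. The sketch below assumes that "still a question" can be approximated by a trailing question mark and "same intent" by at least one shared content word. Both proxies and the stopword list are assumptions, not a definition of intent preservation.

```python
import re

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "does", "do", "did", "it", "this", "that",
             "who", "what", "when", "where", "why", "which", "how",
             "in", "of", "to"}


def content_words(text):
    """Lowercased words of `text` that are not stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def constraint_violations(original, rewritten):
    """Return a list of heuristic violations (empty if none were found)."""
    problems = []
    # The QT must not answer, so its output should still be a question.
    if not rewritten.strip().endswith("?"):
        problems.append("output is not phrased as a question")
    # Crude intent proxy: the rewrite should share a content word.
    if not content_words(original) & content_words(rewritten):
        problems.append("no content words shared with the original")
    return problems


print(constraint_violations(
    "Why does it fail sometimes?",
    "Why does the neural network training process fail to converge in some runs?",
))  # []
```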

Answering Transducer (AT) with one agent:

Role

The AT agent generates an answer to the processed question and reformulates it for the user.

Responsibilities


*   Structure the answer (steps, bullets, explanation)
*   Adjust tone or verbosity
*   Add assumptions or clarifications
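
These responsibilities can be sketched as a formatting step applied to a draft answer. The `format_answer` function and its `style` and `assumption` parameters are hypothetical names chosen for this illustration.

```python
def format_answer(summary, points, style="bullets", assumption=""):
    """Render a draft answer as bullets, numbered steps, or a brief summary."""
    lines = []
    if assumption:
        # State any assumption up front so the reader can check it.
        lines.append(f"Assumption: {assumption}")
    lines.append(summary)
    if style == "bullets":
        lines.extend(f"- {point}" for point in points)
    elif style == "steps":
        lines.extend(f"{i}. {point}" for i, point in enumerate(points, start=1))
    # Any other style (for example "brief") keeps only the summary line.
    return "\n".join(lines)


causes = ["poor weight initialization", "an inappropriate learning rate",
          "noisy data", "vanishing gradients"]
print(format_answer("Neural network training may fail to converge due to:",
                    causes, style="bullets"))
```

Switching `style` to `"brief"` drops the list, which is one simple way to adjust verbosity.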

Example

Input (from QT):
"Why does the neural network training process fail to converge in some runs?"

AT Output:
"Neural network training may fail to converge due to poor weight initialization, an inappropriate learning rate, noisy data, or vanishing gradients..."

**Transducer with ASQA (Answer Summaries for Questions which are Ambiguous) dataset**

Demo Setup:

The QT agent generates multiple clarified rewrites of the question.

The AT answers each clarified question independently and then produces a single refined answer.

**Step 1: Question Transducer Agent**

QT Input:
"Who played Batman?"

QT Output (Multiple Reformulations):

"Which actor portrayed Batman in The Dark Knight (2008)?"

"Which actor portrayed Batman in Batman v Superman: Dawn of Justice (2016)?"

"Who was the first actor to play Batman in a live-action film?"

**Step 2: Answering Transducer Agent**

Each reformulated question is passed independently to the AT.

AT Input → Output Pairs

Q1: "Which actor portrayed Batman in The Dark Knight (2008)?"
A1: "Christian Bale portrayed Batman in The Dark Knight (2008)."

Q2: "Which actor portrayed Batman in Batman v Superman: Dawn of Justice (2016)?"
A2: "Ben Affleck portrayed Batman in Batman v Superman: Dawn of Justice (2016)."

Q3: "Who was the first actor to play Batman in a live-action film?"
A3: "Lewis Wilson was the first actor to play Batman in a live-action film, the 1943 serial Batman; Adam West starred in the first feature-length film (1966)."

The AT then combines these answers into a single refined final answer.
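
The two steps can be traced end to end with lookup tables standing in for the agents. This is only a sketch: the `REWRITES` and `ANSWERS` tables are hypothetical and, for brevity, cover two of the reformulations listed above.

```python
# Hypothetical tables standing in for the QT and AT agents.
REWRITES = {
    "Who played Batman?": [
        "Which actor portrayed Batman in The Dark Knight (2008)?",
        "Which actor portrayed Batman in Batman v Superman: Dawn of Justice (2016)?",
    ]
}
ANSWERS = {
    "Which actor portrayed Batman in The Dark Knight (2008)?":
        "Christian Bale portrayed Batman in The Dark Knight (2008).",
    "Which actor portrayed Batman in Batman v Superman: Dawn of Justice (2016)?":
        "Ben Affleck portrayed Batman in Batman v Superman: Dawn of Justice (2016).",
}


def answer_ambiguous(question):
    """Step 1: rewrite. Step 2: answer each rewrite, then merge the answers."""
    rewrites = REWRITES.get(question, [question])
    answers = [ANSWERS.get(q, "No answer available.") for q in rewrites]
    return " ".join(answers)


print(answer_ambiguous("Who played Batman?"))
```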

In [None]:
from datasets import load_dataset

ds = load_dataset("din0s/asqa")


Download and split-generation progress bars omitted: the train split has 4353 examples and the dev split has 948 examples.

In [None]:
ds.keys()

dict_keys(['train', 'dev'])

In [None]:
ds['train'][:5]

{'ambiguous_question': ["When does the new bunk'd come out?",
  'Who won the 2016 ncaa football national championship?',
  'When was the last time the death penalty was used in pa?',
  'Where will failure of the left ventricle cause increased pressure?',
  'Who won the war between ethiopia and italy?'],
 'qa_pairs': [[{'context': 'No context provided',
    'question': "When does episode 42 of bunk'd come out?",
    'short_answers': ['May 24, 2017'],
    'wikipage': None},
   {'context': 'No context provided',
    'question': "When does episode 41 of bunk'd come out?",
    'short_answers': ['April 28, 2017'],
    'wikipage': None},
   {'context': 'No context provided',
    'question': "When does episode 40 of bunk'd come out?",
    'short_answers': ['April 21, 2017'],
    'wikipage': None}],
  [{'context': "The 13–1 Alabama Crimson Tide won the game, holding off the undefeated Clemson Tigers 45–40 in the fourth quarter. Accompanied by a talented receiving corps, Clemson's Heisman Finali
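
Each record pairs one ambiguous question with several disambiguated `qa_pairs`. The sketch below copies the first record's fields from the output above into a literal dictionary (abridged to the keys used) and lists its clarified questions with their short answers. The `disambiguations` helper is a hypothetical name.

```python
# First training record, abridged from the notebook output.
record = {
    "ambiguous_question": "When does the new bunk'd come out?",
    "qa_pairs": [
        {"question": "When does episode 42 of bunk'd come out?",
         "short_answers": ["May 24, 2017"]},
        {"question": "When does episode 41 of bunk'd come out?",
         "short_answers": ["April 28, 2017"]},
        {"question": "When does episode 40 of bunk'd come out?",
         "short_answers": ["April 21, 2017"]},
    ],
}


def disambiguations(rec):
    """Return (clarified question, short answers) pairs for one record."""
    return [(pair["question"], pair["short_answers"]) for pair in rec["qa_pairs"]]


for question, answers in disambiguations(record):
    print(question, "->", ", ".join(answers))
```

With the dataset loaded, the same helper should also accept `ds['train'][0]`, since that record exposes the same keys.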

In [None]:
# Extract each ambiguous question and the long answer from its first annotation
extracted_data = []
for i in range(5):
    ambiguous_q = ds['train'][i]['ambiguous_question']
    long_ans = ds['train'][i]['annotations'][0]['long_answer']
    extracted_data.append({
        'ambiguous_question': ambiguous_q,
        'long_answer': long_ans
    })

for item in extracted_data:
    print(f"Ambiguous Question: {item['ambiguous_question']}")
    print(f"Long Answer: {item['long_answer']}")
    print("\n")

Ambiguous Question: When does the new bunk'd come out?
Long Answer: The new bunk'd episode 41 comes out on April 21, 2017, episode 42 comes out on April 28, 2017 and episode 42 is due to come out on May 24, 2017. 


Ambiguous Question: Who won the 2016 ncaa football national championship?
Long Answer: The 2015 - 2016 season's ncaa national football championship game was played between the Clemson Tigers and the Alabama Crimson Tide on January 11, 2016. The Alabama Crimson Tide won the game by holding off the undefeated Clemson Tigers 45–40 in the fourth quarter.


Ambiguous Question: When was the last time the death penalty was used in pa?
Long Answer: The last time the death penalty was used in pa was on July 6, 1999. 


Ambiguous Question: Where will failure of the left ventricle cause increased pressure?
Long Answer: "Backward" failure of the left ventricle causes congestion of the lungs' blood vessels, and therefore causes increased pressure in the lungs. These symptoms are predomi

In [None]:
class QuestionAgent:
    def __init__(self, example):
        self.example = example

    def reformulate_question(self, question):
        # Placeholder for question reformulation logic
        print(f"QuestionAgent processing: {question}")
        # A full agent would return several clarified rewrites of the question.
        # For now, return a single-element list: the original question plus a marker suffix.
        return [question + " (clarified by QuestionAgent)"]
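
One possible way to fill in this placeholder without a language model is an "oracle" agent that reads the clarified questions straight from a record's `qa_pairs`. The `OracleQuestionAgent` class is a hypothetical variant that assumes the full dataset record is passed as `example`; the pipeline in this notebook passes only a `question` key, in which case the agent falls back to the original question.

```python
class OracleQuestionAgent:
    """Hypothetical QT agent that reuses ASQA's annotated clarified questions."""

    def __init__(self, example):
        self.example = example

    def reformulate_question(self, question):
        # Read the annotated rewrites if the record provides them.
        pairs = self.example.get("qa_pairs") or []
        rewrites = [pair["question"] for pair in pairs]
        # Fall back to the original question when no rewrites are available.
        return rewrites or [question]


example = {
    "qa_pairs": [
        {"question": "When does episode 42 of bunk'd come out?"},
        {"question": "When does episode 41 of bunk'd come out?"},
    ]
}
agent = OracleQuestionAgent(example)
print(agent.reformulate_question("When does the new bunk'd come out?"))
```

Such an oracle is useful mainly as an upper bound when testing the answering side, since it sidesteps the reformulation problem entirely.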


In [None]:
class QuestionProcessingTransducer:
    def __init__(self, agent):
        self.agent = agent

    def transduce(self, ambiguous_question):
        print(f"QuestionProcessingTransducer input: {ambiguous_question}")
        # The QT agent generates multiple clarified question rewrites
        return self.agent.reformulate_question(ambiguous_question)

In [None]:
class AnswerAgent:
    def __init__(self, example):
        self.example = example

    def generate_answer(self, question):
        # Placeholder for answer generation logic
        print(f"AnswerAgent generating answer for: {question}")
        # For demonstration, return a generic answer
        return f"Answer for '{question}' (generated by AnswerAgent)"

In [None]:
class AnswerRefinementAgent:
    def refine_answer(self, answers):
        # Placeholder for answer refinement logic
        print(f"AnswerRefinementAgent refining answers: {answers}")
        # For demonstration, combine and refine answers
        return " | ".join(answers) + " (refined by AnswerRefinementAgent)"
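
A slightly less trivial refinement step might drop duplicate answers before merging them. The `refine_answers` function below is a sketch under that assumption; treating answers as duplicates when they match after trimming and lowercasing is a simplification.

```python
def refine_answers(answers):
    """Merge answers into one string, skipping blanks and duplicates."""
    seen = set()
    unique = []
    for answer in answers:
        cleaned = answer.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return " ".join(unique)


print(refine_answers([
    "Christian Bale portrayed Batman in The Dark Knight (2008).",
    "christian bale portrayed batman in the dark knight (2008). ",
    "Ben Affleck portrayed Batman in Batman v Superman: Dawn of Justice (2016).",
]))
```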

In [None]:
class AnsweringTransducer:
    def __init__(self, answer_agent, refinement_agent):
        self.answer_agent = answer_agent
        self.refinement_agent = refinement_agent

    def transduce(self, clarified_questions):
        print(f"AnsweringTransducer input: {clarified_questions}")
        # Answer each clarified question independently, then refine the answers together
        generated_answers = [self.answer_agent.generate_answer(q) for q in clarified_questions]
        return self.refinement_agent.refine_answer(generated_answers)

In [None]:
def run_pipeline(example):
    ambiguous_question = example["question"]

    # Question Transducer
    q_transducer = QuestionProcessingTransducer(
        QuestionAgent(example)
    )

    clarified_questions = q_transducer.transduce(ambiguous_question)

    # Answering Transducer (with refinement)
    a_transducer = AnsweringTransducer(
        answer_agent=AnswerAgent(example),
        refinement_agent=AnswerRefinementAgent()
    )

    final_answer = a_transducer.transduce(clarified_questions)
    return final_answer

In [None]:
first_example = {"question": extracted_data[0]['ambiguous_question']}
run_pipeline(first_example)

QuestionProcessingTransducer input: When does the new bunk'd come out?
QuestionAgent processing: When does the new bunk'd come out?
AnsweringTransducer input: ["When does the new bunk'd come out? (clarified by QuestionAgent)"]
AnswerAgent generating answer for: When does the new bunk'd come out? (clarified by QuestionAgent)
AnswerRefinementAgent refining answers: ["Answer for 'When does the new bunk'd come out? (clarified by QuestionAgent)' (generated by AnswerAgent)"]


"Answer for 'When does the new bunk'd come out? (clarified by QuestionAgent)' (generated by AnswerAgent) (refined by AnswerRefinementAgent)"

In [None]:
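def evaluate_pipeline(dataset):
    """Run the pipeline on each example and collect generated and reference answers."""
    results = []
    for example in dataset:
        ambiguous_question = example['ambiguous_question']
        # Use the first annotation's long answer as the reference
        actual_long_answer = example['annotations'][0]['long_answer']

        pipeline_input = {"question": ambiguous_question}
        generated_answer = run_pipeline(pipeline_input)

        # Store each result so the function does not return an empty list
        results.append({
            'ambiguous_question': ambiguous_question,
            'generated_answer': generated_answer,
            'reference_answer': actual_long_answer,
        })

    return results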
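
To turn the collected answers into a score, one simple option is a token-overlap F1 between the generated answer and the reference long answer. This is a stand-in chosen for illustration, not ASQA's official metric, and the tokenizer (lowercased alphanumeric runs) is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (0.0 if nothing overlaps)."""
    pred_tokens = tokenize(prediction)
    ref_tokens = tokenize(reference)
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("July 6, 1999", "The last time was July 6, 1999")
print(round(score, 2))  # 0.6
```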


In [None]:
evaluation_results = evaluate_pipeline(ds['train'].select(range(5)))
print(evaluation_results)

{'ambiguous_question': "When does the new bunk'd come out?", 'qa_pairs': [{'context': 'No context provided', 'question': "When does episode 42 of bunk'd come out?", 'short_answers': ['May 24, 2017'], 'wikipage': None}, {'context': 'No context provided', 'question': "When does episode 41 of bunk'd come out?", 'short_answers': ['April 28, 2017'], 'wikipage': None}, {'context': 'No context provided', 'question': "When does episode 40 of bunk'd come out?", 'short_answers': ['April 21, 2017'], 'wikipage': None}], 'wikipages': [{'title': "List of Bunk'd episodes", 'url': 'https://en.wikipedia.org/wiki/List%20of%20Bunk%27d%20episodes'}], 'annotations': [{'knowledge': [{'content': None, 'wikipage': "List of Bunk'd episodes"}], 'long_answer': "The new bunk'd episode 41 comes out on April 21, 2017, episode 42 comes out on April 28, 2017 and episode 42 is due to come out on May 24, 2017. "}], 'sample_id': '-5742327688291876861'}
QuestionProcessingTransducer input: When does the new bunk'd come ou