<a href="https://colab.research.google.com/github/thisisRMak/2025-tech16-LLM/blob/main/2025_Tech16_HW2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tech16 HW2

1. Create a summarizer using LangChain and compare the "stuff" and "map-reduce" methods

  - Load URLs and PDFs and summarize each one with both the "stuff" and "map-reduce" chains. The difference between the two outputs wasn't very clear from these examples, but map-reduce seemed to take longer to run.
  - Examples
    1. Science Fiction short story by Isaac Asimov (URL)
    2. George Washington Farewell - Presidential address (URL)
    3. Attention is All You Need (pdf)

2. Extra credit: explore langchain and find one additional thing to do!

  - Created a chatbot that reads a document and answers questions about it using GPT-4o.
  - For example, the agent reads the Attention Is All You Need paper and answers questions about it, such as "What is a Transformer?" and "Write a simple example to demonstrate how a Transformer is implemented."
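
The contrast between the two methods in item 1 can be sketched without any API calls. This is a minimal illustration, not LangChain's implementation: `CountingLLM` is a hypothetical stand-in for a chat model, and the prompts are made up. LangChain's real chains use their own prompts and may add further steps. The sketch only shows the call pattern: "stuff" makes one call over all the text, while "map-reduce" makes one call per document plus a final combining call, which is consistent with map-reduce taking longer to run.

```python
from typing import Callable, List


def stuff_summarize(docs: List[str], llm: Callable[[str], str]) -> str:
    """'Stuff': concatenate every document into a single prompt (one LLM call)."""
    prompt = "Write a concise summary of the following:\n\n" + "\n\n".join(docs)
    return llm(prompt)


def map_reduce_summarize(docs: List[str], llm: Callable[[str], str]) -> str:
    """'Map-reduce': summarize each document, then summarize the summaries."""
    # Map step: one call per document
    partial = [llm("Write a concise summary of the following:\n\n" + d) for d in docs]
    # Reduce step: one more call to combine the partial summaries
    return llm("Combine these summaries into one:\n\n" + "\n\n".join(partial))


class CountingLLM:
    """Hypothetical stand-in for a chat model: counts calls, echoes 40 characters."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        body = prompt.split("\n\n", 1)[1]  # drop the instruction line
        return body[:40]


pages = ["page one text " * 10, "page two text " * 10, "page three text " * 10]

stuff_llm = CountingLLM()
stuff_summarize(pages, stuff_llm)

map_reduce_llm = CountingLLM()
map_reduce_summarize(pages, map_reduce_llm)

print(stuff_llm.calls, map_reduce_llm.calls)  # prints "1 4"
```

The trade-off is that "stuff" only works when all the text fits in the model's context window, while "map-reduce" can handle longer inputs at the cost of more calls.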

In [1]:
!pip install -U langchain-community pypdf langchain-openai
!pip install unstructured



In [38]:
from openai import OpenAI
from google.colab import userdata

open_ai_key = userdata.get('open_ai_key')
client = OpenAI(api_key = open_ai_key)

from langchain.chains.summarize import load_summarize_chain
from langchain_openai import ChatOpenAI

def summarize(
    pages,
    chain_type = "stuff"
    ):
  """Summarize a list of Documents with the given chain type ("stuff" or "map_reduce")."""

  # A low temperature keeps the summaries more consistent between runs
  llm = ChatOpenAI(temperature=0.1, model_name="gpt-4-turbo-preview", api_key=open_ai_key)
  chain = load_summarize_chain(llm, chain_type=chain_type)
  res = chain.invoke(pages)
  return res['output_text']
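
To check the impression that map-reduce takes longer, each call can be timed. The helper below is a small sketch: `timed` is a name introduced here, and `slow_stub` is a hypothetical stand-in for `summarize()` that sleeps instead of calling a model, so the block runs without an API key.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_stub(pages, chain_type="stuff"):
    """Hypothetical stand-in for summarize(): sleeps instead of calling an LLM."""
    time.sleep(0.01 if chain_type == "stuff" else 0.03)
    return f"summary via {chain_type}"


for chain_type in ("stuff", "map_reduce"):
    summary, seconds = timed(slow_stub, ["page"], chain_type=chain_type)
    print(f"{chain_type}: {seconds:.3f}s")
```

In the notebook, the same helper could wrap the real function, for example `timed(summarize, data, chain_type="map_reduce")`.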


In [11]:
from langchain_community.document_loaders import UnstructuredURLLoader

def load_url(url):

  loader = UnstructuredURLLoader(urls=[url])
  data = loader.load()
  return data



In [22]:
from langchain_community.document_loaders import PyPDFLoader

def load_pdf(pdf_filename):
  loader = PyPDFLoader(pdf_filename)
  pages = loader.load_and_split()
  return pages

# Summarize a Science Fiction short story

In [13]:
url = "https://users.ece.cmu.edu/~gamvrosi/thelastq.html"

data = load_url(url)
print(data)

[Document(metadata={'source': 'https://users.ece.cmu.edu/~gamvrosi/thelastq.html'}, page_content='The Last Question\n\nBy Isaac Asimov\n\nIsaac Asimov was the most prolific science fiction author of all time. In fifty years he averaged a new magazine article, short story, or book every two weeks, and most of that on a manual typewriter. Asimov thought that The Last Question, first copyrighted in 1956, was his best short story ever. Even if you do not have the background in science to be familiar with all of the concepts presented here, the ending packs more impact than any other book that I\'ve ever read. Don\'t read the end of the story first!\n\nThis is by far my favorite story of all those I have written. After all, I undertook to tell several trillion years of human history in the space of a short story and I leave it to you as to how well I succeeded. I also undertook another task, but I won\'t tell you what that was lest l spoil the story for you. It is a curious fact that innume

In [14]:
summarize(data) # chain_type defaults to stuff

'"The Last Question" by Isaac Asimov is a science fiction short story that explores the theme of entropy and the quest for a solution to prevent the heat death of the universe. The narrative spans trillions of years, following different sets of characters who, in various eras of human advancement, pose the same unsolvable question to increasingly sophisticated versions of a supercomputer: "Can entropy be reversed?" The story begins with two technicians asking a powerful computer, Multivac, if it\'s possible to decrease the net amount of entropy. As humanity evolves and spreads across galaxies, the question persists, asked to more advanced computers - from Microvac to the Galactic AC, and finally to the Cosmic AC. Each time, the answer is the same: "INSUFFICIENT DATA FOR MEANINGFUL ANSWER." In the end, after humanity has merged with the Cosmic AC and ceased to exist in a physical form, and the universe has run down to a state of entropy, the Cosmic AC finally discovers how to reverse en

In [15]:
summarize(data,chain_type="map_reduce")

'Isaac Asimov\'s "The Last Question" is a science fiction story that spans trillions of years, exploring humanity\'s quest to reverse entropy and prevent the universe\'s heat death. It follows the evolution of computers from the early Multivac to the omnipresent Cosmic AC. Throughout the story, characters ask if entropy can be reversed, receiving no definitive answer until the universe\'s end. Finally, the Cosmic AC discovers how to reverse entropy, restarting the universe with the creation of light. The narrative addresses themes of technological progress, human curiosity, and the cyclical nature of existence.'

# Summarize George Washington's farewell address (US Presidential address)

In [16]:
url = "https://millercenter.org/the-presidency/presidential-speeches/september-19-1796-farewell-address"

georgewashington_farewell = load_url(url)

print(georgewashington_farewell)



In [17]:
summarize(georgewashington_farewell,chain_type="stuff")



In [18]:
summarize(georgewashington_farewell,chain_type="map_reduce")

"George Washington's Farewell Address, issued on September 19, 1796, marked his decision not to seek a third presidential term. In it, Washington thanked Americans for their support, emphasized national unity and the peril of political factions and foreign alliances, and advocated for neutrality in international relations. He underscored the importance of commerce, fiscal responsibility, and the roles of religion and morality in society. Published in newspapers rather than spoken, the address is a cornerstone of American political values, highlighting Washington's concerns for the nation's lasting prosperity and stability."

# Summarize Attention Is All You Need

In [19]:
attention_pdf_url = "https://arxiv.org/pdf/1706.03762"

!wget -O attention.pdf {attention_pdf_url}

--2025-02-14 23:36:00--  https://arxiv.org/pdf/1706.03762
Resolving arxiv.org (arxiv.org)... 151.101.3.42, 151.101.67.42, 151.101.195.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.3.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2215244 (2.1M) [application/pdf]
Saving to: ‘attention.pdf’


2025-02-14 23:36:00 (170 MB/s) - ‘attention.pdf’ saved [2215244/2215244]



In [33]:
attention_pages = load_pdf("attention.pdf")
attention_pages[0]

Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'attention.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukaszka

In [28]:
summarize(attention_pages,chain_type="stuff")

"Google's research paper introduces the Transformer, a novel neural network architecture that relies entirely on attention mechanisms, eliminating the need for recurrent or convolutional layers in sequence transduction models. This architecture allows for greater parallelization, significantly reducing training time while achieving superior performance on machine translation tasks. The Transformer sets new state-of-the-art results on the WMT 2014 English-to-German and English-to-French translation tasks, outperforming existing models and ensembles with a considerable margin. The model's effectiveness is also demonstrated in English constituency parsing, showcasing its generalizability to other tasks. The Transformer's design facilitates understanding of long-range dependencies in data, potentially leading to more interpretable models. The research includes detailed experiments to explore the impact of various model components on performance, confirming the benefits of the Transformer's

In [30]:
summarize(attention_pages,chain_type="map_reduce")

"Google's Transformer model represents a significant advancement in neural network architecture, focusing entirely on attention mechanisms and moving away from traditional recurrent or convolutional layers. This innovation has led to groundbreaking performance in machine translation tasks, notably achieving record BLEU scores for English-to-German and English-to-French translations, and showing versatility in tasks like English constituency parsing. Developed through a collaboration between Google and the University of Toronto, the Transformer model facilitates greater parallelization, reducing training times and improving efficiency. Its architecture, based on self-attention, allows for direct modeling of input and output dependencies regardless of distance, with a constant number of operations. The model includes an encoder and decoder, each with six layers featuring multi-head self-attention and position-wise fully connected networks, optimized through techniques like scaled dot-pro

# **Extra**: Create a chatbot that answers questions about the Attention Is All You Need paper

In [57]:
!pip install faiss-cpu
# !pip install faiss-gpu

ERROR: Could not find a version that satisfies the requirement faiss-gpu (from versions: none)
ERROR: No matching distribution found for faiss-gpu

In [58]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
doc = PyPDFLoader("attention.pdf").load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
doc_split = text_splitter.split_documents(doc)

os.environ["OPENAI_API_KEY"] = open_ai_key
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(doc_split, embeddings)

retriever = vectorstore.as_retriever()
llm = ChatOpenAI(model_name="gpt-4o")

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever)

qa_chain.invoke({"query": "What is the Attention is All You Need paper about?"})["result"]


'The "Attention Is All You Need" paper introduces a new network architecture called the Transformer, which is based solely on attention mechanisms and does away with recurrence and convolutions. The Transformer is proposed as a simpler and more effective model for sequence transduction tasks, such as language translation, that traditionally relied on complex recurrent or convolutional neural networks. The paper highlights the use of attention mechanisms in connecting the encoder and decoder, leading to improved performance in handling long-distance dependencies within sequences.'
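
The cell above embeds each chunk, stores the vectors, and retrieves the chunks most similar to the question before the model answers. The retrieval step can be sketched offline as follows. This is a simplified illustration under stated assumptions: a toy bag-of-words `embed` function replaces `OpenAIEmbeddings`, a plain sorted list replaces the FAISS index, and the three `chunks` and the prompt wording are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks standing in for the split PDF pages
chunks = [
    "The Transformer relies entirely on attention mechanisms.",
    "We trained on the WMT 2014 English-German dataset.",
    "Recurrent models process tokens sequentially.",
]

query = "What attention mechanisms does the Transformer use?"
top = retrieve(query, chunks, k=1)

# The retrieved text is then placed in the prompt sent to the chat model
prompt = f"Answer using only this context:\n{top[0]}\n\nQuestion: {query}"
print(prompt)
```

Real embeddings capture meaning rather than exact word overlap, so they can match a question to a chunk even when the wording differs.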

In [62]:
def create_chat_agent(documents):

  text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
  documents_split = text_splitter.split_documents(documents)

  os.environ["OPENAI_API_KEY"] = open_ai_key
  embeddings = OpenAIEmbeddings()
  vectorstore = FAISS.from_documents(documents_split, embeddings)

  retriever = vectorstore.as_retriever()
  llm = ChatOpenAI(model_name="gpt-4o")

  qa_chain = RetrievalQA.from_chain_type(
      llm,
      retriever=retriever)

  return qa_chain

def ask_question(qa_chain, query):
  answer = qa_chain.invoke({"query": query})["result"]
  return answer

chat_agent = create_chat_agent(PyPDFLoader("attention.pdf").load())
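
The `chunk_size=1000, chunk_overlap=100` settings mean that neighbouring chunks share some text, so a sentence that straddles a boundary still appears whole in at least one chunk. The sketch below shows the idea with a plain fixed-width splitter. `split_with_overlap` is a hypothetical helper, not the LangChain API, and it is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks rather than at exact character offsets.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-width chunks whose edges overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Small sizes so the overlap is easy to see
text = "".join(str(i % 10) for i in range(25))
pieces = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(pieces)  # prints ['0123456789', '7890123456', '4567890123', '1234']
```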

In [63]:

reply = ask_question(chat_agent,"What is the Attention is All You Need paper about?")
print(reply)

The "Attention Is All You Need" paper introduces a new neural network architecture called the Transformer. This architecture is designed for sequence transduction tasks and is notable for relying solely on attention mechanisms, eliminating the need for recurrence and convolutions typically used in previous models. The Transformer architecture connects an encoder and decoder through attention mechanisms, and it is designed to efficiently model long-distance dependencies in sequence data.


In [64]:
question = "What is a Transformer? Explain at an undergraduate level"
reply = ask_question(chat_agent,question)
print(reply)

A Transformer is a type of neural network model used primarily for processing sequences of data, like sentences in natural language processing tasks. Unlike traditional models that rely heavily on complex recurrent or convolutional neural networks, the Transformer is built entirely around a mechanism called "attention."

Here's a simple breakdown of how it works:

1. **Sequence to Sequence**: The Transformer is designed to handle inputs and outputs as sequences. For example, in translation, the input might be a sentence in English, and the output could be its translation in French.

2. **Encoder and Decoder**: The Transformer consists of two main parts—an encoder and a decoder. The encoder processes the input sequence and converts it into a set of continuous representations. The decoder then takes these representations to produce the output sequence.

3. **Attention Mechanism**: The core innovation of the Transformer is its use of attention mechanisms, specifically "self-attention" and

In [65]:
question = "Write a simple example to walk through how a Transformer is implemented"
reply = ask_question(chat_agent,question)
print(reply)

Implementing a Transformer model from scratch can be quite involved, but I'll provide a simplified example that covers the main components. We'll use Python with NumPy for basic operations and assume familiarity with neural networks.

Here's a basic outline:

1. **Define the Transformer Components**:
   - **Self-Attention**: Calculate attention scores and outputs.
   - **Multi-Head Attention**: Combine multiple self-attention mechanisms.
   - **Feed-Forward Network**: Position-wise fully connected network.
   - **Positional Encoding**: Add positional information to the input embeddings.

2. **Define the Encoder and Decoder**:
   - **Encoder**: Stack of layers each consisting of multi-head self-attention and feed-forward network.
   - **Decoder**: Similar to the encoder but also attends to encoder outputs.

3. **Build the Transformer Model**:
   - Combine encoder and decoder with appropriate inputs and outputs.

Here's a simplified code example:

```python
import numpy as np

def attent

In [66]:
question = "Walk through a simple example and explain how Transformer would work?"
reply = ask_question(chat_agent,question)
print(reply)

To walk through a simple example of how a Transformer model works, let's consider the task of translating a sentence from English to another language, for instance, French. The sentence we'll use is "The cat sat on the mat."

1. **Input Representation:**
   - The input sentence "The cat sat on the mat" is first tokenized into individual words or subword units, depending on the tokenizer used. Let's say the tokens are ["The", "cat", "sat", "on", "the", "mat"].
   - Each token is converted into a vector representation, often using an embedding layer, which transforms each word into a high-dimensional vector.

2. **Encoder:**
   - The encoder consists of a stack of identical layers (typically 6). Each layer has two main components:
     - **Multi-head Self-Attention:** This mechanism allows the model to focus on different parts of the input sentence. It computes attention scores for each pair of tokens, helping the model to understand the relationships between words, regardless of their p