# LangChain Demonstration Notebook
This notebook demonstrates how to:
- Load a text file.
- Split it into manageable chunks.
- Generate embeddings using OpenAI.
- Use FAISS for vector storage and retrieval.
- Set up a question-answering system with LangChain.

In [1]:
# Install necessary packages
%pip install langchain langchain-community openai faiss-cpu langchain-openai

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Import required libraries
import os
from getpass import getpass
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

### Step 1: Retrieve the OpenAI API key
The API key is entered with `getpass`, which does not echo what you type, so the key never appears in the notebook's code or saved output.

In [3]:
# Enter the OpenAI API key securely
openai_api_key = getpass("Enter your OpenAI API key: ")
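
A common variation, sketched here as an assumption rather than part of the original notebook, is to read the key from the `OPENAI_API_KEY` environment variable when it is set and fall back to `getpass` otherwise. The helper name `get_api_key` is hypothetical.

```python
import os
from getpass import getpass


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from `env_var`, prompting with getpass only if it is unset.

    `get_api_key` is a hypothetical helper, not part of LangChain or the OpenAI SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass does not echo input, so the key is not displayed while typing
        key = getpass(f"Enter your {env_var}: ").strip()
    return key
```

Usage: `openai_api_key = get_api_key()`. Note that `OpenAIEmbeddings` and `ChatOpenAI` also read `OPENAI_API_KEY` from the environment when no key is passed explicitly.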

### Step 2: Load the text file
`TextLoader` reads the entire file into a list containing a single `Document`. Replace the file path with the path to your own text file.

In [4]:
# Load the text file
file_path = "../0-Data/paul_graham_short.txt"  # Replace with your text file path
loader = TextLoader(file_path)
documents = loader.load()
print(documents[0])  # Display the loaded document to verify the file was read

page_content='

What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.

The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

The language we used was an early version of Fortran. You had to type programs on punch cards, then s
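
Conceptually, `TextLoader` reads the whole file and wraps it in one `Document` whose metadata records the source path. The stdlib-only sketch below mimics that shape to make the output above easier to interpret; the `SimpleDocument` class and `load_text_file` function are hypothetical stand-ins, not LangChain's implementation.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read the whole file into a single document, mirroring TextLoader's output shape."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```

With the real loader, `len(documents)` is 1 for a single text file and `documents[0].metadata` holds the `source` path.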

### Step 3: Split the text into manageable chunks
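Because the whole file is one large `Document`, it is split into smaller chunks before embedding. `CharacterTextSplitter` splits on a separator (by default, blank lines, `"\n\n"`) and merges the pieces into chunks of roughly `chunk_size` characters, with up to `chunk_overlap` characters shared between neighboring chunks so that context spanning a chunk boundary is not lost.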

In [5]:
# Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents)
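
The effect of `chunk_size` and `chunk_overlap` is easiest to see on a toy string. The sketch below is a simplified fixed-width sliding window, not `CharacterTextSplitter`'s actual algorithm (which first splits on the separator and then merges pieces); the function name `sliding_window_chunks` is hypothetical. Each new chunk starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split `text` into fixed-width windows that overlap by `chunk_overlap` characters.

    Simplified illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)` returns `["abcd", "cdef", "efgh", "ghij"]`. To inspect the real split, `len(split_docs)` gives the number of chunks and `split_docs[0].page_content` shows the first one.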

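### Step 4: Generate embeddings and build the vector store
Each chunk is embedded with OpenAI's embeddings model, and `FAISS.from_documents` stores the resulting vectors, together with the chunks, in a FAISS index for fast similarity search.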

In [6]:
# Generate embeddings for text chunks
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vector_store = FAISS.from_documents(split_docs, embeddings)
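
Retrieval from the vector store comes down to nearest-neighbor search over embedding vectors. The brute-force sketch below ranks stored vectors by Euclidean (L2) distance to a query vector; the toy 2-D vectors stand in for real embeddings, which have far more dimensions, and the function names are hypothetical. LangChain's `FAISS` wrapper performs the same kind of search inside a FAISS index (by default, exact L2 search), but much faster.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_indices(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 4) -> list[int]:
    """Return indices of the k stored vectors closest to `query`, nearest first."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]
```

For example, `top_k_indices([0.9, 0.0], [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], k=2)` returns `[1, 0]`. With the real index, `vector_store.similarity_search("What did the author work on?", k=4)` returns the closest `Document` chunks directly.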

### Step 5: Set up a retrieval-based QA system
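For each question, the retriever fetches the chunks most similar to the query from FAISS (four by default), and `RetrievalQA` passes them to the chat model along with the question. With the default `"stuff"` chain type, all retrieved chunks are inserted into a single prompt. Because `return_source_documents=True` is set, the result also includes the retrieved chunks under `"source_documents"`.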

In [7]:
# Set up a retrieval-based QA system
retriever = vector_store.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", openai_api_key=openai_api_key),
    retriever=retriever,
    return_source_documents=True
)
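
With the default `"stuff"` chain type, the retrieved chunks are concatenated into the prompt that is sent to the chat model. The sketch below shows the idea; the template wording and the `build_stuff_prompt` name are hypothetical and do not reproduce LangChain's actual prompt text.

```python
def build_stuff_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Assemble a single prompt from retrieved chunks and the user's question.

    Hypothetical template; LangChain's real "stuff" prompt is worded differently.
    """
    # All chunks go into one context block, in retrieval order
    context = separator.join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because every retrieved chunk goes into one prompt, the number of retrieved chunks and `chunk_size` together bound how much context the model sees; other chain types such as `"map_reduce"` process the chunks separately instead.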

### Step 6: Ask questions
A loop is provided to allow users to ask questions interactively. Type `exit` to quit.

In [8]:
# Interactive QA loop
while True:
    query = input("\nEnter your question (or 'exit' to quit): ").strip()
    if not query:
        continue  # Ignore empty input instead of sending it to the model
    if query.lower() == "exit":
        print("Exiting...")
        break
    result = qa_chain.invoke({"query": query})
    print("\nAnswer:")
    print(result["result"])


Answer:
The author worked on writing short stories and programming on an IBM 1401 computer in 9th grade, as well as delved into editing Lisp expressions for user-defined page styles in Viaweb. Later on, the author applied to grad schools, eventually attending Harvard and realizing during the first year that AI, as practiced at the time, was a hoax.
Exiting...
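
Because the chain was built with `return_source_documents=True`, each result also contains a `"source_documents"` list of the retrieved `Document` chunks, which the loop above does not print. A small helper like the hypothetical `format_sources` below could be called after printing the answer, e.g. `print(format_sources(result["source_documents"]))`; it relies only on the `page_content` and `metadata` attributes that LangChain documents carry.

```python
def format_sources(docs, preview_chars: int = 120) -> str:
    """Return one numbered line per retrieved chunk: its source and a short text preview.

    `format_sources` is a hypothetical helper; `docs` can be any objects with
    `page_content` (str) and `metadata` (dict) attributes, such as LangChain Documents.
    """
    lines = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so multi-line chunks fit on one line
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"[{i}] {source}: {preview}")
    return "\n".join(lines)
```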
