# MCQ Generator with LangChain and OpenAI

This notebook demonstrates how to build a Multiple Choice Question (MCQ) generator using LangChain and the OpenAI API. The goal of this project is to automatically generate a quiz from a given text, and then evaluate the generated quiz for quality and relevance.

This project is a great way to showcase skills in:

*   **Natural Language Processing (NLP):** Using large language models (LLMs) to understand and process text.
*   **LangChain:** Building complex applications with LLMs by chaining together different components.
*   **API Integration:** Interacting with the OpenAI API to leverage the power of their models.
*   **Prompt Engineering:** Designing effective prompts to guide the LLM's output.
*   **Python and Jupyter Notebooks:** Writing clean, well-documented code to solve a real-world problem.

## 1. Project Setup

First, we need to install the necessary libraries and import them into our notebook. The `requirements.txt` file in the parent directory lists all the dependencies for this project.

In [1]:
!pip install -r ../requirements.txt

In [2]:
import os
import json
import pandas as pd
import traceback
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SequentialChain
from langchain.callbacks import get_openai_callback
import PyPDF2

## 2. Environment Configuration

Next, we'll load the OpenAI API key from a `.env` file. This is a good practice for managing sensitive information like API keys.

In [3]:
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

In [4]:
KEY = os.getenv("OPENAI_API_KEY")
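`os.getenv` returns `None` when the variable is missing, so a missing key would only surface later as an authentication error. A minimal sketch of an earlier check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

With this helper, the cell above could read `KEY = require_env("OPENAI_API_KEY")`, which fails immediately with a readable message if the `.env` file was not loaded.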

## 3. LangChain Setup

Now we'll set up the core components of our LangChain application.

### 3.1. Initialize the Language Model

We'll use the `gpt-3.5-turbo` model from OpenAI. We'll set the temperature to 0.5 to get a balance between creative and deterministic responses.

In [5]:
llm = ChatOpenAI(openai_api_key=KEY, model_name="gpt-3.5-turbo", temperature=0.5)

### 3.2. Define Prompt Templates

We need two prompt templates: one for generating the MCQs and another for evaluating them.

In [6]:
TEMPLATE = """
Text: {text}
You are an expert MCQ maker. Given the above text, it is your job to
create a quiz of {number} multiple choice questions for {subject} students in {tone} tone.
Make sure the questions are not repeated, and check that every question conforms to the text.
Make sure to format your response like RESPONSE_JSON below and use it as a guide.
Make sure to create {number} MCQs.
### RESPONSE_JSON
{response_json}
"""

In [7]:
TEMPLATE2 = """
You are an expert English grammarian and writer. You are given a multiple choice quiz for {subject} students.
Evaluate the complexity of the questions and give a complete analysis of the quiz. Use at most 50 words for the complexity analysis.
If the quiz is not on par with the cognitive and analytical abilities of the students,
update the quiz questions that need to be changed and adjust the tone so that it fits the students' abilities.
Quiz_MCQs:
{quiz}

Check from an expert English Writer of the above quiz:
"""
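To see what the model will actually receive, it can help to fill a template by hand before wiring it into a chain. The sketch below uses plain `str.format` on a shortened stand-in template (`DEMO_TEMPLATE` is illustrative, not the notebook's `TEMPLATE`). It assumes LangChain's default f-string template format substitutes simple `{name}` placeholders the same way.

```python
import json

# Shortened stand-in for TEMPLATE, with the same five placeholders.
DEMO_TEMPLATE = (
    "Text: {text}\n"
    "Create a quiz of {number} multiple choice questions "
    "for {subject} students in {tone} tone.\n"
    "### RESPONSE_JSON\n"
    "{response_json}\n"
)

demo_prompt = DEMO_TEMPLATE.format(
    text="Biology is the scientific study of life.",
    number=2,
    subject="biology",
    tone="simple",
    # Braces inside the JSON are safe here: str.format parses only the
    # template string, not the values substituted into it.
    response_json=json.dumps({"1": {"mcq": "multiple choice question"}}),
)
print(demo_prompt)
```

Because the JSON guide is passed as a value rather than written into the template, its braces do not need escaping.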

### 3.3. Create LangChain Chains

We'll create two `LLMChain` instances: one for quiz generation and one for evaluation. Then, we'll combine them into a `SequentialChain`.

In [8]:
quiz_generation_prompt = PromptTemplate(
    input_variables=["text", "number", "subject", "tone", "response_json"],
    template=TEMPLATE
)

In [9]:
quiz_chain = LLMChain(llm=llm, prompt=quiz_generation_prompt, output_key="quiz", verbose=True)

In [10]:
quiz_evaluation_prompt = PromptTemplate(input_variables=["subject", "quiz"], template=TEMPLATE2)

In [11]:
review_chain = LLMChain(llm=llm, prompt=quiz_evaluation_prompt, output_key="review", verbose=True)

In [12]:
generate_evaluate_chain = SequentialChain(
    chains=[quiz_chain, review_chain],
    input_variables=["text", "number", "subject", "tone", "response_json"],
    output_variables=["quiz", "review"],
    verbose=True
)
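The data flow through the two chains can be illustrated without any API call. The following is a simplified sketch in plain Python, not LangChain's implementation: each step reads from a shared dictionary and writes its output key back, which is how the `quiz` produced by the first chain becomes an input to the second. The function names are hypothetical stand-ins.

```python
def fake_quiz_step(inputs: dict) -> dict:
    """Stand-in for quiz_chain: reads the quiz inputs, returns the "quiz" key."""
    return {
        "quiz": f"{inputs['number']} {inputs['tone']} questions on {inputs['subject']}"
    }


def fake_review_step(inputs: dict) -> dict:
    """Stand-in for review_chain: reads "subject" and "quiz", returns "review"."""
    return {"review": f"Reviewed for {inputs['subject']} students: {inputs['quiz']}"}


def run_sequence(steps, inputs: dict, output_variables: list) -> dict:
    """Run steps in order, merging each step's outputs into a shared dict."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    # Return only the requested keys (a simplification for this sketch).
    return {key: state[key] for key in output_variables}


demo_result = run_sequence(
    [fake_quiz_step, fake_review_step],
    {
        "text": "Biology is the study of life.",
        "number": 5,
        "subject": "biology",
        "tone": "simple",
    },
    ["quiz", "review"],
)
print(demo_result)
```

The order of the steps matters: `fake_review_step` would raise a `KeyError` if it ran before `quiz` existed in the shared state.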

## 4. Data Preparation

Now we'll load the text from the `data.txt` file. This text will be used as the source material for generating the quiz.

In [13]:
file_path = "../data.txt"

In [14]:
with open(file_path, 'r') as file:
    TEXT = file.read()

## 5. MCQ Generation and Evaluation

Now it's time to run our chain and generate the MCQs. We'll specify the number of questions, the subject, and the tone of the quiz.

**Note:** The chain call in cell 16 below is commented out to avoid making a real call to the OpenAI API. A sample response is assigned to `response` instead. If you have a valid OpenAI API key, you can uncomment the call and run it.

In [15]:
NUMBER = 5
SUBJECT = "biology"
TONE = "simple"
RESPONSE_JSON = {
    "1": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "2": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "3": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
}
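The guide dictionary above repeats the same entry three times, even though `NUMBER` is 5; it only shows the model the expected shape. As a sketch, the same structure could be generated for any number of questions. The helper `build_response_template` and its `option_keys` parameter are hypothetical and not part of the original notebook.

```python
import json


def build_response_template(n_questions: int, option_keys: str = "abcd") -> dict:
    """Build an n-entry guide dict with the same shape as RESPONSE_JSON."""
    return {
        str(i): {
            "mcq": "multiple choice question",
            "options": {key: "choice here" for key in option_keys},
            "correct": "correct answer",
        }
        for i in range(1, n_questions + 1)
    }


demo_template = build_response_template(3)
print(json.dumps(demo_template, indent=2))
```

Calling `build_response_template(NUMBER)` would make the guide match the requested question count, at the cost of a longer prompt.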

In [16]:
# with get_openai_callback() as cb:
#     response = generate_evaluate_chain(
#         {
#             "text": TEXT,
#             "number": NUMBER,
#             "subject": SUBJECT,
#             "tone": TONE,
#             "response_json": json.dumps(RESPONSE_JSON)
#         }
#     )

response = {"quiz": "{\"1\": {\"mcq\": \"What is the scientific study of life called?\", \"options\": {\"a\": \"Chemistry\", \"b\": \"Biology\", \"c\": \"Physics\", \"d\": \"Geology\"}, \"correct\": \"b\"},\"2\": {\"mcq\": \"What are all organisms made up of?\", \"options\": {\"a\": \"Molecules\", \"b\": \"Atoms\", \"c\": \"Cells\", \"d\": \"Tissues\"}, \"correct\": \"c\"},\"3\": {\"mcq\": \"What is the major theme that explains the unity and diversity of life?\", \"options\": {\"a\": \"Evolution\", \"b\": \"Genetics\", \"c\": \"Ecology\", \"d\": \"Anatomy\"}, \"correct\": \"a\"},\"4\": {\"mcq\": \"What process allows organisms to move, grow, and reproduce?\", \"options\": {\"a\": \"Respiration\", \"b\": \"Photosynthesis\", \"c\": \"Energy processing\", \"d\": \"Metabolism\"}, \"correct\": \"c\"},\"5\": {\"mcq\": \"What method do biologists use to make observations and form conclusions?\", \"options\": {\"a\": \"Scientific method\", \"b\": \"Guesswork\", \"c\": \"Trial and error\", \"d\": \"Intuition\"}, \"correct\": \"a\"}}", "review": "The quiz is well-suited for biology students. The questions cover fundamental concepts from the text and are presented in a clear, straightforward manner. The tone is appropriate for the subject matter and the intended audience."}

### 5.1. Display the Results

In [17]:
print("Generated Quiz:")
print(response['quiz'])
print("\nReview:")
print(response['review'])

Generated Quiz:
{"1": {"mcq": "What is the scientific study of life called?", "options": {"a": "Chemistry", "b": "Biology", "c": "Physics", "d": "Geology"}, "correct": "b"},"2": {"mcq": "What are all organisms made up of?", "options": {"a": "Molecules", "b": "Atoms", "c": "Cells", "d": "Tissues"}, "correct": "c"},"3": {"mcq": "What is the major theme that explains the unity and diversity of life?", "options": {"a": "Evolution", "b": "Genetics", "c": "Ecology", "d": "Anatomy"}, "correct": "a"},"4": {"mcq": "What process allows organisms to move, grow, and reproduce?", "options": {"a": "Respiration", "b": "Photosynthesis", "c": "Energy processing", "d": "Metabolism"}, "correct": "c"},"5": {"mcq": "What method do biologists use to make observations and form conclusions?", "options": {"a": "Scientific method", "b": "Guesswork", "c": "Trial and error", "d": "Intuition"}, "correct": "a"}} 

Review:
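The quiz is well-suited for biology students. The questions cover fundamental concepts from the text and are presented in a clear, straightforward manner. The tone is appropriate for the subject matter and the intended audience.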

## 6. Output Processing

Finally, we'll process the output from the LangChain pipeline. We'll parse the generated quiz, format it as a Pandas DataFrame, and save it to a CSV file.

In [18]:
quiz = response.get("quiz")
quiz = json.loads(quiz)
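Model output is not guaranteed to be valid JSON, so `json.loads` can raise on a live response even though it succeeds on the sample. Below is a minimal sketch of a stricter parser. The helper `parse_quiz` is hypothetical, and it assumes that `correct` holds an option key (such as `"b"`), as in the sample response, rather than the answer text.

```python
import json

REQUIRED_KEYS = {"mcq", "options", "correct"}


def parse_quiz(raw: str) -> dict:
    """Parse the model's quiz string and check each entry's basic shape."""
    try:
        quiz = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Quiz is not valid JSON: {exc}") from exc
    if not isinstance(quiz, dict):
        raise ValueError("Quiz must be a JSON object keyed by question number")

    for number, item in quiz.items():
        if not isinstance(item, dict):
            raise ValueError(f"Question {number} is not a JSON object")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"Question {number} is missing keys: {sorted(missing)}")
        # Assumption: "correct" is one of the option keys, as in the sample.
        if item["correct"] not in item["options"]:
            raise ValueError(
                f"Question {number}: answer {item['correct']!r} is not an option key"
            )
    return quiz
```

Replacing the `json.loads` call with `parse_quiz(response.get("quiz"))` would turn a malformed response into a single, descriptive `ValueError`.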

In [19]:
quiz_table_data = []
for key, value in quiz.items():
    mcq = value["mcq"]
    options = " | ".join(
        [
            f"{option}: {option_value}"
            for option, option_value in value["options"].items()
        ]
    )
    correct = value["correct"]
    quiz_table_data.append({"MCQ": mcq, "Choices": options, "Correct": correct})

In [20]:
quiz_df = pd.DataFrame(quiz_table_data)

In [21]:
quiz_df

| | MCQ | Choices | Correct |
|---|---|---|---|
| 0 | What is the scientific study of life called? | a: Chemistry \| b: Biology \| c: Physics \| d: ... | b |
| 1 | What are all organisms made up of? | a: Molecules \| b: Atoms \| c: Cells \| d: Tissues | c |
| 2 | What is the major theme that explains the uni... | a: Evolution \| b: Genetics \| c: Ecology \| d: ... | a |
| 3 | What process allows organisms to move, grow, a... | a: Respiration \| b: Photosynthesis \| c: Ener... | c |
| 4 | What method do biologists use to make observa... | a: Scientific method \| b: Guesswork \| c: Tria... | a |


In [22]:
quiz_df.to_csv("mcqgenerated_biology_quiz.csv", index=False)
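Some questions contain commas, so the CSV writer has to quote those fields for the file to read back correctly. The sketch below illustrates this round trip with the standard-library `csv` module and an in-memory buffer instead of pandas and a file on disk. It is an illustration of the quoting behaviour, not a replacement for `to_csv`.

```python
import csv
import io

# One row taken from the sample quiz; the MCQ text contains commas.
rows = [
    {
        "MCQ": "What process allows organisms to move, grow, and reproduce?",
        "Choices": "a: Respiration | b: Photosynthesis | c: Energy processing | d: Metabolism",
        "Correct": "c",
    },
]

# Write the rows as CSV into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["MCQ", "Choices", "Correct"])
writer.writeheader()
writer.writerows(rows)

# Read the same text back and compare with the original rows.
buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(restored == rows)
```

The same kind of check can be done on the saved file by loading it with `pd.read_csv` and comparing it with `quiz_df`.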