# MCQ Generator with LangChain and Hugging Face

This notebook demonstrates how to build a Multiple Choice Question (MCQ) generator with LangChain. The executed cells call Google's Gemini model through `langchain-google-genai`; a Hugging Face Hub endpoint is kept as a commented-out alternative. The goal is to generate a quiz automatically from a given text and then evaluate that quiz for quality and relevance.

This project is a great way to showcase skills in:

*   **Natural Language Processing (NLP):** Using large language models (LLMs) to understand and process text.
*   **LangChain:** Building complex applications with LLMs by chaining together different components.
*   **API Integration:** Interacting with the Hugging Face Hub to leverage a wide range of open-source models.
*   **Prompt Engineering:** Designing effective prompts to guide the LLM's output.
*   **Python and Jupyter Notebooks:** Writing clean, well-documented code to solve a real-world problem.

In [36]:
!pip install langchain-google-genai

## 1. Import Necessary Libraries

In [37]:
import os
import json
import pandas as pd
import traceback
from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SequentialChain
from langchain_google_genai import ChatGoogleGenerativeAI
import PyPDF2

## 2. Environment Configuration

Next, we'll load the Google API key (`GOOGLE_API_KEY`) from a `.env` file. Keeping sensitive values such as API keys out of the notebook itself is good practice.

In [40]:
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.

True

In [41]:
# Read the Google API key that load_dotenv() placed in the environment
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
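
If `GOOGLE_API_KEY` is missing from `.env`, `os.getenv` quietly returns `None`, and the problem only surfaces later when the model is called. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Example usage (assumes load_dotenv() has already been called):
# GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
```

Raising early keeps the error message close to its cause instead of inside the first chain invocation.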

## 3. LangChain Setup

Now we'll set up the core components of our LangChain application.

### 3.1. IMPORTANT: Accept Gemma's License Terms

This step applies only if you switch to the commented-out `HuggingFaceEndpoint` with a gated model such as `google/gemma-7b`. Before using that model, you **must** visit its page on the Hugging Face Hub and accept the license terms; otherwise the API call will fail.

[Click here to go to the `google/gemma-7b` model page](https://huggingface.co/google/gemma-7b) and accept the terms.

### 3.2. Initialize the Language Model

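The executed cell below uses Google's `gemini-1.5-flash-latest` model through `ChatGoogleGenerativeAI`. The commented-out `HuggingFaceEndpoint` block shows the Hugging Face Hub alternative, with `repo_id` set to `mistralai/Mistral-7B-Instruct-v0.1`.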

In [42]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.1"

In [55]:
# llm = HuggingFaceEndpoint(
#     repo_id=repo_id,
#     huggingfacehub_api_token=HUGGINGFACE_API_KEY,
#     task="text-generation",
#     max_new_tokens=512,
#     temperature=0.5
# )

# Initialize the Gemini model (gemini-1.5-flash-latest)
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.7)

### 3.3. Define Prompt Templates

We need two prompt templates: one for generating the MCQs and another for evaluating them.

In [56]:
TEMPLATE = '''
Text: {text}
You are an expert MCQ maker. Given the above text, it is your job to
create a quiz of {number} multiple choice questions for {subject} students in {tone} tone.
Make sure the questions are not repeated and that every question conforms to the text.
Make sure to format your response like RESPONSE_JSON below and use it as a guide.
Be sure to make {number} MCQs
### RESPONSE_JSON
{response_json}
'''

In [57]:
TEMPLATE2 = '''
You are an expert english grammarian and writer. Given a Multiple Choice Quiz for {subject} students.
You need to evaluate the complexity of the question and give a complete analysis of the quiz. Only use at max 50 words for complexity analysis. 
if the quiz is not at per with the cognitive and analytical abilities of the students,
update the quiz questions which needs to be changed and change the tone such that it perfectly fits the student abilities
Quiz_MCQs:
{quiz}

Check from an expert English Writer of the above quiz:
'''
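
Each `{name}` placeholder in a template has to be supplied when the chain runs, so a mismatch between a template and its declared `input_variables` is an easy mistake to make. As a rough sketch, the standard library's `string.Formatter` can list the placeholders in a template. Here `placeholders` is a hypothetical helper and `SAMPLE_TEMPLATE` is a shortened stand-in for `TEMPLATE`.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the set of named {placeholders} found in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Shortened stand-in for TEMPLATE, used only for illustration.
SAMPLE_TEMPLATE = (
    "Text: {text}\n"
    "Create a quiz of {number} multiple choice questions "
    "for {subject} students in {tone} tone.\n"
    "### RESPONSE_JSON\n"
    "{response_json}\n"
)

declared = {"text", "number", "subject", "tone", "response_json"}
missing = placeholders(SAMPLE_TEMPLATE) - declared
print(missing)  # an empty set means every placeholder is declared
```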

### 3.4. Create LangChain Chains

We'll create two `LLMChain` instances: one for quiz generation and one for evaluation. Then, we'll combine them into a `SequentialChain`.

In [58]:
quiz_generation_prompt = PromptTemplate(
    input_variables=["text", "number", "subject", "tone", "response_json"],
    template=TEMPLATE
)

In [59]:
quiz_chain = LLMChain(llm=llm, prompt=quiz_generation_prompt, output_key="quiz", verbose=True)

In [60]:
quiz_evaluation_prompt = PromptTemplate(input_variables=["subject", "quiz"], template=TEMPLATE2)

In [61]:
review_chain = LLMChain(llm=llm, prompt=quiz_evaluation_prompt, output_key="review", verbose=True)

In [62]:
generate_evaluate_chain = SequentialChain(
    chains=[quiz_chain, review_chain],
    input_variables=["text", "number", "subject", "tone", "response_json"],
    output_variables=["quiz", "review"],
    verbose=True
)
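
Conceptually, the `SequentialChain` runs the first chain, stores its result under the `quiz` key, and makes that key available to the second chain, whose result is stored under `review`. The plain-Python sketch below imitates only that data flow with stub functions. It is a simplification, not LangChain's implementation, and `fake_quiz_step`, `fake_review_step`, and `run_sequential` are hypothetical names.

```python
from typing import Callable

Step = Callable[[dict], dict]


def fake_quiz_step(inputs: dict) -> dict:
    # Stand-in for quiz_chain: reads the inputs, writes the "quiz" key.
    return {"quiz": f"{inputs['number']} questions about {inputs['subject']}"}


def fake_review_step(inputs: dict) -> dict:
    # Stand-in for review_chain: reads "subject" and "quiz", writes "review".
    return {"review": f"Reviewed for {inputs['subject']} students: {inputs['quiz']}"}


def run_sequential(steps: list[Step], inputs: dict, output_keys: list[str]) -> dict:
    """Run steps in order, merging each step's outputs into a shared state."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    return {key: state[key] for key in output_keys}


result = run_sequential(
    [fake_quiz_step, fake_review_step],
    {"number": 5, "subject": "biology"},
    ["quiz", "review"],
)
print(result["review"])
```

The point of the sketch is the ordering constraint: the review step can only run after the quiz step has written the key it depends on.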

## 4. Data Preparation

Now we'll load the text from the `data.txt` file. This text will be used as the source material for generating the quiz.

In [63]:
file_path = "../data.txt"

In [64]:
with open(file_path, 'r', encoding='utf-8') as file:
    TEXT = file.read()
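
The source text used in this run contains bracketed citation markers such as `[1][2][3]`, which add noise to the prompt. One optional cleanup step, sketched here under the assumption that every marker is a number in square brackets, is to strip them with a regular expression. The function name `strip_citation_markers` is hypothetical.

```python
import re


def strip_citation_markers(text: str) -> str:
    """Remove numeric citation markers such as [1] or [12] from the text."""
    return re.sub(r"\[\d+\]", "", text)


sample = "Biology is the scientific study of life.[1][2][3] It is a natural science."
print(strip_citation_markers(sample))
```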

## 5. MCQ Generation and Evaluation

Now it's time to run our chain and generate the MCQs. We'll specify the number of questions, the subject, and the tone of the quiz.

In [65]:
NUMBER = 5
SUBJECT = "biology"
TONE = "simple"
RESPONSE_JSON = {
    "1": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here"
        },
        "correct": "correct answer"
    },
    "2": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here"
        },
        "correct": "correct answer"
    },
    "3": {
        "mcq": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here"
        },
        "correct": "correct answer"
    }
}
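
The three entries in `RESPONSE_JSON` are identical apart from their keys, so the same structure could also be built programmatically. This is only a sketch; `build_response_template` is a hypothetical helper that reproduces the literal above for `count=3`.

```python
import json


def build_response_template(count: int) -> dict:
    """Build a format guide with `count` identical placeholder questions."""
    return {
        str(i): {
            "mcq": "multiple choice question",
            "options": {letter: "choice here" for letter in "abcd"},
            "correct": "correct answer",
        }
        for i in range(1, count + 1)
    }


template = build_response_template(3)
print(json.dumps(template, indent=2))
```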

In [66]:
# Run the generation chain followed by the review chain
response = generate_evaluate_chain.invoke(
    {
        "text": TEXT,
        "number": NUMBER,
        "subject": SUBJECT,
        "tone": TONE,
        "response_json": json.dumps(RESPONSE_JSON)
    }
)

> Entering new SequentialChain chain...

> Entering new LLMChain chain...
Prompt after formatting:

Text: Biology is the scientific study of life.[1][2][3] It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field.[1][2][3] For instance, all organisms are made up of cells that process hereditary information encoded in genes, which can be transmitted to future generations. Another major theme is evolution, which explains the unity and diversity of life.[1][2][3] Energy processing is also important to life as it allows organisms to move, grow, and reproduce.[1][2][3] Finally, all organisms are able to regulate their own internal environments.[1][2][3][4][5]

Biologists are able to study life at multiple levels of organization,[1] from the molecular biology of a cell to the anatomy and physiology of plants and animals, and evolution of populations.[1][6] Hence, there are multiple subdisciplin

> Finished chain.

> Entering new LLMChain chain...
Prompt after formatting:

You are an expert english grammarian and writer. Given a Multiple Choice Quiz for biology students.
You need to evaluate the complexity of the question and give a complete analysis of the quiz. Only use at max 50 words for complexity analysis. 
if the quiz is not at per with the cognitive and analytical abilities of the students,
update the quiz questions which needs to be changed and change the tone such that it perfectly fits the student abilities
Quiz_MCQs:
```json
{
  "1": {
    "mcq": "What is biology?",
    "options": {
      "a": "The study of rocks and minerals",
      "b": "The study of the Earth's atmosphere",
      "c": "The scientific study of life",
      "d": "The study of human behavior"
    },
    "correct": "c"
  },
  "2": {
    "mcq": "Which of the following is NOT a unifying theme in biology?",
    "options": {
      "a": "All organisms are made of cells",
   

### 5.1. Display the Results

In [70]:
print("--- RAW MODEL OUTPUT FOR 'quiz' ---")
print(repr(response['quiz']))
print("--- END OF RAW OUTPUT ---")

--- RAW MODEL OUTPUT FOR 'quiz' ---
'```json\n{\n  "1": {\n    "mcq": "What is biology?",\n    "options": {\n      "a": "The study of rocks and minerals",\n      "b": "The study of the Earth\'s atmosphere",\n      "c": "The scientific study of life",\n      "d": "The study of human behavior"\n    },\n    "correct": "c"\n  },\n  "2": {\n    "mcq": "Which of the following is NOT a unifying theme in biology?",\n    "options": {\n      "a": "All organisms are made of cells",\n      "b": "Hereditary information is encoded in genes",\n      "c": "Organisms are unaffected by their environment",\n      "d": "Evolution explains the unity and diversity of life"\n    },\n    "correct": "c"\n  },\n  "3": {\n    "mcq": "What is a major process that allows organisms to grow and reproduce?",\n    "options": {\n      "a": "Photosynthesis only",\n      "b": "Energy processing",\n      "c": "Cellular respiration only",\n      "d": "Water absorption only"\n    },\n    "correct": "b"\n  },\n  "4": {\n    

## 6. Output Processing

Finally, we'll process the output from the LangChain pipeline. We'll parse the generated quiz, format it as a Pandas DataFrame, and save it to a CSV file.

In [71]:
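quiz = response.get("quiz")
# The raw output is wrapped in a Markdown code fence, so json.loads(quiz) fails with
# "JSONDecodeError: Expecting value: line 1 column 1 (char 0)".
# Parse only the span from the first "{" to the last "}".
quiz = json.loads(quiz[quiz.find("{"): quiz.rfind("}") + 1])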

In [19]:
quiz_table_data = []
for key, value in quiz.items():
    mcq = value["mcq"]
    options = " | ".join(
        [
            f"{option}: {option_value}"
            for option, option_value in value["options"].items()
        ]
    )
    correct = value["correct"]
    quiz_table_data.append({"MCQ": mcq, "Choices": options, "Correct": correct})
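
The loop above assumes that every entry has `mcq`, `options`, and `correct` keys, but model output is not guaranteed to follow the requested format. A hedged sketch of a validation pass is shown below. `validate_quiz` is a hypothetical helper, and the rule that `correct` must be one of the option keys is an assumption based on the output seen in this run (for example `"correct": "c"`).

```python
def validate_quiz(quiz: dict) -> list[str]:
    """Return a list of problems; an empty list means the quiz looks well formed."""
    problems = []
    for key, item in quiz.items():
        if not isinstance(item, dict):
            problems.append(f"question {key}: expected an object")
            continue
        for field in ("mcq", "options", "correct"):
            if field not in item:
                problems.append(f"question {key}: missing '{field}'")
        options = item.get("options", {})
        # Assumption: the answer is given as an option key such as "c".
        if "correct" in item and item["correct"] not in options:
            problems.append(f"question {key}: 'correct' is not one of the option keys")
    return problems


sample_quiz = {
    "1": {
        "mcq": "What is biology?",
        "options": {
            "a": "The study of rocks and minerals",
            "b": "The study of the Earth's atmosphere",
            "c": "The scientific study of life",
            "d": "The study of human behavior",
        },
        "correct": "c",
    }
}
print(validate_quiz(sample_quiz))
```

Running a check like this before building the table turns a `KeyError` in the middle of the loop into a readable list of issues.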

In [20]:
quiz_df = pd.DataFrame(quiz_table_data)

In [21]:
quiz_df

In [22]:
quiz_df.to_csv("mcqgenerated_biology_quiz.csv", index=False)