In [146]:
import dotenv
import ell
from pydantic import BaseModel, Field
import random
from IPython.display import Markdown


dotenv.load_dotenv()
ell.init(store="./ell-store")

### Foundational Guidance from ChatGPT on best practices

Here are some best practices for creating multiple-choice questions and options:

**Question Writing:**

- Clarity: Ensure the question is clear and unambiguous.
- Focus: Test a single, specific concept or skill.
- Relevance: Relate the question directly to the key learning objectives.
- Avoid Trick Questions: Ensure the question doesn't rely on confusing wording or unnecessary complexity.
- Balance Complexity: Aim for moderate difficulty, avoiding overly simple or overly complex questions.


**Option Writing:**

- Plausible Distractors: All options (distractors) should seem plausible to less knowledgeable students.
- Single Correct Answer: Only one option should be clearly correct.
- Consistent Length: Avoid making the correct answer noticeably longer or shorter than the distractors.
- Avoid Negative Wording: If unavoidable, highlight the negative (e.g., not).
- Mutually Exclusive: Options should not overlap in meaning.
- Avoid "All of the Above"/"None of the Above": These can confuse or test for test-taking strategies rather than knowledge.

**Difficulty Rating (1 to 5 Scale):**

- "1" Easy:
    - Tests basic recall of facts or definitions.
    - Requires minimal cognitive effort.
    - Candidates can answer quickly without much analysis.
- "2" Below Moderate:
    - Involves basic understanding or straightforward application of concepts.
    - May require recognition of patterns or simple reasoning.
- "3" Moderate:
    - Requires applying concepts in new or slightly varied contexts.
    - Involves some critical thinking and deeper understanding.
    - Slightly more complex than a simple recall.
- "4" Hard:
    - Requires significant analysis, problem-solving, or synthesis.
    - May involve multiple concepts or require candidates to infer information.
    - Challenging for most candidates, requiring both time and thought.
- "5" Very Hard:
    - Requires advanced critical thinking, evaluation, or comprehensive understanding.
    - May involve abstract concepts or complex scenarios.
    - Likely to challenge even the top-performing candidates.

**Question Rating (1 to 5 Scale):**

Here’s a scale for how well a multiple-choice question fits the evaluation criteria:

- "1" Poor:
    - The question is unclear or ambiguous, making it difficult for students to understand.
    - The question is irrelevant or does not assess an important learning objective.
    - Distractors are implausible or obviously incorrect.
    - The options are unbalanced (e.g., the correct answer stands out due to length or wording).
    - The question may have more than one plausible answer or is biased.
- "2" Below Average:
    - The question is somewhat clear but may contain minor ambiguities.
    - The question relates to the content but does not target key concepts or skills.
    - Distractors are weak but somewhat plausible.
    - Some imbalance in options (e.g., the correct answer is slightly longer or more detailed).
    - The question is somewhat fair but might confuse less knowledgeable students.
- "3" Average:
    - The question is generally clear but could be improved for better understanding.
    - The question is relevant to learning objectives but might be too broad or narrow.
    - Distractors are reasonably plausible but not challenging enough.
    - The options are mostly balanced, with minor improvements needed.
    - The question is fair, with a clear correct answer and minimal bias.
- "4" Good:
    - The question is clear and concise with no ambiguities.
    - The question effectively targets key learning objectives or skills.
    - Distractors are plausible and make the question moderately challenging.
    - The options are well-balanced, with no standout option based on length or structure.
    - The question is fair and unbiased, with a well-defined correct answer.
- "5" Excellent:
    - The question is exceptionally clear and concise, easy to understand with no ambiguities.
    - The question is highly relevant, focusing directly on key learning objectives.
    - Distractors are carefully crafted, highly plausible, and challenging for all students.
    - The options are perfectly balanced in length, structure, and content.
    - The question is completely fair, with no bias or confusion, and has a single, well-defined correct answer.
    
This scale allows you to assess and refine questions based on how well they align with the evaluation criteria for clarity, relevance, plausibility, balance, and fairness.

**Mix of questions**

For a set of 10 multiple-choice questions, the following mix of difficulty levels is suggested:

- Balanced Distribution: Aim for a mixture of difficulties that covers a wide range of skills and understanding levels.

    - 2 easy (1): Questions that test basic recall or fundamental concepts.
    - 3 below-moderate / moderate (2-3): Questions that require some application of knowledge or understanding of concepts.
    - 3 difficult (4): Questions that challenge critical thinking or deeper understanding.
    - 2 very difficult (5): Questions that test advanced knowledge or complex problem-solving.
- Progressive Difficulty: Arrange the questions so they gradually increase in difficulty to help build student confidence early on. For example:
    - Questions 1-2: Difficulty 1 (easy)
    - Questions 3-5: Difficulty 2-3 (moderate)
    - Questions 6-8: Difficulty 4 (hard)
    - Questions 9-10: Difficulty 5 (very hard)
- Cover All Key Topics: Ensure questions cover a range of topics or learning objectives, not just focusing on the hardest or easiest parts of the content.
- Testing Different Cognitive Skills: Use easy questions for simple recall, moderate ones for application, and hard ones for analysis, synthesis, or evaluation.

This way, the exam fairly assesses both foundational knowledge and higher-order thinking.
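
The suggested mix can be written down as a list of target difficulties, one per question, in ascending order. This is a minimal sketch, not part of the original notebook: `DIFFICULTY_MIX` and `build_difficulty_plan` are hypothetical names, and it assumes the three "below-moderate / moderate" slots are split into one level-2 and two level-3 questions, which the guidance above does not specify.

```python
# Number of questions per difficulty level for a 10-question exam.
# Assumption: the three level 2-3 slots are split as one "2" and two "3"s.
DIFFICULTY_MIX = {1: 2, 2: 1, 3: 2, 4: 3, 5: 2}


def build_difficulty_plan(mix: dict[int, int]) -> list[int]:
    """Expand a {difficulty: count} mix into an ascending list of target difficulties."""
    plan: list[int] = []
    for level in sorted(mix):
        plan.extend([level] * mix[level])
    return plan


plan = build_difficulty_plan(DIFFICULTY_MIX)
print(plan)  # [1, 1, 2, 3, 3, 4, 4, 4, 5, 5]
```

Sorting the levels before expanding gives the progressive ordering described above, with questions 1-2 easy and questions 9-10 very hard.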

In [147]:
DIFFICULTY_RATINGS = {
    "1": "The question should be easy and should be answered correctly by most students. It should test basic recall of facts or definitions, requires minimal cognitive effort and can be answered quickly without much analysis.",
    "2": "The question should be of below moderate difficulty and should be answered correctly by most students. It should involve basic understanding or straightforward application of concepts. It may require recognition of patterns or simple reasoning.",
    "3": "The question should be of moderate difficulty. It may require applying concepts in new or slightly varied contexts, involving some critical thinking and deeper understanding. It should be more complex than a simple recall.",
    "4": "The question should be of hard difficulty and should be challenging for most candidates, requiring both time and thought. It should require significant analysis, problem-solving, or synthesis. It may involve multiple concepts or require candidates to infer information.",
    "5": "The question should be of very hard difficulty and likely to challenge even the top-performing students. It should require advanced critical thinking, evaluation, or comprehensive understanding. It may involve abstract concepts or complex scenarios."
}
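
Because the dictionary is keyed by the strings "1" to "5", looking up any other difficulty raises a bare `KeyError`. A guarded lookup could look roughly like this; `difficulty_guidance` is a hypothetical helper, and `example_ratings` is an abbreviated stand-in for `DIFFICULTY_RATINGS` so the sketch runs on its own.

```python
def difficulty_guidance(target_difficulty: int, ratings: dict[str, str]) -> str:
    """Return the guidance text for a difficulty level, with a clear error if it is out of range."""
    key = str(target_difficulty)
    if key not in ratings:
        valid = ", ".join(sorted(ratings))
        raise ValueError(f"target_difficulty must be one of {valid}; got {target_difficulty!r}")
    return ratings[key]


# Abbreviated stand-in for DIFFICULTY_RATINGS (hypothetical values).
example_ratings = {"1": "Easy.", "3": "Moderate.", "5": "Very hard."}
print(difficulty_guidance(3, example_ratings))  # Moderate.
```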

In [148]:
class Option(BaseModel):
    text: str = Field(description="The text of the option.")
    correct: bool = Field(
        description="Whether or not this option is the correct answer."
    )
    explanation: str = Field(
        description="An explanation for the option - either why it is incorrect or why it is correct."
    )

class Question(BaseModel):
    conceptsTested: list[str] = Field(
        description="The key concept or concepts for the tutoring subject that the question is testing."
    )
    question: str = Field(description="The question to ask.")
    answer: Option = Field(description="The correct answer to the question.")
    distractors: list[Option] = Field(description="The distractors for the question.")


def shuffledQuestion(question) -> list[Option]:
    """Combines the correct answer with the distractors and returns a shuffled list of options."""
    options = [question.answer, *question.distractors]
    random.shuffle(options)
    return options


def question_to_string(question, show_correct_option=True, show_explanations=True) -> str:
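    """
    Converts the question and its options with explanations into a string representation.
    :param question: Question, the question object containing the question and options
    :param show_correct_option: bool, whether to mark each option with a tick or cross
    :param show_explanations: bool, whether to include each option's explanation
    """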

    str_rep = question.question + "\n"
    for letter, option in zip(["A", "B", "C", "D"], shuffledQuestion(question)):
        tick_or_cross = "\u2705" if option.correct else "\u274C"
        str_rep += f"\n{letter}. {option.text} {tick_or_cross if show_correct_option else ''}"
        if show_explanations:
            str_rep += f"\n{option.explanation}\n"
    str_rep += "\n"
    return str_rep
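
`question_to_string` marks each option from its own `correct` flag, not from whether it was returned as `answer` or as a distractor, so a generated question whose flags disagree with those roles would render with no tick or with several. One way to catch that before printing is sketched below. `option_flag_problems` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins with the same attribute names as `Question` and `Option` so it runs without pydantic.

```python
from types import SimpleNamespace


def option_flag_problems(question) -> list[str]:
    """Return inconsistencies between the answer/distractor roles and the `correct` flags."""
    problems = []
    if not question.answer.correct:
        problems.append("answer is not flagged as correct")
    for i, distractor in enumerate(question.distractors, start=1):
        if distractor.correct:
            problems.append(f"distractor {i} is flagged as correct")
    return problems


# Stand-in objects with the same attribute names as Question and Option.
demo = SimpleNamespace(
    answer=SimpleNamespace(text="H2O", correct=True),
    distractors=[
        SimpleNamespace(text="CO2", correct=False),
        SimpleNamespace(text="O2", correct=True),  # deliberately inconsistent
    ],
)
print(option_flag_problems(demo))  # ['distractor 2 is flagged as correct']
```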


In [149]:
# Generate question prompt function
@ell.complex(model="gpt-4o-2024-08-06", response_format=Question)
def generate_mc_question(target_difficulty: int = 3, concepts: list[str] | None = None):
    """You are a dedicated Chemistry Tutor specializing in Advanced General Chemistry, using "Principles of Modern Chemistry" by Oxtoby, Gillis and Butler.
    """
    concepts_str = f"of {', '.join(concepts)}" if concepts else "in the chapters"
    return f"Generate a multiple choice question suitable for an undergraduate mid-term exam based on chapters 1-11 from the textbook. The question should test the key concepts {concepts_str} and follow best practices for question and option writing. {DIFFICULTY_RATINGS[str(target_difficulty)]}"

def create_new_question(target_difficulty: int = 3, concepts: list[str] | None = None):
    """
    Creates a new question with the given target difficulty and concepts.
    :param target_difficulty: int, the target difficulty of the question
    :param concepts: list[str], the concepts that the question should test
    """
    response = generate_mc_question(target_difficulty=target_difficulty, concepts=concepts)
    question = response.parsed  # type: ignore
    return question
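
To produce a whole exam rather than a single question, the generator can be called once per target difficulty. The sketch below is an assumption about how that might be wired up, not code from the notebook: `build_exam` is a hypothetical helper that takes the generator as a parameter, and `fake_question` is a stub so the example runs without any model call.

```python
from typing import Any, Callable


def build_exam(plan: list[int], make_question: Callable[..., Any]) -> list[Any]:
    """Call make_question once per target difficulty, keeping the order of the plan."""
    return [make_question(target_difficulty=level) for level in plan]


# Stub generator so the sketch runs without an API call.
def fake_question(target_difficulty: int = 3, concepts=None) -> str:
    return f"question at difficulty {target_difficulty}"


exam = build_exam([1, 3, 5], fake_question)
print(exam)
```

In the notebook, passing `create_new_question` in place of the stub would issue one model call per entry in the plan.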


In [150]:
# Create a multiple-choice question
question = create_new_question(target_difficulty=4, concepts=None)

In [151]:
print(question_to_string(question, show_correct_option=True, show_explanations=True))

Consider a reversible reaction with the following equilibrium constant expression at a given temperature: K_eq = \frac{[C]^c[D]^d}{[A]^a[B]^b}. Which of the following statements is true regarding the effect of temperature on the equilibrium position if the reaction is exothermic?

A. The equilibrium position will shift towards the reactants as temperature increases. ✅
For an exothermic reaction, increasing the temperature shifts the equilibrium position towards the reactants according to Le Chatelier's principle, since the system compensates for the added heat by favoring the endothermic reverse reaction.

B. The equilibrium constant K_eq remains unchanged with temperature changes. ❌
The equilibrium constant depends on temperature. For an exothermic reaction, K_eq decreases with increasing temperature.

C. The equilibrium constant K_eq will increase as temperature increases. ❌
For an exothermic reaction, the equilibrium constant K_eq actually decreases with an increase in temperature b

In [152]:
# Function to review correctness of a question and evaluate whether it is a good question

class QuestionReview(BaseModel):
    correct: bool = Field(description="Whether the question is correct.")
    difficulty: int = Field(description="The difficulty of the question where 1 is easy, 2-3 is moderate, 4 is difficult and 5 is very difficult.")
    feedback: str = Field(description="Feedback on the question.")
    rating: int = Field(description="The rating of the question, from 1 to 5 where 1 is poor and 5 is excellent.")

@ell.complex(model="gpt-4o-2024-08-06", response_format=QuestionReview)
def generate_review_mc_question(question: Question):
    """You are a dedicated Chemistry Tutor specializing in Advanced General Chemistry, using "Principles of Modern Chemistry" by Oxtoby, Gillis and Butler.
    """
    return f"""
    The following multiple choice question is based on the key concepts in chapters 1-11 of the textbook "Principles of Modern Chemistry" by Oxtoby, Gillis and Butler.

    {question_to_string(question, show_correct_option=True, show_explanations=True)}

    Review this multiple choice question and provide feedback. In your review, do the following:
    - Indicate whether the answer provided for the question is in fact correct.
    - Rate the difficulty of the question on a scale of 1 to 5, where 1 is easy and 5 is very difficult using the following guidelines:
        - "1" Easy:
            - Tests basic recall of facts or definitions.
            - Requires minimal cognitive effort.
            - Candidates can answer quickly without much analysis.
        - "2" Below Moderate:
            - Involves basic understanding or straightforward application of concepts.
            - May require recognition of patterns or simple reasoning.
        - "3" Moderate:
            - Requires applying concepts in new or slightly varied contexts.
            - Involves some critical thinking and deeper understanding.
            - Slightly more complex than a simple recall.
        - "4" Hard:
            - Requires significant analysis, problem-solving, or synthesis.
            - May involve multiple concepts or require candidates to infer information.
            - Challenging for most candidates, requiring both time and thought.
        - "5" Very Hard:
            - Requires advanced critical thinking, evaluation, or comprehensive understanding.
            - May involve abstract concepts or complex scenarios.
            - Likely to challenge even the top-performing candidates.
    - Provide feedback on the clarity, focus, relevance, complexity, and balance of the question and options, and rate the question on a scale of 1 to 5, where 1 is poor and 5 is excellent, using the following guidelines:
        - "1" Poor:
            - The question is unclear or ambiguous, making it difficult for students to understand.
            - The question is irrelevant or does not assess an important learning objective.
            - Distractors are implausible or obviously incorrect.
            - The options are unbalanced (e.g., the correct answer stands out due to length or wording).
            - The question may have more than one plausible answer or is biased.
        - "2" Below Average:
            - The question is somewhat clear but may contain minor ambiguities.
            - The question relates to the content but does not target key concepts or skills.
            - Distractors are weak but somewhat plausible.
            - Some imbalance in options (e.g., the correct answer is slightly longer or more detailed).
            - The question is somewhat fair but might confuse less knowledgeable students.
        - "3" Average:
            - The question is generally clear but could be improved for better understanding.
            - The question is relevant to learning objectives but might be too broad or narrow.
            - Distractors are reasonably plausible but not challenging enough.
            - The options are mostly balanced, with minor improvements needed.
            - The question is fair, with a clear correct answer and minimal bias.
        - "4" Good:
            - The question is clear and concise with no ambiguities.
            - The question effectively targets key learning objectives or skills.
            - Distractors are plausible and make the question moderately challenging.
            - The options are well-balanced, with no standout option based on length or structure.
            - The question is fair and unbiased, with a well-defined correct answer.
        - "5" Excellent:
            - The question is exceptionally clear and concise, easy to understand with no ambiguities.
            - The question is highly relevant, focusing directly on key learning objectives.
            - Distractors are carefully crafted, highly plausible, and challenging for all students.
            - The options are perfectly balanced in length, structure, and content.
            - The question is completely fair, with no bias or confusion, and has a single, well-defined correct answer.
    """

def review_question(question):
    response = generate_review_mc_question(question)
    return response.parsed # type: ignore

def review_to_string(review):
    return f"Correct: {review.correct}\nDifficulty: {review.difficulty}\nRating: {review.rating}\nFeedback:\n{review.feedback}"

In [153]:
review = review_question(question)

In [154]:
print(review_to_string(review))

Correct: True
Difficulty: 2
Rating: 4
Feedback:
The question is clear and tests the basic understanding of the effect of temperature on an equilibrium involving an exothermic reaction, which is an important concept in chemistry. The distractors are plausible and require the students to apply their knowledge of Le Chatelier's principle correctly, making the task more than simple recall. However, it primarily relies on basic understanding rather than deeper analysis or multi-step reasoning. The question is relevant, the options are well-balanced without any obvious gives, and it aligns well with key learning objectives regarding chemical equilibria and thermodynamics.
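
In this run the question was generated with `target_difficulty=4`, but the review rated its difficulty as 2. A simple acceptance check on the review could flag that kind of mismatch. This is a sketch under stated assumptions: `review_accepts` is a hypothetical helper, and the `min_rating` and `max_gap` thresholds are illustrative choices, not values from the notebook.

```python
from types import SimpleNamespace


def review_accepts(review, target_difficulty: int, min_rating: int = 4, max_gap: int = 1) -> bool:
    """Accept a question only if it is correct, rated well, and close to the target difficulty."""
    return (
        review.correct
        and review.rating >= min_rating
        and abs(review.difficulty - target_difficulty) <= max_gap
    )


# Values taken from the review printed above; the target difficulty was 4.
demo_review = SimpleNamespace(correct=True, difficulty=2, rating=4)
print(review_accepts(demo_review, target_difficulty=4))  # False: the difficulty gap is 2
```

A rejected question could then be regenerated, with the review feedback included in the next prompt if desired.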
