## 1. API and Other Setup
Here I use the Meta Llama 3 "llama3-8b-8192" model, available through Groq.
The task has two steps. In the first step, the LLM generates a quiz of multiple-choice questions on an area of statistics. The user specifies the number of questions, the area, and the grade level (e.g., "middle school"), and can also provide source text for the questions to draw on. In the second step, the LLM evaluates the complexity of the quiz and adjusts it if it is too easy or too difficult for that grade level.

In [279]:
# import getpass
import os
import json
import pandas as pd

# os.environ["OPENAI_API_KEY"] = getpass.getpass()

from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.
KEY=os.getenv("GROQ_API_KEY")


In [280]:
from operator import itemgetter
from langchain_core.prompts import PromptTemplate
# from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
# !pip install -qU langchain-groq
from langchain_groq import ChatGroq


model = ChatGroq(groq_api_key=KEY, model="llama3-8b-8192")
# parser = StrOutputParser()
# model_parser = model | parser

In [281]:
RESPONSE_JSON = {
    "1": {
        "STATSQA": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "2": {
        "STATSQA": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "3": {
        "STATSQA": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
}
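
The prompt below asks for `{number}` items, while the hand-written template shows exactly three. As a minimal sketch (the helper name `make_response_template` is hypothetical and not part of the original notebook), the same structure could be generated for any number of questions:

```python
import json


def make_response_template(n_questions, option_labels=("a", "b", "c", "d")):
    """Build a RESPONSE_JSON-style dict with n_questions placeholder items.

    Hypothetical helper: it mirrors the hand-written RESPONSE_JSON above.
    """
    return {
        str(i): {
            "STATSQA": "multiple choice question",
            "options": {label: "choice here" for label in option_labels},
            "correct": "correct answer",
        }
        for i in range(1, n_questions + 1)
    }


# Three items reproduce the structure of the hand-written template.
template = make_response_template(3)
print(json.dumps(template, indent=2))
```

Whether a longer template helps the model return the requested number of questions is not something this notebook tests; the sketch only removes the copy-and-paste.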


In [299]:
TEMPLATE="""
Text:{text}
You are an expert STATSQA maker. Given the above text, it is your job to \
create a quiz of {number} multiple choice questions related to {area} for students in {grade}. 
Make sure the questions are not repeated. Your response should be formatted like RESPONSE_JSON below with {number} of items. \
Ensure to generate {number} questions.
### RESPONSE_JSON
{response_json} \

"""

In [300]:
TEMPLATE2="""
You are an expert in English grammar. \
You are given a STATSQA quiz: {quiz} of multiple choice questions related to {area} in statistics. \
You need to evaluate the complexity of the quiz. Use at most 50 words for complexity analysis. \
If the quiz is too easy or too difficult for students in {grade}, 
update the quiz questions to make it more suitable for the students in {grade}.

Check from an expert English Writer of the above quiz:
"""

## 2. Use LangChain Expression Language (LCEL)

In [301]:
quiz_generation_prompt = PromptTemplate(
    template=TEMPLATE, 
    input_variables=["text", "number", "area", "grade", "response_json"]
)
# quiz_generation_prompt = PromptTemplate.from_template(
#     template=TEMPLATE
# )
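
The `input_variables` list has to match the `{placeholders}` in the template string. A small sketch of how that can be checked with the standard library alone, assuming the template uses `str.format`-style placeholders (which is my understanding of `PromptTemplate`'s default f-string format). `DEMO_TEMPLATE` is a shortened stand-in for `TEMPLATE`, and `placeholders` is a hypothetical helper:

```python
from string import Formatter


def placeholders(template):
    """Return the distinct {name} fields of a format-style template, in order."""
    seen = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in seen:
            seen.append(field_name)
    return seen


# Shortened stand-in for TEMPLATE (assumption: same placeholder names).
DEMO_TEMPLATE = (
    "Text:{text}\n"
    "Create a quiz of {number} multiple choice questions related to {area} "
    "for students in {grade}. Ensure to generate {number} questions.\n"
    "### RESPONSE_JSON\n"
    "{response_json}\n"
)

fields = placeholders(DEMO_TEMPLATE)
print(fields)

# Braces inside a substituted value (the JSON string) are left untouched.
preview = DEMO_TEMPLATE.format(
    text="sample source text",
    number=2,
    area="estimation",
    grade="middle school",
    response_json='{"1": {"STATSQA": "multiple choice question"}}',
)
print(preview)
```

Printing the filled-in prompt before sending it to the model is a cheap way to spot a missing or misspelled variable.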

In [302]:
quiz_evaluation_prompt = PromptTemplate.from_template(TEMPLATE2)

In [303]:
quiz_generation_chain= quiz_generation_prompt | model

In [304]:
quiz_evaluation_chain = quiz_evaluation_prompt | model

In [305]:
dirname = os.getcwd()
file_path=os.path.join(dirname, "experiment", "data.txt")
with open(file_path, 'r', encoding="utf-8") as file:
    TEXT = file.read()

In [306]:
NUMBER=6
AERA ="estimation"
GRADE="middle school"

In [290]:
# quiz_generation_chain.invoke(
#     {
#         "text": TEXT,
#         "number": NUMBER,
#         "area": AERA,
#         "grade": GRADE,
#         "response_json": json.dumps(RESPONSE_JSON),
#     }
# )

In [291]:
# quiz_evaluation_chain.invoke(
#     {
#         "area": "point estimation",
#         "grade": "high school",
#         "quiz": quiz_generation_chain
#     }
# )

In [308]:
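# Assumed definition: the cell that builds complete_chain is missing from this
# export. The result below carries the inputs plus "quiz" and "eval" keys, which
# is consistent with chaining two RunnablePassthrough.assign steps.
complete_chain = (
    RunnablePassthrough.assign(quiz=quiz_generation_chain)
    | RunnablePassthrough.assign(eval=quiz_evaluation_chain)
)

all_result = complete_chain.invoke(
    {
        "text": TEXT,
        "number": NUMBER,
        "area": AERA,
        "grade": GRADE,
        "response_json": json.dumps(RESPONSE_JSON)
    }
)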

In [309]:
all_result

{'text': 'Estimation statistics, or simply estimation, is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data and interpret results.[1] It complements hypothesis testing approaches such as null hypothesis significance testing (NHST), by going beyond the question is an effect present or not, and provides information about how large an effect is.[2][3] Estimation statistics is sometimes referred to as the new statistics.[3][4][5]\n\nThe primary aim of estimation methods is to report an effect size (a point estimate) along with its confidence interval, the latter of which is related to the precision of the estimate.[6] The confidence interval summarizes a range of likely values of the underlying population effect. Proponents of estimation see reporting a P value as an unhelpful distraction from the important business of reporting an effect size with its confidence intervals,[7] and

In [310]:
all_result.get("eval")

AIMessage(content="I'd be happy to evaluate the complexity of the quiz!\n\nThe quiz appears to be moderately complex, focusing on estimation statistics and its applications. The language used is clear and concise, making it accessible to students with a basic understanding of statistics.\n\nHowever, some questions may be challenging for middle school students, particularly questions 4 and 5, which require a deeper understanding of statistical concepts and terminology.\n\nTo make the quiz more suitable for middle school students, I would recommend rephrasing questions 4 and 5 to make them more concrete and relatable. For example:\n\n* Question 4: What is a new way to think about statistics that focuses on estimating effects rather than testing hypotheses? (Answer: Estimation statistics)\n* Question 5: Why do some statisticians prefer to report an estimate of an effect rather than just saying whether it's statistically significant? (Answer: Because it provides more information about the 

Extract the JSON string from the model's response stored under "quiz"

In [311]:
import json

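def extract_between_braces(s):
    """Return the substring from the first '{' to the last '}', inclusive.

    Returns an empty string if no such pair of braces is found.
    """
    start = s.find('{')
    end = s.rfind('}')
    if start != -1 and end > start:
        return s[start:end + 1]
    return ""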

quiz_string = all_result.get("quiz").content
quiz_string = extract_between_braces(quiz_string)
print(quiz_string)


{"1": {"STATSQA": "What is estimation statistics used for?", "options": {"a": "To only ask if an effect is present or not", "b": "To plan experiments, analyze data, and interpret results", "c": "To only report P values", "d": "To only calculate confidence intervals"}, "correct": "b"},
"2": {"STATSQA": "What is the primary aim of estimation methods?", "options": {"a": "To only report P values", "b": "To only calculate confidence intervals", "c": "To report an effect size along with its confidence interval", "d": "To only plan experiments"}, "correct": "c"},
"3": {"STATSQA": "What does a confidence interval summarize?", "options": {"a": "A range of likely values of the underlying population effect", "b": "A measure of the precision of the estimate", "c": "A point estimate of the effect size", "d": "A null hypothesis"}, "correct": "a"},
"4": {"STATSQA": "What is sometimes referred to as the new statistics?", "options": {"a": "Hypothesis testing", "b": "Null hypothesis significance testing

In [313]:
quiz = json.loads(quiz_string)
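
`json.loads` only guarantees valid JSON, not that the model followed the requested structure. The following is a minimal sketch of a structural check, under the assumption that each item should carry the `STATSQA`, `options`, and `correct` keys from `RESPONSE_JSON`. The function `validate_quiz` and the sample quiz are illustrative and not part of the original notebook:

```python
import json

REQUIRED_KEYS = {"STATSQA", "options", "correct"}


def validate_quiz(quiz, expected_count=None):
    """Return a list of problems found in a parsed quiz dict (empty if none)."""
    problems = []
    if expected_count is not None and len(quiz) != expected_count:
        problems.append(f"expected {expected_count} questions, got {len(quiz)}")
    for key, item in quiz.items():
        if not isinstance(item, dict):
            problems.append(f"item {key}: not an object")
            continue
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"item {key}: missing {sorted(missing)}")
            continue
        if not isinstance(item["options"], dict):
            problems.append(f"item {key}: 'options' is not an object")
        elif item["correct"] not in item["options"]:
            problems.append(f"item {key}: 'correct' is not one of the option labels")
    return problems


# Illustrative sample in the same shape as the model output.
sample = json.loads(
    '{"1": {"STATSQA": "What does a confidence interval summarize?", '
    '"options": {"a": "A range of likely values", "b": "A null hypothesis"}, '
    '"correct": "a"}}'
)
print(validate_quiz(sample, expected_count=1))
```

Passing `expected_count=NUMBER` would also flag a response with fewer questions than requested.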

In [274]:
quiz

{'1': {'STATSQA': 'What is the purpose of estimation in statistics?',
  'options': {'a': 'To make precise predictions',
   'b': 'To make educated guesses',
   'c': 'To analyze data',
   'd': 'To create graphs'},
  'correct': 'b'},
 '2': {'STATSQA': 'Why is estimation important in real-life situations?',
  'options': {'a': "Because it's always accurate",
   'b': 'Because it helps us make decisions',
   'c': "Because it's always wrong",
   'd': "Because it's always boring"},
  'correct': 'b'},
 '3': {'STATSQA': 'What type of data is typically used for estimation?',
  'options': {'a': 'Qualitative data',
   'b': 'Quantitative data',
   'c': 'Categorical data',
   'd': 'Attribute data'},
  'correct': 'b'},
 '4': {'STATSQA': 'What is an example of estimation in everyday life?',
  'options': {'a': 'Predicting the weather',
   'b': 'Estimating the cost of groceries',
   'c': 'Counting the number of people in a room',
   'd': 'Measuring the distance between two cities'},
  'correct': 'b'},
 '5

In [314]:
# Flatten each quiz item into one row: question text, joined choices, answer key.
quiz_table_data = []
for key, value in quiz.items():
    question = value["STATSQA"]
    options = " | ".join(
        f"{option}: {option_value}"
        for option, option_value in value["options"].items()
    )
    correct = value["correct"]
    quiz_table_data.append({"STATSQA": question, "Choices": options, "Correct": correct})

In [315]:
quiz=pd.DataFrame(quiz_table_data)
quiz

Unnamed: 0,STATSQA,Choices,Correct
0,What is estimation statistics used for?,a: To only ask if an effect is present or not ...,b
1,What is the primary aim of estimation methods?,a: To only report P values | b: To only calcul...,c
2,What does a confidence interval summarize?,a: A range of likely values of the underlying ...,a
3,What is sometimes referred to as the new stati...,a: Hypothesis testing | b: Null hypothesis sig...,c
4,Why do proponents of estimation see reporting ...,a: Because it is difficult to understand | b: ...,b
5,What do proponents of estimation believe shoul...,a: Hypothesis testing | b: Estimation statisti...,b


In [316]:
quiz.to_csv(f"quiz_on_{AERA}.csv",index=False)