## 1. API and Other Setup
Here I use the Meta Llama 3 model "llama3-8b-8192", available through Groq.
The task has two steps. In the first step, the LLM generates a quiz of multiple-choice questions on a chosen area of statistics. The user specifies the number of questions, the area, and the grade level (e.g., "middle school"), and can also provide a source text for the questions to draw on. In the second step, the LLM evaluates the quiz.

In [1]:
# import getpass
import os
import json
import pandas as pd

# os.environ["OPENAI_API_KEY"] = getpass.getpass()

from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.
KEY=os.getenv("GROQ_API_KEY")
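
`os.getenv` returns `None` when the variable is missing, and the error then only surfaces later when the model is called. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it first.")
    return value


# Possible usage, assuming load_dotenv() has already run:
# KEY = require_env("GROQ_API_KEY")
```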


In [2]:
from operator import itemgetter
from langchain_core.prompts import PromptTemplate
# from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
# !pip install -qU langchain-groq
from langchain_groq import ChatGroq


model = ChatGroq(groq_api_key= KEY, model="llama3-8b-8192")
# parser = StrOutputParser()
# model_parser = model | parser

In [3]:
RESPONSE_JSON = {
    "1": {
        "STATSQA": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "2": {
        "STATSQA": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
    "3": {
        "STATSQA": "multiple choice question",
        "options": {
            "a": "choice here",
            "b": "choice here",
            "c": "choice here",
            "d": "choice here",
        },
        "correct": "correct answer",
    },
}
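
The three entries above are identical placeholders, so the same structure could also be generated for any number of questions. The sketch below assumes that approach; `make_response_template` is a hypothetical helper, not something the notebook defines.

```python
import copy


def make_response_template(n: int) -> dict:
    """Build a placeholder quiz with `n` questions keyed "1" through str(n)."""
    question = {
        "STATSQA": "multiple choice question",
        "options": {letter: "choice here" for letter in "abcd"},
        "correct": "correct answer",
    }
    # deepcopy gives every entry its own nested dicts instead of shared references.
    return {str(i): copy.deepcopy(question) for i in range(1, n + 1)}


template = make_response_template(3)
print(list(template))
```

With `n=3` this reproduces the shape of `RESPONSE_JSON`, which keeps the example in the prompt the same size as the requested quiz if desired.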


In [4]:
TEMPLATE="""
Text:{text}
You are an expert STATSQA maker. Given the above text, it is your job to \
create a quiz of {number} multiple choice questions related to {area} for students in {grade}. 
Make sure the questions are not repeated. Your response should be formatted like RESPONSE_JSON below with {number} items. \
Make sure to generate {number} questions.
### RESPONSE_JSON
{response_json} \

"""

In [5]:
TEMPLATE2="""
You are an expert in English grammar. \
You are given a STATSQA quiz: {quiz} of multiple choice questions related to {area} in statistics. \
You need to evaluate the complexity of the quiz. Use at most 50 words for complexity analysis. \
If the quiz is too easy or too difficult for students in {grade}, 
update the quiz questions to make it more suitable for the students in {grade}.

Check from an expert English Writer of the above quiz:
"""

## 2. Use LangChain Expression Language (LCEL)

In [6]:
quiz_generation_prompt = PromptTemplate(
    template=TEMPLATE, 
    input_variables=["text", "number", "area", "grade", "response_json"]
)
# quiz_generation_prompt = PromptTemplate.from_template(
#     template=TEMPLATE
# )

In [7]:
quiz_evaluation_prompt = PromptTemplate.from_template(TEMPLATE2)

In [8]:
quiz_generation_chain= quiz_generation_prompt | model

In [9]:
quiz_evaluation_chain = quiz_evaluation_prompt | model

In [10]:
dirname = os.getcwd()
file_path=os.path.join(dirname, "..", "data.txt")
with open(file_path, 'r') as file:
    TEXT = file.read()

In [11]:
NUMBER=5
AERA ="estimation"
GRADE="middle school"

In [12]:
# quiz_generation_chain.invoke(
#     {
#         "text": TEXT,
#         "number": NUMBER,
#         "area": AERA,
#         "grade": GRADE,
#         "response_json": json.dumps(RESPONSE_JSON),
#     }
# )

In [13]:
complete_chain = ({
    "text": itemgetter("text"),
    "number": itemgetter("number"),
    "area": itemgetter("area"),
    "grade": itemgetter("grade"),
    "response_json": itemgetter("response_json"),
    "quiz": quiz_generation_chain
    }
    | RunnablePassthrough.assign(eval=quiz_evaluation_chain)
)
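
To make the data flow of `complete_chain` easier to follow, here is a plain-Python sketch that mimics it without LangChain or a real model. The functions `fake_generate`, `fake_evaluate`, and `run_pipeline` are hypothetical stand-ins: the dict step computes every value from the same input, and the `assign` step keeps all existing keys and adds `eval`.

```python
def fake_generate(inputs: dict) -> str:
    """Stand-in for quiz_generation_chain; returns a dummy quiz string."""
    return f"quiz with {inputs['number']} questions on {inputs['area']}"


def fake_evaluate(inputs: dict) -> str:
    """Stand-in for quiz_evaluation_chain; it sees the quiz produced in step 1."""
    return f"evaluation of [{inputs['quiz']}] for {inputs['grade']}"


def run_pipeline(inputs: dict) -> dict:
    # Step 1: like the dict of runnables, each value is derived from the same input.
    passthrough_keys = ("text", "number", "area", "grade", "response_json")
    step1 = {key: inputs[key] for key in passthrough_keys}
    step1["quiz"] = fake_generate(inputs)
    # Step 2: like RunnablePassthrough.assign, keep every key and add "eval".
    return {**step1, "eval": fake_evaluate(step1)}


result = run_pipeline(
    {"text": "t", "number": 5, "area": "estimation",
     "grade": "middle school", "response_json": "{}"}
)
print(sorted(result))
```

The final dictionary therefore carries the original inputs, the generated quiz, and the evaluation side by side.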

In [14]:
all_result = complete_chain.invoke(
    {
        "text": TEXT,
        "number": NUMBER,
        "area": AERA,
        "grade": GRADE,
        "response_json": json.dumps(RESPONSE_JSON)
    }
)

In [15]:
# quiz_evaluation_chain.invoke(
#     {
#         "area": "point estimation",
#         "grade": "high school",
#         "quiz": quiz_generation_chain
#     }
# )

In [16]:
all_result

{'text': 'Estimation statistics, or simply estimation, is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data and interpret results.[1] It complements hypothesis testing approaches such as null hypothesis significance testing (NHST), by going beyond the question is an effect present or not, and provides information about how large an effect is.[2][3] Estimation statistics is sometimes referred to as the new statistics.[3][4][5]\n\nThe primary aim of estimation methods is to report an effect size (a point estimate) along with its confidence interval, the latter of which is related to the precision of the estimate.[6] The confidence interval summarizes a range of likely values of the underlying population effect. Proponents of estimation see reporting a P value as an unhelpful distraction from the important business of reporting an effect size with its confidence intervals,[7] and

In [17]:
all_result.get("eval")

AIMessage(content='What a fascinating task!\n\nAfter reviewing the quiz, I\'d rate its complexity as moderate to challenging, primarily due to the technical vocabulary and statistical concepts involved. The questions require a good understanding of estimation methods, confidence intervals, effect sizes, and null hypotheses.\n\nTo make the quiz more suitable for middle school students, I\'d suggest simplifying the language and focusing on more concrete, relatable examples. Here are some potential revisions:\n\n1. Instead of "To report an effect size with its confidence interval," use "To find the range of possible values for a result."\n2. Replace "A range of likely values of the underlying population effect" with "The possible values that an actual value could be."\n3. For "What is sometimes referred to as the \'new statistics\'?", provide a simpler explanation, such as "A way of analyzing data that focuses on describing what happened rather than testing a hypothesis."\n4. Change "Repo

Extract the JSON string from the raw model output stored under "quiz".

In [18]:
import json

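def extract_between_braces(s):
    """Return the quiz JSON found in the model reply `s`, or "" if none is found.

    Keeps the text from the first single '{' up to (but not including) the
    last single '}' and then appends '}}'. Note: the braces in the result
    balance only if the reply is missing one closing brace at the end.
    """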
    start = s.find('{')
    # Ensure the character after the first '{' is not another '{'
    while start != -1 and start + 1 < len(s) and s[start + 1] == '{':
        start = s.find('{', start + 1)
    
    end = s.rfind('}')
    # Ensure the character before the last '}' is not another '}'
    while end != -1 and end - 1 >= 0 and s[end - 1] == '}':
        end = s.rfind('}', 0, end - 1)
    
    if start != -1 and end != -1 and end > start:
        return "{" + s[start+1:end] + "}}"
    return ""

quiz_string = all_result.get("quiz").content
quiz_string = extract_between_braces(quiz_string)
print(quiz_string)


{
"1": {
"STATSQA": "What is the main goal of estimation methods?",
"options": {
"a": "To test a null hypothesis",
"b": "To report an effect size with its confidence interval",
"c": "To compare means between groups",
"d": "To analyze data using meta-analysis"
},
"correct": "b"
},
"2": {
"STATSQA": "What does a confidence interval summarize?",
"options": {
"a": "A range of likely values of the underlying population effect",
"b": "A point estimate of the effect size",
"c": "The precision of the estimate",
"d": "The null hypothesis"
},
"correct": "a"
},
"3": {
"STATSQA": "What is sometimes referred to as the 'new statistics'?",
"options": {
"a": "Hypothesis testing",
"b": "Estimation statistics",
"c": "Data analysis",
"d": "Meta-analysis"
},
"correct": "b"
},
"4": {
"STATSQA": "What is reported along with an effect size in estimation methods?",
"options": {
"a": "A P value",
"b": "A confidence interval",
"c": "A null hypothesis",
"d": "A mean"
},
"correct": "b"
},
"5": {
"STATSQA": "What 

In [19]:
quiz = json.loads(quiz_string)
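
`json.loads` raises `json.JSONDecodeError` if the extracted string is not valid JSON, and model replies vary: some are complete, some are wrapped in extra prose, some drop a closing brace. As a hedged alternative to `extract_between_braces`, the sketch below parses the first JSON object with `json.JSONDecoder.raw_decode` and retries with a few appended braces. The function name and the `max_missing_braces` parameter are assumptions for illustration.

```python
import json


def parse_first_json_object(text: str, max_missing_braces: int = 3) -> dict:
    """Parse the first JSON object in `text`, tolerating a few missing closing braces."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no '{' found in model output")
    # Drop any prose that follows the last closing brace.
    candidate = text[start:text.rfind("}") + 1]
    decoder = json.JSONDecoder()
    for _ in range(max_missing_braces + 1):
        try:
            # raw_decode parses one JSON value and ignores anything after it.
            obj, _end = decoder.raw_decode(candidate)
            return obj
        except json.JSONDecodeError:
            candidate += "}"  # assume one more closing brace was omitted
    raise ValueError("could not parse a JSON object from model output")


complete = 'Here is the quiz: {"1": {"correct": "b"}} Enjoy!'
truncated = 'Here is the quiz: {"1": {"correct": "b"}'
print(parse_first_json_object(complete))
print(parse_first_json_object(truncated))
```

Both calls return the same dictionary, so the downstream cells would not depend on whether the model closed the outer object.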

In [20]:
quiz

{'1': {'STATSQA': 'What is the main goal of estimation methods?',
  'options': {'a': 'To test a null hypothesis',
   'b': 'To report an effect size with its confidence interval',
   'c': 'To compare means between groups',
   'd': 'To analyze data using meta-analysis'},
  'correct': 'b'},
 '2': {'STATSQA': 'What does a confidence interval summarize?',
  'options': {'a': 'A range of likely values of the underlying population effect',
   'b': 'A point estimate of the effect size',
   'c': 'The precision of the estimate',
   'd': 'The null hypothesis'},
  'correct': 'a'},
 '3': {'STATSQA': "What is sometimes referred to as the 'new statistics'?",
  'options': {'a': 'Hypothesis testing',
   'b': 'Estimation statistics',
   'c': 'Data analysis',
   'd': 'Meta-analysis'},
  'correct': 'b'},
 '4': {'STATSQA': 'What is reported along with an effect size in estimation methods?',
  'options': {'a': 'A P value',
   'b': 'A confidence interval',
   'c': 'A null hypothesis',
   'd': 'A mean'},
  'co

In [21]:
quiz_table_data = []
for key, value in quiz.items():
    STATSQA = value["STATSQA"]
    options = " | ".join(
        [
            f"{option}: {option_value}"
            for option, option_value in value["options"].items()
            ]
        )
    correct = value["correct"]
    quiz_table_data.append({"STATSQA": STATSQA, "Choices": options, "Correct": correct})
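
Because the quiz comes from a language model, it may be worth checking each question before tabulating it. The following is a minimal sketch, assuming the four-option layout of `RESPONSE_JSON`; `find_invalid_questions` is a hypothetical helper that expects the parsed quiz dictionary.

```python
def find_invalid_questions(quiz: dict, expected_options: int = 4) -> list:
    """Return the keys of questions that do not match the expected layout."""
    bad = []
    for key, question in quiz.items():
        options = question.get("options", {})
        has_text = bool(question.get("STATSQA"))
        has_all_options = len(options) == expected_options
        # The answer must be one of the option letters, e.g. "b".
        answer_is_option = question.get("correct") in options
        if not (has_text and has_all_options and answer_is_option):
            bad.append(key)
    return bad


sample = {
    "1": {"STATSQA": "Q1?", "options": {"a": "w", "b": "x", "c": "y", "d": "z"}, "correct": "b"},
    "2": {"STATSQA": "Q2?", "options": {"a": "w", "b": "x", "c": "y", "d": "z"}, "correct": "e"},
}
print(find_invalid_questions(sample))
```

In the sample, question "2" is flagged because its answer "e" is not among its option letters.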

In [22]:
quiz=pd.DataFrame(quiz_table_data)
quiz

,STATSQA,Choices,Correct
0,What is the main goal of estimation methods?,a: To test a null hypothesis | b: To report an...,b
1,What does a confidence interval summarize?,a: A range of likely values of the underlying ...,a
2,What is sometimes referred to as the 'new stat...,a: Hypothesis testing | b: Estimation statisti...,b
3,What is reported along with an effect size in ...,a: A P value | b: A confidence interval | c: A...,b
4,What does estimation statistics provide inform...,a: Whether an effect is present or not | b: Ho...,b


In [23]:
quiz.to_csv(f"quiz_on_{AERA}.csv",index=False)

