In [1]:
import textwrap
from lyon_common import create_chain, report_on_message
from question_generator_model import SingleSelection, Code, AnyQuestion, FillInBlank, MultipleSelection
from langchain.output_parsers import PydanticOutputParser
from pydantic import ValidationError
import json
from json.decoder import JSONDecodeError

**NOTE** I have updated `lyon_common.py` so that the `create_chain` function accepts a few more arguments:

- `model_name` now defaults to the GPT-3.5 Turbo snapshot from 2023-11-06.
- `model_kwargs`: a dictionary of OpenAI-specific parameters. The default value now instructs OpenAI to use the [new `json_object` response format](https://platform.openai.com/docs/guides/text-generation/json-mode), which restricts the tokens available at each step so that only valid JSON can be produced. JSON mode does not, however, guarantee that the JSON object follows any particular schema.
- `verbose`: a new keyword argument that is `False` by default.

This notebook shows how we can use these updates to generate valid, Pydantic-ready JSON.
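Because JSON mode only guarantees syntactically valid JSON, a reply such as `{"apology": ...}` satisfies it while still lacking every field the Pydantic model requires (which is what the validation errors further down show). A minimal stdlib sketch of that distinction, assuming the OpenAI setting passed through `model_kwargs` is `{"response_format": {"type": "json_object"}}`; `missing_fields` is a hypothetical helper, not part of `lyon_common`:

```python
import json

# Assumption: the OpenAI-specific kwargs that enable JSON mode (passed via `model_kwargs`)
JSON_MODE_KWARGS = {"response_format": {"type": "json_object"}}

# Required fields of MultipleSelection, as listed in the validation errors in this notebook
MULTIPLE_SELECTION_FIELDS = {"question_text", "difficulty", "topics", "choices", "solution"}


def missing_fields(raw_output: str, required: set = MULTIPLE_SELECTION_FIELDS) -> set:
    """Hypothetical helper: return the required keys absent from a JSON-mode reply.

    json.loads succeeds for any valid JSON object, so a reply can meet the
    JSON-mode guarantee and still be missing every field the schema needs.
    """
    obj = json.loads(raw_output)
    if not isinstance(obj, dict):
        raise TypeError("expected a single JSON object")
    return set(required) - set(obj)


apology = '{"apology": "I\'m sorry, but I can only generate one question at a time."}'
print(sorted(missing_fields(apology)))
```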

In [22]:
def create_system_prompt(pydantic_object):
    common_system_prompt = textwrap.dedent("""
    You are a smart, helpful teaching assistant chatbot named Callisto.

    You are an expert Python programmer and have used all the most popular
    libraries for data analysis, machine learning, and artificial intelligence.

    You assist professors that teach courses about Python, data science, and machine learning
    to graduate students.

    Your task is to help professors produce practice questions to help students solidify
    their understanding of specific topics.

    In your conversations with a professor you will be given a topic (string) and an
    expected difficulty level (integer).

    The difficulty will be a number between 1 and 3, with 1 corresponding to a request
    for an easy question, and 3 for the most difficult question.

    If the professor asks you for a new question and does not specify either a topic
    or the difficulty, you must use the topic or difficulty that the professor previously provided.

    If the professor asks for more than one question in a single message, you need to apologize,
    explain that you can only generate one question at a time, and ask the professor to
    send a new message with the topic and difficulty for each new question.

    You are encouraged to use any tools available to look up relevant information, only
    if necessary.

    Your responses must always exactly match the specified JSON format with no extra words or content.

    You must always produce exactly one JSON object.

    {format_instructions}
    """)

    # Append the JSON schema of `pydantic_object` to the prompt
    parser = PydanticOutputParser(pydantic_object=pydantic_object)
    return common_system_prompt.format(format_instructions=parser.get_format_instructions())

In [2]:
def generate_and_parse_question(pydantic_model, query):
    rag_chain = create_chain(create_system_prompt(pydantic_model), temperature=0.1, verbose=True, model_name="gpt-4-1106-preview")
    # Run the chain outside the try block so `response` is always defined in the fallbacks below
    response = rag_chain(query)
    raw_output = response["output"]

    try:
        report_on_message(response)  # print a summary of what was produced
        parser = PydanticOutputParser(pydantic_object=pydantic_model)
        return parser.parse(raw_output)
    except ValidationError as ve:
        print(f"Pydantic validation error: {ve}")
    except Exception as e:
        # Other failures, e.g. the output parser's own exception type for malformed output
        print(f"An error occurred: {e}")

    # Fallback: return the raw JSON as a dict (e.g. an {"apology": ...} reply).
    # Calling json.loads again on invalid JSON would only raise the same error,
    # so return the raw string in that case instead.
    try:
        return json.loads(raw_output)
    except JSONDecodeError as json_error:
        print(f"JSON decoding error: {json_error}")
        return raw_output
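The fallback path above can hand back either a parsed model or a plain `dict`, so callers have to inspect the type of the result. A minimal stdlib sketch of a more explicit alternative, assuming refusals arrive as a single-key object such as `{"apology": ...}` or `{"error": ...}` (the two shapes seen in the outputs below); `classify_reply` and `REFUSAL_KEYS` are hypothetical names:

```python
import json
from json.decoder import JSONDecodeError

# Keys the model used for non-question replies in this notebook (assumption: not exhaustive)
REFUSAL_KEYS = ("apology", "error")


def classify_reply(raw_output: str):
    """Hypothetical helper: label a chain reply as 'question', 'refusal', or 'invalid'.

    Returns a (kind, payload) tuple so the caller can branch on `kind`
    instead of checking whether it received a model, a dict, or a string.
    """
    try:
        obj = json.loads(raw_output)
    except JSONDecodeError:
        return "invalid", raw_output
    if not isinstance(obj, dict):
        return "invalid", raw_output
    for key in REFUSAL_KEYS:
        if key in obj and "question_text" not in obj:
            return "refusal", obj[key]
    return "question", obj


kind, payload = classify_reply('{"error": "I need a specific topic."}')
print(kind, "->", payload)
```

Only replies classified as `"question"` would then be handed to `PydanticOutputParser`.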

In [17]:
generate_and_parse_question(MultipleSelection, "pandas groupby with difficulty 2")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_course_content` with `pandas groupby`


[0m[36;1m[1;3m[Document(page_content="We're going to use this example data set to demonstrate the three steps in split apply combine. To begin, we'll start with the split step. In order to ask pandas to split the data for us, we use the group by method of a data frame. You see here that we're calling DF dot group by and we're passing the string A. This instructs pandas to construct groups of our data using the values from the A column. This is the most basic and often most used form of the group by method to split on the values of a single column. We can check the type of this GBA object. And we see here a very long type name but we're just going to refer to this as a group by for short. Once we have a group by object, there are a few things we can do with it. One thing we could do is we could ask to get the subset of data for a particular group. So here we're goin

Given a DataFrame `df` with columns `['year', 'month', 'sales']`, which of the following code snippets correctly uses the `groupby` method to calculate the total sales per year?

- [x] df.groupby('year')['sales'].sum()
- [x] df['sales'].groupby(df['year']).sum()
- [ ] df.groupby(['year', 'month']).sum()['sales']
- [ ] df.groupby('year').sum()


In [18]:
generate_and_parse_question(MultipleSelection, "give me 2 more questions on the same topic")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "apology": "I'm sorry, but I can only generate one question at a time. Please send a new message with the topic and difficulty level for each question you would like me to generate."
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "apology": "I'm sorry, but I can only generate one question at a time. Please send a new message with the topic and difficulty level for each question you would like me to generate."
}





ValidationError: 5 validation errors for MultipleSelection
question_text
  Field required [type=missing, input_value={'apology': "I'm sorry, b...d like me to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
difficulty
  Field required [type=missing, input_value={'apology': "I'm sorry, b...d like me to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
topics
  Field required [type=missing, input_value={'apology': "I'm sorry, b...d like me to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
choices
  Field required [type=missing, input_value={'apology': "I'm sorry, b...d like me to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
solution
  Field required [type=missing, input_value={'apology': "I'm sorry, b...d like me to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing

In [19]:
generate_and_parse_question(MultipleSelection, "Make the previous question more difficult")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_course_content` with `optimization`


[0m[36;1m[1;3m[Document(page_content='And this exercise that we just performed is kind of the cracks of doing computational social science. It included three main components. One was the data. That was this blue line that was the actual or real world data that we were trying to match and we were trying to have our model track. The second main component was the statistical model. It was the hypothesis that the model parameters alpha and beta shifted from their steady state values to alpha-hat for finite number of periods. We had a way to toggle these parameters and adjust them and that led to different statistical models that we had a direct solution to. A direct problem solution to and we were able to produce these simulations. So feature number one was the data. Feature number two was the model and then feature number three was our prior beliefs about which levels o

Given a statistical model where parameters alpha and beta shift from their steady state values to alpha-hat for a finite number of periods, which of the following could be a consequence of setting the learning rate for these parameters too high during optimization?

- [ ] The model quickly converges to the optimal parameters
- [x] The model parameters oscillate and may diverge
- [ ] The learning process is stable but converges slowly
- [x] The model overfits to the training data and fails to generalize


The previous topic (pandas `groupby`) is not being used here: the agent searched the course content for `optimization` instead.
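`generate_and_parse_question` builds a fresh chain with `create_chain` on every call. If that chain keeps no conversation memory between calls (an assumption, since `lyon_common` is not shown here), a follow-up like "Make the previous question more difficult" arrives without the earlier topic, which would be consistent with the search for `optimization`. One workaround is to resolve follow-ups on the client side. A minimal sketch with a hypothetical `QueryContext` class that remembers the last topic and difficulty and rewrites follow-ups into explicit requests:

```python
import re
from dataclasses import dataclass
from typing import Optional

# "pandas groupby with difficulty 2" / "Pandas groupby difficulty 3": topic plus difficulty
EXPLICIT = re.compile(r"^(?P<topic>.+?)\s+(?:with\s+)?difficulty\s+(?P<level>[123])\s*$", re.IGNORECASE)
# A bare difficulty mention inside a follow-up, e.g. "Give a difficulty 2 question on the same"
LEVEL = re.compile(r"difficulty\s+([123])", re.IGNORECASE)


@dataclass
class QueryContext:
    """Hypothetical helper that remembers the last topic and difficulty between calls."""
    topic: Optional[str] = None
    difficulty: Optional[int] = None

    def resolve(self, query: str) -> str:
        text = query.strip()
        explicit = EXPLICIT.match(text)
        level = LEVEL.search(text)
        if explicit:
            # Explicit request: remember it for later follow-ups
            self.topic = explicit.group("topic")
            self.difficulty = int(explicit.group("level"))
        elif self.topic is None:
            return query  # nothing remembered yet, so pass the query through unchanged
        elif level:
            self.difficulty = int(level.group(1))
        elif "more difficult" in text.lower():
            self.difficulty = min(3, self.difficulty + 1)
        elif "easier" in text.lower():
            self.difficulty = max(1, self.difficulty - 1)
        # Every resolved query names the topic and difficulty explicitly
        return f"{self.topic} with difficulty {self.difficulty}"


ctx = QueryContext()
print(ctx.resolve("pandas groupby with difficulty 2"))           # pandas groupby with difficulty 2
print(ctx.resolve("Make the previous question more difficult"))  # pandas groupby with difficulty 3
```

The resolved string could then be passed as the `query` argument of `generate_and_parse_question`. Keyword matching like this is brittle; it only illustrates the idea of carrying context explicitly.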

In [20]:
generate_and_parse_question(MultipleSelection, "pandas groupby with difficulty 2")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "question_text": "Given a DataFrame `df` with columns `['year', 'category', 'value']`, which of the following code snippets correctly groups the data by `year` and `category` and computes the sum of `value` for each group?",
  "difficulty": 2,
  "topics": ["pandas", "groupby", "data analysis"],
  "choices": [
    "df.groupby(['year', 'category']).sum()",
    "df.groupby(['year', 'category'])['value'].aggregate('sum')",
    "df.groupby(['year', 'category']).sum()['value']",
    "df.groupby(['year']).sum()['value']"
  ],
  "solution": [0, 1]
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "question_text": "Given a DataFrame `df` with columns `['year', 'category', 'value']`, which of the following code snippets correctly groups the data by `year` and `category` and computes the sum of `value` for each group?",
  "difficulty": 2,
  "topics": ["pandas", "groupby", "data analysis"],
  "choices": [


Given a DataFrame `df` with columns `['year', 'category', 'value']`, which of the following code snippets correctly groups the data by `year` and `category` and computes the sum of `value` for each group?

- [x] df.groupby(['year', 'category']).sum()
- [x] df.groupby(['year', 'category'])['value'].aggregate('sum')
- [ ] df.groupby(['year', 'category']).sum()['value']
- [ ] df.groupby(['year']).sum()['value']
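Since the choices are runnable pandas expressions, the answer key can be checked directly. A small sketch on made-up data (the values are hypothetical) compares the results of choices 1 to 3; if they agree, choice 3 arguably also computes the per-group sum of `value`, even though the generated `solution` lists only `[0, 1]`:

```python
import pandas as pd

# Hypothetical data with the columns named in the question
df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "category": ["a", "b", "a", "a"],
    "value": [1, 2, 3, 4],
})

choice_1 = df.groupby(["year", "category"]).sum()                      # DataFrame with one column, `value`
choice_2 = df.groupby(["year", "category"])["value"].aggregate("sum")  # Series of per-group sums
choice_3 = df.groupby(["year", "category"]).sum()["value"]             # same sums, column selected afterwards

# Choices 2 and 3 produce the same Series, and it matches the `value` column of choice 1
pd.testing.assert_series_equal(choice_2, choice_3)
pd.testing.assert_series_equal(choice_1["value"], choice_2)
print(choice_3)
```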


In [21]:
generate_and_parse_question(MultipleSelection, "Make the previous question more difficult")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_course_content` with `optimization`


[0m[36;1m[1;3m[Document(page_content='And this exercise that we just performed is kind of the cracks of doing computational social science. It included three main components. One was the data. That was this blue line that was the actual or real world data that we were trying to match and we were trying to have our model track. The second main component was the statistical model. It was the hypothesis that the model parameters alpha and beta shifted from their steady state values to alpha-hat for finite number of periods. We had a way to toggle these parameters and adjust them and that led to different statistical models that we had a direct solution to. A direct problem solution to and we were able to produce these simulations. So feature number one was the data. Feature number two was the model and then feature number three was our prior beliefs about which levels o

In the context of optimization for computational social science, consider a statistical model where parameters alpha and beta shift from their steady state values to alpha-hat for a finite number of periods. Given this scenario, which of the following statements are true?

- [x] Adjusting the model parameters can lead to different statistical models and direct problem solutions.
- [ ] Prior beliefs about parameter levels do not influence the matching of the model to the data.
- [ ] The speedup gained from optimization techniques is always constant regardless of the data size.
- [x] Applying prior beliefs about parameter plausibility can help in arriving at parameters that match the data and are consistent with our beliefs.


In [45]:
def create_system_prompt(pydantic_object):
    common_system_prompt = textwrap.dedent("""
    You are a smart, helpful teaching assistant chatbot named Callisto.

    You are an expert Python programmer and have used all the most popular
    libraries for data analysis, machine learning, and artificial intelligence.

    You assist professors that teach courses about Python, data science, and machine learning
    to graduate students.

    Your task is to help professors produce practice questions to help students solidify
    their understanding of specific topics.

    You must always generate questions that have more than one correct option in the solution for the MultipleSelection question type.

    In your conversations with a professor you will be given a topic (string) and an
    expected difficulty level (integer).

    The difficulty will be a number between 1 and 3, with 1 corresponding to a request
    for an easy question, and 3 for the most difficult question.

    Occasionally the professor may ask you to do something like produce a similar question,
    or make the question more difficult or easier. You need to help the professor with these requests. You can use the previous topic to do this.

    If the professor asks you for another question and does not specify either a new topic
    or a new difficulty, you must use the previous topic or difficulty.

    If the professor asks for more than one question in a single message, you need to apologize,
    explain that you can only generate one question at a time, and ask the professor to
    send a new message with the topic and difficulty for each new question.

    You are encouraged to use any tools available to look up relevant information, only
    if necessary.

    You will apologize if you're unable to generate an output that meets the professor's requirements.

    Your responses must always exactly match the specified JSON format with no extra words or content.

    You must always produce exactly one JSON object.

    {format_instructions}
    """)

    # Append the JSON schema of `pydantic_object` to the prompt
    parser = PydanticOutputParser(pydantic_object=pydantic_object)
    return common_system_prompt.format(format_instructions=parser.get_format_instructions())
    

In [39]:
generate_and_parse_question(MultipleSelection, "pandas groupby with difficulty 2")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "question_text": "In the context of data analysis with pandas, which of the following statements are true regarding the `groupby` operation?",
  "difficulty": 2,
  "topics": ["pandas", "data analysis", "groupby"],
  "choices": [
    "The `groupby` method splits the DataFrame into groups based on some criteria.",
    "A `groupby` operation always reduces the size of the original DataFrame.",
    "The `groupby` method can be used in conjunction with aggregate functions like `sum()`, `mean()`, and `max()`.",
    "Grouped objects need to be explicitly converted back to DataFrames using the `to_frame()` method."
  ],
  "solution": [0, 2]
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "question_text": "In the context of data analysis with pandas, which of the following statements are true regarding the `groupby` operation?",
  "difficulty": 2,
  "topics": ["pandas", "data analysis", "groupby"],
  

In the context of data analysis with pandas, which of the following statements are true regarding the `groupby` operation?

- [x] The `groupby` method splits the DataFrame into groups based on some criteria.
- [ ] A `groupby` operation always reduces the size of the original DataFrame.
- [x] The `groupby` method can be used in conjunction with aggregate functions like `sum()`, `mean()`, and `max()`.
- [ ] Grouped objects need to be explicitly converted back to DataFrames using the `to_frame()` method.
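The second choice is marked incorrect because not every `groupby` operation reduces the number of rows: an aggregation returns one row per group, while `transform` returns one value per original row. A short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: two groups, five rows
df = pd.DataFrame({"key": ["x", "x", "y", "y", "y"], "value": [1, 3, 2, 4, 6]})

per_group = df.groupby("key")["value"].sum()             # aggregation: one row per group
per_row = df.groupby("key")["value"].transform("mean")   # transform: one value per original row

print(len(per_group), len(per_row))  # 2 5

# transform keeps the original index, so it can be assigned back as a new column
df["group_mean"] = per_row
print(df)
```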


In [40]:
generate_and_parse_question(MultipleSelection, "Make the question more difficult")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "error": "I'm sorry, but I need a specific topic and the previous difficulty level to adjust the difficulty of the question. Could you please provide me with that information?"
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "error": "I'm sorry, but I need a specific topic and the previous difficulty level to adjust the difficulty of the question. Could you please provide me with that information?"
}





ValidationError: 5 validation errors for MultipleSelection
question_text
  Field required [type=missing, input_value={'error': "I'm sorry, but...with that information?"}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
difficulty
  Field required [type=missing, input_value={'error': "I'm sorry, but...with that information?"}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
topics
  Field required [type=missing, input_value={'error': "I'm sorry, but...with that information?"}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
choices
  Field required [type=missing, input_value={'error': "I'm sorry, but...with that information?"}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
solution
  Field required [type=missing, input_value={'error': "I'm sorry, but...with that information?"}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing

In [46]:
generate_and_parse_question(MultipleSelection, "Pandas groupby difficulty 3")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "question_text": "Given a DataFrame `df` with columns `['year', 'category', 'value']`, which of the following operations using `groupby` will correctly compute the sum of `value` for each unique combination of `year` and `category`, and then sort the results in descending order by `value`?",
  "difficulty": 3,
  "topics": ["Pandas", "groupby", "data manipulation"],
  "choices": [
    "df.groupby(['year', 'category']).sum()['value'].sort_values(ascending=False)",
    "df.groupby(['year', 'category'], as_index=False).sum().sort_values(by='value', ascending=True)",
    "df.groupby(['year', 'category']).agg({'value': 'sum'}).sort_values(by='value', ascending=False)",
    "df.groupby(['year']).sum().groupby(['category']).sum().sort_values(by='value', ascending=False)"
  ],
  "solution": [0, 2]
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "question_text": "Given a DataFrame `df` with columns `['

Given a DataFrame `df` with columns `['year', 'category', 'value']`, which of the following operations using `groupby` will correctly compute the sum of `value` for each unique combination of `year` and `category`, and then sort the results in descending order by `value`?

- [x] df.groupby(['year', 'category']).sum()['value'].sort_values(ascending=False)
- [ ] df.groupby(['year', 'category'], as_index=False).sum().sort_values(by='value', ascending=True)
- [x] df.groupby(['year', 'category']).agg({'value': 'sum'}).sort_values(by='value', ascending=False)
- [ ] df.groupby(['year']).sum().groupby(['category']).sum().sort_values(by='value', ascending=False)


In [48]:
generate_and_parse_question(MultipleSelection, "Give a difficulty 2 question on the same")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "question_text": "In the context of machine learning, which of the following are reasons why feature scaling is important?",
  "difficulty": 2,
  "topics": ["feature scaling", "data preprocessing", "machine learning"],
  "choices": [
    "It ensures that gradient descent converges more quickly.",
    "It is necessary for comparing the importance of different features.",
    "It prevents the model from overfitting.",
    "It allows algorithms that use distance measures to operate more effectively."
  ],
  "solution": [0, 3]
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "question_text": "In the context of machine learning, which of the following are reasons why feature scaling is important?",
  "difficulty": 2,
  "topics": ["feature scaling", "data preprocessing", "machine learning"],
  "choices": [
    "It ensures that gradient descent converges more quickly.",
    "It is necessary for compa

In the context of machine learning, which of the following are reasons why feature scaling is important?

- [x] It ensures that gradient descent converges more quickly.
- [ ] It is necessary for comparing the importance of different features.
- [ ] It prevents the model from overfitting.
- [x] It allows algorithms that use distance measures to operate more effectively.
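The last choice can be illustrated with a little arithmetic: without scaling, a Euclidean distance is dominated by whichever feature has the largest numeric range. A minimal stdlib sketch using min-max scaling on made-up points (the features and values are hypothetical):

```python
import math

# Hypothetical points: (age in years, income in dollars)
points = [(30, 50_000), (65, 50_500), (31, 54_000), (20, 90_000)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)) for row in rows]


def nearest(rows, i):
    """Index of the row closest to rows[i], excluding rows[i] itself."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: euclidean(rows[i], rows[j]))


# Unscaled: income dominates, so point 1 ($500 apart, but 35 years older) looks closest
print(nearest(points, 0))                 # 1
# Scaled: point 2 (1 year older, $4,000 more income) is now closest
print(nearest(min_max_scale(points), 0))  # 2
```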


In [29]:
generate_and_parse_question(MultipleSelection, "pandas groupby difficulty 3")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "question_text": "Given a DataFrame `df` with columns `A`, `B`, and `C`, where `A` contains categorical data, `B` contains numerical data, and `C` contains datetime data, which of the following operations using the `groupby` method will correctly perform a complex grouping and aggregation?",
  "difficulty": 3,
  "topics": ["pandas", "data manipulation", "groupby"],
  "choices": [
    "df.groupby('A').agg({'B': ['mean', 'sum'], 'C': ['max']})",
    "df.groupby(['A', df['C'].dt.year]).agg({'B': 'mean'})",
    "df.groupby(['A', df['C'].dt.to_period('M')]).agg({'B': 'sum'}).reset_index()",
    "df.groupby('A')['B'].transform(lambda x: (x - x.mean()) / x.std())"
  ],
  "solution": [0, 1, 2]
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "question_text": "Given a DataFrame `df` with columns `A`, `B`, and `C`, where `A` contains categorical data, `B` contains numerical data, and `C` contains dateti

Given a DataFrame `df` with columns `A`, `B`, and `C`, where `A` contains categorical data, `B` contains numerical data, and `C` contains datetime data, which of the following operations using the `groupby` method will correctly perform a complex grouping and aggregation?

- [x] df.groupby('A').agg({'B': ['mean', 'sum'], 'C': ['max']})
- [x] df.groupby(['A', df['C'].dt.year]).agg({'B': 'mean'})
- [x] df.groupby(['A', df['C'].dt.to_period('M')]).agg({'B': 'sum'}).reset_index()
- [ ] df.groupby('A')['B'].transform(lambda x: (x - x.mean()) / x.std())


In [42]:
generate_and_parse_question(MultipleSelection, "Give me 2 more questions of the same format")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
  "apology": "I'm sorry, but I can only generate one question at a time. Please provide a new message with the topic and difficulty level for each question you would like to generate."
}

> Finished chain.
any intermediate_steps?:  False
output:
 {
  "apology": "I'm sorry, but I can only generate one question at a time. Please provide a new message with the topic and difficulty level for each question you would like to generate."
}





ValidationError: 5 validation errors for MultipleSelection
question_text
  Field required [type=missing, input_value={'apology': "I'm sorry, b...ould like to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
difficulty
  Field required [type=missing, input_value={'apology': "I'm sorry, b...ould like to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
topics
  Field required [type=missing, input_value={'apology': "I'm sorry, b...ould like to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
choices
  Field required [type=missing, input_value={'apology': "I'm sorry, b...ould like to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing
solution
  Field required [type=missing, input_value={'apology': "I'm sorry, b...ould like to generate."}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/missing

In [44]:
generate_and_parse_question(Code, "pandas groupby difficulty 3")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m{
    "question_text": "Given a DataFrame `df` with columns `['year', 'month', 'day', 'value']`, write a function `aggregate_data` that groups the data by `year` and `month`, and computes the sum of `value` for each group. Additionally, the function should return a DataFrame with a MultiIndex consisting of `year` and `month`, and a single column `total_value` that contains the aggregated sums. Ensure that the resulting DataFrame is sorted by `year` and `month` in ascending order.",
    "difficulty": 3,
    "topics": ["pandas", "groupby", "data aggregation"],
    "starting_code": "def aggregate_data(df):\n    # Your code here\n    return aggregated_df",
    "solution": "def aggregate_data(df):\n    aggregated_df = df.groupby(['year', 'month']).agg({'value': 'sum'})\n    aggregated_df.columns = ['total_value']\n    aggregated_df.sort_index(inplace=True)\n    return aggregated_df",
    "setup_code": "import pandas as pd\nfrom pa

Given a DataFrame `df` with columns `['year', 'month', 'day', 'value']`, write a function `aggregate_data` that groups the data by `year` and `month`, and computes the sum of `value` for each group. Additionally, the function should return a DataFrame with a MultiIndex consisting of `year` and `month`, and a single column `total_value` that contains the aggregated sums. Ensure that the resulting DataFrame is sorted by `year` and `month` in ascending order.

```python
def aggregate_data(df):
    # Your code here
    return aggregated_df
```

**Solution**

```python
def aggregate_data(df):
    aggregated_df = df.groupby(['year', 'month']).agg({'value': 'sum'})
    aggregated_df.columns = ['total_value']
    aggregated_df.sort_index(inplace=True)
    return aggregated_df
```

**Test Suite**

```python
import pandas as pd
from pandas.testing import assert_frame_equal
df = pd.DataFrame({
    'year': [2020, 2020, 2021, 2021, 2021],
    'month': [1, 1, 2, 2, 3],
    'day': [1, 2, 1, 2, 1],
    'value': [10, 20, 30, 40, 50]
})

def aggregate_data(df):
    aggregated_df = df.groupby(['year', 'month']).agg({'value': 'sum'})
    aggregated_df.columns = ['total_value']
    aggregated_df.sort_index(inplace=True)
    return aggregated_df

expected_df = pd.DataFrame({
    'total_value': [30, 70, 50]
}, index=pd.MultiIndex.from_tuples([(2020, 1), (2021, 2), (2021, 3)], names=['year', 'month']))
result_df = aggregate_data(df)
assert_frame_equal(result_df, expected_df)
```
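An alternative sketch of the same solution using pandas named aggregation (`agg(new_name=(column, func))`), which names the output column directly instead of overwriting `columns` afterwards. `groupby` already sorts group keys by default (`sort=True`), so `sort_index` is kept only to make the ordering requirement explicit. It is checked against the same expected result as the test suite above, with the input rows deliberately shuffled:

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def aggregate_data(df):
    # Named aggregation: output column `total_value` = sum of `value` per (year, month)
    return df.groupby(["year", "month"]).agg(total_value=("value", "sum")).sort_index()


df = pd.DataFrame({
    "year": [2021, 2020, 2021, 2020, 2021],  # deliberately unsorted input
    "month": [3, 1, 2, 1, 2],
    "day": [1, 1, 1, 2, 2],
    "value": [50, 10, 30, 20, 40],
})

expected_df = pd.DataFrame(
    {"total_value": [30, 70, 50]},
    index=pd.MultiIndex.from_tuples([(2020, 1), (2021, 2), (2021, 3)], names=["year", "month"]),
)
assert_frame_equal(aggregate_data(df), expected_df)
```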

In [3]:
def create_system_prompt(pydantic_object):
    common_system_prompt = textwrap.dedent("""
    You are a smart, helpful teaching assistant chatbot named Callisto.

    You are an expert Python programmer and have used all the most popular
    libraries for data analysis, machine learning, and artificial intelligence.

    You assist professors that teach courses about Python, data science, and machine learning
    to graduate students.

    Your task is to help professors produce practice questions to help students solidify 
    their understanding of specific topics.

    You must always generate questions that have more than one correct option in the solution for the MultipleSelection question type.

    In your conversations with a professor you will be given a topic (string) and an
    expected difficulty level (integer).

    The difficulty will be a number between 1 and 3, with 1 corresponding to a request
    for an easy question, and 3 for the most difficult question.

    Occasionally the professor may ask you to do something like produce a similar question,
    or make the question more difficult or easier. You need to help the professor with these requests. You can use the previous topic to do this.

    If the professor asks for more than one question in a single message, you need to apologize,
    explain that you can only generate one question at a time, and ask the professor to
    send a new message with the topic and difficulty for each new question.

    If the topic is covered in the information you are given, refer to that information when responding to the professor.
    You are encouraged to use any tools available to look up relevant information, only
    if necessary.

    You will apologize if you're unable to generate an output that meets the professor's requirements.

    Your responses must always exactly match the specified JSON format with no extra words or content.

    You must always produce exactly one JSON object.
    
    {format_instructions}
    """)

    parser = PydanticOutputParser(pydantic_object=pydantic_object)
    return common_system_prompt.format(format_instructions=parser.get_format_instructions())
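Because the prompt is filled in with `str.format`, any literal braces added to `common_system_prompt` later (for example, a JSON snippet showing the `{"apology": ...}` shape) must be doubled as `{{` and `}}`, otherwise `format` treats them as replacement fields. Braces inside the substituted format instructions are not affected. A minimal stdlib sketch with a hypothetical template and stand-in instructions:

```python
import textwrap

# Hypothetical template: `{format_instructions}` is a placeholder, `{{...}}` produces literal braces
template = textwrap.dedent("""
    If you cannot help, reply with {{"apology": "<reason>"}}.

    {format_instructions}
    """)

# Stand-in for parser.get_format_instructions(); braces inside the substituted value need no escaping
instructions = 'The output should be a JSON object like {"question_text": "..."}'

prompt = template.format(format_instructions=instructions)
print(prompt)

# Unescaped braces in the template itself are parsed as a replacement field and fail
try:
    '{"apology": "<reason>"} {format_instructions}'.format(format_instructions=instructions)
except (KeyError, ValueError) as err:
    print(type(err).__name__, err)
```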

In [4]:
generate_and_parse_question(MultipleSelection, "pandas groupby with difficulty 2")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `search_course_content` with `pandas groupby`


[0m[36;1m[1;3m[Document(page_content="We're going to use this example data set to demonstrate the three steps in split apply combine. To begin, we'll start with the split step. In order to ask pandas to split the data for us, we use the group by method of a data frame. You see here that we're calling DF dot group by and we're passing the string A. This instructs pandas to construct groups of our data using the values from the A column. This is the most basic and often most used form of the group by method to split on the values of a single column. We can check the type of this GBA object. And we see here a very long type name but we're just going to refer to this as a group by for short. Once we have a group by object, there are a few things we can do with it. One thing we could do is we could ask to get the subset of data for a particular group. So here we're goin

{'description': 'Question where user is presented a prompt in `question_text` and \na list of `choices`. They are supposed to provide all answers that\napply (`solution`)\n\nAll questions must have a minimum of 3 options\n\nExamples\n--------\n{\n  "question_text": "When using pandas\' groupby method, which of the following statements are true?",\n  "difficulty": 2,\n  "topics": ["pandas", "data manipulation", "groupby"],\n  "choices": [\n    "The groupby method can only split data based on the values of a single column.",\n    "A groupby object allows you to get subsets of data for particular groups.",\n    "The groupby method can be used with the PD.grouper type to group data in time buckets.",\n    "Any column not included in the groupby must have a reduction function applied to it in the result.",\n    "The groupby method can only be used on numerical data types."\n  ],\n  "solution": [1, 2, 3]\n}',
 'properties': {'question_text': {'description': 'The main text of the question. Ma