- Structure prompt

In [20]:
from pydantic import BaseModel
from openai import OpenAI
from openai.lib._parsing import type_to_response_format_param
import json
from getpass import getpass
from jinja2 import Template

openai_api_key = getpass('OPENAI_API_KEY')
client = OpenAI(api_key=openai_api_key)

In [21]:
prompt_template = Template("""
You are going to be given a set of data. Refine it according to the following instruction.

[PERSONA]
```
You are an English teacher. Your students are basic English learners: Korean high school students whose CEFR level is A2.
You must produce a set of material in order to teach them effectively.
The material should consist of [DESCRIPTION], [KEY_EXPRESSION], [SUB_EXPRESSION].
```
                           
[DESCRIPTION]
```
You must provide the "given description" as [DESCRIPTION].
e.g) • be동사의 긍정문과 부정문
    • 동사의 과거형 
```

[KEY_EXPRESSION]
```
Understand the sentence structure and create a sentence using the given grammar and the given sentences. Make it different from the given sentence while keeping the sentence structure. Stay related to 'Title' topic, but do not explicitly mention it.
Provide Korean translation using '~해요' tone. Make it natural.
e.g) Input: 'Title	Spend Smart, Save Smart'
	'동사의 과거형	We sold many things at the flea market.'
	 Output: 'I spent 5000 won to buy this diary.'
					 '이 다이어리를 사기 위해 5000원을 썼어요.'
```
                           
[SUB_EXPRESSION]
```
You must generate two new sentences using the grammar mentioned. Stay related to 'Title' topic, but do not explicitly mention it. Keep the word count and the difficulty similar to the given CEFR level.
Provide Korean translation using '~해요' tone. Make it natural.
e.g) Input: 'Title	Spend Smart, Save Smart'
	 '동사의 과거형	We sold many things at the flea market.'
	 Output: 'He bought fresh vegetables from the local farmer.'
			 'She picked up a pretty hat at the flea market.'
			 '그는 지역 농부에게서 신선한 야채를 샀어요'
			 '그녀는 벼룩시장에서 예쁜 모자를 집어들었어요.'
```
                           
[EXAMPLE]
```
Input:
    {
    "SUBJECT" : "영어",
    "PUBLISHER" : "능률(김)",
    "EDUCATION" : "2015",
    "LESSON" : "1"                    
    "TITLE" : "The Part You Play",
    "STRUCTURES" : "• 전치사의 목적어로서의 동명사\nInstead of running straight ahead, the player kindly passed the ball to Ethan so that he could score a touchdown."
    }

Output:
    {
    "SUBJECT" : "영어",
    "PUBLISHER" : "능률(김)",
    "EDUCATION" : "2015",
    "LESSON" : "1"                    
    "TITLE" : "The Part You Play",
    "DESCRIPTION" : "전치사의 목적어로서의 동명사",
    "KEY_EXPRESSION" : "I am good at solving puzzles.",
    "KEY_EXPRESSION_Kor" : "나는 퍼즐을 푸는 것을 잘해요.",                           
    "SUB_EXPRESSION" : "She is afraid of speaking in public. He is interested in ()"
    "SUB_EXPRESSION_Kor" : "그녀는 대중 앞에서 말하는 것을 두려워해요. 그는 ()에 관심이 있어요."
'''
                           
[Input]
```
SUBJECT : {{subject}}
PUBLISHER : {{publisher}}
EDUCATION : {{education}}
LESSON : {{lesson}}
TITLE: {{title}}
STRUCTURES: {{structures}}
```
                           
"""
)

In [22]:
prompt = prompt_template.render(
    subject = "영어",
    publisher = "능률(김)",
    education = "2015",
    lesson = "1",
    title = "The Part You Play",
    structures = "• 전치사의 목적어로서의 동명사\nInstead of running straight ahead, the player kindly passed the ball to Ethan so that he could score a touchdown."
)

In [23]:
print(prompt)


You are going to be given a set of data. Refine it according to the following instruction.

[PERSONA]
```
You are an English teacher. Your students are basic English learners: Korean high school students whose CEFR level is A2.
You must produce a set of material in order to teach them effectively.
The material should consist of [DESCRIPTION], [KEY_EXPRESSION], [SUB_EXPRESSION].
```
                           
[DESCRIPTION]
```
You must provide the "given description" as [DESCRIPTION].
e.g) • be동사의 긍정문과 부정문
    • 동사의 과거형 
```

[KEY_EXPRESSION]
```
Understand the sentence structure and create a sentence using the given grammar and the given sentences. Make it different from the given sentence while keeping the sentence structure. Stay related to 'Title' topic, but do not explicitly mention it.
Provide Korean translation using '~해요' tone. Make it natural.
e.g) Input: 'Title	Spend Smart, Save Smart'
	'동사의 과거형	We sold many things at the flea market.'
	 Output

In [24]:
class Structure(BaseModel):
    SUBJECT : str
    PUBLISHER : str
    EDUCATION : int
    LESSON : int
    TITLE: str
    DESCRIPTION : str
    KEY_EXPRESSION : str
    KEY_EXPRESSION_Kor : str
    SUB_EXPRESSION : str
    SUB_EXPRESSION_Kor : str

In [25]:
response_format = type_to_response_format_param(Structure)

In [26]:
response_format

{'type': 'json_schema',
 'json_schema': {'schema': {'properties': {'SUBJECT': {'title': 'Subject',
     'type': 'string'},
    'PUBLISHER': {'title': 'Publisher', 'type': 'string'},
    'EDUCATION': {'title': 'Education', 'type': 'integer'},
    'LESSON': {'title': 'Lesson', 'type': 'integer'},
    'TITLE': {'title': 'Title', 'type': 'string'},
    'DESCRIPTION': {'title': 'Description', 'type': 'string'},
    'KEY_EXPRESSION': {'title': 'Key Expression', 'type': 'string'},
    'KEY_EXPRESSION_Kor': {'title': 'Key Expression Kor', 'type': 'string'},
    'SUB_EXPRESSION': {'title': 'Sub Expression', 'type': 'string'},
    'SUB_EXPRESSION_Kor': {'title': 'Sub Expression Kor', 'type': 'string'}},
   'required': ['SUBJECT',
    'PUBLISHER',
    'EDUCATION',
    'LESSON',
    'TITLE',
    'DESCRIPTION',
    'KEY_EXPRESSION',
    'KEY_EXPRESSION_Kor',
    'SUB_EXPRESSION',
    'SUB_EXPRESSION_Kor'],
   'title': 'Structure',
   'type': 'object',
   'additionalProperties': False},
  'name': 'S

In [27]:
def completion(prompt: str) -> Structure:
    response = client.beta.chat.completions.parse(
        model = 'o3-mini-2025-01-31',
        reasoning_effort='low',
        messages = [
            {"role" : "system", "content" : "You are an English teacher. Your students are basic English learners, Korean high school students."},
            {"role" : "system", "content" : "You must produce a set of material in order to teach them effectively."},
            {"role" : "system", "content" : "The material should consist of [DESCRIPTION], [KEY_EXPRESSION], [SUB_EXPRESSION]."},
            {"role" : "user", "content" : prompt}
        ],
        response_format = Structure,
    )
    return response.choices[0].message.parsed

In [28]:
response = completion(prompt)

In [29]:
# model_dump() is the Pydantic v2 replacement for the deprecated .dict()
response_output = json.dumps(response.model_dump(), ensure_ascii=False, indent=2)
print(response_output)

{
  "SUBJECT": "영어",
  "PUBLISHER": "능률(김)",
  "EDUCATION": 2015,
  "LESSON": 1,
  "TITLE": "The Part You Play",
  "DESCRIPTION": "전치사의 목적어로서의 동명사",
  "KEY_EXPRESSION": "I am interested in painting landscapes.",
  "KEY_EXPRESSION_Kor": "나는 풍경화를 그리는 것에 관심이 있어요.",
  "SUB_EXPRESSION": "They succeeded in finishing the project on time. He apologized for losing his wallet.",
  "SUB_EXPRESSION_Kor": "그들은 제시간에 프로젝트를 끝내는 데 성공했어요. 그는 지갑을 잃어버린 것에 대해 사과했어요."
}
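
The schema stores both generated sentences in a single SUB_EXPRESSION string, so downstream code usually has to split them and line them up with the Korean translations. The sketch below is one way to do that with the standard library; `split_sentences` and `pair_sub_expressions` are hypothetical helper names, and the splitting rule (whitespace after `.`, `!`, or `?`) is an assumption that will mis-split abbreviations such as "Mr. Kim".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pair_sub_expressions(english: str, korean: str) -> list[tuple[str, str]]:
    """Pair each English sentence with its Korean translation, expecting two of each."""
    en = split_sentences(english)
    ko = split_sentences(korean)
    if len(en) != 2 or len(ko) != 2:
        raise ValueError(
            f"expected 2 sentences each, got {len(en)} English and {len(ko)} Korean"
        )
    return list(zip(en, ko))


# Values copied from the response printed above.
pairs = pair_sub_expressions(
    "They succeeded in finishing the project on time. He apologized for losing his wallet.",
    "그들은 제시간에 프로젝트를 끝내는 데 성공했어요. 그는 지갑을 잃어버린 것에 대해 사과했어요.",
)
for en, ko in pairs:
    print(en, "|", ko)
```

The ValueError also acts as a cheap check that the model really produced two sentences, which the prompt asks for but the schema does not enforce.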


# Build the Batch API JSONL file with a for loop

In [36]:
def StructureCuriMake(data, output_filename: str) -> None:
    jsonl_data = []

    for i in range(len(data)):
        prompt = prompt_template.render(
            subject = data.loc[i,"제목"],
            publisher = data.loc[i,"출판사"],
            education = data.loc[i,"교육과정"],
            lesson = data.loc[i,"Lesson"],
            title = data.loc[i,"Title"],
            structures = data.loc[i,"Structure"],
        )

        structure_request = {
            "custom_id" : f"request-{i+1}",
            "method" : "POST",
            "url" : "/v1/chat/completions",
            "body" : {
                "model" : "o3-mini-2025-01-31",
                "reasoning_effort" : "low",
                "messages" : [
                    {"role" : "system", "content" : "You are an English teacher. Your students are basic English learners, Korean high school students."},
                    {"role" : "system", "content" : "You must produce a set of material in order to teach them effectively."},
                    {"role" : "system", "content" : "The material should consist of [DESCRIPTION], [KEY_EXPRESSION], [SUB_EXPRESSION]."},
                    {"role": "user", "content": prompt}
                ],
                "response_format" : response_format
            }
        }

        jsonl_data.append(structure_request)

    with open(output_filename, 'w', encoding='utf-8') as jsonl_file:
        for item in jsonl_data:
            jsonl_file.write(json.dumps(item, ensure_ascii=False) + '\n')

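    # Count the collected requests instead of reusing the loop variable, which is undefined for empty data
    print(f'JSONL file created: {output_filename}-{len(jsonl_data)}')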
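
Before uploading, it can be worth reading the file back to confirm that every line is valid JSON, carries the four top-level keys written above, and has a unique `custom_id`. The following is a minimal sketch using only the standard library; `validate_batch_jsonl` is a hypothetical helper, not part of the OpenAI SDK.

```python
import json


def validate_batch_jsonl(path: str) -> int:
    """Return the number of requests in the file; raise ValueError on the first problem."""
    required = {"custom_id", "method", "url", "body"}
    seen_ids = set()
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            # json.JSONDecodeError is a subclass of ValueError
            request = json.loads(line)
            missing = required - request.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            custom_id = request["custom_id"]
            if custom_id in seen_ids:
                raise ValueError(f"line {line_no}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
            count += 1
    return count
```

Calling `validate_batch_jsonl("Structure_High_batch.jsonl")` after the file has been written should return the same count that the function above prints.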

In [37]:
import pandas as pd

df = pd.read_csv("DB_Structure.csv")

In [38]:
df.columns

Index(['Unnamed: 0', '제목', '출판사', '교육과정', 'Lesson', 'Title', 'Structure'], dtype='object')
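
The CSV column names do not match the template variable names (for example, `제목` feeds `subject`), so it can help to keep the mapping in one place. The sketch below mirrors the mapping used in `StructureCuriMake`; `COLUMN_MAP` and `row_to_fields` are hypothetical names, and the empty-value check is an added assumption rather than something the notebook does.

```python
# Template variable -> CSV column name, as used in StructureCuriMake.
COLUMN_MAP = {
    "subject": "제목",
    "publisher": "출판사",
    "education": "교육과정",
    "lesson": "Lesson",
    "title": "Title",
    "structures": "Structure",
}


def row_to_fields(row) -> dict:
    """Build the keyword arguments for prompt_template.render from one row.

    `row` can be a dict or a pandas Series; both support row[column].
    """
    fields = {}
    for variable, column in COLUMN_MAP.items():
        value = row[column]
        # NaN is the only value that is not equal to itself.
        if value is None or value != value or str(value).strip() == "":
            raise ValueError(f"column {column!r} is empty")
        fields[variable] = value
    return fields


example_row = {
    "제목": "영어",
    "출판사": "능률(김)",
    "교육과정": 2015,
    "Lesson": 1,
    "Title": "The Part You Play",
    "Structure": "• 전치사의 목적어로서의 동명사\nInstead of running straight ahead, ...",
}
print(row_to_fields(example_row)["subject"])
```

With a DataFrame, `prompt_template.render(**row_to_fields(df.loc[i]))` would produce the same prompt as the explicit keyword arguments, while failing early on rows with missing cells.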

In [39]:
df.head()

|   | Unnamed: 0 | 제목 | 출판사 | 교육과정 | Lesson | Title | Structure |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 영어 | 능률(김) | 2015 | 1 | The Part You Play | • 전치사의 목적어로서의 동명사\nInstead of running straight... |
| 1 | 1 | 영어 | 능률(김) | 2015 | 2 | The Power of Creativity | • 명사를 수식하는 과거분사(구)\nThe giant pictures made fr... |
| 2 | 2 | 영어 | 능률(김) | 2015 | 3 | Sound Life | • 사역동사+목적어+동사원형\nHas a painting, a movie, or a... |
| 3 | 3 | 영어 | 능률(김) | 2015 | 4 | Toward a Better World | • 관계부사\nI am so glad that this family now has ... |
| 4 | 4 | 영어 | 능률(김) | 2015 | 5 | What Matters Most | • 분사구문\nReaching the hermit’s hut, the king fo... |


In [40]:
df['Structure'][0]

'• 전치사의 목적어로서의 동명사\nInstead of running straight ahead, the player kindly passed the ball to Ethan so that he could score a touchdown.'

In [41]:
StructureCuriMake(df, output_filename="Structure_High_batch.jsonl")

JSONL file created: Structure_High_batch.jsonl-329
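
The notebook stops once the JSONL file exists. A possible next step, sketched below under assumptions, is to upload the file, start a batch job, and later parse the output file. `submit_batch` and `parse_batch_output` are hypothetical helpers; they rely on the OpenAI Python client's `files.create` and `batches.create` methods and on the Batch API output layout (one JSON object per line with `custom_id`, `response`, and `error`). Skipping failed requests is a simplification.

```python
import json


def submit_batch(client, jsonl_path: str):
    """Upload the JSONL file and start a batch job against /v1/chat/completions."""
    with open(jsonl_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )


def parse_batch_output(output_text: str) -> dict:
    """Map custom_id -> parsed JSON content for every successful request."""
    results = {}
    for line in output_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            continue  # failed requests are skipped here; inspect them separately
        # With a json_schema response_format, the message content is a JSON string.
        content = response["body"]["choices"][0]["message"]["content"]
        results[record["custom_id"]] = json.loads(content)
    return results


# Usage, assuming `client` is the OpenAI client created earlier:
#   batch = submit_batch(client, "Structure_High_batch.jsonl")
#   batch = client.batches.retrieve(batch.id)            # poll until finished
#   if batch.status == "completed":
#       text = client.files.content(batch.output_file_id).text
#       results = parse_batch_output(text)
```

Keying the results by `custom_id` matters because the Batch API does not guarantee that output lines come back in input order.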
