# Core Dietary Guidelines Summarization

This notebook reads the cleaned dietary guidelines text from the processed data folder, summarizes it chunk by chunk with OpenAI's `gpt-4o-mini` model, and saves the combined summary.

In [47]:
import openai
import os
from dotenv import load_dotenv

In [48]:
# Load environment variables
load_dotenv()
openai.api_key = os.getenv('OPENAI_API_KEY')
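
`os.getenv` returns `None` when the variable is missing, so a missing key only surfaces later, at the first API call. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of this notebook or of the `openai` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Possible usage in the cell above (assumption, not the original code):
# openai.api_key = require_env("OPENAI_API_KEY")
```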

In [49]:
# Load the text file from processed data
with open('processed_data/cleaned_dietary_guidelines.txt', 'r', encoding='utf-8') as file:
    text = file.read()

In [50]:
# Split the text into fixed-size character chunks so each request stays within the model's input limit
chunk_size = 8000
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
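
Fixed-offset slicing can cut a sentence or paragraph in half at a chunk boundary. One possible alternative, sketched here under the assumption that paragraphs in the source file are separated by blank lines, is to pack whole paragraphs into chunks up to the size limit. `chunk_by_paragraph` is a hypothetical name and is not used elsewhere in this notebook.

```python
def chunk_by_paragraph(text: str, max_chars: int = 8000) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    Assumes paragraphs are separated by blank lines. A single paragraph
    longer than max_chars falls back to fixed-size slicing.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs left by repeated blank lines
        if len(para) > max_chars:
            # Flush what we have, then slice the oversized paragraph
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(
                para[i:i + max_chars] for i in range(0, len(para), max_chars)
            )
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Chunks produced this way vary in length, so the chunk count may differ from the fixed-size split.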

In [51]:
# Upper bound only: the last chunk may be shorter than chunk_size; len(text) gives the exact count
print(f"Total characters: {len(chunks)*chunk_size}")

Total characters: 96000
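
The figure printed above is the chunk count multiplied by `chunk_size`, so it overstates the length whenever the final chunk is partial. A small worked example on made-up data (the 20,500-character string is an assumption for illustration, not the notebook's file):

```python
sample = "x" * 20_500          # stand-in text, not the real guidelines file
size = 8000
parts = [sample[i:i + size] for i in range(0, len(sample), size)]

upper_bound = len(parts) * size          # 3 chunks * 8000 = 24000
exact = sum(len(p) for p in parts)       # 8000 + 8000 + 4500 = 20500

print(f"Chunks: {len(parts)}, upper bound: {upper_bound}, exact: {exact}")
```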


In [52]:
system_prompt = """You will be given a set of dietary guideline texts. Based on these, generate similar content that follows the same structure and tone, with an emphasis on practical, food- and recipe-based recommendations. Follow these instructions:

Omit any explicit mention of the target population.

Do not include research basis or references to levels of evidence.

Focus on clear, actionable dietary recommendations with food examples (e.g., “Use olive oil instead of butter” or “Include legumes like lentils and chickpeas in stews”).

Emphasize preparation methods, ingredient swaps, and daily food practices. Write in an instructional tone.

Prioritize realistic advice, especially related to meals and cooking habits.

Keep each guideline section concise and clear.

Do not use overly technical terms or abstract nutritional concepts.

You will receive the base text in sections. For each section, generate a rewritten version following the above rules."""

In [53]:
def messages_for(chunk):
    """Build the chat messages (system prompt plus user request) for one chunk."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Please summarize the following text, limiting your answer to 300 words with key points:\n\n{chunk}"},
    ]

In [54]:
# Process each chunk and collect summaries
summaries = []
for i, chunk in enumerate(chunks):
    print(f"Processing chunk {i+1}/{len(chunks)}...")
    
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_for(chunk),
        temperature=0.3,
        max_tokens=500
    )
    
    summary = response.choices[0].message.content
    summaries.append(summary)

Processing chunk 1/12...
Processing chunk 2/12...
Processing chunk 3/12...
Processing chunk 4/12...
Processing chunk 5/12...
Processing chunk 6/12...
Processing chunk 7/12...
Processing chunk 8/12...
Processing chunk 9/12...
Processing chunk 10/12...
Processing chunk 11/12...
Processing chunk 12/12...
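
The loop above stops at the first failed request, which discards the summaries already collected in that run. A minimal sketch of a retry wrapper with exponential backoff follows; `with_retries` and its parameters are hypothetical and not part of the `openai` package. It catches every `Exception` for brevity, where a real run would more likely retry only on transient errors.

```python
import time


def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait and retry with exponential backoff.

    Hypothetical helper. The delay doubles after each failed attempt
    (base_delay, 2 * base_delay, ...). The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage inside the loop (assumption, not the original code):
# response = with_retries(lambda: openai.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=messages_for(chunk),
#     temperature=0.3,
#     max_tokens=500,
# ))
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.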


In [55]:
# Combine all summaries
combined_summary = ' '.join(summaries)

# Save the summary to processed data
with open('processed_data/core_dietary_guidelines.txt', 'w', encoding='utf-8') as file:
    file.write(combined_summary)