In [60]:
from openai import OpenAI
from dotenv import load_dotenv
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import json
import time
import requests

load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

In [10]:

response = client.responses.create(
    model="gpt-4.1",
    tools=[{"type": "web_search_preview"}],
    input="what's the most popular item at costco right now? back this up with statistics and data. Also tell me how recent your data can be."
)

print(response.output_text)

As of May 2025, Costco's most popular item remains its Kirkland Signature Bath Tissue. The warehouse retailer sells over a billion rolls annually, generating nearly $500 million in revenue. This popularity is attributed to its affordability—approximately $23 for 30 rolls—and quality, being thicker and more absorbent than many other brands. ([fool.com](https://www.fool.com/money/personal-finance/articles/whats-costcos-best-selling-product-the-answer-might-surprise-you/?utm_source=openai))

Following closely is Costco's rotisserie chicken, priced at $4.99 each. The company sells about 100 million of these chickens every year, equating to approximately 150 per store daily. Despite being sold at a loss, this item attracts customers who often purchase additional products during their visit. ([fool.com](https://www.fool.com/money/personal-finance/articles/whats-costcos-best-selling-product-the-answer-might-surprise-you/?utm_source=openai))

In terms of regional preferences, a 2024 analysis i
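
The web-search output above embeds its sources as Markdown links with a `utm_source=openai` tracking parameter. A minimal sketch of pulling those citations out of the text follows. The helper name `extract_citations` and the regex are assumptions for illustration, not part of the OpenAI SDK, and the sample string is abbreviated from the output shown above.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Matches Markdown links of the form [label](url)
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def extract_citations(text):
    """Return a de-duplicated list of (label, url) pairs found in `text`."""
    seen = set()
    citations = []
    for label, url in LINK_RE.findall(text):
        parts = urlsplit(url)
        # Drop the utm_source tracking parameter, keep any other query parameters
        query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                 if k != "utm_source"]
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
        if clean not in seen:
            seen.add(clean)
            citations.append((label, clean))
    return citations

sample = (
    "Costco sells over a billion rolls annually. "
    "([fool.com](https://www.fool.com/money/personal-finance/articles/"
    "whats-costcos-best-selling-product-the-answer-might-surprise-you/?utm_source=openai))"
)
print(extract_citations(sample))
```

Applied to `response.output_text`, this would give a short list of sources to spot-check by hand, which is useful here because both paragraphs of the answer cite the same article.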

In [49]:
df = pd.read_csv('data/fetchGPT.csv')
# Work on the first 100 rows to keep the number of API calls small
temp = df.head(100)
temp

Unnamed: 0,ORIGINAL_ITEM_TEXT,BARCODE_UPC,FIDO,DESCRIPTION,PRODUCT_NUMBER,SAMPLE_STORE,SAMPLE_RECEIPT,ITEM_COUNT
0,regular,,a9af82d3-5ac8-4985-8ae9-58a4c4558798,Fuel,,COSTCO,1be719f0-7a3d-4d83-a428-5716f7920808,361761
1,strawberries,,d18956ad-4eaa-408d-9620-26256eed1e4f,Strawberries,27003.0,COSTCO,810db7d0-8239-436c-8a3a-b23325c28258,144821
2,bananas,,45711fa9-1bfe-4bee-98da-66fd959a9ec4,Fresh Fruits,30669.0,COSTCO,a7bec3e8-9849-430a-8796-f28b5522b73d,120506
3,***kswtr40pk,,97f8b18c-640d-4105-9a2f-97f726996c5b,Kirkland Signature Purified Drinking Water,782796.0,COSTCO,666748f3-a93a-4d98-bd62-3f04247da9bf,120294
4,rotisserie,,175c65d7-e9ce-4108-838c-8148098892ec,Kirkland Signature Rotisserie Chicken,87745.0,COSTCO,81a4cc3a-afb6-4b54-9225-2ba759bdc169,113579
5,ca redemp va,69000000000.0,5251f858-25b6-4941-8fb0-fc12ab3d70de,Bottle Deposits,6900000000.0,COSTCO,d0b897a4-67a1-4b91-988f-d96305882ddf,82392
6,ks cage free,,12a677be-8339-42cc-9c15-495d6a80c0c0,"KIRKLAND SIGNATURE LARGE EGGS, CAGE FREE, 2 DOZEN",637598.0,COSTCO,814b4499-64e5-476c-b9ae-14bd721f9015,75854
7,18 ct eggs,,5875bf31-3674-4ceb-a4cc-437bc46ec250,Sauders Eggs Large White 18 Ct,1008.0,COSTCO,a5399faa-50ee-4570-91d1-ee542b22ef10,72299
8,premium,,a9af82d3-5ac8-4985-8ae9-58a4c4558798,Fuel,,COSTCO,f213c6e0-f7fa-4ca5-b66a-38613f0fa2ba,69186
9,*kswtr40pk,,97f8b18c-640d-4105-9a2f-97f726996c5b,Kirkland Signature Purified Drinking Water,782796.0,COSTCO,60d502fc-ce7e-46aa-8e27-df026db21020,61579
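
Rows 3 and 9 above differ only in the number of leading asterisks and resolve to the same product, so they could share one API call. The sketch below groups rows by a normalized key. It assumes the leading `*` characters are receipt markup rather than part of the product name, which the data does not confirm, and the helper names are hypothetical.

```python
import re

def normalize_item_text(text):
    """Lowercase, strip leading '*' markers, and collapse runs of whitespace."""
    cleaned = text.strip().lstrip("*").strip().lower()
    return re.sub(r"\s+", " ", cleaned)

def unique_items(rows):
    """Map each (normalized text, store) key to the original texts that share it."""
    groups = {}
    for text, store in rows:
        key = (normalize_item_text(text), store)
        groups.setdefault(key, []).append(text)
    return groups

# A few (ORIGINAL_ITEM_TEXT, SAMPLE_STORE) pairs taken from the table above
rows = [("***kswtr40pk", "COSTCO"), ("rotisserie", "COSTCO"), ("*kswtr40pk", "COSTCO")]
groups = unique_items(rows)
for key, originals in groups.items():
    print(key, originals)
```

Each key would then be sent to the model once and the answer copied back to every original text in its group.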


In [None]:

# --- Assumptions: ---
# 1. `temp` is your pandas DataFrame with columns:
#      - 'ORIGINAL_ITEM_TEXT'
#      - 'SAMPLE_STORE'
#      - optionally 'PRODUCT_NUMBER'
# 2. `client` is already initialized and authenticated, e.g.:
#      client = OpenAI(api_key="YOUR_KEY")

results = []

for idx, row in temp.iterrows():
    original_item_text = row['ORIGINAL_ITEM_TEXT']
    sample_store      = row['SAMPLE_STORE']
    product_number    = row.get('PRODUCT_NUMBER', 'N/A')
    
    prompt = f'''
    # Role and Objective
    You are an AI product description analyzer tasked with standardizing and expanding abbreviated product descriptions into clear, structured data. Your goal is to identify brands, categories, and expand abbreviated text while maintaining accuracy.

    # Instructions
    - Analyze the given abbreviated product description
    - Expand abbreviations without adding interpretive content
    - Identify the most likely brand based on text and product number
    - Categorize the product based on expanded description
    - Provide confidence scores for brand and category predictions

    ## Sub-categories for more detailed instructions
    - Expand only what is directly implied in the text (e.g., "gl" to "glass", "bl" to "bottle"). Add details such as counts, sizes, or brand only when they can be verified through web search.
    - Assign confidence scores from high/medium/low based on clarity of information
    - Consider store context when determining brand and category

    # Reasoning Steps
    1. Expand Abbreviations
       - Identify common product abbreviations
       - Convert to standard product terminology
       - Maintain original meaning without interpretation

    2. Brand Analysis
       - Look for brand indicators in text
       - Consider store-specific context
       - Assess confidence in brand identification

    3. Category Assignment
       - Analyze product characteristics
       - Determine product type
       - Assign confidence based on clarity

    # Output Format
    JSON structure with:
    {{
        "brand": "Predicted brand",
        "brand_score": "Confidence score (high/medium/low)",
        "category": "Predicted category",
        "category_score": "Confidence score (high/medium/low)",
        "expanded_description": "Expanded product description",
        "reasoning": "Reasoning for predictions and description"
    }}

    # Examples
    ## Example 1
    Input: "campari 12oz gl bl"
    Output: {{
        "brand": "Campari",
        "brand_score": "high",
        "category": "Spirits",
        "category_score": "high",
        "expanded_description": "Campari 12oz Glass Bottle",
        "reasoning": "Clear brand name 'Campari' present. Common abbreviations 'gl bl' clearly indicate glass bottle. Spirit category evident from brand."
    }}

    ## Example 2
    Input: "18 ct eggs"
    Output: {{
        "brand": "Sauders",
        "brand_score": "high",
        "category": "Eggs",
        "category_score": "high",
        "expanded_description": "Sauders Eggs Large White 18 Ct",
        "reasoning": "Product number 1008 at retailer COSTCO leads to Sauders Large Eggs."
    }}

    # Context
    Here is the product information:
    - Original Item Text: "{original_item_text}"
    - Store: "{sample_store}"
    - Product Number: "{product_number}" (if applicable)

    # Final instructions and prompt to think step by step
    1. First, expand only the abbreviated terms in the original text
    2. Then identify brand based on expanded text
    3. Finally, categorize the product based on the complete information
    4. Provide clear reasoning for each decision
    5. Return structured JSON with confidence scores
    '''
    
    # Common Abbreviations and Terms
    # [Reserved for future implementation]

    # Store-Specific Categories
    # [Reserved for future implementation]

    # Product Naming Conventions
    # [Reserved for future implementation]


    # Send the prompt to the GPT model
    response = client.responses.create(
        model="gpt-4.1",
        tools=[{"type": "web_search_preview"}],
        input=prompt, 
        temperature=0.0
    )
    
    # Extract the JSON payload from the response
    output_text = response.output_text
    start = output_text.find('{')
    end   = output_text.rfind('}') + 1
    
    if start == -1 or end <= start:
        print(f"[Row {idx}] No JSON found. Output preview:\n{output_text[:200]}...\n")
        continue
    
    json_text = output_text[start:end]
    try:
        parsed = json.loads(json_text)
    except json.JSONDecodeError as e:
        print(f"[Row {idx}] JSON parsing error: {e}\nPayload:\n{json_text}\n")
        continue
    
    # Limit reasoning to first 50 words
    reasoning_full = parsed.get("reasoning", "")
    reasoning_words = reasoning_full.split()
    reasoning_snip  = " ".join(reasoning_words[:50])
    
    results.append({
        "ORIGINAL_ITEM_TEXT":   original_item_text,
        "SAMPLE_STORE":         sample_store,
        "PRODUCT_NUMBER":       product_number,
        "BRAND":                parsed.get("brand", "N/A"),
        "BRAND_SCORE":          parsed.get("brand_score", "N/A"),
        "CATEGORY":             parsed.get("category", "N/A"),
        "CATEGORY_SCORE":       parsed.get("category_score", "N/A"),
        "EXPANDED_DESCRIPTION": parsed.get("expanded_description", "N/A"),
        "REASONING_SNIPPET":    reasoning_snip
    })
    
    # Throttle requests to avoid rate limits
    time.sleep(0.2)

# Convert to DataFrame
results_df = pd.DataFrame(results)

# (Optional) Save to CSV
# results_df.to_csv('data/fetchGPT_results.csv', index=False)

# Show the first few rows
results_df.head()

Unnamed: 0,ORIGINAL_ITEM_TEXT,SAMPLE_STORE,PRODUCT_NUMBER,BRAND,BRAND_SCORE,CATEGORY,CATEGORY_SCORE,EXPANDED_DESCRIPTION,REASONING_SNIPPET
0,regular,COSTCO,,Unknown,low,General Grocery,low,Regular,"The original item text 'regular' contains no abbreviations to expand and provides no direct indication of brand or product type. There are no product numbers or additional context to infer a specific brand or category. The term 'regular' is too generic to assign a confident category or brand, so both"
1,strawberries,COSTCO,27003.0,Unknown,low,Fresh Produce,high,"Premium Strawberries, 2 lb Clamshell","The product description 'strawberries' corresponds to Costco's item number 27003, which is listed as 'Premium Strawberries, 2 lbs' in a clamshell container. ([costcobusinessdelivery.com](https://www.costcobusinessdelivery.com/premium-strawberries%2C-2-lbs.product.11576937.html?utm_source=openai)) The term 'Premium' is part of the product name, not an indication of a specific brand. Therefore, the brand cannot be determined from the available information. The"
2,bananas,COSTCO,30669.0,Dole,medium,Fresh Produce,high,"Bananas, Product of Guatemala","The product description 'bananas' is straightforward and does not contain abbreviations requiring expansion. The product number 30669 at Costco corresponds to 'Bananas, Product of Guatemala' as per the search results. While the brand is not explicitly mentioned, Dole is a prominent supplier of bananas to Costco, leading to a medium"
3,***kswtr40pk,COSTCO,782796.0,Kirkland Signature,high,Bottled Water,high,Kirkland Signature Water 40 Pack,The product number 782796 at Costco corresponds to Kirkland Signature purified water in a 40-pack of 500ml bottles. The abbreviation 'kswtr40pk' expands to 'Kirkland Signature Water 40 Pack'.
4,rotisserie,COSTCO,87745.0,Kirkland Signature,high,Prepared Foods,high,Kirkland Signature Whole Rotisserie Chicken,"The term 'rotisserie' refers to a cooking method involving roasting meat on a rotating spit. At Costco, the product number 87745 corresponds to their Whole Rotisserie Chicken, which is sold under the Kirkland Signature brand. This product is a ready-to-eat item, fitting into the 'Prepared Foods' category. The information is"
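
The cell above slices from the first `{` to the last `}` in the response, which breaks if the model writes any braces in prose after the JSON. One possible alternative, sketched here with only the standard library, is to let `json.JSONDecoder.raw_decode` find the first complete object. The function name `extract_first_json_object` is an assumption for illustration.

```python
import json

def extract_first_json_object(text):
    """Return the first JSON object embedded in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and ignores what follows
            obj, _ = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next one
        pos = text.find("{", pos + 1)
    return None

reply = (
    'Here is the result:\n'
    '{"brand": "Campari", "brand_score": "high"}\n'
    'Note: sizes such as {12oz} were not verified.'
)
print(extract_first_json_object(reply))
```

With this input the first-brace-to-last-brace slice would include the trailing note and fail to parse, while the decoder-based version stops at the end of the object.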


In [53]:
results_df

Unnamed: 0,ORIGINAL_ITEM_TEXT,SAMPLE_STORE,PRODUCT_NUMBER,BRAND,BRAND_SCORE,CATEGORY,CATEGORY_SCORE,EXPANDED_DESCRIPTION,REASONING_SNIPPET
0,regular,COSTCO,,Unknown,low,General Grocery,low,Regular,"The original item text 'regular' contains no abbreviations to expand and provides no direct indication of brand or product type. There are no product numbers or additional context to infer a specific brand or category. The term 'regular' is too generic to assign a confident category or brand, so both"
1,strawberries,COSTCO,27003.0,Unknown,low,Fresh Produce,high,"Premium Strawberries, 2 lb Clamshell","The product description 'strawberries' corresponds to Costco's item number 27003, which is listed as 'Premium Strawberries, 2 lbs' in a clamshell container. ([costcobusinessdelivery.com](https://www.costcobusinessdelivery.com/premium-strawberries%2C-2-lbs.product.11576937.html?utm_source=openai)) The term 'Premium' is part of the product name, not an indication of a specific brand. Therefore, the brand cannot be determined from the available information. The"
2,bananas,COSTCO,30669.0,Dole,medium,Fresh Produce,high,"Bananas, Product of Guatemala","The product description 'bananas' is straightforward and does not contain abbreviations requiring expansion. The product number 30669 at Costco corresponds to 'Bananas, Product of Guatemala' as per the search results. While the brand is not explicitly mentioned, Dole is a prominent supplier of bananas to Costco, leading to a medium"
3,***kswtr40pk,COSTCO,782796.0,Kirkland Signature,high,Bottled Water,high,Kirkland Signature Water 40 Pack,The product number 782796 at Costco corresponds to Kirkland Signature purified water in a 40-pack of 500ml bottles. The abbreviation 'kswtr40pk' expands to 'Kirkland Signature Water 40 Pack'.
4,rotisserie,COSTCO,87745.0,Kirkland Signature,high,Prepared Foods,high,Kirkland Signature Whole Rotisserie Chicken,"The term 'rotisserie' refers to a cooking method involving roasting meat on a rotating spit. At Costco, the product number 87745 corresponds to their Whole Rotisserie Chicken, which is sold under the Kirkland Signature brand. This product is a ready-to-eat item, fitting into the 'Prepared Foods' category. The information is"
5,ca redemp va,COSTCO,6900000000.0,Costco,high,Recycling Fee,high,California Redemption Value,"The abbreviation 'ca redemp va' expands to 'California Redemption Value,' a fee applied to beverage containers in California to encourage recycling. This fee is commonly listed on receipts at retailers like Costco. The term 'California Redemption Value' is specific to California's recycling program, and 'va' likely stands for 'value.' Given"
6,ks cage free,COSTCO,637598.0,Kirkland Signature,high,Eggs,high,"Kirkland Signature Cage Free Eggs, 24 Count","The abbreviation 'ks' is expanded to 'Kirkland Signature', Costco's private label brand. 'Cage free' refers to the type of eggs. The product number 637598 corresponds to 'Kirkland Signature Large Eggs, Cage Free, 2 Dozen' at Costco, confirming the product details."
7,18 ct eggs,COSTCO,1008.0,Kirkland Signature,high,Eggs,high,"Kirkland Signature Large Eggs, Grade A, 18 Count","The original description '18 ct eggs' indicates a package of 18 eggs. At Costco, item number 1008 corresponds to 'Large Eggs, Grade A, 18 ct' ([costcobusinessdelivery.com](https://www.costcobusinessdelivery.com/large-eggs%2C-grade-a%2C-18-ct.product.2001138745.html?utm_source=openai)). While the brand isn't explicitly stated in the product listing, Costco frequently sells eggs under its Kirkland Signature brand ([costcobusinessdelivery.com](https://www.costcobusinessdelivery.com/kirkland-signature-large-eggs%2C-free-range%2C-2-dozen.product.2001165640.html?utm_source=openai)). Therefore, it's reasonable to"
8,premium,COSTCO,,Premium,low,Uncategorized,low,Premium,"The original item text 'premium' contains no abbreviations to expand. There is no product number or additional context to identify a specific brand or product type. 'Premium' could refer to a brand or a quality descriptor, but without further information, both brand and category assignments are uncertain."
9,*kswtr40pk,COSTCO,782796.0,Kirkland Signature,high,Bottled Water,high,Kirkland Signature Water 40 Pack,"The product code '782796' corresponds to Kirkland Signature purified water in a 40-pack, as confirmed by multiple sources. The abbreviation 'kswtr40pk' expands to 'Kirkland Signature Water 40 Pack'. Given that Kirkland Signature is Costco's private label, the brand identification is highly confident. The product is categorized as bottled water based"
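
In the table above `PRODUCT_NUMBER` is stored as a float, so the prompt receives strings such as "27003.0", and "nan" for the fuel rows. A small formatter like the following could clean up what gets interpolated into the prompt. This is a sketch and `format_product_number` is a hypothetical helper, not something the notebook defines.

```python
import math

def format_product_number(value):
    """Render a product number for the prompt: 27003.0 -> '27003', NaN/None -> 'N/A'."""
    if value is None:
        return "N/A"
    if isinstance(value, float):
        if math.isnan(value):
            return "N/A"
        if value.is_integer():
            return str(int(value))
    return str(value)

for raw in (27003.0, 6900000000.0, float("nan"), None, "1008"):
    print(repr(raw), "->", format_product_number(raw))
```

Only the string shown to the model would change. The `PRODUCT_NUMBER` value stored in `results` should stay as the original so that a later merge on that column still matches.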


In [None]:
# Role and Objective
# Instructions
## Sub-categories for more detailed instructions
# Reasoning Steps
# Output Format
# Examples
## Example 1
# Context
# Final instructions and prompt to think step by step



In [55]:
# Merge the results_df with temp DataFrame
merged_df = pd.merge(
    temp, 
    results_df,
    on=['ORIGINAL_ITEM_TEXT', 'SAMPLE_STORE', 'PRODUCT_NUMBER'],
    how='left'
)

# Save the merged results to CSV
merged_df.to_csv('data/merged_results.csv', index=False)

# Display the merged results (the last expression in a cell is what gets rendered)
merged_df
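
Once the reference `DESCRIPTION` and the model's `EXPANDED_DESCRIPTION` sit side by side in `merged_df`, a rough agreement score can flag rows worth reviewing. The sketch below uses Jaccard similarity over lowercase tokens. This metric is an assumption chosen for simplicity, not something the notebook prescribes, and the example pairs are copied from the tables above.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens of a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of the token sets of two descriptions (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

# (reference DESCRIPTION, model EXPANDED_DESCRIPTION)
pairs = [
    ("Kirkland Signature Rotisserie Chicken", "Kirkland Signature Whole Rotisserie Chicken"),
    ("Sauders Eggs Large White 18 Ct", "Kirkland Signature Large Eggs, Grade A, 18 Count"),
    ("Fuel", "Regular"),
]
for reference, predicted in pairs:
    print(f"{jaccard(reference, predicted):.2f}  {reference!r} vs {predicted!r}")
```

A low score does not always mean the model is wrong (for example, "Ct" and "Count" do not match as tokens), so the score is better suited to ranking rows for manual review than to automatic grading.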