# Save ItemDataLoader DataFrame

This notebook loads item data with `ItemDataLoader` and saves the resulting DataFrames to CSV files for testing and inspection.

## Parameters
- `item_pdf_path`: Path to the input item CSV file (default: `data/offer_master_data.csv`)
- `output_path`: Path to the output CSV file (default: `tests/item_data_output/item_df.csv`)
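Writing a DataFrame to a path like `output_path` requires its parent directory (here `tests/item_data_output/`) to exist. Here is a minimal sketch of such a save step, assuming the index should not be written. The helper `save_df` and the sample row are hypothetical and not part of `ItemDataLoader`:

```python
import os
import tempfile

import pandas as pd


def save_df(df: pd.DataFrame, output_path: str) -> str:
    """Write df to output_path as CSV, creating parent directories if needed.

    Hypothetical helper for illustration; not part of ItemDataLoader.
    """
    parent = os.path.dirname(output_path)
    if parent:
        # exist_ok=True avoids an error when the directory already exists
        os.makedirs(parent, exist_ok=True)
    # index=False keeps the RangeIndex out of the file, so reading it back
    # does not produce an extra "Unnamed: 0" column
    df.to_csv(output_path, index=False, encoding="utf-8")
    return output_path


# Usage with a tiny made-up frame, written under a temporary directory
sample = pd.DataFrame(
    {"item_nm": ["iPhone 17 Pro"], "item_nm_alias": ["아이폰 17 프로"], "item_dmn": ["E"]}
)
with tempfile.TemporaryDirectory() as tmp:
    path = save_df(sample, os.path.join(tmp, "item_data_output", "item_df.csv"))
    round_trip = pd.read_csv(path)
print(round_trip.columns.tolist())
```

Writing with an explicit UTF-8 encoding keeps non-ASCII aliases, such as the Korean ones in the query output below, intact across the round trip.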

In [1]:
import os
import sys

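# Add the project root to sys.path so `services` is importable, and make it the
# working directory so the relative data paths below resolve. This assumes the
# notebook runs from a directory one level below the project root; re-running
# the cell moves up one more level, so run it only once per kernel session.
project_root = os.path.dirname(os.getcwd())
sys.path.insert(0, project_root)
os.chdir(project_root)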

import pandas as pd
from services.item_data_loader import ItemDataLoader



## 1. Set Parameters

Configure input and output paths here.

In [7]:
# Input parameters - modify these as needed
item_pdf_path = 'data/offer_master_data.csv'  # Input item CSV file path
output_path = 'tests/item_data_output/item_df.csv'  # Output CSV file path

print(f"Input item CSV path: {item_pdf_path}")
print(f"Output path: {output_path}")

Input item CSV path: data/offer_master_data.csv
Output path: tests/item_data_output/item_df.csv


## 2. Initialize ItemDataLoader

In [8]:
loader = ItemDataLoader(data_source='local')

## 3. Load and Prepare Items

If `load_and_prepare_items()` raises an exception (for example, because of a pandas version incompatibility), the cell falls back to running the preparation steps one at a time.
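The fallback follows a common pattern: try the one-shot pipeline, and on failure run the individual steps in order, treating some steps as optional. Below is a generic sketch of that pattern. The step functions are hypothetical stand-ins operating on plain lists, not the real `ItemDataLoader` methods:

```python
from typing import Callable, List, Tuple

# (step name, step function, whether the step may be skipped on failure)
Step = Tuple[str, Callable[[list], list], bool]


def run_with_fallback(
    full_pipeline: Callable[[list], list], steps: List[Step], data: list
) -> Tuple[list, List[str]]:
    """Run full_pipeline; if it raises, run the steps one by one.

    Optional steps that raise are skipped and recorded; a failing
    required step re-raises its exception.
    """
    skipped: List[str] = []
    try:
        return full_pipeline(data), skipped
    except Exception as exc:
        print(f"Full pipeline failed ({exc}); running steps individually")
    for name, fn, optional in steps:
        try:
            data = fn(data)
        except Exception as exc:
            if not optional:
                raise
            skipped.append(name)
            print(f"Skipping {name}: {exc}")
    return data, skipped


# Hypothetical steps standing in for normalize / expand / filter
def normalize(rows: list) -> list:
    return [r.strip() for r in rows]


def expand(rows: list) -> list:
    raise RuntimeError("unsupported query")  # simulates a version-specific failure


def drop_empty(rows: list) -> list:
    return [r for r in rows if r]


def broken_pipeline(rows: list) -> list:
    return drop_empty(expand(normalize(rows)))


result, skipped = run_with_fallback(
    broken_pipeline,
    [("normalize", normalize, False), ("expand", expand, True), ("drop_empty", drop_empty, False)],
    [" a ", "", "b"],
)
print(result, skipped)  # ['a', 'b'] ['expand']
```

Catching a broad `Exception` keeps the notebook usable across environments. However, it also hides genuine bugs, so printing the skipped step and its error, as the cell below does, is worth keeping.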

In [9]:
try:
    item_df, alias_df = loader.load_and_prepare_items(offer_data_path=item_pdf_path)
    print(f"Successfully loaded: item_df={item_df.shape}, alias_df={alias_df.shape}")
except Exception as e:
    print(f"Error with load_and_prepare_items: {e}")
    print("Loading step by step...")
    
    # Step 1: Load raw data
    raw_data = loader.load_raw_data(offer_data_path=item_pdf_path)
    print(f"Raw data: {raw_data.shape}")
    
    # Step 2: Normalize columns
    normalized_data = loader.normalize_columns(raw_data)
    print(f"Normalized: {normalized_data.shape}")
    
    # Step 3: Filter by domain
    filtered_data = loader.filter_by_domain(normalized_data)
    print(f"Filtered: {filtered_data.shape}")
    
    # Step 4: Load alias rules
    alias_pdf = loader.load_alias_rules()
    print(f"Alias rules: {alias_pdf.shape}")
    
    # Step 5: Expand build aliases, skipping this step if it fails (it uses a query that may not work with every pandas version)
    try:
        alias_pdf = loader.expand_build_aliases(alias_pdf, filtered_data)
    except Exception as e2:
        print(f"Skipping expand_build_aliases: {e2}")
    
    # Step 6: Create bidirectional aliases
    alias_pdf = loader.create_bidirectional_aliases(alias_pdf)
    print(f"After bidirectional: {alias_pdf.shape}")
    
    # Step 7: Apply cascading alias rules
    with_aliases = loader.apply_cascading_alias_rules(filtered_data, alias_pdf)
    print(f"After cascading: {with_aliases.shape}")
    
    # Step 8: Add user-defined entities (none are passed here)
    with_user_entities = loader.add_user_defined_entities(with_aliases, None)
    
    # Step 9: Add domain name column
    with_domain_names = loader.add_domain_name_column(with_user_entities)
    
    # Step 10: Filter test items
    item_df = loader.filter_test_items(with_domain_names)
    alias_df = alias_pdf
    
    print(f"Final: item_df={item_df.shape}, alias_df={alias_df.shape}")

Successfully loaded: item_df=(37763, 14), alias_df=(431, 8)
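Step 6 (`create_bidirectional_aliases`) is not documented in this notebook. Assuming it makes each alias rule usable in both directions, a pandas sketch might look like the following. It uses a toy rule table with hypothetical columns `alias_1` and `alias_2`; the real `alias_df` has 8 columns whose names are not shown here:

```python
import pandas as pd


def make_bidirectional(
    rules: pd.DataFrame, src: str = "alias_1", dst: str = "alias_2"
) -> pd.DataFrame:
    """Return the rules plus their reversed copies, without duplicate pairs.

    Hypothetical sketch: the column names are assumptions, not ItemDataLoader's schema.
    """
    # Swap the two alias columns; any other columns are copied unchanged
    reversed_rules = rules.rename(columns={src: dst, dst: src})
    # Restore the original column order before stacking the two frames
    combined = pd.concat([rules, reversed_rules[rules.columns]], ignore_index=True)
    return combined.drop_duplicates(subset=[src, dst]).reset_index(drop=True)


rules = pd.DataFrame(
    {"alias_1": ["iPhone", "Pro", "iPhone"], "alias_2": ["아이폰", "프로", "아이폰"]}
)
both = make_bidirectional(rules)
print(both)
```

Deduplicating on the alias pair after concatenation also removes repeated input rules, so three input rows (one of them a duplicate) yield four distinct directed rules.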


## 4. Query Examples

In [11]:
# Find all aliases for a specific item
item_name = 'iPhone 17 Pro'  # plain string; no f-string placeholders needed
item_aliases = item_df[item_df['item_nm'] == item_name]

print(f"Aliases for '{item_name}':")
item_aliases[['item_nm', 'item_nm_alias', 'item_dmn']].drop_duplicates()

Aliases for 'iPhone 17 Pro':


| | item_nm | item_nm_alias | item_dmn |
|---|---|---|---|
| 35215 | iPhone 17 Pro | iPhone 17 Pro | E |
| 35216 | iPhone 17 Pro | iphone 17 Pro | E |
| 35217 | iPhone 17 Pro | 아이폰 17 Pro | E |
| 35218 | iPhone 17 Pro | IPHONE 17 Pro | E |
| 35219 | iPhone 17 Pro | iPhone 17 프로 | E |
| 35220 | iPhone 17 Pro | iphone 17 프로 | E |
| 35221 | iPhone 17 Pro | 아이폰 17 프로 | E |
| 35222 | iPhone 17 Pro | IPHONE 17 프로 | E |
| 35223 | iPhone 17 Pro | iPhone 17 PRO | E |
| 35224 | iPhone 17 Pro | iPhone 17 pro | E |
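The inverse query, resolving an alias string to its canonical `item_nm`, can be written against the same columns. The sketch below uses a small made-up frame shaped like the output above. The "Example Phone" row is invented, and the case-insensitive match via `str.casefold()` is an assumption about how lookups should behave, not something the notebook specifies:

```python
import pandas as pd

# Made-up frame with the same columns as item_df (named toy_df to avoid
# overwriting the real item_df if run inside the notebook)
toy_df = pd.DataFrame(
    {
        "item_nm": ["iPhone 17 Pro", "iPhone 17 Pro", "iPhone 17 Pro", "Example Phone"],
        "item_nm_alias": ["iPhone 17 Pro", "아이폰 17 프로", "IPHONE 17 Pro", "example phone"],
        "item_dmn": ["E", "E", "E", "E"],
    }
)


def find_canonical(df: pd.DataFrame, alias: str) -> list:
    """Return the distinct item_nm values whose alias matches, ignoring case and outer spaces."""
    key = alias.strip().casefold()
    mask = df["item_nm_alias"].str.strip().str.casefold() == key
    return sorted(df.loc[mask, "item_nm"].unique().tolist())


print(find_canonical(toy_df, "iphone 17 PRO"))  # ['iPhone 17 Pro']
print(find_canonical(toy_df, "no such item"))  # []
```

Returning a list rather than a single value makes alias collisions visible: if two different items share an alias, both canonical names come back.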
