# Pikachu Dataset File Renamer

This notebook renames your image files to match the filenames referenced in their JSON annotations.

**Problem:** Roboflow exports create a mismatch:
- Actual files: `not_pikachu_00001_jpg_rf_732a2...jpg` (underscores)
- JSON references: `not_pikachu_00001_jpg.rf.732a2...jpg` (dots)

**Solution:** Rename the actual files to match the JSON references.
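
The mismatch follows a regular pattern, so the expected name can be derived from the actual one. Below is a minimal sketch, assuming the export suffix always has the form `_rf_<hash>.<ext>` with a hexadecimal hash as in the examples above. The helper name `to_json_style_name` and the regular expression are illustrative and not part of this notebook's cells:

```python
import re

# Hypothetical helper: rewrite "<name>_rf_<hash>.<ext>" as "<name>.rf.<hash>.<ext>".
# Assumes <hash> is a run of lowercase hex characters, as in the examples above.
_RF_PATTERN = re.compile(r"_rf_([0-9a-f]+)(\.[A-Za-z]+)$")

def to_json_style_name(filename: str) -> str:
    """Return the dotted name the JSON references; unchanged if the pattern is absent."""
    return _RF_PATTERN.sub(r".rf.\1\2", filename)

print(to_json_style_name("not_pikachu_00001_jpg_rf_732a2f120882b7dd84021f787e623d07.jpg"))
```

A name that already uses dots is returned unchanged, so the helper is safe to apply to every file.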

---

## Instructions:

1. **Run Cell 1** to import libraries
2. **Run Cell 2** to configure paths (update if needed)
3. **Run Cell 3** for DRY RUN (preview only - no changes)
4. **Run Cell 4** to ACTUALLY RENAME files (after reviewing preview)

---

## Cell 1: Import Libraries

In [1]:
import json
import shutil
from pathlib import Path
from typing import List, Tuple

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


## Cell 2: Configuration

Update these paths if your structure is different:

In [2]:
# Configuration
BASE_PATH = "pikachu_pics"  # Folder adjacent to this notebook
FOLDERS_TO_PROCESS = ["train", "valid", "test"]  # Subfolders to fix

# Verify the path exists
base_path = Path(BASE_PATH)
if not base_path.exists():
    print(f"❌ ERROR: Path does not exist: {base_path.absolute()}")
    print(f"\nCurrent directory: {Path.cwd()}")
    print(f"\nMake sure '{BASE_PATH}' folder is in the same directory as this notebook.")
else:
    print(f"✓ Base path found: {base_path.absolute()}")
    print(f"\nFolders to process: {FOLDERS_TO_PROCESS}")
    
    # Show folder structure
    print("\nFolder structure:")
    for folder in FOLDERS_TO_PROCESS:
        folder_path = base_path / folder
        if folder_path.exists():
            json_count = len(list(folder_path.rglob("*.json")))
            img_count = len(list(folder_path.rglob("*.jpg"))) + len(list(folder_path.rglob("*.png")))
            print(f"  ✓ {folder}/ - {img_count} images, {json_count} JSON files")
        else:
            print(f"  ⚠️  {folder}/ - NOT FOUND")

✓ Base path found: c:\Users\Tess\geoai_projects\lab-9\pikachu_pics

Folders to process: ['train', 'valid', 'test']

Folder structure:
  ✓ train/ - 280 images, 281 JSON files
  ✓ valid/ - 60 images, 61 JSON files
  ✓ test/ - 60 images, 61 JSON files


## Cell 3: DRY RUN (Preview Only)

**Run this first to see what will be renamed WITHOUT making any changes.**

In [3]:
def analyze_and_rename_files(base_path: Path, folders: List[str], dry_run: bool = True) -> Tuple[int, int, int]:
    """
    Analyze and optionally rename image files to match JSON references.
    
    Returns:
        Tuple of (files_renamed, files_skipped, errors)
    """
    total_renamed = 0
    total_skipped = 0
    total_errors = 0
    
    print("="*80)
    print(f"PIKACHU DATASET FILE RENAMER - {'DRY RUN (PREVIEW)' if dry_run else 'LIVE MODE (RENAMING)'}")
    print("="*80)
    print(f"Base path: {base_path.absolute()}")
    print("="*80)
    print()
    
    for folder_name in folders:
        folder_path = base_path / folder_name
        
        if not folder_path.exists():
            print(f"⚠️  Folder not found: {folder_name} (skipping)\n")
            continue
        
        print(f"📁 Processing: {folder_name}/")
        print("-" * 80)
        
        # Find all individual JSON files (skip COCO)
        json_files = [
            f for f in folder_path.rglob("*.json")
            if "_annotations" not in f.name.lower() and "coco" not in f.name.lower()
        ]
        
        print(f"Found {len(json_files)} annotation files")
        
        folder_renamed = 0
        folder_skipped = 0
        folder_errors = 0
        
        for json_file in json_files:
            try:
                # Read the JSON to get expected filename
                with open(json_file, 'r') as f:
                    data = json.load(f)
                
                if 'imagePath' not in data:
                    folder_skipped += 1
                    continue
                
                expected_name = data['imagePath']
                expected_path = json_file.parent / expected_name
                
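                # Find the actual image file: try the JSON's own stem first,
                # then the underscore variant described in the problem statement
                actual_path = None
                stems = [json_file.stem, json_file.stem.replace('.', '_')]
                for stem in stems:
                    for ext in ['.jpg', '.JPG', '.png', '.PNG', '.jpeg', '.JPEG']:
                        test_path = json_file.parent / f"{stem}{ext}"
                        if test_path.exists():
                            actual_path = test_path
                            break
                    if actual_path is not None:
                        break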
                
                if actual_path is None:
                    if folder_errors < 3:  # Only show first 3 errors per folder
                        print(f"  ⚠️  No image found for: {json_file.name}")
                    folder_errors += 1
                    continue
                
                # Check if renaming is needed
                if actual_path.name == expected_name:
                    folder_skipped += 1
                    continue
                
                # Check if target already exists
                if expected_path.exists() and expected_path != actual_path:
                    if folder_errors < 3:
                        print(f"  ⚠️  Target exists: {expected_name}")
                    folder_errors += 1
                    continue
                
                # Show what will be renamed (limit output)
                if folder_renamed < 5:  # Show first 5 per folder
                    print(f"\n  {'[PREVIEW]' if dry_run else '[RENAME]'}")
                    print(f"    From: {actual_path.name}")
                    print(f"    To:   {expected_name}")
                
                # Perform the rename if not dry run
                if not dry_run:
                    shutil.move(str(actual_path), str(expected_path))
                    if folder_renamed < 5:
                        print(f"    ✓ Success")
                
                folder_renamed += 1
                
            except Exception as e:
                if folder_errors < 3:
                    print(f"  ⚠️  Error processing {json_file.name}: {str(e)}")
                folder_errors += 1
        
        # Folder summary
        if folder_renamed > 5:
            print(f"\n  ... and {folder_renamed - 5} more files")
        
        print(f"\n  Summary for {folder_name}/:")
        print(f"    Files to rename:   {folder_renamed}")
        print(f"    Already correct:   {folder_skipped}")
        print(f"    Errors:            {folder_errors}")
        print()
        
        total_renamed += folder_renamed
        total_skipped += folder_skipped
        total_errors += folder_errors
    
    # Overall summary
    print("="*80)
    print("OVERALL SUMMARY:")
    print(f"  Total files to rename:   {total_renamed}")
    print(f"  Total already correct:   {total_skipped}")
    print(f"  Total errors:            {total_errors}")
    print("="*80)
    
    return total_renamed, total_skipped, total_errors


# Run DRY RUN
print("\n🔍 Running preview to see what will be renamed...\n")
renamed, skipped, errors = analyze_and_rename_files(base_path, FOLDERS_TO_PROCESS, dry_run=True)

if renamed > 0:
    print(f"\n⚠️  This was a PREVIEW - no files were changed.")
    print(f"\n✅ {renamed} files need to be renamed.")
    print(f"\n👉 Run Cell 4 below to perform the actual renaming.")
elif skipped > 0:
    print(f"\n✅ All {skipped} files already match! No renaming needed.")
else:
    print(f"\n⚠️  No files found to process.")


🔍 Running preview to see what will be renamed...

PIKACHU DATASET FILE RENAMER - DRY RUN (PREVIEW)
Base path: c:\Users\Tess\geoai_projects\lab-9\pikachu_pics

📁 Processing: train/
--------------------------------------------------------------------------------
Found 280 annotation files

  Summary for train/:
    Files to rename:   0
    Already correct:   280
    Errors:            0

📁 Processing: valid/
--------------------------------------------------------------------------------
Found 60 annotation files

  Summary for valid/:
    Files to rename:   0
    Already correct:   60
    Errors:            0

📁 Processing: test/
--------------------------------------------------------------------------------
Found 60 annotation files

  Summary for test/:
    Files to rename:   0
    Already correct:   60
    Errors:            0

OVERALL SUMMARY:
  Total files to rename:   0
  Total already correct:   400
  Total errors:            0

✅ All 400 files already match! No renaming needed.

## Cell 4: ACTUALLY RENAME FILES

**⚠️ WARNING: This will modify your files!**

Only run this after reviewing the preview above. This will:
- Rename image files to match JSON references
- Make the changes permanent
- Allow you to use the professor's original code

**Run this cell only once after verifying the dry run looks correct.**
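
Because the renames are permanent, it can be worth copying the dataset before running the cell below. This is a minimal sketch using only the standard library. The function name `backup_dataset` and the `_backup` suffix are assumptions for illustration and are not part of this notebook's cells:

```python
import shutil
from pathlib import Path

def backup_dataset(src: Path, suffix: str = "_backup") -> Path:
    """Copy the whole dataset folder next to the original and return the copy's path."""
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        # Refuse to overwrite an earlier backup
        raise FileExistsError(f"Backup already exists: {dst}")
    shutil.copytree(src, dst)  # recursive copy of all subfolders and files
    return dst

# Example (uncomment to use):
# backup_dataset(Path("pikachu_pics"))
```

If something goes wrong during renaming, the copy can be moved back in place of the original folder.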

In [4]:
print("\n⚠️  LIVE MODE - This will actually rename files!\n")
print("Press Enter to continue, or Ctrl+C to cancel...")
input()

print("\n🔧 Renaming files...\n")
renamed, skipped, errors = analyze_and_rename_files(base_path, FOLDERS_TO_PROCESS, dry_run=False)

if renamed > 0:
    print(f"\n✅ SUCCESS! {renamed} files have been renamed.")
    print(f"\n📝 Your image files now match the JSON references.")
    print(f"\n🎯 You can now use the professor's original data loading code!")
elif skipped > 0:
    print(f"\n✅ All {skipped} files already matched! No changes were needed.")
else:
    print(f"\n⚠️  No files were processed.")

if errors > 0:
    print(f"\n⚠️  Note: {errors} errors occurred. Check the output above for details.")


⚠️  LIVE MODE - This will actually rename files!

Press Enter to continue, or Ctrl+C to cancel...

🔧 Renaming files...

PIKACHU DATASET FILE RENAMER - LIVE MODE (RENAMING)
Base path: c:\Users\Tess\geoai_projects\lab-9\pikachu_pics

📁 Processing: train/
--------------------------------------------------------------------------------
Found 280 annotation files

  Summary for train/:
    Files to rename:   0
    Already correct:   280
    Errors:            0

📁 Processing: valid/
--------------------------------------------------------------------------------
Found 60 annotation files

  Summary for valid/:
    Files to rename:   0
    Already correct:   60
    Errors:            0

📁 Processing: test/
--------------------------------------------------------------------------------
Found 60 annotation files

  Summary for test/:
    Files to rename:   0
    Already correct:   60
    Errors:            0

OVERALL SUMMARY:
  Total files to rename:   0
  Total already correct:   400

## Cell 5: Verification (Optional)

Run this to verify all files now match correctly:

In [5]:
print("\n🔍 Verifying all files match...\n")
renamed, skipped, errors = analyze_and_rename_files(base_path, FOLDERS_TO_PROCESS, dry_run=True)

if renamed == 0 and skipped > 0:
    print(f"\n✅✅✅ PERFECT! All {skipped} files match correctly!")
    print(f"\n🎉 You're ready to use the professor's code!")
elif renamed > 0:
    print(f"\n⚠️  {renamed} files still need renaming. Run Cell 4 again.")
else:
    print(f"\n⚠️  No files found to verify.")


🔍 Verifying all files match...

PIKACHU DATASET FILE RENAMER - DRY RUN (PREVIEW)
Base path: c:\Users\Tess\geoai_projects\lab-9\pikachu_pics

📁 Processing: train/
--------------------------------------------------------------------------------
Found 280 annotation files

  Summary for train/:
    Files to rename:   0
    Already correct:   280
    Errors:            0

📁 Processing: valid/
--------------------------------------------------------------------------------
Found 60 annotation files

  Summary for valid/:
    Files to rename:   0
    Already correct:   60
    Errors:            0

📁 Processing: test/
--------------------------------------------------------------------------------
Found 60 annotation files

  Summary for test/:
    Files to rename:   0
    Already correct:   60
    Errors:            0

OVERALL SUMMARY:
  Total files to rename:   0
  Total already correct:   400
  Total errors:            0

✅✅✅ PERFECT! All 400 files match correctly!

🎉 You're ready to use the professor's code!

## Cell 6: Sample Check (Optional)

Examine a few specific files to see the changes:

In [6]:
print("Sample of files in train/ folder:\n")
print("="*80)

train_path = base_path / "train"
if train_path.exists():
    # Get a few JSON files
    json_files = [f for f in train_path.rglob("*.json") if "_annotations" not in f.name.lower()]
    
    for json_file in json_files[:3]:  # Show first 3
        # Read JSON
        with open(json_file, 'r') as f:
            data = json.load(f)
        
        expected = data.get('imagePath', 'N/A')
        
        # Check that the referenced image exists next to the JSON file
        actual_exists = expected != 'N/A' and (json_file.parent / expected).exists()
        
        status = "✅ Match" if actual_exists else "❌ Missing"
        
        print(f"\nJSON: {json_file.name}")
        print(f"  Expected image: {expected}")
        print(f"  Status: {status}")
    
    print("\n" + "="*80)
else:
    print("Train folder not found.")

Sample of files in train/ folder:


JSON: not_pikachu_00001_jpg.rf.732a2f120882b7dd84021f787e623d07.json
  Expected image: not_pikachu_00001_jpg.rf.732a2f120882b7dd84021f787e623d07.jpg
  Status: ✅ Match

JSON: not_pikachu_00002_jpg.rf.11323de1cae43adbc4a7a794673757a3.json
  Expected image: not_pikachu_00002_jpg.rf.11323de1cae43adbc4a7a794673757a3.jpg
  Status: ✅ Match

JSON: not_pikachu_00003_jpg.rf.2e98c2b067a27fe13893ec406ef5ce53.json
  Expected image: not_pikachu_00003_jpg.rf.2e98c2b067a27fe13893ec406ef5ce53.jpg
  Status: ✅ Match



---

## 🎯 Next Steps

After successfully renaming your files, you can now use the professor's original data loading code:

```python
from glob import glob
from pathlib import Path

dataset_path = Path('pikachu_pics/train')

# Get a list of image files in the dataset
# (get_img_files comes from the professor's code; it is not defined in this notebook)
img_file_paths = get_img_files(dataset_path)

# Get a list of JSON files in the dataset
annotation_file_paths = list(dataset_path.glob('*.json'))

# Create a dictionary that maps file names to file paths
img_dict = {file.stem : file for file in img_file_paths}

# This will now work! ✓
```
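
One way to confirm that this lookup works is to check that every annotation stem appears among the image stems. The sketch below assumes images and JSON files sit side by side in one folder. `find_unmatched` and `IMAGE_SUFFIXES` are hypothetical names, and the sketch lists images with `Path.glob` rather than the professor's `get_img_files`:

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def find_unmatched(dataset_path: Path) -> List[str]:
    """Return stems of JSON files that have no image with the same stem."""
    img_stems = {
        p.stem for p in dataset_path.glob("*")
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    json_stems = [
        p.stem for p in dataset_path.glob("*.json")
        if "_annotations" not in p.name.lower()  # skip the COCO annotations file
    ]
    return sorted(s for s in json_stems if s not in img_stems)

missing = find_unmatched(Path("pikachu_pics/train"))
print(f"{len(missing)} annotation files without a matching image")
```

An empty result means every key the annotations need is present in `img_dict`.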

---

## 📝 What Changed?

**Before:**
```
File:         not_pikachu_00001_jpg_rf_732a2f120882b7dd84021f787e623d07.jpg  (underscores)
JSON expects: not_pikachu_00001_jpg.rf.732a2f120882b7dd84021f787e623d07.jpg  (dots)
Result:       ❌ Mismatch
```

**After:**
```
File:         not_pikachu_00001_jpg.rf.732a2f120882b7dd84021f787e623d07.jpg  (dots)
JSON expects: not_pikachu_00001_jpg.rf.732a2f120882b7dd84021f787e623d07.jpg  (dots)
Result:       ✅ Perfect match!
```

---