# **Parsing App Screenshots**
After I failed to scrape the PAX app directly, I decided to take ~150 screenshots of each of the 

# Setup
The cells below set up the rest of the notebook.

I'll start by configuring the kernel that's running this notebook:

In [1]:
# Change the cwd
%cd ..

# Enable the autoreload module
%load_ext autoreload
%autoreload 2

# Load the environment variables
from dotenv import load_dotenv
load_dotenv(override=True)

/Users/thubbard/Documents/personal/programming/pax-pal-2025


True

Next, I'm going to import the necessary modules:

In [2]:
# General imports
import random

# Third-party imports
import pandas as pd

# Defining Methods

In [3]:
import base64
import os
import json
import time
from openai import OpenAI
from typing import List, Optional, Dict, Any
from pydantic import BaseModel

client = OpenAI()


# Function to encode the image
def encode_image(image_path: str) -> str:
    """Encode an image file to base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Define a Pydantic model for the game data
class GameMetadata(BaseModel):
    title: str
    department: str
    booth_number: Optional[int] = None
    description: Optional[str] = None
    genres: Optional[List[str]] = None
    developer: Optional[str] = None
    release_year: Optional[int] = None


def parse_game_screenshot(image_path: str) -> Dict[str, Any]:
    """
    Parse a screenshot of a game from the PAX app and extract metadata.

    Args:
        image_path: Path to the screenshot image

    Returns:
        Dictionary containing extracted game metadata
    """
    # Encode the image
    base64_image = encode_image(image_path)

    # Create the developer message with instructions
    developer_message = """
# Role
You're a screenshot-parsing digital assistant who responds in JSON. 

# Context
Users are trying to extract information about games appearing at an upcoming conference. They've taken screenshots of the conference's app, and each screenshot ought to have some metadata about a game in a modal that's vertically centered in the screen. 

The modal can contain the following attributes: 

- `title` (always present; displayed as a header in the modal) 
- `department` (always present) 
- `booth_number` (optional)
- `developer` (optional)
- `release_year` (optional)
- `description` (optional)
- `genres` (optional)

# Task
A user will upload a screenshot. Extract all of the game metadata that you can identify within the modal. Respond with a JSON object containing the following attributes:

- title (str)
- department (str) 
- booth_number (Optional[int]) 
- description (Optional[str])
- genres (Optional[List[str]])
- developer (Optional[str])
- release_year (Optional[int])

If a piece of information is not present, set that field to `null`.
"""

    # Create the completion using structured output
    completion = client.beta.chat.completions.parse(
        model="gpt-4.1-mini",
        messages=[
            {
                "role": "developer",
                "content": developer_message,
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            },
        ],
        response_format=GameMetadata,
    )

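    # Get the parsed data as a dictionary (`parsed` is None if the model refused)
    parsed = completion.choices[0].message.parsed
    if parsed is None:
        raise ValueError(f"No structured output returned for {image_path}")
    game_data = parsed.model_dump()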

    # Sleep for 3 seconds between requests to avoid hammering the API
    time.sleep(3)

    return game_data
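
`parse_game_screenshot` hard-codes `image/jpeg` in the data URL, which is fine here because every screenshot is a `.jpg`. If other formats were ever mixed in, the MIME type could be guessed from the file extension instead. The following is a minimal sketch under that assumption; `to_data_url` is a hypothetical helper and is not used anywhere else in this notebook.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Return a base64 data URL for an image (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to the default
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime

    # Read the raw bytes and base64-encode them, as encode_image does
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string could then be passed as the `url` value of the `image_url` message part in place of the f-string.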

# Parsing the Screenshots


In [4]:
from pathlib import Path
from tqdm import tqdm

# Path to the screenshots directory
screenshots_dir = Path("data/app_screenshots")

# Dictionary to store results
all_results = {}

# Find all .jpg files in the directory (sorted for a reproducible order)
jpg_files = sorted(f.name for f in screenshots_dir.iterdir() if f.suffix.lower() == ".jpg")

print(f"Found {len(jpg_files)} screenshots to process")

# Process each file with tqdm progress bar
for jpg_file in tqdm(jpg_files, desc="Processing screenshots", unit="file"):
    file_path = screenshots_dir / jpg_file

    try:
        # Parse the screenshot
        result = parse_game_screenshot(str(file_path))

        # Store the result with the filename as key
        all_results[jpg_file] = result

    except Exception as e:
        tqdm.write(f"Error processing {jpg_file}: {str(e)}")

print(f"Processed {len(all_results)} screenshots successfully")
all_game_data = all_results

Found 152 screenshots to process


Processing screenshots: 100%|██████████| 152/152 [22:29<00:00,  8.88s/file]

Processed 152 screenshots successfully
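
All 152 screenshots parsed on the first try in this run, but the loop above only logs a failure and moves on, so a transient API error would silently drop that screenshot. One option is to retry failed calls with exponential backoff. The following is a rough sketch of that idea; `with_retries`, `max_attempts`, and `base_delay` are hypothetical names and defaults, not part of the original notebook.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T], max_attempts: int = 3, base_delay: float = 2.0) -> T:
    """Call func(), retrying with exponential backoff (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```

Inside the loop, the call would become `result = with_retries(lambda: parse_game_screenshot(str(file_path)))`, with the existing `try`/`except` left in place to log anything that still fails after the last attempt.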

Next, I'll create a `DataFrame` from the results and save it to a JSON file.

In [5]:
# Create a DataFrame from the results
game_data_df = pd.DataFrame.from_records(
    [
        {
            "filename": filename,
            **game_data,
        }
        for filename, game_data in all_game_data.items()
    ]
)

# Save the dataframe to a .json file
game_data_df.to_json(
    "data/games_from_app.json",
    orient="records",
    indent=2,
)

# Reloading the Data
To reload the saved data later without re-parsing the screenshots, I can run the cell below:

In [11]:
# Reload the game_data_df
game_data_df = pd.read_json("data/games_from_app.json", orient="records")
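
Since most fields in `GameMetadata` are optional, a quick sanity check after reloading is to see how often each field was actually extracted. The sketch below works on plain records (a list of dicts, the shape written by `orient="records"`); `field_coverage` is a hypothetical helper, and the sample records are made up for illustration rather than taken from the real output.

```python
from collections import Counter
from typing import Any, Dict, List


def field_coverage(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Fraction of records with a non-null value for each field (hypothetical helper)."""
    if not records:
        return {}

    counts: Counter = Counter()
    for record in records:
        for field, value in record.items():
            # Adds 1 when a value was extracted, 0 when the field is null
            counts[field] += value is not None

    return {field: counts[field] / len(records) for field in counts}


# Made-up records for illustration only; not real output from the notebook
sample = [
    {"title": "Game A", "department": "Expo Hall", "booth_number": 101},
    {"title": "Game B", "department": "Tabletop", "booth_number": None},
]
print(field_coverage(sample))
```

To run it on the saved file, the records could be loaded with the standard `json` module (`json.load` on `data/games_from_app.json`) and passed straight to `field_coverage`.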