
New notebook + 1 video + 1 image file #1700

Open
wants to merge 3 commits into main

Conversation

anurag-openai

Summary

Briefly describe the changes and the goal of this PR. Make sure the PR title summarizes the changes effectively.

This PR introduces a detailed notebook demonstrating how to use GPT-4o's vision capabilities to analyze video frames from a manufacturing warehouse and extract structured operational insights. It provides step-by-step instructions and best practices for bounding boxes, structured data extraction, confidence scoring, and cost considerations, so readers can effectively implement an AI-driven monitoring system.

Motivation

Why are these changes necessary? How do they improve the cookbook?

Warehouse managers often lack real-time visibility into their operations, relying instead on delayed or manual reporting, which leads to reactive rather than proactive decision-making. This contribution addresses these issues by using GPT-4o's vision capabilities to analyze video footage, enabling rapid identification of safety concerns, monitoring of space utilization, and detection of operational inefficiencies in near real time. This significantly speeds up decision-making, enhances safety compliance, and reduces operational inefficiencies.


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • [x] I have conducted a self-review of my content based on the contribution guidelines:
    • [x] Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • [x] Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • [x] Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • [x] Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

Contributor
@danial-openai left a comment

nit: "Without live real-time tracking" - maybe drop real-time

@danial-openai left a comment

style nit: add period '.' to end of every line; some lines are missing

@danial-openai left a comment

nit: "Using computer vision to analyze warehouse videos and provide real-time operational insights" - Highlight the product e.g. "Using GPT-4o Vision capabilities..."

@danial-openai left a comment

nit: "Simple Workflow:" - I would frame this a bit differently e.g. "In this cookbook, we will leverage GPT-4o Vision capabilities to analyze warehouse videos and provide operational insights. Here is our proposed approach: ..."

@danial-openai left a comment

video = cv2.VideoCapture("/Users/anurag/github/openai-cookbook/openai-cookbook/examples/data/manufacturing/warehouse_operations.mp4")

Use the relative path to the video you uploaded to GitHub if you want readers to use it directly, e.g. "data/manufacturing/warehouse_operations.mp4", or generalize it with a placeholder, e.g. "<PATH_TO_YOUR_VIDEO>".

@danial-openai left a comment

prompting nit: "based on the MfgEvent model"

Would change this to something like "Your task is to analyze each frame and return a response in the specified format..." for more clarity

@danial-openai left a comment

In the pandas df it would be nice to format so that the explanation isn't cut off, e.g. add pd.set_option("display.max_colwidth", None) before you display the DataFrame.
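A minimal sketch of the suggestion; the column names and rows below are illustrative, not taken from the notebook:

```python
import pandas as pd

# Illustrative frame-analysis results with a long free-text column.
df = pd.DataFrame({
    "frame": [1, 2],
    "explanation": [
        "Five workers detected inside the marked zone with high confidence.",
        "Forklift partially occludes one worker near the loading dock.",
    ],
})

# By default pandas truncates long cell values with "..."; setting the
# option to None before displaying shows the full explanation text.
pd.set_option("display.max_colwidth", None)
print(df.to_string(index=False))
```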

@danial-openai left a comment

"Step 5: 💸 Cost Considerations & Best Practices" - add a bit more description here. What are Resolution and Detail Mode? How do you set these parameters?
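For context, image resolution/detail is set per image via the "detail" field of a Chat Completions image input; the payload below is a sketch (the prompt text and the <BASE64_FRAME> placeholder are illustrative):

```python
# Sketch of an image message for the Chat Completions API. The "detail"
# field controls resolution and token usage: "low" caps the image at a
# small fixed token budget, "high" tiles the image at higher resolution
# (more tokens), and "auto" lets the API decide. <BASE64_FRAME> is a
# placeholder for a base64-encoded video frame.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Analyze this warehouse frame."},
        {
            "type": "image_url",
            "image_url": {
                "url": "data:image/jpeg;base64,<BASE64_FRAME>",
                "detail": "low",  # "low" | "high" | "auto"
            },
        },
    ],
}
```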

@danial-openai left a comment

In your cost estimation, also provide more description upfront, including your assumptions, e.g. "Assuming that we take 1 image per minute, every hour of the day, for 365 days in a year..." etc.

It also looks like your printed output is duplicated:
Total annual cost: $1451.97
Token cost per image: 1105
Annual token cost (1 image per minute): 1451.97
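For reference, the printed total is reproducible once the assumptions are spelled out; a sketch follows (the per-million-token price is an assumption, not taken from the PR):

```python
# Assumptions (stated upfront, as the review suggests): one frame per
# minute, every hour of every day, for 365 days; 1105 tokens per image
# (the notebook's printed figure); and an assumed $2.50 per 1M input tokens.
TOKENS_PER_IMAGE = 1105
IMAGES_PER_YEAR = 60 * 24 * 365          # 525,600 frames per year
PRICE_PER_MILLION_TOKENS = 2.50          # assumed input price, USD

annual_tokens = TOKENS_PER_IMAGE * IMAGES_PER_YEAR
annual_cost = annual_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"Total annual cost: ${annual_cost:.2f}")
```

Under these assumptions the result matches the notebook's printed $1451.97, which suggests that stating them explicitly would make the estimate easy for readers to verify.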

@danial-openai left a comment

In your analysis, I would highlight what works well, e.g. we correctly identify the 5 workers in the first two frames with high confidence; in the last frame we miss a worker, but we also have lower confidence; GPT-4o is really good at respecting bounding boxes and never counts workers outside the bounding box, etc.

Would be good to see some commentary on the results

@danial-openai left a comment

"Implementing advanced function calling to streamline interactions between YOLO detections and GPT-4o analysis." - What are YOLO detections?

"Exploring real-time Vision APIs, currently under development, to achieve true real-time insights and faster decision-making." - Not sure what this is referring to? Note this is a public resource!
