-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New notebook + 1 video + 1 image file #1700
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Without live real-time tracking" - maybe drop real-time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: add period '.' to end of every line; some lines are missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Using computer vision to analyze warehouse videos and provide real-time operational insights" - Highlight the product e.g. "Using GPT-4o Vision capabilities..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Simple Workflow:" - I would frame this a bit differently e.g. "In this cookbook, we will leverage GPT-4o Vision capabilities to analyze warehouse videos and provide operational insights. Here is our proposed approach: ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
video = cv2.VideoCapture("/Users/anurag/github/openai-cookbook/openai-cookbook/examples/data/manufacturing/warehouse_operations.mp4")
Use the relative path to the video that you uploaded on github if you want them to use that directly e.g. "data/manufacturing/warehouse_operations.mp4" or just generalize this with a placeholder e.g. "<PATH_TO_YOUR_VIDEO>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prompting nit: "based on the MfgEvent
model"
Would change this to something like "Your task is to analyze each frame and return a response in the specified format..." for more clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the pandas df it would be nice to format so that the explanation isn't cut off e.g. add this before you display pd.set_option("display.max_colwidth", None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Step 5: 💸 Cost Considerations & Best Practices" - add a bit more description here. What are Resolution and Detail Mode? How do you set these parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your cost estimation, also provide more description upfront including your assumptions e.g. "Assuming that we take 1 image per minute, every hour of the day, for 365 days in a year..." etc...
It also looks like your printed output is duplicated:
Total annual cost: $1451.97
Token cost per image: 1105
Annual token cost (1 image per minute): 1451.97
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your analysis, I would highlight what works well e.g. we correctly identify the 5 workers in the first two frames with high confidence... in the last frame we miss a worker, but we also have lower confidence... gpt-4o is really good at respecting bounding boxes and it never counts workers outside the bounding box etc...
Would be good to see some commentary on the results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Implementing advanced function calling to streamline interactions between YOLO detections and GPT-4o analysis." - What are YOLO detections?
"Exploring real-time Vision APIs, currently under development, to achieve true real-time insights and faster decision-making." - Not sure what this is referring to? Note this is a public resource!
Summary
Briefly describe the changes and the goal of this PR. Make sure the PR title summarizes the changes effectively.
This PR introduces a detailed notebook demonstrating how to leverage GPT-4o's vision capabilities for analyzing video frames to extract structured operational insights in a manufacturing warehouse. It provides step-by-step instructions, best practices for bounding boxes, structured data extraction, confidence scoring, and cost considerations to effectively implement an AI-driven monitoring system.
Motivation
Why are these changes necessary? How do they improve the cookbook?
Warehouse managers often lack real-time visibility into their operations, relying instead on delayed or manual reporting methods, leading to reactive rather than proactive decision-making. This contribution addresses these issues by using GPT-4o's vision capabilities to analyze video footage, enabling rapid identification of safety concerns, monitoring space utilization, and detecting operational inefficiencies in near-real-time. This significantly improves decision-making speed, enhances safety compliance, and reduces operational inefficiencies.
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.