YOLOv8 pose-estimation model #2028
Comments
👋 Hello @AshishRaghani23, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users, where you can find many Python and CLI usage examples and where many of the most common questions may already be answered. If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it. If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Install: pip install the ultralytics package with `pip install ultralytics`.

Environments: YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status: If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Hi @AshishRaghani23, to get the keypoints of a person and their respective x, y coordinates you can try the following changes:
```python
# Run YOLOv8 inference on the frame
output = model.forward(frame)
pose_tensor = output[:, model.model.names.index('pose')]
keypoint_data = pose_tensor[0].cpu().detach().numpy()
```
@glenn-jocher Thanks for the response. I have another doubt: this YOLOv8 human pose-estimation model has 17 keypoints, not 19 keypoints, right?
Yes @AshishRaghani23, you are correct. The YOLOv8 human pose estimation model detects 17 keypoints: 5 keypoints for the head (nose, eyes, ears), 6 keypoints for the arms (shoulders, elbows, wrists), and 6 keypoints for the legs (hips, knees, ankles). My apologies for the mistake in my previous message. Let me know if you have any further questions or concerns.
@glenn-jocher `output = model.forward(frame)` — but the model doesn't have any method called `forward`.
I apologize for the confusion earlier, @AshishRaghani23. YOLOv8 does not have a method called `forward`; you can run inference by calling the model directly, e.g. `results = model(frame)`.
@glenn-jocher # Run YOLOv8 inference on the frame
Understood, @AshishRaghani23. If you would like to modify the bounding boxes and detected keypoints of a person in each frame, you can access them through the `results` variable that you already have. `results` is a list, and each element of the list contains information about the detected objects in a single frame. You can access the bounding boxes and keypoints of each detected person in the frame by iterating through this list. Once you have the bounding boxes and keypoints, you can perform any necessary modifications on them. To implement these modifications in your demo.py file, simply modify the code that iterates through the `results` list, perform the modifications that you need, and then move on to the next frame. If you have any additional questions, let me know.
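As a rough sketch of that per-frame loop, using plain dicts and lists as a stand-in for the Ultralytics Results objects (the field names and the modification step here are illustrative, not the actual API):

```python
def collect_detections(results):
    """Gather (box, keypoints) pairs per frame from a results-like list.

    `results` is modeled as a list of dicts with 'boxes' and 'keypoints'
    lists, standing in for the per-frame Results objects described above.
    """
    per_frame = []
    for frame_result in results:
        # Pair each person's box with their keypoints for this frame
        people = list(zip(frame_result['boxes'], frame_result['keypoints']))
        # ...modify boxes/keypoints for this frame here, if needed...
        per_frame.append(people)
    return per_frame
```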
@glenn-jocher Thank you for the quick reply; my script is working fine now. There is one issue: whenever I run my script it saves result images in the run/detect/train folder, which I don't need because I'm running the script on videos and already saving the output videos. This also runs on every frame, saving an image each time and slowing my script down. So, how do I stop these images from being saved from my script?
@AshishRaghani23 pass `save=False`, i.e. `model(frame, save=False)`. But logically the script should not save anything by default, even if you don't pass this argument.
@glenn-jocher @Laughing-q I'm using this script, and if I put `save=False` in `results1 = predictor(frame, save=False)` it gives me an error that the argument is not matched.

```python
import cv2

class PosePredictor(DetectionPredictor):
    ...

# Load the Yolov8 model
model = YOLO('yolov8n-pose.pt')

# Open the video file
video_path = "dance.mp4"

# Create a pose predictor object
predictor = PosePredictor(overrides=dict(model='yolov8n-pose.pt'))

# Loop through the video frames
while cap.isOpened():
    ...

# Release the video capture object and close the display window
cap.release()
```
@AshishRaghani23 I don't see any error in my test; it works well for me.
@Laughing-q Thank you, I found my mistake; the error is solved now. Again, thank you for helping out.
@AshishRaghani23 sure! :) Then I'm closing this issue; please feel free to reopen it if you have a related issue.
I want to extract some specific keypoints (and their locations) from this results list variable. Is there an overview of the structure of this list? And is there a way to get a specific keypoint (like left shoulder) via e.g. a function call?
Yes, @662781, the `results` variable contains the structure you're asking about. To extract the location of a specific keypoint (e.g. the left shoulder) from the results, you can index into the keypoints list of each detection.
Thank you for the quick reply! So if I understand correctly, there is no convenient way to check which keypoint corresponds to which body part? For example, at the moment I use the "yolov8n-pose.pt" model and let the keypoints and boxes get drawn with the `plot()` function.
Yes, that is correct, @662781. There is no built-in way to directly determine which keypoint corresponds to which body part. The ordering and naming of the keypoints may vary depending on the specific model architecture used. However, you can often determine the mapping between the keypoint indices and the corresponding body parts by inspecting the output of the model and visually matching the detected keypoints with their associated body parts. Alternatively, you may be able to find the corresponding keypoint indices for a specific model architecture online by checking the documentation or source code for that model.
Please assist, this model prints out something like this in the terminal window. But how can I capture this output to get the number of persons and save it to a file? I tried using stdout, but I'm running VSCode on Windows 10 and cannot work out how to capture it.

```python
import cv2

print('Starting...')

# Load the YOLOv8 model
#model = YOLO('yolov8n.pt')

# Open the video file
#video_path = "D:\Downloads\people.mp4"

# Define the desired size of the output image
width = 1400

# Initialize timer and frame counter
start_time = time.time()

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')

# Initialize results variable
results = None

# Loop through the video frames
while cap.isOpened():
    ...

# Release the video capture object and close the display window
cap.release()
```
@662781 hello, you can get the specific keypoints of a specific body part. In the YOLOv8 pose-estimation model, all 17 keypoints are predefined for particular body parts. For example, the left arm/hand has keypoint numbers 5, 7 and 9, and the right arm/hand has keypoints 6, 8 and 10. So if you want to print particular keypoints, just run a loop over the keypoints and select the ones you need by index.
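For reference, the indices above line up with the standard COCO 17-keypoint ordering. Assuming yolov8n-pose.pt follows it (worth verifying against your own outputs, as suggested earlier in the thread), a lookup table and a selector could look like this:

```python
# Standard COCO 17-keypoint ordering (an assumption for yolov8n-pose.pt;
# indices 5, 7, 9 for the left arm match the mapping described above).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

LEFT_ARM = [5, 7, 9]    # shoulder, elbow, wrist
RIGHT_ARM = [6, 8, 10]

def select_keypoints(person_keypoints, indices):
    """Pick particular keypoints (e.g. the left arm) from one person's
    list of 17 (x, y) coordinates."""
    return [person_keypoints[i] for i in indices]
```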
@ocnuybear Hello, the output you want is already being saved: a .txt file, plus a results folder containing images with detections. Also, if you print `results` you can get the tensor values of the detected person class, their pose keypoints, and the coordinate values.
@AshishRaghani23 The only thing it is saving is the captured output image with the estimated pose drawn on it. The best thing I came up with so far is to try/except the indexing in the code above, so if no one is detected it says 0, otherwise 1. But I still need to get the number of persons detected. Here is the code so far:

```python
import cv2

print('Starting...')

# Load the YOLOv8 model
#model = YOLO('yolov8n.pt')

# Open the video file
#video_path = "D:\Downloads\people.mp4"

# Define the desired size of the output image
width = 1400

# Initialize timer and frame counter
start_time = time.time()

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')

# Initialize results variable
results = None

# Loop through the video frames
while cap.isOpened():
    ...

# Release the video capture object and close the display window
cap.release()
```
@ocnuybear one way to determine the number of persons detected by the model is to check for the presence of bounding boxes and associated keypoints in the `results` object.
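A minimal sketch of that counting idea, using a plain list as a stand-in for the detected boxes (with the real API the boxes come from the per-frame results, e.g. something like `len(results[0].boxes)`). The file-logging helper addresses the save-to-file part of the question; its names are illustrative:

```python
def count_persons(frame_boxes):
    """Each detected person has one bounding box, so the person count
    is simply the number of boxes in the frame."""
    return len(frame_boxes)

def log_count(path, count):
    """Append one frame's person count to a text file."""
    with open(path, "a") as f:
        f.write(f"{count}\n")
```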
Is there an overview of which body part corresponds to which index number in the keypoints list? You say 5, 7 & 9 correspond to the left hand; where would I find an overview of this for the yolov8n-pose.pt model?
@glenn-jocher thank you, this is one BIG study. I'm completely new at both computer vision and Python and have gone through the Ultralytics Jupyter openvino_notebooks, but there is so much detail to wrap my head around. It is lots of fun, it just takes time :)
@glenn-jocher @662781 @ocnuybear this is what I get from using the YOLOv8 pose-estimation model. I can get every keypoint and its coordinates frame by frame. Also, with modification I can get particular keypoints only.
@AshishRaghani23 Right, this is exactly what I'm looking for! Thanks! Could you also provide a code sample for this? Because I don't know how you got this result.
@glenn-jocher I used VSCode to debug the results[0].keypoints[0] code above and could find the string person, but it seems different from the ones used by the standard yolov8n.pt model that looks for general objects. I looked at both scenarios, when a person was picked up and when not, and could not manually find any person count in the debugger after setting a breakpoint at this code, although there are so many variables in there to go through, maybe I missed it :)
@AshishRaghani23 So it seems like up to 17 points per person, so if there are more than 17, that means more than one person, maybe?
@AshishRaghani23, The YOLOv8 pose-estimation model detects 17 keypoints for each person in an image or video frame. If there are more than 17 keypoints detected, that would suggest the presence of multiple people in the scene. However, the exact number of people cannot be determined based solely on the number of keypoints. It would be necessary to apply additional criteria (such as the number of detected bounding boxes or the context of the scene) to accurately determine the number of people.
@AshishRaghani23 Could you please provide the code that you used for this? Thanks!
@AshishRaghani23 asking for code snippets isn't exactly within our support guidelines, as we aim to provide professional written support instead. However, to visualize the keypoint information for the YOLOv8 pose-estimation model, you can iterate through the keypoints list of the detected results and use the `plot` function to visualize the location of each keypoint. Additionally, you can access the x and y coordinates of each keypoint by indexing into the keypoints list for a particular keypoint and selecting the `xy` property.
I visualized the numbers of the keypoints this way:
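One minimal way to number the keypoints is to compute an (index, x, y) label for each and then draw it on the frame with `cv2.putText`. The helper name and sample values below are illustrative, not taken from the original snippet:

```python
def keypoint_labels(keypoints):
    """Return (index, x, y) tuples suitable for drawing each keypoint's
    number, e.g. with cv2.putText(frame, str(i), (x, y), ...)."""
    return [(i, int(x), int(y)) for i, (x, y) in enumerate(keypoints)]

# Illustrative keypoints for one person (real values come from the model)
print(keypoint_labels([(10.5, 20.2), (30.0, 40.9)]))  # [(0, 10, 20), (1, 30, 40)]
```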
I don't really understand why you wouldn't give the community a code snippet, so hopefully someone can use my code to their advantage. Have a nice day!
Thank you for sharing your code snippet @662781. Your approach is a good way to visualize the numbered keypoints detected by the YOLOv8 pose-estimation model. Note that the keypoint indices and coordinates may change depending on the specific keypoint labeling scheme used by the model, so refer to the labeling-scheme documentation for the specific model being used to ensure accurate labeling.
Hello, I noticed that the Yolov8n-pose model returns bounding boxes along with the pose results. Is there a way to run the pose model independently, or is it dependent on the detection results? Thanks in advance!
Hello @EddSB, in the YOLOv8 pose-estimation model, person detection happens first: after a person's bounding box is detected, the keypoints are estimated and visualized on that person, so pose is dependent on the detection results. But if you want to remove the detection bounding box from the output, you can do that.
Hello @AshishRaghani23, Thank you for your question. The YOLOv8 pose-estimation model does rely on the detection results to identify people and their corresponding keypoints. The model first detects the people in the scene and then estimates their keypoints based on the corresponding bounding boxes. However, if you want to remove the bounding box detection and only focus on the keypoint estimation, you can achieve this by modifying the YOLOv8 code. Specifically, you can modify the inference code to only return the keypoint information and not the bounding box information. I hope this information helps. Please let me know if you have any further questions or concerns. Best regards.
```python
import cv2

# Load the Yolov8 model
...

while True:
    ...
```
Hello @AshishRaghani23, I copied the code and I get this bug.
@hoanglmv hello, The error you're encountering is due to trying to convert a Keypoints object into an integer. The keypoint values you're trying to access are stored within the Keypoints object, not the object itself. To access the x and y coordinates of each keypoint, you need to index into the Keypoints object, which can be done through its `xy` attribute. After getting the x and y coordinates, you can convert them into integers; your print statement will then print the keypoint index along with the x and y coordinates of each keypoint as integer values. Let me know if you need further assistance.
@glenn-jocher thanks for your reply. Now I see the message you replied with, and it looks like I'm getting a few errors from this line: `print(f"Keypoint {idx}: ({int(kpt.xy[0])}, {int(kpt.xy[1])})")`
@hoanglmv hello, The error you're encountering stems from trying to convert a tensor that has more than one element into a Python scalar. `int()` only works on a tensor holding a single value. In the print statement you posted, you're attempting to convert `kpt.xy[0]` and `kpt.xy[1]`, which each contain more than one element. A potential solution is to ensure that you are accessing individual elements within the tensor before converting them. I hope that helps clarify the issue somewhat. If you need further assistance, please provide more details about your tensors and what you're trying to achieve. Best regards.
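The same failure mode can be reproduced with plain nested lists standing in for the `kpt.xy` tensor (one row holding several (x, y) keypoints; the values here are made up for illustration):

```python
# Stand-in for kpt.xy: one row of several (x, y) keypoints.
xy = [[[12.7, 34.2], [56.1, 78.9]]]

# int(xy[0]) fails: xy[0] holds more than one element, not a single scalar.
try:
    int(xy[0])
except TypeError as err:
    print("cannot convert:", err)

# Index down to individual scalars before converting:
for idx, (x, y) in enumerate(xy[0]):
    print(f"Keypoint {idx}: ({int(x)}, {int(y)})")
# Keypoint 0: (12, 34)
# Keypoint 1: (56, 78)
```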
Hello
@Soheil-Nrf hello, To extract the coordinates of the pose keypoints of each person detected in an image using YOLOv8, you will first need to perform inference on your input image using the YOLOv8 model. Upon performing inference, the 'pred' tensor that you get contains the bounding box and pose information for each detected object in the image. The bounding box information is at 'pred[:,:4]' while the pose keypoints' information begins at 'pred[:,6:]'. Each keypoint is represented by a pair of x,y-coordinates relative to the top-left corner of the detected person's bounding box. To extract this information, you need to know the geometric layout (or topology) of these keypoints as defined by the model. Each person detection would have its own 'pred' tensor and thus, its own set of pose keypoints. For defining conditions for the position of each keypoint, you may need to establish certain rules or thresholds based on the relative positions of keypoints. For example, you may be interested in whether a particular keypoint is above, below or to the right of another keypoint. However, defining these conditions will largely depend on your specific use case or application. I hope this helps and please don't hesitate to offer more details about your ultimate objective if you require further assistance. Best regards.
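Under the flat layout described above (4 box values, then confidence and class, then flattened keypoint pairs from index 6 onward), splitting one prediction row could be sketched as follows. This is pure Python with illustrative values, not the actual ultralytics API:

```python
def split_prediction(pred_row):
    """Split one flat prediction row into its box and its (x, y)
    keypoint pairs, per the layout described above."""
    box = pred_row[:4]           # x1, y1, x2, y2
    flat = pred_row[6:]          # flattened keypoint coordinates
    pairs = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    return box, pairs
```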
Hello,
@aratamakino hello, No need to apologize, your question is clear. In the YOLOv8 pose estimation model, output for each person detected by the model includes identified keypoints for various points on the body, which include ankle joints. You should be able to extract this data from the prediction output tensor. The model provides predictions starting with bounding box detections followed by keypoints detections. Specifically, the pose keypoints information begins at 'pred[:,6:]' in the tensor. Each keypoint is represented by a pair of x,y-coordinates. To get ankle joint keypoints, you would first need to know the layout or topology of these keypoints as defined by the model. The detected keypoints are ordered as per this topology. You will need to identify the indices for the left and right ankles within this set of keypoints and use those to pull out their coordinates. Should the indices not be documented, you may need to perform some trial and error testing. You could use visualization tools to plot each keypoint one-by-one and identify which indices correspond to the ankle joints. We hope this information assists you with your project. If you have additional questions, feel free to ask! Best regards.
@pillai-karthik hello, Thank you for sharing your code snippet. Here you're using your model to make predictions on an input image and then extracting the keypoints from the first detection in the results. The `tolist()` method you're using converts the keypoints tensor into a nested Python list. Afterwards, you're iterating through each keypoint and converting its x and y coordinates into integers using `int()`. Each element `kp` in `keypoints` is indeed a tensor representing a keypoint's (x, y) coordinates. If you experience any issues with this piece of code or have further questions, do not hesitate to ask. Your contribution helps enhance YOLOv8 and its community. Best regards.
Is there a way to output only the coordinates of both ankles? I would appreciate it if you could let me know.
@aratamakino hello, In the YOLOv8 pose estimation model, the model's output includes keypoint detections for each detected person, which would also include information about the ankles. This keypoint data is accessible from the prediction tensor returned by the model, typically starting at 'pred[:,6:]'. Each keypoint is represented by a pair of x,y-coordinates. To detect only the ankles or to get their coordinates, you would need to know the layout of these keypoints (often called topology) as defined by the model. The keypoints are ordered as per this topology. You will need to find out which indices correspond to the left and right ankles within this set of keypoints and fetch their coordinates. If the keypoints topology isn't documented, you might have to conduct some exploratory work like making some visualizations and plot each keypoint individually to identify which ones correspond to the ankles. To get detections for all humans in an image, you will have to iterate over all the detections made by the model, and for each detection repeat the above process to extract the ankle keypoint coordinates. Please ensure to preprocess the image correctly and adjust any parameters as needed to help the model make accurate detections.
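Assuming the standard COCO ordering, where the left and right ankles are the last two of the 17 keypoints (indices 15 and 16; verify against your model's topology as suggested above), a small helper could be:

```python
# Ankle indices under the standard COCO 17-keypoint ordering
# (an assumption; confirm with your own visualizations).
LEFT_ANKLE, RIGHT_ANKLE = 15, 16

def ankle_coords(person_keypoints):
    """person_keypoints: list of 17 (x, y) pairs for one detected person.
    Returns the (left_ankle, right_ankle) coordinate pairs."""
    return person_keypoints[LEFT_ANKLE], person_keypoints[RIGHT_ANKLE]
```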
Hi @glenn-jocher. I am using a webcam to detect some key points. I can see the key point coordinates in the terminal however, when I am trying to access those values in real time I am getting an error.
However, if I do the same for key points I get the following error:
I am detecting just a single class with a single key point. I think it's because when the whole tensor is empty it gives me this error. Do you have any idea how to resolve the issue? Or can anybody help me out in this regard? Thanks in advance.
@NafBZ hello, Your question relates to detecting keypoints of objects in real time using a webcam feed. Firstly, I want to note that your approach to extracting bounding boxes appears correct, as you're accessing each box's xywh property efficiently. When working with keypoints, your model includes detections for keypoints within the returned results tensor. Specifically, the keypoints data typically starts from 'results[:,6:]'. Each keypoint corresponds to a pair of x,y-coordinates. The error message you're seeing, "IndexError: index 0 is out of bounds for dimension 0 with size 0", suggests that there might be instances when no keypoints are detected in an image frame from your webcam feed. In such cases, the tensor representing keypoints might be empty. Consequently, when your code attempts to access the first element (index 0), it throws an error because no elements are present. To avoid this, you could check the size of the keypoints tensor before trying to access individual keypoints. If the tensor is empty (i.e., no keypoints were detected in the frame), the code would skip access attempts to its elements. This logic would help prevent the IndexError from occurring. Remember that the model's efficiency at detecting keypoints would depend on several factors such as the quality of your input images, the accuracy of your trained model, satisfactory lighting conditions and the correct positioning of humans within the frame. I hope this provides some guidance to resolve your issue, do not hesitate to let us know if you have more questions! Best regards.
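The guard described above can be sketched with a plain list standing in for the keypoints tensor (the helper name is illustrative):

```python
def safe_first_keypoints(keypoints_per_person):
    """Return the first person's keypoints, or None when nothing was
    detected, instead of raising IndexError on an empty container."""
    if len(keypoints_per_person) == 0:
        return None  # skip this frame rather than index into emptiness
    return keypoints_per_person[0]
```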
Thanks a lot. I have resolved the issue.
@NafBZ hello, Great to hear that your issue has been resolved! If you have any further questions or if you encounter any other issues while working with YOLOv8, feel free to reach out again. We're here to help! Best regards,
Search before asking
Question
I want to use the YOLOv8 pose estimation model to detect the keypoints of a person, but I want to get the keypoint indices and x, y coordinates according to my needs. So, I want to put all the required code in this demo.py so I can access everything from here.
```python
import cv2
from ultralytics import YOLO
import time
import imageio

# Load the Yolov8 model
model = YOLO('yolov8n-pose.pt')

# Open the video file
video_path = "dance.mp4"
cap = cv2.VideoCapture(video_path)
writer = imageio.get_writer("results/output23.mp4", mode="I")

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()
```
Additional
No response