YOLOv8 pose-estimation model #2028

Closed
AshishRaghani23 opened this issue Apr 14, 2023 · 54 comments
Labels
question Further information is requested

Comments

@AshishRaghani23

Search before asking

Question

I want to use the YOLOv8 pose-estimation model to detect the keypoints of a person, but I need the keypoint indices and the x, y coordinates in a form I can work with. I want to keep all the required code in this demo.py so I can access everything from here.

import cv2
from ultralytics import YOLO
import time
import imageio

# Load the YOLOv8 model

model = YOLO('yolov8n-pose.pt')

# Open the video file

video_path = "dance.mp4"
cap = cv2.VideoCapture(video_path)
writer = imageio.get_writer("results/output23.mp4", mode="I")

# Loop through the video frames

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        start_time = time.time()
        # Run YOLOv8 inference on the frame
        results = model(frame)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        end_time = time.time()
        fps = 1 / (end_time - start_time)
        print("FPS :", fps)

        cv2.putText(annotated_frame, "FPS :"+str(int(fps)), (10, 50), cv2.FONT_HERSHEY_COMPLEX, 1.2, (255, 0, 255), 1, cv2.LINE_AA)

        # Display the annotated frame
        cv2.imshow("YOLOv8 Inference", annotated_frame)

        annotated_frame = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)
        writer.append_data(annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window

cap.release()
writer.close()
cv2.destroyAllWindows()

Additional

No response

@AshishRaghani23 AshishRaghani23 added the question Further information is requested label Apr 14, 2023
@github-actions

github-actions bot commented Apr 14, 2023

👋 Hello @AshishRaghani23, thank you for your interest in YOLOv8 🚀! We recommend a visit to the YOLOv8 Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Install

Pip install the ultralytics package including all requirements in a Python>=3.7 environment with PyTorch>=1.7.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status

If the Ultralytics CI badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

Hi @AshishRaghani23, to get the keypoints of a person and their respective x, y coordinates you can try the following changes:

1. Get the model.forward() output:

# Run YOLOv8 inference on the frame
output = model.forward(frame)

2. Extract the pose tensor from the output using the following line:

pose_tensor = output[:, model.model.names.index('pose')]

3. Extract the key-points data from pose_tensor using the following line:

keypoint_data = pose_tensor[0].cpu().detach().numpy()

Here, keypoint_data will be an array of size 57, representing 19 key-points with x, y coordinates respectively at three scales.
I hope this helps!

@AshishRaghani23
Author

@glenn-jocher Thanks for the response. I have another doubt: this YOLOv8 human pose-estimation model has 17 keypoints, not 19, right?

@glenn-jocher
Member

Yes @AshishRaghani23, you are correct. The YOLOv8 human pose estimation model detects 17 keypoints: 5 for the head (nose, eyes and ears), 6 for the arms (shoulders, elbows and wrists), and 6 for the legs (hips, knees and ankles). My apologies for the mistake in my previous message. Let me know if you have any further questions or concerns.

@AshishRaghani23
Author

@glenn-jocher I tried output = model.forward(frame), but the model doesn't have any method called forward.

@glenn-jocher
Member

I apologize for the confusion earlier, @AshishRaghani23. YOLOv8 does not have a method called forward. Instead, you can use the __call__ method to run the inference on a single input frame or a batch of input frames. To run YOLOv8 inference on a single input frame, you can perform the following operation: output = model(frame). Here, frame is a single image that you want to run inference on. For batch inference, you can pass a list of frames to model like this: output = model(frames_list). Here, frames_list is a list of images that you want to run inference on. Let me know if this helps or if you have any further questions.

@AshishRaghani23
Author

@glenn-jocher
# Run YOLOv8 inference on the frame
results = model(frame)
I'm already using this in my file and I'm getting the desired output. But after that I want to modify the person bounding box and the keypoints detected in the frame, and I want to make those modifications in my demo.py so I don't need to change the official files in the yolov8 folder.

@glenn-jocher
Member

Understood, @AshishRaghani23. If you would like to modify the bounding boxes and detected key-points of a person in each frame, you can access them using the results variable that you already have. Results is a list, and each element of the list contains information about the detected objects in a single frame. You can access the bounding boxes and key-points of each detected person in the frame by iterating through this list. Once you have the bounding boxes and key-points, you can perform any necessary modifications on them. To implement these modifications in your demo.py file, you can simply modify the code that iterates through the results list, perform the modifications that you need, and then move on to the next frame. If you have any additional questions, let me know.
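
For example, a rough sketch of that loop (assuming a recent ultralytics release where each Results object exposes .boxes and .keypoints; attribute names may differ slightly in your installed version):

from ultralytics import YOLO

model = YOLO('yolov8n-pose.pt')
results = model(frame)  # `frame` is the image read from your video

for result in results:                # one Results object per image/frame
    boxes = result.boxes              # detected person boxes
    kpts = result.keypoints           # detected keypoints, one set per person
    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = box.xyxy[0].tolist()   # box corners
        conf = float(box.conf[0])                # detection confidence
        person_kpts = kpts[i]                    # keypoints of this person
        # ... apply your own modifications to the box / keypoints here ...
        print(i, (x1, y1, x2, y2), conf)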

@AshishRaghani23
Author

@glenn-jocher Thank you for the quick reply, my script is working fine now. There is one issue: whenever I run my script it saves result images in the run/detect/train folder, which I don't need because I'm running the script on videos and already saving the output videos. Saving an image for every frame also makes my script slow. So, how can I stop saving these images from my script?

@Laughing-q
Member

@AshishRaghani23 pass save=False

model(frame, save=False)

But logically the script should not save anything by default, even if you don't pass save=False.

@AshishRaghani23
Author

AshishRaghani23 commented Apr 17, 2023

@glenn-jocher @Laughing-q I'm using the script below, and if I put save=False in results1 = predictor(frame, save=False) it gives me an error that the argument is not matched.

import cv2
from ultralytics import YOLO
import time
import imageio
from ultralytics.yolo.engine.results import Results
from ultralytics.yolo.utils import DEFAULT_CFG, ROOT, ops
from ultralytics.yolo.v8.detect.predict import DetectionPredictor

class PosePredictor(DetectionPredictor):

    def postprocess(self, preds, img, orig_img):
        preds = ops.non_max_suppression(preds,
                                        self.args.conf,
                                        self.args.iou,
                                        agnostic=self.args.agnostic_nms,
                                        max_det=self.args.max_det,
                                        classes=self.args.classes,
                                        nc=len(self.model.names))

        results1 = []
        for i, pred in enumerate(preds):
            orig_img = orig_img[i] if isinstance(orig_img, list) else orig_img
            shape = orig_img.shape
            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], shape).round()
            if len(pred) == 0:
                pred_kpts = None
            else:
                pred_kpts = pred[:, 6:].view(len(pred), *self.model.kpt_shape)
                pred_kpts = ops.scale_coords(img.shape[2:], pred_kpts, shape)

            path, _, _, _, _ = self.batch
            img_path = path[i] if isinstance(path, list) else path
            results1.append(
                Results(orig_img=orig_img,
                        path=img_path,
                        names=self.model.names,
                        boxes=pred[:, :6],
                        keypoints=pred_kpts))

            if pred_kpts is not None:
                for idx, kpt in enumerate(pred_kpts[0]):
                    print(f"Keypoint {idx}: ({kpt[0]:.2f}, {kpt[1]:.2f})")
        return results1

# Load the YOLOv8 model

model = YOLO('yolov8n-pose.pt')

# Open the video file

video_path = "dance.mp4"
cap = cv2.VideoCapture(video_path)
writer = imageio.get_writer("results/output32.mp4", mode="I")

# Create a pose predictor object

predictor = PosePredictor(overrides=dict(model='yolov8n-pose.pt'))

# Loop through the video frames

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        start_time = time.time()
        # Run pose detection on the frame
        results1 = predictor(frame)

        # Visualize the results on the frame
        annotated_frame = results1[0].plot()

        # print keypoints index number and x,y coordinates
        for idx, kpt in enumerate(results1[0].keypoints[0]):
            print(f"Keypoint {idx}: ({int(kpt[0])}, {int(kpt[1])})")
            annotated_frame = cv2.putText(annotated_frame, f"{idx}:({int(kpt[0])}, {int(kpt[1])})", (int(kpt[0]), int(kpt[1])), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv2.LINE_AA)

        end_time = time.time()
        fps = 1 / (end_time - start_time)
        print("FPS :", fps)

        cv2.putText(annotated_frame, "FPS :"+str(int(fps)), (10, 50), cv2.FONT_HERSHEY_COMPLEX, 1.2, (255, 0, 255), 1, cv2.LINE_AA)

        # Display the annotated frame
        cv2.imshow("Pose Detection", annotated_frame)

        annotated_frame = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)
        writer.append_data(annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window

cap.release()
writer.close()
cv2.destroyAllWindows()

@Laughing-q
Member

Laughing-q commented Apr 17, 2023

@AshishRaghani23 I don't see any error in my test; it works well for me.
(screenshot of the working test output attached)

@AshishRaghani23
Author

@Laughing-q Thank you, I have found my mistake and the error is solved now. Thanks again for helping out.

@Laughing-q
Member

@AshishRaghani23 Sure! :) Then I'm closing this issue; please feel free to reopen it if you have a related issue.

@bas-inh

bas-inh commented Apr 21, 2023

(quoting @glenn-jocher's earlier reply about accessing the bounding boxes and keypoints through the results list)

I want to extract some specific keypoints (and their locations) from this results list variable. Is there an overview of the structure of this list? And is there a way to get a specific keypoint (like left shoulder) via e.g. a function call?

@glenn-jocher
Member

Yes, @662781, the results list that you obtained from running the pose detection model contains information about the detected objects in each frame of the input video. Specifically, each element of the results list contains the bounding box coordinates and keypoint locations for each object detected in that frame. The structure of each element in the results list is similar to the following:

{
    'names': ['person'],
    'boxes': tensor([[x1, y1, x2, y2, conf, cls_idx]]),
    'keypoints': tensor([[x1_kpt_0, y1_kpt_0, score_0], ... [x1_kpt_n, y1_kpt_n, score_n]])
}

Here, the 'boxes' field contains the bounding box coordinates and the confidence score (conf) and class index (cls_idx) of the detected object. The 'keypoints' field contains the x and y coordinates of the detected keypoints (e.g. left shoulder, right shoulder, etc.) and their associated confidence scores. Depending on the specific model architecture used, the positions of the keypoints and the order in which they are listed in 'keypoints' may vary.

To extract the location of a specific keypoint (e.g. left shoulder) from the results list, you can iterate through the list and check the names of the objects detected in each frame (which can be accessed using the 'names' field). Once you have located the object of interest (e.g. a person), you can extract the coordinates of the desired keypoint from the 'keypoints' field using indexing. However, since the position and ordering of the keypoints may change depending on the specific model architecture used, there is no built-in way to get a specific keypoint (like the left shoulder) via a function call.
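
For example, a minimal sketch (assuming a recent ultralytics version; the index 5 used here is only an illustration, so verify the ordering for your model):

# Hypothetical example: read one keypoint of the first detected person.
KPT_INDEX = 5  # illustrative only; confirm the index for your model

for result in results:                 # one Results object per frame
    if result.keypoints is None or len(result.keypoints) == 0:
        continue                       # nothing detected in this frame
    person = result.keypoints.xy[0]    # first detected person, shape (num_keypoints, 2)
    x, y = person[KPT_INDEX].tolist()
    print(f"keypoint {KPT_INDEX}: ({x:.1f}, {y:.1f})")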

@bas-inh

bas-inh commented Apr 21, 2023

Thank you for the quick reply!

So if I understand correctly, there is no convenient way to check which keypoint corresponds to which body part?

For example, at the moment I use the "yolov8n-pose.pt" model. I let the keypoints and boxes get drawn with results[0].plot() and when I run my code, I can see those boxes and keypoints in my live webcam output. But is there then no way to know at which index a specific keypoint is stored? Or is there a list out there for every model architecture with the order in which the keypoints are stored?

@glenn-jocher
Member

Yes, that is correct, @662781. There is no built-in way to directly determine which keypoint corresponds to which body part. The ordering and naming of the keypoints may vary depending on the specific model architecture used. However, you can often determine the mapping between the keypoint indices and the corresponding body parts by inspecting the output of the model and visually matching the detected keypoints with their associated body parts. Alternatively, you may be able to find the corresponding keypoint indices for a specific model architecture online by checking the documentation or source code for that model.
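
For reference, the official yolov8*-pose.pt checkpoints are trained on COCO keypoints, so the standard COCO ordering below should apply (worth confirming against your own visualizations, especially for custom-trained models):

# Standard COCO keypoint order used by the pretrained YOLOv8 pose checkpoints
# (a custom-trained model may use a different layout).
COCO_KEYPOINTS = [
    'nose',            # 0
    'left_eye',        # 1
    'right_eye',       # 2
    'left_ear',        # 3
    'right_ear',       # 4
    'left_shoulder',   # 5
    'right_shoulder',  # 6
    'left_elbow',      # 7
    'right_elbow',     # 8
    'left_wrist',      # 9
    'right_wrist',     # 10
    'left_hip',        # 11
    'right_hip',       # 12
    'left_knee',       # 13
    'right_knee',      # 14
    'left_ankle',      # 15
    'right_ankle',     # 16
]
LEFT_SHOULDER = COCO_KEYPOINTS.index('left_shoulder')  # -> 5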

@ocnuybear

ocnuybear commented Apr 22, 2023

Please assist: this model prints something like the following in the terminal window:
0: 352x640 1 person, 100.9ms
Speed: 0.0ms preprocess, 100.9ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 640)

But how can I capture this output to get the number of persons detected and save it to a file? I tried using stdout, but I'm running VS Code on Windows 10 and cannot work out how to capture it.

import cv2
import time
from ultralytics import YOLO

print('Starting...')

# Load the YOLOv8 model

#model = YOLO('yolov8n.pt')
model = YOLO('yolov8s-pose.pt')

# Open the video file

#video_path = "D:\Downloads\people.mp4"
video_path = 0
cap = cv2.VideoCapture(video_path)

# Define the desired size of the output image

width = 1400
height = 750

# Initialize timer and frame counter

start_time = time.time()
frame_count = 0

# Define the codec and create VideoWriter object

fourcc = cv2.VideoWriter_fourcc(*'XVID')
output_file = "output.avi"
out = cv2.VideoWriter(output_file, fourcc, 30.0, (width, height))

# Initialize results variable

results = None

# Loop through the video frames

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Resize the image to the desired size using cv2.resize()
        frame = cv2.resize(frame, (width, height))

        # Check if half a second has passed since the last inference was applied
        if time.time() - start_time >= 0.5:
            # Run YOLOv8 inference on the frame
            results = model(frame)

            # Update the timer and frame counter
            start_time = time.time()
            #frame_count += 2

        # Visualize the results on the frame if they exist
        if results is not None:
            annotated_frame = results[0].plot()

            # Display the annotated frame
            cv2.imshow("YOLOv8 Inference", annotated_frame)

            # Write the annotated frame to the output video file
            out.write(annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window

cap.release()
out.release()
cv2.destroyAllWindows()

@AshishRaghani23
Author

@662781 Hello, you can get the keypoints of a specific body part. In the YOLOv8 pose-estimation model all 17 keypoints are pre-defined for particular body parts. For example, the left arm has keypoints 5, 7 and 9, and the right arm has keypoints 6, 8 and 10. So if you want to print particular keypoints, just loop over the keypoints and pick out the ones you need.
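
For example, a small sketch of that loop (assuming the standard COCO ordering of the pretrained pose models and a recent ultralytics Results API):

# Print only the left-arm keypoints: 5 = left shoulder, 7 = left elbow, 9 = left wrist.
LEFT_ARM = [5, 7, 9]

for result in results:
    if result.keypoints is None:
        continue
    for person in result.keypoints.xy:      # one (num_keypoints, 2) tensor per person
        for idx in LEFT_ARM:
            x, y = person[idx].tolist()
            print(f"keypoint {idx}: ({int(x)}, {int(y)})")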

@AshishRaghani23
Author

@ocnuybear Hello, the output you want is already being saved: a .txt file and a results folder containing images with the detections. Also, if you print results you can get the tensor values of the detected person class, their pose keypoints and the coordinate values.

@ocnuybear

ocnuybear commented Apr 22, 2023

@AshishRaghani23 The only thing it is saving is the captured output image with the estimated pose drawn on it. The best I have come up with so far is to wrap the indexing in the code above in a try/except, so if no one is detected it writes 0, otherwise 1. But I still need to get the number of persons detected. Here is the code so far:

import cv2
import time
from ultralytics import YOLO
from datetime import datetime

print('Starting...')

# Load the YOLOv8 model

#model = YOLO('yolov8n.pt')
model = YOLO('yolov8s-pose.pt')

# Open the video file

#video_path = "D:\Downloads\people.mp4"
video_path = 0
cap = cv2.VideoCapture(video_path)

# Define the desired size of the output image

width = 1400
height = 750

# Initialize timer and frame counter

start_time = time.time()
frame_count = 0

# Define the codec and create VideoWriter object

fourcc = cv2.VideoWriter_fourcc(*'XVID')
output_file = "output.avi"
out = cv2.VideoWriter(output_file, fourcc, 30.0, (width, height))

# Initialize results variable

results = None

# Loop through the video frames

while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Resize the image to the desired size using cv2.resize()
        frame = cv2.resize(frame, (width, height))

        # Check if half a second has passed since the last inference was applied
        if time.time() - start_time >= 0.5:
            # Run YOLOv8 inference on the frame
            results = model(frame)

            # Update the timer and frame counter
            start_time = time.time()
            #frame_count += 2

        # Visualize the results on the frame if they exist
        if results is not None:
            annotated_frame = results[0].plot()

            # Display the annotated frame
            cv2.imshow("YOLOv8 Inference", annotated_frame)

            # Write the annotated frame to the output video file
            out.write(annotated_frame)

            P = 0
            try:
                # If at least one person (with keypoints) was detected, flag it
                for idx, kpt in enumerate(results[0].keypoints[0]):
                    print('Persons Detected')
                    P = 1
            except:
                print('No Persons')
                P = 0
            file_object = open('sample.txt', 'a')
            file_object.write(str(P) + ' Time: ' + str(datetime.now()))
            file_object.write("\n")
            file_object.close()

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window

cap.release()
out.release()
cv2.destroyAllWindows()

@glenn-jocher
Member

@ocnuybear one way to determine the number of persons detected by the model is to check for the presence of bounding boxes and associated keypoints in the results list. You can iterate through the list and check the number of elements (i.e. the number of detected objects) within it. For each element, you can check whether the 'names' field contains the string 'person', indicating that a person has been detected. If so, you can count that as one detected person. You can also use other criteria (such as the size, confidence score, or overlap with other detections) to filter out false positives and improve the accuracy of your person count. Finally, you can save the count to a file by opening a file object and writing the count (and possibly the current time) to it.
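
A rough sketch of that idea, reusing the sample.txt file from your script (attribute names assume a recent ultralytics Results API):

from datetime import datetime

# Count 'person' detections in the current frame and append the count to a file.
person_count = 0
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls[0])]             # class index -> class name
    if cls_name == 'person' and float(box.conf[0]) >= 0.5:   # optional confidence filter
        person_count += 1

with open('sample.txt', 'a') as file_object:
    file_object.write(f"{person_count} Time: {datetime.now()}\n")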

@bas-inh

bas-inh commented Apr 22, 2023

(quoting @AshishRaghani23's earlier reply about the pre-defined keypoint indices for each body part)

Is there an overview of which body part corresponds to which index number in the keypoints list? You say 5, 7 and 9 correspond to the left arm; where would I find an overview of this for the yolov8n-pose.pt model?

@ocnuybear

@glenn-jocher Thank you, this is one BIG study. I'm completely new to both computer vision and Python and have gone through the Ultralytics Jupyter openvino_notebooks, but there is so much detail to wrap my head around. It is lots of fun, it just takes time :)

@AshishRaghani23
Author

(screenshot attached showing the printed keypoint indices and x, y coordinates for each frame)

@AshishRaghani23
Author

@glenn-jocher @662781 @ocnuybear This is what I get from using the YOLOv8 pose-estimation model (see the screenshot above). I can get every keypoint and its coordinates frame by frame. Also, with a small modification I can get particular keypoints only.

@bas-inh

bas-inh commented Apr 22, 2023

(screenshot quoted from the comment above)

@AshishRaghani23 Right, this is exactly what I'm looking for! Thanks! Could you also provide a code sample for this? Because I don't know how you got this result.

@ocnuybear

@glenn-jocher I used VS Code to debug the results[0].keypoints[0] code above and could find the string 'person', but it seems different from the output of the standard yolov8n.pt model that looks for general objects. I looked at both scenarios, when a person was picked up and when not, and could not manually find any person count in the debugger after setting a breakpoint at this code, although there are so many variables in there to go through, maybe I missed it :)

@ocnuybear

ocnuybear commented Apr 22, 2023

@AshishRaghani23 So it seems like up to 17 points per person, so more than 17 points means more than one person, maybe?

@glenn-jocher
Member

@AshishRaghani23, The YOLOv8 pose-estimation model detects 17 keypoints for each person in an image or video frame. If there are more than 17 keypoints detected, that would suggest the presence of multiple people in the scene. However, the exact number of people cannot be determined based solely on the number of keypoints. It would be necessary to apply additional criteria (such as the number of detected bounding boxes or the context of the scene) to accurately determine the number of people.

@bas-inh

bas-inh commented Apr 22, 2023

(quoting my earlier comment above asking for a code sample)

@AshishRaghani23 Could you please provide the code that you used for this? Thanks!

@glenn-jocher
Member

@AshishRaghani23 asking for code snippets isn't exactly within our support guidelines, as we aim to provide professional written support instead. However, to visualize the keypoint information from the YOLOv8 pose-estimation model, you can iterate through the keypoints list of the detected results and use the 'plot' function to visualize the location of each keypoint. Additionally, you can access the x and y coordinates of each keypoint by indexing into the keypoints list for a particular keypoint and selecting the 'xy' property.

@bas-inh

bas-inh commented Apr 24, 2023

I visualized the numbers of the keypoints this way:

# `model` is a loaded YOLO pose model and `frame` an image (e.g. a webcam frame)
from ultralytics.yolo.engine.results import Results
from ultralytics.yolo.utils.plotting import Annotator

results: Results = model.predict(frame)[0]

keypoints = results.keypoints.squeeze().tolist()
ann = Annotator(frame)
for i, kp in enumerate(keypoints):
    x = int(kp[0])
    y = int(kp[1])
    ann.text((x, y), str(i))

I don't really understand why you wouldn't give the community a code snippet, so hopefully someone can use my code to their advantage.

Have a nice day!

@glenn-jocher
Member

Thank you for sharing your code snippet @662781. Your approach is a good way to visualize the numbered keypoints detected by the YOLOv8 pose-estimation model. It is important to note that the keypoint indices and coordinates may change depending on the specific keypoint labeling scheme used by the model, so it is important to refer to the labeling scheme documentation for the specific model being used to ensure accurate labeling.

@EddSB

EddSB commented May 11, 2023

Hello,

I noticed that the Yolov8n-pose model returns bounding boxes along with the pose results. Is there a way to run the pose model independently, or is it dependent on the detection results?

Thanks in advance!

@AshishRaghani23
Author

AshishRaghani23 commented May 11, 2023

Hello @EddSB ,

In the YOLOv8 pose-estimation model, person detection happens first; the keypoints are then visualized on each detected person's bbox, so it is dependent on the detection results. But if you want to remove the detection bbox from the output, you can do that.

@glenn-jocher
Member

Hello @AshishRaghani23,

Thank you for your question. The YOLOv8 pose-estimation model does rely on the detection results to identify people and their corresponding keypoints. The model first detects the people in the scene and then estimates their keypoints based on the corresponding bounding boxes. However, if you want to remove the bounding box detection and only focus on the keypoint estimation, you can achieve this by modifying the YOLOv8 code. Specifically, you can modify the inference code to only return the keypoint information and not the bounding box information.
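
If the goal is only to hide the boxes in the rendered overlay (rather than changing the model itself), recent ultralytics releases also expose a boxes= flag on Results.plot(); if your installed version does not have it, the code modification described above is the way to go:

# Draw only the skeleton/keypoints and suppress the bounding boxes in the overlay.
# The boxes= argument exists in recent ultralytics releases; check your version.
annotated_frame = results[0].plot(boxes=False)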

I hope this information helps. Please let me know if you have any further questions or concerns.

Best regards.

@hoanglmv

import cv2
from ultralytics import YOLO
import time
import imageio
from ultralytics.yolo.engine.results import Results
from ultralytics.yolo.utils import DEFAULT_CFG, ROOT, ops
from ultralytics.yolo.v8.detect.predict import DetectionPredictor
class PosePredictor(DetectionPredictor):

    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        super().__init__(cfg, overrides, _callbacks)
        self.args.task = 'pose'

    def postprocess(self, preds, img, orig_img):
        preds = ops.non_max_suppression(preds,
                                        self.args.conf,
                                        self.args.iou,
                                        agnostic=self.args.agnostic_nms,
                                        max_det=self.args.max_det,
                                        classes=self.args.classes,
                                        nc=len(self.model.names))

        results = []
        for i, pred in enumerate(preds):
            orig_img = orig_img[i] if isinstance(orig_img, list) else orig_img
            shape = orig_img.shape
            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], shape).round()
            pred_kpts = pred[:, 6:].view(len(pred), *self.model.kpt_shape) if len(pred) else pred[:, 6:]
            pred_kpts = ops.scale_coords(img.shape[2:], pred_kpts, shape)
            path = self.batch[0]
            img_path = path[i] if isinstance(path, list) else path
            results.append(
                Results(orig_img=orig_img,
                        path=img_path,
                        names=self.model.names,
                        boxes=pred[:, :6],
                        keypoints=pred_kpts))
        return results

# Load the YOLOv8 pose model
model = YOLO('yolov8n-pose.pt')
# Open the webcam
cap = cv2.VideoCapture(0)
# Create a pose predictor object
predictor = PosePredictor(overrides=dict(model='yolov8n-pose.pt'))

while True:
    ret, frame = cap.read()
    start_time = time.time()
    # Run pose detection on the frame
    results1 = predictor(frame)

    # Visualize the results on the frame
    annotated_frame = results1[0].plot()

    # print keypoints index number and x,y coordinates
    for idx, kpt in enumerate(results1[0].keypoints[0]):
        x = int(float(kpt[0]))
        y = int(float(kpt[1]))
        print(f"Keypoint {idx}: ({kpt[0]}, {kpt})")
        annotated_frame = cv2.putText(annotated_frame, f"{idx}:({x}, {y})", (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv2.LINE_AA)

    end_time = time.time()
    fps = 1 / (end_time - start_time)
    print("FPS :", fps)

    cv2.putText(annotated_frame, "FPS :"+str(int(fps)), (10, 50), cv2.FONT_HERSHEY_COMPLEX, 1.2, (255, 0, 255), 1, cv2.LINE_AA)

    # Display the annotated frame
    cv2.imshow("Pose Detection", annotated_frame)

    annotated_frame = cv2.cvtColor(annotated_frame, cv2.COLOR_BGR2RGB)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

Hello @AshishRaghani23, I copied the code and I get this bug; can you tell me how to fix it?
print(f"Keypoint {idx}: ({int(kpt[0])}, {int(kpt[1])})")
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Keypoints'

@glenn-jocher
Member

@hoanglmv hello,

The error you're encountering is due to trying to convert a Keypoints object into an integer. The keypoint values you're trying to access are stored within the Keypoints object, not the object itself.

To access the x and y coordinates of each keypoint, you'd need to index into the keypoints object. This can be done using the 'xy' attribute of the Keypoints object, like so: kpt.xy.

After getting the x and y coordinates, you can convert them into integers. So, your print statement would look like print(f"Keypoint {idx}: ({int(kpt.xy[0])}, {int(kpt.xy[1])})").

This will now print the keypoint index along with the x and y coordinates of each keypoint as integer values.

Let me know if you need further assistance.

@hoanglmv

hoanglmv commented Aug 2, 2023

@glenn-jocher Thanks for your reply. Now that I see your suggestion, it looks like I'm getting an error like this:
print(f"Keypoint {idx}: ({int(kpt.xy[0])}, {int(kpt.xy[1])})")
ValueError: only one element tensors can be converted to Python scalars

@glenn-jocher
Member

@hoanglmv hello,

The error you're encountering stems from trying to convert a tensor that has more than one element into a Python scalar. The function int() can only convert tensors with a single element into Python scalars.

In the print statement you posted, you're attempting to convert kpt.xy[0] and kpt.xy[1] to integers. However, it appears that one (or both) of these tensors have more than one element, which is causing the ValueError.

A potential solution is to ensure that you are accessing individual elements within the tensor. If kpt.xy is multi-dimensional, you'll need to index into each dimension until you get a single element tensor, which can then be converted to a Python scalar with the int() function. You may need to adjust your indexes depending on how your tensor is structured.
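
As a rough sketch of that indexing (assuming a recent ultralytics version where keypoints.xy is a tensor of shape (num_persons, num_keypoints, 2)):

# keypoints.xy: (num_persons, num_keypoints, 2) in recent ultralytics versions.
kpts_xy = results1[0].keypoints.xy

for person_idx, person in enumerate(kpts_xy):
    for idx, (x, y) in enumerate(person):
        # x and y are now single-element tensors, so int() works on them
        print(f"person {person_idx}, keypoint {idx}: ({int(x)}, {int(y)})")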

I hope that helps clarify the issue somewhat. If you need further assistance, please provide more details about your tensors and what you're trying to achieve.

Best regards.

@Soheil-Nrf

Hello,
please assist: how can I extract the coordinates of the pose keypoints of each person in an image and define conditions for the position of each keypoint?
Thank you

@glenn-jocher
Member

@Soheil-Nrf hello,

To extract the coordinates of the pose keypoints of each person detected in an image using YOLOv8, you will first need to perform inference on your input image using the YOLOv8 model. Upon performing inference, the 'pred' tensor that you get contains the bounding box and pose information for each detected object in the image.

The bounding box information is at 'pred[:,:4]' while the pose keypoints' information begins at 'pred[:,6:]'. Each keypoint is represented by a pair of x,y-coordinates relative to the top-left corner of the detected person's bounding box.

To extract this information, you need to know the geometric layout (or topology) of these keypoints as defined by the model. Each person detection would have its own 'pred' tensor and thus, its own set of pose keypoints.

For defining conditions for the position of each keypoint, you may need to establish certain rules or thresholds based on the relative positions of keypoints. For example, you may be interested in whether a particular keypoint is above, below or to the right of another keypoint. However, defining these conditions will largely depend on your specific use case or application.
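
For example, a hedged sketch using the high-level Results API (the "hands above shoulders" rule is only an illustrative condition, and the indices assume the COCO ordering of the pretrained pose checkpoints):

from ultralytics import YOLO

# Illustrative condition per detected person: are both wrists above the shoulders?
L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST = 5, 6, 9, 10  # COCO indices

model = YOLO('yolov8n-pose.pt')
results = model('image.jpg')                      # or a video frame
for person_id, kpts in enumerate(results[0].keypoints.xy):
    # Note: undetected keypoints may come back as (0, 0); check keypoints.conf if needed.
    hands_up = (float(kpts[L_WRIST][1]) < float(kpts[L_SHOULDER][1]) and
                float(kpts[R_WRIST][1]) < float(kpts[R_SHOULDER][1]))   # image y grows downwards
    print(f"person {person_id}: hands up = {hands_up}")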

I hope this helps and please don't hesitate to offer more details about your ultimate objective if you require further assistance.

Best regards.

@aratamakino

Hello,
I apologize if the translation is odd as I am using a translation tool.
I would like to obtain the coordinates of both ankle joints using YOLOv8 pose estimation.
Could you please tell me how to write this? Your help would be greatly appreciated.

@glenn-jocher
Member

@aratamakino hello,

No need to apologize, your question is clear.

In the YOLOv8 pose estimation model, output for each person detected by the model includes identified keypoints for various points on the body, which include ankle joints. You should be able to extract this data from the prediction output tensor.

The model provides predictions starting with bounding box detections followed by keypoints detections. Specifically, the pose keypoints information begins at 'pred[:,6:]' in the tensor. Each keypoint is represented by a pair of x,y-coordinates.

To get ankle joint keypoints, you would first need to know the layout or topology of these keypoints as defined by the model. The detected keypoints are ordered as per this topology. You will need to identify the indices for the left and right ankles within this set of keypoints and use those to pull out their coordinates.

Should the indices not be documented, you may need to perform some trial and error testing. You could use visualization tools to plot each keypoint one-by-one and identify which indices correspond to the ankle joints.
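
For the pretrained COCO-style pose checkpoints, the ankle indices should be 15 (left) and 16 (right); a minimal sketch under that assumption:

from ultralytics import YOLO

LEFT_ANKLE, RIGHT_ANKLE = 15, 16   # COCO indices used by the pretrained pose models

model = YOLO('yolov8n-pose.pt')
results = model('image.jpg')
kpts_xy = results[0].keypoints.xy           # (num_persons, 17, 2)
if len(kpts_xy):
    person = kpts_xy[0]                     # first detected person
    lx, ly = person[LEFT_ANKLE].tolist()
    rx, ry = person[RIGHT_ANKLE].tolist()
    print(f"left ankle: ({lx:.1f}, {ly:.1f})  right ankle: ({rx:.1f}, {ry:.1f})")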

We hope this information assists you with your project. If you have additional questions, feel free to ask!

Best regards.

@pillai-karthik

# `self.model` is a loaded YOLO pose model; take the first Results object
result = self.model.predict(image)[0]

# keypoints of the first detected person as a list of (typically [x, y, confidence]) values
keypoints = result.keypoints.data[0].tolist()

for i, kp in enumerate(keypoints):
    x = int(kp[0])
    y = int(kp[1])

@glenn-jocher
Member

@pillai-karthik hello,

Thank you for sharing your code snippet. Here you're using your model to make predictions on an input image and then extracting the keypoints from the first detection in the results.

The tolist() method you're using on result.keypoints.data[0] converts the tensor into a nested Python list, where each element holds the values for one keypoint (typically its x and y coordinates plus a confidence score).

Afterwards, you're iterating through each keypoint and converting its x and y coordinates into integers using the int() function.

If you experience any issues with this piece of code or have further questions, do not hesitate to ask. Your contribution helps enhance YOLOv8 and its community.

Best regards.

@aratamakino

Is there a way to output only the coordinates of both ankles?
Also, how can I get the ankle coordinates of every person that is detected?

I would appreciate it if you could let me know.

@glenn-jocher
Member

@aratamakino hello,

In the YOLOv8 pose estimation model, the model's output includes keypoint detections for each detected person, which would also include information about the ankles. This keypoint data is accessible from the prediction tensor returned by the model, typically starting at 'pred[:,6:]'. Each keypoint is represented by a pair of x,y-coordinates.

To detect only the ankles or to get their coordinates, you would need to know the layout of these keypoints (often called topology) as defined by the model. The keypoints are ordered as per this topology. You will need to find out which indices correspond to the left and right ankles within this set of keypoints and fetch their coordinates.

If the keypoints topology isn't documented, you might have to conduct some exploratory work like making some visualizations and plot each keypoint individually to identify which ones correspond to the ankles.

To get detections for all humans in an image, you will have to iterate over all the detections made by the model, and for each detection repeat the above process to extract the ankle keypoint coordinates.
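
For example, a small sketch that collects both ankle coordinates for every detected person (same COCO-index assumption as in the earlier reply):

# Collect the ankle coordinates of every detected person in the frame.
ankles = []
for person in results[0].keypoints.xy:      # one (17, 2) tensor per detected person
    left = person[15].tolist()              # [x, y] of the left ankle
    right = person[16].tolist()             # [x, y] of the right ankle
    ankles.append((left, right))
print(ankles)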

Please ensure to preprocess the image correctly and adjust any parameters as needed to help the model make accurate detections.

@NafBZ

NafBZ commented Oct 23, 2023

Hi @glenn-jocher. I am using a webcam to detect some keypoints. I can see the keypoint coordinates in the terminal; however, when I try to access those values in real time I get an error.
For example, to retrieve bounding box information I can simply do the following:


results = model(source='0', stream=True)

for result in results:
    boxes = result.boxes
    
    for box in boxes:
         x, y, w, h = box.xywh[0]

However, if I am doing this for key points I am getting the following error:

IndexError: index 0 is out of bounds for dimension 0 with size 0

I am detecting just a single class with a single keypoint. I think it's because the whole tensor is empty that it gives me this error. Do you have any idea how to resolve the issue, or can anybody help me out in this regard? Thanks in advance.

@glenn-jocher
Member

@NafBZ hello,

Your question relates to detecting keypoints of objects in real time using a webcam feed. Firstly, I want to note that your approach to extracting bounding boxes appears correct, as you're accessing each box's xywh property efficiently.

When working with keypoints, your model includes detections for keypoints within the returned results tensor. Specifically, the keypoints data typically starts from 'results[:,6:]'. Each keypoint corresponds to a pair of x,y-coordinates.

The error message you're seeing, "IndexError: index 0 is out of bounds for dimension 0 with size 0", suggests that there might be instances when no keypoints are detected in an image frame from your webcam feed. In such cases, the tensor representing keypoints might be empty. Consequently, when your code attempts to access the first element (index 0), it throws an error because no elements are present.

To avoid this, you could check the size of the keypoints tensor before trying to access individual keypoints. If the tensor is empty (i.e., no keypoints were detected in the frame), the code would skip access attempts to its elements. This logic would help prevent the IndexError from occurring.
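
For example, a rough sketch of that guard in a streaming loop (attribute names assume a recent ultralytics version):

results = model(source='0', stream=True)

for result in results:
    kpts = result.keypoints
    # Skip frames with no detections so that indexing [0] never fails.
    if kpts is None or len(kpts) == 0:
        continue
    x, y = kpts.xy[0][0].tolist()   # first keypoint of the first detection
    print(f"keypoint 0: ({x:.1f}, {y:.1f})")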

Remember that the model's efficiency at detecting keypoints would depend on several factors such as the quality of your input images, the accuracy of your trained model, satisfactory lighting conditions and the correct positioning of humans within the frame.

I hope this provides some guidance to resolve your issue, do not hesitate to let us know if you have more questions!

Best regards.

@NafBZ

NafBZ commented Oct 24, 2023

Thanks a lot. I have resolved the issue.

@glenn-jocher
Member

@NafBZ hello,

Great to hear that your issue has been resolved! If you have any further questions or if you encounter any other issues while working with YOLOv8, feel free to reach out again. We're here to help!

Best regards,
