ultralytics 8.2.6 fix HUBDatasetStats.get_json() for empty keypoints #10415

Merged 3 commits into main on May 1, 2024

@Laughing-q
Member

Laughing-q commented Apr 29, 2024

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Ultralytics library version bump to 8.2.6 with improvements in data handling πŸš€.

πŸ“Š Key Changes

  • Library version updated from 8.2.5 to 8.2.6. πŸ†™
  • Improved keypoints handling in pose estimation tasks for more efficient data processing. πŸ’ƒπŸ•Ί

🎯 Purpose & Impact

  • Version Update: Ensures users have the latest features, bug fixes, and performance improvements. πŸ”„
  • Data Handling in Pose Estimation: This update provides a more streamlined way to handle pose data (like human body keypoints). It makes it easier to process and use this data for various applications, potentially improving the accuracy and efficiency of pose-related tasks. 🎯 This could significantly impact applications in motion analysis, augmented reality, and other areas requiring precise human movement and position tracking.
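The summary above does not show the patch itself. As a rough illustration of why empty keypoints can break array code, here is a minimal NumPy sketch. It assumes keypoints are stored as an array of shape `(n, nk, nd)` and boxes as `(n, 4)`; the helper name `flatten_keypoints` and the sample shapes are hypothetical, and this is not presented as the actual change made to `ultralytics/data/utils.py`.

```python
import numpy as np


def flatten_keypoints(keypoints):
    """Flatten (n, nk, nd) keypoints to (n, nk * nd), also when n == 0."""
    n, nk, nd = keypoints.shape
    # Explicit target shape: no dimension has to be inferred from the data size
    return keypoints.reshape(n, nk * nd)


# Hypothetical label with no instances: 0 objects, 17 keypoints, 3 values each
empty_keypoints = np.zeros((0, 17, 3), dtype=np.float32)
empty_bboxes = np.zeros((0, 4), dtype=np.float32)

# With zero rows, NumPy cannot infer the size of a -1 dimension
try:
    empty_keypoints.reshape(empty_keypoints.shape[0], -1)
    inferred_ok = True
except ValueError as err:
    inferred_ok = False
    print(f"reshape with -1 failed: {err}")

# Spelling out both dimensions works for empty and non-empty arrays alike
flat = flatten_keypoints(empty_keypoints)
combined = np.concatenate((empty_bboxes, flat), axis=1)
print(flat.shape, combined.shape)
```

The design point is simply that an inferred dimension is ambiguous when the array holds no elements, so code that may see labels without any keypoints is safer with explicit shapes.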

@Laughing-q marked this pull request as ready for review April 29, 2024 12:04

codecov bot commented Apr 29, 2024

Codecov Report

Attention: Patch coverage is 33.33333%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 37.37%. Comparing base (bbfca94) to head (fbcb39b).

❗ Current head fbcb39b differs from pull request most recent head e38216b. Consider uploading reports for the commit e38216b to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| ultralytics/data/utils.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10415      +/-   ##
==========================================
- Coverage   37.39%   37.37%   -0.02%     
==========================================
  Files         122      122              
  Lines       15581    15581              
==========================================
- Hits         5826     5824       -2     
- Misses       9755     9757       +2     
| Flag | Coverage Δ | |
|---|---|---|
| GPU | 37.37% <33.33%> (-0.02%) | ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@kumarneeraj2005

I suppose you should also check for hidden files that were generated while zipping on a MacBook, such as .DS_Store, and include validation.

@glenn-jocher
Member

Absolutely, addressing hidden files like .DS_Store created during the zipping process on a MacBook is important for ensuring clean dataset management. 🍏 Here's a quick example of how you could include a validation step in Python to filter these out:

import os

# Your dataset directory
dataset_dir = 'path/to/your/dataset'

# Filtering out hidden files
valid_files = [f for f in os.listdir(dataset_dir) if not f.startswith('.')]

# Now 'valid_files' will only contain non-hidden files
print(valid_files)

This ensures your data loading process remains clean and focused only on relevant files. Thanks for pointing this out! 👍

@glenn-jocher changed the title from "Fix HUBDatasetStats.get_json() when keypoints are empty" to "Fix HUBDatasetStats.get_json() when keypoints are empty" May 1, 2024
@glenn-jocher changed the title from "Fix HUBDatasetStats.get_json() when keypoints are empty" to "ultralytics 8.1.6 fix HUBDatasetStats.get_json() for empty keypoints" May 1, 2024
@glenn-jocher changed the title from "ultralytics 8.1.6 fix HUBDatasetStats.get_json() for empty keypoints" to "ultralytics 8.2.6 fix HUBDatasetStats.get_json() for empty keypoints" May 1, 2024
@glenn-jocher merged commit 6beccb6 into main May 1, 2024
12 checks passed
@glenn-jocher deleted the hub-dataset branch May 1, 2024 13:59
@kumarneeraj2005

Guys, it seems your platform is not ready for production, and this is a really serious issue. Please check the attached images. When the dataset is small, it is accepted with the so-called background images and uploads to your platform, but when the dataset is large, your platform is not able to handle it. I request that you look into this issue on a priority basis. Otherwise, cancel my Pro membership; I am really serious.
[Image: WhatsApp Image 2024-05-01 at 19 26 47]
[Image: WhatsApp Image 2024-05-01 at 19 27 09]

@glenn-jocher
Member

glenn-jocher commented May 1, 2024

> I suppose you should also check for hidden files that were generated while zipping on a MacBook, such as .DS_Store, and include validation.

Special files like .DS_Store are already ignored by the Ultralytics package during the unzipping process, so they don't actually make their way into unzipped datasets, but thank you for the tip! We should have your original issue resolved in HUB soon; thank you for your patience. The fix is now in version 8.2.6 of the Ultralytics Python package, but HUB itself must be restarted to pick up the latest package, and we want to wait for a low-usage period to minimize interruptions to users.

def unzip_file(file, path=None, exclude=(".DS_Store", "__MACOSX"), exist_ok=False, progress=True):
    """
    Unzips a *.zip file to the specified path, excluding files containing strings in the exclude list.

    If the zipfile does not contain a single top-level directory, the function will create a new
    directory with the same name as the zipfile (without the extension) to extract its contents.
    If a path is not provided, the function will use the parent directory of the zipfile as the default path.

    Args:
        file (str): The path to the zipfile to be extracted.
        path (str, optional): The path to extract the zipfile to. Defaults to None.
        exclude (tuple, optional): A tuple of filename strings to be excluded. Defaults to ('.DS_Store', '__MACOSX').
        exist_ok (bool, optional): Whether to overwrite existing contents if they exist. Defaults to False.
        progress (bool, optional): Whether to display a progress bar. Defaults to True.

    Raises:
        BadZipFile: If the provided file does not exist or is not a valid zipfile.

    Returns:
        (Path): The path to the directory where the zipfile was extracted.

    Example:
        ```python
        from ultralytics.utils.downloads import unzip_file

        dir = unzip_file('path/to/file.zip')
        ```
    """
    from zipfile import BadZipFile, ZipFile, is_zipfile

    if not (Path(file).exists() and is_zipfile(file)):
        raise BadZipFile(f"File '{file}' does not exist or is a bad zip file.")
    if path is None:
        path = Path(file).parent  # default path

    # Unzip the file contents
    with ZipFile(file) as zipObj:
        files = [f for f in zipObj.namelist() if all(x not in f for x in exclude)]
        top_level_dirs = {Path(f).parts[0] for f in files}

        if len(top_level_dirs) > 1 or (len(files) > 1 and not files[0].endswith("/")):
            # Zip has multiple files at top level
            path = extract_path = Path(path) / Path(file).stem  # i.e. ../datasets/coco8
        else:
            # Zip has 1 top-level directory
            extract_path = path  # i.e. ../datasets
            path = Path(path) / list(top_level_dirs)[0]  # i.e. ../datasets/coco8

        # Check if destination directory already exists and contains files
        if path.exists() and any(path.iterdir()) and not exist_ok:
            # If it exists and is not empty, return the path without unzipping
            LOGGER.warning(f"WARNING ⚠️ Skipping {file} unzip as destination directory {path} is not empty.")
            return path

        for f in TQDM(files, desc=f"Unzipping {file} to {Path(path).resolve()}...", unit="file", disable=not progress):
            # Ensure the file is within the extract_path to avoid path traversal security vulnerability
            if ".." in Path(f).parts:
                LOGGER.warning(f"Potentially insecure file path: {f}, skipping extraction.")
                continue
            zipObj.extract(f, extract_path)

    return path  # return unzip dir
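
To see the two filters from the function above in isolation, here is a small standard-library sketch. It applies the same `exclude` substring test and the same `".."` path-part check to the member names of an in-memory archive. The helper name `safe_members` and the archive contents are hypothetical and chosen only for illustration; nothing here calls the Ultralytics package.

```python
import io
from pathlib import Path
from zipfile import ZipFile

EXCLUDE = (".DS_Store", "__MACOSX")  # same defaults as unzip_file above


def safe_members(names, exclude=EXCLUDE):
    """Return member names that pass the exclude filter and the '..' check."""
    # Drop any name containing one of the excluded substrings
    kept = [n for n in names if all(x not in n for x in exclude)]
    # Drop any name with a '..' component (possible path traversal)
    return [n for n in kept if ".." not in Path(n).parts]


# Build an in-memory archive resembling one zipped on macOS (made-up contents)
buffer = io.BytesIO()
with ZipFile(buffer, "w") as zf:
    for name in (
        "coco8/images/train/a.jpg",
        "coco8/.DS_Store",
        "__MACOSX/coco8/._a.jpg",
        "coco8/../evil.txt",
    ):
        zf.writestr(name, b"")

with ZipFile(buffer) as zf:
    members = safe_members(zf.namelist())

print(members)  # ['coco8/images/train/a.jpg']
```

Note that the exclude test is a substring match on the full member name, so everything under a `__MACOSX/` folder is skipped along with individual `.DS_Store` files.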

@glenn-jocher
Member

Hey @kumarneeraj2005, great news! 🎉 The HUB has been fully updated with all the latest fixes from #10415, thanks to the examples you provided for debugging. 🛠️

Please give your dataset upload and training another go, and don't hesitate to reach out if you encounter any more issues or have suggestions for improvement. Your input is incredibly valuable in enhancing our product. Looking forward to hearing from you! 😊

gkinman pushed a commit to Octasic/ultralytics that referenced this pull request May 30, 2024
…nts (ultralytics#10415)

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>