Randomize instance order and crop #38

Merged
aaprasad merged 3 commits into main from aadi/randomize-instance-order-and-crop
May 7, 2024

@aaprasad aaprasad commented May 3, 2024

As noted in #31, the model may be overfitting to the order in which the instance reid_features are fed into it. This could be especially problematic given our microscopy dataset logic, where we simply iterate over the track IDs. Thus, in this PR we:

  • shuffle instances in training mode.

We also add random anchor selection when cropping, to test whether the model can generalize to missing anchors.
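The two changes described above might look roughly like the following. This is a minimal sketch and not the code from this PR: `order_instances`, `pick_anchor`, and the `keypoints` dictionary layout (name mapped to a coordinate pair, or `None` when missing) are hypothetical and chosen only for illustration.

```python
import random
from typing import Optional, Sequence


def order_instances(instances: Sequence, mode: str,
                    rng: Optional[random.Random] = None) -> list:
    """Return a copy of `instances`, shuffled only in training mode."""
    rng = rng if rng is not None else random.Random()
    ordered = list(instances)  # copy so the caller's sequence is not mutated
    if mode.lower() == "train":
        rng.shuffle(ordered)
    return ordered


def pick_anchor(keypoints: dict, default: str, mode: str,
                rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose the keypoint to crop around (assumed behavior, see lead-in).

    In training mode a random visible keypoint is used, so the model also
    sees crops centered on non-default anchors. Otherwise the default
    anchor is used when visible, falling back to the first visible one.
    """
    rng = rng if rng is not None else random.Random()
    visible = [name for name, xy in keypoints.items() if xy is not None]
    if not visible:
        return None  # nothing to crop around
    if mode.lower() == "train":
        return rng.choice(visible)
    return default if default in visible else visible[0]
```

Passing a seeded `random.Random` keeps the training order reproducible, while validation and test modes always see the instances in their original order.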

Summary by CodeRabbit

  • New Features

    • Introduced new attributes for managing centroid and anchor data, enhancing data structuring capabilities.
    • Implemented conditional data augmentation and shuffling for training mode in various datasets to optimize model training.
  • Enhancements

    • Improved bounding box calculation using a more accurate centering technique.
  • Bug Fixes

    • Standardized mode attribute handling by converting it to lowercase across multiple dataset classes to ensure consistent behavior.
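As a rough illustration of the mode handling and conditional augmentation summarized above, here is a minimal sketch. `SketchDataset` and its `transform` method are hypothetical stand-ins, and `augmentations` is assumed to be a plain callable; only the `mode.lower()` normalization and the `augmentations and self.mode == "train"` check come from the PR discussion.

```python
from typing import Callable, Optional


class SketchDataset:
    """Hypothetical stand-in for the dataset classes touched by this PR."""

    def __init__(self, mode: str = "train",
                 augmentations: Optional[Callable] = None):
        # Normalize once so "Train", "TRAIN" and "train" behave identically.
        self.mode = mode.lower()
        # Keep augmentations only when training; other modes stay untouched.
        if augmentations and self.mode == "train":
            self.augmentations = augmentations
        else:
            self.augmentations = None

    def transform(self, x):
        """Apply the augmentation if one is active, else return x unchanged."""
        return self.augmentations(x) if self.augmentations else x
```

Normalizing in the constructor means every later comparison can use the lowercase literal without repeating the conversion.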

coderabbitai bot commented May 3, 2024

Walkthrough

The updates across various modules in the BioGTR project focus on enhancing functionality related to data structure manipulation, especially concerning centroids and anchors. Key changes include the introduction of new properties and methods for handling centroids and anchors, conditional augmentation based on training mode, and improved bounding box calculations. These enhancements aim to streamline data handling and augment data processing capabilities in training scenarios.

Changes

| Files | Change Summary |
| --- | --- |
| .../data_structures.py | Introduced handling for centroid and anchor attributes, including getters and setters. Added methods to retrieve anchor names and centroids. |
| .../cell_tracking_dataset.py, .../microscopy_dataset.py, .../sleap_dataset.py | Unified changes for mode handling to lowercase, conditional augmentation setup based on mode, and instance shuffling in training. Adjusted anchor selection and centroid calculation in sleap_dataset.py. |
| .../data_utils.py | Modified pose_bbox function to use mean for bounding box center calculations instead of min/max. |

🐰✨
A little rabbit hopped through the code,
Tweaking and tuning its humble abode.
With a flip and a hop, centroids set just right,
Anchors and boxes, oh what a sight!
In fields of data, it dances with glee,
For clean and swift code, as happy as can be! 🌟
🐰✨


Recent Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 9027ca8 and 55daaf9.
Files selected for processing (5)
  • biogtr/data_structures.py (6 hunks)
  • biogtr/datasets/cell_tracking_dataset.py (2 hunks)
  • biogtr/datasets/data_utils.py (2 hunks)
  • biogtr/datasets/microscopy_dataset.py (2 hunks)
  • biogtr/datasets/sleap_dataset.py (4 hunks)
Additional Context Used
Ruff (2)
biogtr/datasets/data_utils.py (1)

334-334: Do not use bare except

biogtr/datasets/microscopy_dataset.py (1)

86-86: Do not assign a lambda expression, use a def

Additional comments not posted (11)
biogtr/datasets/microscopy_dataset.py (2)

78-81: Conditional initialization of augmentations is correctly implemented.


194-196: Shuffling of instances in training mode is correctly implemented to prevent overfitting.

biogtr/datasets/cell_tracking_dataset.py (2)

84-87: Conditional initialization of augmentations is correctly implemented.


205-207: Shuffling of instances in training mode is correctly implemented to prevent overfitting.

biogtr/datasets/sleap_dataset.py (3)

86-89: Conditional initialization of augmentations is correctly implemented.


Line range hint 250-271: Random anchor selection logic is correctly implemented, enhancing model robustness by introducing variability.


293-295: Shuffling of instances in training mode is correctly implemented to prevent overfitting.

biogtr/datasets/data_utils.py (1)

126-126: Using np.nanmean for center calculations in pose_bbox function enhances robustness by handling missing data effectively.
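To make the difference concrete, the sketch below compares a min/max midpoint with a NaN-ignoring mean as the box center. It is an illustration under stated assumptions, not the actual `pose_bbox` code: the function names, the `(x, y)` keypoint layout, and the `(y1, x1, y2, x2)` box ordering (borrowed from the centroid code elsewhere in this PR) are all assumed.

```python
import numpy as np


def center_minmax(points: np.ndarray) -> np.ndarray:
    """Midpoint of the keypoint extent (the style of center being replaced)."""
    return (np.nanmin(points, axis=0) + np.nanmax(points, axis=0)) / 2


def center_nanmean(points: np.ndarray) -> np.ndarray:
    """Mean of the visible keypoints; NaN rows (missing keypoints) are ignored."""
    return np.nanmean(points, axis=0)


def centered_bbox(points: np.ndarray, size: tuple) -> np.ndarray:
    """Hypothetical helper: box of size (h, w) centered on the nanmean center."""
    cx, cy = center_nanmean(points)
    h, w = size
    return np.array([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])


# Four keypoints as (x, y); the third one is missing.
pts = np.array([[0.0, 0.0], [2.0, 2.0], [np.nan, np.nan], [10.0, 4.0]])
```

With these points the midpoint center is driven entirely by the two extreme keypoints, whereas the mean center shifts toward where most visible keypoints lie.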

biogtr/data_structures.py (3)

302-319: Review the implementation of the centroid property and its setter.

The implementation of the centroid property and its setter is clear and follows Pythonic conventions with appropriate documentation.


321-329: Consider caching the anchor calculation.
[PERFORMANCE]
The calculation of the anchor name involves converting the keys of the centroid dictionary to a list, which is done every time the anchor property is accessed. Consider caching this value or computing it less frequently to improve performance.


1002-1029: Validate the new methods for anchor and centroid retrieval.

The new methods get_anchors and get_centroids are well-implemented with clear documentation. They provide a structured way to access anchor names and centroids, which is crucial for the new functionality introduced in the PR.
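The caching suggestion for the anchor property could be sketched as follows. `CachedAnchorInstance` is a hypothetical, stripped-down class, and it assumes the anchor property returns the list of centroid keys; the real `Instance` class may differ.

```python
from typing import Optional


class CachedAnchorInstance:
    """Hypothetical sketch: cache anchor names until the centroid changes."""

    def __init__(self, centroid: Optional[dict] = None):
        self._centroid = centroid if centroid is not None else {}
        self._anchor = None  # cached list of anchor names

    @property
    def centroid(self) -> dict:
        return self._centroid

    @centroid.setter
    def centroid(self, value: Optional[dict]) -> None:
        self._centroid = value if value is not None else {}
        self._anchor = None  # invalidate the cache on every update

    @property
    def anchor(self) -> list:
        # Build the key list only on first access after a change.
        if self._anchor is None:
            self._anchor = list(self._centroid.keys())
        return self._anchor
```

Invalidating in the setter keeps the cache correct as long as the centroid is replaced through the property; mutating the dictionary in place would bypass it, which is a trade-off of this approach.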


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Out of diff range and nitpick comments (3)
biogtr/datasets/microscopy_dataset.py (1)

Line range hint 86-86: Replace lambda expression with a function definition for better readability and maintainability.

- parser = lambda x: data_utils.parse_synthetic(x, source=source)
+ def parser(x):
+     return data_utils.parse_synthetic(x, source=source)
biogtr/datasets/data_utils.py (1)

Line range hint 334-334: Avoid a bare except; catch exceptions explicitly.

- except:
+ except Exception as e:
+     print(f"Error encountered: {e}")
biogtr/data_structures.py (1)

122-122: Validate the representation of centroid in the __repr__ method.

The representation of centroid in the __repr__ method might expose internal structure which could be sensitive. Consider overriding the __str__ method in the dictionary to provide a more controlled output.

@@ -16,6 +16,7 @@ def __init__(
         pred_track_id: int = -1,
         bbox: ArrayLike = torch.empty((0, 4)),
         crop: ArrayLike = torch.tensor([]),
+        centroid: dict[str, ArrayLike] = None,
Contributor

@coderabbitai coderabbitai bot May 3, 2024

Ensure the default value for centroid is consistent with its usage.

The default value for centroid is set to None, which might lead to issues if methods assume it always contains a dictionary. Consider initializing it as an empty dictionary {} for safer handling.

Contributor

Bad! As is initializing with the torch.empty and torch.tensor above since these are all mutable!

Just change the type to centroid: Optional[dict[str, ArrayLike]] = None or centroid: dict[str, ArrayLike] | None = None (newer python syntax)
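The pitfall behind this comment can be shown with a small sketch. The class names are hypothetical and a plain dict stands in for the tensor defaults; the point is only that a default value is created once, at function definition time, and then shared by every call.

```python
from typing import Optional


class SharedDefault:
    """Anti-pattern: the same dict object is reused by every instance."""

    def __init__(self, centroid: dict = {}):
        self.centroid = centroid


class NoneDefault:
    """Recommended pattern: default to None and build a fresh dict per call."""

    def __init__(self, centroid: Optional[dict] = None):
        self.centroid = centroid if centroid is not None else {}
```

Writing to `SharedDefault().centroid` on one instance shows up on every later instance created without an argument, while each `NoneDefault()` starts with its own empty dict.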

Contributor

@talmo, you're absolutely right. Using Optional[dict[str, ArrayLike]] = None or centroid: dict[str, ArrayLike] | None = None (for Python 3.10 and newer) in the type hint clearly indicates that the centroid can be None. This approach is safer and more explicit, especially in a dynamic setting where the centroid might not always be provided. This change also aligns with best practices in Python to avoid mutable default arguments which can lead to less predictable behaviors. Thank you for pointing this out!


New Learnings Added
User: talmo
PR: talmolab/biogtr#38
File: biogtr/data_structures.py:19-19
Timestamp: 2024-05-06T19:13:29.738Z
Learning: The `centroid` attribute in the `Instance` class should be initialized with `None` and explicitly allowed to be `None` through type hints using `Optional` or the union operator `|`. This avoids issues associated with mutable default arguments.

Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.

Contributor Author

I opened an issue to go through and remove all the mutable defaults.

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +64 to +71
        if centroid is not None:
            self._centroid = centroid
        elif self.bbox.shape[0]:
            y1, x1, y2, x2 = self.bbox.squeeze()
            self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
        else:
            self._centroid = {}

Contributor

Refactor the centroid calculation to a separate method.

The logic for calculating the centroid based on the bounding box is repeated in the constructor and the pose setter. Consider extracting this into a separate method to adhere to the DRY principle.

+ def calculate_centroid(self):
+     y1, x1, y2, x2 = self.bbox.squeeze()
+     return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}

- y1, x1, y2, x2 = self.bbox.squeeze()
- self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
+ self._centroid = self.calculate_centroid()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
- if centroid is not None:
-     self._centroid = centroid
- elif self.bbox.shape[0]:
-     y1, x1, y2, x2 = self.bbox.squeeze()
-     self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
- else:
-     self._centroid = {}
+ if centroid is not None:
+     self._centroid = centroid
+ elif self.bbox.shape[0]:
+     self._centroid = self.calculate_centroid()
+ else:
+     self._centroid = {}
+ def calculate_centroid(self):
+     y1, x1, y2, x2 = self.bbox.squeeze()
+     return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
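For readers who want to see the suggested refactor in a self-contained form, here is a minimal sketch. `BBoxInstance` is a hypothetical class that uses NumPy arrays in place of the tensors in the real code and assumes a single box in `(y1, x1, y2, x2)` order; only the branching logic and the `calculate_centroid` name follow the review comment.

```python
import numpy as np


class BBoxInstance:
    """Hypothetical sketch of the constructor logic with the helper extracted."""

    def __init__(self, bbox=None, centroid=None):
        # Store the box as a (n, 4) array; an empty array means "no box".
        if bbox is None:
            self.bbox = np.empty((0, 4))
        else:
            self.bbox = np.asarray(bbox, dtype=float).reshape(-1, 4)

        if centroid is not None:
            self._centroid = centroid
        elif self.bbox.shape[0]:
            self._centroid = self.calculate_centroid()
        else:
            self._centroid = {}

    def calculate_centroid(self) -> dict:
        """Center of a single (y1, x1, y2, x2) box, returned as (x, y)."""
        y1, x1, y2, x2 = self.bbox.squeeze()
        return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}

    @property
    def centroid(self) -> dict:
        return self._centroid
```

Keeping the arithmetic in one method means the constructor and any setter that needs to recompute the centroid can share it.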

@@ -69,16 +69,16 @@ def __init__(
         self.clip_length = clip_length
         self.crop_size = crop_size
         self.padding = padding
-        self.mode = mode
+        self.mode = mode.lower()
Contributor

Ensure consistent use of string literals for mode comparison.

- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':

Committable suggestion

Suggested change
self.mode = mode.lower()
self.mode = mode.lower()

@@ -74,16 +74,17 @@ def __init__(
         self.clip_length = clip_length
         self.crop_size = crop_size
         self.padding = padding
-        self.mode = mode
+        self.mode = mode.lower()
Contributor

Ensure consistent use of string literals for mode comparison.

- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':

Committable suggestion

Suggested change
self.mode = mode.lower()
self.mode = mode.lower()

@@ -75,18 +75,18 @@ def __init__(
         self.crop_size = crop_size
         self.chunk = chunk
         self.clip_length = clip_length
-        self.mode = mode
+        self.mode = mode.lower()
Contributor

Ensure consistent use of string literals for mode comparison.

- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':

Committable suggestion

Suggested change
self.mode = mode.lower()
self.mode = mode.lower()

@aaprasad aaprasad merged commit 1b7e817 into main May 7, 2024
3 checks passed
@talmo talmo deleted the aadi/randomize-instance-order-and-crop branch May 14, 2024 21:58