Randomize instance order and crop #38

aaprasad · 2024-05-03T17:51:34Z

As noted in #31, the model may be overfitting to the order in which the instance reid_features are fed into it. This is especially problematic with our microscopy dataset logic, where we just iterate over the track ids. Thus in this PR we:

shuffle instances in training mode.

We also add functionality for random anchor selection when cropping, to test whether the model can generalize to missing anchors.
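
A minimal sketch of the training-mode shuffle described above, not the code from this PR. The helper name `order_instances` and its `rng` parameter are hypothetical; the only behavior taken from the description is that instance order is randomized when the dataset mode is "train" and left alone otherwise.

```python
import random


def order_instances(instances, mode, rng=None):
    """Return instances shuffled in training mode, in original order otherwise.

    `order_instances` and `rng` are illustrative names, not part of biogtr.
    """
    rng = rng or random.Random()
    ordered = list(instances)  # copy so the caller's list is not mutated
    if mode.lower() == "train":
        rng.shuffle(ordered)  # in-place shuffle of the copy
    return ordered
```

Passing a seeded `random.Random` makes the shuffle reproducible in tests, while validation and test modes keep the deterministic order.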

Summary by CodeRabbit

New Features
- Introduced new attributes for managing centroid and anchor data, enhancing data structuring capabilities.
- Implemented conditional data augmentation and shuffling for training mode in various datasets to optimize model training.
Enhancements
- Improved bounding box calculation using a more accurate centering technique.
Bug Fixes
- Standardized mode attribute handling by converting it to lowercase across multiple dataset classes to ensure consistent behavior.

add functionality for randomizing crop anchor

coderabbitai · 2024-05-03T17:52:05Z

Walkthrough

The updates across various modules in the BioGTR project focus on enhancing functionality related to data structure manipulation, especially concerning centroids and anchors. Key changes include the introduction of new properties and methods for handling centroids and anchors, conditional augmentation based on training mode, and improved bounding box calculations. These enhancements aim to streamline data handling and augment data processing capabilities in training scenarios.

Changes

| Files | Change Summary |
|---|---|
| `.../data_structures.py` | Introduced handling for `centroid` and `anchor` attributes, including getters and setters. Added methods to retrieve anchor names and centroids. |
| `.../cell_tracking_dataset.py`, `.../microscopy_dataset.py`, `.../sleap_dataset.py` | Unified changes for mode handling to lowercase, conditional augmentation setup based on mode, and instance shuffling in training. Adjusted anchor selection and centroid calculation in `sleap_dataset.py`. |
| `.../data_utils.py` | Modified `pose_bbox` function to use mean for bounding box center calculations instead of min/max. |

🐰✨
A little rabbit hopped through the code,
Tweaking and tuning its humble abode.
With a flip and a hop, centroids set just right,
Anchors and boxes, oh what a sight!
In fields of data, it dances with glee,
For clean and swift code, as happy as can be! 🌟
🐰✨

Recent Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 9027ca8 and 55daaf9.

Files selected for processing (5)

biogtr/data_structures.py (6 hunks)
biogtr/datasets/cell_tracking_dataset.py (2 hunks)
biogtr/datasets/data_utils.py (2 hunks)
biogtr/datasets/microscopy_dataset.py (2 hunks)
biogtr/datasets/sleap_dataset.py (4 hunks)

Additional Context Used

Ruff (2)

biogtr/datasets/data_utils.py (1)

334-334: Do not use bare except

biogtr/datasets/microscopy_dataset.py (1)

86-86: Do not assign a lambda expression, use a def

Additional comments not posted (11)

biogtr/datasets/microscopy_dataset.py (2)

78-81: Conditional initialization of augmentations is correctly implemented.

194-196: Shuffling of instances in training mode is correctly implemented to prevent overfitting.

biogtr/datasets/cell_tracking_dataset.py (2)

84-87: Conditional initialization of augmentations is correctly implemented.

205-207: Shuffling of instances in training mode is correctly implemented to prevent overfitting.

biogtr/datasets/sleap_dataset.py (3)

86-89: Conditional initialization of augmentations is correctly implemented.

Line range hint 250-271: Random anchor selection logic is correctly implemented, enhancing model robustness by introducing variability.

293-295: Shuffling of instances in training mode is correctly implemented to prevent overfitting.
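
The random anchor selection noted above for sleap_dataset.py could look roughly like the following. This is a sketch under assumptions, since the actual logic is not shown in this thread: `select_anchor`, `preferred`, and the fallback rules are hypothetical, and a pose is assumed to be a dict mapping keypoint names to (x, y) arrays with NaN marking missing points.

```python
import numpy as np


def select_anchor(pose, preferred, mode, rng=None):
    """Pick the keypoint name to center the crop on.

    Hypothetical helper: in training mode a random visible keypoint is used,
    otherwise the preferred anchor (falling back to the first visible one).
    """
    rng = rng or np.random.default_rng()
    # A keypoint counts as visible only if neither coordinate is NaN.
    visible = [name for name, pt in pose.items() if not np.isnan(pt).any()]
    if not visible:
        return None  # nothing to anchor on
    if mode.lower() == "train":
        return visible[int(rng.integers(len(visible)))]
    return preferred if preferred in visible else visible[0]
```

Randomizing only in training mode means evaluation crops stay centered on the configured anchor whenever it is visible.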

biogtr/datasets/data_utils.py (1)

126-126: Using np.nanmean for center calculations in pose_bbox function enhances robustness by handling missing data effectively.
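
To illustrate the np.nanmean comment above, here is a rough sketch of a mean-centered box. It is not the repository's `pose_bbox`; the name `centered_bbox`, the `crop_size` parameter, and the `[y1, x1, y2, x2]` output order (borrowed from the bbox unpacking elsewhere in this PR) are assumptions.

```python
import numpy as np


def centered_bbox(points, crop_size):
    """Return [y1, x1, y2, x2] for a square box centered on the mean keypoint.

    `points` is an (n, 2) array of (x, y) coordinates; missing keypoints are NaN.
    """
    points = np.asarray(points, dtype=float)
    # nanmean ignores NaN rows, so missing keypoints do not shift the center.
    cx, cy = np.nanmean(points, axis=0)
    half = crop_size / 2
    return np.array([cy - half, cx - half, cy + half, cx + half])
```

If every keypoint is NaN, `np.nanmean` emits a RuntimeWarning and returns NaN, so callers would still need to handle fully missing poses.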

biogtr/data_structures.py (3)

302-319: Review the implementation of the centroid property and its setter.

The implementation of the centroid property and its setter is clear and follows Pythonic conventions with appropriate documentation.

321-329: Consider caching the anchor calculation.
[PERFORMANCE]
The calculation of the anchor name involves converting the keys of the centroid dictionary to a list, which is done every time the anchor property is accessed. Consider caching this value or computing it less frequently to improve performance.

1002-1029: Validate the new methods for anchor and centroid retrieval.

The new methods get_anchors and get_centroids are well-implemented with clear documentation. They provide a structured way to access anchor names and centroids, which is crucial for the new functionality introduced in the PR.
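
One possible reading of the caching suggestion for the anchor property above is to compute the anchor name once, when the centroid is assigned. The class below is a hypothetical stand-in rather than the real `Instance`, and it assumes the anchor is the first key of the centroid dictionary, which this thread does not confirm.

```python
class InstanceSketch:
    """Hypothetical minimal stand-in for an instance with centroid and anchor."""

    def __init__(self, centroid=None):
        # Goes through the setter below, which also fills the cache.
        self.centroid = {} if centroid is None else centroid

    @property
    def centroid(self):
        return self._centroid

    @centroid.setter
    def centroid(self, value):
        self._centroid = value
        # Cache the anchor name once per assignment (assumed: first key).
        self._anchor = next(iter(value), "")

    @property
    def anchor(self):
        return self._anchor
```

The trade-off is that mutating the dictionary in place does not refresh the cached name; only reassigning `centroid` does.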

coderabbitai

Actionable comments posted: 5

Out of diff range and nitpick comments (3)

biogtr/datasets/microscopy_dataset.py (1)
Line range hint 86-86: Replace lambda expression with a function definition for better readability and maintainability.
- parser = lambda x: data_utils.parse_synthetic(x, source=source)
+ def parser(x):
+     return data_utils.parse_synthetic(x, source=source)
biogtr/datasets/data_utils.py (1)
Line range hint 334-334: Avoid a bare except; catch a specific exception type so that errors are handled explicitly.
- except:
+ except Exception as e:
+     print(f"Error encountered: {e}")
biogtr/data_structures.py (1)

122-122: Validate the representation of centroid in the __repr__ method.

The representation of centroid in the __repr__ method might expose internal structure which could be sensitive. Consider overriding the __str__ method in the dictionary to provide a more controlled output.

coderabbitai · 2024-05-03T17:57:52Z

biogtr/data_structures.py

@@ -16,6 +16,7 @@ def __init__(
        pred_track_id: int = -1,
        bbox: ArrayLike = torch.empty((0, 4)),
        crop: ArrayLike = torch.tensor([]),
+        centroid: dict[str, ArrayLike] = None,


Ensure the default value for centroid is consistent with its usage.

The default value for centroid is set to None, which might lead to issues if methods assume it always contains a dictionary. Consider initializing it as an empty dictionary {} for safer handling.

Bad! As is initializing with the torch.empty and torch.tensor above since these are all mutable!

Just change the type to centroid: Optional[dict[str, ArrayLike]] = None or centroid: dict[str, ArrayLike] | None = None (newer python syntax)

@talmo, you're absolutely right. Using Optional[dict[str, ArrayLike]] = None or centroid: dict[str, ArrayLike] | None = None (for Python 3.10 and newer) in the type hint clearly indicates that the centroid can be None. This approach is safer and more explicit, especially in a dynamic setting where the centroid might not always be provided. This change also aligns with best practices in Python to avoid mutable default arguments which can lead to less predictable behaviors. Thank you for pointing this out!

New Learnings Added

User: talmo PR: talmolab/biogtr#38 File: biogtr/data_structures.py:19-19 Timestamp: 2024-05-06T19:13:29.738Z Learning: The `centroid` attribute in the `Instance` class should be initialized with `None` and explicitly allowed to be `None` through type hints using `Optional` or the union operator `|`. This avoids issues associated with mutable default arguments.

Note: Learnings are effective only in the context of specific code segments. To apply general coding guidelines, please configure review instructions.

I opened an issue to go through and remove all the mutable defaults.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
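
The `None`-default pattern discussed in this thread can be sketched as follows. `InstanceSketch` is a hypothetical stand-in for the real class, and NumPy arrays replace the torch tensors only to keep the example self-contained.

```python
from typing import Optional

import numpy as np


class InstanceSketch:
    """Hypothetical stand-in showing None defaults instead of mutable ones."""

    def __init__(
        self,
        bbox: Optional[np.ndarray] = None,
        centroid: Optional[dict[str, np.ndarray]] = None,
    ):
        # Defaults are built per call, so no object is shared across instances.
        self.bbox = np.empty((0, 4)) if bbox is None else bbox
        self.centroid = {} if centroid is None else centroid
```

Default values in a signature are evaluated once, when the function is defined, so a dict or array default would be the same object for every instance that relies on it.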

coderabbitai · 2024-05-03T17:57:52Z

biogtr/data_structures.py

+        if centroid is not None:
+            self._centroid = centroid
+        elif self.bbox.shape[0]:
+            y1, x1, y2, x2 = self.bbox.squeeze()
+            self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
+        else:
+            self._centroid = {}
+


Refactor the centroid calculation to a separate method.

The logic for calculating the centroid based on the bounding box is repeated in the constructor and the pose setter. Consider extracting this into a separate method to adhere to the DRY principle.

+ def calculate_centroid(self):
+     y1, x1, y2, x2 = self.bbox.squeeze()
+     return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
- y1, x1, y2, x2 = self.bbox.squeeze()
- self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
+ self._centroid = self.calculate_centroid()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

- if centroid is not None:
-     self._centroid = centroid
- elif self.bbox.shape[0]:
-     y1, x1, y2, x2 = self.bbox.squeeze()
-     self._centroid = {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
- else:
-     self._centroid = {}
+ if centroid is not None:
+     self._centroid = centroid
+ elif self.bbox.shape[0]:
+     self._centroid = self.calculate_centroid()
+ else:
+     self._centroid = {}
+
+ def calculate_centroid(self):
+     y1, x1, y2, x2 = self.bbox.squeeze()
+     return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}
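
As a standalone illustration of the refactor suggested above, the three-way centroid logic can be written as plain functions. The names `bbox_centroid` and `resolve_centroid` are hypothetical, and NumPy is used in place of torch so the sketch runs on its own; the `y1, x1, y2, x2` order follows the diff.

```python
import numpy as np


def bbox_centroid(bbox):
    """Return {"centroid": [x, y]} for a single (1, 4) box in y1, x1, y2, x2 order."""
    y1, x1, y2, x2 = np.asarray(bbox, dtype=float).squeeze()
    return {"centroid": np.array([(x1 + x2) / 2, (y1 + y2) / 2])}


def resolve_centroid(centroid, bbox):
    """Prefer an explicit centroid, then the bbox center, then an empty dict."""
    bbox = np.asarray(bbox, dtype=float)
    if centroid is not None:
        return centroid
    if bbox.shape[0]:
        return bbox_centroid(bbox)
    return {}
```

Keeping the box-center computation in one helper lets both the constructor and the pose setter reuse it, which is the point of the DRY comment.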

coderabbitai · 2024-05-03T17:57:52Z

biogtr/datasets/microscopy_dataset.py

@@ -69,16 +69,16 @@ def __init__(
        self.clip_length = clip_length
        self.crop_size = crop_size
        self.padding = padding
-        self.mode = mode
+        self.mode = mode.lower()


Ensure consistent use of string literals for mode comparison.

- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

self.mode = mode.lower()

self.mode = mode.lower()

coderabbitai · 2024-05-03T17:57:52Z

biogtr/datasets/cell_tracking_dataset.py

@@ -74,16 +74,17 @@ def __init__(
        self.clip_length = clip_length
        self.crop_size = crop_size
        self.padding = padding
-        self.mode = mode
+        self.mode = mode.lower()


Ensure consistent use of string literals for mode comparison.

- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

self.mode = mode.lower()

self.mode = mode.lower()

coderabbitai · 2024-05-03T17:57:52Z

biogtr/datasets/sleap_dataset.py

@@ -75,18 +75,18 @@ def __init__(
        self.crop_size = crop_size
        self.chunk = chunk
        self.clip_length = clip_length
-        self.mode = mode
+        self.mode = mode.lower()


Ensure consistent use of string literals for mode comparison.

- if augmentations and self.mode == "train":
+ if augmentations and self.mode == 'train':

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change

self.mode = mode.lower()

self.mode = mode.lower()

aaprasad added 3 commits May 3, 2024 10:37

shuffle instance order if dataset is in training mode.

a6da501

add functionality for randomizing crop anchor

add anchor and centroid properties

3c62687

simplify pose centroid computation

55daaf9

coderabbitai bot reviewed May 3, 2024

talmo approved these changes May 6, 2024

aaprasad merged commit 1b7e817 into main May 7, 2024
3 checks passed

aaprasad mentioned this pull request May 7, 2024

Shuffle instance ordering within a frame during training #31

Closed

talmo deleted the aadi/randomize-instance-order-and-crop branch May 14, 2024 21:58
