Allow generated datasets to be directly created #4416

Merged
merged 1 commit into from
May 24, 2024

Conversation

brimoor
Contributor

@brimoor brimoor commented May 23, 2024

Resolves #4397

The make_{patches|frames|clips}_dataset() methods were originally designed to be called internally by the to_{patches|frames|clips}() methods but not called directly. For example, as observed in #4397, they currently return a dataset that behaves like a view in certain ways. Also, directly calling make_clips_dataset() is especially pernicious, as it currently returns a dataset that silently reuses the same frames collection as the input dataset.
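To illustrate why reusing a frames collection is hazardous, here is a toy model in plain Python. `ToyDataset`, `make_clips_shared()`, and `make_clips_independent()` are hypothetical stand-ins, not FiftyOne code; the sketch only shows that when two handles point at one backing collection, an edit made through one silently changes the other.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in for a dataset backed by a frames collection."""

    def __init__(self, frames):
        self.frames = frames  # the backing collection (a plain list here)


def make_clips_shared(source):
    # Reuses the source's backing list, so the two datasets are aliased
    return ToyDataset(source.frames)


def make_clips_independent(source):
    # Copies the backing list, so the new dataset owns its own data
    return ToyDataset(copy.deepcopy(source.frames))


source = ToyDataset([{"frame_number": 1}, {"frame_number": 2}])

shared = make_clips_shared(source)
independent = make_clips_independent(source)

# Clearing frames through the aliased dataset also empties the source
shared.frames.clear()

source_len_after = len(source.frames)
independent_len_after = len(independent.frames)
```

In this toy, the source loses its frames after the edit while the independent copy keeps both of its own.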

However, there is a good use case for directly calling make_{patches|frames|clips}_dataset() to convert a collection into an independent dataset.

So, this PR maintains the behavior of to_{patches|frames|clips}() while tweaking the default behavior of the make_{patches|frames|clips}_dataset() methods so that they do return a completely normal dataset.

Example usage

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.core.patches as fop
import fiftyone.core.clips as foc
import fiftyone.core.video as fov
from fiftyone import ViewField as F

#
# Patches dataset
#

dataset = foz.load_zoo_dataset("quickstart")

patches_view = dataset.to_patches("ground_truth")
patches_dataset = fop.make_patches_dataset(dataset, "ground_truth")

assert patches_view._is_generated == True
assert patches_dataset._is_generated == False
assert patches_dataset._sample_collection_name != dataset._sample_collection_name
assert patches_dataset._frame_collection_name is None
assert patches_dataset.count() == patches_view.count()
assert patches_dataset.count() == dataset.count("ground_truth.detections")

#
# Frames dataset
#

dataset = foz.load_zoo_dataset("quickstart-video")

frames_view = dataset.to_frames(sample_frames=True)
frames_dataset = fov.make_frames_dataset(dataset, sample_frames=True)

assert frames_view._is_generated == True
assert frames_dataset._is_generated == False
assert frames_dataset._sample_collection_name != dataset._sample_collection_name
assert frames_dataset._frame_collection_name is None
assert frames_dataset.count() == frames_view.count()
assert frames_dataset.count() == dataset.count("frames")

#
# Clips dataset
#

dataset = foz.load_zoo_dataset("quickstart-video")

expr = F("detections.detections").length() > 10
clips_view = dataset.to_clips(expr)
clips_dataset = foc.make_clips_dataset(dataset, expr)

assert clips_view._is_generated == True
assert clips_dataset._is_generated == False
assert clips_dataset._sample_collection_name != dataset._sample_collection_name
assert clips_dataset._frame_collection_name != dataset._frame_collection_name
assert clips_dataset.count() == clips_view.count()
assert clips_dataset.count("frames") == clips_view.count("frames")

Summary by CodeRabbit

  • New Features

    • Enhanced dataset management with new parameters to control dataset persistence and generation behavior.
  • Bug Fixes

    • Corrected comments related to frame and sample ID associations in clips datasets.
  • Tests

    • Added new unit tests for patches, frames, and clips datasets to ensure proper creation and properties validation.

@brimoor brimoor requested review from benjaminpkane and a team May 23, 2024 14:43
Contributor

coderabbitai bot commented May 23, 2024

Walkthrough

This PR adds two parameters, persistent and _generated, to several dataset creation functions in FiftyOne. The persistent parameter controls whether the new dataset is persistent, and the private _generated flag controls whether it is marked as a generated dataset, which is what the to_{patches|frames|clips}() view stages request when they call these functions. The changes touch clips.py, dataset.py, patches.py, stages.py, and video.py, and new unit tests cover the updated behavior.
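The pattern described here can be sketched in a few lines of plain Python. Everything below is an assumption made for illustration: `ToyDataset`, `make_toy_dataset()`, and `to_toy_view()` are hypothetical names, not the FiftyOne implementation. The idea is that the public factory defaults to an ordinary dataset, while the internal view path opts in to the generated behavior through a private keyword.

```python
class ToyDataset:
    """Hypothetical stand-in; not FiftyOne's Dataset class."""

    def __init__(self, samples, persistent=False, generated=False):
        self.samples = list(samples)  # copy, so the dataset owns its data
        self.persistent = persistent
        self.is_generated = generated


def make_toy_dataset(samples, persistent=False, _generated=False):
    # Direct callers get an ordinary, non-generated dataset by default
    return ToyDataset(samples, persistent=persistent, generated=_generated)


def to_toy_view(samples):
    # The view-producing path explicitly requests the generated behavior
    return make_toy_dataset(samples, _generated=True)


direct = make_toy_dataset(["a", "b"], persistent=True)
via_view = to_toy_view(["a", "b"])
```

Keeping the flag private (leading underscore) and defaulting it to `False` means existing internal callers have to opt in, while anyone calling the factory directly gets the safe behavior without knowing the flag exists.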

Changes

File(s) | Change Summary
fiftyone/core/clips.py | Added persistent and _generated parameters to make_clips_dataset; modified name parameter to _name; adjusted dataset creation logic.
fiftyone/core/dataset.py | Corrected a comment related to clips datasets.
fiftyone/core/patches.py | Added persistent and _generated parameters to make_patches_dataset and make_evaluation_patches_dataset.
fiftyone/core/stages.py | Added _generated parameter to several functions, reordered arguments.
fiftyone/core/video.py | Added persistent and _generated parameters to make_frames_dataset; updated function documentation.
tests/unittests/patches_tests.py | Added test_make_patches_dataset method to validate patch dataset creation.
tests/unittests/video_tests.py | Added test_make_frames_dataset and test_make_clips_dataset methods to validate frames and clips dataset creation.

Assessment against linked issues

Objective | Addressed | Explanation
Ensure make_frames_dataset creates a Dataset not treated as FramesView (#4397) | | The changes include adding parameters to control dataset behavior, but it's unclear if _is_frames issue is resolved.
Validate new parameters in dataset creation functions | |
Add and verify new test methods for dataset creation | |

In fields of code where datasets grow,
New flags now guide where data flows.
Persistent paths or temp's embrace,
Each line of code finds its place.
With tests to guard and bugs to chase,
Our codebase shines with added grace.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Outside diff range and nitpick comments (14)
tests/unittests/patches_tests.py (2)

Line range hint 171-171: Refactor to use more descriptive variable names than l.

Using single-letter variable names like l can reduce code readability, especially in complex list comprehensions or lambda functions. Consider using more descriptive names that indicate the purpose or content of the variable.

Also applies to: 475-475


Line range hint 537-537: Use direct truthiness checks instead of equality to True.

- if F("crowd") == True:
+ if F("crowd"):

Direct truthiness checks are more Pythonic and concise when you need to evaluate the truth value of an expression.

Also applies to: 579-579
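One caveat before applying this suggestion: if the flagged lines build a view expression rather than branch on a Python bool, `== True` is not a truth check at all, because expression classes overload `==` to return a new expression. The toy class below is a hypothetical illustration of that behavior, not FiftyOne's `ViewField`.

```python
class Expr:
    """Hypothetical expression builder that overloads == to build a node."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Returns another expression object, not a bool
        return Expr(f"({self.text} == {other!r})")


field = Expr("crowd")

# Builds an expression describing the comparison
comparison = field == True  # noqa: E712 (intentional)

# Plain object truthiness says nothing about the field's value
truthy = bool(field)
```

Here `comparison` is an `Expr` describing the test, whereas `bool(field)` is always true, so swapping one for the other would change what the code means.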

fiftyone/core/patches.py (2)

Line range hint 654-654: Use direct truthiness checks instead of equality to True.

- if other_fields == True:
+ if other_fields:

Direct truthiness checks are more Pythonic and concise when you need to evaluate the truth value of an expression.

Also applies to: 805-805
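Whether this rewrite is safe depends on the values the argument can take. As a sketch, assume (hypothetically) that `other_fields` may be `True`, `None`, or a list of field names; the equality check and the truthiness check then disagree for the list case. The `describe()` helper is made up for this illustration.

```python
def describe(other_fields):
    # Equality check: matches only True (or a value that compares equal to it)
    by_equality = other_fields == True  # noqa: E712 (intentional)
    # Truthiness check: matches any non-empty or non-zero value
    by_truthiness = bool(other_fields)
    return by_equality, by_truthiness


results = {
    "True": describe(True),
    "list": describe(["tags", "metadata"]),
    "None": describe(None),
}
```

If the code needs to tell "include everything" apart from "include these names", the explicit comparison (or `other_fields is True`) carries that distinction and a bare truthiness check does not.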


Line range hint 285-285: Refactor to use more descriptive variable names than l.

Using single-letter variable names like l can reduce code readability, especially in complex list comprehensions or lambda functions. Consider using more descriptive names that indicate the purpose or content of the variable.

fiftyone/core/video.py (4)

Line range hint 617-617: Simplify the boolean comparison for clarity.

-    if sample_frames != True:
-        l = locals()
+    if not sample_frames:
+        local_vars = locals()  # Also renamed 'l' to 'local_vars' for clarity

Line range hint 707-707: Use direct boolean checks for cleaner and more Pythonic code.

-    if sample_frames == False:
+    if not sample_frames:

-    if sample_frames != False:
+    if sample_frames:

-    if sample_frames == True:
+    if sample_frames:

Also applies to: 741-741, 748-748, 776-776, 810-810, 813-813, 858-858


Line range hint 789-789: Use is not None for None checks to follow Python best practices.

-    if rel_dir != None:
+    if rel_dir is not None:

Line range hint 1032-1032: Refactor to use direct boolean checks for consistency and readability.

-    if sample_frames == False:
+    if not sample_frames:

-    if sample_frames != True:
+    if not sample_frames:

Also applies to: 1037-1037

fiftyone/core/clips.py (4)

Line range hint 119-119: Replace generic exception handling with specific exceptions.

Using a bare except: clause can catch unexpected exceptions and obscure programming errors. It's better to catch specific exceptions to avoid hiding bugs and to handle only the relevant errors that you expect might occur in this context.
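A short, self-contained illustration of the difference, using made-up function names: a bare `except:` also intercepts `KeyboardInterrupt`, which derives from `BaseException` rather than `Exception`, while `except Exception:` lets it propagate.

```python
def swallow_everything():
    try:
        raise KeyboardInterrupt  # stands in for the user pressing Ctrl+C
    except:  # noqa: E722 (bare except, shown only for contrast)
        return "swallowed"


def swallow_errors_only():
    try:
        raise KeyboardInterrupt
    except Exception:
        return "swallowed"


bare_result = swallow_everything()

try:
    swallow_errors_only()
    narrow_result = "swallowed"
except KeyboardInterrupt:
    # The interrupt escaped the narrower handler, as intended
    narrow_result = "propagated"
```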


Line range hint 542-542: Use direct truthiness testing for other_fields.

Instead of comparing other_fields to True, you can use the truthiness of the variable directly. This is more Pythonic and readable.

- if other_fields == True:
+ if other_fields:

Also applies to: 723-723


Line range hint 1055-1055: Use is not None for None comparison.

For clarity and to adhere to Python best practices, use is not None instead of != None when you intend to check if a variable is not None.
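The reason identity is preferred can be shown with a toy class (hypothetical, for illustration only) whose comparison operators ignore the other operand. `!=` dispatches to the object's `__ne__`, while `is not` compares identity and cannot be overridden.

```python
class AlwaysEqual:
    """Hypothetical object whose comparisons ignore the other operand."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


value = AlwaysEqual()

by_inequality = value != None  # noqa: E711 (calls AlwaysEqual.__ne__)
by_identity = value is not None  # identity check, not overridable
```

The inequality test reports that `value` "is None" even though it is a real object; the identity test gets it right.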


Line range hint 1125-1125: Clarify the variable name l to improve readability.

The variable name l is ambiguous and can be easily confused with the number 1 or the letter I in many fonts. Use a more descriptive name to improve code readability and maintainability.

tests/unittests/video_tests.py (2)

Line range hint 1040-1040: Replace lambda with a function definition for clarity and maintainability.

- filepath_fcn = lambda sample: sample.filepath
+ def filepath_fcn(sample):
+     return sample.filepath
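The practical difference is small but visible in tracebacks and introspection: a lambda assigned to a name still reports itself as `<lambda>`, while a `def` carries its own name. A minimal sketch:

```python
filepath_lambda = lambda sample: sample.filepath  # noqa: E731 (for contrast)


def filepath_fcn(sample):
    return sample.filepath


lambda_name = filepath_lambda.__name__
def_name = filepath_fcn.__name__
```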

Line range hint 2689-2689: Consider renaming the variable l to a more descriptive name to improve code readability.

- for _id, l, i, s in zip(
+ for _id, label, index, support in zip(

Apply this change consistently wherever the variable l is used ambiguously.

Also applies to: 2954-2954, 3256-3256

Review Details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 36d4be2 and 48a5eaa.
Files selected for processing (7)
  • fiftyone/core/clips.py (4 hunks)
  • fiftyone/core/dataset.py (1 hunks)
  • fiftyone/core/patches.py (6 hunks)
  • fiftyone/core/stages.py (5 hunks)
  • fiftyone/core/video.py (4 hunks)
  • tests/unittests/patches_tests.py (2 hunks)
  • tests/unittests/video_tests.py (3 hunks)
Additional Context Used
Ruff (58)
fiftyone/core/clips.py (5)

119-119: Do not use bare except


542-542: Avoid equality comparisons to True; use if other_fields: for truth checks


723-723: Avoid equality comparisons to True; use if other_fields: for truth checks


1055-1055: Comparison to None should be cond is not None


1125-1125: Ambiguous variable name: l

fiftyone/core/dataset.py (13)

40-40: fiftyone.core.odm.dataset.SampleFieldDocument imported but unused


236-236: Do not use bare except


342-342: Do not use bare except


3325-3325: Ambiguous variable name: l


3341-3341: Ambiguous variable name: l


3384-3384: Ambiguous variable name: l


6733-6733: Avoid equality comparisons to False; use if not attach_frames: for false checks


7112-7112: Do not use bare except


7122-7122: Do not use bare except


7282-7282: Do not assign a lambda expression, use a def


7282-7282: Ambiguous variable name: l


7576-7576: Do not use bare except


8396-8396: Do not assign a lambda expression, use a def

fiftyone/core/patches.py (3)

285-285: Ambiguous variable name: l


654-654: Avoid equality comparisons to True; use if other_fields: for truth checks


805-805: Avoid equality comparisons to True; use if other_fields: for truth checks

fiftyone/core/stages.py (17)

946-954: Do not assign a lambda expression, use a def


956-970: Do not assign a lambda expression, use a def


1540-1540: Comparison to None should be cond is not None


2504-2504: Do not assign a lambda expression, use a def


2509-2509: Do not assign a lambda expression, use a def


2568-2568: Comparison to None should be cond is not None


2570-2570: Comparison to None should be cond is not None


2577-2577: Comparison to None should be cond is not None


2588-2588: Comparison to None should be cond is not None


2752-2752: Comparison to None should be cond is not None


3920-3920: Comparison to None should be cond is not None


5178-5178: Comparison to None should be cond is not None


6620-6620: Comparison to None should be cond is not None


7270-7270: Do not use bare except


7282-7282: Do not use bare except


7290-7290: Do not use bare except


8284-8284: Do not use bare except

fiftyone/core/video.py (12)

617-617: Avoid inequality comparisons to True; use if not sample_frames: for false checks


618-618: Ambiguous variable name: l


707-707: Avoid equality comparisons to False; use if not sample_frames: for false checks


741-741: Avoid inequality comparisons to False; use if sample_frames: for truth checks


748-748: Avoid equality comparisons to True; use if sample_frames: for truth checks


776-776: Avoid equality comparisons to True; use if sample_frames: for truth checks


789-789: Comparison to None should be cond is not None


810-810: Avoid inequality comparisons to False; use if sample_frames: for truth checks


813-813: Avoid equality comparisons to True; use if sample_frames: for truth checks


858-858: Avoid equality comparisons to True; use if sample_frames: for truth checks


1032-1032: Avoid equality comparisons to False; use if not sample_frames: for false checks


1037-1037: Avoid inequality comparisons to True; use if not sample_frames: for false checks

tests/unittests/patches_tests.py (4)

171-171: Ambiguous variable name: l


475-475: Ambiguous variable name: l


537-537: Avoid equality comparisons to True; use if F("crowd"): for truth checks


579-579: Avoid equality comparisons to True; use if F("crowd"): for truth checks

tests/unittests/video_tests.py (4)

1040-1040: Do not assign a lambda expression, use a def


2689-2689: Ambiguous variable name: l


2954-2954: Ambiguous variable name: l


3256-3256: Ambiguous variable name: l

Additional comments not posted (11)
fiftyone/core/patches.py (2)

Line range hint 561-592: LGTM! The addition of persistent and _generated parameters enhances dataset management flexibility.

The changes are well-documented and correctly implemented according to the PR's objectives.


Line range hint 683-738: LGTM! The addition of persistent and _generated parameters enhances dataset management flexibility.

The changes are well-documented and correctly implemented according to the PR's objectives.

tests/unittests/video_tests.py (3)

16-16: Ensure the newly added imports are utilized within the file.

Also applies to: 18-18


2203-2257: Review the implementation of test_make_frames_dataset.

This test method effectively checks the independence of the frames dataset from the original dataset, which aligns with the PR's objectives. It also verifies that the frames dataset does not have a frame collection, which is expected behavior for a generated dataset.


3286-3354: Review the implementation of test_make_clips_dataset.

This test method checks the functionality of creating a clips dataset from events. It validates that the clips dataset is independent from the original dataset in terms of sample and frame collection names, and it confirms that the clips dataset is not marked as generated, which is consistent with the intended changes.

fiftyone/core/stages.py (5)

7926-7930: The inclusion of _generated=True in make_clips_dataset with trajectories=True ensures dataset independence and correct functionality.

Verification successful

The inclusion of _generated=True and trajectories=True in make_clips_dataset is properly handled in the codebase. The parameters are correctly integrated in fiftyone/core/stages.py, and the unit tests in tests/unittests/video_tests.py ensure the functionality.

  • fiftyone/core/stages.py: The parameters _generated=True and trajectories=True are used in the function calls.
  • tests/unittests/video_tests.py: The unit test test_make_clips_dataset verifies the creation of the clips dataset.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset` with `trajectories=True`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_clips_dataset'

Length of output: 68



Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset` with `trajectories=True`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py 'make_clips_dataset'

Length of output: 1111



Script:

#!/bin/bash
# Description: Extract relevant lines to verify the handling of `_generated=True` and `trajectories=True` in `make_clips_dataset`.

# Extract lines from `fiftyone/core/stages.py` where `make_clips_dataset` is used.
rg --type py --context 5 'make_clips_dataset' fiftyone/core/stages.py

# Extract lines from `tests/unittests/video_tests.py` where `make_clips_dataset` is used.
rg --type py --context 5 'make_clips_dataset' tests/unittests/video_tests.py

Length of output: 3231


8114-8116: The addition of _generated=True to make_frames_dataset ensures that frames datasets are treated as independent, aligning with the PR objectives.

Verification successful

The addition of _generated=True to make_frames_dataset is properly integrated in fiftyone/core/stages.py. The test in tests/unittests/video_tests.py and the function definition in fiftyone/core/video.py do not explicitly show _generated=True, but this does not affect the correctness of the integration.

  • fiftyone/core/stages.py: _generated=True is correctly passed to make_frames_dataset.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_frames_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_frames_dataset'

Length of output: 69



Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_frames_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py 'make_frames_dataset'

Length of output: 525



Script:

#!/bin/bash
# Description: Extract relevant lines to verify `_generated=True` integration in `make_frames_dataset`.

# Extract lines from `tests/unittests/video_tests.py`
rg --type py --context 5 'make_frames_dataset' tests/unittests/video_tests.py

# Extract lines from `fiftyone/core/video.py`
rg --type py --context 5 'make_frames_dataset' fiftyone/core/video.py

# Extract lines from `fiftyone/core/stages.py`
rg --type py --context 5 'make_frames_dataset' fiftyone/core/stages.py

Length of output: 2038


7482-7485: The addition of _generated=True to make_patches_dataset aligns with the PR objectives to ensure datasets are treated as independent. Ensure this change integrates well with existing datasets.

Verification successful

The addition of _generated=True to make_patches_dataset aligns with the PR objectives to ensure datasets are treated as independent. The function definition in fiftyone/core/patches.py includes _generated as a parameter, and its usage in fiftyone/core/stages.py correctly passes _generated=True. The unit test in tests/unittests/patches_tests.py also verifies the behavior of generated datasets.

  • fiftyone/core/patches.py: The function definition includes _generated=False by default.
  • fiftyone/core/stages.py: The function is called with _generated=True.
  • tests/unittests/patches_tests.py: The test verifies the behavior of the generated dataset.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_patches_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_patches_dataset'

Length of output: 70



Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_patches_dataset`.

# Test: Search for the function definition and its usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py 'def make_patches_dataset'
rg --type py 'make_patches_dataset'

Length of output: 646



Script:

#!/bin/bash
# Description: Extract the definition and usage of `make_patches_dataset` for review.

# Extract the function definition from `fiftyone/core/patches.py`
rg --type py --context 10 'def make_patches_dataset' fiftyone/core/patches.py

# Extract the usage in `tests/unittests/patches_tests.py`
rg --type py --context 10 'make_patches_dataset' tests/unittests/patches_tests.py

# Extract the references in `fiftyone/core/stages.py`
rg --type py --context 10 'make_patches_dataset' fiftyone/core/stages.py

Length of output: 3813


7789-7792: The addition of _generated=True to make_clips_dataset ensures that clips datasets are treated as independent, aligning with the PR objectives.

Verification successful

The addition of _generated=True to make_clips_dataset is correctly integrated and ensures that clips datasets are treated as independent.

  • fiftyone/core/clips.py: The function definition includes _generated=False by default.
  • fiftyone/core/stages.py: The function is called with _generated=True.
  • tests/unittests/video_tests.py: The function is tested with _generated=True.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_clips_dataset'

Length of output: 68



Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py $'make_clips_dataset'

Length of output: 1111



Script:

#!/bin/bash
# Description: Extract the definition of `make_clips_dataset` and its usage in relevant files.

# Extract the function definition from `fiftyone/core/clips.py`
rg --type py --context 10 --max-count 1 'def make_clips_dataset' fiftyone/core/clips.py

# Extract the usage in `fiftyone/core/stages.py`
rg --type py --context 10 'make_clips_dataset' fiftyone/core/stages.py

# Extract the usage in `tests/unittests/video_tests.py`
rg --type py --context 10 'make_clips_dataset' tests/unittests/video_tests.py

Length of output: 6043


7629-7632: The addition of _generated=True to make_evaluation_patches_dataset is consistent with the PR's goal. Verify that this parameter is handled correctly across the codebase.

Verification successful

The addition of _generated=True to make_evaluation_patches_dataset is consistent with the PR's goal. The function definition in fiftyone/core/patches.py includes _generated as a parameter, and its usage in fiftyone/core/stages.py correctly passes _generated=True.

  • fiftyone/core/patches.py:

    def make_evaluation_patches_dataset(
        sample_collection,
        eval_key,
        other_fields=None,
        name=None,
        persistent=False,
        _generated=False,
    ):
  • fiftyone/core/stages.py:

    eval_patches_dataset = fop.make_evaluation_patches_dataset(
        sample_collection,
        self._eval_key,
        _generated=True,
        **kwargs,
    )
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_evaluation_patches_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_evaluation_patches_dataset'

Length of output: 81



Script:

#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_evaluation_patches_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py $'make_evaluation_patches_dataset'

Length of output: 400



Script:

#!/bin/bash
# Description: Extract the definition of `make_evaluation_patches_dataset` and its usages.

# Extract the function definition from `fiftyone/core/patches.py`
ast-grep --lang python --pattern $'def make_evaluation_patches_dataset($_, $_, $_, **kwargs):\n    $$$' fiftyone/core/patches.py

# Extract the usages from `fiftyone/core/stages.py`
rg --type py 'make_evaluation_patches_dataset' fiftyone/core/stages.py

Length of output: 419



Script:

#!/bin/bash
# Description: Extract the definition of `make_evaluation_patches_dataset` and its usage.

# Extract the function definition from `fiftyone/core/patches.py`
rg --type py --context 10 'def make_evaluation_patches_dataset' fiftyone/core/patches.py

# Extract the usage from `fiftyone/core/stages.py`
rg --type py --context 10 'make_evaluation_patches_dataset' fiftyone/core/stages.py

Length of output: 2467

fiftyone/core/dataset.py (1)

7809-7809: Ensure the pipeline adjustment for clips datasets is fully compatible with standalone collection requirements.

Verification successful

The pipeline adjustments for clips datasets in fiftyone/core/clips.py appear to handle _id and sample_id correctly, aligning with the requirement for clips datasets to operate as standalone collections.

  • fiftyone/core/clips.py:
    • Lines 825, 828, 832, 839, 872, 878, 881, 883, 899, 945, 1025
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the complete pipeline adjustment for clips datasets.

# Test: Search for the usage of `pipeline` in the context of clips datasets. Expect: Proper handling of `_id` and `sample_id`.
rg --type python $'pipeline' | grep 'clips'

Length of output: 71



Script:

#!/bin/bash
# Description: Verify the complete pipeline adjustment for clips datasets.

# Test: Search for the usage of `pipeline` in the context of clips datasets. Expect: Proper handling of `_id` and `sample_id`.
rg --type py 'pipeline' | grep 'clips'

Length of output: 686



Script:

#!/bin/bash
# Description: Extract relevant lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review.

# Extract lines involving `pipeline` in `fiftyone/core/clips.py`
rg --type py 'pipeline' fiftyone/core/clips.py -n

Length of output: 491



Script:

#!/bin/bash
# Description: Extract relevant lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review.

# Extract lines involving `pipeline` in `fiftyone/core/clips.py`
sed -n '825p;828p;832p;839p;872p;878p;881p;883p;899p;945p;1025p' fiftyone/core/clips.py

Length of output: 486



Script:

#!/bin/bash
# Description: Extract broader context around lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review.

# Extract broader context (5 lines before and after) around the identified lines in `fiftyone/core/clips.py`
sed -n '820,830p;823,833p;827,837p;834,844p;867,877p;873,883p;876,886p;878,888p;894,904p;940,950p;1020,1030p' fiftyone/core/clips.py

Length of output: 3412

Comment on lines +589 to +590
persistent=False,
_generated=False,
Contributor


Add documentation for the _generated parameter.

It's important for maintainability and clarity to document all parameters, especially those that are not self-explanatory like _generated. This will help future developers understand the purpose and usage of this parameter.

Comment on lines +487 to +488
persistent=False,
_generated=False,
Contributor


Consider simplifying boolean comparisons in the function signature.

-    persistent=False,
-    _generated=False,
+    persistent=False,  # Default value already implies boolean, direct use in conditions is cleaner
+    _generated=False,  # Same as above

Committable suggestion was skipped due to low confidence.

Comment on lines +662 to +720
    @drop_datasets
    def test_make_patches_dataset(self):
        dataset = fo.Dataset()

        sample1 = fo.Sample(
            filepath="image1.png",
            tags=["sample1"],
            ground_truth=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                    fo.Detection(label="rabbit"),
                    fo.Detection(label="squirrel"),
                ]
            ),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                    fo.Detection(label="rabbit"),
                    fo.Detection(label="squirrel"),
                ]
            ),
        )

        sample2 = fo.Sample(
            filepath="image2.png",
            tags=["sample2"],
            ground_truth=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                ]
            ),
            predictions=fo.Detections(
                detections=[
                    fo.Detection(label="cat"),
                    fo.Detection(label="dog"),
                ]
            ),
        )

        dataset.add_samples([sample1, sample2])

        patches_view = dataset.to_patches("ground_truth")
        patches_dataset = fop.make_patches_dataset(dataset, "ground_truth")

        self.assertNotEqual(
            patches_dataset._sample_collection_name,
            dataset._sample_collection_name,
        )
        self.assertIsNone(patches_dataset._frame_collection_name)
        self.assertTrue(patches_view._is_generated)
        self.assertFalse(patches_dataset._is_generated)
        self.assertEqual(
            len(patches_dataset), dataset.count("ground_truth.detections")
        )
        self.assertEqual(len(patches_dataset), len(patches_view))

Contributor


Consider enhancing the test coverage for make_patches_dataset.

The current test checks basic separation and flags of the dataset and view. Consider adding more assertions to verify the integrity and contents of the patches dataset, such as checking specific fields or the count of items in various categories. Would you like me to help by suggesting additional test cases or implementing them?

Contributor

@benjaminpkane benjaminpkane left a comment


🚀

@brimoor brimoor merged commit 6faa5d7 into develop May 24, 2024
11 checks passed
@brimoor brimoor deleted the bugfix/iss-4397 branch May 24, 2024 14:42
Development

Successfully merging this pull request may close these issues.

fo.core.video.make_frames_dataset is sneakily considered a frame view
2 participants