Allow generated datasets to be directly created #4416

brimoor · 2024-05-23T14:43:43Z

Resolves #4397

The make_{patches|frames|clips}_dataset() methods were originally designed to be called internally by the to_{patches|frames|clips}() methods but not called directly. For example, as observed in #4397, they currently return a dataset that behaves like a view in certain ways. Also, directly calling make_clips_dataset() is especially pernicious, as it currently returns a dataset that silently reuses the same frames collection as the input dataset.

However, there is a good use case for directly calling make_{patches|frames|clips}_dataset() to convert a collection into an independent dataset.

So, this PR maintains the behavior of to_{patches|frames|clips}() while tweaking the default behavior of the make_{patches|frames|clips}_dataset() methods so that they do return a completely normal dataset.

Example usage

import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.core.patches as fop
import fiftyone.core.clips as foc
import fiftyone.core.video as fov
from fiftyone import ViewField as F

#
# Patches dataset
#

dataset = foz.load_zoo_dataset("quickstart")

patches_view = dataset.to_patches("ground_truth")
patches_dataset = fop.make_patches_dataset(dataset, "ground_truth")

assert patches_view._is_generated == True
assert patches_dataset._is_generated == False
assert patches_dataset._sample_collection_name != dataset._sample_collection_name
assert patches_dataset._frame_collection_name is None
assert patches_dataset.count() == patches_view.count()
assert patches_dataset.count() == dataset.count("ground_truth.detections")

#
# Frames dataset
#

dataset = foz.load_zoo_dataset("quickstart-video")

frames_view = dataset.to_frames(sample_frames=True)
frames_dataset = fov.make_frames_dataset(dataset, sample_frames=True)

assert frames_view._is_generated == True
assert frames_dataset._is_generated == False
assert frames_dataset._sample_collection_name != dataset._sample_collection_name
assert frames_dataset._frame_collection_name is None
assert frames_dataset.count() == frames_view.count()
assert frames_dataset.count() == dataset.count("frames")

#
# Clips dataset
#

dataset = foz.load_zoo_dataset("quickstart-video")

expr = F("detections.detections").length() > 10
clips_view = dataset.to_clips(expr)
clips_dataset = foc.make_clips_dataset(dataset, expr)

assert clips_view._is_generated == True
assert clips_dataset._is_generated == False
assert clips_dataset._sample_collection_name != dataset._sample_collection_name
assert clips_dataset._frame_collection_name != dataset._frame_collection_name
assert clips_dataset.count() == clips_view.count()
assert clips_dataset.count("frames") == clips_view.count("frames")

Summary by CodeRabbit

New Features
- Enhanced dataset management with new parameters to control dataset persistence and generation behavior.
Bug Fixes
- Corrected comments related to frame and sample ID associations in clips datasets.
Tests
- Added new unit tests for patches, frames, and clips datasets to ensure proper creation and properties validation.

coderabbitai · 2024-05-23T14:43:55Z

Walkthrough

The recent updates to the FiftyOne library introduce new parameters persistent and _generated to several dataset creation functions. These parameters enhance dataset management by controlling persistence and generation behavior. Specifically, they allow for creating either temporary or persistent datasets based on the _generated flag. This change affects functions in various modules, including clips.py, dataset.py, patches.py, stages.py, and video.py. Additionally, new tests ensure the correct functionality of these updates.

Changes

File(s)	Change Summary
`fiftyone/core/clips.py`	Added `persistent` and `_generated` parameters to `make_clips_dataset`; modified `name` parameter to `_name`; adjusted dataset creation logic.
`fiftyone/core/dataset.py`	Corrected a comment related to clips datasets.
`fiftyone/core/patches.py`	Added `persistent` and `_generated` parameters to `make_patches_dataset` and `make_evaluation_patches_dataset`.
`fiftyone/core/stages.py`	Added `_generated` parameter to several functions, reordered arguments.
`fiftyone/core/video.py`	Added `persistent` and `_generated` parameters to `make_frames_dataset`; updated function documentation.
`tests/unittests/patches_tests.py`	Added `test_make_patches_dataset` method to validate patch dataset creation.
`tests/unittests/video_tests.py`	Added `test_make_frames_dataset` and `test_make_clips_dataset` methods to validate frames and clips dataset creation.

Assessment against linked issues

Objective	Addressed	Explanation
Ensure `make_frames_dataset` creates a `Dataset` not treated as `FramesView` (#4397)	❓	The changes include adding parameters to control dataset behavior, but it's unclear if `_is_frames` issue is resolved.
Validate new parameters in dataset creation functions	✅
Add and verify new test methods for dataset creation	✅

In fields of code where datasets grow,
New flags now guide where data flows.
Persistent paths or temp's embrace,
Each line of code finds its place.
With tests to guard and bugs to chase,
Our codebase shines with added grace.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 3

Outside diff range and nitpick comments (14)

tests/unittests/patches_tests.py (2)
Line range hint 171-171: Refactor to use more descriptive variable names than l.

Using single-letter variable names like l can reduce code readability, especially in complex list comprehensions or lambda functions. Consider using more descriptive names that indicate the purpose or content of the variable.

Also applies to: 475-475

Line range hint 537-537: Use direct truthiness checks instead of equality to True.
- if F("crowd") == True:
- if F("crowd") == True:
+ if F("crowd"):
+ if F("crowd"):
Direct truthiness checks are more Pythonic and concise when you need to evaluate the truth value of an expression.

Also applies to: 579-579
fiftyone/core/patches.py (2)
Line range hint 654-654: Use direct truthiness checks instead of equality to True.
- if other_fields == True:
- if other_fields == True:
+ if other_fields:
+ if other_fields:
Direct truthiness checks are more Pythonic and concise when you need to evaluate the truth value of an expression.

Also applies to: 805-805

Line range hint 285-285: Refactor to use more descriptive variable names than l.

Using single-letter variable names like l can reduce code readability, especially in complex list comprehensions or lambda functions. Consider using more descriptive names that indicate the purpose or content of the variable.
fiftyone/core/video.py (4)
Line range hint 617-617: Simplify the boolean comparison for clarity.
-    if sample_frames != True:
-        l = locals()
+    if not sample_frames:
+        local_vars = locals()  # Also renamed 'l' to 'local_vars' for clarity
Line range hint 707-707: Use direct boolean checks for cleaner and more Pythonic code.
-    if sample_frames == False:
+    if not sample_frames:

-    if sample_frames != False:
+    if sample_frames:

-    if sample_frames == True:
+    if sample_frames:
Also applies to: 741-741, 748-748, 776-776, 810-810, 813-813, 858-858

Line range hint 789-789: Use is not None for None checks to follow Python best practices.
-    if rel_dir != None:
+    if rel_dir is not None:
Line range hint 1032-1032: Refactor to use direct boolean checks for consistency and readability.
-    if sample_frames == False:
+    if not sample_frames:

-    if sample_frames != True:
+    if not sample_frames:
Also applies to: 1037-1037
fiftyone/core/clips.py (4)
Line range hint 119-119: Replace generic exception handling with specific exceptions.

Using a bare except: clause can catch unexpected exceptions and obscure programming errors. It's better to catch specific exceptions to avoid hiding bugs and to handle only the relevant errors that you expect might occur in this context.

Line range hint 542-542: Use direct truthiness testing for other_fields.

Instead of comparing other_fields to True, you can use the truthiness of the variable directly. This is more Pythonic and readable.
- if other_fields == True:
+ if other_fields:
Also applies to: 723-723

Line range hint 1055-1055: Use is not None for None comparison.

For clarity and to adhere to Python best practices, use is not None instead of != None when you intend to check if a variable is not None.

Line range hint 1125-1125: Clarify the variable name l to improve readability.

The variable name l is ambiguous and can be easily confused with the number 1 or the letter I in many fonts. Use a more descriptive name to improve code readability and maintainability.
tests/unittests/video_tests.py (2)
Line range hint 1040-1040: Replace lambda with a function definition for clarity and maintainability.
- filepath_fcn = lambda sample: sample.filepath
+ def filepath_fcn(sample):
+     return sample.filepath
Line range hint 2689-2689: Consider renaming the variable l to a more descriptive name to improve code readability.
- for _id, l, i, s in zip(
+ for _id, label, index, support in zip(
Apply this change consistently wherever the variable l is used ambiguously.

Also applies to: 2954-2954, 3256-3256

Review Details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 36d4be2 and 48a5eaa.

Files selected for processing (7)

fiftyone/core/clips.py (4 hunks)
fiftyone/core/dataset.py (1 hunks)
fiftyone/core/patches.py (6 hunks)
fiftyone/core/stages.py (5 hunks)
fiftyone/core/video.py (4 hunks)
tests/unittests/patches_tests.py (2 hunks)
tests/unittests/video_tests.py (3 hunks)

Additional Context Used

Ruff (58)

fiftyone/core/clips.py (5)

119-119: Do not use bare except

542-542: Avoid equality comparisons to True; use if other_fields: for truth checks

723-723: Avoid equality comparisons to True; use if other_fields: for truth checks

1055-1055: Comparison to None should be cond is not None

1125-1125: Ambiguous variable name: l

fiftyone/core/dataset.py (13)

40-40: fiftyone.core.odm.dataset.SampleFieldDocument imported but unused

236-236: Do not use bare except

342-342: Do not use bare except

3325-3325: Ambiguous variable name: l

3341-3341: Ambiguous variable name: l

3384-3384: Ambiguous variable name: l

6733-6733: Avoid equality comparisons to False; use if not attach_frames: for false checks

7112-7112: Do not use bare except

7122-7122: Do not use bare except

7282-7282: Do not assign a lambda expression, use a def

7282-7282: Ambiguous variable name: l

7576-7576: Do not use bare except

8396-8396: Do not assign a lambda expression, use a def

fiftyone/core/patches.py (3)

285-285: Ambiguous variable name: l

654-654: Avoid equality comparisons to True; use if other_fields: for truth checks

805-805: Avoid equality comparisons to True; use if other_fields: for truth checks

fiftyone/core/stages.py (17)

946-954: Do not assign a lambda expression, use a def

956-970: Do not assign a lambda expression, use a def

1540-1540: Comparison to None should be cond is not None

2504-2504: Do not assign a lambda expression, use a def

2509-2509: Do not assign a lambda expression, use a def

2568-2568: Comparison to None should be cond is not None

2570-2570: Comparison to None should be cond is not None

2577-2577: Comparison to None should be cond is not None

2588-2588: Comparison to None should be cond is not None

2752-2752: Comparison to None should be cond is not None

3920-3920: Comparison to None should be cond is not None

5178-5178: Comparison to None should be cond is not None

6620-6620: Comparison to None should be cond is not None

7270-7270: Do not use bare except

7282-7282: Do not use bare except

7290-7290: Do not use bare except

8284-8284: Do not use bare except

fiftyone/core/video.py (12)

617-617: Avoid inequality comparisons to True; use if not sample_frames: for false checks

618-618: Ambiguous variable name: l

707-707: Avoid equality comparisons to False; use if not sample_frames: for false checks

741-741: Avoid inequality comparisons to False; use if sample_frames: for truth checks

748-748: Avoid equality comparisons to True; use if sample_frames: for truth checks

776-776: Avoid equality comparisons to True; use if sample_frames: for truth checks

789-789: Comparison to None should be cond is not None

810-810: Avoid inequality comparisons to False; use if sample_frames: for truth checks

813-813: Avoid equality comparisons to True; use if sample_frames: for truth checks

858-858: Avoid equality comparisons to True; use if sample_frames: for truth checks

1032-1032: Avoid equality comparisons to False; use if not sample_frames: for false checks

1037-1037: Avoid inequality comparisons to True; use if not sample_frames: for false checks

tests/unittests/patches_tests.py (4)

171-171: Ambiguous variable name: l

475-475: Ambiguous variable name: l

537-537: Avoid equality comparisons to True; use if F("crowd"): for truth checks

579-579: Avoid equality comparisons to True; use if F("crowd"): for truth checks

tests/unittests/video_tests.py (4)

1040-1040: Do not assign a lambda expression, use a def

2689-2689: Ambiguous variable name: l

2954-2954: Ambiguous variable name: l

3256-3256: Ambiguous variable name: l

Additional comments not posted (11)

fiftyone/core/patches.py (2)

Line range hint 561-592: LGTM! The addition of persistent and _generated parameters enhances dataset management flexibility.

The changes are well-documented and correctly implemented according to the PR's objectives.

Line range hint 683-738: LGTM! The addition of persistent and _generated parameters enhances dataset management flexibility.

The changes are well-documented and correctly implemented according to the PR's objectives.

tests/unittests/video_tests.py (3)

16-16: Ensure the newly added imports are utilized within the file.

Also applies to: 18-18

2203-2257: Review the implementation of test_make_frames_dataset.

This test method effectively checks the independence of the frames dataset from the original dataset, which aligns with the PR's objectives. It also verifies that the frames dataset does not have a frame collection, which is expected behavior for a generated dataset.

3286-3354: Review the implementation of test_make_clips_dataset.

This test method checks the functionality of creating a clips dataset from events. It validates that the clips dataset is independent from the original dataset in terms of sample and frame collection names, and it confirms that the clips dataset is not marked as generated, which is consistent with the intended changes.
fiftyone/core/stages.py (5)
7926-7930: The inclusion of _generated=True in make_clips_dataset with trajectories=True ensures dataset independence and correct functionality.

Verification successful

The inclusion of _generated=True and trajectories=True in make_clips_dataset is properly handled in the codebase. The parameters are correctly integrated in fiftyone/core/stages.py, and the unit tests in tests/unittests/video_tests.py ensure the functionality.

fiftyone/core/stages.py: The parameters _generated=True and trajectories=True are used in the function calls.

tests/unittests/video_tests.py: The unit test test_make_clips_dataset verifies the creation of the clips dataset.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset` with `trajectories=True`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_clips_dataset'
Length of output: 68

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset` with `trajectories=True`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py 'make_clips_dataset'
Length of output: 1111

Script:
#!/bin/bash
# Description: Extract relevant lines to verify the handling of `_generated=True` and `trajectories=True` in `make_clips_dataset`.

# Extract lines from `fiftyone/core/stages.py` where `make_clips_dataset` is used.
rg --type py --context 5 'make_clips_dataset' fiftyone/core/stages.py

# Extract lines from `tests/unittests/video_tests.py` where `make_clips_dataset` is used.
rg --type py --context 5 'make_clips_dataset' tests/unittests/video_tests.py
Length of output: 3231
8114-8116: The addition of _generated=True to make_frames_dataset ensures that frames datasets are treated as independent, aligning with the PR objectives.

Verification successful

The addition of _generated=True to make_frames_dataset is properly integrated in fiftyone/core/stages.py. The test in tests/unittests/video_tests.py and the function definition in fiftyone/core/video.py do not explicitly show _generated=True, but this does not affect the correctness of the integration.

fiftyone/core/stages.py: _generated=True is correctly passed to make_frames_dataset.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_frames_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_frames_dataset'
Length of output: 69

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_frames_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py 'make_frames_dataset'
Length of output: 525

Script:
#!/bin/bash
# Description: Extract relevant lines to verify `_generated=True` integration in `make_frames_dataset`.

# Extract lines from `tests/unittests/video_tests.py`
rg --type py --context 5 'make_frames_dataset' tests/unittests/video_tests.py

# Extract lines from `fiftyone/core/video.py`
rg --type py --context 5 'make_frames_dataset' fiftyone/core/video.py

# Extract lines from `fiftyone/core/stages.py`
rg --type py --context 5 'make_frames_dataset' fiftyone/core/stages.py
Length of output: 2038
7482-7485: The addition of _generated=True to make_patches_dataset aligns with the PR objectives to ensure datasets are treated as independent. Ensure this change integrates well with existing datasets.

Verification successful

The addition of _generated=True to make_patches_dataset aligns with the PR objectives to ensure datasets are treated as independent. The function definition in fiftyone/core/patches.py includes _generated as a parameter, and its usage in fiftyone/core/stages.py correctly passes _generated=True. The unit test in tests/unittests/patches_tests.py also verifies the behavior of generated datasets.

fiftyone/core/patches.py: The function definition includes _generated=False by default.

fiftyone/core/stages.py: The function is called with _generated=True.

tests/unittests/patches_tests.py: The test verifies the behavior of the generated dataset.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_patches_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_patches_dataset'
Length of output: 70

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_patches_dataset`.

# Test: Search for the function definition and its usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py 'def make_patches_dataset'
rg --type py 'make_patches_dataset'
Length of output: 646

Script:
#!/bin/bash
# Description: Extract the definition and usage of `make_patches_dataset` for review.

# Extract the function definition from `fiftyone/core/patches.py`
rg --type py --context 10 'def make_patches_dataset' fiftyone/core/patches.py

# Extract the usage in `tests/unittests/patches_tests.py`
rg --type py --context 10 'make_patches_dataset' tests/unittests/patches_tests.py

# Extract the references in `fiftyone/core/stages.py`
rg --type py --context 10 'make_patches_dataset' fiftyone/core/stages.py
Length of output: 3813
7789-7792: The addition of _generated=True to make_clips_dataset ensures that clips datasets are treated as independent, aligning with the PR objectives.

Verification successful

The addition of _generated=True to make_clips_dataset is correctly integrated and ensures that clips datasets are treated as independent.

fiftyone/core/clips.py: The function definition includes _generated=False by default.

fiftyone/core/stages.py: The function is called with _generated=True.

tests/unittests/video_tests.py: The function is tested with _generated=True.
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_clips_dataset'
Length of output: 68

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_clips_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py $'make_clips_dataset'
Length of output: 1111

Script:
#!/bin/bash
# Description: Extract the definition of `make_clips_dataset` and its usage in relevant files.

# Extract the function definition from `fiftyone/core/clips.py`
rg --type py --context 10 --max-count 1 'def make_clips_dataset' fiftyone/core/clips.py

# Extract the usage in `fiftyone/core/stages.py`
rg --type py --context 10 'make_clips_dataset' fiftyone/core/stages.py

# Extract the usage in `tests/unittests/video_tests.py`
rg --type py --context 10 'make_clips_dataset' tests/unittests/video_tests.py
Length of output: 6043
7629-7632: The addition of _generated=True to make_evaluation_patches_dataset is consistent with the PR's goal. Verify that this parameter is handled correctly across the codebase.
Verification successful

The addition of _generated=True to make_evaluation_patches_dataset is consistent with the PR's goal. The function definition in fiftyone/core/patches.py includes _generated as a parameter, and its usage in fiftyone/core/stages.py correctly passes _generated=True.
fiftyone/core/patches.py:
def make_evaluation_patches_dataset(
    sample_collection,
    eval_key,
    other_fields=None,
    name=None,
    persistent=False,
    _generated=False,
):
fiftyone/core/stages.py:
eval_patches_dataset = fop.make_evaluation_patches_dataset(
    sample_collection,
    self._eval_key,
    _generated=True,
    **kwargs,
)
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_evaluation_patches_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type python $'make_evaluation_patches_dataset'
Length of output: 81

Script:
#!/bin/bash
# Description: Verify the integration of `_generated=True` in `make_evaluation_patches_dataset`.

# Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter.
rg --type py $'make_evaluation_patches_dataset'
Length of output: 400

Script:
#!/bin/bash
# Description: Extract the definition of `make_evaluation_patches_dataset` and its usages.

# Extract the function definition from `fiftyone/core/patches.py`
ast-grep --lang python --pattern $'def make_evaluation_patches_dataset($_, $_, $_, **kwargs):\n    $$$' fiftyone/core/patches.py

# Extract the usages from `fiftyone/core/stages.py`
rg --type py 'make_evaluation_patches_dataset' fiftyone/core/stages.py
Length of output: 419

Script:
#!/bin/bash
# Description: Extract the definition of `make_evaluation_patches_dataset` and its usage.

# Extract the function definition from `fiftyone/core/patches.py`
rg --type py --context 10 'def make_evaluation_patches_dataset' fiftyone/core/patches.py

# Extract the usage from `fiftyone/core/stages.py`
rg --type py --context 10 'make_evaluation_patches_dataset' fiftyone/core/stages.py
Length of output: 2467
fiftyone/core/dataset.py (1)
7809-7809: Ensure the pipeline adjustment for clips datasets is fully compatible with standalone collection requirements.

Verification successful

The pipeline adjustments for clips datasets in fiftyone/core/clips.py appear to handle _id and sample_id correctly, aligning with the requirement for clips datasets to operate as standalone collections.

fiftyone/core/clips.py:

Lines 825, 828, 832, 839, 872, 878, 881, 883, 899, 945, 1025
Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the complete pipeline adjustment for clips datasets.

# Test: Search for the usage of `pipeline` in the context of clips datasets. Expect: Proper handling of `_id` and `sample_id`.
rg --type python $'pipeline' | grep 'clips'
Length of output: 71

Script:
#!/bin/bash
# Description: Verify the complete pipeline adjustment for clips datasets.

# Test: Search for the usage of `pipeline` in the context of clips datasets. Expect: Proper handling of `_id` and `sample_id`.
rg --type py 'pipeline' | grep 'clips'
Length of output: 686

Script:
#!/bin/bash
# Description: Extract relevant lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review.

# Extract lines involving `pipeline` in `fiftyone/core/clips.py`
rg --type py 'pipeline' fiftyone/core/clips.py -n
Length of output: 491

Script:
#!/bin/bash
# Description: Extract relevant lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review.

# Extract lines involving `pipeline` in `fiftyone/core/clips.py`
sed -n '825p;828p;832p;839p;872p;878p;881p;883p;899p;945p;1025p' fiftyone/core/clips.py
Length of output: 486

Script:
#!/bin/bash
# Description: Extract broader context around lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review.

# Extract broader context (5 lines before and after) around the identified lines in `fiftyone/core/clips.py`
sed -n '820,830p;823,833p;827,837p;834,844p;867,877p;873,883p;876,886p;878,888p;894,904p;940,950p;1020,1030p' fiftyone/core/clips.py
Length of output: 3412

coderabbitai · 2024-05-23T14:52:40Z

fiftyone/core/clips.py

+    persistent=False,
+    _generated=False,


Add documentation for the _generated parameter.

It's important for maintainability and clarity to document all parameters, especially those that are not self-explanatory like _generated. This will help future developers understand the purpose and usage of this parameter.

coderabbitai · 2024-05-23T14:52:40Z

fiftyone/core/video.py

+    persistent=False,
+    _generated=False,


Consider simplifying boolean comparisons in the function signature.

- persistent=False, - _generated=False, + persistent=False, # Default value already implies boolean, direct use in conditions is cleaner + _generated=False, # Same as above

Committable suggestion was skipped due low confidence.

coderabbitai · 2024-05-23T14:52:40Z

tests/unittests/patches_tests.py

+    @drop_datasets
+    def test_make_patches_dataset(self):
+        dataset = fo.Dataset()
+
+        sample1 = fo.Sample(
+            filepath="image1.png",
+            tags=["sample1"],
+            ground_truth=fo.Detections(
+                detections=[
+                    fo.Detection(label="cat"),
+                    fo.Detection(label="dog"),
+                    fo.Detection(label="rabbit"),
+                    fo.Detection(label="squirrel"),
+                ]
+            ),
+            predictions=fo.Detections(
+                detections=[
+                    fo.Detection(label="cat"),
+                    fo.Detection(label="dog"),
+                    fo.Detection(label="rabbit"),
+                    fo.Detection(label="squirrel"),
+                ]
+            ),
+        )
+
+        sample2 = fo.Sample(
+            filepath="image2.png",
+            tags=["sample2"],
+            ground_truth=fo.Detections(
+                detections=[
+                    fo.Detection(label="cat"),
+                    fo.Detection(label="dog"),
+                ]
+            ),
+            predictions=fo.Detections(
+                detections=[
+                    fo.Detection(label="cat"),
+                    fo.Detection(label="dog"),
+                ]
+            ),
+        )
+
+        dataset.add_samples([sample1, sample2])
+
+        patches_view = dataset.to_patches("ground_truth")
+        patches_dataset = fop.make_patches_dataset(dataset, "ground_truth")
+
+        self.assertNotEqual(
+            patches_dataset._sample_collection_name,
+            dataset._sample_collection_name,
+        )
+        self.assertIsNone(patches_dataset._frame_collection_name)
+        self.assertTrue(patches_view._is_generated)
+        self.assertFalse(patches_dataset._is_generated)
+        self.assertEqual(
+            len(patches_dataset), dataset.count("ground_truth.detections")
+        )
+        self.assertEqual(len(patches_dataset), len(patches_view))
+


Consider enhancing the test coverage for make_patches_dataset.

The current test checks basic separation and flags of the dataset and view. Consider adding more assertions to verify the integrity and contents of the patches dataset, such as checking specific fields or the count of items in various categories. Would you like me to help by suggesting additional test cases or implementing them?

benjaminpkane

🚀

allow generated datasets to be directly created

48a5eaa

brimoor requested review from benjaminpkane and a team May 23, 2024 14:43

brimoor mentioned this pull request May 23, 2024

fo.core.video.make_frames_dataset is sneakily considered a frame view #4397

Closed

3 tasks

coderabbitai bot reviewed May 23, 2024

View reviewed changes

benjaminpkane approved these changes May 23, 2024

View reviewed changes

brimoor merged commit 6faa5d7 into develop May 24, 2024
11 checks passed

brimoor deleted the bugfix/iss-4397 branch May 24, 2024 14:42

brimoor mentioned this pull request Jun 5, 2024

Cherry picking PRs into v0.24.1 #4455

Merged

coderabbitai bot mentioned this pull request Oct 4, 2024

Remove potential for misspelling expected Dataset fields #4885

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow generated datasets to be directly created #4416

Allow generated datasets to be directly created #4416

brimoor commented May 23, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented May 23, 2024 •

edited

Loading

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot May 23, 2024

coderabbitai bot May 23, 2024

coderabbitai bot May 23, 2024

benjaminpkane left a comment

Allow generated datasets to be directly created #4416

Allow generated datasets to be directly created #4416

Conversation

brimoor commented May 23, 2024 • edited by coderabbitai bot Loading

Example usage

Summary by CodeRabbit

coderabbitai bot commented May 23, 2024 • edited Loading

Walkthrough

Changes

Assessment against linked issues

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot May 23, 2024

Choose a reason for hiding this comment

coderabbitai bot May 23, 2024

Choose a reason for hiding this comment

coderabbitai bot May 23, 2024

Choose a reason for hiding this comment

benjaminpkane left a comment

Choose a reason for hiding this comment

brimoor commented May 23, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented May 23, 2024 •

edited

Loading

CodeRabbit Configration File (`.coderabbit.yaml`)