-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow generated datasets to be directly created #4416
Conversation
WalkthroughThe recent updates to the FiftyOne library introduce new parameters Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Outside diff range and nitpick comments (14)
tests/unittests/patches_tests.py (2)
Line range hint
171-171
: Refactor to use more descriptive variable names thanl
.Using single-letter variable names like
l
can reduce code readability, especially in complex list comprehensions or lambda functions. Consider using more descriptive names that indicate the purpose or content of the variable.Also applies to: 475-475
Line range hint
537-537
: Use direct truthiness checks instead of equality toTrue
.- if F("crowd") == True: - if F("crowd") == True: + if F("crowd"): + if F("crowd"):Direct truthiness checks are more Pythonic and concise when you need to evaluate the truth value of an expression.
Also applies to: 579-579
fiftyone/core/patches.py (2)
Line range hint
654-654
: Use direct truthiness checks instead of equality toTrue
.- if other_fields == True: - if other_fields == True: + if other_fields: + if other_fields:Direct truthiness checks are more Pythonic and concise when you need to evaluate the truth value of an expression.
Also applies to: 805-805
Line range hint
285-285
: Refactor to use more descriptive variable names thanl
.Using single-letter variable names like
l
can reduce code readability, especially in complex list comprehensions or lambda functions. Consider using more descriptive names that indicate the purpose or content of the variable.fiftyone/core/video.py (4)
Line range hint
617-617
: Simplify the boolean comparison for clarity.- if sample_frames != True: - l = locals() + if not sample_frames: + local_vars = locals() # Also renamed 'l' to 'local_vars' for clarity
Line range hint
707-707
: Use direct boolean checks for cleaner and more Pythonic code.- if sample_frames == False: + if not sample_frames: - if sample_frames != False: + if sample_frames: - if sample_frames == True: + if sample_frames:Also applies to: 741-741, 748-748, 776-776, 810-810, 813-813, 858-858
Line range hint
789-789
: Useis not None
for None checks to follow Python best practices.- if rel_dir != None: + if rel_dir is not None:
Line range hint
1032-1032
: Refactor to use direct boolean checks for consistency and readability.- if sample_frames == False: + if not sample_frames: - if sample_frames != True: + if not sample_frames:Also applies to: 1037-1037
fiftyone/core/clips.py (4)
Line range hint
119-119
: Replace generic exception handling with specific exceptions.Using a bare
except:
clause can catch unexpected exceptions and obscure programming errors. It's better to catch specific exceptions to avoid hiding bugs and to handle only the relevant errors that you expect might occur in this context.
Line range hint
542-542
: Use direct truthiness testing forother_fields
.Instead of comparing
other_fields
toTrue
, you can use the truthiness of the variable directly. This is more Pythonic and readable.- if other_fields == True: + if other_fields:Also applies to: 723-723
Line range hint
1055-1055
: Useis not None
for None comparison.For clarity and to adhere to Python best practices, use
is not None
instead of!= None
when you intend to check if a variable is not None.
Line range hint
1125-1125
: Clarify the variable namel
to improve readability.The variable name
l
is ambiguous and can be easily confused with the number1
or the letterI
in many fonts. Use a more descriptive name to improve code readability and maintainability.tests/unittests/video_tests.py (2)
Line range hint
1040-1040
: Replace lambda with a function definition for clarity and maintainability.- filepath_fcn = lambda sample: sample.filepath + def filepath_fcn(sample): + return sample.filepath
Line range hint
2689-2689
: Consider renaming the variablel
to a more descriptive name to improve code readability.- for _id, l, i, s in zip( + for _id, label, index, support in zip(Apply this change consistently wherever the variable
l
is used ambiguously.Also applies to: 2954-2954, 3256-3256
Review Details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (7)
- fiftyone/core/clips.py (4 hunks)
- fiftyone/core/dataset.py (1 hunks)
- fiftyone/core/patches.py (6 hunks)
- fiftyone/core/stages.py (5 hunks)
- fiftyone/core/video.py (4 hunks)
- tests/unittests/patches_tests.py (2 hunks)
- tests/unittests/video_tests.py (3 hunks)
Additional Context Used
Ruff (58)
fiftyone/core/clips.py (5)
119-119: Do not use bare
except
542-542: Avoid equality comparisons to
True
; useif other_fields:
for truth checks
723-723: Avoid equality comparisons to
True
; useif other_fields:
for truth checks
1055-1055: Comparison to
None
should becond is not None
1125-1125: Ambiguous variable name:
l
fiftyone/core/dataset.py (13)
40-40:
fiftyone.core.odm.dataset.SampleFieldDocument
imported but unused
236-236: Do not use bare
except
342-342: Do not use bare
except
3325-3325: Ambiguous variable name:
l
3341-3341: Ambiguous variable name:
l
3384-3384: Ambiguous variable name:
l
6733-6733: Avoid equality comparisons to
False
; useif not attach_frames:
for false checks
7112-7112: Do not use bare
except
7122-7122: Do not use bare
except
7282-7282: Do not assign a
lambda
expression, use adef
7282-7282: Ambiguous variable name:
l
7576-7576: Do not use bare
except
8396-8396: Do not assign a
lambda
expression, use adef
fiftyone/core/patches.py (3)
285-285: Ambiguous variable name:
l
654-654: Avoid equality comparisons to
True
; useif other_fields:
for truth checks
805-805: Avoid equality comparisons to
True
; useif other_fields:
for truth checksfiftyone/core/stages.py (17)
946-954: Do not assign a
lambda
expression, use adef
956-970: Do not assign a
lambda
expression, use adef
1540-1540: Comparison to
None
should becond is not None
2504-2504: Do not assign a
lambda
expression, use adef
2509-2509: Do not assign a
lambda
expression, use adef
2568-2568: Comparison to
None
should becond is not None
2570-2570: Comparison to
None
should becond is not None
2577-2577: Comparison to
None
should becond is not None
2588-2588: Comparison to
None
should becond is not None
2752-2752: Comparison to
None
should becond is not None
3920-3920: Comparison to
None
should becond is not None
5178-5178: Comparison to
None
should becond is not None
6620-6620: Comparison to
None
should becond is not None
7270-7270: Do not use bare
except
7282-7282: Do not use bare
except
7290-7290: Do not use bare
except
8284-8284: Do not use bare
except
fiftyone/core/video.py (12)
617-617: Avoid inequality comparisons to
True
; useif not sample_frames:
for false checks
618-618: Ambiguous variable name:
l
707-707: Avoid equality comparisons to
False
; useif not sample_frames:
for false checks
741-741: Avoid inequality comparisons to
False
; useif sample_frames:
for truth checks
748-748: Avoid equality comparisons to
True
; useif sample_frames:
for truth checks
776-776: Avoid equality comparisons to
True
; useif sample_frames:
for truth checks
789-789: Comparison to
None
should becond is not None
810-810: Avoid inequality comparisons to
False
; useif sample_frames:
for truth checks
813-813: Avoid equality comparisons to
True
; useif sample_frames:
for truth checks
858-858: Avoid equality comparisons to
True
; useif sample_frames:
for truth checks
1032-1032: Avoid equality comparisons to
False
; useif not sample_frames:
for false checks
1037-1037: Avoid inequality comparisons to
True
; useif not sample_frames:
for false checkstests/unittests/patches_tests.py (4)
171-171: Ambiguous variable name:
l
475-475: Ambiguous variable name:
l
537-537: Avoid equality comparisons to
True
; useif F("crowd"):
for truth checks
579-579: Avoid equality comparisons to
True
; useif F("crowd"):
for truth checkstests/unittests/video_tests.py (4)
1040-1040: Do not assign a
lambda
expression, use adef
2689-2689: Ambiguous variable name:
l
2954-2954: Ambiguous variable name:
l
3256-3256: Ambiguous variable name:
l
Additional comments not posted (11)
fiftyone/core/patches.py (2)
Line range hint
561-592
: LGTM! The addition ofpersistent
and_generated
parameters enhances dataset management flexibility.The changes are well-documented and correctly implemented according to the PR's objectives.
Line range hint
683-738
: LGTM! The addition ofpersistent
and_generated
parameters enhances dataset management flexibility.The changes are well-documented and correctly implemented according to the PR's objectives.
tests/unittests/video_tests.py (3)
16-16
: Ensure the newly added imports are utilized within the file.Also applies to: 18-18
2203-2257
: Review the implementation oftest_make_frames_dataset
.This test method effectively checks the independence of the frames dataset from the original dataset, which aligns with the PR's objectives. It also verifies that the frames dataset does not have a frame collection, which is expected behavior for a generated dataset.
3286-3354
: Review the implementation oftest_make_clips_dataset
.This test method checks the functionality of creating a clips dataset from events. It validates that the clips dataset is independent from the original dataset in terms of sample and frame collection names, and it confirms that the clips dataset is not marked as generated, which is consistent with the intended changes.
fiftyone/core/stages.py (5)
7926-7930
: The inclusion of_generated=True
inmake_clips_dataset
withtrajectories=True
ensures dataset independence and correct functionality.Verification successful
The inclusion of
_generated=True
andtrajectories=True
inmake_clips_dataset
is properly handled in the codebase. The parameters are correctly integrated infiftyone/core/stages.py
, and the unit tests intests/unittests/video_tests.py
ensure the functionality.
fiftyone/core/stages.py
: The parameters_generated=True
andtrajectories=True
are used in the function calls.tests/unittests/video_tests.py
: The unit testtest_make_clips_dataset
verifies the creation of the clips dataset.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_clips_dataset` with `trajectories=True`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type python $'make_clips_dataset'Length of output: 68
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_clips_dataset` with `trajectories=True`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type py 'make_clips_dataset'Length of output: 1111
Script:
#!/bin/bash # Description: Extract relevant lines to verify the handling of `_generated=True` and `trajectories=True` in `make_clips_dataset`. # Extract lines from `fiftyone/core/stages.py` where `make_clips_dataset` is used. rg --type py --context 5 'make_clips_dataset' fiftyone/core/stages.py # Extract lines from `tests/unittests/video_tests.py` where `make_clips_dataset` is used. rg --type py --context 5 'make_clips_dataset' tests/unittests/video_tests.pyLength of output: 3231
8114-8116
: The addition of_generated=True
tomake_frames_dataset
ensures that frames datasets are treated as independent, aligning with the PR objectives.Verification successful
The addition of
_generated=True
tomake_frames_dataset
is properly integrated infiftyone/core/stages.py
. The test intests/unittests/video_tests.py
and the function definition infiftyone/core/video.py
do not explicitly show_generated=True
, but this does not affect the correctness of the integration.
fiftyone/core/stages.py
:_generated=True
is correctly passed tomake_frames_dataset
.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_frames_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type python $'make_frames_dataset'Length of output: 69
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_frames_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type py 'make_frames_dataset'Length of output: 525
Script:
#!/bin/bash # Description: Extract relevant lines to verify `_generated=True` integration in `make_frames_dataset`. # Extract lines from `tests/unittests/video_tests.py` rg --type py --context 5 'make_frames_dataset' tests/unittests/video_tests.py # Extract lines from `fiftyone/core/video.py` rg --type py --context 5 'make_frames_dataset' fiftyone/core/video.py # Extract lines from `fiftyone/core/stages.py` rg --type py --context 5 'make_frames_dataset' fiftyone/core/stages.pyLength of output: 2038
7482-7485
: The addition of_generated=True
tomake_patches_dataset
aligns with the PR objectives to ensure datasets are treated as independent. Ensure this change integrates well with existing datasets.Verification successful
The addition of
_generated=True
tomake_patches_dataset
aligns with the PR objectives to ensure datasets are treated as independent. The function definition infiftyone/core/patches.py
includes_generated
as a parameter, and its usage infiftyone/core/stages.py
correctly passes_generated=True
. The unit test intests/unittests/patches_tests.py
also verifies the behavior of generated datasets.
fiftyone/core/patches.py
: The function definition includes_generated=False
by default.fiftyone/core/stages.py
: The function is called with_generated=True
.tests/unittests/patches_tests.py
: The test verifies the behavior of the generated dataset.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_patches_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type python $'make_patches_dataset'Length of output: 70
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_patches_dataset`. # Test: Search for the function definition and its usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type py 'def make_patches_dataset' rg --type py 'make_patches_dataset'Length of output: 646
Script:
#!/bin/bash # Description: Extract the definition and usage of `make_patches_dataset` for review. # Extract the function definition from `fiftyone/core/patches.py` rg --type py --context 10 'def make_patches_dataset' fiftyone/core/patches.py # Extract the usage in `tests/unittests/patches_tests.py` rg --type py --context 10 'make_patches_dataset' tests/unittests/patches_tests.py # Extract the references in `fiftyone/core/stages.py` rg --type py --context 10 'make_patches_dataset' fiftyone/core/stages.pyLength of output: 3813
7789-7792
: The addition of_generated=True
tomake_clips_dataset
ensures that clips datasets are treated as independent, aligning with the PR objectives.Verification successful
The addition of
_generated=True
tomake_clips_dataset
is correctly integrated and ensures that clips datasets are treated as independent.
fiftyone/core/clips.py
: The function definition includes_generated=False
by default.fiftyone/core/stages.py
: The function is called with_generated=True
.tests/unittests/video_tests.py
: The function is tested with_generated=True
.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_clips_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type python $'make_clips_dataset'Length of output: 68
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_clips_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type py $'make_clips_dataset'Length of output: 1111
Script:
#!/bin/bash # Description: Extract the definition of `make_clips_dataset` and its usage in relevant files. # Extract the function definition from `fiftyone/core/clips.py` rg --type py --context 10 --max-count 1 'def make_clips_dataset' fiftyone/core/clips.py # Extract the usage in `fiftyone/core/stages.py` rg --type py --context 10 'make_clips_dataset' fiftyone/core/stages.py # Extract the usage in `tests/unittests/video_tests.py` rg --type py --context 10 'make_clips_dataset' tests/unittests/video_tests.pyLength of output: 6043
7629-7632
: The addition of_generated=True
tomake_evaluation_patches_dataset
is consistent with the PR's goal. Verify that this parameter is handled correctly across the codebase.Verification successful
The addition of
_generated=True
tomake_evaluation_patches_dataset
is consistent with the PR's goal. The function definition infiftyone/core/patches.py
includes_generated
as a parameter, and its usage infiftyone/core/stages.py
correctly passes_generated=True
.
fiftyone/core/patches.py
:def make_evaluation_patches_dataset( sample_collection, eval_key, other_fields=None, name=None, persistent=False, _generated=False, ):
fiftyone/core/stages.py
:eval_patches_dataset = fop.make_evaluation_patches_dataset( sample_collection, self._eval_key, _generated=True, **kwargs, )Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_evaluation_patches_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type python $'make_evaluation_patches_dataset'Length of output: 81
Script:
#!/bin/bash # Description: Verify the integration of `_generated=True` in `make_evaluation_patches_dataset`. # Test: Search for the function usage in other parts of the codebase. Expect: Proper handling of the new parameter. rg --type py $'make_evaluation_patches_dataset'Length of output: 400
Script:
#!/bin/bash # Description: Extract the definition of `make_evaluation_patches_dataset` and its usages. # Extract the function definition from `fiftyone/core/patches.py` ast-grep --lang python --pattern $'def make_evaluation_patches_dataset($_, $_, $_, **kwargs):\n $$$' fiftyone/core/patches.py # Extract the usages from `fiftyone/core/stages.py` rg --type py 'make_evaluation_patches_dataset' fiftyone/core/stages.pyLength of output: 419
Script:
#!/bin/bash # Description: Extract the definition of `make_evaluation_patches_dataset` and its usage. # Extract the function definition from `fiftyone/core/patches.py` rg --type py --context 10 'def make_evaluation_patches_dataset' fiftyone/core/patches.py # Extract the usage from `fiftyone/core/stages.py` rg --type py --context 10 'make_evaluation_patches_dataset' fiftyone/core/stages.pyLength of output: 2467
fiftyone/core/dataset.py (1)
7809-7809
: Ensure the pipeline adjustment for clips datasets is fully compatible with standalone collection requirements.Verification successful
The pipeline adjustments for clips datasets in
fiftyone/core/clips.py
appear to handle_id
andsample_id
correctly, aligning with the requirement for clips datasets to operate as standalone collections.
fiftyone/core/clips.py
:
- Lines 825, 828, 832, 839, 872, 878, 881, 883, 899, 945, 1025
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the complete pipeline adjustment for clips datasets. # Test: Search for the usage of `pipeline` in the context of clips datasets. Expect: Proper handling of `_id` and `sample_id`. rg --type python $'pipeline' | grep 'clips'Length of output: 71
Script:
#!/bin/bash # Description: Verify the complete pipeline adjustment for clips datasets. # Test: Search for the usage of `pipeline` in the context of clips datasets. Expect: Proper handling of `_id` and `sample_id`. rg --type py 'pipeline' | grep 'clips'Length of output: 686
Script:
#!/bin/bash # Description: Extract relevant lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review. # Extract lines involving `pipeline` in `fiftyone/core/clips.py` rg --type py 'pipeline' fiftyone/core/clips.py -nLength of output: 491
Script:
#!/bin/bash # Description: Extract relevant lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review. # Extract lines involving `pipeline` in `fiftyone/core/clips.py` sed -n '825p;828p;832p;839p;872p;878p;881p;883p;899p;945p;1025p' fiftyone/core/clips.pyLength of output: 486
Script:
#!/bin/bash # Description: Extract broader context around lines involving `pipeline` in `fiftyone/core/clips.py` for detailed review. # Extract broader context (5 lines before and after) around the identified lines in `fiftyone/core/clips.py` sed -n '820,830p;823,833p;827,837p;834,844p;867,877p;873,883p;876,886p;878,888p;894,904p;940,950p;1020,1030p' fiftyone/core/clips.pyLength of output: 3412
persistent=False, | ||
_generated=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation for the _generated
parameter.
It's important for maintainability and clarity to document all parameters, especially those that are not self-explanatory like _generated
. This will help future developers understand the purpose and usage of this parameter.
persistent=False, | ||
_generated=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider simplifying boolean comparisons in the function signature.
- persistent=False,
- _generated=False,
+ persistent=False, # Default value already implies boolean, direct use in conditions is cleaner
+ _generated=False, # Same as above
Committable suggestion was skipped due low confidence.
@drop_datasets | ||
def test_make_patches_dataset(self): | ||
dataset = fo.Dataset() | ||
|
||
sample1 = fo.Sample( | ||
filepath="image1.png", | ||
tags=["sample1"], | ||
ground_truth=fo.Detections( | ||
detections=[ | ||
fo.Detection(label="cat"), | ||
fo.Detection(label="dog"), | ||
fo.Detection(label="rabbit"), | ||
fo.Detection(label="squirrel"), | ||
] | ||
), | ||
predictions=fo.Detections( | ||
detections=[ | ||
fo.Detection(label="cat"), | ||
fo.Detection(label="dog"), | ||
fo.Detection(label="rabbit"), | ||
fo.Detection(label="squirrel"), | ||
] | ||
), | ||
) | ||
|
||
sample2 = fo.Sample( | ||
filepath="image2.png", | ||
tags=["sample2"], | ||
ground_truth=fo.Detections( | ||
detections=[ | ||
fo.Detection(label="cat"), | ||
fo.Detection(label="dog"), | ||
] | ||
), | ||
predictions=fo.Detections( | ||
detections=[ | ||
fo.Detection(label="cat"), | ||
fo.Detection(label="dog"), | ||
] | ||
), | ||
) | ||
|
||
dataset.add_samples([sample1, sample2]) | ||
|
||
patches_view = dataset.to_patches("ground_truth") | ||
patches_dataset = fop.make_patches_dataset(dataset, "ground_truth") | ||
|
||
self.assertNotEqual( | ||
patches_dataset._sample_collection_name, | ||
dataset._sample_collection_name, | ||
) | ||
self.assertIsNone(patches_dataset._frame_collection_name) | ||
self.assertTrue(patches_view._is_generated) | ||
self.assertFalse(patches_dataset._is_generated) | ||
self.assertEqual( | ||
len(patches_dataset), dataset.count("ground_truth.detections") | ||
) | ||
self.assertEqual(len(patches_dataset), len(patches_view)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider enhancing the test coverage for make_patches_dataset
.
The current test checks basic separation and flags of the dataset and view. Consider adding more assertions to verify the integrity and contents of the patches dataset, such as checking specific fields or the count of items in various categories. Would you like me to help by suggesting additional test cases or implementing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Resolves #4397
The
make_{patches|frames|clips}_dataset()
methods were originally designed to be called internally by theto_{patches|frames|clips}()
methods but not called directly. For example, as observed in #4397, they currently return a dataset that behaves like a view in certain ways. Also, directly callingmake_clips_dataset()
is especially pernicious, as it currently returns a dataset that silently reuses the same frames collection as the input dataset.However, there is a good use case for directly calling
make_{patches|frames|clips}_dataset()
to convert a collection into an independent dataset.So, this PR maintains the behavior of
to_{patches|frames|clips}()
while tweaking the default behavior of themake_{patches|frames|clips}_dataset()
methods so that they do return a completely normal dataset.Example usage
Summary by CodeRabbit
New Features
Bug Fixes
Tests