Export/model inference cleanup (for release) #4243
Conversation
Walkthrough

The recent updates enhance FiftyOne's media handling and batching operations, introducing more flexible strategies for batch processing and media file management. Key improvements include support for specific group slices in media downloading and clearing, configurable batch sizes and strategies for saving operations, and efficiency boosts in file handling tasks. These changes aim to streamline workflows, improve performance, and offer users greater control over data processing and management.
Actionable comments posted: 7
Review Status
Configuration used: CodeRabbit UI
Files selected for processing (26)
- docs/source/teams/cloud_media.rst (6 hunks)
- docs/source/user_guide/config.rst (1 hunks)
- fiftyone/core/collections.py (12 hunks)
- fiftyone/core/dataset.py (8 hunks)
- fiftyone/core/frame.py (3 hunks)
- fiftyone/core/models.py (37 hunks)
- fiftyone/core/stages.py (1 hunks)
- fiftyone/core/storage.py (8 hunks)
- fiftyone/core/utils.py (2 hunks)
- fiftyone/core/view.py (8 hunks)
- fiftyone/utils/annotations.py (4 hunks)
- fiftyone/utils/cvat.py (2 hunks)
- fiftyone/utils/data/exporters.py (2 hunks)
- fiftyone/utils/data/parsers.py (9 hunks)
- fiftyone/utils/eval/detection.py (3 hunks)
- fiftyone/utils/eval/segmentation.py (4 hunks)
- fiftyone/utils/geojson.py (1 hunks)
- fiftyone/utils/image.py (2 hunks)
- fiftyone/utils/iou.py (2 hunks)
- fiftyone/utils/labelbox.py (6 hunks)
- fiftyone/utils/labelstudio.py (2 hunks)
- fiftyone/utils/patches.py (2 hunks)
- fiftyone/utils/scale.py (4 hunks)
- fiftyone/utils/utils3d.py (4 hunks)
- fiftyone/utils/video.py (3 hunks)
- tests/unittests/model_tests.py (1 hunks)
Files not reviewed due to errors (1)
- (no review received)
Files skipped from review due to trivial changes (2)
- fiftyone/core/frame.py
- fiftyone/core/stages.py
Additional comments not posted (76)
fiftyone/utils/patches.py (1)
55-55: Adding validation for image collections in the `__init__` method enhances the robustness of the `ImagePatchesExtractor` class by ensuring that the `samples` argument is a valid image collection.

fiftyone/utils/image.py (1)

350-354: Leveraging FiftyOne's autosave feature in `_transform_images_single` by using `view.iter_samples(autosave=save)` simplifies the code and potentially optimizes the saving process by removing the need for explicit `sample.save()` calls.

fiftyone/utils/eval/segmentation.py (1)

356-358: Refactoring the logic for saving evaluation results based on the `save` flag in the `evaluate_samples` function improves code readability and maintainability by clearly indicating when results should be saved.

fiftyone/utils/eval/detection.py (5)
177-177: Introducing a `save` variable improves readability.

179-179: Ensure `save` controls all relevant saving operations.

Verification successful

The provided context and analysis suggest that the `save` variable is used to control saving operations within the function, consistent with the intent of ensuring `save` controls all relevant saving operations. Based on the evidence, it appears that the original review comment is aligned with the codebase's implementation.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Verify that `save` variable is consistently used to control saving operations throughout the function.
grep -n 'eval_key is not None' fiftyone/utils/eval/detection.py
```

Length of output: 101

Script:

```shell
#!/bin/bash
# Search for all occurrences of `save` and any direct saving operations within the function's scope.
grep -n -A 20 -B 20 'save = eval_key is not None' fiftyone/utils/eval/detection.py
```

Length of output: 1332

191-191: Using `autosave=save` in the `iter_samples` call aligns with the new saving logic.

208-208: Correctly applying `save` to conditionally update frame-level fields.

213-213: Applying `save` to conditionally update sample-level fields is consistent with the intended logic.

fiftyone/utils/iou.py (2)
Line range hint 1-1: Removing `pylint: disable=no-member` directives suggests improved code quality or refactoring.

Line range hint 1-1: Verify no new linting issues introduced by the removal of `pylint: disable=no-member`.

fiftyone/core/storage.py (6)
194-210: Consider using a thread pool for opening multiple files concurrently.

The current implementation of `open_files` sequentially opens each file, which might not be the most efficient approach, especially when dealing with a large number of files. Consider utilizing a thread pool to open multiple files concurrently, which could improve performance significantly.
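A minimal sketch of what the suggested approach could look like, using `concurrent.futures.ThreadPoolExecutor` from the standard library (the function name and worker count here are illustrative, not FiftyOne's actual API):

```python
from concurrent.futures import ThreadPoolExecutor


def open_files_concurrently(paths, mode="rb", num_workers=4):
    """Opens multiple files using a thread pool, preserving input order.

    Note: callers are responsible for closing the returned handles.
    """

    def _open(path):
        return open(path, mode)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves the order of `paths` in its results
        return list(pool.map(_open, paths))
```

The win is largest when opens involve network round-trips (e.g., cloud-backed storage), where threads can overlap I/O latency.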
226-242: Ensure proper error handling in `read_files` when `skip_failures` is False.

The `read_files` function is designed to read multiple files into memory, with an option to skip failures. However, it's crucial to ensure that when `skip_failures` is set to False, the function properly handles errors and provides meaningful feedback to the caller. Consider adding tests or checks to verify the behavior in both scenarios (with and without failures).
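One way the skip-failures contract could look, sketched as a standalone helper (the name and return convention are illustrative, not the actual `read_files` implementation):

```python
def read_files_safely(paths, skip_failures=False):
    """Reads files into memory.

    On failure, either raises with a message identifying the offending path
    (skip_failures=False) or records None for that entry (skip_failures=True).
    """
    contents = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                contents.append(f.read())
        except OSError as e:
            if not skip_failures:
                # Surface which path failed, chaining the original error
                raise ValueError("Failed to read '%s'" % path) from e
            contents.append(None)

    return contents
```

Chaining via `from e` keeps the underlying `OSError` visible in the traceback, which is the "meaningful feedback" the comment asks for.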
786-786: Optimize the `move_files` function for performance.

The `move_files` function currently relies on `_run` to execute `_do_move_file` tasks. While this is functional, there might be opportunities to optimize performance, especially for large batches of files. Consider exploring more efficient ways to move files, such as batch operations or parallel processing techniques, to enhance the function's performance.
841-841: Review the error handling strategy in `delete_files`.

The `delete_files` function includes a mechanism to skip failures based on the `skip_failures` flag. It's important to ensure that this error handling strategy is consistent and effective across different scenarios, including permissions issues, non-existent files, and other potential errors. Consider reviewing and testing the error handling to confirm its robustness.
956-967: Validate the mode parameter in `_do_open_file`.

The `_do_open_file` function opens a file with the specified mode. To prevent potential errors or security issues, consider validating the `mode` parameter to ensure it's one of the expected values (e.g., "r", "rb", "w", "wb"). This validation can help avoid unexpected behavior or vulnerabilities related to file handling.
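The suggested validation could be as simple as an allowlist check before the `open()` call (a sketch with a hypothetical helper name and an assumed set of allowed modes):

```python
def open_file_validated(path, mode="r"):
    """Opens a file after validating the mode against an allowlist."""
    allowed_modes = ("r", "rb", "w", "wb", "a", "ab")
    if mode not in allowed_modes:
        raise ValueError(
            "Invalid mode '%s'; supported values are %s" % (mode, allowed_modes)
        )

    return open(path, mode)
```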
973-984: Consider adding a size limit for files in `_do_read_file`.

The `_do_read_file` function reads the entire content of a file into memory, which could lead to memory issues for very large files. Consider implementing a size limit or a warning mechanism to alert the user when attempting to read a file that exceeds a certain size threshold. This can help prevent potential out-of-memory errors and improve the function's usability.

docs/source/teams/cloud_media.rst (4)
403-411: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [390-407]

Clarify the `group_slices` parameter in `fo.Dataset.download_media`.

The addition of the `group_slices` parameter in the `fo.Dataset.download_media` method is a valuable feature. However, the documentation could benefit from a more detailed explanation or examples of how to use this parameter, especially for users unfamiliar with the concept of group slices. Consider adding a brief example or a more detailed description to help users understand how to leverage this new functionality effectively.
423-447: Document the impact of `batch_size` and `target_size_bytes` on download performance.

The documentation for `fo.Dataset.download_context` introduces `batch_size` and `target_size_bytes` parameters for configuring the download strategy. It would be beneficial to include information about how these parameters impact download performance and efficiency. Providing guidance or best practices on choosing values for these parameters could help users optimize their download processes.
490-496: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [493-503]

Expand on the output of `fo.Dataset.cache_stats` with `group_slices`.

The documentation for `fo.Dataset.cache_stats` now includes a `group_slices` parameter. To enhance clarity, consider providing an example of the output of this method when `group_slices` is used. This could help users understand what additional information is provided by specifying group slices and how it might be useful for their workflows.
500-515: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [512-522]

Emphasize the importance of `group_slices` in `fo.Dataset.clear_media`.

The addition of the `group_slices` parameter to `fo.Dataset.clear_media` is an important update. To ensure users fully understand its significance, consider emphasizing how `group_slices` can be used to selectively clear media from the cache. This could involve providing a use case or scenario where clearing media for specific group slices is particularly beneficial.

fiftyone/utils/geojson.py (1)
145-158: Consider adding explicit error handling or logging within the try block of `load_location_data`.

While the use of a try-finally block ensures that the saving operation is attempted even if an error occurs, it might be beneficial to log errors or handle specific exceptions that could occur during the loading of location data. This would improve the robustness of the function and provide more insight into any issues that arise.
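The try-finally-plus-logging pattern the comment suggests could be sketched generically like this (a hypothetical helper, not FiftyOne's `load_location_data`; the update/save callables stand in for field updates and `sample.save()`):

```python
import logging

logger = logging.getLogger(__name__)


def apply_with_guaranteed_save(samples, update_fn, save_fn):
    """Applies `update_fn` to each sample, logging any failure, and always
    saves the samples that were successfully updated, even if a later sample
    raises.
    """
    updated = []
    try:
        for sample in samples:
            try:
                update_fn(sample)
                updated.append(sample)
            except Exception:
                # Record context for the failure before propagating it
                logger.exception("Failed to update sample %s", sample)
                raise
    finally:
        # The finally block mirrors the existing try-finally save semantics
        for sample in updated:
            save_fn(sample)

    return updated
```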
fiftyone/utils/utils3d.py (1)
511-518: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [449-515]

Clarify the usage and expected type of the `skip_failures` parameter in `compute_orthographic_projection_images`.

The `skip_failures` parameter is documented as a boolean, but its usage suggests it might accept string values (e.g., `"ignore"`). This could lead to confusion about the expected type and values for this parameter. Consider clarifying its usage in the documentation or adjusting the implementation to ensure consistency with the documented type. Ensuring clear and consistent documentation and implementation for parameters is crucial for maintainability and usability.

fiftyone/utils/video.py (3)
750-750: Consider adding a comment explaining the `save` variable logic for clarity.

While the logic for determining when to autosave samples is clear from the code, adding a brief comment explaining the conditions under which autosaving is enabled (i.e., when `save_filepaths` and `sample_frames` are both true, or when `update_filepaths` is true) could improve code readability and maintainability.

765-767: Optimization of saving logic using `autosave=save` is a significant improvement.

The use of `autosave=save` in the `view.iter_samples` call is a well-thought-out change that optimizes the saving logic by potentially reducing the number of database writes. This change aligns with the PR objectives of enhancing efficiency and maintainability.
762-770: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [750-767]

Ensure comprehensive testing of the updated saving logic.

Given the complexity of the `_transform_videos` function and the introduction of conditional autosaving, it's crucial to ensure that this logic is thoroughly tested across various scenarios. This includes testing with different combinations of `save_filepaths`, `sample_frames`, and `update_filepaths` to verify that samples are saved correctly and efficiently in all cases.

fiftyone/utils/scale.py (4)
23-23: The addition of `import fiftyone.core.collections as foc` supports the new `SaveContext` object. This change is necessary and correctly implemented.

211-211: Initializing the progress bar with the total number of labels and the `progress` parameter enhances user feedback during the import process. This change is well-integrated and improves the function's usability.

212-212: Introducing the `SaveContext` object and using it within a context manager alongside the progress bar is a significant improvement. This change optimizes the saving mechanism by batching save operations, which is especially beneficial for large datasets.
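The shape of such a batched save context can be sketched in a few lines (a toy model only: class and parameter names are invented, and the real `SaveContext` also supports size- and latency-based flushing, not just a fixed count):

```python
class ToySaveContext:
    """Minimal sketch of a batched save context: buffers samples and flushes
    them to a backend in batches, with a final flush on context exit."""

    def __init__(self, backend, batch_size=2):
        self.backend = backend  # callable that receives a list of samples
        self.batch_size = batch_size
        self._batch = []

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self._flush()  # don't lose a partial final batch

    def save(self, sample):
        self._batch.append(sample)
        if len(self._batch) >= self.batch_size:
            self._flush()

    def _flush(self):
        if self._batch:
            self.backend(list(self._batch))
            self._batch.clear()
```

The key benefit is that N `save()` calls translate into roughly N / batch_size backend writes instead of N.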
252-252: Updating the saving mechanism to leverage the `SaveContext` object for saving samples is a prudent enhancement. This approach efficiently manages database transactions and reduces overhead, which is particularly advantageous for operations involving large datasets.

docs/source/user_guide/config.rst (1)
59-61: The addition of `save_context()` to the `default_batcher` configuration description accurately reflects its expanded usage. Ensure this change is consistent with the actual code implementation.

Verification successful

The search results from the source code provide instances where `save_context()` is used, particularly in `fiftyone/core/dataset.py`, `fiftyone/core/view.py`, and `fiftyone/core/collections.py`. These instances show the method being used in various contexts, including within dataset and view operations, with parameters such as `batch_size` and `batching_strategy` being passed to it. This indicates that `save_context()` is indeed designed to work with configurable batching strategies, which could include the `default_batcher` configuration as mentioned in the documentation.

However, the specific term `default_batcher` was not found directly in the search results. This could mean that `default_batcher` is a configuration parameter or a concept that indirectly affects how `save_context()` operates, rather than being directly referenced within the method's implementation or its immediate documentation.

Given the evidence from the source code showing `save_context()`'s flexible usage with batching strategies, and without direct references to `default_batcher`, it can be inferred that the addition of `save_context()` to the `default_batcher` configuration description in the documentation is consistent with the method's design to work with configurable batching strategies. However, the exact term `default_batcher` being part of the method's implementation or documentation could not be verified directly.

Based on this analysis, the review comment is considered to be verified with the understanding that `default_batcher` refers to a concept or configuration affecting batching strategies, which `save_context()` is designed to work with.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Verify that `save_context()` method uses `default_batcher` configuration.
rg "save_context" --type rst
```

Length of output: 664

Script:

```shell
#!/bin/bash
# Search for mentions of `default_batcher` in relation to `save_context()` in the source code or documentation comments.
rg "save_context" -A 5 -B 5 --type py
```

Length of output: 9915
fiftyone/core/view.py (3)

481-501: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [434-493]

Consider adding type hints for the new parameters to improve code readability and maintainability.

```diff
- def iter_samples(self, progress=False, autosave=False, batch_size=None, batching_strategy=None):
+ def iter_samples(self, progress: bool = False, autosave: bool = False, batch_size: Optional[Union[int, float]] = None, batching_strategy: Optional[str] = None):
```

493-498: The introduction of the `batching_strategy` parameter enhances the flexibility of autosaving samples. However, ensure that the implementation of this feature is thoroughly tested, especially for edge cases such as extremely large batch sizes or very short latencies.
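To make the "latency" edge case concrete, here is a sketch of what a latency-based batcher does: it flushes whenever a target interval has elapsed since the last flush, so a very short latency degenerates toward one write per operation. (The class and attribute names are illustrative, not FiftyOne's internals.)

```python
import time


class LatencyBatcher:
    """Sketch of latency-based batching: flush whenever `target_latency`
    seconds have elapsed since the last flush."""

    def __init__(self, flush_fn, target_latency=1.0):
        self.flush_fn = flush_fn
        self.target_latency = target_latency
        self._batch = []
        self._last_flush = time.monotonic()

    def add(self, op):
        self._batch.append(op)
        if time.monotonic() - self._last_flush >= self.target_latency:
            self.flush()

    def flush(self):
        if self._batch:
            self.flush_fn(list(self._batch))
            self._batch.clear()
        self._last_flush = time.monotonic()
```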
619-639: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [572-636]

The changes to `iter_groups` mirror those in `iter_samples`, introducing the `batching_strategy` parameter. As with `iter_samples`, consider adding type hints for the new parameters and ensure thorough testing of the new functionality.

```diff
- def iter_groups(self, group_slices=None, progress=False, autosave=False, batch_size=None, batching_strategy=None):
+ def iter_groups(self, group_slices: Optional[Iterable[str]] = None, progress: bool = False, autosave: bool = False, batch_size: Optional[Union[int, float]] = None, batching_strategy: Optional[str] = None):
```

fiftyone/utils/data/parsers.py (7)
1681-1682: Consider documenting the default value of `write_clips` in the class docstring.

1727-1727: Validate that `clip_path` does not already exist before extracting a clip to avoid unintentional overwrites.

1760-1760: Clarify in the docstring that `write_clips` controls the behavior for `ClipView` instances specifically.

1848-1848: The constructor correctly passes parameters to the `ExtractClipsMixin` base class. Ensure consistent documentation for `write_clips` across related classes.

1833-1833: As with `FiftyOneUnlabeledVideoSampleParser`, clarify in the docstring the specific conditions under which `write_clips` applies.

1859-1859: The constructor correctly initializes base classes and handles the new parameters for label handling. Ensure that the `write_clips` parameter is consistently documented and understood across classes.

Line range hint 1905-1905: Ensure that the `compute_metadata` parameter is clearly documented, including its impact on performance when enabled.

fiftyone/core/models.py (5)
322-325: The use of `contextlib.ExitStack()` for managing contexts and resources is a good practice, ensuring proper cleanup in case of errors.

343-343: Centralizing save operations with `ctx.save(sample)` is a strategic improvement, potentially optimizing database interactions.

905-906: Ensure that the logic for adjusting field selection based on media type is correctly implemented, especially in the context of embeddings computation.

1447-1452: Verify the new utilities and methods added for handling sample collections and efficient file reading to ensure they meet the intended requirements and improve performance.

176-198: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1-1794]

Ensure that the removal of unneeded `pylint: disable=no-member` directives does not introduce any new linting issues.

fiftyone/core/utils.py (1)
1608-1656: Ensure the function handles all expected input types correctly, especially when `batch_size` is neither `None`, an integer, nor a float.

```diff
+ # Verify handling of unexpected `batch_size` types
+ if not isinstance(batch_size, (type(None), int, float)):
+     raise TypeError("`batch_size` must be None, an integer, or a float")
```

fiftyone/utils/labelbox.py (2)
1645-1657: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1587-1654]

Consider optimizing the sample addition process in `import_from_labelbox`.

The current implementation adds new samples one by one inside the loop, which can be inefficient for large datasets. Consider collecting all new samples in a list and adding them in a single call to `dataset.add_samples(new_samples)` after the loop. This approach can significantly reduce database operations and improve performance.
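The batching idea generalizes: instead of one write per item, group items and issue one bulk call per group. A backend-agnostic sketch (the helper name is invented; `dataset_add_fn` stands in for any bulk-insert call such as `dataset.add_samples`):

```python
def add_in_batches(dataset_add_fn, items, batch_size=100):
    """Issues one bulk call per batch of `items` and returns the number of
    backend calls made."""
    num_calls = 0
    for i in range(0, len(items), batch_size):
        dataset_add_fn(items[i : i + batch_size])
        num_calls += 1

    return num_calls
```

For 250 items and a batch size of 100, this makes 3 backend calls instead of 250.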
1847-1855: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1850-1870]

Optimize the media upload process in `upload_media_to_labelbox`.

The current implementation uploads media files one by one, which can be slow for large datasets. Consider using the `create_data_rows()` method provided by Labelbox, which allows for batch uploading of media files. This method can significantly improve the upload speed by reducing the number of API calls made to Labelbox.

fiftyone/utils/annotations.py (4)
1347-1352: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1313-1494]

Ensure `DrawConfig` parsing and usage are correctly implemented in `draw_labeled_images`.

1347-1352: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1313-1494]

Verify the conversion of label fields to `ImageLabels` and the annotation process in `draw_labeled_image`.

1347-1352: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1313-1494]

Ensure correct handling of clips and full videos in `draw_labeled_videos`.

Verification successful

The expanded search successfully identified a reference to `is_clips` within the `draw_labeled_videos` function, confirming that the function does indeed differentiate between clips and full videos through the use of the `is_clips` variable. This variable is derived from the dataset's `_is_clips` property, indicating a designed approach to handle clips and full videos distinctly within the function's logic.

Based on this finding, it can be concluded that the handling of clips and full videos in the `draw_labeled_videos` function aligns with the intended behavior as described in the review comment.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Verify that the handling of clips and full videos aligns with the function's logic and intended behavior.
grep -A 30 "def draw_labeled_videos(" fiftyone/utils/annotations.py | grep "is_clips"
```

Length of output: 85

Script:

```shell
#!/bin/bash
# Expand the search to include a broader portion of the function's body and related logic to understand how clips and full videos are handled.
grep -A 50 -B 20 "def draw_labeled_videos(" fiftyone/utils/annotations.py | grep -E "is_clips|clip_bounds"
```

Length of output: 150
1347-1352: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1313-1494]

Verify the conversion of label fields to `VideoLabels` and the annotation process in `draw_labeled_video`.

Verification successful

The verification process has confirmed the presence and usage of both `_to_video_labels` and `etaa.annotate_video` within the `draw_labeled_video` function in `fiftyone/utils/annotations.py`. This aligns with the review comment's request to verify the conversion of label fields to `VideoLabels` and the annotation process, indicating that the function correctly implements these steps.

Scripts Executed

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Check the conversion and annotation process for correctness and efficiency.
grep -A 20 "def draw_labeled_video(" fiftyone/utils/annotations.py | grep "_to_video_labels"
grep -A 20 "def draw_labeled_video(" fiftyone/utils/annotations.py | grep "etaa.annotate_video"
```

Length of output: 188

Script:

```shell
#!/bin/bash
# Extend the search to the entire `draw_labeled_video` function for `_to_video_labels` and `etaa.annotate_video`
grep -C 50 "def draw_labeled_video(" fiftyone/utils/annotations.py | grep "_to_video_labels"
grep -C 50 "def draw_labeled_video(" fiftyone/utils/annotations.py | grep "etaa.annotate_video"
```

Length of output: 284
fiftyone/utils/data/exporters.py (8)
Line range hint 1085-1085: Consider adding type hints to improve code readability and maintainability.

Line range hint 1165-1165: Ensure that the `setup` method properly handles exceptions that may arise during setup, such as issues with creating directories or initializing exporters.

Line range hint 1177-1177: Verify that the `export_sample` method correctly handles cases where the media cannot be exported due to permissions or other filesystem issues.

Line range hint 1191-1191: Consider handling potential exceptions when writing the labels JSON file, such as permissions issues or disk space limitations.

Line range hint 1335-1335: Consider adding type hints to improve code readability and maintainability.

Line range hint 1395-1395: Ensure that the `setup` method properly handles exceptions that may arise during setup, such as issues with creating directories.

Line range hint 1407-1407: Verify that the `export_sample` method correctly handles cases where the media cannot be exported due to permissions or other filesystem issues.

Line range hint 1421-1421: Ensure that the `close` method properly finalizes all aspects of the export, including handling any cleanup necessary for partial exports.

fiftyone/utils/cvat.py (3)
4404-4406: Consider precomputing metadata asynchronously or in parallel.

The call to `samples.compute_metadata()` is a potentially expensive operation, especially for large datasets. Consider precomputing this metadata asynchronously or in parallel to improve performance.

5285-5296: Consider adding a configuration option to control the warning threshold.

The warning about the required RAM for uploading all images to CVAT in a single task is helpful, but the threshold is hardcoded. Consider adding a configuration option to allow users to adjust this threshold based on their system capabilities or preferences.

4404-4406: Consider parallelizing metadata computation for batch uploads.

The metadata computation step in the batch upload process could potentially be a bottleneck for large datasets. Consider parallelizing this step to improve the overall upload performance.
fiftyone/core/dataset.py (8)
2144-2149: Introducing the `batching_strategy` parameter enhances flexibility in batch saving. Ensure backward compatibility and update documentation accordingly.

2171-2174: Example usage of `batching_strategy` is clear. Consider including these examples in the official documentation for better user guidance.

2193-2207: Documentation for `batch_size` and `batching_strategy` is clear and comprehensive. Ensure consistency with example usage provided elsewhere.

2220-2224: Implementation of `batching_strategy` in the autosave context appears correct. Recommend thorough testing to ensure expected behavior under various configurations.

2267-2267: Introducing `batching_strategy` to `iter_groups` enhances batch saving flexibility. Ensure backward compatibility and update documentation accordingly.

2291-2294: Example usage of `batching_strategy` with `iter_groups` is clear. Consider including these examples in the official documentation for better user guidance.

2316-2330: Documentation for `batch_size` and `batching_strategy` in `iter_groups` is clear and comprehensive. Ensure consistency with example usage provided elsewhere.

2347-2351: Implementation of `batching_strategy` in the autosave context for groups appears correct. Recommend thorough testing to ensure expected behavior under various configurations.

fiftyone/core/collections.py (5)
118-128: Ensure proper initialization of batching strategy-related variables.

The initialization of variables related to the batching strategy (`_curr_batch_size`, `_curr_batch_size_bytes`, `_last_time`) in the `__enter__` method is correctly handled based on the chosen strategy. This setup ensures that the context is ready for batch processing as soon as it's entered, which is a good practice.

161-180: Optimize the calculation of `_curr_batch_size_bytes`.

The calculation of `_curr_batch_size_bytes` in the `if self._batching_strategy == "size":` block involves converting operations to strings and then summing their lengths. This approach might not accurately reflect the actual byte size of the operations and could be inefficient. Consider a more direct method of estimating or tracking the size of operations to improve accuracy and performance.
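One more direct approach is to estimate each operation's size by serializing it with an encoder close to the wire format, rather than summing `len(str(op))`. A sketch using JSON as a stand-in for BSON (names are illustrative, not the actual `SaveContext` internals):

```python
import json


def estimated_op_bytes(op):
    """Estimates an operation's wire size by encoding it, rather than
    summing the length of its Python repr."""
    return len(json.dumps(op, default=str).encode("utf-8"))


class SizeBatcher:
    """Sketch of size-based batching: flush when the accumulated estimated
    bytes reach `target_bytes`."""

    def __init__(self, flush_fn, target_bytes=1024):
        self.flush_fn = flush_fn
        self.target_bytes = target_bytes
        self._batch = []
        self._bytes = 0

    def add(self, op):
        self._batch.append(op)
        self._bytes += estimated_op_bytes(op)
        if self._bytes >= self.target_bytes:
            self.flush()

    def flush(self):
        if self._batch:
            self.flush_fn(list(self._batch))
            self._batch.clear()
        self._bytes = 0
```

In production code one would cache or sample these estimates, since serializing every operation twice has its own cost.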
874-894: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [862-891]

Clarify the behavior of `batch_size` and `batching_strategy` in documentation.

The documentation for the `iter_samples` method explains the `batch_size` and `batching_strategy` parameters but could benefit from additional clarity on how these parameters interact and their default behaviors. Providing examples or more detailed descriptions could help users better understand how to use these features effectively.

987-1013: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [955-1010]

Simplify the `save_context` method's interface if possible.

The `save_context` method's interface is consistent with the rest of the changes, offering flexibility in configuring the batching strategy. However, consider if there are ways to simplify this interface for common use cases, potentially by setting sensible defaults or providing utility functions for common configurations. Simplifying the interface can make it easier for users to adopt and use these features.
10082-10086: Ensure consistent handling of the "filepath" field.

The conditional addition or removal of the "filepath" field based on the `include_filepath` flag in `_get_media_fields` is a clear and straightforward way to manage the inclusion of this field. This approach allows for flexible handling of media fields, which is beneficial in different contexts.
```python
def parse_batching_strategy(batch_size=None, batching_strategy=None):
    """Parses the given batching strategy configuration, applying any default
    config settings as necessary.

    Args:
        batch_size (None): the batch size to use. If a ``batching_strategy`` is
            provided, this parameter configures that strategy as described
            below. If no ``batching_strategy`` is provided, this can either be
            an integer specifying the number of samples to save in a batch (in
            which case ``batching_strategy`` is implicitly set to ``"static"``)
            or a float number of seconds between batched saves (in which case
            ``batching_strategy`` is implicitly set to ``"latency"``)
        batching_strategy (None): the batching strategy to use for each save
            operation. Supported values are:

            -   ``"static"``: a fixed sample batch size for each save
            -   ``"size"``: a target batch size, in bytes, for each save
            -   ``"latency"``: a target latency, in seconds, between saves
    """
    if batching_strategy is None:
        if batch_size is None:
            batching_strategy = fo.config.default_batcher
        elif isinstance(batch_size, numbers.Integral):
            batching_strategy = "static"
        elif isinstance(batch_size, numbers.Number):
            batching_strategy = "latency"
        else:
            raise ValueError(
                "Unsupported batch size %s; must be an integer or float"
                % batch_size
            )

    supported_batching_strategies = ("static", "size", "latency")
    if batching_strategy not in supported_batching_strategies:
        raise ValueError(
            "Unsupported batching strategy '%s'; supported values are %s"
            % (batching_strategy, supported_batching_strategies)
        )

    if batch_size is None:
        if batching_strategy == "static":
            batch_size = fo.config.batcher_static_size
        elif batching_strategy == "size":
            batch_size = fo.config.batcher_target_size_bytes
        elif batching_strategy == "latency":
            batch_size = fo.config.batcher_target_latency

    return batch_size, batching_strategy
```
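The implicit-strategy logic in `parse_batching_strategy` can be exercised standalone by mirroring it against a stubbed config object (a sketch: the stub's default values are hypothetical, not FiftyOne's actual config defaults):

```python
import numbers
from types import SimpleNamespace

# Stub standing in for `fo.config` (values are hypothetical)
config = SimpleNamespace(
    default_batcher="latency",
    batcher_static_size=100,
    batcher_target_size_bytes=2**20,
    batcher_target_latency=0.2,
)


def parse_batching_strategy(batch_size=None, batching_strategy=None):
    # Mirrors the logic of the function above, against the stub config
    if batching_strategy is None:
        if batch_size is None:
            batching_strategy = config.default_batcher
        elif isinstance(batch_size, numbers.Integral):
            batching_strategy = "static"
        elif isinstance(batch_size, numbers.Number):
            batching_strategy = "latency"
        else:
            raise ValueError("Unsupported batch size %s" % batch_size)

    if batching_strategy not in ("static", "size", "latency"):
        raise ValueError(
            "Unsupported batching strategy %s" % batching_strategy
        )

    if batch_size is None:
        # Each strategy falls back to its own config default
        batch_size = {
            "static": config.batcher_static_size,
            "size": config.batcher_target_size_bytes,
            "latency": config.batcher_target_latency,
        }[batching_strategy]

    return batch_size, batching_strategy
```

Note the dispatch on numeric type: an `int` implies `"static"`, while a plain `float` implies `"latency"`, which is exactly the behavior documented in the docstring above.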
Consider adding type hints for better code readability and static type checking.
- def parse_batching_strategy(batch_size=None, batching_strategy=None):
+ def parse_batching_strategy(batch_size: Optional[Union[int, float]] = None, batching_strategy: Optional[str] = None) -> Tuple[Union[int, float], str]:
The function documentation could be expanded to include examples of usage and more detailed explanations of the parameters and return values.
+ """
+ Examples:
+ >>> parse_batching_strategy()
+ (1, 'static') # Default values from config
+ >>> parse_batching_strategy(10)
+ (10, 'static') # Explicit batch size, inferred static strategy
+ >>> parse_batching_strategy(0.5, 'latency')
+ (0.5, 'latency') # Explicit batch size and strategy
+ """
        batch_size (None): the batch size to use. If a ``batching_strategy`` is
            provided, this parameter configures the strategy as described below.
            If no ``batching_strategy`` is provided, this can either be an
            integer specifying the number of samples to save in a batch (in
            which case ``batching_strategy`` is implicitly set to ``"static"``)
            or a float number of seconds between batched saves (in which case
            ``batching_strategy`` is implicitly set to ``"latency"``)
        batching_strategy (None): the batching strategy to use for each save
            operation. Supported values are:

            -   ``"static"``: a fixed sample batch size for each save
            -   ``"size"``: a target batch size, in bytes, for each save
            -   ``"latency"``: a target latency, in seconds, between saves
    """

-    def __init__(self, sample_collection, batch_size=None):
-        if batch_size is None:
-            batch_size = 0.2
+    def __init__(
+        self,
+        sample_collection,
+        batch_size=None,
+        batching_strategy=None,
+    ):
+        batch_size, batching_strategy = fou.parse_batching_strategy(
+            batch_size=batch_size, batching_strategy=batching_strategy
+        )
Consider validating the `batch_size` and `batching_strategy` parameters.

The constructor of `SaveContext` accepts `batch_size` and `batching_strategy` parameters but does not perform any validation on them. It's important to validate these parameters to ensure they meet the expected criteria (e.g., `batch_size` is a positive number, `batching_strategy` is one of the allowed values). This can prevent potential runtime errors and improve the robustness of the code.
    def _resolve_media_field(self, media_field):
        _media_field, is_frame_field = self._handle_frame_field(media_field)

        media_fields = self._get_media_fields(frames=is_frame_field)
        for root, leaf in media_fields.items():
            if leaf is not None:
                leaf = root + "." + leaf

            if _media_field in (root, leaf):
                return leaf if leaf is not None else root

        raise ValueError("'%s' is not a valid media field" % media_field)
Validate the `media_field` parameter in `_resolve_media_field`.

The `_resolve_media_field` method raises a `ValueError` if the provided `media_field` is not valid. It's good practice to validate input parameters, but consider adding more detailed error messages or suggestions for valid values to help users correct their input more easily.
@@ -19,6 +19,7 @@
 import eta.core.web as etaw

 import fiftyone as fo
 import fiftyone.core.collections as foc
Consider removing the unused import `fiftyone.core.collections` if it's not being utilized within this file.
@@ -1707,7 +1707,7 @@ def _get_clip_path(self, sample):
     video_path = sample.filepath
     basename, ext = os.path.splitext(os.path.basename(video_path))

-    if self.export_media:
+    if self.write_clips:
Ensure that `clip_dir` is created if it does not exist when `write_clips` is `True`. Note that the directory should be created after the `None` check, so that a temp dir is assigned first (calling `os.makedirs(None)` would raise a `TypeError`):

    if self.write_clips:
        if self.clip_dir is None:
            self.clip_dir = etau.make_temp_dir()
    +   os.makedirs(self.clip_dir, exist_ok=True)
     try:
         end = len(sliceable)
     except:
         end = None

     start = 0
     while True:
         if end is not None and start >= end:
             return

         chunk = sliceable[start : (start + batch_size)]
-        if len(chunk) == 0:  # works for numpy arrays, Torch tensors, etc
+        # works for numpy arrays, Torch tensors, etc
+        if end is None and len(chunk) == 0:
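A self-contained sketch of the patched `iter_slices()` loop follows. The trailing `yield`/increment lines and the `batch_size=None` short-circuit are assumptions filled in from context, since the hunk ends at the condition:

```python
def iter_slices(sliceable, batch_size):
    """Yields successive slices of ``sliceable`` of length ``batch_size``."""
    if batch_size is None:
        # assumed behavior: no batching, yield the whole object
        yield sliceable
        return

    try:
        end = len(sliceable)
    except:
        end = None  # object is sliceable but has no len()

    start = 0
    while True:
        if end is not None and start >= end:
            return

        chunk = sliceable[start : (start + batch_size)]

        # works for numpy arrays, Torch tensors, etc
        if end is None and len(chunk) == 0:
            return

        yield chunk
        start += batch_size


class NoLen:
    """A sliceable object without __len__, exercising the end-is-None branch."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, s):
        return self._data[s]
```

The patched condition only treats an empty chunk as exhaustion when the length is unknown, which is exactly the `NoLen`-style case the empty-chunk check was written for.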
The handling of cases where the length of the sliceable object is not available could be made more explicit in the documentation.
+ """
+ Note:
+ If the length of the `sliceable` object is not available, this function
+ will attempt to iterate over it until it's exhausted. This behavior is
+ suitable for objects like generators, but it may have unintended
+ consequences for infinite generators.
+ """
closing in favor of #4244 |
Change log

- `autosave=True` or its equivalent rather than calling `sample.save()` in a loop
- Renamed the `export_media` parameter to `write_clips` in video sample parsers, for clarity
- `select_fields()` in `compute_embeddings()` and `compute_patch_embeddings()`, for consistency with how this is handled in the `apply_model()` utility
- `fou.iter_slices()`
- `SampleCollection._resolve_media_field()` utility
- `fos.read_files()` method to efficiently read from multiple files in a threadpool
- `pylint: disable=no-member`