Skip to content

Conversation

ghost
Copy link

@ghost ghost commented Aug 22, 2023

This PR introduces a new input parameter - a boolean specifying whether or not to actually include the binary file data associated with the requested files - to a binary data retrieval method. Before, this variable was simply never passed when the requests were made, and so no actual data was ever actually returned, only metadata. The PR allows for actual data to be retrieved. Also included in the PR is an update to the relevant class's testing suite and some necessary resulting changes to the method taking on this new input parameter.. and also a docstring update.

@ghost ghost requested review from dannenberg and stuqdog August 22, 2023 05:08
@ghost ghost self-requested a review as a code owner August 22, 2023 05:08
@ghost ghost requested a review from njooma August 22, 2023 05:08
Copy link
Member

@stuqdog stuqdog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! One minor spelling fix and one question, otherwise lgtm :)

self,
filter: Optional[Filter] = None,
dest: Optional[str] = None,
self, filter: Optional[Filter] = None, dest: Optional[str] = None, include_binary: bool = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is defaulting to False the right behavior here? I would've expected True to be a better default but will defer to @dannenberg on this one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you'll wanna do some extra work to make this nice (ex add nice errors and a better bool name)... but defaulting to false is likely correct as you can only return one actual file. https://viaminc.slack.com/archives/C0307PGHR7E/p1692649912349079?thread_ts=1692649715.146639&cid=C0307PGHR7E

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can totally see it going both ways. I only made it default to false because True would make the amount of data received of course much larger and noticeably slow down the execution of the function, so I opted for the faster choice. Also @dannenberg the method itself will return as many files as fit into the filter. The 100 vs 1 limit thing only affects how many files are included in a single request. The method repeatedly makes requests over and over again until it gets the right number of files, regardless of the value of include_binary. So if a filter matches 7 files, calling this method will always return a list of those 7 files, they just may or may not actually contain the binary data of the file depending on the value of this boolean. Do you still think it should default to false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as i understand it (based on devin hilly's comment), you will not get any binary data unless you set the limit to 1. the code should be changed to reflect this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dannenberg this true on a per-request basis. In a single request for data, the limit must be 1 if you want the binary data to be included. The limit can be something else if include_binary is false. We account for this here:
data_request = DataRequest(filter=filter, limit=1 if include_file_data else 100, last=last)
This method makes multiple requests to get the relevant files. So if there 10 files matching the filter you pass the method AND you want the binary file data included, the method will make 10 individual gRPC requests (each with a limit of 1, so that the requests actually work) and store each response in a list that is eventually returned. If you don't want the binary file data, then all 10 relevant files can and will be returned in a single response following a single request that has a limit that isn't 1. None of these internal mechanics are exposed to the caller of the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha. thanks.

@ghost ghost requested a review from stuqdog August 22, 2023 14:24
@dannenberg
Copy link
Contributor

dannenberg commented Aug 22, 2023 via email

@ghost
Copy link
Author

ghost commented Aug 22, 2023

@dannenberg @stuqdog Okay I just changed up the method a bit so that it takes in another variable capping the number of binary files returned. it defaults to 100 if we are not including the actual binary data, but 10 if we are. The user can of course set a number to override either of these values, or pass "0" to indicate no limit (i.e., just return everything that matches). Also, the new boolean now defaults to True.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@stuqdog stuqdog merged commit c96524e into viamrobotics:main Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants