[feat] - buffered file reader #2731
pkg/readers/bufferedfilereader.go
	return nil, err
}

rdr, ok := reader.(io.ReadSeekCloser)
Since we control all the relevant implementations, is it correct to say that we always expect this to succeed? And if so, should ReadCloser() be changed to return an io.ReadSeekCloser so that we don't have to do this at runtime?
I considered raising this question in the PR: if we change the return type to io.ReadSeekCloser, we will also need to revise the ReadCloser method in the contentReader interface in gitparse.go. That adjustment left me somewhat uncertain, but I'm happy to make the change.
I also tend to get a little squeamish around interface assertions. 😅
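To make the trade-off in this thread concrete, here is a minimal sketch contrasting the runtime assertion with a statically typed return value. The names memFile, looseOpen, strictOpen, readTwice, and viaAssertion are hypothetical and exist only for this illustration; they are not taken from the PR or from gitparse.go.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// memFile is a hypothetical in-memory reader used only for this illustration.
type memFile struct{ *bytes.Reader }

func (memFile) Close() error { return nil }

// looseOpen returns the narrower io.ReadCloser, so callers that need Seek
// have to type-assert at runtime.
func looseOpen(data []byte) (io.ReadCloser, error) {
	return memFile{bytes.NewReader(data)}, nil
}

// strictOpen returns io.ReadSeekCloser, so the compiler checks the capability.
func strictOpen(data []byte) (io.ReadSeekCloser, error) {
	return memFile{bytes.NewReader(data)}, nil
}

// readTwice reads r fully, rewinds to the start, and reads it again.
func readTwice(r io.ReadSeeker) (string, string, error) {
	first, err := io.ReadAll(r)
	if err != nil {
		return "", "", err
	}
	if _, err := r.Seek(0, io.SeekStart); err != nil {
		return "", "", err
	}
	second, err := io.ReadAll(r)
	return string(first), string(second), err
}

// viaAssertion shows the runtime check the review comment asks about.
func viaAssertion(rc io.ReadCloser) (string, string, error) {
	rdr, ok := rc.(io.ReadSeekCloser)
	if !ok {
		return "", "", errors.New("reader does not implement io.ReadSeekCloser")
	}
	return readTwice(rdr)
}

func main() {
	rc, _ := looseOpen([]byte("payload"))
	defer rc.Close()
	fmt.Println(viaAssertion(rc))

	rsc, _ := strictOpen([]byte("payload"))
	defer rsc.Close()
	fmt.Println(readTwice(rsc)) // no assertion needed

	// io.NopCloser hides Seek, so the assertion fails only at runtime.
	_, _, err := viaAssertion(io.NopCloser(bytes.NewReader(nil)))
	fmt.Println(err)
}
```

With the stricter return type, a reader that cannot seek is rejected at compile time instead of surfacing as a failed assertion while scanning.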
the test failure should be fixed on |
Description:
This PR introduces a new readers package with a bufferedFileReader implementation for efficient random access reading, seeking, and closing operations. The bufferedFileReader combines the functionality of BufferedFileWriter for buffered writing and an io.ReadSeekCloser for random access reading and seeking. It also provides a Close method to release the buffer back to the pool.

But why?
The main reasons for introducing this new reader implementation are twofold:

Memory Management: This provides a custom reader that can accept an io.Reader with any amount of data without holding all of that data in memory. It addresses the current issue with how we handle the result of "git cat-file", where we hold the entire contents in memory. It will also alleviate memory concerns in other scenarios where the reader we pass to handlers.HandleFile holds a lot of data.

Archiver Library Compatibility: The archiver library we use depends on the underlying reader implementing the seekReaderAt (io.ReaderAt & io.Seeker) interface. The new bufferedFileReader struct allows us to use it in all those places while leveraging our existing BufferedFileWriter when writing data.

Ok... but why not use the existing diskbufferReader?

While the existing diskbufferReader implementation is a valid solution, we decided to introduce the new bufferedFileReader for the following reasons:

Efficiency for Small Files: The majority of files we need to process are under 10MB in size. Creating a temporary file for every single file, as done by the diskbufferReader, can be inefficient for small files. The bufferedFileReader avoids this overhead by storing small files in memory using the BufferedFileWriter.

Observability and Metrics: The BufferedFileWriter, which the bufferedFileReader leverages, provides observability through existing instrumented metrics. This allows us to monitor and track the performance of the new reader implementation more effectively.

Buffer Pool Optimization: The BufferedFileWriter is optimized to use an existing buffer pool to reduce memory allocation. This optimization can improve performance.

Key features:
BufferedFileWriter for buffered writing and data storage.
io.ReadSeekCloser.
io.Reader, io.Seeker, and io.ReaderAt interfaces.
Close method releases the buffer back to the pool.

Checklist:
(make test-community)?
(make lint, this requires golangci-lint)?
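
As a rough illustration of the "small files in memory, large files on disk" idea from the description, the sketch below buffers an io.Reader up to a limit and falls back to a temp file beyond it, while exposing io.ReadSeekCloser and io.ReaderAt. It is an assumption-laden stand-in, not the PR's code: spillReader, newSpillReader, readSeekerAt, demo, and the limit parameter are hypothetical names, and a plain bytes.Buffer is used in place of BufferedFileWriter and its buffer pool and metrics.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// readSeekerAt is the capability set the sketch forwards to its backing store.
type readSeekerAt interface {
	io.ReadSeeker
	io.ReaderAt
}

// spillReader is a hypothetical stand-in for illustration only; it is not the
// PR's bufferedFileReader and does not use BufferedFileWriter or a buffer pool.
type spillReader struct {
	src  readSeekerAt
	file *os.File // non-nil only when the content was spilled to disk
}

// Compile-time checks that the sketch satisfies the interfaces named in the PR.
var (
	_ io.ReadSeekCloser = (*spillReader)(nil)
	_ io.ReaderAt       = (*spillReader)(nil)
)

// newSpillReader drains r, keeping at most limit bytes in memory.
func newSpillReader(r io.Reader, limit int64) (*spillReader, error) {
	var buf bytes.Buffer
	// Copy limit+1 bytes so an input of exactly limit bytes still counts as small.
	n, err := io.CopyN(&buf, r, limit+1)
	if err != nil && err != io.EOF {
		return nil, err
	}
	if n <= limit {
		return &spillReader{src: bytes.NewReader(buf.Bytes())}, nil
	}
	f, err := os.CreateTemp("", "spill-*")
	if err != nil {
		return nil, err
	}
	// Write the buffered prefix followed by the rest of the stream.
	if _, err := io.Copy(f, io.MultiReader(&buf, r)); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	return &spillReader{src: f, file: f}, nil
}

func (s *spillReader) Read(p []byte) (int, error) { return s.src.Read(p) }

func (s *spillReader) Seek(off int64, whence int) (int64, error) { return s.src.Seek(off, whence) }

func (s *spillReader) ReadAt(p []byte, off int64) (int, error) { return s.src.ReadAt(p, off) }

// Close removes the temp file if one was created; in-memory readers need no cleanup.
func (s *spillReader) Close() error {
	if s.file == nil {
		return nil
	}
	closeErr := s.file.Close()
	if err := os.Remove(s.file.Name()); err != nil && closeErr == nil {
		return err
	}
	return closeErr
}

// demo reads the input, rewinds, reads again, and reports what happened.
func demo(input string, limit int64) (onDisk bool, first, second string, err error) {
	r, err := newSpillReader(strings.NewReader(input), limit)
	if err != nil {
		return false, "", "", err
	}
	defer r.Close()
	a, err := io.ReadAll(r)
	if err != nil {
		return false, "", "", err
	}
	if _, err = r.Seek(0, io.SeekStart); err != nil {
		return false, "", "", err
	}
	b, err := io.ReadAll(r)
	return r.file != nil, string(a), string(b), err
}

func main() {
	for _, limit := range []int64{64, 4} {
		onDisk, first, second, err := demo("hello, world", limit)
		fmt.Println(limit, onDisk, first == second, err)
	}
}
```

The two limits in main exercise both paths: a generous limit keeps the content in memory, while a tiny one forces the temp-file fallback, and in both cases the caller sees the same seekable reader.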