Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] - buffered file reader #2731

Merged
merged 17 commits into from
Apr 30, 2024
Merged

[feat] - buffered file reader #2731

merged 17 commits into from
Apr 30, 2024

Conversation

ahrav
Copy link
Collaborator

@ahrav ahrav commented Apr 22, 2024

Description:

This PR introduces a new readers package with a bufferedFileReader implementation for efficient random access reading, seeking, and closing operations. The bufferedFileReader combines the functionality of BufferedFileWriter for buffered writing and an io.ReadSeekCloser for random access reading and seeking. It also provides a Close method to release the buffer pool.

BUT Why??

The main reasons for introducing this new reader implementation are two-fold:

  1. Memory Management: This will provide a way to create a custom reader that can accept an io.Reader with any amount of data without worrying about holding all the data in memory. This addresses the current issue with how we handle the result of “git cat-file” where we hold the entire contents in memory. It will also alleviate memory concerns in other scenarios where the reader we pass to handlers.HandleFile holds a lot of data.

  2. Archiver Library Compatibility: The archiver library we use depends on the underlying reader implementing the seekReaderAt (io.ReaderAt & io.Seeker) interface. This new bufferedFileReader struct allows us to use it in all those places while leveraging our existing BufferedFileWriter when writing data.

Ok... but, why not use the existing DiskbufferReader??

While we have an existing diskbufferReader implementation which is a valid solution, we decided to introduce the new bufferedFileReader for the following reasons:

  1. Efficiency for Small Files: The majority of files we need to process are under 10MB in size. Creating a temporary file for every single file, as done by the diskbufferReader, can be inefficient for small files. The bufferedFileReader avoids this overhead by storing small files in memory using the BufferedFileWriter.

  2. Observability and Metrics: The BufferedFileWriter, which the bufferedFileReader leverages, provides observability through existing instrumented metrics. This allows us to monitor and track the performance of the new reader implementation more effectively.

  3. Buffer Pool Optimization: The BufferedFileWriter is optimized to use an existing buffer pool to reduce memory allocation. This optimization can improve performance.

Key features:

  • Utilizes BufferedFileWriter for buffered writing and data storage.
  • Supports random access reading and seeking using an io.ReadSeekCloser.
  • Implements io.Reader, io.Seeker, and io.ReaderAt interfaces.
  • Close method releases the buffer back to the pool.
  • Designed for efficient handling of large data with minimal memory usage.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@ahrav ahrav requested a review from a team April 22, 2024 22:51
@ahrav ahrav force-pushed the feat-buffered-file-reader branch from ce9a0cc to c72c99f Compare April 22, 2024 22:54
@ahrav ahrav marked this pull request as ready for review April 23, 2024 00:27
@ahrav ahrav requested a review from a team as a code owner April 23, 2024 00:27
pkg/readers/bufferedfilereader.go Show resolved Hide resolved
return nil, err
}

rdr, ok := reader.(io.ReadSeekCloser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we control all the relevant implementations, is it correct to say that we always expect this to succeed? And if so, should ReadCloser() be changed to return a *io.ReadSeekCloser so that we don't have to do this at runtime?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered raising this question in the pr: If we modify the return type to io.ReadSeekCloser, we will also need to revise the ReadCloser method in the contentReader interface within gitparse.go. This adjustment left me somewhat uncertain, but I'm happy to make the change.

I also tend to get a little squimish around interface assertions. 😅

Base automatically changed from update-write-method to main April 23, 2024 15:47
@ahrav ahrav requested a review from rosecodym April 24, 2024 19:12
@ahrav ahrav requested review from a team April 24, 2024 19:22
@rosecodym
Copy link
Contributor

the test failure should be fixed on main

@ahrav ahrav merged commit 46d4ae1 into main Apr 30, 2024
12 checks passed
@ahrav ahrav deleted the feat-buffered-file-reader branch April 30, 2024 14:31
haraldh added a commit to matter-labs/vault-auth-tee that referenced this pull request May 6, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
|
[trufflesecurity/trufflehog](https://togithub.com/trufflesecurity/trufflehog)
| action | minor | `v3.74.0` -> `v3.75.0` |

---

### Release Notes

<details>
<summary>trufflesecurity/trufflehog
(trufflesecurity/trufflehog)</summary>

###
[`v3.75.0`](https://togithub.com/trufflesecurity/trufflehog/releases/tag/v3.75.0)

[Compare
Source](https://togithub.com/trufflesecurity/trufflehog/compare/v3.74.0...v3.75.0)

#### What's Changed

- \[chore] - update buffer metrics by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2737
- fix(deps): update module github.com/aws/aws-sdk-go to v1.51.28 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2741
- chore(deps): update golangci/golangci-lint-action action to v5 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2744
- Scan commit metadata by [@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2713
- Fix SQL Server detector tests by
[@&#8203;rosecodym](https://togithub.com/rosecodym) in
[trufflesecurity/trufflehog#2716
- Revert "Scan commit metadata" by
[@&#8203;rosecodym](https://togithub.com/rosecodym) in
[trufflesecurity/trufflehog#2747
- \[bug] - Refactor newDiff constructor to avoid double initialization
of contentWriter by [@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2742
- \[chore] - update buffered file writer metric by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2740
- \[refactor] - lazy buffer retrieval by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2745
- \[chore] Remove broken test by
[@&#8203;mcastorina](https://togithub.com/mcastorina) in
[trufflesecurity/trufflehog#2748
- \[bug] - fix buffer size metric by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2749
- \[bug] - Fix the metric for buffered file writer writes by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2750
- fix(deps): update module github.com/aws/aws-sdk-go to v1.51.29 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2751
- update integration logos by
[@&#8203;dustin-decker](https://togithub.com/dustin-decker) in
[trufflesecurity/trufflehog#2752
- fix(deps): update module github.com/aws/aws-sdk-go to v1.51.30 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2756
- \[chore] - add additional binary extension by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2760
- pkg: fix function names in comment by
[@&#8203;mountcount](https://togithub.com/mountcount) in
[trufflesecurity/trufflehog#2761
- \[chore] - ignore pbix and vsdx files by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2762
- fix(deps): update module github.com/aws/aws-sdk-go to v1.51.31 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2763
- Scan commit metadata by [@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2754
- \[bug] - Correctly set metrics for enumerated orgs by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2757
- \[chore ] -Update ignore extensions by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2764
- \[chore] Add some happy path logs to GitLab by
[@&#8203;mcastorina](https://togithub.com/mcastorina) in
[trufflesecurity/trufflehog#2765
- Fix Git source test by [@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2767
- \[feat] - buffered file reader by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2731
- \[feat] - Add ReadFrom method to BufferedFileWriter by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2759
- fix(deps): update module google.golang.org/protobuf to v1.34.0 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2766
- \[bug] - Improve BufferedFileReader Close Behavior by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2768
- fixes calendly api key regex by
[@&#8203;ankushgoel27](https://togithub.com/ankushgoel27) in
[trufflesecurity/trufflehog#2368
- Expose detector-specific false positive logic by
[@&#8203;rosecodym](https://togithub.com/rosecodym) in
[trufflesecurity/trufflehog#2743
- Detector-Fix: Reintroduce Cloudflareglobalapikey by
[@&#8203;ankushgoel27](https://togithub.com/ankushgoel27) in
[trufflesecurity/trufflehog#2101
- Detector-Competition-Fix - fixed the alchemy detector regex by
[@&#8203;ankushgoel27](https://togithub.com/ankushgoel27) in
[trufflesecurity/trufflehog#1821
- fix(deps): update module github.com/aws/aws-sdk-go to v1.51.32 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2769
- fix(deps): update module google.golang.org/api to v0.177.0 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2770
- \[chore] - update imports by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2772
- adds build version to finished scanning log by
[@&#8203;zricethezav](https://togithub.com/zricethezav) in
[trufflesecurity/trufflehog#2773
- Update rabbitmq.go regex detect amqps protocol by
[@&#8203;NikhilPanwar](https://togithub.com/NikhilPanwar) in
[trufflesecurity/trufflehog#2609
- fix for infinite recursion in Postman var sub by
[@&#8203;zricethezav](https://togithub.com/zricethezav) in
[trufflesecurity/trufflehog#2780

#### New Contributors

- [@&#8203;mountcount](https://togithub.com/mountcount) made their first
contribution in
[trufflesecurity/trufflehog#2761

**Full Changelog**:
trufflesecurity/trufflehog@v3.74.0...v3.75.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/matter-labs/vault-auth-tee).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMzEuMCIsInVwZGF0ZWRJblZlciI6IjM3LjMzMS4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants