Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor] - Mime Type Detection #3084

Open
wants to merge 1 commit into
base: perf-optimize-chunker
Choose a base branch
from

Conversation

ahrav
Copy link
Collaborator

@ahrav ahrav commented Jul 21, 2024

Description:

This PR refactors the handling of MIME type detection during file processing. Instead of performing this check at the final stage of the file handling in handleNonArchiveContent, it is moved to an earlier stage to avoid unnecessary operations.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@ahrav ahrav force-pushed the refactor-mimetype-detection branch from 3364dbe to 63229d5 Compare July 22, 2024 01:51
@ahrav ahrav marked this pull request as ready for review July 22, 2024 17:18
@ahrav ahrav requested a review from a team as a code owner July 22, 2024 17:18
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty confused by adding undocumented file type detection way down in the call stack (specific comments inline), as well as the thrown-away MIME type detection.

@@ -89,7 +89,7 @@ func (h *archiveHandler) openArchive(ctx logContext.Context, depth int, reader f

if reader.format == nil {
if depth > 0 {
mtr, err := newSizedMimetypeReaderFromFileReader(reader)
mtr, err := newSizedReaderFromFileReader(reader)
if err != nil {
return fmt.Errorf("error reading MIME type: %w", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this error message be updated?

@@ -44,7 +44,7 @@ func (h *defaultHandler) HandleFile(ctx logContext.Context, input fileReader) (c
h.metrics.incFilesProcessed()
}()

mtr, err := newSizedMimetypeReaderFromFileReader(input)
mtr, err := newSizedReaderFromFileReader(input)
if err != nil {
ctx.Logger().Error(err, "error reading MIME type")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this error message be updated?

return sizedReader{}, errEmptyReader
}

bufReader, _, err := determineMIMEType(r)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you determine the MIME type but immediately throw it away? This seems to reverse all the optimizations you've been adding :/

// This function is particularly useful for specialized archive handlers
// that need to pass extracted content to the default handler without modifying the original reader.
func newSizedMimeTypeReader(r io.Reader, size int64) (sizedMimeTypeReader, error) {
func newSizedReader(r io.Reader, size int64) (sizedReader, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would never occur to me that this function could fail because of an unsupported file type - that behavior isn't hinted at in the name or the doc comment.

Comment on lines +329 to +334
if errors.Is(err, errEmptyReader) {
ctx.Logger().V(5).Info("empty reader, skipping file")
return nil
} else if errors.Is(err, errUnsupportedMIME) {
ctx.Logger().V(5).Info("skipping file")
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use handleReaderError here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants