Feat: adds a new decoder for compressed data. #1488

Open
wants to merge 1 commit into
base: main

joe-make-infrastructure
This targets cases where someone tries to smuggle secrets out by zipping up
a text file and then base64-encoding it. It also detects encoded, zipped
secrets in places such as pipeline logs from a CI system.

@joe-make-infrastructure joe-make-infrastructure requested a review from a team as a code owner July 13, 2023 13:40
@joe-make-infrastructure
Author

@dustin-decker -- new PR with only one user.. Ta..

@bill-rich
Collaborator

Thanks for the contribution!

This looks very similar to the archive handler. The reason handlers were created was to deal with units to be scanned before they are chunked, since individual chunks may not include the entire archive.

Does this cover another case I'm overlooking?

@joe-make-infrastructure
Author

joe-make-infrastructure commented Jul 20, 2023

Yeah, I think so. In $day_job we see zip files encoded as base64 in log files, and these can contain secrets. Sometimes this is an artefact of a CI/CD system, and sometimes it could be a malicious user attempting to smuggle secrets out. An example is below:

```
echo "H4sIANkDnGQAA0ssL45PTE5OLS6Oz06tjM9MUbBVcPT2dAyN8g31dXL0C/X0NDf18nflSgQqLE5NLkotQVIPVByYWmhmmJhs4B2SUZFUGZhiZBSQVlqanJ0UmVGsbZ7qWWLgZlmcbGykzQUAP+HEHGsAAAA=" | base64 -d | gunzip -

aws_access_key_id = AKIAUZMUMBANUII75JOE
aws_secret_access_key = Qeq61ac0KThxbyQd22PfuuckbYhs+7eIt0F9sc32+
```

I initially tried pushing the output of the base64 decode (when no secrets were detected in it) back into the handlerChan, but by that point the channel is closing, which causes trufflehog to hang. That approach would also have meant chunk size isn't a problem. It would be my preferred solution, since it would cover the case where the zip file is an Extractor type, but I couldn't get it to work. That is an edge case for us anyway; the most common case is similar to the one above.

Running the compression decoder after the base64 decoder chains the two together without code duplication, because the base64 decoder writes the decoded data back into the chunk at:

chunk.Data = result.Bytes()
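
The chaining idea can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: `Chunk`, `decodeBase64`, `decodeGzip`, `runDecoders`, and `encodeForDemo` are hypothetical names made up for the example, and it assumes the whole chunk is a single base64 payload wrapping a gzip stream. Each decoder rewrites `Data` in place on success, so the next decoder sees the previous one's output.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// Chunk is a hypothetical stand-in for the scanner's chunk type.
type Chunk struct {
	Data []byte
}

// decodeBase64 replaces c.Data with the decoded bytes when the whole
// payload is valid standard base64; otherwise it leaves c.Data untouched.
func decodeBase64(c *Chunk) bool {
	decoded, err := base64.StdEncoding.DecodeString(string(bytes.TrimSpace(c.Data)))
	if err != nil {
		return false
	}
	c.Data = decoded
	return true
}

// decodeGzip replaces c.Data with the decompressed bytes when c.Data is a
// complete gzip stream; otherwise it leaves c.Data untouched.
func decodeGzip(c *Chunk) bool {
	zr, err := gzip.NewReader(bytes.NewReader(c.Data))
	if err != nil {
		return false
	}
	defer zr.Close()
	plain, err := io.ReadAll(zr)
	if err != nil {
		return false
	}
	c.Data = plain
	return true
}

// runDecoders applies the decoders in order. Because each one writes its
// result back into the chunk, the gzip decoder sees the base64 output.
func runDecoders(c *Chunk) {
	for _, decode := range []func(*Chunk) bool{decodeBase64, decodeGzip} {
		decode(c)
	}
}

// encodeForDemo gzips s and base64-encodes the result, mimicking the
// kind of payload described above.
func encodeForDemo(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return []byte(base64.StdEncoding.EncodeToString(buf.Bytes()))
}

func main() {
	c := &Chunk{Data: encodeForDemo("example_key = not-a-real-secret\n")}
	runDecoders(c)
	fmt.Printf("%s", c.Data)
}
```

Data that is neither base64 nor gzip passes through both decoders unchanged, so the ordering only matters for payloads that are actually wrapped this way.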

Currently chunk size is a limitation, but for the cases we are looking for, 10k is enough for now. We may consider making the chunk size configurable via a command-line flag if it becomes an issue.
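
To illustrate why chunk size matters here, the sketch below shows that a gzip stream cut off partway (as would happen if the payload straddled a chunk boundary) fails to decompress with Go's standard `compress/gzip`. The helper names `gzipBytes` and `gunzip` and the test payload are made up for the example, and the truncation point is arbitrary.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// gzipBytes compresses s into a complete gzip stream.
func gzipBytes(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// gunzip decompresses b and returns whatever was read plus any error.
func gunzip(b []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	full := gzipBytes(strings.Repeat("key = value\n", 2000))

	// The complete stream decompresses cleanly.
	_, err := gunzip(full)
	fmt.Println("whole stream error:", err)

	// Only the first half of the stream, as if the rest fell into the
	// next chunk: decompression reports an error.
	_, err = gunzip(full[:len(full)/2])
	fmt.Println("truncated stream error:", err)
}
```

A decoder that works on individual chunks can therefore only handle payloads that fit entirely inside one chunk.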

Does that help clarify?

Kind regards
