Filter out Cloudflare error pages + performance improvement #41

Open
wants to merge 2 commits into
base: master

michenriksen

Happy New Year! 🎉

This PR adds a new -filter-cf-errors flag to httprobe. When it is set, httprobe reads the first 512 bytes of the response from each listening server to determine whether it is a "wildcard" error response from Cloudflare, like the ones shown here:

[Image: Cloudflare error pages]

If a Cloudflare signature string is found, the isListening function returns false so that the server is treated as not listening. The functionality is implemented generically, so it can easily be extended to filter out other common false-positive responses.

The PR also implements a small performance improvement: it performs a DNS lookup on incoming domains and ignores the unresolvable ones, so the job channels are not filled with dead domains. I timed the execution on a list of 1483 domains, a mix of resolvable and unresolvable, with the large port list and was able to shave off about 7 minutes of execution time (47:13 down to 40:10), which is not a lot, but also not insignificant:

cat hosts.txt | time httprobe -t 2000 -p large
httprobe -t 2000 -p large  10.73s user 10.15s system 0% cpu 47:13.08 total

cat hosts.txt | time new-httprobe -t 2000 -p large
new-httprobe -t 2000 -p large  12.39s user 11.03s system 0% cpu 40:10.77 total

cat hosts.txt | time new-httprobe -t 2000 -filter-cf-errors -p large
new-httprobe -t 2000 -filter-cf-errors -p large  13.13s user 10.86s system 0% cpu 40:00.91 total

Michael Henriksen added 2 commits January 1, 2021 12:48
Adds a new filterStrings argument to isListening
which will cause the function to read the first
512 bytes from responses and check for any of
the given strings and return false if present.
This is implemented in order to support a new
-filter-cf-errors cli flag which will filter out
error pages from Cloudflare as they are common
false-positives.

Performs a DNS lookup on incoming domains so that
channels are not filled with jobs to check dead
domains.
@michenriksen

@tomnomnom I came across the httpx project by Project Discovery and noticed that the tool has support for filter strings with the -filter-string and -filter-regex flags. Would you be interested in having a similar feature for httprobe? If so, I will gladly refactor this PR to implement more generic filtering instead of the Cloudflare-specific filtering. :)
