This repository provides data used by the Web Application Firewall EasyWAF.
A whitelist of IP address ranges used by search engine crawlers and the crawlers of other large platforms is scraped every 12 hours by a GitHub Action and stored in this repository. The "Fake Crawlers" module of EasyWAF uses this list to block fake crawlers. For more information about the WAF, visit the EasyWAF repository.
The authenticity of most crawlers can be determined with a reverse DNS lookup, but an additional IP whitelist improves performance because whitelisted addresses need no DNS query. In addition, the authenticity of some crawlers, such as the Facebook crawler, can only be determined by the IP address.
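To illustrate the whitelist side of this check, here is a minimal sketch of matching an IPv4 address against a list of CIDR ranges. It is not EasyWAF's actual implementation: the function names `ipv4ToInt`, `inCidr` and `isWhitelisted` are hypothetical, the ranges in the usage note are reserved documentation ranges rather than real crawler ranges, and IPv6 is left out for brevity.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  const [a, b, c, d] = parts.map(Number);
  // ">>> 0" turns the signed 32-bit result into an unsigned number.
  return ((a << 24) | (b << 16) | (c << 8) | d) >>> 0;
}

// Check whether an IPv4 address lies inside a CIDR range such as "192.0.2.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const [base, prefixStr] = cidr.split("/");
  const prefix = Number(prefixStr);
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
    throw new Error(`Invalid CIDR range: ${cidr}`);
  }
  // A shift by 32 is a no-op in JavaScript, so /0 needs its own case.
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// True if the address falls into at least one whitelisted range.
function isWhitelisted(ip: string, ranges: string[]): boolean {
  return ranges.some((range) => inCidr(ip, range));
}
```

For example, `isWhitelisted("198.51.100.9", ["192.0.2.0/24", "198.51.100.0/24"])` evaluates to `true` without any network access, which is where the performance gain over a reverse DNS lookup comes from.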
But why is the IP range list not created locally by EasyWAF itself? The main reason is that downloading the BGP Routing Table Analysis data takes some time, especially on slow internet connections. This effect would be amplified if an application is started multiple times in parallel, for example in Node.js cluster mode. In addition, maintaining the list in this repository makes it possible to react more quickly to changes in the data sources without having to update EasyWAF.
Why is this not a security issue? The data is only used for the whitelist of the "Fake Crawlers" module, so adding malicious IP addresses to the list does not allow requests from those addresses to bypass the WAF. A disruption or failure of this data source would currently only cause problems with Facebook crawlers and somewhat reduce the performance of EasyWAF.
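The failure behaviour described above can be sketched as a small decision function. This is an illustration under stated assumptions, not EasyWAF's real code: the `CrawlerCheck` shape and the `classifyCrawler` name are hypothetical, and the result only decides whether a request that claims to be a crawler is treated as a fake crawler. All other WAF checks are assumed to run regardless.

```typescript
type Verdict = "genuine" | "fake";

// Hypothetical inputs for one request whose user agent claims to be a crawler.
interface CrawlerCheck {
  // True if the client IP is inside one of the downloaded whitelist ranges.
  inWhitelist: boolean;
  // True if this crawler can be confirmed with a reverse DNS lookup at all.
  supportsReverseDns: boolean;
  // Outcome of the reverse DNS lookup; ignored when supportsReverseDns is false.
  reverseDnsConfirmed: boolean;
}

function classifyCrawler(check: CrawlerCheck): Verdict {
  // Fast path: a whitelist hit needs no DNS query.
  if (check.inWhitelist) return "genuine";
  // Fallback: crawlers that support it can still be confirmed via reverse DNS.
  if (check.supportsReverseDns && check.reverseDnsConfirmed) return "genuine";
  // Crawlers that rely on IP ranges alone cannot be confirmed without the whitelist.
  return "fake";
}
```

With an empty whitelist, a crawler that supports reverse DNS is still recognised through the slower fallback, while a crawler that can only be identified by IP address is not. That matches the two consequences named above: reduced performance and problems with Facebook crawlers.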
- Documentation: Check Googlebot and other Google crawlers
- Direct link to JSON: Google IP ranges
- Direct link to JSON: Bing IP ranges
- Documentation: Facebook Crawler
- IP ranges are scraped from BGP Routing Table Analysis
- Documentation: Twitterbot
- IP ranges are scraped from BGP Routing Table Analysis
- Documentation: Is DuckDuckBot related to DuckDuckGo?
- Documentation: Pinterest Crawlers
- Website: BGP Routing Table Analysis
- Direct link to IPv4 ranges: IPv4 Prefixes
- Direct link to IPv6 ranges: IPv6 Prefixes
If a public GitHub issue or discussion is not the right choice for your concern, you can contact me directly:
- E-Mail: info@timokoessler.de
- My Website: timokoessler.de