Description
Describe the Bug
There is no mechanism to prevent Firecrawl from crawling my site. I do not want my site to be crawled. Do not crawl my site.
To Reproduce
Steps to reproduce the issue:
- Crawl a site that does not want to be crawled.
Expected Behavior
It should not crawl sites that do not want to be crawled.
The easiest way to do this would be to append an identifier such as Firecrawl/1.4.3 (or whatever the current version is) to the User-Agent header, which would allow site operators to write an nginx rule to block it:
map $http_user_agent $llm_scraper_user_agent {
    default 0;
    ~*Firecrawl 1;
}

server {
    # ...
    if ($llm_scraper_user_agent) {
        return 403;
    }
    # ...
}
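As a sketch of the proposal above, a crawler could build its User-Agent by appending an identifiable token. The base string, helper name, and version number here are illustrative assumptions, not Firecrawl's actual code:

```python
# Hypothetical sketch: append an identifiable "Firecrawl/<version>"
# token to the User-Agent so operators can match on it (as in the
# nginx rule above). The version number is illustrative.
FIRECRAWL_VERSION = "1.4.3"

def build_user_agent(base: str = "Mozilla/5.0 (compatible)") -> str:
    # The trailing product token is what the ~*Firecrawl regex matches.
    return f"{base} Firecrawl/{FIRECRAWL_VERSION}"

ua = build_user_agent()
print(ua)  # Mozilla/5.0 (compatible) Firecrawl/1.4.3
```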
Additionally, you could implement robots.txt support by looking for a rule whose user agent matches either * or Firecrawl.
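To illustrate the robots.txt approach, here is a minimal sketch using Python's standard library. The rules and the "Firecrawl" user-agent token are assumptions for demonstration, not Firecrawl's actual behavior:

```python
# Sketch of robots.txt compliance with the standard library.
from urllib.robotparser import RobotFileParser

# Example robots.txt that blocks Firecrawl everywhere but
# leaves other agents unrestricted (hypothetical rules).
robots_txt = """\
User-agent: Firecrawl
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant Firecrawl would decline to fetch any path here...
print(parser.can_fetch("Firecrawl", "https://example.com/page"))   # False
# ...while other crawlers remain unaffected by the Firecrawl rule.
print(parser.can_fetch("SomeOtherBot", "https://example.com/page"))  # True
```

In practice the crawler would fetch https://example.com/robots.txt before crawling and skip the site when `can_fetch` returns False for its user agent.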
Environment (please complete the following information):
- OS: N/A
- Firecrawl Version: N/A
- Node.js Version: N/A