
[Bug] Add an identifiable user agent so that I can prevent it from crawling my site #1169

Open
@solonovamax

Description

Describe the Bug
There is no mechanism to prevent Firecrawl from crawling my site. I do not want my site to be crawled. Do not crawl my site.

To Reproduce
Steps to reproduce the issue:

  1. Crawl a site that does not want to be crawled.

Expected Behavior
It should not crawl the site that does not want to be crawled.
The easiest way to do this would be to append a token such as Firecrawl/1.4.3 (or whatever the current version is) to the User-Agent string, so that site operators can write an nginx rule to block it:

map $http_user_agent $llm_scraper_user_agent {
    default         0;
    ~*Firecrawl     1;
}

server {

    # ...

    if ($llm_scraper_user_agent) {
        return 403;
    }

    # ...
}
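If Firecrawl adopted such a token, the nginx rule above would match any request whose User-Agent contains "Firecrawl" (the ~* modifier makes it a case-insensitive regex). A minimal Python sketch of what the crawler side might look like; the version string and example URL are placeholders, not Firecrawl's actual values:

```python
import urllib.request

# Hypothetical sketch: a crawler identifying itself with a versioned
# User-Agent token that operators can match on. "1.4.3" is a placeholder.
USER_AGENT = "Mozilla/5.0 (compatible; Firecrawl/1.4.3)"

# The request is only constructed, not sent; the point is the header
# value that an nginx `map $http_user_agent` rule would inspect.
request = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": USER_AGENT},
)

# nginx's `~*Firecrawl` matches the substring "Firecrawl" in any case,
# so a UA built like this would hit the `return 403` branch.
print(request.get_header("User-agent"))
```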

Additionally, you could implement support for robots.txt, honoring any rule group whose User-agent line matches either * or Firecrawl.
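Python's standard library already implements this matching (a rule group applies when its User-agent line is * or a substring of the crawler's product token), so the suggested robots.txt handling could be sketched roughly as:

```python
from urllib.robotparser import RobotFileParser

# Sketch of the suggested behavior, assuming the crawler identifies as
# "Firecrawl": a site publishes a robots.txt disallowing it entirely.
robots_txt = """\
User-agent: Firecrawl
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A crawler whose product token is "Firecrawl" must skip this site...
print(parser.can_fetch("Firecrawl/1.4.3", "https://example.com/page"))
# ...while agents matched by no rule group remain unaffected.
print(parser.can_fetch("OtherBot/1.0", "https://example.com/page"))
```

In a real crawler the robots.txt would be fetched from the target origin before each crawl rather than inlined like this.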

Environment (please complete the following information):

  • OS: N/A
  • Firecrawl Version: N/A
  • Node.js Version: N/A


Metadata

Assignees

No one assigned

    Labels

bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
