[Bug] Add an identifiable user agent so that I can prevent it from crawling my site

**Describe the Bug**
There is no mechanism to prevent firecrawl from crawling my site. I do not want my site to be crawled. Do not crawl my site.

**To Reproduce**
Steps to reproduce the issue:
1. Crawl a site that does not want to be crawled.

**Expected Behavior**
It should not crawl the site that does not want to be crawled.
The easiest way to do this would be to add something like `Firecrawl/1.4.3` (or whatever the current version is) to the end of the user agent, and allow people to write an nginx rule to block it:
```nginx
map $http_user_agent $llm_scraper_user_agent {
    default         0;
    ~*Firecrawl     1;
}

server {

    # ...

    if ($llm_scraper_user_agent) {
        return 403;
    }

    # ...
}
```

Additionally, you could also implement following the `robots.txt` and look for a rule that either matches `*` or `Firecrawl`.

**Environment (please complete the following information):**
- OS: N/A
- Firecrawl Version: N/A
- Node.js Version: N/A

**Additional Context**
Add any other context about the problem here, such as configuration specifics, network conditions, data volumes, etc.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug] Add an identifiable user agent so that I can prevent it from crawling my site #1169

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Bug] Add an identifiable user agent so that I can prevent it from crawling my site #1169

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions