taicaile/scrapy-antiban

Scrapy-Antiban

This spider middleware helps avoid getting banned by target websites. When the ban condition is triggered, the middleware stops the engine for a certain period, then re-checks until the ban is lifted.

Initially, the engine is stopped for 60 seconds. If the site is still banning requests after that, the pause increases by 50%.
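The pause-and-recheck loop described above can be sketched as follows. This is an illustration, not the middleware's actual code. It assumes the 50% increase compounds on the previous pause (60 s, 90 s, 135 s, ...). `wait_out_ban` and the `is_banned` check are hypothetical names, and the blocking `time.sleep` stands in for stopping the engine. A real Scrapy middleware must not block the Twisted reactor this way.

```python
import time
from typing import Callable


def wait_out_ban(
    is_banned: Callable[[], bool],
    initial: float = 60.0,
    factor: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause, re-check, and lengthen the pause until the ban is lifted.

    Hypothetical helper; returns the total time spent paused, in seconds.
    """
    delay = initial
    total = 0.0
    while True:
        sleep(delay)          # stand-in for stopping the engine
        total += delay
        if not is_banned():   # re-check once the pause is over
            return total
        delay *= factor       # still banned: the next pause is 50% longer
```

To inspect the schedule without actually waiting, pass a fake `sleep` (for example a list's `append` method), which records each pause length instead of sleeping.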

Usage:

```python
# In the spider, re-yield the same request with this meta:
meta = {"pre_request_banned": True}

# In settings.py:
SPIDER_MIDDLEWARES = {
    "scrapy_antiban.throttle.ThrottleMiddleware": 543,
}
```

About

A spider middleware to avoid bans while scraping.