MongoDB-based components for Scrapy that allow distributed crawling:
- Scheduler
- Duplication Filter
From GitHub

To install it via pip,
```bash
# install
pip install git+https://github.com/taicaile/scrapy-mongodb

# reinstall
pip install --ignore-installed git+https://github.com/taicaile/scrapy-mongodb
```
or clone it first,
```bash
git clone https://github.com/taicaile/scrapy-mongodb.git
cd scrapy-mongodb
python setup.py install
```
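Note that `python setup.py install` is deprecated by recent setuptools releases; installing with pip from the cloned directory is equivalent:

```bash
# same effect as setup.py install, run from the repository root
pip install .
```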
To install a specific version,
```bash
# replace `v0.1.0` with the tag you want
pip install git+https://github.com/taicaile/scrapy-mongodb@v0.1.0
```
You can put the following in requirements.txt,
```text
scrapy-mongodb@git+https://github.com/taicaile/scrapy-mongodb@v0.1.0
```
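Whichever route you use, a quick way to confirm the install worked is to import the package (the module name `scrapy_mongodb` matches the settings paths below):

```bash
# prints the module path if the package is importable
python -c "import scrapy_mongodb; print(scrapy_mongodb.__file__)"
```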
Enable the components in your settings.py:
```python
# Enable scheduling by storing the requests queue in MongoDB.
SCHEDULER = "scrapy_mongodb.scheduler.Scheduler"

# Specify the host and port to use when connecting to MongoDB (optional).
MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "scrapy"
```
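The component list above also names a Duplication Filter, but no setting for it is shown here. By analogy with similar projects such as scrapy-redis, it would be wired in through Scrapy's `DUPEFILTER_CLASS` setting; the class path below is an assumption, so verify it against the repository:

```python
# Hypothetical class path -- check the scrapy-mongodb source for the real one.
DUPEFILTER_CLASS = "scrapy_mongodb.dupefilter.RFPDupeFilter"
```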
The dupefilter and the scheduler queue can persist across crawls; both are disabled by default:
```python
MONGODB_DUPEFILTER_PERSIST = False       # default
MONGODB_SCHEDULER_QUEUE_PERSIST = False  # default
```
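To keep the fingerprint set and the pending-request queue in MongoDB after a crawl stops, set both flags to True. Whether this lets an interrupted crawl resume where it left off is an assumption, based on how the equivalent persist flags behave in scrapy-redis:

```python
# Keep dupefilter fingerprints and queued requests across runs.
MONGODB_DUPEFILTER_PERSIST = True
MONGODB_SCHEDULER_QUEUE_PERSIST = True
```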
Note that this is currently not suitable for distributed crawling.