Using RQ (Redis Queue) to crawl links and titles
- rq: `pip install rq`
- lxml: `pip install lxml`
- cssselect: `pip install cssselect`
- Step 1: Use `schema.sql` to initialize your database.
- Step 2: Update the DB config in `services.py` and `root_url` in `bootstrap.py`.
- Step 3: Run `bootstrap.py` to create the initial crawling job: `python bootstrap.py`
- Step 4: Start one or more workers by running `rq worker` in the `rq_crawl` directory.
- Step 5: Run `python count.py` to view the crawling speed.