
Using RQ (Redis Queue) to crawl links and titles


thoqbk/rq_crawl


RQ Crawl


Dependencies

  • Redis: a running Redis server, which RQ uses to store its job queues
  • rq: pip install rq
  • lxml: pip install lxml
  • cssselect: pip install cssselect

Get started

  • Step 1: Use schema.sql to initialize your database
  • Step 2: Update the DB config in services.py and root_url in bootstrap.py
  • Step 3: Run bootstrap.py to queue the initial crawling job: python bootstrap.py
  • Step 4: Start one or more workers by running rq worker in the rq_crawl directory, so that each worker can import the project's job functions
  • Step 5: Run python count.py to view the crawling speed
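The steps above suggest a fan-out pattern: bootstrap.py seeds the queue with one job for root_url, and each worker job stores a page's title and queues the links it has not seen yet. The sketch below simulates that loop in a single process under stated assumptions. A dict of fake pages (`FAKE_WEB`) replaces HTTP fetching, a `deque` replaces the Redis queue, and a hypothetical `page` table in an in-memory SQLite database replaces whatever schema.sql actually defines. None of these names come from the repository:

```python
import sqlite3
from collections import deque

# Hypothetical stand-in for the web: url -> (title, outgoing links).
FAKE_WEB = {
    "http://example.com/": ("Home", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("Page A", ["http://example.com/", "http://example.com/b"]),
    "http://example.com/b": ("Page B", []),
}


def init_db() -> sqlite3.Connection:
    # Hypothetical table; the real layout is whatever schema.sql defines.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE page (url TEXT PRIMARY KEY, title TEXT)")
    return db


def claim(db: sqlite3.Connection, url: str) -> bool:
    """Record url as seen; True only the first time, so each page is queued once."""
    cur = db.execute("INSERT OR IGNORE INTO page (url) VALUES (?)", (url,))
    db.commit()
    return cur.rowcount == 1


def crawl_job(db: sqlite3.Connection, queue: deque, url: str) -> None:
    """One job: store the page title, then queue every link not seen before."""
    title, links = FAKE_WEB.get(url, ("", []))
    db.execute("UPDATE page SET title = ? WHERE url = ?", (title, url))
    db.commit()
    for link in links:
        if claim(db, link):
            queue.append(link)  # with RQ, this would be an enqueue call instead


def run(root_url: str) -> dict:
    db = init_db()
    queue: deque = deque()
    if claim(db, root_url):  # role of bootstrap.py: seed the first job
        queue.append(root_url)
    while queue:  # role of `rq worker`: take jobs off the queue and run them
        crawl_job(db, queue, queue.popleft())
    return dict(db.execute("SELECT url, title FROM page ORDER BY url"))


print(run("http://example.com/"))
```

With RQ, `queue.append(link)` would become something like `Queue(connection=Redis()).enqueue(crawl_page, link)`, where `crawl_page` is a hypothetical module-level job function that opens its own database connection (RQ serializes job arguments, so a live connection cannot be passed in). Claiming a URL through a primary-key insert keeps deduplication atomic in the database when several workers race on the same link.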
