Xspider is a distributed web crawler system with support for templated spider development.
- Several features are inspired by Pyspider and Scrapy, including a WebUI with a script editor, task monitor, project manager, and result viewer.
- Distributed node management by IP address, using Celery as the distributed task queue.
- Per-node task scheduling, including configuration of each spider's crawl frequency, network-request parameters, and more.
- Configurable project priority and task retry count.
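The per-spider crawl-frequency control mentioned above can be sketched as a simple token bucket. This is an illustrative example only — the class and parameter names are not Xspider's actual API:

```python
import time

class FrequencyLimiter:
    """Token bucket: allows bursts up to `rate`, sustains `rate` requests/second."""

    def __init__(self, rate: float) -> None:
        self.rate = rate          # tokens added per second (also bucket capacity)
        self.tokens = rate        # start with a full bucket
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until one request token is available, then consume it."""
        now = time.monotonic()
        # Replenish tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1.0:
            # Sleep just long enough to accumulate the missing fraction of a token.
            time.sleep((1.0 - self.tokens) / self.rate)
            self.last = time.monotonic()
            self.tokens = 0.0     # the token earned while sleeping is consumed immediately
        else:
            self.tokens -= 1.0
```

A spider loop would call `limiter.acquire()` before each request; in Xspider itself the configured frequency is presumably enforced by the scheduler rather than inside spider code.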
- Install Python 2.7
$ brew install python
- Install MongoDB & Redis
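The README gives no commands for this step. On macOS, both services can typically be installed with Homebrew, matching the `brew` usage above — note that formula names vary by Homebrew version, and current Homebrew hosts MongoDB in its own tap:

```shell
$ brew tap mongodb/brew
$ brew install mongodb-community
$ brew install redis
$ brew services start mongodb-community
$ brew services start redis
```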
- Clone Xspider Code
$ git clone https://github.com/zym1115718204/xspider.git
- Install Package
$ pip install -r requirements.txt
- Run
$ cd xspider/xspider
$ ./run all
- Visit: http://localhost:2017
- Node management is available in the WebUI.
Licensed under the Apache License, Version 2.0