Fulmar is a distributed crawler system. By using non-blocking network I/O, Fulmar can handle hundreds of open connections at the same time, so you can extract the data you need from websites in a fast, simple way.
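To illustrate why non-blocking I/O lets one process keep hundreds of requests in flight, here is a minimal sketch using Python's standard asyncio module. It is not Fulmar's implementation: fake_fetch, crawl_all, run_demo, and the example.com URLs are hypothetical names chosen for this illustration, and the network wait is simulated with a sleep.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    # Hypothetical stand-in for a network request. Awaiting the sleep
    # yields control to the event loop, as a non-blocking socket would.
    await asyncio.sleep(delay)
    return {"url": url, "status": 200}


async def crawl_all(urls):
    # Start every request at once and wait for all of them to finish.
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def run_demo(n=200):
    # Placeholder URLs, assumed for the example only.
    urls = ["http://example.com/page/%d" % i for i in range(n)]
    start = time.monotonic()
    results = asyncio.run(crawl_all(urls))
    elapsed = time.monotonic() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print("fetched %d pages in %.2f s" % (len(results), elapsed))
```

Because the waits overlap, the total time stays close to a single request's delay rather than growing with the number of URLs, which is the effect a non-blocking crawler relies on.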
from fulmar.base_spider import BaseSpider


class Handler(BaseSpider):

    def on_start(self):
        self.crawl('http://www.baidu.com/', callback=self.parse_and_save)

    def parse_and_save(self, response):
        return {
            "url": response.url,
            "title": response.page_lxml.xpath('//title/text()')[0]}
Save the code above in a new file called baidu_spider.py and run:
fulmar start_project baidu_spider.py
If you have installed Redis, you will see:
Successfully start the project, project name: "baidu_spider".
Finally, start Fulmar:
fulmar all
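The parse_and_save callback above indexes the XPath result with [0], which raises IndexError when a page has no title element. A small defensive helper could look like the sketch below. first_or_default is a hypothetical name and not part of Fulmar; the sketch only assumes that the XPath query returns a list, as lxml does for node-set queries.

```python
def first_or_default(matches, default=None):
    """Return the first item of an XPath result list, or `default` if empty."""
    return matches[0] if matches else default


# Possible use inside the callback (response.page_lxml is taken from the
# example above; the helper itself is an assumption of this sketch):
#
#     "title": first_or_default(response.page_lxml.xpath('//title/text()'))

if __name__ == "__main__":
    # Plain lists stand in for XPath results here.
    print(first_or_default(["Example title"]))
    print(first_or_default([], default=""))
```

Returning a default keeps one malformed page from aborting the callback, at the cost of having to handle missing titles downstream.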
Automatic installation:
pip install fulmar
Fulmar is listed in PyPI and can be installed with pip or easy_install.
Fulmar source code is hosted on GitHub.
For more information, please visit the Fulmar Docs.