sc_spider

A SongCi spider project. (Chinese: 宋词爬虫)

Overview

An efficient spider, based on Scrapy, that crawls SongCi from the web.

Results can be saved to files in multiple formats, as well as to MongoDB collections.
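
Under the hood, saving to MongoDB is normally handled by a Scrapy item pipeline. The snippet below is a minimal sketch of such a pipeline, assuming pymongo is installed; the class name MongoPipeline, the collection name songci, and the MONGO_DATABASE setting are illustrative assumptions, not necessarily the names used in this project.

import pymongo


class MongoPipeline:
    """Illustrative sketch: write each crawled item into a MongoDB collection."""

    collection_name = 'songci'  # hypothetical collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read connection settings (e.g. MONGO_URI=localhost:27017) from the Scrapy settings.
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'songci'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Insert the item as a plain dict and pass it on to the next pipeline stage.
        self.db[self.collection_name].insert_one(dict(item))
        return item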

Requirements

  • Python 3.5+
  • (Optional) Docker and Docker Compose
  • (Optional) MongoDB

Usage

You may choose one of the following methods to run this project.

Docker is the recommended way, as it spares you from installing and configuring MongoDB and other dependencies yourself.

Docker

  1. Install Docker and Docker Compose.
  2. (Optional) Run docker-compose build if the environment has changed.
  3. Run sc_spider:
docker-compose up
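
To run the containers in the background, inspect their output, or tear everything down, the standard Docker Compose commands apply (these are generic docker-compose usage, not project-specific options):
docker-compose up -d
docker-compose logs -f
docker-compose down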

Command Line (scrapy)

  1. Install MongoDB.
  2. Install project requirements: pip install -r requirements.txt
  3. Run sc_spider:
cd sc_scrapy
scrapy crawl gushiwen -s MONGO_URI=localhost:27017
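
If you also want the results written to a file, Scrapy's built-in feed export can be enabled from the command line; the file name songci.json below is arbitrary, and FEED_EXPORT_ENCODING is a standard Scrapy setting that keeps Chinese text unescaped (the project may already configure an equivalent default):
scrapy crawl gushiwen -s MONGO_URI=localhost:27017 -o songci.json -s FEED_EXPORT_ENCODING=utf-8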

Command Line (python)

  1. Install MongoDB.
  2. Install project requirements: pip install -r requirements.txt
  3. Either edit your hosts file to add the entry 127.0.0.1 mongo, or modify the MONGO_URI setting in sc_scrapy/settings.py.
  4. Run sc_spider:
cd sc_scrapy
python execute.py
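
For reference, such an entry script is usually just a thin wrapper around Scrapy's command-line API; a minimal sketch (not necessarily the project's actual execute.py) looks like this:
from scrapy.cmdline import execute

if __name__ == '__main__':
    # Equivalent to running "scrapy crawl gushiwen" from the project directory.
    execute(['scrapy', 'crawl', 'gushiwen'])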

Features

  • Fast and flexible
  • Crawls can be paused and resumed, since requests are serializable (see the example below).
  • Multiple output formats (thanks to Scrapy), with UTF-8 literal output support
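
Pause/resume relies on Scrapy's standard job persistence: pass a JOBDIR so the scheduler state is written to disk (the directory name crawls/gushiwen-1 below is arbitrary), stop the crawl with Ctrl-C, then run the same command again to resume where it left off:
cd sc_scrapy
scrapy crawl gushiwen -s JOBDIR=crawls/gushiwen-1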

Releases

You can download the latest stable release from: https://github.com/wings27/sc_spider/releases

Contributing

All contributions are welcome: you can add new spiders, create enhancement patches, or resolve issues.

However, please follow these conventions:

  • Your coding style should follow PEP 8
  • Spiders should only crawl SongCi-related content
  • Spiders should obey robots.txt rules (see the note below)
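
On the last point, Scrapy enforces robots.txt through its standard ROBOTSTXT_OBEY setting; assuming the usual Scrapy project layout, keep it enabled in sc_scrapy/settings.py:
ROBOTSTXT_OBEY = True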