Spiders

Several spiders built with requests, BeautifulSoup, scrapy and similar libraries.

Crawled data is stored in MongoDB or MySQL. Spider kongjie downloads the pictures of all users on kongjie.com.

Spider haofl:

It crawls haofl.net using scrapy. It extends CrawlSpider but is written in plain Spider style.

Spider kongjie:

A spider that uses requests and BeautifulSoup to crawl kongjie.com. Thanks to requests and bs4, the code stays concise. A Redis hash is used to de-duplicate users.
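A minimal sketch of the de-duplication idea, assuming user IDs are stored as fields of a single Redis hash: HSETNX only writes a field that does not exist yet, so its return value tells whether a user is new. The hash name kongjie:seen_users and the helper is_new_user are hypothetical; InMemoryHash is a stand-in so the example runs without a Redis server, and a redis.Redis() client from redis-py can be passed in its place.

```python
from typing import Dict

SEEN_USERS = "kongjie:seen_users"  # hypothetical hash name


class InMemoryHash:
    """Stand-in for redis.Redis so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def hsetnx(self, name: str, key: str, value: str) -> int:
        # Mirrors Redis HSETNX: set only if absent; 1 if set, 0 if it existed.
        bucket = self._data.setdefault(name, {})
        if key in bucket:
            return 0
        bucket[key] = value
        return 1


def is_new_user(store, uid: str) -> bool:
    """Record uid in the hash; return True only the first time it is seen."""
    return bool(store.hsetnx(SEEN_USERS, uid, "1"))
```

Because the check and the write happen in a single atomic command, several crawler processes sharing one Redis server will not both treat the same user as new.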

Blog post: Python Web Crawler: Scraping Kongjie.com Pictures with requests and bs4 (Python网络爬虫requests、bs4爬取空姐网图片)

Spider qiubai:

This spider crawls qiushibaike.com using scrapy. It extends CrawlSpider but is written in plain Spider style; CrawlSpider features such as Rule and LinkExtractor will be adopted later. Crawled data is stored in MongoDB.
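Storing items in MongoDB is commonly done with a Scrapy item pipeline built on pymongo; a minimal sketch follows. The setting names MONGO_URI and MONGO_DB, the database and collection names, and the class name MongoPipeline are assumptions, not taken from the qiubai project.

```python
class MongoPipeline:
    """Hypothetical Scrapy item pipeline that stores every item in MongoDB."""

    def __init__(self, mongo_uri="mongodb://localhost:27017",
                 db_name="qiubai", collection_name="jokes", collection=None):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.collection = collection  # may be injected, e.g. for testing
        self.client = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from project settings.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            db_name=crawler.settings.get("MONGO_DB", "qiubai"),
        )

    def open_spider(self, spider):
        if self.collection is None:
            import pymongo  # lazy import so the class loads without pymongo

            self.client = pymongo.MongoClient(self.mongo_uri)
            self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # insert_one adds an "_id" key to the dict it receives, so pass a
        # copy to keep the original item unchanged for later pipelines.
        self.collection.insert_one(dict(item))
        return item
```

To enable it, the pipeline would be registered in settings.py, e.g. ITEM_PIPELINES = {"qiubai.pipelines.MongoPipeline": 300} (module path hypothetical).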

Blog post: Python Crawler Framework Scrapy: Crawling Large Amounts of Jokes from Qiushibaike (Python爬虫框架Scrapy之爬取糗事百科大量段子数据)

Spider onesixnine:

A spider using scrapy that crawls all images on 169ee.com. It uses scrapy's CrawlSpider to crawl the full site, with Rule and LinkExtractor extracting the links to follow. Images are saved to disk.

Blog post: Advanced Crawling: Using CrawlSpider to Crawl All Pictures of Beautiful Women on 169ee (爬虫进阶：CrawlSpider爬取169ee全站美女图片)

Spider flhhkkSpider:

A spider using scrapy and selenium that crawls all candidate Baidu Wangpan (Baidu cloud storage) download links and stores them in MySQL. It uses scrapy's CrawlSpider to crawl the full site.
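The Selenium part depends on the site, but the MySQL side can be sketched as a Scrapy item pipeline. This assumes the PyMySQL driver, a hypothetical table download_links with a UNIQUE key on url (so INSERT IGNORE skips links already saved), and items carrying hypothetical title and url fields; none of these names come from the flhhkkSpider project.

```python
class MySQLPipeline:
    """Hypothetical Scrapy pipeline that saves download links into MySQL."""

    # Assumes a UNIQUE index on `url`, so duplicates are silently skipped.
    INSERT_SQL = "INSERT IGNORE INTO download_links (title, url) VALUES (%s, %s)"

    def __init__(self, connection=None, **connect_kwargs):
        self.conn = connection  # may be injected, e.g. for testing
        self.connect_kwargs = connect_kwargs  # host, user, password, database...

    def open_spider(self, spider):
        if self.conn is None:
            import pymysql  # lazy import; assumes PyMySQL is installed

            self.conn = pymysql.connect(**self.connect_kwargs)

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        # Parameterized query: the driver escapes values, avoiding SQL injection.
        with self.conn.cursor() as cursor:
            cursor.execute(self.INSERT_SQL, (item["title"], item["url"]))
        self.conn.commit()
        return item
```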

Running

Data

Blog post: TODO

If these spiders help you, please give the repository a star.

Please cite this repository if you use it to write a blog or post.

Copyright

ychen@fdu
