A news crawler for Life-Long Learning LM
Media Name | Website | Spider Name |
---|---|---|
自由時報 | news.ltn.com.tw | ltn |
中央社 | www.cna.com.tw | cna |
中國時報 | www.chinatimes.com | ct |
三立新聞 | www.setn.com | setn |
華視新聞 | news.cts.com.tw | cts |
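
To run a spider, use `scrapy crawl <spider_name>`, where `<spider_name>` is one of the spider names in the table above (for example, `scrapy crawl ltn`).
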
- If you don't want to save data to the database, remove `NewsCrawlerPGStoragePipeline` from setting.py.
- You can change the PostgreSQL settings with environment variables; see pipelines.py for details.
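
As a rough illustration of the environment-variable approach mentioned above, the sketch below shows one way a pipeline could collect its PostgreSQL connection settings. The variable names (`POSTGRES_HOST`, `POSTGRES_PORT`, and so on), the defaults, and the helper `postgres_settings` are all hypothetical and chosen only for this example; the names the project actually reads are the ones defined in pipelines.py.

```python
import os

# Hypothetical environment variable names and defaults, for illustration only.
# Check pipelines.py for the names this project really uses.
DEFAULTS = {
    "host": ("POSTGRES_HOST", "localhost"),
    "port": ("POSTGRES_PORT", "5432"),
    "dbname": ("POSTGRES_DB", "news"),
    "user": ("POSTGRES_USER", "postgres"),
    "password": ("POSTGRES_PASSWORD", ""),
}


def postgres_settings(environ=None):
    """Build a dict of connection settings, falling back to defaults."""
    env = os.environ if environ is None else environ
    settings = {
        key: env.get(var, default) for key, (var, default) in DEFAULTS.items()
    }
    # Environment values are strings, so convert the port to an integer.
    settings["port"] = int(settings["port"])
    return settings


if __name__ == "__main__":
    # Print the resolved settings without the password.
    resolved = postgres_settings()
    print({k: v for k, v in resolved.items() if k != "password"})
```

With this pattern, a value exported in the shell before running `scrapy crawl <spider_name>` would override the matching default, and unset variables would fall back to the defaults.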