Nepali News crawler

Installation

To use this crawler, install Scrapy and clone this repository. The last command below runs the hamrakura spider as a quick check that everything works.


$ pip3 install scrapy
$ git clone https://github.com/vksbhandary/nepali-news-crawler.git
$ cd nepali-news-crawler
$ scrapy crawl news_hamrakura -o hamrakura.csv -t csv

Supported Sites

hamrakura
kantipurdaily
onlinekhabar
pahilopost
WordPress websites ¹
- nepalitribune
- news24nepal
- nepalitimes
- Any other WordPress website that has the REST API enabled ²

Executing crawler

Executing hamrakura crawler


$ scrapy crawl news_hamrakura -o hamrakura.csv -t csv

Executing kantipurdaily crawler


$ scrapy crawl kanti_news -o kantipur.csv -t csv

Executing onlinekhabar crawler


$ scrapy crawl news_onlinekhabar -o onlinekhabar.csv -t csv

Executing pahilopost crawler


$ scrapy crawl news_pahilo -o file.csv -t csv

Executing wordpress crawler


$ scrapy crawl wordpress_news -o news24nepal.csv -t csv
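The commands above can also be run in one go. The following is a minimal sketch, not part of this repository: the helper names `build_command` and `run_all` are hypothetical, and the spider names and output files are simply the ones listed above. It assumes `scrapy` is on your PATH and that it is started from the project root.

```python
import subprocess

# Spider name -> output file, copied from the commands in this README.
SPIDERS = {
    "news_hamrakura": "hamrakura.csv",
    "kanti_news": "kantipur.csv",
    "news_onlinekhabar": "onlinekhabar.csv",
    "news_pahilo": "file.csv",
    "wordpress_news": "news24nepal.csv",
}


def build_command(spider, outfile):
    """Return the argument list for one `scrapy crawl` run (hypothetical helper)."""
    return ["scrapy", "crawl", spider, "-o", outfile, "-t", "csv"]


def run_all(spiders=SPIDERS):
    """Run every spider one after another; stop at the first failure."""
    for spider, outfile in spiders.items():
        subprocess.run(build_command(spider, outfile), check=True)
```

Calling `run_all()` from the project root should produce one CSV file per spider. Note that `-o` appends to an existing file, so delete old output first if you want a fresh export.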

¹ To use the WordPress crawler with your own website, follow these steps:

Open the file spiders/wordpress.py
Edit line 14 to set your domain
Open your terminal and run $ scrapy crawl wordpress_news -o news24nepal.csv -t csv

² This crawler uses WordPress's REST API to fetch posts, so the website must have the REST API enabled for the crawler to work. To check whether a WordPress website is supported by this crawler:

Go to yourdomainname.com/wp-json/wp/v2/posts/
If you see a block of JSON data, the website is supported.
If you see a 404 error page or a forbidden error page, the website is not supported.
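The same check can be scripted. This is a minimal sketch using only the Python standard library. It is not part of this repository, the function names are hypothetical, and it assumes the site is served over HTTPS and that a supported site answers with HTTP 200 and a JSON list of posts.

```python
import json
import urllib.request


def posts_url(domain):
    """Build the posts endpoint URL for a domain (assumes HTTPS)."""
    return f"https://{domain}/wp-json/wp/v2/posts/"


def looks_supported(status, body):
    """Return True if the response is HTTP 200 and the body is a JSON list."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:  # body is not JSON, e.g. an HTML error page
        return False
    return isinstance(data, list)


def check_site(domain, timeout=10):
    """Fetch the posts endpoint and report whether the site looks supported."""
    try:
        with urllib.request.urlopen(posts_url(domain), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return looks_supported(resp.status, body)
    except OSError:  # covers URLError, HTTPError (404, 403) and timeouts
        return False
```

For example, `check_site("yourdomainname.com")` returns `True` only when the endpoint responds with a JSON list. A 404 or forbidden response returns `False`.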
