
tahmim-16/BlogCrawler

BlogCrawler

Scraping the Amrabondhu blog

Here I scrape blog posts from the Amrabondhu website using Scrapy.

Each scraped article includes:

  1. ID (a sequential number for the scraped articles)
  2. Title
  3. Author
  4. Text, i.e. the full article body
  5. URL of the page
  6. Published date of the article
  7. Access time (when the page was scraped)
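The fields above map naturally onto a single item type. The following is a minimal sketch, assuming Python dataclass items (Scrapy has accepted dataclass instances as items since version 2.2, so this also works with the Scrapy 2.4.1 listed under Requirements). The class name `BlogPost`, the helper `make_post`, and the field names are hypothetical and not taken from the repository's code.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from itertools import count


# Hypothetical item type: one record per scraped article.
# Scrapy (2.2+) accepts dataclass instances as items, so a spider
# callback could yield these directly.
@dataclass
class BlogPost:
    id: int              # sequential number, only used to number the records
    title: str
    author: str
    text: str            # the full article body
    url: str
    published_date: str  # kept as the raw string shown on the page
    accessed_time: str   # ISO 8601 timestamp (UTC) of when the page was scraped


# Shared counter that hands out 1, 2, 3, ... as articles are scraped.
_next_id = count(1)


def make_post(title: str, author: str, text: str, url: str,
              published_date: str) -> BlogPost:
    """Build a BlogPost, assigning the next ID and stamping the access time."""
    return BlogPost(
        id=next(_next_id),
        title=title.strip(),
        author=author.strip(),
        text=text.strip(),
        url=url,
        published_date=published_date.strip(),
        accessed_time=datetime.now(timezone.utc).isoformat(timespec="seconds"),
    )


if __name__ == "__main__":
    post = make_post(" Hello ", "someone", "Body text",
                     "https://example.com/post/1", "2021-01-01")
    print(asdict(post))
```

Because the counter increments in the order responses are processed, the IDs reflect scraping order rather than the order of posts on the site, which matches the README's description of the ID as just a way to number the contents.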

The spider also saves every crawled HTML page and follows the "next page" links so that all subsequent pages are parsed as well.
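Saving pages and following pagination can be sketched with the standard library alone. This is a minimal sketch, not the repository's spider: `page_filename`, `save_page`, and `find_next_page` are hypothetical helpers, and the rule that the next-page link carries `rel="next"` or a class containing `next` is an assumption about Amrabondhu's markup.

```python
import hashlib
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse


def page_filename(url: str) -> str:
    """Derive a stable, filesystem-safe file name for a crawled URL."""
    path = urlparse(url).path.strip("/")
    # Keep only letters, digits, '-' and '_' from the path; cap the length.
    slug = "".join(c if c.isalnum() or c in "-_" else "_" for c in path)[:80] or "index"
    # Hash the full URL so pages differing only in their query never collide.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return f"{slug}_{digest}.html"


def save_page(url: str, body: bytes, out_dir: str = "pages") -> Path:
    """Write the raw HTML bytes of a page to out_dir and return the file path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / page_filename(url)
    path.write_bytes(body)
    return path


class _NextLinkFinder(HTMLParser):
    """Record the href of the first <a> that looks like a "next page" link.

    Assumption: the pager marks it with rel="next" or a class containing
    "next"; the real Amrabondhu markup may differ.
    """

    def __init__(self) -> None:
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").split()
        classes = (attrs.get("class") or "").split()
        if "next" in rel or any("next" in c for c in classes):
            self.href = attrs.get("href")


def find_next_page(html: str, base_url: str):
    """Return the absolute URL of the next page, or None on the last page."""
    finder = _NextLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


if __name__ == "__main__":
    html = '<a href="/page/1">1</a><a class="pager-next" href="/page/2">Next</a>'
    with tempfile.TemporaryDirectory() as out:
        print(save_page("https://example.com/page/1", html.encode(), out))
    print(find_next_page(html, "https://example.com/page/1"))
```

Inside a Scrapy callback these would be called as `save_page(response.url, response.body)` and `find_next_page(response.text, response.url)`, yielding a new `scrapy.Request` for the returned URL until it is `None`. In a real spider, Scrapy's own selectors and `response.follow` would normally handle the link extraction; the standard-library parser here only keeps the sketch runnable without Scrapy installed.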

Requirements:

  1. PyCharm IDE
  2. Python 3.10
  3. Scrapy 2.4.1

These are simply the tools and versions I used to write and run the script.