This project consists of crawlers built with different Python web-scraping libraries (beautifulsoup4, scrapy and selenium) that extract data from gsmarena and its Bangladeshi variant website gsmarena-bd and store the data in a MongoDB database.
Website | Crawler |
---|---|
gsmarena | gsmareana-selenium |
gsmarena-bd | gsmareanabd-beautifulsoup4, gsmareanabd-scrapy |
Python, MongoDB database
Python - 3.6.8 (64 bit)
MongoDB - 4.4.8
- Download source code
- Clone the repository
git clone https://github.com/tanjimanasreen/gsmarena-crawler.git
This crawler comes with an end-to-end pipeline that scrapes all the phone specifications available on gsmarena.com.bd and stores them in a MongoDB database.
Set the database configuration variables in the Scrapy settings.py file, then open the Scrapy-project folder and run the spider with the scrapy crawl command.
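As a rough illustration of how a Scrapy item pipeline can read database settings from settings.py and write items to MongoDB, here is a minimal sketch. The class name `MongoPipeline`, the setting names `MONGO_URI`, `MONGO_DATABASE` and `MONGO_COLLECTION`, and the default values are assumptions made for this example; they may not match the names used in this repository.

```python
class MongoPipeline:
    """Hypothetical item pipeline that stores each scraped item in MongoDB."""

    def __init__(self, mongo_uri, mongo_db, collection_name):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook; crawler.settings exposes values from settings.py.
        # The setting names and defaults below are assumed, not taken from the repo.
        settings = crawler.settings
        return cls(
            mongo_uri=settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=settings.get("MONGO_DATABASE", "gsmarena"),
            collection_name=settings.get("MONGO_COLLECTION", "phones"),
        )

    def open_spider(self, spider):
        # Third-party driver, imported here because it is only needed once a crawl starts.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Store a plain-dict copy of the item and pass the item on unchanged.
        self.collection.insert_one(dict(item))
        return item
```

A pipeline like this would typically be enabled through the `ITEM_PIPELINES` setting in settings.py, next to the assumed `MONGO_*` values.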
Built With:
- Scrapy Framework - 2.5.0
- Pymongo - 3.12.0
This parser extracts all the phone specifications available on gsmarena.com.bd using Python's beautifulsoup4 package and stores them in a JSON file.
Download the notebook available here and run it on your local PC with Jupyter Notebook, or on Google Colab.
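To give a feel for the parse-then-save flow, the following is a small sketch and not the notebook's actual code. The HTML snippet, the `table.specs` selector, and the function names `parse_specs` and `save_specs` are made up for illustration; the real markup on gsmarena.com.bd may be structured differently.

```python
import json

from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real pages may differ.
SAMPLE_HTML = """
<table class="specs">
  <tr><th>Display</th><td>6.5 inches</td></tr>
  <tr><th>Battery</th><td>5000 mAh</td></tr>
</table>
"""


def parse_specs(html):
    """Return a {spec name: value} dict from a two-column spec table."""
    soup = BeautifulSoup(html, "html.parser")
    specs = {}
    for row in soup.select("table.specs tr"):
        name, value = row.find("th"), row.find("td")
        # Skip rows that do not have both a label cell and a value cell.
        if name is not None and value is not None:
            specs[name.get_text(strip=True)] = value.get_text(strip=True)
    return specs


def save_specs(specs, path):
    """Write the parsed specifications to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(specs, fh, ensure_ascii=False, indent=2)
```

Calling `save_specs(parse_specs(SAMPLE_HTML), "phone.json")` would write the two sample fields to a file named `phone.json`, a file name chosen here only as an example.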
Built With:
- BeautifulSoup4 - 4.6.3
This crawler uses the Selenium package for Python to scrape all the phone specifications available on gsmarena.com and stores them in a MongoDB database.
Open the gsmarena-com-crawler folder and run the gsmarena_parser.py file on your PC. The environment variables are listed in the .env.example file; set the database configuration variables before running.
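One way a script could pick up database settings from an environment file is sketched below. The variable names `MONGO_URI`, `MONGO_DATABASE` and `MONGO_COLLECTION`, the default values, and both helper functions are assumptions for this example; the actual names are the ones listed in .env.example.

```python
import os


def read_env_file(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_db_config(env=None):
    """Build a database config dict from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Variable names and fallback values are hypothetical.
    return {
        "uri": env.get("MONGO_URI", "mongodb://localhost:27017"),
        "database": env.get("MONGO_DATABASE", "gsmarena"),
        "collection": env.get("MONGO_COLLECTION", "phones"),
    }
```

With a filled-in copy of the example file saved as `.env`, `load_db_config(read_env_file(".env"))` would return the connection details to hand to the MongoDB client.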
Built With: