Mechanical Scraper - Easy Web Scraping
Automates the simple, repetitive tasks involved in developing a web scraping program.
Built on requests and bs4.BeautifulSoup, it can reduce your development time.
Copy and paste an HTTP message captured with a tool such as Fiddler, and the request code is generated automatically.
Code appropriate to the request's Content-Type is applied automatically.
The session is updated automatically.
The HTTP response can be opened automatically in a browser, so you can easily copy CSS selectors with the browser's developer tools.
Hidden data is found automatically.
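The README does not document how the request code is generated from a pasted HTTP message or how the Content-Type is handled. Purely as an illustration of the idea, the sketch below parses a raw request (as copied from Fiddler) into keyword arguments for `requests.request()` and picks `json=` or `data=` based on the media type. `parse_raw_request` and `_header` are hypothetical names, not part of mechanical_scraper's API, and only JSON and URL-encoded form bodies get special treatment.

```python
import json
from urllib.parse import parse_qsl


def _header(headers: dict, name: str, default: str = "") -> str:
    # HTTP header names are case-insensitive, so compare them in lower case.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return default


def parse_raw_request(raw: str, scheme: str = "https") -> dict:
    """Turn a raw HTTP request into keyword arguments for requests.request().

    Hypothetical helper for illustration; not part of mechanical_scraper.
    """
    # Accept both CRLF and LF line endings; headers and body are split by a blank line.
    text = raw.replace("\r\n", "\n").strip("\n")
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")

    # Request line, e.g. "POST /path?x=1 HTTP/1.1".
    method, target, _version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    # Drop Content-Length so the HTTP client recomputes it for the re-encoded body.
    headers = {k: v for k, v in headers.items() if k.lower() != "content-length"}

    # The request line may hold an absolute URL (proxy form) or only a path.
    if target.startswith(("http://", "https://")):
        url = target
    else:
        url = f"{scheme}://{_header(headers, 'Host')}{target}"

    kwargs = {"method": method, "url": url, "headers": headers}
    if body:
        # Choose how to pass the body from the media type in Content-Type.
        media_type = _header(headers, "Content-Type").split(";")[0].strip().lower()
        if media_type == "application/json":
            kwargs["json"] = json.loads(body)
        elif media_type == "application/x-www-form-urlencoded":
            # Duplicate form keys keep only their last value in this sketch.
            kwargs["data"] = dict(parse_qsl(body, keep_blank_values=True))
        else:
            kwargs["data"] = body
    return kwargs
```

The result could then be sent with `requests.request(**parse_raw_request(raw))`; the library itself may generate code rather than call requests directly.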
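How the session is updated is not described here. With requests, a `requests.Session` object already keeps cookies set by earlier responses and sends them with later requests. Assuming "updating the session" mainly means carrying cookies forward, the standard-library sketch below shows the idea by hand; `update_cookies` and `cookie_header` are hypothetical helpers, and they ignore cookie attributes such as Path, Domain and Expires.

```python
from http.cookies import SimpleCookie


def update_cookies(jar: dict, set_cookie_headers: list) -> dict:
    """Merge Set-Cookie header values into a plain name -> value store.

    Hypothetical helper; cookie attributes (Path, Domain, Expires, ...) are ignored.
    """
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            # A newer value for an existing cookie replaces the old one.
            jar[name] = morsel.value
    return jar


def cookie_header(jar: dict) -> str:
    # Serialise the store as the value of a Cookie request header.
    return "; ".join(f"{name}={value}" for name, value in jar.items())
```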
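The README does not define "hidden data". A common case in scraping is `<input type="hidden">` form fields, such as CSRF tokens, that must be sent back with the next request. Assuming that is what is meant, the sketch below collects them with the standard `html.parser` module; `find_hidden_fields` is a hypothetical helper, not the library's implementation. With BeautifulSoup, `bs.select('input[type=hidden]')` would be a similar approach.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; self-closing <input ... />
        # also reaches this method via the default handle_startendtag.
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def find_hidden_fields(html: str) -> dict:
    # Hypothetical helper: feed the whole document and return the collected fields.
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```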
Example: print the title, price and link of each entry in the popular section of the Naver Finance front page.
from bs4 import BeautifulSoup
from mechanical_scraper.mechanical_scraper import MechanicalScraper

ms = MechanicalScraper()
ms.set_base_url('https://finance.naver.com/')
url = ms.base_url
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36 Edg/111.0.1661.54'
}
# Passing True as the second argument opens the HTTP response in a browser,
# so you can easily copy CSS selectors with the developer tools.
response = ms.get(url, True, headers=headers)
response.raise_for_status()
bs = BeautifulSoup(response.text, 'html.parser')
elements = bs.select('#container > div.aside > div > div.aside_area.aside_popular > table > tbody > tr')
for el in elements:
    anchor = el.select_one('th > a')
    title = anchor.text.strip()
    # Remove thousands separators from the price.
    price = el.select_one('td:nth-child(2)').text.replace(',', '').strip()
    # Convert the relative href into an absolute URL.
    link = ms.get_full_url(anchor['href'])
    print(title, price, link)
    print()
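In the example, `ms.get_full_url()` appears to turn a relative `href` into an absolute URL using the base URL set with `set_base_url()`. Its implementation is not shown; assuming that behaviour, the standard library's `urllib.parse.urljoin` does the same job. The `get_full_url` below is a hypothetical stand-alone version, and the item path used in the test is made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://finance.naver.com/"


def get_full_url(href: str, base_url: str = BASE_URL) -> str:
    # Hypothetical stand-in: resolve a relative href against the base URL.
    # Absolute hrefs are returned unchanged by urljoin.
    return urljoin(base_url, href)
```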
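Passing `True` as the second argument of `ms.get()` opens the response in a browser. One way such a preview could work, assuming the response body is HTML text, is to write it to a temporary file and open that file with the standard `webbrowser` module. `preview_html` is a hypothetical helper, not the library's implementation.

```python
import tempfile
import webbrowser
from pathlib import Path


def preview_html(html: str, open_browser: bool = True) -> Path:
    """Save an HTML response body to a temporary file and optionally open it.

    Hypothetical helper for illustration; not part of mechanical_scraper.
    """
    # delete=False keeps the file on disk after closing so the browser can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".html", encoding="utf-8", delete=False) as f:
        f.write(html)
        path = Path(f.name)
    if open_browser:
        # webbrowser needs a URL, so convert the absolute path to a file:// URI.
        webbrowser.open(path.resolve().as_uri())
    return path
```

Because the page is loaded from a local file, relative stylesheet and image links will not resolve unless a `<base href="...">` tag pointing at the original URL is added to the HTML.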
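Installation: install directly from GitHub with pip.
pip install git+https://github.com/wsder31/mechanical_scraper.git
To reinstall over an existing installation (for example, to pick up the latest version):
pip install --force-reinstall git+https://github.com/wsder31/mechanical_scraper.git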