
A Python tool for effortlessly converting HTML source code to CSV format with Selenium Webdriver.


wickenico/py-html-to-csv-converter


Py-Html-to-csv-converter

This repository contains a Python script that scrapes web pages with Selenium and converts the HTML to CSV. The script is designed to be adaptable to various scraping tasks on websites with dynamic content.

Script Structure

  • scraper.py: The main Python script containing the web scraping functionality.
  • main.py: Entry point that calls the scraper and passes its parameters. Run it with:
python3 main.py
  • requirements.txt: List of Python libraries required for the script.

Getting Started

Prerequisites

  • Python installed
  • Selenium library installed (pip install selenium)
  • Webdriver (e.g., ChromeDriver) installed and its path set in the script

Install

  1. Clone this repository:
    git clone https://github.com/wickenico/py-html-to-csv-converter.git
    
  2. Install the requirements:
    pip install -r requirements.txt
    
  3. Download the ChromeDriver that matches your Chrome version and put it in your PATH.
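To confirm that step 3 worked, a short check like the one below can help. It is only a sketch and not part of this repository: the helper name `find_webdriver` is hypothetical, and it assumes the driver executable is called `chromedriver`.

```python
import shutil


def find_webdriver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment variable.
    return shutil.which(name)


path = find_webdriver()
print(path or "chromedriver not found on PATH")
```

If the message reports that the driver was not found, add the folder containing the driver to your PATH, or set the driver path directly in the script.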

Usage

Script setup

Open scraper.py in your preferred text editor and update the following:

  • Web Driver: Set the path to your preferred web driver (e.g., ChromeDriver) in the script.
  • CSS Selectors: Customize the CSS selectors in the script to match the structure of the target website. Adjust the selectors used for button clicks, content extraction, and link identification.
  • Output Filename: Optionally, change the output filename in the navigate_and_go_back function if needed.
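As a rough illustration of the three settings above, the sketch below shows how a driver path, a CSS selector, and an output filename might fit together. It is not the code in scraper.py: the constants `DRIVER_PATH`, `ITEM_SELECTOR`, and `OUTPUT_FILENAME` and the functions `scrape_texts` and `write_rows` are hypothetical names, the selector is a placeholder, and the driver setup assumes Selenium 4.

```python
import csv

# Hypothetical settings; the actual names inside scraper.py may differ.
DRIVER_PATH = "/path/to/chromedriver"  # assumption: adjust to your system
ITEM_SELECTOR = "div.item"             # assumption: placeholder CSS selector
OUTPUT_FILENAME = "output.csv"         # default name mentioned in this README


def scrape_texts(url, driver_path=DRIVER_PATH, selector=ITEM_SELECTOR):
    """Open url in Chrome and return the text of every element matching selector."""
    # Imported here so that write_rows below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        return [element.text for element in elements]
    finally:
        # Always close the browser, even if the page load or lookup fails.
        driver.quit()


def write_rows(rows, filename=OUTPUT_FILENAME):
    """Write an iterable of rows (each a list of strings) to a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(filename, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

With these helpers, something like `write_rows([[text] for text in scrape_texts("https://example.com")])` would write one scraped text per row. The URL here is only a placeholder.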

Output

  • The scraped data will be stored in a CSV file named output.csv. Open this file using a spreadsheet application like Excel or Google Sheets for further analysis.
  • If you encounter any issues or have suggestions for improvement, please create a Pull Request. Your feedback is valuable!
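Besides a spreadsheet application, output.csv can also be inspected from Python. The snippet below is a minimal sketch using only the standard library. The function name `load_rows` is hypothetical, and it assumes the file is UTF-8 encoded.

```python
import csv


def load_rows(filename="output.csv"):
    """Read a CSV file and return its rows as a list of lists of strings."""
    with open(filename, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

After a scraping run, `load_rows()` returns every row of output.csv, so `load_rows()[:5]` gives a quick preview of the first five rows.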

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.

LICENSE

License: MIT
