WikiWebCrawler is a Python-based web crawler designed for extracting data from Wikipedia. Starting from a user-defined entry point, it navigates through Wikipedia pages to gather page titles, introductory sections, and 'See Also' links.
- Targeted Crawling: Starts from a specific Wikipedia article and explores related pages.
- Data Extraction: Retrieves titles, introductions, and 'See Also' links (see the sketch after this list).
- Error Handling: Robust error management for stable performance.
- User-Friendly Output: Clean and readable data presentation.
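To give a sense of how such extraction can work with only the standard library, here is a minimal sketch. The function name fetch_page_summary, the User-Agent string, and the regular expressions are illustrative assumptions, not the project's actual code:

```python
import re
import time
import urllib.request
from urllib.error import HTTPError, URLError

def fetch_page_summary(url):
    """Fetch a Wikipedia article and extract its title and first paragraph.

    Hypothetical helper for illustration only; the real script's
    function names and patterns may differ.
    """
    # Identify the crawler; Wikipedia may reject the default urllib agent.
    request = urllib.request.Request(
        url, headers={"User-Agent": "WikiWebCrawler-example/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
    except (HTTPError, URLError) as err:
        # Error handling: report the failure and let the caller move on.
        print(f"Failed to fetch {url}: {err}")
        return None

    # The article name appears in the <title> element.
    title_match = re.search(r"<title>(.*?)</title>", html, re.DOTALL)
    title = title_match.group(1).strip() if title_match else "(no title)"

    # The first attribute-free <p> block approximates the introduction.
    # Regex-based HTML parsing is brittle, but it matches the project's
    # stated dependencies (urllib and re only).
    para_match = re.search(r"<p>(.*?)</p>", html, re.DOTALL)
    intro = re.sub(r"<[^>]+>", "", para_match.group(1)).strip() if para_match else ""

    time.sleep(1)  # be polite: pause between requests to Wikipedia
    return title, intro

if __name__ == "__main__":
    summary = fetch_page_summary("https://en.wikipedia.org/wiki/Web_crawler")
    if summary:
        title, intro = summary
        print(title)
        print(intro[:200])
```

Extracting the 'See Also' links would follow the same pattern: locate the section in the fetched HTML and pull out its anchor targets.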
- Clone the repository:
  git clone https://github.com/[YourUsername]/WikiWebCrawler.git
- Install the necessary dependencies (if any).
- Run the crawler:
  python Wikipedia_Crawler_v3.py [Your Starting URL]
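
For example, assuming the starting URL is passed as the sole argument (the script's exact interface may differ), a run could look like:

  python Wikipedia_Crawler_v3.py https://en.wikipedia.org/wiki/Web_crawler

The crawler would then print the title, introduction, and 'See Also' links for that article and the related pages it visits.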
- Python 3.x
- Additional modules: urllib, re, time (all part of the Python standard library, so no separate installation is needed)
Feel free to fork the project and submit pull requests with enhancements.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Inspired by the vast information available on Wikipedia and the potential for automated data extraction.
- Thanks to contributors and the open-source community for continuous support and inspiration.