WikiExplorerAI

Overview

WikiExplorerAI is a Python-based web crawler designed specifically for extracting data from Wikipedia. Starting from a user-defined entry point, it navigates through Wikipedia pages and gathers information such as page titles, introductory sections, and 'See Also' links, making it useful for data mining, research, and AI content aggregation.
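
To illustrate the kind of extraction involved, here is a minimal sketch built only on the standard-library modules listed under Requirements (urllib, re, time). The function names and regular expressions are illustrative assumptions, not the actual API of Wikipedia_Crawler_v3.py:

    # Minimal sketch of the extraction step, using only standard-library
    # modules. Names and regexes are illustrative; the real script may differ.
    import re
    import urllib.request

    def fetch_page(url):
        # Wikipedia asks automated clients to send a descriptive User-Agent.
        req = urllib.request.Request(url, headers={"User-Agent": "WikiExplorerAI-demo"})
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def extract_title(html):
        # Wikipedia pages render as '<title>Article - Wikipedia</title>'.
        m = re.search(r"<title>(.*?) - Wikipedia</title>", html)
        return m.group(1) if m else None

    def extract_intro(html):
        # Take the first plain paragraph and strip the markup.
        m = re.search(r"<p>(.*?)</p>", html, re.S)
        return re.sub(r"<[^>]+>", "", m.group(1)) if m else None

    def extract_see_also(html):
        # Collect article links from the list under the 'See also' heading.
        section = re.search(r'id="See_also".*?<ul>(.*?)</ul>', html, re.S)
        if not section:
            return []
        return re.findall(r'href="(/wiki/[^"#]+)"', section.group(1))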

Features

  • Targeted Crawling: Starts from a specific Wikipedia article and explores related pages (a sketch of such a crawl loop follows this list).
  • Data Extraction: Retrieves titles, introductions, and 'See Also' links.
  • Error Handling: Errors on individual pages are handled so the crawl keeps running.
  • User-Friendly Output: Clean and readable data presentation.
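
A crawl loop for the first feature might look like the following, reusing the hypothetical fetch_page/extract_* helpers from the sketch under Overview. The breadth-first strategy, page limit, and one-second delay are assumptions, not necessarily what Wikipedia_Crawler_v3.py does:

    import time
    from collections import deque

    def crawl(start_url, max_pages=10):
        frontier = deque([start_url])   # pages waiting to be visited
        visited = set()                 # avoid re-fetching the same article
        while frontier and len(visited) < max_pages:
            url = frontier.popleft()
            if url in visited:
                continue
            visited.add(url)
            try:
                html = fetch_page(url)  # helper from the Overview sketch
            except Exception as exc:
                print(f"skipping {url}: {exc}")  # keep crawling on failure
                continue
            print(extract_title(html))
            for path in extract_see_also(html):
                frontier.append("https://en.wikipedia.org" + path)
            time.sleep(1)               # be polite between requests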

Usage

  1. Clone the repository: git clone https://github.com/[YourUsername]/WikiExplorerAI.git
  2. Install dependencies: none are required beyond the Python standard library (see Requirements below).
  3. Run the crawler: python Wikipedia_Crawler_v3.py [Your Starting URL] (an example invocation follows this list).
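
For example, to start the crawl from the English Wikipedia article on web crawlers (the URL is just an illustration; any article URL works as the entry point):

    python Wikipedia_Crawler_v3.py https://en.wikipedia.org/wiki/Web_crawler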

Requirements

  • Python 3.x
  • Standard-library modules: urllib, re, time (bundled with every Python 3 installation; no third-party packages needed)

Contributing

Feel free to fork the project and submit pull requests with enhancements.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • Inspired by the vast information available on Wikipedia and the potential for automated data extraction.
  • Thanks to contributors and the open-source community for continuous support and inspiration.
