Document Search Engine Tool
Updated Dec 8, 2022 (Python)
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
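A wrapper like this ultimately sends HTTP requests to the MediaWiki Action API. The sketch below uses only the standard library to make one such call. It assumes English Wikipedia's endpoint and the TextExtracts module (prop=extracts), which returns a page's plain text. The function names and the User-Agent string are hypothetical and are not taken from the repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia's public Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_query(title: str) -> str:
    """Build an Action API URL that requests the plain-text extract of one page.

    prop=extracts comes from the TextExtracts extension. explaintext=1 asks
    for plain text instead of HTML.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def parse_extract(payload: dict) -> str:
    """Pull the extract text out of a decoded JSON response.

    In the default response format, pages are keyed by page ID under query.pages.
    """
    pages = payload["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract", "")


def fetch_extract(title: str) -> str:
    """Perform the request (needs network access). Wikimedia asks clients to send a User-Agent."""
    req = Request(
        build_extract_query(title),
        headers={"User-Agent": "example-wiki-client/0.1 (hypothetical)"},
    )
    with urlopen(req, timeout=10) as resp:
        return parse_extract(json.load(resp))
```

The URL building and JSON parsing are kept separate from the network call, so they can be checked offline against a saved response.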
A crawler for Wikipedia (currently English pages only).
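Limiting a crawl to English pages largely comes down to filtering the links found on each page. The sketch below keeps only en.wikipedia.org article links of the form /wiki/Title. It assumes that a colon in the title marks a non-article namespace such as File: or Talk:. That is a heuristic, because a few article titles also contain colons. The helper names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urljoin, urlparse

ENGLISH_HOST = "en.wikipedia.org"


def is_english_article(url: str) -> bool:
    """Return True for English Wikipedia article URLs of the form /wiki/Title."""
    parts = urlparse(url)
    if parts.netloc != ENGLISH_HOST or not parts.path.startswith("/wiki/"):
        return False
    title = parts.path[len("/wiki/"):]
    # Heuristic: namespaced pages (File:, Talk:, Special:, ...) have a colon.
    return bool(title) and ":" not in title


def filter_links(base_url: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve hrefs against the current page and keep new article links.

    Fragments are dropped, and self-links and duplicates are skipped.
    """
    current = base_url.partition("#")[0]
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href).partition("#")[0]
        if absolute == current or absolute in seen:
            continue
        if is_english_article(absolute):
            seen.add(absolute)
            result.append(absolute)
    return result
```

For example, on a page linking to "/wiki/Guido_van_Rossum", "/wiki/File:Logo.png" and a German-language page, only the first link survives the filter.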
Python web crawler to test the theory that, for about 97% of Wikipedia pages, repeatedly clicking the first link eventually leads to the Wikipedia page on knowledge 📡
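The core of such an experiment is a walk that follows first links until it reaches the target, hits a page with no links, or revisits a page. The sketch below takes the first-link lookup as a function argument. It leaves out how that link is chosen from the page HTML (commonly stated rules skip links in parentheses and italics). It also computes the share of start pages that reach the target, which is the kind of figure the ~97% claim refers to. The function names and the toy link graph in the usage note are hypothetical, not real Wikipedia data.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def follow_first_links(
    start: str,
    first_link: Callable[[str], Optional[str]],
    target: str,
    max_steps: int = 100,
) -> Tuple[str, List[str]]:
    """Follow first links from start and report how the walk ended.

    Returns (outcome, path). The outcome is "reached", "loop", "dead_end",
    or "too_long".
    """
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_steps):
        if current == target:
            return "reached", path
        nxt = first_link(current)
        if nxt is None:  # page has no usable link
            return "dead_end", path
        path.append(nxt)
        if nxt in seen:  # revisited a page: the walk cycles forever
            return "loop", path
        seen.add(nxt)
        current = nxt
    return ("reached" if current == target else "too_long"), path


def share_reaching(
    starts: Iterable[str],
    first_link: Callable[[str], Optional[str]],
    target: str,
) -> float:
    """Fraction of start pages whose first-link walk reaches the target."""
    starts = list(starts)
    hits = sum(follow_first_links(s, first_link, target)[0] == "reached" for s in starts)
    return hits / len(starts) if starts else 0.0
```

With a toy graph such as {"A": "B", "B": "C", "C": "Knowledge", "X": "Y", "Y": "X"} and dict.get as the lookup, a walk from "A" reaches "Knowledge". A walk from "X" loops, and one from an unknown page is a dead end.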
Wikipedia web crawler written in Python with Scrapy. Its ETL process extracts specific data from multiple Wikipedia pages with Scrapy, organizes it into a structured format using Scrapy items, and saves the result as JSON for further analysis and for import into MySQL Workbench.
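The extract, structure and save steps can be sketched without Scrapy. The example below swaps in the standard library's html.parser for Scrapy's selectors and a dataclass for a Scrapy item. It then writes one JSON object per line. ArticleItem, ArticleParser and the chosen fields (page title and h2 headings) are hypothetical and are not the repository's schema.

```python
import json
from dataclasses import asdict, dataclass, field
from html.parser import HTMLParser
from typing import Iterable, List, Optional


@dataclass
class ArticleItem:
    """Hypothetical record, playing the role a Scrapy item would play."""
    url: str
    title: str
    headings: List[str] = field(default_factory=list)


class ArticleParser(HTMLParser):
    """Collect the <title> text and every <h2> heading from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: List[str] = []
        self._current: Optional[str] = None  # tag whose text is being captured
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h2"):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append(text)
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. a <span> within an <h2>) is kept too.
        if self._current is not None:
            self._buffer.append(data)


def extract(url: str, html: str) -> ArticleItem:
    """Extract step: parse one page into a structured item."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return ArticleItem(url=url, title=parser.title, headings=parser.headings)


def to_json_lines(items: Iterable[ArticleItem]) -> str:
    """Load step: serialize items as JSON Lines for later analysis or import."""
    return "\n".join(json.dumps(asdict(item), ensure_ascii=False) for item in items)
```

Keeping one flat object per line makes the output easy to load into a database table later. Mapping the fields onto MySQL columns is outside this sketch.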