This project is a powerful, stealth-enabled web scraper built with Python and Selenium that automates extracting repository details from GitHub collections.
- 🔍 Automated Web Scraping – Extracts data like repo names, stars, forks, and languages.
- 🧠 Stealth Mode Enabled – Avoids bot detection via ChromeDriver stealth configuration (see the setup sketch after this list).
- 🌐 Handles JavaScript-Heavy Pages – Scrapes dynamic content by rendering the full page.
- 📊 Data Analytics – Visualizes data with charts and graphs via Streamlit.
- 💾 Export Options – Save scraped results directly as CSV.
- 📁 Download-Free Setup – Uses `webdriver-manager`, so there's no need to manually install ChromeDriver.
- 📈 Market & Trend Analysis
- 🧑‍💻 GitHub-based Research & Repo Discovery
- 🏢 Competitor & Project Intelligence
- 🤖 Dataset creation for AI/ML Models
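Most of the stealth and headless behavior above comes down to a handful of Chrome options. Below is a minimal sketch, assuming Selenium 4 and flags commonly used for this purpose; the exact configuration in `scrape.py` may differ.

```python
# Minimal sketch of the browser setup (assumed flags; scrape.py may use others).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--headless=new")  # render JavaScript-heavy pages without a visible window
options.add_argument("--disable-blink-features=AutomationControlled")  # hide the automation hint
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)

# webdriver-manager downloads a matching ChromeDriver automatically
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://github.com/collections/machine-learning")  # example collection URL
```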
```bash
git clone https://github.com/yokodrea/scraper-project.git
cd scraper-project
pip install -r requirements.txt
```
✅ No need to download ChromeDriver manually — it's handled by `webdriver-manager`.
```bash
streamlit run scrape.py
```
You can customize the GitHub collection URL inside `scrape.py`.
The scraped data will be saved as `project_list.csv`.
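Under the hood, the export step is essentially a pandas DataFrame written to CSV. Here is a hedged sketch that reuses the `driver` from the setup example above; the CSS selectors are placeholders, not the ones actually used in `scrape.py`.

```python
# Illustrative scrape-and-export loop (placeholder selectors, assumed column names).
import pandas as pd
from selenium.webdriver.common.by import By

rows = []
for card in driver.find_elements(By.CSS_SELECTOR, "article"):  # hypothetical repo-card selector
    rows.append({
        "name": card.find_element(By.CSS_SELECTOR, "h1 a").text,  # hypothetical name selector
        "stars": card.find_element(By.CSS_SELECTOR, "a[href$='/stargazers']").text,  # hypothetical
    })

pd.DataFrame(rows).to_csv("project_list.csv", index=False)  # matches the output file above
```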
- Language: Python
- Core Libraries: `selenium` (browser automation), `pandas` (data handling), `streamlit` (UI), `webdriver-manager` (auto-manages ChromeDriver)
- Scraping Mode: Headless Chrome browser
- Export Format: CSV
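For the analytics layer, a minimal Streamlit view of the exported CSV could look like the sketch below; the column names are assumptions, and the actual UI lives in `scrape.py`.

```python
# Sketch of a Streamlit dashboard over project_list.csv (assumed columns: name, stars, language).
import pandas as pd
import streamlit as st

df = pd.read_csv("project_list.csv")

st.title("GitHub Collection Explorer")        # hypothetical page title
st.dataframe(df)                              # raw scraped table
st.bar_chart(df.set_index("name")["stars"])   # stars per repository
st.bar_chart(df["language"].value_counts())   # language distribution
```

Like the main app, a script such as this is launched with `streamlit run`.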
- ⏱️ Multi-threading for speed boost
- 📬 Real-time alerts on repo updates
- 🕒 Scheduler for auto-scraping (cron jobs)
- 🌐 Deploy as a hosted scraping service
📌 To be filled in once the final dependencies are set. 👉 See `requirements.txt` for more info.
This project is licensed under the MIT License.