Web crawling tool for NGO

This project is a web crawling tool designed for non-governmental organizations (NGOs). It aims to automate gathering relevant data from various online sources and to schedule data extractions, so that users stay up to date with the newest opportunities.

Features

Efficient Web Crawling: The tool uses Scrapy, a fast web crawling and scraping framework, and can crawl multiple websites concurrently.
Data Extraction Speed: The data extraction process is optimized for speed, and Redis Stack is used to manage the crawl workload.
Browser Automation: With Playwright, the tool can automate browser tasks, which is particularly useful for crawling dynamic websites.
Tailored Offers: The tool uses OpenAI to analyze user descriptions and tailor offers accordingly, giving each user a personalized set of results.
User Interface: A GUI built with Tkinter makes the tool usable by non-technical users.
Proxies and User Agents: Proxies and user agents are easy to integrate, which reduces the risk of IP blocking and helps keep crawls running without interruption.
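
As an illustration of the proxy and user agent rotation mentioned above, here is a minimal sketch written in the shape of a Scrapy downloader middleware. It is not taken from this repository: the class name `RotatingIdentityMiddleware`, the proxy URLs, and the user agent strings are all hypothetical. It relies on the Scrapy convention that a downloader middleware exposes `process_request` and that the proxy for a request is read from `request.meta["proxy"]`. Wiring it into a real project (a `from_crawler` factory and the `DOWNLOADER_MIDDLEWARES` setting) is left out, and a stand-in request object is used so the sketch runs without Scrapy installed.

```python
import itertools
import random
from types import SimpleNamespace


class RotatingIdentityMiddleware:
    """Hypothetical Scrapy-style downloader middleware (not from this repo).

    Assigns proxies round-robin and picks a random user agent per request.
    """

    def __init__(self, proxies, user_agents, seed=None):
        self._proxies = itertools.cycle(proxies)   # endless round-robin iterator
        self._user_agents = list(user_agents)
        self._rng = random.Random(seed)            # seedable for repeatable runs

    def process_request(self, request, spider=None):
        # Scrapy's built-in proxy handling reads the proxy from request.meta["proxy"].
        request.meta["proxy"] = next(self._proxies)
        request.headers["User-Agent"] = self._rng.choice(self._user_agents)
        return None  # None tells Scrapy to continue processing the request


# Demo with stand-in request objects; the values below are placeholders.
middleware = RotatingIdentityMiddleware(
    proxies=["http://proxy1.example:8080", "http://proxy2.example:8080"],
    user_agents=["ExampleAgent/1.0", "ExampleAgent/2.0"],
    seed=0,
)
demo_requests = [SimpleNamespace(meta={}, headers={}) for _ in range(3)]
for req in demo_requests:
    middleware.process_request(req)
```

Round-robin keeps the load spread evenly across proxies, while a random user agent avoids a fixed proxy/agent pairing; either policy could be swapped for the other depending on the target sites.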

Future Scope

Future enhancements to this tool may include:

Ability to Crawl More Complex and Dynamic Websites: To expand the scope of data collection.
Scheduling Extractions: Implementing cron jobs or using Scrapyd to schedule data extraction tasks.
Email Notifications: Sending email notifications to keep users updated with the latest information.
Improved Data Extraction Accuracy: To ensure the quality of the data collected.
Advanced Data Analysis Features: To provide more in-depth insights from the collected data.
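
The email notification idea above could be prototyped with the Python standard library alone. The sketch below only builds the message and does not send it; the function name `build_digest`, the addresses, and the `title`/`url` item fields are assumptions made for illustration and do not reflect the project's actual data model. Sending could later be handled with `smtplib.SMTP.send_message`, triggered by whichever scheduler (cron or Scrapyd) is chosen.

```python
from email.message import EmailMessage


def build_digest(recipient, opportunities, sender="crawler@example.org"):
    """Build (but do not send) a plain-text email listing new opportunities.

    `opportunities` is assumed to be a list of dicts with 'title' and 'url' keys.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"New opportunities: {len(opportunities)}"
    # One line per opportunity keeps the digest readable in any mail client.
    lines = [f"- {item['title']}: {item['url']}" for item in opportunities]
    msg.set_content("\n".join(lines))
    return msg


# Placeholder data for demonstration only.
digest = build_digest(
    "user@example.org",
    [{"title": "Grant A", "url": "https://example.org/a"}],
)
```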

Folders and files

Azure-ttk-theme
Crucial_Data_Files
NGO
NGO_ArticlesLinks/RedisDB_Filters/scaling-python-scrapy-redis
NGO_Final/NGO_ArticlesLinks/RedisDB_Filters
NGO_Pages
RedisDB_Filters/scaling-python-scrapy-redis
User_Link_Files.py
.gitignore
API_Request.py
Avatar.png
LICENSE
Master_version.py
MultpleTerminalsExec.py
README.md
Tokenization.py
Users_current_link.txt

tobi303x/Web-crawling-tool-for-NGO