# Streamlit Cloud Scraper

This Streamlit application automates taking screenshots of web pages, extracting contact information, and downloading the results. It uses Selenium WebDriver and offers optional proxy support to bypass geo-restrictions.

## Features
- Screenshot Capture: Automatically takes a screenshot of the specified web page.
- Contact Information Extraction: Extracts emails and phone numbers from the page's content.
- Text Content Extraction: Extracts all visible text from the web page.
- Proxy Support: Optional proxy configuration to bypass geo-blocking, supporting SOCKS4 and SOCKS5 proxies.
- Download Options: Allows users to download the screenshot and extracted text content.
- Version Information: Displays version information for Python, Streamlit, Selenium, Chromedriver, and Chromium.
- Logging: Captures and displays Selenium logs for debugging.
## Requirements

- Python 3.6+
- Streamlit
- Selenium
- BeautifulSoup (`beautifulsoup4`)
- Chromedriver (make sure `chromedriver` is installed and accessible)
## Installation

- **Clone the repository**

  ```bash
  git clone https://github.com/your-repo/streamlit-cloud-scraper.git
  cd streamlit-cloud-scraper
  ```

- **Create a virtual environment**

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- **Install dependencies**

  ```bash
  pip install -r requirements.txt
  ```

- **Run the application**

  ```bash
  streamlit run streamlit_app.py
  ```
## Usage

Enter the URL of the web page you want to scrape in the provided text input field.
- Enable Proxy: Toggle to enable proxy support.
- Select Proxy Type: Choose between SOCKS4 and SOCKS5.
- Refresh Proxy List: If proxies are enabled, click to refresh the list of available proxies.
- Select Country: Choose the country for your proxy, if applicable.
- Select Proxy: Choose a specific proxy from the available list.
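Under the hood, a SOCKS proxy is typically handed to Chrome as a `--proxy-server` command-line argument. A minimal sketch of that wiring (the helper name is hypothetical; the exact setup in `streamlit_app.py` may differ):

```python
# Sketch: build Chrome's --proxy-server argument for a SOCKS proxy.
# The helper name and example address are assumptions, not the app's actual code.

def proxy_argument(proxy_type: str, host: str, port: int) -> str:
    """Return the Chrome command-line flag that routes traffic through a SOCKS proxy."""
    if proxy_type not in ("socks4", "socks5"):
        raise ValueError("proxy_type must be 'socks4' or 'socks5'")
    return f"--proxy-server={proxy_type}://{host}:{port}"

# In the Selenium setup this would be added to the Chrome options, e.g.:
#   options = webdriver.ChromeOptions()
#   options.add_argument(proxy_argument("socks5", "203.0.113.7", 1080))
```

Because Chrome handles the proxy itself, no extra Python-side networking code is needed once the flag is set.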
Click the "Start Selenium run and take screenshot" button to start the scraping process. The application will:
- Navigate to the specified URL using Selenium.
- Take a screenshot of the webpage.
- Extract contact information (emails and phone numbers).
- Extract all visible text content from the webpage.
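The contact-extraction step above can be sketched with regular expressions over the page's visible text. The patterns below are illustrative; the app's actual regexes may differ:

```python
import re

# Hypothetical patterns -- broad enough for common formats, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_contacts(text: str) -> dict:
    """Return de-duplicated, sorted emails and phone numbers found in page text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted({p.strip() for p in PHONE_RE.findall(text)}),
    }
```

For example, `extract_contacts("Reach us at info@example.com or call +1 (555) 123-4567.")` finds one email and one phone number.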
## Output

- Screenshot: View the screenshot of the webpage and download it as a PNG file.
- Contact Information: View the extracted emails and phone numbers.
- Text Content: View the extracted text content and download it as a TXT file.
- Logs: View the Selenium logs to debug any issues.
## Project Structure

```
streamlit-cloud-scraper/
├── logs/                # Log files generated by Selenium
├── screenshots/         # Screenshots taken by Selenium
├── streamlit_app.py     # Main Streamlit application script
├── requirements.txt     # Python dependencies
└── README.md            # Documentation file
```
## Troubleshooting

- Chromedriver Issues: Ensure that `chromedriver` is installed and properly set up in your PATH. You can download it from here.
- Proxy Errors: Make sure the proxy settings are correct and that the proxy is functional.
- Permissions: Ensure the application has the necessary permissions to create directories and write files in the working directory.