viniciusmarson/docker-ecr-python-selenium

Docker ECR Python Selenium

This Docker image is built on the ECR Lambda Python image so that Selenium and its dependencies come preinstalled for web scraping.

Inside this image, Chrome and ChromeDriver are located at:

Chrome binary: /opt/chrome/chrome

ChromeDriver executable: /opt/chromedriver
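
Before starting a browser session, it can help to confirm that both files are actually present in the container. The following is a minimal sketch using only the standard library; the helper name missing_executables is hypothetical and not part of this image.

```python
import os

# Paths as stated in this README.
CHROME_BINARY = "/opt/chrome/chrome"
CHROMEDRIVER = "/opt/chromedriver"


def missing_executables(paths):
    """Return the paths that are not existing, executable files."""
    return [
        path for path in paths
        if not (os.path.isfile(path) and os.access(path, os.X_OK))
    ]


if __name__ == "__main__":
    missing = missing_executables([CHROME_BINARY, CHROMEDRIVER])
    if missing:
        print("Missing or not executable:", ", ".join(missing))
    else:
        print("Chrome and ChromeDriver found.")
```

Running this inside the container should report both files as found; anywhere else it will most likely list both paths as missing.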

When using the Selenium Python library, your code should set binary_location and pass the ChromeDriver executable path like this:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.binary_location = '/opt/chrome/chrome'
driver = webdriver.Chrome("/opt/chromedriver", options=options)
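
Selenium 4 deprecated passing the driver path directly to webdriver.Chrome, and later 4.x releases expect a Service object instead. Which Selenium version this image installs is not stated here, so the following is only a sketch that assumes Selenium 4 is available; the function name build_driver is hypothetical.

```python
# Paths as stated in this README.
CHROME_BINARY = "/opt/chrome/chrome"
CHROMEDRIVER = "/opt/chromedriver"


def build_driver():
    """Start Chrome through the Selenium 4 Service API (assumes Selenium 4)."""
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    # The Service object carries the ChromeDriver path in Selenium 4.
    service = Service(executable_path=CHROMEDRIVER)
    return webdriver.Chrome(service=service, options=options)
```

If the positional form shown above raises a TypeError in your environment, this variant is the likely replacement.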

Don't forget to pass arguments that limit Chrome's access to system resources, which improves performance and helps prevent errors. See the example below:

from tempfile import mkdtemp
from selenium import webdriver


def create_options():
    """Create Chrome options suited to the Lambda container"""
    options = webdriver.ChromeOptions()
    options.binary_location = '/opt/chrome/chrome'
    options.add_argument('--no-sandbox')  # Bypass OS security model
    options.add_argument("--disable-gpu") # applicable to windows os only
    options.add_argument('--headless')
    options.add_argument("--single-process")
    options.add_argument("--disable-infobars")
    options.add_argument("--start-maximized")
    options.add_argument("--disable-extensions")
    options.add_argument("--remote-debugging-port=9222")
    options.add_argument("--window-size=1280,1696")  # Chrome expects width,height
    options.add_argument("--ignore-certificate-errors")
    options.add_argument(f"--user-data-dir={mkdtemp()}")
    options.add_argument(f"--data-path={mkdtemp()}")
    options.add_argument(f"--disk-cache-dir={mkdtemp()}")
    options.add_argument("--disable-dev-shm-usage")  # overcome limited resource problems
    options.add_argument("--disable-dev-tools")
    options.add_argument("--no-zygote")
    return options


def scrape(url):
    """Scrape data from the page URL"""
    driver = webdriver.Chrome("/opt/chromedriver", options=create_options())
    driver.get(url)
    return driver
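
Because scrape returns the live driver, the caller is responsible for closing Chrome once the work is done. One way to make sure that happens is sketched below; page_title and its make_driver parameter are hypothetical names, while title and quit() are standard WebDriver members.

```python
def page_title(url, make_driver):
    """Open url with a driver from make_driver, return the title, then quit."""
    driver = make_driver(url)  # e.g. the scrape function above
    try:
        return driver.title
    finally:
        # Always shut Chrome down, even if reading the page fails.
        driver.quit()
```

With the functions above, this would be called as page_title("https://example.com", scrape).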

For a complete example of a web scraping project running in AWS Lambda with ECR, see the aws-serverless-webscraping repository.

About

Docker image to use Selenium with Python in AWS ECR
