You can access the accompanying presentation here.
On Windows, open a new Command Prompt and type python. If you land in the Python shell (the >>> prompt), you're good to go.
On Mac/Linux, open a new Terminal window and type python or python3.
To check that pip is available, type pip -V in your current shell.
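The same two checks can also be run from a script. The sketch below is only an illustration: the function names python_version and pip_version are made up for this example, and it assumes Python 3.7 or newer.

```python
import subprocess
import sys


def python_version():
    """Return the running interpreter's version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


def pip_version():
    """Return the output of `pip --version` for this interpreter, or None."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # pip is missing or broken for this interpreter
    return result.stdout.strip()


print("Python:", python_version())
print("pip:", pip_version() or "not found")
```

Running pip through sys.executable ties the check to the interpreter that runs the script, which helps when several Python versions are installed side by side.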
Generally speaking, it is a good idea to run each project in its own virtual environment. This way you can isolate your Python packages to specific scripts/projects and avoid clashing versions or compatibility issues.
Create a folder for your project and navigate to it using your terminal.
Type virtualenv --python python3.7 venv, then type source venv/bin/activate to activate your virtual environment. On Windows, the activation script is venv\Scripts\activate instead.
To install Selenium, type pip3 install selenium. To install Pandas, type pip3 install pandas. Note that pip3 should be changed to pip on Windows.
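If you are unsure whether the environment is actually active, a short script can tell you. This is a minimal sketch; the function name in_virtual_environment is an invented helper, not part of any library.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Virtual environment active:", in_virtual_environment())
print("Interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed on the second line should point inside your venv folder.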
Go to this link, and download the appropriate driver for your current Chrome version.
You can find your Chrome version by opening the Chrome menu (the three dots at the top right) and selecting Help -> About Google Chrome.
Next, place the downloaded driver in an easily accessible directory, like your current working dir (wink wink).
Test whether your Selenium installation was successful by creating a new file named test.py.
Add the import statement import selenium at the top, followed by a simple print statement, like print("Damn, this worked").
Now you can execute this file by opening a new Command Prompt at your current working dir, and calling python ./test.py.
If the installation failed, you will get an import error. Otherwise, the output of the print statement appears in your console.
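A slightly more talkative variant of this test reports on both packages instead of stopping at the first import error. It is a sketch rather than a replacement for test.py, and is_installed is a name made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Report on the two packages installed earlier in this guide.
for name in ("selenium", "pandas"):
    status = "installed" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If a package shows up as MISSING even though pip reported a successful install, the script is probably running under a different interpreter than the one pip installed into.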
Selenium should now be installed just fine, so create a new file to work in.
To make your life easier, add a DRIVER_PATH variable for your webdriver, such as DRIVER_PATH = Service(r"D:\Selenium_LAB\chromedriver.exe"). The r prefix makes this a raw string, so the backslashes in the Windows path are not treated as escape characters.
Note: On Linux, this path looks more like DRIVER_PATH = Service("/home/users/your_username/Documents/Selenium_LAB/chromedriver")
The import for this is from selenium.webdriver.chrome.service import Service. If you use VSCode, you will likely get import suggestions for Selenium modules.
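If the same script has to run on both Windows and Mac/Linux, the driver path can be built at runtime instead of being hard-coded. The sketch below assumes the driver sits in your current working directory, as suggested earlier; chromedriver_path is a hypothetical helper.

```python
import platform
from pathlib import Path


def chromedriver_path(folder, system=None):
    """Return the expected chromedriver location inside `folder`."""
    system = system or platform.system()
    # The Windows download is named chromedriver.exe; on Mac/Linux the
    # file has no extension.
    filename = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return Path(folder) / filename


path = chromedriver_path(Path.cwd())
print(path, "(found)" if path.is_file() else "(not found yet)")
```

The result could then be handed to Selenium as Service(str(path)). Building the path with pathlib also sidesteps the backslash question on Windows.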
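Some current Chrome versions have known bugs, which mostly show up as noisy log output in the console. To work around them, add options = webdriver.ChromeOptions() and options.add_experimental_option('excludeSwitches', ['enable-logging']). This requires the import from selenium import webdriver.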
Next, we need to instantiate our webdriver with the pre-defined options, such as driver = webdriver.Chrome(service=DRIVER_PATH, options=options).
From now on, most actions we call in the script will use this driver variable for context, such as driver.get().
To open a website, use the driver.get("https://www.link-to-site.com") method.
In our example, we use NASDAQ's site to find out about stock values. A large part of web scraping is understanding how the target website works, so we need to visit the site and make some observations. Does this site have a "cookie wall"? Is the site behind Cloudflare? What actions would we, as users, have to take to get the information we need without automation?
For our site, we can see that there is a cookie wall with an accept button. We need to tell Selenium how to get past this.
Using the Developer Console, we can gather info about the element we need to interact with. The accept button has an ID of "onetrust-accept-btn-handler".
Select this button from the DOM and click it with cookie_accept = driver.find_element(By.ID, "onetrust-accept-btn-handler") and cookie_accept.click(). The By class is imported with from selenium.webdriver.common.by import By.
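Put together, the steps so far could look roughly like the sketch below. It is one possible arrangement, not the only one: the function names accept_cookies and run are made up for this example, and the driver path and URL are passed in as arguments so that you can supply your own.

```python
COOKIE_BUTTON_ID = "onetrust-accept-btn-handler"  # ID found in the Developer Console


def accept_cookies(driver, locator):
    """Find the element described by `locator` and click it."""
    button = driver.find_element(*locator)
    button.click()
    return button


def run(driver_file, url):
    """Open `url` in Chrome, click the cookie button, then close the browser."""
    # Imported inside the function so the helper above can be reused or
    # tested on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    driver = webdriver.Chrome(service=Service(driver_file), options=options)
    try:
        driver.get(url)
        accept_cookies(driver, (By.ID, COOKIE_BUTTON_ID))
    finally:
        driver.quit()  # close the browser even if a step above fails
```

To start it, add a final line such as run(r"D:\Selenium_LAB\chromedriver.exe", "https://www.link-to-site.com"), using your own driver path and the real address of the site in place of the example path and placeholder URL from this guide. The try/finally block is an addition of this sketch; it makes sure the browser window closes even when the button cannot be found.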
Give your project a save, then open your Command Prompt at your current directory.
You can now run the script with:
On Windows: python ./name_of_file.py
On Mac/Linux: python3 name_of_file.py
Congrats! You just automated a button click on a website.