# Web Bot Basics

### Description:
- This code creates the framework for a web bot that takes the URL of a LinkedIn article as input. The bot navigates to the article, scrapes its text, asks GPT to write a comment, and then submits that comment on the article.

### Warning!
-  This code is not plug-and-play. A web bot has certain requirements; once we complete them, we can move on to running the code.
-  I'm assuming that you have already installed Python. Here we are using Python version 3.9.13.

### Steps:
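1. Install selenium using pip. I prefer version 3.141.0, and the code below relies on Selenium 3 calls such as 'find_element_by_id', which later Selenium 4 releases removed.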
    - pip3 install selenium==3.141.0
2. You will need the gecko driver which you can get here:
    - https://github.com/mozilla/geckodriver/releases/download/v0.28.0/geckodriver-v0.28.0-win64.zip
3. Install Firefox
    - https://www.mozilla.org/en-US/firefox/new
4. Create a new directory
    - place the gecko driver that you downloaded in that directory
5. You will then need to create a custom firefox profile to be used by your web-bot
    - This step is vital. If you do not use the same profile each time you run your script, every run behaves like a fresh incognito session. A consistent profile lets LinkedIn recognize the browser, which gets you past the bot checks and two-factor authentication after the first login.
    - Press (windows-key) + R, run 'firefox.exe -p', and follow the steps in the profile manager. Place the created profile in the directory you made in step 4.
    - https://support.mozilla.org/en-US/kb/profile-manager-create-remove-switch-firefox-profiles
    - You should then see a profile folder in the directory you created in step 4. Mine is titled 'tsenysae.default-release'; yours will have a different random prefix, so update the folder name in the code below to match.

If you followed those steps, you should be ready to go (minus the AI part, that requires an API key and adding funds to your account)
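Before running anything, it can help to confirm that the directory from step 4 contains what the bot expects. The sketch below is only an illustration: `check_setup` is a hypothetical helper (it is not part of the notebook), and it assumes that a Firefox profile folder can be recognized by the dot in its name, as in 'tsenysae.default-release'.

```python
from pathlib import Path


def check_setup(directory, driver_name="geckodriver.exe"):
    """Return a list of problems found in the bot's working directory.

    Hypothetical helper for illustration; an empty list means the
    directory looks ready.
    """
    root = Path(directory)
    if not root.is_dir():
        return [f"directory not found: {root}"]

    problems = []
    # The gecko driver from step 2 should sit directly in the directory.
    if not (root / driver_name).is_file():
        problems.append(f"missing {driver_name}")

    # Assumption: profile folders are named '<prefix>.<profile name>',
    # so any sub-folder with a dot in its name counts as a profile.
    profiles = [p for p in root.iterdir() if p.is_dir() and "." in p.name]
    if not profiles:
        problems.append("no Firefox profile folder found")
    return problems
```

Calling `check_setup('C:/Users/you/Desktop/SELENIUM_TEST')` (a placeholder path) and printing the result gives a quick checklist of what is still missing.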
    

First we are going to import the required packages

In [1]:
import os, time
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import random as rd
from dotenv import dotenv_values
import openai

This is the section of code you will most likely need to change if you are running this yourself: the working directory, the tone of the comment, and your credentials.

Here is where we can also change the article link!

In [2]:
# --- set working directory (should be the directory you created in step 4)
os.chdir('C:/Users/mwhittlesey/Desktop/SELENIUM_TEST')

# --- select the article you want to comment on.
article_url = 'https://www.linkedin.com/pulse/data-munging-tips-sql-reversing-listagg-michael-whittlesey-vszac'
# article_url = 'https://www.linkedin.com/pulse/books-keep-you-warm-holiday-season-bill-gates-cmawc'

# --- how do you want your comment to sound
comment_tone = 'positive'

# --- load dictionary with credentials
cred_dict = dotenv_values('.env')

# --- or set credentials manually
# cred_dict = {
#     'li_email': 'LinkedInEmail',
#     'li_password': 'LinkedInPassword',
#     'GPT_KEY': 'your open ai api key'
# }
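
If a key is missing from the '.env' file, the script only fails later, in the middle of a login or an API call. A small check right after loading the credentials can surface that earlier. This is a sketch, not part of the original notebook: `missing_credentials` is a hypothetical helper, and the key names are the three used in the commented-out dictionary above.

```python
# A '.env' file for this notebook would contain one KEY=value pair per line:
#   li_email=you@example.com
#   li_password=your-password
#   GPT_KEY=your-openai-api-key

REQUIRED_KEYS = ("li_email", "li_password", "GPT_KEY")


def missing_credentials(creds, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in creds."""
    return [key for key in required if not creds.get(key)]
```

For example, `missing_credentials(cred_dict)` returns an empty list when all three values are present, and the names of the missing ones otherwise.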

Next we need to set up the web driver. This is where we reference the Firefox profile we created and placed in the directory.

In [3]:
# --- load mozilla profile
moz_prof = './tsenysae.default-release'
fp = webdriver.FirefoxProfile(moz_prof)
fp.set_preference("network.cookie.cookieBehavior", 0)
opts = Options()

# --- set driver
driver = webdriver.Firefox(options=opts, firefox_profile=fp, executable_path='./geckodriver.exe')

And now we are ready to start the driver. 
NOTE:
 - The first time you try to log in, you will likely get bot-checked. So, just manually go through the steps of logging in and two-factor authentication. Luckily you only have to do this once, at the very beginning, to set up your Firefox profile.
 
A good rule for web bots is to wrap specific tasks (such as logging in) in a function. Then you can call that function inside a 'try/except' statement. That way, if you don't need to log in because LinkedIn remembers you, the script doesn't immediately fail.

In [4]:
# --- go to the linkedin feed - it will default to the login page if we are not logged in
driver.get("https://www.linkedin.com/feed/")

# --- let the bot rest
time.sleep(rd.sample(list(range(1,5)), 1)[0])

# --- function logs into linkedin using email and password
def login_to_li(DriverInput):
    # -- attempt to input email
    try:
        DriverInput.find_element_by_id("username").send_keys(cred_dict['li_email'])
    except Exception:
        print('no email needed')
    # -- attempt to input password
    try:
        DriverInput.find_element_by_id("password").send_keys(cred_dict['li_password'])
    except Exception:
        print('no password needed')
    # -- get login button (stays None if the page has no 'Sign in' button)
    sign_in_element = None
    for i in DriverInput.find_elements_by_tag_name("button"):
        if i.text == 'Sign in':
            sign_in_element = i
    # -- click login button if we found one
    if sign_in_element is None:
        print('no login button')
    else:
        sign_in_element.click()

# --- attempt login
try:
    login_to_li(driver)
except Exception:
    print('couldnt login')

# --- let the bot rest
time.sleep(rd.sample(list(range(1,5)), 1)[0])

no email needed
no password needed
no login button
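
The 'try/except' pattern from the cell above repeats for every task in this notebook, so it can be factored into one small wrapper. The sketch below is an illustration only; `attempt` is a hypothetical helper name, not something the notebook defines.

```python
def attempt(task, *args, label=None, **kwargs):
    """Run task(*args, **kwargs); on failure print a note and return None.

    Hypothetical helper: lets one failed step (such as a login that is
    not needed) pass without stopping the rest of the script.
    """
    name = label or getattr(task, "__name__", "task")
    try:
        return task(*args, **kwargs)
    except Exception as exc:
        print(f"could not complete {name}: {exc!r}")
        return None
```

With this in place, the login step would read `attempt(login_to_li, driver)`. One trade-off to keep in mind: a step that legitimately returns `None` is indistinguishable from a failed one, so this suits tasks that are run for their side effects.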


Next, we are going to navigate to the linkedin article we defined at the beginning and extract the paragraph text from the page.

In [5]:
# --- navigate to article
driver.get(article_url)

# --- let the bot rest
time.sleep(rd.sample(list(range(3,8)), 1)[0])

# --- get article content
def get_article_content(DriverInput):
    # --- find all possible elements
    possible_elements = DriverInput.find_elements_by_tag_name('p')
    paragraph_elements = []
    # --- loop through and locate based on class
    for i in possible_elements:
        if i.get_attribute('class') == 'ember-view reader-text-block__paragraph':
            paragraph_elements.append(i)          
    # --- join text 
    joined_text = '\n'.join([i.text for i in paragraph_elements])
    # --- print and return the text
    print(joined_text)
    return joined_text

# --- grab the content
try:
    article_content = get_article_content(driver)
except:
    print('couldnt get article content')

We've all aggregated data into a list using 'listagg'. But, have you ever needed to reverse that? I recently learned how in Redshift, and I'm sure I'm not the only one who's come across this problem. Here's how:

Split the list into an array
Place both the original table and the array in the 'from' statement.
Select the array and convert the values from 'super' to 'varying character' (or whatever is appropriate)

Now let's try something harder. I'm separating everything into individual steps for clarity purposes. Obviously, we don't need this many CTEs (common table expressions) to get the job done.
Here, we are taking something we often see in web traffic. A user's entry URL. In this case, it's a link I clicked through on a 'fanatics' email ad. What we want to do is parse out all of the UTM parameters in one clean sweep. We can do this by taking the URL and splitting it into an array.
As you can see we went from a mess:

To something much more useable:

Hope you found this useful, tha
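
The 'let the bot rest' lines in these cells pick a random whole number of seconds; for example, `rd.sample(list(range(1,5)), 1)[0]` chooses one value from 1 to 4. The same idea can be written as a small helper. This is a sketch under that reading of the code: `rest` is a hypothetical name, and the `sleep` parameter exists only so the helper can be tried out without actually waiting.

```python
import random
import time


def rest(low, high, sleep=time.sleep):
    """Pause for a random whole number of seconds between low and high (inclusive).

    Returns the number of seconds chosen. `sleep` can be swapped for a
    stand-in function when experimenting.
    """
    seconds = random.randint(low, high)
    sleep(seconds)
    return seconds
```

`rest(1, 4)` then replaces the first pause in the login cell, and `rest(3, 7)` the pause after navigating to the article.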

Now we are going to find the text box for commenting. This took some time. Sometimes the element you need does not have a specific 'id'. When that happens, you can do what I did here:
 - Get the HTML tag of the element we are looking for, in this case 'div'.
 - Loop through all 'div' elements until you find the one with the specific class that you are looking for.

In [6]:
# --- let the bot rest
time.sleep(rd.sample(list(range(3,8)), 1)[0])

# --- function grabs comment element
def get_comment_element(DriverInput):
    # --- find comment box - took awhile
    possible_elements = DriverInput.find_elements_by_tag_name('div')
    # --- loop through and locate based on class
    comment_element = None
    for i in possible_elements:
        if i.get_attribute('class') == 'ql-editor ql-blank':
            comment_element = i
    # --- fail clearly if no element matched
    if comment_element is None:
        raise LookupError('comment box not found')
    # --- return comment element
    return comment_element

# --- now we have somewhere to write a comment
try:
    com_elm = get_comment_element(driver)
except:
    print('couldnt find comment element')
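
Comparing the whole 'class' string only works while the page lists exactly those classes in exactly that order. A slightly more forgiving variant checks that the required class names are all present, whatever else is there. The sketch below is an assumption-laden illustration: `has_classes` and `first_with_classes` are hypothetical helpers, and `FakeElement` is a stand-in for a Selenium element so the idea can be tried without a browser.

```python
def has_classes(element, required):
    """True if the element's class attribute contains every name in required."""
    tokens = (element.get_attribute("class") or "").split()
    return set(required) <= set(tokens)


def first_with_classes(elements, required):
    """Return the first element carrying all required classes, or None."""
    for element in elements:
        if has_classes(element, required):
            return element
    return None


class FakeElement:
    """Stand-in for a Selenium element, used only for this illustration."""

    def __init__(self, class_attr):
        self._class_attr = class_attr

    def get_attribute(self, name):
        return self._class_attr if name == "class" else None


# Order and extra classes no longer matter.
demo = [FakeElement("header"), FakeElement("ql-blank ql-editor focused")]
match = first_with_classes(demo, ["ql-editor", "ql-blank"])
```

In the notebook, the equivalent call would be `first_with_classes(driver.find_elements_by_tag_name('div'), ['ql-editor', 'ql-blank'])`.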

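Now we are going to ask GPT to write a comment within the context of the article, using the tone we specified earlier. We are then going to extract the content of that response. Note that 'openai.ChatCompletion' is the interface of the 'openai' package before version 1.0; newer versions of the package removed it.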

In [7]:
# -- define api key
openai.api_key = cred_dict['GPT_KEY']
 
# -- api response product
response_content = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": f"""
            You will be given an article
            Return a {comment_tone} response about the article
            """
        },
        {
            "role": "user",
            "content": article_content
        }
    ],
    temperature=0.5,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

# -- extract the comment text from the response
comment_response = response_content['choices'][0]['message']['content']
print(comment_response)

That's a really insightful and informative article! The step-by-step breakdown provided for reversing the 'listagg' operation in Redshift is very helpful for anyone who may encounter a similar challenge. The example of parsing out UTM parameters from a URL is a practical use case that many people working with web traffic data can relate to. The transformation from a messy URL to a clean and usable format is impressive. Thanks for sharing this valuable information!
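
The prompt itself is just a list of two messages, so it can be built and inspected separately from the API call. The sketch below mirrors the system and user messages from the cell above; `build_messages` is a hypothetical helper, and the empty-article check is an added assumption (it avoids spending an API call when the scraping step returned nothing).

```python
def build_messages(article_text, tone):
    """Build the chat messages: a system instruction plus the article text.

    Hypothetical helper; raises ValueError if there is no article text.
    """
    if not article_text or not article_text.strip():
        raise ValueError("article text is empty; nothing to comment on")
    system_prompt = (
        "You will be given an article\n"
        f"Return a {tone} response about the article"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": article_text},
    ]
```

The result can be passed as the `messages` argument in the cell above, for example `messages=build_messages(article_content, comment_tone)`.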


And finally, we are going to send that response to the text box.

In [8]:
# --- place comment in text box
com_elm.send_keys(comment_response)

We should see the text in the comment section. Now we just need to find that submit button.

In [9]:
# --- let the bot rest
time.sleep(rd.sample(list(range(2,4)), 1)[0])

# --- function grabs the comment submit button
def get_comment_submit_element(DriverInput):
    # --- find all candidate 'button' elements
    possible_elements = DriverInput.find_elements_by_tag_name('button')
    # --- loop through and locate based on class
    submit_element = None
    for i in possible_elements:
        if i.get_attribute('class') == 'comments-comment-box__submit-button--cr artdeco-button artdeco-button--1 artdeco-button--primary ember-view':
            submit_element = i
    # --- fail clearly if no button matched
    if submit_element is None:
        raise LookupError('submit button not found')
    # --- return submit button
    return submit_element

# --- get comment submit button
try:
    comment_submit_button = get_comment_submit_element(driver)
except:
    print('couldnt find submit button')

Yay, we found the button! Now let's click it and we are home free.

In [10]:
# --- click button
comment_submit_button.click()

Finally, let's let the bot rest to make sure the click registered, then close out our driver.

In [11]:
# --- let the bot rest
time.sleep(rd.sample(list(range(2,4)), 1)[0])
driver.close()
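
If an earlier cell raises an error, the cleanup line above never runs and the browser stays open. One way to guard against that is to wrap the whole session in a context manager. This is a sketch rather than the notebook's approach: `managed_driver` is a hypothetical helper, and it calls `quit()`, which ends the whole WebDriver session, where `close()` only closes the current window.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver with make_driver() and always quit it afterwards.

    Hypothetical helper: make_driver is any zero-argument function that
    returns an object with a quit() method, such as a Selenium driver.
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs whether the body finished normally or raised an error.
        driver.quit()
```

In this notebook it would be used as `with managed_driver(lambda: webdriver.Firefox(options=opts, firefox_profile=fp, executable_path='./geckodriver.exe')) as driver:` with the remaining steps indented underneath.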

### Thanks for reading!