# Growing Your LinkedIn Network from 500 to 5000 with Python
If you are reading this, you probably received a LinkedIn request from me within the last few days. I sent it because I started the [Houston Data Science meetup](http://www.meetup.com/Houston-Data-Science/), which will focus on teaching the theory and practice of data science, and I wanted to attract new members.

One of the easiest ways to increase membership in my meetup was to head to LinkedIn and start inviting anyone with 'data science' terms in their profile (which is quite a few people these days). After a few hundred manual invites, I felt like smashing something to relieve the tension built up from the repetitive motion.

Instead of resorting to breaking objects, I decided to let Python rip apart LinkedIn for me. I recently did some automated web scraping with a wonderful Python package, selenium. Selenium is an easy-to-use (like nearly every Python package) and powerful tool that allows you to automate browser activity.

In the following example, I will show you how to:
1. Log into LinkedIn
1. Enter text into input boxes
1. Click buttons
1. Select options
1. Handle edge cases

All with selenium.

# Sneak peek at end results
It worked
![linkedin](linkedin.png)

# Getting Started
If you are using Python, you should already have pip installed, and a simple **pip install selenium** should install the package for you. Selenium works with all the popular browsers, but to use Chrome (which this notebook does) you must also install [ChromeDriver found here](https://sites.google.com/a/chromium.org/chromedriver/downloads).

In [2]:
'''
import just a few functions from selenium to get started
the time module will be used to pause the execution of the program occasionally when web pages are loading
'''
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import time

In [3]:
'''
Begin Chrome and browse to LinkedIn. You won't be autologged into linkedin as selenium starts 
an instance of Chrome without knowledge of user data
'''
# The ChromeDriver is a necessity for selenium. CHANGE THIS LOCATION
driver_location = "/Users/Ted/Documents/chromedriver/chromedriver"

# The variable driver controls all actions
driver = webdriver.Chrome(executable_path=driver_location)

# A get request to navigate to LinkedIn
driver.get("http://www.linkedin.com")
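
The calls in this notebook follow the Selenium 3 API. Selenium 4 deprecated, and later releases removed, both the `executable_path` argument and the `find_element_by_*` methods in favour of a `Service` object and `find_element(by, value)`. The sketch below shows one way to cope with either version; `start_chrome` and `find_by_id` are hypothetical helper names invented for this illustration, not part of selenium.

```python
def start_chrome(driver_location):
    """Start Chrome using the Selenium 4 style constructor."""
    # Imported inside the function so that find_by_id below can be used
    # (and tried out) even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_location))


def find_by_id(driver, element_id):
    """Look up one element by id on either an old or a new selenium driver."""
    legacy = getattr(driver, "find_element_by_id", None)
    if legacy is not None:
        # Selenium 3 (and early Selenium 4) drivers still have this method
        return legacy(element_id)
    # "id" is the string value of selenium's By.ID locator strategy
    return driver.find_element("id", element_id)
```

With these in place, `find_by_id(driver, "login-email")` would stand in for `driver.find_element_by_id("login-email")` in the cells that follow.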

# Inspecting the page
The commands above should open a new instance of Chrome and navigate to LinkedIn. At the top of LinkedIn's home page reside the email and password text boxes. Right-click in the email text box and select Inspect. You should see something very similar to this image:

![this](inspect_email.png)

We are interested in the highlighted HTML input element and, specifically, its id, which uniquely identifies it. The code below enters the user name and password into the text boxes and presses Enter.

In [None]:
# Find the specific element with given id
inputElement = driver.find_element_by_id("login-email")

# Enter in your user name
inputElement.send_keys('your-email')

# Go to password textbox and enter in the password
inputElement = driver.find_element_by_id("login-password")
inputElement.send_keys('your-password')

# Press Enter
inputElement.send_keys(Keys.ENTER)
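
Typing a real password into a notebook cell makes it easy to share by accident. One possible alternative, sketched here under the assumption that you export two environment variables before starting the notebook, is to read the credentials at run time. The names `LINKEDIN_EMAIL`, `LINKEDIN_PASSWORD` and `load_credentials` are made up for this example.

```python
import os


def load_credentials(env=None):
    """Return (email, password) read from environment variables.

    LINKEDIN_EMAIL and LINKEDIN_PASSWORD are hypothetical variable names
    chosen for this sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["LINKEDIN_EMAIL"], env["LINKEDIN_PASSWORD"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not set
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

The login cell could then call `email, password = load_credentials()` and pass those two values to `send_keys` in place of the `'your-email'` and `'your-password'` placeholders.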

# More Inspecting
This project boils down to inspecting the web page and then finding the right selenium command to make the driver take the right action.

In [None]:
# Click advanced search (click() returns None, so there is nothing to assign)
driver.find_element_by_id("advanced-search").click()

# Find out where the keywords textbox is and enter in your query
inputElement = driver.find_element_by_id("advs-keywords")
inputElement.send_keys('"data science"')

# Fill in any fields you like
Explore all the textboxes, dropdowns and checkboxes to customize your search

In [None]:
# Use the Select class imported at the top of the notebook
dropdown = Select(driver.find_element_by_id("advs-locationType"))

# Here I search by specific area
dropdown.select_by_value("I")

# Since I am from Houston, I choose a Houston zip code and all people within 75 miles of it
driver.find_element_by_id("advs-postalCode").send_keys('77005')
Select(driver.find_element_by_id("advs-distance")).select_by_value("75")

# Click to search
driver.find_element_by_class_name("submit-advs").click()

# Sleeping...
When running all the code at once, as in the single notebook cell at the bottom of this page, it is important to pause the execution of the program with the sleep function from the time module so that each page has time to load. Pausing execution is also a standard tactic when scraping the web, as many sites can easily detect non-human interaction and bump you off their site.
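
A fixed pause is either too long or too short depending on how fast the page loads. A common alternative is to poll until the thing you need has appeared, giving up after a timeout. The helper below is a minimal pure-Python sketch of that idea; `wait_until` is a name invented here, and the default timeout and poll interval are arbitrary.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

For example, `wait_until(lambda: driver.find_elements_by_class_name("primary-action-button"))` keeps polling while the returned list is empty and hands back the buttons as soon as at least one exists. Selenium also ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.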

# Second degree connections
LinkedIn only allows you to freely connect with people with whom you share at least one connection (second-degree connections). You can purchase a premium plan to be able to connect with third-degree connections.

# LinkedIn Premium
I purchased a premium plan (usually the first month is free) so that I could have unlimited searches. Without it, you will be limited in your searches and you might not be able to make full use of this example.

In [None]:
# Here we sleep and then filter the results to second-degree connections only
time.sleep(2)
driver.find_element_by_id("S-N-ffs").click()
time.sleep(2)

# Where the magic happens
The code below will loop through all the pages that your search yields and click connect for each person. About 90% of the time, 'invite sent' will be the result of the button click. Occasionally, though, you will be directed to another page to enter more information about the person.

# Edge Cases
There are two cases that occur infrequently. 
1. LinkedIn will ask you for the person's email address
2. LinkedIn will ask you how you know that person

For case 1, you won't know their email, so we will just click back and resume clicking the buttons. We have to 'reacquire' the buttons again (you will see this in the code below)

For case 2, we can connect with the person, but we must select the radio button for 'friend' and click the send button. This navigates us to yet another page, so we need to go back twice to return to the search page.
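
The decision described above can be separated from the browser calls, which makes it easier to reason about. The sketch below assumes, as the notebook's loop does, that an unchanged URL means the invite went through and that the email page contains the text 'Why do I have to enter an email address?'. The function names and outcome labels are invented for this illustration.

```python
# Marker text that the notebook's loop looks for on the email page
EMAIL_PROMPT = "Why do I have to enter an email address?"


def classify_click(url_before, url_after, page_source):
    """Decide which of the three outcomes a click on 'connect' produced."""
    if url_after == url_before:
        return "invite_sent"      # stayed on the search results
    if EMAIL_PROMPT in page_source:
        return "needs_email"      # case 1: we cannot continue
    return "needs_reason"         # case 2: choose 'friend' and send


def backs_needed(outcome):
    """How many driver.back() calls return us to the search results."""
    return {"invite_sent": 0, "needs_email": 1, "needs_reason": 2}[outcome]
```

In the loop, this would be called as `classify_click(current_url, driver.current_url, driver.page_source)` right after each click.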

# Looping to the next page and exiting the program
By inspecting the page again, we can see that the 'Next' button is the last of the 'page-link' class buttons. If the text of this button is not 'Next', we know that we are on the last page and can end the while loop. Otherwise we navigate to the next page.
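
The 'Next' check can be written as a small function that also copes with a results page that has no page links at all, where indexing the last element would otherwise raise an `IndexError`. This is only a sketch: `next_page_link` is a hypothetical helper, and `SimpleNamespace` objects stand in for selenium elements, which expose the same `.text` attribute.

```python
from types import SimpleNamespace


def next_page_link(page_links):
    """Return the 'Next' link, or None on the last page or when there are no links."""
    if not page_links:
        return None
    last_link = page_links[-1]
    return last_link if last_link.text.startswith("Next") else None


# Stand-ins for the elements returned by find_elements_by_class_name("page-link")
links = [SimpleNamespace(text="1"), SimpleNamespace(text="2"),
         SimpleNamespace(text="Next >")]

print(next_page_link(links).text)   # Next >
print(next_page_link(links[:2]))    # None
print(next_page_link([]))           # None
```

In the while loop, a result of `None` would be the signal to `break`; any other result is the element to click.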

In [None]:
# Keep looping through the search results until no 'Next' button
while True:
    # get the current url to test whether the invite button turned to 'invite sent' or navigated to a new page
    current_url = driver.current_url
    
    # notice the 's' here to grab all 10 buttons
    invite_buttons = driver.find_elements_by_class_name("primary-action-button")
    
    # loop through all buttons. Don't use 'for button in invite_buttons' because of the possibility of
    # navigating to another page, which will lose all button information
    for i in range(len(invite_buttons)):
        
        # click 'invite'
        invite_buttons[i].click()
        time.sleep(.5)
        
        # in the edge case that we have navigated away from the search results
        if current_url != driver.current_url:
            
            # if LinkedIn requires an email we simply navigate back
            # otherwise we choose friend and navigate back twice
            if 'Why do I have to enter an email address?' in driver.page_source:
                print('Connection not possible. Need email')
                print(driver.current_url)
            else:
                print('use friend')
                print(driver.current_url)
                driver.find_element_by_id("IF-reason-iweReconnect").click()
                driver.find_element_by_id("send-invite-button").click()
                time.sleep(5)
                driver.back()
            
            # go back to search page
            time.sleep(5)   
            driver.back()
            time.sleep(5)
            
            # reset the invite_buttons b/c they were lost when navigating away
            invite_buttons = driver.find_elements_by_class_name("primary-action-button")
    
    # Find the link to navigate to the next page
    page_links = driver.find_elements_by_class_name("page-link")
    
    # wait for all the clicking to be over
    time.sleep(5)
    
    # The last link here is the 'Next' button
    last_link = page_links[-1]
    
    # end loop if no more pages
    if last_link.text[:4] != 'Next':
        print('No more pages :()')
        break
        
    # navigate to next page and ensure that it has finished loading
    last_link.click()
    time.sleep(5)

# All together with no comments

In [21]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
import time

driver_location = "/Users/Ted/Documents/chromedriver/chromedriver"
driver = webdriver.Chrome(executable_path=driver_location)
driver.get("http://www.linkedin.com")

inputElement = driver.find_element_by_id("login-email")
inputElement.send_keys('your-email')

inputElement = driver.find_element_by_id("login-password")
inputElement.send_keys('your-password')

inputElement.send_keys(Keys.ENTER)
driver.find_element_by_id("advanced-search").click()

inputElement = driver.find_element_by_id("advs-keywords")
inputElement.send_keys('"machine learning"')

dropdown = Select(driver.find_element_by_id("advs-locationType"))
dropdown.select_by_value("I")
driver.find_element_by_id("advs-postalCode").send_keys('77005')
Select(driver.find_element_by_id("advs-distance")).select_by_value("75")

driver.find_element_by_class_name("submit-advs").click()

time.sleep(2)
driver.find_element_by_id("S-N-ffs").click()
time.sleep(5)

while True:
    current_url = driver.current_url
    invite_buttons = driver.find_elements_by_class_name("primary-action-button")
    
    for i in range(len(invite_buttons)):
        invite_buttons[i].click()
        time.sleep(.5)
        if current_url != driver.current_url:
            if 'Why do I have to enter an email address?' in driver.page_source:
                print('Connection not possible. Need email')
                print(driver.current_url)
            else:
                print('use friend')
                print(driver.current_url)
                driver.find_element_by_id("IF-reason-iweReconnect").click()
                driver.find_element_by_id("send-invite-button").click()
                time.sleep(5)
                driver.back()

            time.sleep(5)   
            driver.back()
            time.sleep(5)
            invite_buttons = driver.find_elements_by_class_name("primary-action-button")
    
    page_links = driver.find_elements_by_class_name("page-link")
    time.sleep(5)
    
    last_link = page_links[-1]

    if last_link.text[:4] != 'Next':
        print('No more pages :()')
        break
        
    last_link.click()
    time.sleep(5)

# Welcome New Connections!
If you liked this notebook, please join [my meetup](http://www.meetup.com/Houston-Data-Science/). We will have lots of free classes and helpful resources.

# But beware!
You are limited to 5000 connections so don't waste them!