# In-Class Coding Lab: Scrape NASDAQ Stock Quotes 

For this lab we will walk you through scraping stock data from the NASDAQ website: http://www.nasdaq.com/ 

We will walk you through the process, and when you're done you'll write a program that, given a NASDAQ symbol, retrieves the stock's name, price, and percent change.

While we work through the example, we will use Amazon.com's stock symbol: `amzn`

For a list of NASDAQ stocks to try with the completed program, see here: http://www.cnbc.com/nasdaq-100/ 


## Our plan

Here's our plan for a given stock symbol, for example `amzn`:

1. Use Requests to get the HTML from this page: http://www.nasdaq.com/symbol/amzn
2. Use BeautifulSoup4 to extract data from the HTML, and save the extracted data into a dict.
3. Print the stock info from the dict.

We will write each step as its own function below.



First install the `lxml` parser (this is a notebook shell command), then import the libraries:

```
!pip install lxml
```

```python
import requests
from bs4 import BeautifulSoup
import time
```


## Use Requests to get HTML

Here's the code:

```python
symbol = "amzn"
url = 'http://www.nasdaq.com/symbol/' + symbol
response = requests.get(url)
if response.ok:
    print(response.text)
else:
    print("Error retrieving " + url)
```


```
<!doctype html>
<html lang="en-us" class="inner no-js" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">
<head>
<script>
...
```

It looks like it works, so we should refactor it into a function and then test again. We pass in the `symbol` as input and return the text we would have printed as output.

```python
def get_html_from_nasdaq(symbol):
    url = 'http://www.nasdaq.com/symbol/' + symbol
    response = requests.get(url)
    if response.ok:
        return response.text
    else:
        return "Error retrieving " + url
```

Let's try it out; it should work:

```python
html = get_html_from_nasdaq('amzn')
html
```

```
'\r\n<!doctype html>\r\n<html lang="en-us" class="inner no-js" xmlns:og="http://ogp.me/ns#" xmlns:fb="https://www.facebook.com/2008/fbml">\r\n<head>\r\n<script>\r\n...
```
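One weakness of `get_html_from_nasdaq` is that on failure it returns the string `"Error retrieving ..."`, which the next step would then try to parse as if it were HTML. Below is a minimal alternative sketch that returns `None` on failure so the caller can check for it, and that also sets a request timeout and a browser-like `User-Agent` header. The names `build_nasdaq_url` and `fetch_nasdaq_html`, the header value, and the 10-second timeout are assumptions for illustration, not part of the original lab.

```python
import requests

BASE_URL = "http://www.nasdaq.com/symbol/"  # URL pattern used in this lab


def build_nasdaq_url(symbol):
    """Hypothetical helper: normalize the symbol and build the quote-page URL."""
    return BASE_URL + symbol.strip().lower()


def fetch_nasdaq_html(symbol, timeout=10):
    """Hypothetical variant of get_html_from_nasdaq that returns None on failure."""
    url = build_nasdaq_url(symbol)
    # Assumption: some sites reject requests that lack a browser-like User-Agent
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.RequestException:
        # Connection errors, timeouts, invalid URLs, and similar problems
        return None
    if not response.ok:
        # The server answered, but with an error status such as 404 or 500
        return None
    return response.text
```

The caller then decides what to do: `html = fetch_nasdaq_html("amzn")`, followed by `if html is None:` to report the error instead of handing an error message to BeautifulSoup.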

## Use BeautifulSoup4 to extract data from the site

Next we want to take `html` and extract the meaningful bits. This is the part that requires time and patience: you'll need to use your browser's developer tools to find the CSS selectors that identify the data you want.

For simplicity's sake, we've done this for you. Feel free to open http://www.nasdaq.com/symbol/amzn in your browser, open the developer tools, and locate each of these three selectors:

```python
soup = BeautifulSoup(html, "lxml")
name = soup.select("div#qwidget_pageheader h1")[0].text
price = soup.select("div#qwidget_lastsale")[0].text
change = soup.select("div#qwidget_percent")[0].text
print(name, price, change)
```

```
Amazon.com, Inc. Common Stock Quote & Summary Data $918.38 1%
```
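Each selector reads as "tag, then `#id`": `div#qwidget_lastsale` matches a `<div>` whose `id` is `qwidget_lastsale`, and the space in `div#qwidget_pageheader h1` means "an `<h1>` anywhere inside that div". The sketch below tries the three selectors against a small hand-written snippet so you can experiment without contacting the website. The snippet is a hypothetical stand-in that only mimics the structure the selectors expect (it is not the real NASDAQ markup), and it uses Python's built-in `"html.parser"` instead of `lxml` so it needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the lab's selectors (not copied from nasdaq.com)
SAMPLE_HTML = """
<div id="qwidget_pageheader"><h1>Example Corp. Common Stock Quote</h1></div>
<div id="qwidget_lastsale">$100.00</div>
<div id="qwidget_percent">2%</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of every match; select_one() returns the first match or None
headers = soup.select("div#qwidget_pageheader h1")
print(len(headers))                                  # 1
print(headers[0].text)                               # Example Corp. Common Stock Quote
print(soup.select_one("div#qwidget_lastsale").text)  # $100.00
print(soup.select_one("div#qwidget_percent").text)   # 2%
print(soup.select_one("div#does_not_exist"))         # None, because nothing matches
```

This is also a quick way to check a selector you found in the developer tools before wiring it into the lab's code.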


We can't easily return 3 values from a function (strictly speaking you can in Python, but we won't teach that here), so instead we will first collect these values into a dictionary:

```python
soup = BeautifulSoup(html, "lxml")
name = soup.select("div#qwidget_pageheader h1")[0].text
price = soup.select("div#qwidget_lastsale")[0].text
change = soup.select("div#qwidget_percent")[0].text
stock = {
    "Name": name,
    "Price": price,
    "Change": change,
}

print(stock)
```

```
{'Name': 'Amazon.com, Inc. Common Stock Quote & Summary Data', 'Change': '1%', 'Price': '$918.38'}
```
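To see why a dictionary is the friendlier choice, compare two ways a function could hand back all three values. Python can return a tuple, but then the caller has to remember the order of the values; a dictionary labels each value with a key. The function names and sample values below are hypothetical.

```python
def stock_as_tuple(name, price, change):
    # A tuple: the caller must remember that position 1 holds the price
    return name, price, change


def stock_as_dict(name, price, change):
    # A dict: each value is looked up by a descriptive key
    return {"Name": name, "Price": price, "Change": change}


name, price, change = stock_as_tuple("Example Corp.", "$100.00", "2%")
stock = stock_as_dict("Example Corp.", "$100.00", "2%")

print(price)           # $100.00, found by position when unpacking
print(stock["Price"])  # $100.00, found by name
```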


Once again, it's time to refactor this into a function. We take `html` as input (it's the one thing this code needs to work) and return `stock` as output (since it is what we print).

```python
def extract_stock_data(html):
    soup = BeautifulSoup(html, "lxml")
    name = soup.select("div#qwidget_pageheader h1")[0].text
    price = soup.select("div#qwidget_lastsale")[0].text
    change = soup.select("div#qwidget_percent")[0].text
    stock = {
        "Name": name,
        "Price": price,
        "Change": change,
    }

    return stock
```


And we should test out our new function:

```python
stock = extract_stock_data(html)
print(stock)
```

```
{'Name': 'Amazon.com, Inc. Common Stock Quote & Summary Data', 'Change': '1%', 'Price': '$918.38'}
```
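`extract_stock_data` assumes every selector matches. If one does not (for example, because the page layout has changed, or because `html` holds the `"Error retrieving ..."` message instead of a page), `[0]` raises an `IndexError`. Below is a defensive sketch that uses `select_one`, which returns `None` when nothing matches, and records `None` for any missing field. The name `extract_stock_data_safe` and the sample markup are hypothetical, and `"html.parser"` is used so the example runs without `lxml`.

```python
from bs4 import BeautifulSoup

# CSS selectors from this lab; they reflect the page layout when the lab was written
SELECTORS = {
    "Name": "div#qwidget_pageheader h1",
    "Price": "div#qwidget_lastsale",
    "Change": "div#qwidget_percent",
}


def extract_stock_data_safe(html):
    """Hypothetical variant of extract_stock_data that tolerates missing elements."""
    soup = BeautifulSoup(html, "html.parser")
    stock = {}
    for key, selector in SELECTORS.items():
        element = soup.select_one(selector)
        # select_one returns None when nothing matches, so no IndexError is raised
        stock[key] = element.get_text(strip=True) if element is not None else None
    return stock


sample = '<div id="qwidget_lastsale">$100.00</div>'
print(extract_stock_data_safe(sample))
# {'Name': None, 'Price': '$100.00', 'Change': None}
```

A caller can then check for `None` values and print a friendly message instead of crashing.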


## Putting it all together

Now you need to put it all together. Write a program to:

1. input a stock symbol on the NASDAQ exchange
2. get the html from the stock symbol on the NASDAQ website
3. extract the stock data from the html
4. print out the stock information

The program should work like this:

```
Enter a stock symbol on the NASDAQ Exchange: amzn
Name: Amazon.com, Inc. Common Stock Quote & Summary Data
Price: $852.53
Change: 0.24%
```
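The output format above can be produced by a small helper that takes the dictionary returned by `extract_stock_data`. A sketch follows; the name `format_stock` is hypothetical, and the sample values are the ones from the example run above.

```python
def format_stock(stock):
    """Hypothetical helper: turn a stock dict into the 'Key: value' lines shown above."""
    lines = []
    for key in ("Name", "Price", "Change"):
        lines.append(key + ": " + stock[key])
    return "\n".join(lines)


example = {
    "Name": "Amazon.com, Inc. Common Stock Quote & Summary Data",
    "Price": "$852.53",
    "Change": "0.24%",
}
print(format_stock(example))
```

Keeping the formatting in its own function means it can be tested without touching the network.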

```python
# Ask for a symbol, then reuse the two functions defined above
symbol = input("Enter a stock symbol on the NASDAQ Exchange: ")
html = get_html_from_nasdaq(symbol)
stock = extract_stock_data(html)

# Print each field on its own line, matching the example output above
print("Name: " + stock["Name"])
print("Price: " + stock["Price"])
print("Change: " + stock["Change"])
```
