# Yahoo Finance Data

Recently I was working on an option pricing study and wanted to test the theory against real market data. Unfortunately, on my personal workstation I can't afford the usual data APIs for option prices, and some of the free APIs I used to rely on have become unavailable. Now that I have some experience with web scraping, I thought it might be a good idea to fetch the financial data that way. Here's how I did it.

In [2]:
import urllib.request
from bs4 import BeautifulSoup
import re
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import datetime
%matplotlib inline

In [3]:
expiration = datetime.datetime(2018,8,24,0,0)
ticker = 'GS'

In [4]:
page_url = 'https://finance.yahoo.com/quote/{}/options'.format(ticker)
page_url

'https://finance.yahoo.com/quote/GS/options'

In [5]:
page = urllib.request.urlopen(page_url)
soup = BeautifulSoup(page, 'html.parser')

In [6]:
soup.title

<title>GS Option Chain | Goldman Sachs Group, Inc. (The) Stock - Yahoo Finance</title>
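
If `urlopen` fails with an HTTP error, one thing to try is sending a browser-like `User-Agent` header, since some sites reject the default one that `urllib` sends. The sketch below is only a guess at what might help, not something the run above needed; the helper name `build_request` and the header value are made up for illustration.

```python
import urllib.request

def build_request(url, user_agent='Mozilla/5.0'):
    # Hypothetical helper: wrap the URL in a Request carrying a custom User-Agent.
    return urllib.request.Request(url, headers={'User-Agent': user_agent})

req = build_request('https://finance.yahoo.com/quote/GS/options')
# The Request object can be passed to urllib.request.urlopen(req)
# in place of the bare URL string.
```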

In [7]:
# Map each column name to the CSS class of its table cells.
columns = {}
columns['Contract Name'] = 'data-col0 Ta(start) Pstart(10px)'
attributes = ['Strike', 'Last Price', 'Bid', 'Ask', 'Change', 'Change Pct', 'Volumne', 'Open Interest','Implied Volatility']
# The remaining columns start at data-col2 (data-col1 is skipped here).
for i, attr in enumerate(attributes):
    columns[attr] = 'data-col{}'.format(i+2)

In [8]:
contracts = [x.text for x in soup.find_all('td', attrs={'class': columns['Contract Name']})]
contracts[:5]

['GS180824C00205000',
 'GS180824C00207500',
 'GS180824C00210000',
 'GS180824C00212500',
 'GS180824C00215000']
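
The contract names look like they follow the usual OCC-style layout: the underlying ticker, the expiration as YYMMDD, `C` or `P` for call or put, and the strike multiplied by 1000 in eight digits. Assuming that layout holds, a small helper (the name `parse_contract` is mine, not part of any library) can split a name into its parts:

```python
import datetime
import re

def parse_contract(name):
    # Assumed layout: ticker, YYMMDD expiration, C/P flag, strike * 1000 in 8 digits.
    m = re.fullmatch(r'([A-Z]+)(\d{6})([CP])(\d{8})', name)
    if m is None:
        raise ValueError('Unexpected contract name: {}'.format(name))
    root, ymd, flag, strike = m.groups()
    return {
        'ticker': root,
        'expiration': datetime.datetime.strptime(ymd, '%y%m%d').date(),
        'type': 'call' if flag == 'C' else 'put',
        'strike': int(strike) / 1000,
    }

parsed = parse_contract('GS180824C00205000')
```

For the first contract above this gives a GS call struck at 205.0 that expires on 2018-08-24.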

In [9]:
# np.nan rather than np.NaN: the capitalized alias was removed in NumPy 2.0.
option_data = pd.DataFrame(np.nan, index=contracts, columns=attributes)
option_data.head()

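| | Strike | Last Price | Bid | Ask | Change | Change Pct | Volumne | Open Interest | Implied Volatility |
|---|---|---|---|---|---|---|---|---|---|
| GS180824C00205000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GS180824C00207500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GS180824C00210000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GS180824C00212500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| GS180824C00215000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |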


In [10]:
for attr in attributes:
    option_data[attr] = [x.text for x in soup.find_all('td', attrs={'class': columns[attr]})]

In [11]:
option_data

| | Strike | Last Price | Bid | Ask | Change | Change Pct | Volumne | Open Interest | Implied Volatility |
|---|---|---|---|---|---|---|---|---|---|
| GS180824C00205000 | 205.0 | 28.35 | 27.9 | 28.7 | 0.0 | - | 5 | 0 | 60.06% |
| GS180824C00207500 | 207.5 | 22.45 | 19.35 | 21.4 | 0.0 | - | 6 | 1 | 0.00% |
| GS180824C00210000 | 210.0 | 22.99 | 22.75 | 24.0 | 0.79 | +3.56% | 2 | 25 | 58.98% |
| GS180824C00212500 | 212.5 | 17.2 | 18.1 | 20.85 | 0.0 | - | 2 | 2 | 0.00% |
| GS180824C00215000 | 215.0 | 18.76 | 18.1 | 18.9 | 1.75 | +10.29% | 179 | 51 | 46.58% |
| GS180824C00217500 | 217.5 | 12.0 | 15.55 | 16.4 | 0.0 | - | 8 | 45 | 41.55% |
| GS180824C00220000 | 220.0 | 10.36 | 12.7 | 13.9 | 0.0 | - | 7 | 18 | 36.43% |
| GS180824C00222500 | 222.5 | 11.2 | 10.7 | 11.35 | 2.2 | +22.56% | 4 | 10 | 30.32% |
| GS180824C00225000 | 225.0 | 8.53 | 8.6 | 8.9 | 0.33 | +4.02% | 615 | 124 | 25.88% |
| GS180824C00227500 | 227.5 | 6.5 | 6.3 | 6.5 | 0.25 | +4.00% | 67 | 536 | 21.66% |


As expected, for the nearest expiration we got all the data that is available on the web page! To get other expirations, we only need to add the expiration date to the URL as a `date` query parameter (a Unix timestamp), which should be very easy.
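
To make the URL step concrete, here is a minimal sketch of building the page address for a given expiration. It assumes the timestamp Yahoo expects is midnight UTC of the expiration date, which I have not verified against every ticker; the helper name `option_url` is hypothetical.

```python
import datetime

def option_url(ticker, expiration=None):
    base = 'https://finance.yahoo.com/quote/{}/options'.format(ticker)
    if expiration is None:
        # No date given: the page shows the nearest expiration.
        return base
    # Assumption: read the naive datetime as UTC so the result
    # does not depend on the local time zone of the machine.
    ts = int(expiration.replace(tzinfo=datetime.timezone.utc).timestamp())
    return '{}?date={}'.format(base, ts)

url = option_url('GS', datetime.datetime(2018, 8, 24))
```

Calling `timestamp()` directly on a naive `datetime` would use the local time zone instead, so the same code could produce different URLs on different machines.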

In [12]:
def getOptionFromYahoo(ticker, expiration=None):
    """Scrape one option chain page; return a DataFrame, or None if no data was found."""
    try:
        if not expiration:
            page_url = 'https://finance.yahoo.com/quote/{}/options'.format(ticker)
        else:
            # Read the naive datetime as UTC so the timestamp doesn't depend on the
            # local time zone (assumes Yahoo's dates are midnight UTC).
            timestamp = int(expiration.replace(tzinfo=datetime.timezone.utc).timestamp())
            page_url = 'https://finance.yahoo.com/quote/{}/options?date={}'.format(ticker, timestamp)
        print('Getting data from {}'.format(page_url))
        page = urllib.request.urlopen(page_url)
        soup = BeautifulSoup(page, 'html.parser')

        # Map each column name to the CSS class of its table cells.
        columns = {}
        columns['Contract Name'] = 'data-col0 Ta(start) Pstart(10px)'
        attributes = ['Strike', 'Last Price', 'Bid', 'Ask', 'Change', 'Change Pct', 'Volumne', 'Open Interest','Implied Volatility']
        for i, attr in enumerate(attributes):
            columns[attr] = 'data-col{}'.format(i+2)

        contracts = [x.text for x in soup.find_all('td', attrs={'class': columns['Contract Name']})]
        if len(contracts) == 0:
            # Didn't get any data, the URL might be wrong.
            print("Couldn't read any data from {}, please inspect the url!".format(page_url))
            raise ValueError

        # np.nan rather than np.NaN: the capitalized alias was removed in NumPy 2.0.
        option_data = pd.DataFrame(np.nan, index=contracts, columns=attributes)

        for attr in attributes:
            option_data[attr] = [x.text for x in soup.find_all('td', attrs={'class': columns[attr]})]
    except ValueError:
        # No contracts found (or the columns didn't line up): signal failure with None.
        option_data = None
    return option_data

In [14]:
data = getOptionFromYahoo('GS')
data.head()

Getting data from https://finance.yahoo.com/quote/GS/options


| | Strike | Last Price | Bid | Ask | Change | Change Pct | Volumne | Open Interest | Implied Volatility |
|---|---|---|---|---|---|---|---|---|---|
| GS180824C00205000 | 205.0 | 28.35 | 27.9 | 28.7 | 0.0 | - | 5 | 0 | 60.06% |
| GS180824C00207500 | 207.5 | 22.45 | 19.35 | 21.4 | 0.0 | - | 6 | 1 | 0.00% |
| GS180824C00210000 | 210.0 | 22.99 | 22.75 | 24.0 | 0.79 | +3.56% | 2 | 25 | 58.98% |
| GS180824C00212500 | 212.5 | 17.2 | 18.1 | 20.85 | 0.0 | - | 2 | 2 | 0.00% |
| GS180824C00215000 | 215.0 | 18.76 | 18.1 | 18.9 | 1.75 | +10.29% | 179 | 51 | 46.58% |
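
Every value scraped this way is text, so before any pricing work the columns probably need converting to numbers. Here is a rough sketch on a tiny hand-made frame that mimics the output above. The helper name `to_numeric_frame` is mine, and it assumes that `-` means "no value" and that percentages carry a trailing `%`.

```python
import pandas as pd

def to_numeric_frame(df):
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        # Drop thousands separators, percent signs and leading plus signs.
        cleaned = df[col].astype(str).str.replace(',', '', regex=False)
        cleaned = cleaned.str.replace('%', '', regex=False).str.lstrip('+')
        # Anything that still isn't a number (such as '-') becomes NaN.
        out[col] = pd.to_numeric(cleaned, errors='coerce')
    return out

# Two rows copied by hand from the table above, as strings.
sample = pd.DataFrame(
    {'Strike': ['205.0', '210.0'],
     'Change Pct': ['-', '+3.56%'],
     'Implied Volatility': ['60.06%', '58.98%']},
    index=['GS180824C00205000', 'GS180824C00210000'])
clean = to_numeric_frame(sample)
```

Percentages stay in percent units here (60.06 rather than 0.6006), so divide by 100 if a pricing formula expects a fraction.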


So far we have a snapshot of a single day's option data. To collect multiple days' data, I'm afraid I have to run this program on multiple days; a scheduler can do that job for me.
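
One way to build up a history is to write each day's snapshot to its own dated file and let a scheduler such as cron run the script once a day. Below is a minimal sketch of the saving step only; the helper name `save_snapshot`, the file naming scheme, and the example date are all made up for illustration.

```python
import datetime
import os
import tempfile

import pandas as pd

def save_snapshot(option_data, ticker, folder, day=None):
    # Hypothetical helper: one CSV per ticker per day, e.g. GS_2018-08-20.csv.
    day = day or datetime.date.today()
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, '{}_{}.csv'.format(ticker, day.isoformat()))
    option_data.to_csv(path)
    return path

# Demo with a one-row stand-in frame and a temporary folder.
demo = pd.DataFrame({'Strike': ['205.0']}, index=['GS180824C00205000'])
path = save_snapshot(demo, 'GS', tempfile.mkdtemp(), datetime.date(2018, 8, 20))
```

In a daily job, the frame returned by `getOptionFromYahoo('GS')` would take the place of `demo`, after checking that it is not `None`.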

Finally, the reference I learned the above from:
https://www.pythoncentral.io/python-beautiful-soup-example-yahoo-finance-scraper/