# Web Scraping Yahoo! Finance with pandas read_html

In this tutorial, we will use the **pandas read_html** function in Python to scrape data from *Yahoo! Finance*.

We will illustrate using Ford's Statistics page on *Yahoo! Finance* ('https://finance.yahoo.com/quote/F/key-statistics?p=F').

#### Scraping Data from Tables

The **pandas read_html** function parses HTML and returns every table it finds as a list of **pandas** DataFrames, making web scraping extremely easy!

However, the **read_html** function is limited to extracting ONLY data that is contained within *table* tags in the HTML code. If the data you want to scrape is NOT contained in a *table* tag, this method will not work.

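The **read_html** function takes one required argument that specifies the HTML source: a web site address (URL), a file path or file-like object, or HTML text. In recent versions of pandas, passing HTML text directly as a plain string is deprecated; wrap it in `io.StringIO` instead. Rather than handing **read_html** the URL, the code below loads the page in a headless Chrome browser with Selenium and passes the rendered HTML to **read_html**. This also helps with pages that build their content with JavaScript.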
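To see both points in isolation, here is a minimal sketch that feeds **read_html** a small hand-written HTML string (not Yahoo's actual markup) containing one `<table>` and one `<div>`. Only the table comes back. The string is wrapped in `io.StringIO`, and the sketch assumes a parser such as lxml is installed, since **read_html** needs one.

```python
from io import StringIO

import pandas as pd

# A tiny hand-written page: one <div> that is not a table, plus one <table>
html = """
<html><body>
  <div>Beta (5Y Monthly): 1.63</div>
  <table>
    <tr><td>Market Cap</td><td>42.13B</td></tr>
    <tr><td>Trailing P/E</td><td>12.05</td></tr>
  </table>
</body></html>
"""

# read_html always returns a list, with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(html))

print(len(tables))  # prints 1: the <div> text is ignored
print(tables[0])
```

Because the table has no `<thead>` or `<th>` cells, pandas numbers the columns 0 and 1, which matches the column names in the Yahoo! tables further down.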

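```python
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Set up headless Chrome options
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Optional: runs the browser without opening a window
driver = webdriver.Chrome(options=options)

try:
    # Load the Yahoo Finance Statistics page
    url = "https://finance.yahoo.com/quote/F/key-statistics?p=F"
    driver.get(url)

    # Wait up to 5 seconds for a <table> element to appear in the page
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.TAG_NAME, 'table')))

    # Retrieve the rendered HTML
    html = driver.page_source
finally:
    # Close the browser, even if loading the page failed
    driver.quit()

# Wrap the HTML text in StringIO, as recent pandas versions expect
dfs = pd.read_html(StringIO(html))
```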



Let's print out the DataFrames in our **dfs** list to see which data on the web page is contained in tables.

```python
for df in dfs:
    print(df)
    print('\n\n----------------------------------------------------------------------------------\n\n')
```

```
                 Unnamed: 0  Current 9/30/2024 6/30/2024 3/31/2024 12/31/2023  \
0                Market Cap   42.13B    41.98B    50.06B    52.77B     48.80B   
1          Enterprise Value  164.24B   160.31B   167.02B   163.71B    152.22B   
2              Trailing P/E    12.05     11.00     12.93     12.30       7.97   
3               Forward P/E     6.02      5.33      6.06      7.22       7.05   
4  PEG Ratio (5yr expected)     0.88      0.57      0.66      0.78       0.73   
5               Price/Sales     0.23      0.24      0.29      0.30       0.28   
6                Price/Book     0.95      0.96      1.17      1.23       1.10   
7  Enterprise Value/Revenue     0.90      0.89      0.94      0.93       0.87   
8   Enterprise Value/EBITDA    15.82     14.36     15.02     13.86      10.28   

  9/30/2023  
0    49.71B  
1   151.74B  
2     12.06  
3      6.25  
4      0.65  
5      0.29  
6      1.14  
7      0.89  
8     12.32  


-----------------------------------------------
```
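Printing every table in full produces a lot of output. A more compact way to find the table you want is one summary line per DataFrame: its position in the list, its shape, and its first label. The sketch below runs on a small hand-built stand-in for **dfs** (an assumption, so it works without a browser); with the real scraped list you would drop the stand-in and keep the loop.

```python
import pandas as pd

# Hypothetical stand-in for the scraped list: two small tables shaped like the
# Yahoo! Finance output (first column holds the row labels)
dfs = [
    pd.DataFrame({0: ['Beta (5Y Monthly)', '52 Week Range 3'], 1: ['1.63', '5.47%']}),
    pd.DataFrame({0: ['Avg Vol (3 month) 3', 'Shares Outstanding 5'], 1: ['50.24M', '3.9B']}),
]

# One summary tuple per table: (position in list, (rows, columns), first cell)
summary = [(i, df.shape, df.iloc[0, 0]) for i, df in enumerate(dfs)]

for row in summary:
    print(row)
```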

Let's say we want to extract only the 5-year monthly Beta. On the page as scraped here, it sits in the ninth DataFrame in the **dfs** list, which is **dfs[8]** because Python lists are indexed starting at 0.

```python
dfs[8]
```

|   | 0 | 1 |
|---|---|---|
| 0 | Beta (5Y Monthly) | 1.63 |
| 1 | 52 Week Range 3 | 5.47% |
| 2 | S&P 500 52-Week Change 3 | 31.94% |
| 3 | 52 Week High 3 | 14.85 |
| 4 | 52 Week Low 3 | 9.49 |
| 5 | 50-Day Moving Average 3 | 10.77 |
| 6 | 200-Day Moving Average 3 | 11.84 |


We can extract a single value from a DataFrame by position using the **.iloc[rownum, colnum]** indexer. Here *rownum* and *colnum* are the row and column positions, counted starting at 0.

Beta is in the first row and second column, so *rownum* is 0 (the first row) and *colnum* is 1 (the second column).
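Positional indexing breaks silently if Yahoo! adds or reorders rows. For comparison, here is a sketch that reads the same cell both by position with **.iloc** and by row label with **.loc**, using a small hand-built copy of the first rows of the table above (the stand-in DataFrame is an assumption, not the scraped object).

```python
import pandas as pd

# Stand-in for dfs[8], using the first three rows shown above
stats = pd.DataFrame({
    0: ['Beta (5Y Monthly)', '52 Week Range 3', 'S&P 500 52-Week Change 3'],
    1: ['1.63', '5.47%', '31.94%'],
})

# Positional lookup: first row (0), second column (1)
by_position = stats.iloc[0, 1]

# Label lookup: make column 0 the index, then ask for the row by its name
by_label = stats.set_index(0).loc['Beta (5Y Monthly)', 1]

print(by_position, by_label)
```

Both lookups return the same cell here; the label version keeps working if a row is inserted above Beta, as long as the label text itself does not change.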

```python
beta = dfs[8].iloc[0, 1]
print(beta)
```

```
1.63
```
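Note that **beta** prints like a number but is most likely text: in the scraped table, plain numbers and percentages share column 1, so **read_html** cannot store that column as floats. The sketch below mirrors that situation with a hand-built stand-in (an assumption) and converts the value explicitly before using it in a comparison.

```python
import pandas as pd

# Stand-in for dfs[8]: like the scraped table, column 1 holds text such as
# '1.63' and '5.47%', so its values are strings rather than floats
stats = pd.DataFrame({0: ['Beta (5Y Monthly)', '52 Week Range 3'], 1: ['1.63', '5.47%']})

beta = stats.iloc[0, 1]
print(type(beta))  # <class 'str'>

# Convert explicitly before doing arithmetic or comparisons with it
beta_value = float(beta)
print(beta_value > 1)  # a beta above 1 means the stock moved more than the market
```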


#### Exercise -- Practice Using pandas read_html

1. Obtain the 'Trailing Annual Dividend Yield' for Ford listed on Ford's Yahoo Finance Statistics page.
2. Obtain the 'Implied Shares Outstanding' for Ford listed on Ford's Yahoo Finance Statistics page.
3. Create a function to obtain the 'Trailing Annual Dividend Yield' and the 'Implied Shares Outstanding' for any ticker. Then extract these items for 'F', 'AAPL', and 'AMZN' and save the data to a new pandas DataFrame.

#### Solution for # 1

```python
dfs[10]
```

|   | 0 | 1 |
|---|---|---|
| 0 | Forward Annual Dividend Rate 4 | 0.6 |
| 1 | Forward Annual Dividend Yield 4 | 5.66% |
| 2 | Trailing Annual Dividend Rate 3 | 0.60 |
| 3 | Trailing Annual Dividend Yield 3 | 5.66% |
| 4 | 5 Year Average Dividend Yield 4 | 5.76 |
| 5 | Payout Ratio 4 | 68.18% |
| 6 | Dividend Date 3 | 12/2/2024 |
| 7 | Ex-Dividend Date 4 | 11/7/2024 |
| 8 | Last Split Factor 2 | 1748175:1000000 |
| 9 | Last Split Date 3 | 8/3/2000 |


```python
divrate = dfs[10].iloc[3, 1]
print(divrate)
```

```
5.66%
```
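In this table the 'Trailing Annual Dividend Rate' row sits directly above the 'Trailing Annual Dividend Yield' row, so a one-row slip in position returns the wrong figure. A lookup by label avoids that. The sketch below uses a hypothetical helper, **lookup**, that strips the footnote number Yahoo! appends to each label (such as the trailing ` 3` in 'Trailing Annual Dividend Rate 3') before comparing. The stand-in table is hand-built from rows shown above. One caveat of this assumption: a label that genuinely ends in a number would also lose it.

```python
import pandas as pd

def lookup(table, label):
    """Hypothetical helper: return column 1 of the row whose label matches.

    A trailing ' <digits>' footnote marker is removed from each label
    before comparing, so 'Payout Ratio 4' matches 'Payout Ratio'.
    """
    labels = table[0].astype(str).str.replace(r'\s+\d+$', '', regex=True)
    matches = table.loc[labels == label, 1]
    if matches.empty:
        raise KeyError(label)
    return matches.iloc[0]

# Stand-in for dfs[10], using rows shown above
dividends = pd.DataFrame({
    0: ['Forward Annual Dividend Rate 4', 'Trailing Annual Dividend Rate 3',
        'Trailing Annual Dividend Yield 3', 'Payout Ratio 4'],
    1: ['0.6', '0.60', '5.66%', '68.18%'],
})

print(lookup(dividends, 'Trailing Annual Dividend Rate'))
print(lookup(dividends, 'Trailing Annual Dividend Yield'))
```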


#### Solution for # 2

```python
dfs[9]
```

|   | 0 | 1 |
|---|---|---|
| 0 | Avg Vol (3 month) 3 | 50.24M |
| 1 | Avg Vol (10 day) 3 | 64.43M |
| 2 | Shares Outstanding 5 | 3.9B |
| 3 | Implied Shares Outstanding 6 | 3.97B |
| 4 | Float 8 | 3.89B |
| 5 | % Held by Insiders 1 | 0.27% |
| 6 | % Held by Institutions 1 | 58.01% |
| 7 | Shares Short (10/15/2024) 4 | 99.65M |
| 8 | Short Ratio (10/15/2024) 4 | 2.05 |
| 9 | Short % of Float (10/15/2024) 4 | 2.77% |


```python
shrout = dfs[9].iloc[3, 1]
print(shrout)
```

```
3.97B
```
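The table positions (**dfs[9]**, **dfs[10]**) can also shift when Yahoo! changes its layout. One option is to pick the table by its content instead of its position. The hypothetical helper **find_table** below returns the first DataFrame whose first column contains a given label; the two stand-in tables are hand-built from rows shown earlier. **read_html** itself also accepts a `match` argument that keeps only tables whose text matches a string or regular expression, which is another way to narrow the list.

```python
import pandas as pd

def find_table(tables, label):
    """Hypothetical helper: return the first table whose first column contains label."""
    for table in tables:
        # iloc[:, 0] works whether the first column is named 0 or 'Unnamed: 0'
        if table.iloc[:, 0].astype(str).str.contains(label, regex=False).any():
            return table
    raise ValueError(f'no table mentions {label!r}')

# Stand-ins for two of the scraped tables
dfs = [
    pd.DataFrame({0: ['Beta (5Y Monthly)'], 1: ['1.63']}),
    pd.DataFrame({0: ['Shares Outstanding 5', 'Implied Shares Outstanding 6'],
                  1: ['3.9B', '3.97B']}),
]

share_table = find_table(dfs, 'Implied Shares Outstanding')
print(share_table)
```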


#### Solution for # 3

```python
from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def get_data(ticker):
    # Set up headless Chrome options
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')  # Optional: runs the browser without opening a window
    driver = webdriver.Chrome(options=options)

    try:
        # Load the Yahoo Finance Statistics page for this ticker
        url = 'https://finance.yahoo.com/quote/' + ticker + '/key-statistics?p=' + ticker
        driver.get(url)

        # Wait up to 5 seconds for a <table> element to appear in the page
        WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.TAG_NAME, 'table')))

        # Retrieve the rendered HTML
        html = driver.page_source
    finally:
        # Close the browser, even if loading the page failed
        driver.quit()

    dfs = pd.read_html(StringIO(html))
    divrate = dfs[10].iloc[3, 1]  # same cell as in Solution for # 1
    shrout = dfs[9].iloc[3, 1]    # same cell as in Solution for # 2
    return divrate, shrout

# List of tickers to obtain
tickers = ['F', 'AAPL', 'AMZN']

# Collect one row per ticker, then build the DataFrame in a single step
rows = []
for ticker in tickers:
    divrate, shrout = get_data(ticker)
    rows.append({'ticker': ticker, 'divrate': divrate, 'shrout': shrout})

df = pd.DataFrame(rows, columns=['ticker', 'divrate', 'shrout'])

# Display the df DataFrame
df
```



|   | ticker | divrate | shrout |
|---|---|---|---|
| 0 | F | 5.66% | 3.97B |
| 1 | AAPL | 0.44% | 15.17B |
| 2 | AMZN | 0.00% | 10.63B |
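The values in this table are still text ('5.66%', '3.97B'), so they cannot be sorted or multiplied as numbers yet. The sketch below uses a hypothetical helper, **to_number**, that turns a trailing `%` into a fraction and a K/M/B/T suffix into a multiplier. The suffix set is an assumption based on the values shown in this tutorial, and text it does not recognize (for example 'N/A') makes `float` raise a `ValueError`. The DataFrame is the results table above, re-entered by hand.

```python
import pandas as pd

# Multipliers for magnitude suffixes (assumed set; M and B appear above)
SUFFIXES = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}

def to_number(text):
    """Hypothetical helper: '5.66%' becomes about 0.0566, '3.97B' about 3.97e9."""
    text = str(text).strip().replace(',', '')
    if text.endswith('%'):
        return float(text[:-1]) / 100
    if text and text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)

# The results table shown above, re-entered by hand
df = pd.DataFrame({
    'ticker': ['F', 'AAPL', 'AMZN'],
    'divrate': ['5.66%', '0.44%', '0.00%'],
    'shrout': ['3.97B', '15.17B', '10.63B'],
})

# Convert both columns to floats so they can be sorted, compared, or multiplied
df['divrate'] = df['divrate'].map(to_number)
df['shrout'] = df['shrout'].map(to_number)
print(df)
```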
