<h1>Scraping Stock Data</h1>
<h2>Description</h2>


I extract stock price and revenue data for Tesla and GameStop, then display this data in graphs.


<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li>Define a Function that Makes a Graph</li>
    </ul>
</div>

<hr>


In [2]:
!pip install yfinance
#!pip install pandas
#!pip install requests
!pip install bs4
#!pip install plotly

Collecting yfinance
  Downloading yfinance-0.1.64.tar.gz (26 kB)
Collecting lxml>=4.5.1
  Downloading lxml-4.6.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.3 MB)
Building wheels for collected packages: yfinance
  Building wheel for yfinance (setup.py) ... done
  Created wheel for yfinance: filename=yfinance-0.1.64-py2.py3-none-any.whl size=24109 sha256=7fad7c9c047e047a9d0ccc406b787bd3e5935f4db31aeecff9ccdfbde1e16e46
  Stored in directory: /root/.cache/pip/wheels/86/fe/9b/a4d3d78796b699e37065e5b6c27b75cff448ddb8b24943c288
Successfully built yfinance
Installing collected packages: lxml, yfinance
  Attempting uninstall: lxml
    Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successfully installed lxml-4.6.4 yfinance-0.1.64


In [3]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Define Graphing Function


I define the function `make_graph`. It takes a dataframe with stock data (dataframe must contain Date and Close columns), a dataframe with revenue data (dataframe must contain Date and Revenue columns), and the name of the stock.


In [4]:
def make_graph(stock_data, revenue_data, stock):
    # Two stacked panels that share the date axis: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # infer_datetime_format is deprecated in pandas 2.0, so the dates are parsed with the default settings
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data.Date), y=stock_data.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data.Date), y=revenue_data.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
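
Since `make_graph` relies on specific column names, it can help to check a dataframe before passing it in. The following is a minimal sketch of such a check; the helper name `missing_columns` and the sample dataframes are my own illustrations and are not part of the notebook.

```python
import pandas as pd

def missing_columns(frame, required):
    """Return the required column names that are absent from frame."""
    return [name for name in required if name not in frame.columns]

# Tiny hand-made samples, not real market data.
stock_sample = pd.DataFrame({"Date": ["2021-01-04", "2021-01-05"], "Close": [1.0, 2.0]})
revenue_sample = pd.DataFrame({"Date": ["2020-12-31"], "Sales": [100.0]})

print(missing_columns(stock_sample, ["Date", "Close"]))      # []
print(missing_columns(revenue_sample, ["Date", "Revenue"]))  # ['Revenue']
```

An empty list means the dataframe has everything `make_graph` reads from it.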




## yfinance to Extract Stock Data


Using the `Ticker` function, I pass the ticker symbol of the stock I want to extract data on to create a ticker object. The stock is Tesla and its ticker symbol is `TSLA`.


In [5]:
tesla = yf.Ticker("TSLA")

Using the ticker object and the `history` function, I extract stock information and save it in a dataframe named `tesla_data`. I set the `period` parameter to `max` so I get information for the maximum amount of time.


In [6]:
tesla_data = tesla.history(period="max")

I reset the index of `tesla_data`, save the result, replace the row labels with blanks, and display the first five rows using the `head` function.


In [7]:
tesla_data=tesla_data.reset_index()
blankIndex=[''] * len(tesla_data)
tesla_data.index=blankIndex
tesla_data.head(5)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
,2010-06-29,3.8,5.0,3.508,4.778,93831500,0,0.0
,2010-06-30,5.158,6.084,4.66,4.766,85935500,0,0.0
,2010-07-01,5.0,5.184,4.054,4.392,41094000,0,0.0
,2010-07-02,4.6,4.62,3.742,3.84,25699000,0,0.0
,2010-07-06,4.0,4.0,3.166,3.222,34334500,0,0.0
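
The reset matters because `make_graph` reads `Date` as a column, while the output above suggests the dates start out as the index of the dataframe returned by `history`. Here is a small sketch of the effect on a hand-made dataframe; the sample is an assumption standing in for real `history` output.

```python
import pandas as pd

# Hand-made stand-in for the output of `history`: the dates live in the index.
sample = pd.DataFrame(
    {"Close": [4.778, 4.766]},
    index=pd.DatetimeIndex(["2010-06-29", "2010-06-30"], name="Date"),
)
print("Date" in sample.columns)  # False

# reset_index moves the named index into an ordinary column.
flat = sample.reset_index()
print(list(flat.columns))  # ['Date', 'Close']
```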


## Webscraping to Extract Tesla Revenue Data


I use the `requests` library to download the webpage [https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue](https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue) and then save the text of the response as a variable named `html_data`.


In [8]:
html='https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue'
html_data=requests.get(html).text
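
The download above does not check whether the request succeeded, so an error page would be parsed like any other page. A slightly more defensive sketch is shown below. The helper name `fetch_html`, the 10 second timeout, and the `User-Agent` string are my own assumptions, not something the site documents.

```python
import requests

def fetch_html(url, timeout=10):
    """Download a page and return its text, raising on HTTP errors."""
    # The User-Agent value is an illustrative assumption; some sites are
    # stricter with clients that do not send one.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; stock-scraper-example)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

With this helper, the cell above would become `html_data = fetch_html(html)`.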

I parse the HTML data using `BeautifulSoup` with the `html5lib` parser, then select the second table on the page.

In [9]:
soup=BeautifulSoup(html_data, 'html5lib')
table = soup.find_all('table')[1]
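
Picking the table by position works only while the page keeps its layout. As an alternative, the table can be chosen by the text it contains. This is a sketch on hand-written HTML; the helper name `find_table`, the heading strings, and the placeholder values are assumptions, and I use the built-in `html.parser` here so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hand-written HTML standing in for a downloaded page (placeholder values).
sample_html = """
<table><thead><tr><th>Annual Revenue</th></tr></thead>
<tbody><tr><td>2020</td><td>$1,000</td></tr></tbody></table>
<table><thead><tr><th>Quarterly Revenue</th></tr></thead>
<tbody><tr><td>2020-12-31</td><td>$250</td></tr></tbody></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

def find_table(soup, heading_text):
    """Return the first table whose text contains heading_text, or None."""
    for candidate in soup.find_all("table"):
        if heading_text in candidate.get_text():
            return candidate
    return None

table = find_table(soup, "Quarterly Revenue")
first_cell = table.find("tbody").find("tr").find_all("td")[0].string
print(first_cell)  # 2020-12-31
```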

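Using Beautiful Soup, I extract the table with Tesla quarterly revenue into a dataframe named `tesla_revenue`, then strip the `$` and `,` characters so the `Revenue` column can be converted to floats.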


In [10]:
# Collect one dict per table row, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0).
rows = []
for row in table.find("tbody").find_all("tr"):
    line = row.find_all("td")
    date = str(line[0].string)
    revenue = line[1].string
    rows.append({"Date": date, "Revenue": revenue})
tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])

tesla_revenue['Revenue']=tesla_revenue['Revenue'].apply(lambda x: x.replace('$', '').replace(',', '')
                                if isinstance(x, str) else x).astype(float)
blankIndex=[''] * len(tesla_revenue)
tesla_revenue.index=blankIndex
tesla_revenue.head()

Unnamed: 0,Date,Revenue
,2021-09-30,13757.0
,2021-06-30,11958.0
,2021-03-31,10389.0
,2020-12-31,10744.0
,2020-09-30,8771.0


I remove the rows in the dataframe whose `Revenue` value is missing.


In [11]:
tesla_revenue.dropna(subset=['Revenue'], inplace=True)
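
Note that `dropna` removes only missing values (`NaN` or `None`), so an empty string has to be turned into `NaN` before it can be dropped. Here is a minimal sketch of the clean-then-drop sequence on a hand-made sample; using `pd.to_numeric` with `errors="coerce"` is my own substitution for the `astype(float)` call above.

```python
import pandas as pd

# Hand-made sample: one normal value, one empty string, one missing cell.
sample = pd.DataFrame(
    {"Date": ["2021-09-30", "2021-06-30", "2021-03-31"],
     "Revenue": ["$13,757", "", None]}
)

# Strip "$" and "," as plain characters (regex=False), then coerce anything
# that is not a number (such as the empty string) to NaN.
cleaned = sample["Revenue"].str.replace("$", "", regex=False).str.replace(",", "", regex=False)
sample["Revenue"] = pd.to_numeric(cleaned, errors="coerce")

# Only now does dropna see the empty cell as missing.
sample = sample.dropna(subset=["Revenue"])
print(sample["Revenue"].tolist())  # [13757.0]
```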

I display the last five rows of the `tesla_revenue` dataframe using the `tail` function.


In [12]:
tesla_revenue.tail()

Unnamed: 0,Date,Revenue
,2010-09-30,31.0
,2010-06-30,28.0
,2010-03-31,21.0
,2009-09-30,46.0
,2009-06-30,27.0




## yfinance to Extract GameStop Stock Data


Using the `Ticker` function, I pass the ticker symbol of the stock I want to extract data on to create a ticker object. The stock is GameStop and its ticker symbol is `GME`.


In [13]:
gm_stop = yf.Ticker('GME')

Using the ticker object and the `history` function, I extract stock information and save it in a dataframe named `gme_data`. I set the `period` parameter to `max` so I get information for the maximum amount of time.


In [14]:
gme_data = gm_stop.history(period='max')

I reset the index and display the first five rows of the `gme_data` dataframe using the `head` function.


In [15]:
gme_data=gme_data.reset_index()
blankIndex=[''] * len(gme_data)
gme_data.index=blankIndex
gme_data.head(5)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
,2002-02-13,6.480514,6.7734,6.413184,6.766667,19054000,0.0,0.0
,2002-02-14,6.850828,6.864294,6.682503,6.733,2755400,0.0,0.0
,2002-02-15,6.733002,6.749834,6.632007,6.699337,2097400,0.0,0.0
,2002-02-19,6.665672,6.665672,6.312189,6.430017,1852600,0.0,0.0
,2002-02-20,6.463683,6.64884,6.413185,6.64884,1723200,0.0,0.0


## Webscraping to Extract GME Revenue Data


I use the `requests` library to download the webpage [https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue](https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue) and then save the text of the response as a variable named `html_data`.

In [16]:
url = 'https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue'
html_data = requests.get(url).text

I parse the HTML data using `BeautifulSoup` with the `html5lib` parser, then select the second table on the page.


In [17]:
soup = BeautifulSoup(html_data, 'html5lib')
table = soup.find_all('table')[1]

Using Beautiful Soup, I extract the table with GameStop quarterly revenue into a dataframe named `gme_revenue`.


In [18]:
# Collect one dict per table row, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0).
rows = []
for row in table.find("tbody").find_all("tr"):
    line = row.find_all("td")
    date = str(line[0].string)
    revenue = line[1].string
    rows.append({"Date": date, "Revenue": revenue})
gme_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])

gme_revenue['Revenue']=gme_revenue['Revenue'].apply(lambda x: x.replace('$', '').replace(',', '')
                                if isinstance(x, str) else x).astype(float)

#blankIndex=[''] * len(gme_revenue)
#gme_revenue.index=blankIndex
gme_revenue.head()

Unnamed: 0,Date,Revenue
0,2021-07-31,1183.0
1,2021-04-30,1277.0
2,2021-01-31,2122.0
3,2020-10-31,1005.0
4,2020-07-31,942.0


I display the last five rows of the `gme_revenue` dataframe using the `tail` function.


In [19]:
gme_revenue.tail(5)

Unnamed: 0,Date,Revenue
62,2006-01-31,1667.0
63,2005-10-31,534.0
64,2005-07-31,416.0
65,2005-04-30,475.0
66,2005-01-31,709.0
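
The Tesla and GameStop extractions follow the same steps, so they could share one helper. Below is a sketch of such a helper under my own assumptions: the name `table_to_frame` is hypothetical, the sample HTML is hand-written, and rows with fewer than two cells are skipped instead of raising an error.

```python
import pandas as pd
from bs4 import BeautifulSoup

def table_to_frame(table):
    """Convert a two-column HTML table (date, revenue) into a dataframe."""
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip header rows and malformed rows
        rows.append({"Date": cells[0].get_text(strip=True),
                     "Revenue": cells[1].get_text(strip=True)})
    frame = pd.DataFrame(rows, columns=["Date", "Revenue"])
    # Remove "$" and "," literally, then turn non-numeric cells into NaN.
    cleaned = frame["Revenue"].str.replace("$", "", regex=False).str.replace(",", "", regex=False)
    frame["Revenue"] = pd.to_numeric(cleaned, errors="coerce")
    return frame

# Hand-written HTML standing in for a scraped table.
sample_html = """
<table><tbody>
<tr><td>2021-07-31</td><td>$1,183</td></tr>
<tr><td>2021-04-30</td><td>$1,277</td></tr>
</tbody></table>
"""
sample_table = BeautifulSoup(sample_html, "html.parser").find("table")
print(table_to_frame(sample_table)["Revenue"].tolist())  # [1183.0, 1277.0]
```

With a helper like this, each stock would need only the download, the table selection, and one call such as `gme_revenue = table_to_frame(table)`.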


## Plotting Tesla Stock Graph


I use the `make_graph` function to graph the Tesla stock data.


In [20]:
stock = "TESLA"
make_graph(tesla_data,tesla_revenue, stock)

## Plotting GameStop Stock Graph


I use the `make_graph` function to graph the GameStop stock data.


In [21]:
stock='GAMESTOP'
make_graph(gme_data,gme_revenue, stock)
