In [3]:
#!pip install requests --upgrade -q

# -q (quiet) suppresses pip's output

## 1. Download - 100 Most Active Stocks
We'll start by installing the requests library so we can download the web page and save it as a file.

In [4]:
import requests 

Here is a snapshot of the page that we will be parsing:

![Getting Started](images/yfin_mostActive.jpg)

In [13]:
headers = {
    'User-Agent': 'Ben Dover',
    'From': 'ben@outlook.com'
}

In [14]:
url = 'https://finance.yahoo.com/most-active?offset=0&count=100'
response = requests.get(url, headers=headers)

In [15]:
response.status_code

200
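
A status code of 200 means the request succeeded. As a minimal sketch, the check can be folded into the request itself with `raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses. The function name `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, headers=None, timeout=10):
    """Return the page body as text, raising on HTTP errors.

    fetch_html and the default timeout are hypothetical choices for this sketch.
    """
    # A timeout stops the call from hanging indefinitely on a slow server
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return response.text
```

It could be called as `fetch_html(url, headers=headers)` with the `url` and `headers` defined above.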

Let's look at the response contents as text.

In [21]:
# Display the first 1,000 characters of the response body
url_contents = response.text
url_contents[:1000]



'<!DOCTYPE html><html data-color-theme="light" id="atomic" class="NoJs desktop" lang="en-US"><head prefix="og: https://ogp.me/ns#"><script>window.performance && window.performance.mark && window.performance.mark(\'PageStart\');</script><meta charSet="utf-8"/><title>Most Active Stocks Today - Yahoo Finance</title><meta name="keywords" content="Stock Screener, industry, index membership, share data, stock price, market cap, beta, sales, profitability, valuation ratios, analyst estimates, large cap value, bargain growth, preset stock screens"/><meta http-equiv="x-dns-prefetch-control" content="on"/><meta property="twitter:dnt" content="on"/><meta property="fb:app_id" content="458584288257241"/><meta name="theme-color" content="#400090"/><meta name="viewport" content="width=device-width, initial-scale=1"/><meta name="description" lang="en-US" content="See the list of the most active stocks today, including share price change and percentage, trading volume, intraday highs and lows, and day 

We will now save the response as an HTML document.

In [27]:
with open('yfin_most_active.html', 'w', encoding='utf-8') as file:
    file.write(url_contents)

We can create a function that downloads any URL as an HTML file for future projects.

In [24]:
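def download_web_page(url, html_title):
    # Identify ourselves to the server
    headers = {
        'User-Agent': 'Ben Dover',
        'From': 'ben@outlook.com'
    }
    # Call the requests module; `response` does not exist until this call returns
    response = requests.get(url, headers=headers)
    status_code = response.status_code
    # Only save the page when the request succeeded (2xx)
    if status_code in range(200, 300):
        url_contents = response.text
        # Assumed intent: html_title is the output file name (without extension)
        with open('{}.html'.format(html_title), 'w', encoding='utf-8') as file:
            file.write(url_contents)
        print('Status code is within okay range of {}'.format(status_code))
    else:
        # Nothing is saved for a non-2xx response
        return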

## 2. Create a BeautifulSoup document for parsing - 100 Most Active Stocks

We'll install the BeautifulSoup library so we can parse the page.

In [25]:
#!pip install beautifulsoup4 --upgrade -q

In [26]:
from bs4 import BeautifulSoup

We'll open our saved file so that we can turn it into a BeautifulSoup document.

In [28]:
with open('yfin_most_active.html', 'r', encoding='utf-8') as f:
    html_source = f.read()

In [29]:
doc = BeautifulSoup(html_source, 'html.parser')

In [30]:
type(doc)

bs4.BeautifulSoup

## 3. Parse the BeautifulSoup document - 100 Most Active Stocks

We'll now identify the tags and classes that hold the stock data.

As you can see in the image below, each stock's information is located within a `<tr>` tag.

The class attribute differs slightly between rows because alternating rows are coloured differently. We can search on `class_='simpTblRow'`, a class name that is common to both row types.

![HTML inspector](images/html_inspector.jpg)

In [31]:
tr_class_tags = doc.find_all('tr', class_='simpTblRow')
tr_class_tags[:2]

[<tr class="simpTblRow Bgc($hoverBgColor):h BdB Bdbc($seperatorColor) Bdbc($tableBorderBlue):h H(32px) Bgc($lv2BgColor)"><td aria-label="Symbol" class="Va(m) Ta(start) Pstart(6px) Pend(10px) Miw(90px) Start(0) Pend(10px) simpTblRow:h_Bgc($hoverBgColor) Pos(st) Bgc($lv3BgColor) Z(1) Bgc($lv2BgColor) Ta(start)! Fz(s)" colspan=""><label class="Ta(c) Pos(r) Va(tb) Pend(5px) D(n)--print" data-id="portfolio-checkbox"><input aria-label="Select TSLA" class="Pos(a) Op(0) checkbox" type="checkbox"/><svg class="Va(m)! H(16px) W(16px) checkbox:f+Stk($linkColor)! checkbox:f+Fill($linkColor)! Stk($plusGray) Fill($plusGray) Cur(p)" data-icon="checkbox-unchecked" height="16" style="stroke-width:0;vertical-align:bottom" viewbox="0 0 24 24" width="16"><path d="M3 3h18v18H3V3zm19-2H2c-.553 0-1 .448-1 1v20c0 .552.447 1 1 1h20c.552 0 1-.448 1-1V2c0-.552-.448-1-1-1z"></path></svg></label><a class="Fw(600) C($linkColor)" data-test="quoteLink" href="/quote/TSLA?p=TSLA" title="Tesla, Inc.">TSLA</a><div class="

In [32]:
len(tr_class_tags)

100

We have the correct number of tr tags.
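
Searching on `simpTblRow` alone works because BeautifulSoup treats `class` as a multi-valued attribute: `class_='simpTblRow'` matches any tag whose class list contains that name. A small sketch on made-up HTML (the extra class names `odd` and `even` and the `other` row are invented for illustration):

```python
from bs4 import BeautifulSoup

# Made-up markup: two rows share 'simpTblRow' but differ in their other classes
sample_html = """
<table>
  <tr class="simpTblRow odd"><td>TSLA</td></tr>
  <tr class="simpTblRow even"><td>AMZN</td></tr>
  <tr class="other"><td>ignored</td></tr>
</table>
"""

sample_doc = BeautifulSoup(sample_html, 'html.parser')
# class_ matches against each individual class name, not the whole attribute string
rows = sample_doc.find_all('tr', class_='simpTblRow')
tickers = [row.find('td').text for row in rows]
print(tickers)
```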

Let's look at the first `<tr>` tag, which holds the stock `TSLA`.


In [36]:
tr_class_tags1 = tr_class_tags[0]


If we look within the first `<tr>` tag in the image below, we can see that each field of the stock sits in its own `<td>` tag. We are going to extract these tags.


![tdtag](images/yfin_tdtag.jpg)

In [40]:
td_tag = tr_class_tags1.find_all('td')
td_tag

[<td aria-label="Symbol" class="Va(m) Ta(start) Pstart(6px) Pend(10px) Miw(90px) Start(0) Pend(10px) simpTblRow:h_Bgc($hoverBgColor) Pos(st) Bgc($lv3BgColor) Z(1) Bgc($lv2BgColor) Ta(start)! Fz(s)" colspan=""><label class="Ta(c) Pos(r) Va(tb) Pend(5px) D(n)--print" data-id="portfolio-checkbox"><input aria-label="Select TSLA" class="Pos(a) Op(0) checkbox" type="checkbox"/><svg class="Va(m)! H(16px) W(16px) checkbox:f+Stk($linkColor)! checkbox:f+Fill($linkColor)! Stk($plusGray) Fill($plusGray) Cur(p)" data-icon="checkbox-unchecked" height="16" style="stroke-width:0;vertical-align:bottom" viewbox="0 0 24 24" width="16"><path d="M3 3h18v18H3V3zm19-2H2c-.553 0-1 .448-1 1v20c0 .552.447 1 1 1h20c.552 0 1-.448 1-1V2c0-.552-.448-1-1-1z"></path></svg></label><a class="Fw(600) C($linkColor)" data-test="quoteLink" href="/quote/TSLA?p=TSLA" title="Tesla, Inc.">TSLA</a><div class="W(3px) Pos(a) Start(100%) T(0) H(100%) Bg($pfColumnFakeShadowGradient) Pe(n) Pend(5px)"></div></td>,
 <td aria-label="Na

We can see that the stock ticker is located within the `<a>` tag of the first `<td>`, so we will extract that.

In [41]:
a_tag = td_tag[0].find_all('a', recursive=False)
a_tag

[<a class="Fw(600) C($linkColor)" data-test="quoteLink" href="/quote/TSLA?p=TSLA" title="Tesla, Inc.">TSLA</a>]
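
As a hedged sketch of what can be read from that `<a>` tag: `.text` gives the ticker, and the tag's attributes can be read like dictionary keys. The snippet re-parses the tag shown above so that it runs on its own; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The <a> tag from the output above, re-parsed so the example is self-contained
a_html = ('<a class="Fw(600) C($linkColor)" data-test="quoteLink" '
          'href="/quote/TSLA?p=TSLA" title="Tesla, Inc.">TSLA</a>')
link = BeautifulSoup(a_html, 'html.parser').find('a')

ticker = link.text.strip()      # visible text of the tag
company = link['title']         # attribute access works like a dict lookup
quote_path = link.get('href')   # .get() returns None if the attribute is missing
print(ticker, company, quote_path)
```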

## 4. Create a function to display information - 100 Most Active Stocks

We'll now create a function that parses a `<tr>` tag and returns a dictionary of that stock's information.

In [52]:
def parse_stocks(tr_class_tag_with_n):
    # Each <tr> tag holds one stock, each <td> tag holds one field of that stock, and the <a> tag holds the ticker text.

    td_tag = tr_class_tag_with_n.find_all('td')
    a_tag = td_tag[0].find('a', recursive=False)
    #Stock ticker
    ticker_name = a_tag.text.strip()
    # Stock Name
    name_tag = td_tag[1].text.replace(',',"")
    # Last Price of stock
    # Remove thousands separators (e.g. '1,234.56') before converting
    price_tag = float(td_tag[2].text.replace(',', ''))
    # Stock change
    daily_change_tag = td_tag[3].text
    # Percentage Change
    daily_percentage_change_tag = td_tag[4].text
    # Volume
    volume_tag = td_tag[5].text

    # Return a dictionary
    return {
        'Stock ticker' : ticker_name,
        'Stock name' : name_tag,
        'Last price of stock' : price_tag,
        'Stock change' : daily_change_tag,
        'Stock percentage change' : daily_percentage_change_tag,
        'Volume' : volume_tag,

    }


Let's apply our function to the first stock on the list, TSLA.

In [53]:
parse_stocks(tr_class_tags[0])

{'Stock ticker': 'TSLA',
 'Stock name': 'Tesla Inc.',
 'Last price of stock': 123.22,
 'Stock change': '+4.37',
 'Stock percentage change': '+3.68%',
 'Volume': '183.292M'}
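
Only the price is numeric in this dictionary; the change, percentage change and volume are still strings. A minimal sketch of converting them, assuming volumes use the suffixes K, M, B or T and percentages end in `%` (the helper names `parse_volume` and `parse_percent` are hypothetical):

```python
# Multipliers for the assumed volume suffixes
SUFFIXES = {'K': 1_000, 'M': 1_000_000, 'B': 1_000_000_000, 'T': 1_000_000_000_000}


def parse_volume(text):
    """Convert a string such as '183.292M' to a whole number of shares."""
    text = text.strip().replace(',', '')
    if text and text[-1].upper() in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1].upper()])
    return round(float(text))


def parse_percent(text):
    """Convert a string such as '+3.68%' to a float."""
    return float(text.strip().rstrip('%').replace(',', ''))


print(parse_volume('183.292M'), parse_percent('+3.68%'))
```

These helpers could be applied inside `parse_stocks` or afterwards on the resulting dictionaries, depending on whether the original strings need to be kept.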

We will now use a list comprehension to parse the stock information for all of our stocks at once.

In [54]:
most_active_stocks = [parse_stocks(x) for  x in tr_class_tags]
most_active_stocks[:3]

[{'Stock ticker': 'TSLA',
  'Stock name': 'Tesla Inc.',
  'Last price of stock': 123.22,
  'Stock change': '+4.37',
  'Stock percentage change': '+3.68%',
  'Volume': '183.292M'},
 {'Stock ticker': 'AMZN',
  'Stock name': 'Amazon.com Inc.',
  'Last price of stock': 95.09,
  'Stock change': '+5.22',
  'Stock percentage change': '+5.81%',
  'Volume': '103.126M'},
 {'Stock ticker': 'AAPL',
  'Stock name': 'Apple Inc.',
  'Last price of stock': 133.49,
  'Stock change': '+2.76',
  'Stock percentage change': '+2.11%',
  'Volume': '69.459M'}]

## 5. Create a CSV of the parsed information - 100 Most Active Stocks

We can write a function that creates a CSV file from our parsed information.

In [55]:
def write_csv(items, filename):
    with open(filename, 'w') as f:
        if len(items) == 0:
            return 
        headers = list(items[0].keys())
        f.write(','.join(headers) + '\n')

        for item in items:
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            f.write(','.join(values) + '\n')

In [56]:
write_csv(most_active_stocks, "most-active-stocks.csv")
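
Joining values with commas works here because `parse_stocks` strips commas from the company name. As an alternative sketch, the standard library's `csv.DictWriter` quotes such values automatically, so the names could be kept intact. The function name `write_csv_quoted` is an assumption for illustration.

```python
import csv


def write_csv_quoted(items, filename):
    """Write a list of dictionaries to a CSV file, quoting values where needed."""
    if not items:
        return  # nothing to write, so no file is created
    # newline='' lets the csv module control line endings itself
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=list(items[0].keys()))
        writer.writeheader()
        writer.writerows(items)
```

With this version, a name such as `Tesla, Inc.` is written as a single quoted field and `pd.read_csv` reads it back as one column.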

Let's display our CSV document with the help of the pandas library.

In [57]:
import pandas as pd

In [58]:
df = pd.read_csv('most-active-stocks.csv')
df.head()

|   | Stock ticker | Stock name | Last price of stock | Stock change | Stock percentage change | Volume |
|---|---|---|---|---|---|---|
| 0 | TSLA | Tesla Inc. | 123.22 | 4.37 | +3.68% | 183.292M |
| 1 | AMZN | Amazon.com Inc. | 95.09 | 5.22 | +5.81% | 103.126M |
| 2 | AAPL | Apple Inc. | 133.49 | 2.76 | +2.11% | 69.459M |
| 3 | AMC | AMC Entertainment Holdings Inc. | 4.92 | 0.86 | +21.18% | 51.501M |
| 4 | F | Ford Motor Company | 13.22 | 0.38 | +2.96% | 51.84M |


In [59]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Stock ticker             100 non-null    object 
 1   Stock name               100 non-null    object 
 2   Last price of stock      100 non-null    float64
 3   Stock change             100 non-null    float64
 4   Stock percentage change  100 non-null    object 
 5   Volume                   100 non-null    object 
dtypes: float64(2), object(4)
memory usage: 4.8+ KB
