In [None]:
# Jovian Commit Essentials
# Please retain and execute this cell without modifying the contents for `jovian.commit` to work
!pip install jovian -q
import jovian
jovian.set_project('yahoo-finance-web-scraper')
jovian.set_colab_id('1BRmuZI_aFQWd62cRys6QUgaxAnGrLT6U')

# Web Scraping Yahoo! Finance using multiple techniques in Python

A detailed guide to scraping https://finance.yahoo.com/ using ***requests***, ***BeautifulSoup***, ***selenium***, ***HTML tags*** and data already available in ***JSON*** format.

![](https://imgur.com/7jMFOcE.png)

**What is Web scraping?**

Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program. It's a useful technique for creating datasets for research and learning.

**Introduction**

The main objective of this tutorial is to showcase different web scraping methods that can be applied to any web page. 
This tutorial is for educational purposes only. Before scraping any website, read its Terms & Conditions carefully to check whether you can legally use the data. 

In this project we will perform web scraping using the following 3 techniques, chosen according to the problem statement.
* use `BeautifulSoup` and `HTML tags` to extract data from a web page
* use `selenium` to scrape data from dynamically loading websites 
* scrape data that is already available in `json` format



**The problem statement**

1. Scrape **Stock Market News** (url : https://finance.yahoo.com/topic/stock-market-news/) :<br>
    This web page shows the latest **news** related to the **stock market**. We will extract the data from this page and store it in a `CSV` (comma-separated values) file with the layout shown below.
    ```
    source,headline,url,content,image
    <source of the news>,<news head line>,<news url>,<news content>,<news thumbnail image>
    ```

2. Scrape **Trending Tickers** (url : https://finance.yahoo.com/trending-tickers) :<br>
    This Yahoo! Finance page lists the trending **Tickers** in tabular format. We will scrape the first 8 columns for all available **Tickers** and store them in `CSV` format.
    ```
    Symbol,Name,Last Price,Market Time,Change,% Change,Volume,Market Cap
    COKE,"Coca-Cola Consolidated, Inc.",446.66,4:00PM EST,-136.98,-23.47%,"100,345",4.187B
    ```
        
3. Scrape **Market Events Calendar** (url : https://finance.yahoo.com/calendar) :<br> 
    This page shows **date-wise market events**. The user can select a date and choose one of the following market events: **Earnings**, **Stock Splits**, **Economic Events** & **IPO**. Our aim is to create a script that can be run for any single date and market event, grabs the data and saves it in `CSV` format. If no data is found, the script creates a file containing only the column headers.
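
The header-only requirement in item 3 can be met with pandas directly. The sketch below is a minimal illustration, not the final script: the column names and the helper `events_to_csv_text` are hypothetical, and it writes to an in-memory buffer instead of a file.

```python
import io

import pandas as pd

# Hypothetical column names, used only for illustration.
EVENT_COLUMNS = ['Symbol', 'Company', 'Event Name']


def events_to_csv_text(rows, columns=EVENT_COLUMNS):
    """Return CSV text for rows (a list of dicts); header only if rows is empty."""
    buffer = io.StringIO()
    # Passing columns explicitly keeps the header even when there are no rows.
    pd.DataFrame(rows, columns=columns).to_csv(buffer, index=False)
    return buffer.getvalue()


empty_csv = events_to_csv_text([])
one_row_csv = events_to_csv_text(
    [{'Symbol': 'XYZ', 'Company': 'Example Corp', 'Event Name': 'Earnings'}]
)
print(empty_csv)
```

Because the columns are fixed up front, the empty and the populated case share one code path.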
    


**Future Work**

Automate this process to get the daily calendar, trending tickers & news in CSV files:
- create 6 files daily
- move the old files into an archive folder with a time stamp
- delete files older than 2 weeks
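
The archiving steps listed above could be sketched with the standard library alone. This is only one possible approach, under stated assumptions: the function name `archive_csv_files`, the folder arguments and the file-name pattern are all hypothetical, and a file's age is judged by its modification time.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path


def archive_csv_files(data_dir, archive_dir, max_age_days=14):
    """Move CSV files into archive_dir and prune old archived files."""
    data_dir, archive_dir = Path(data_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Delete archived files whose modification time is older than the cutoff.
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    for old_file in archive_dir.glob('*.csv'):
        if old_file.stat().st_mtime < cutoff:
            old_file.unlink()

    # Move the current files, adding a time stamp to each file name.
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    moved = []
    for csv_file in data_dir.glob('*.csv'):
        target = archive_dir / '{}-{}{}'.format(csv_file.stem, stamp, csv_file.suffix)
        shutil.move(str(csv_file), str(target))
        moved.append(target)
    return moved
```

A daily job could call `archive_csv_files('.', 'archive')` before writing the new files.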

**Prerequisites**

* Knowledge of Python
* Basic knowledge of HTML (helpful, but not necessary)


**How to run the Code**

You can execute the code using the "Run" button at the top of this page and selecting **"Run on Colab"** or **"Run Locally"**.
<br>
<br>
**Setup and Tools**

<u>Run on Colab :</u> 
    You will need to sign in with a Google account to run this notebook on Colab.<br>

<u>Run Locally :</u> Download and install the [Anaconda](https://www.anaconda.com/) distribution. We will be using Jupyter Notebook for writing & executing the code.



**Code Re-usability & Version control**

You can make changes and save your version of the notebook to [Jovian](https://jovian.ai/) by executing the following cells.

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian


In [3]:
# Execute this to save new versions of the notebook
jovian.commit(project="yahoo-finance-web-scraper")


[jovian] Updating notebook "vinodvidhole/yahoo-finance-web-scraper" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/vinodvidhole/yahoo-finance-web-scraper


'https://jovian.ai/vinodvidhole/yahoo-finance-web-scraper'

## 1. Scrape Stock Market News 

Let's kick off with the first objective. Here's an outline of the steps we'll follow:<br>

**1.1 Download & Parse webpage using `requests` and `BeautifulSoup`**<br>
**1.2 Exploring and locating Elements**<br>
**1.3 Extract & Compile the information into python list**<br>
**1.4 Save the extracted information to a CSV file**<br>


### 1.1 Download & Parse webpage using requests and BeautifulSoup

The first step is to install the [`requests`](https://docs.python-requests.org/en/latest/) & [`beautifulsoup4`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) libraries using `pip`.

In [4]:
!pip install requests --upgrade --quiet
!pip install beautifulsoup4 --upgrade --quiet

In [5]:
import requests
from bs4 import BeautifulSoup

The libraries are now installed and imported.<br>

To download the page, we can use `requests.get`, which returns a response object. The HTML of the web page is available in `response.text`.<br>
`response.ok` & [`response.status_code`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) can be used for error handling & tracking.<br> 
Finally, we can use `BeautifulSoup` to parse the HTML; this returns a `bs4.BeautifulSoup` object.

Let's create a function to perform this step.

In [6]:
def get_page(url):
    """Download a webpage and return a beautiful soup doc"""
    response = requests.get(url)
    if not response.ok:
        print('Status code:', response.status_code)
        raise Exception('Failed to load page {}'.format(url))
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

Let's call the `get_page` function and analyze the output.

In [7]:
my_url = 'https://finance.yahoo.com/topic/stock-market-news/' #Global variable 
doc = get_page(my_url)

In [8]:
print('Type of doc: ',type(doc))

Type of doc:  <class 'bs4.BeautifulSoup'>


You can access different properties of the HTML page from `doc`. The following example displays the title of the web page.

In [9]:
doc.find('title')

<title>Latest Stock Market News</title>

**Summary** : We can now use the function `get_page` to download any web page and parse it using beautiful soup.

### 1.2 Exploring and locating Elements
Now it's time to explore the elements and find the required data points on the web page. Web pages are written in a language called HTML (Hyper Text Markup Language). HTML is a fairly simple language made up of *tags* (also called *nodes* or *elements*), e.g. `<a href="https://finance.yahoo.com/" target="_blank">Go to Yahoo! Finance</a>`. An HTML tag has three parts:



1. **Name**: (`html`, `head`, `body`, `div`, etc.) Indicates what the tag represents and how a browser should interpret the information inside it.
2. **Attributes**: (`href`, `target`, `class`, `id`, etc.) Properties of tag used by the browser to customize how a tag is displayed and decide what happens on user interactions.
3. **Children**: A tag can contain some text or other tags or both between the opening and closing segments, e.g., `<div>Some content</div>`.
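
The three parts listed above map directly onto what `BeautifulSoup` exposes for a parsed tag. The short sketch below parses the example link from this section; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<a href="https://finance.yahoo.com/" target="_blank">Go to Yahoo! Finance</a>'
link = BeautifulSoup(html, 'html.parser').find('a')

print(link.name)    # the tag name
print(link.attrs)   # the attributes, as a dictionary
print(link.text)    # the text between the opening and closing segments
```

An individual attribute can also be read with dictionary syntax, for example `link['href']`.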

Now let's inspect the web page source code by right-clicking on the page and selecting the "Inspect" option. First we need to identify the tag that represents the news listing.


![](https://media.giphy.com/media/RQpW64jdiQG8LNCGsZ/giphy.gif)


In this case we can see that the `<div>` tag with the class name `"Ov(h) Pend(44px) Pstart(25px)"` represents a news listing. We can apply the `find_all` method to grab these tags.

In [10]:
div_tags = doc.find_all('div', {'class': "Ov(h) Pend(44px) Pstart(25px)"})

The number of elements in the `<div>` tag list matches the number of news items displayed on the web page, so we are heading in the right direction.

In [11]:
len(div_tags)

7
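
One caveat with this lookup: when the class value contains several class names, the string has to match the `class` attribute exactly, including the order of the names. The sketch below illustrates this with made-up markup and simple class names (Yahoo's names contain parentheses, which would need escaping in a CSS selector), and shows `select` as an order-independent alternative.

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes, written in a different order.
html = '''
<div class="news item">first</div>
<div class="item news">second</div>
<div class="item">third</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# A multi-class string must equal the class attribute exactly.
exact = soup.find_all('div', {'class': 'news item'})

# A CSS selector matches any element that carries both classes.
both = soup.select('div.news.item')

print(len(exact), len(both))
```

If the count returned by `find_all` ever drops to zero, a changed class string on the site is a likely cause.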

The next step is to inspect an individual `<div>` tag and find more information. I am using "Visual Studio Code" to view the HTML, but you can use any tool, even one as simple as Notepad.

In [12]:
div_tags[1]

<div class="Ov(h) Pend(44px) Pstart(25px)"><div class="C(#959595) Fz(11px) D(ib) Mb(6px)">News Direct</div><h3 class="Mb(5px)"><a class="js-content-viewer wafer-caas Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-uuid="bd999a4f-1a4b-3beb-b963-8035e1ef51fb" data-wf-caas-prefetch="1" data-wf-caas-uuid="bd999a4f-1a4b-3beb-b963-8035e1ef51fb" href="/news/freshworks-alums-launch-growfin-targeting-081500142.html"><u class="StretchedBox"></u>Freshworks alums launch Growfin targeting the $125T global B2B payments market with a collaboration-first approach</a></h3><p class="Fz(14px) Lh(19px) Fz(13px)--sm1024 Lh(17px)--sm1024 LineClamp(2,38px) LineClamp(2,34px)--sm1024 M(0)">Growfin’s AI-powered platform enables businesses to improve collection efficiency and forecast better with real-time cash flow visibility and pred

![](https://i.imgur.com/ncnfg0z.png)

Luckily, most of the required data points are available in this `<div>`, so we can use the `find` method to grab each item.

In [13]:
print("Source: ", div_tags[1].find('div').text)
print("Head Line : {}".format(div_tags[1].find('a').text))

Source:  News Direct
Head Line : Freshworks alums launch Growfin targeting the $125T global B2B payments market with a collaboration-first approach


If a tag is not accessible directly, you can use methods like `findParent()` or `findChild()` to navigate to the required tag.

![](https://i.imgur.com/OnOAtT2.png)

In [14]:
print("Image URL: ",div_tags[1].findParent().find('img')['src'])

Image URL:  https://s.yimg.com/uu/api/res/1.2/rI2S.krfLjVNOh9OLdo3Uw--~B/Zmk9c3RyaW07aD0xMjM7cT04MDt3PTIyMDthcHBpZD15dGFjaHlvbg--/https://s.yimg.com/uu/api/res/1.2/XY6GVxhFwq2iy2RRUgpZpg--~B/aD0yMjYwO3c9Mjg4MDthcHBpZD15dGFjaHlvbg--/https://media.zenfs.com/en/news_direct/eeb8da85066380e365e04c019fc88413.cf.jpg
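
To see why climbing to the parent helps, here is a small self-contained sketch. The markup is made up to mimic the layout, with the thumbnail sitting next to the text block rather than inside it. It uses `find_parent`, the current spelling of `findParent` in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Made-up markup: the image is a sibling of the text block, not a child.
html = '''
<li>
  <div class="thumb"><img src="https://example.com/thumb.jpg"></div>
  <div class="text"><a href="/news/story.html">Headline</a></div>
</li>
'''
soup = BeautifulSoup(html, 'html.parser')
text_div = soup.find('div', {'class': 'text'})

# The text block has no <img> of its own, so this lookup finds nothing.
inside = text_div.find('img')

# Climbing one level up makes the sibling image reachable.
image_url = text_div.find_parent().find('img')['src']

print(inside, image_url)
```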


**Summary** : The key takeaway from this exercise is to identify the optimal tag that provides the required information. Sometimes it's straightforward; sometimes you will have to do a little more research.

### 1.3 Extract & Compile the information into python list

Now that we've identified all the required tags and information, let's put this together in functions.

In [15]:
def get_news_tags(doc):
    """Get the list of tags containing news information"""
    news_class = "Ov(h) Pend(44px) Pstart(25px)" ## class name of div tag 
    news_list  = doc.find_all('div', {'class': news_class})
    return news_list

Sample run of the function `get_news_tags`:

In [16]:
my_news_tags = get_news_tags(doc)

We will create one more function to parse an individual `<div>` tag and return the information as a dictionary.

In [17]:
BASE_URL = 'https://finance.yahoo.com' #Global Variable 

def parse_news(news_tag):
    """Parse a news <div> tag and return its details as a dictionary"""
    news_source = news_tag.find('div').text #source
    news_headline = news_tag.find('a').text #heading
    news_url = news_tag.find('a')['href'] #link
    news_content = news_tag.find('p').text #content
    news_image = news_tag.findParent().find('img')['src'] #thumb image
    return { 'source' : news_source,
            'headline' : news_headline,
            'url' : BASE_URL + news_url,
            'content' : news_content,
            'image' : news_image
           }

Let's test the `parse_news` function on the first `<div>` tag.

In [18]:
parse_news(my_news_tags[0])

{'source': 'Bloomberg',
 'headline': 'U.S. Futures Rise as Europe Stocks Pare Drop: Markets Wrap',
 'url': 'https://finance.yahoo.com/news/stocks-drop-bonds-oil-surge-223608217.html',
 'content': '(Bloomberg) -- U.S. futures rose and European stocks wiped early losses, signaling the market selloff is easing after Tuesday’s slide.Most Read from BloombergChina Spy Think Tank Advising Xi Predicts Russia Sanctions Will BackfireTeen Who Tracked Elon Musk’s Jet Is Now Chasing Russian TycoonsMicrosoft Says Son of CEO Satya Nadella Has Died at 26Russia Steps Up Aerial Campaign Against Cities: Ukraine UpdateBiden Assails Putin, Pledges Inflation Fight in State of UnionContracts on U.S. gauges adva',
 'image': 'https://s.yimg.com/uu/api/res/1.2/vYIFOtsagzMYkLUMw2g8fQ--~B/Zmk9c3RyaW07aD0xMjM7cT04MDt3PTIyMDthcHBpZD15dGFjaHlvbg--/https://s.yimg.com/uu/api/res/1.2/d5ZdqYi2ON5wPGY03c68CQ--~B/aD0xMzM0O3c9MjAwMDthcHBpZD15dGFjaHlvbg--/https://media.zenfs.com/en/bloomberg_markets_842/0affa85e6d71d331fefd

**Summary** : We can use the `get_news_tags` & `parse_news` functions to parse news.
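
`parse_news` assumes that every news item contains all of the tags. If an item lacks, say, a summary paragraph or a thumbnail, `find` returns `None` and the attribute access fails. The sketch below is one hedged way to tolerate that; `text_or_none` and `parse_news_safely` are hypothetical names, the sample markup is made up, and `urljoin` is swapped in for string concatenation so that absolute links are left untouched.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://finance.yahoo.com'


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_news_safely(news_tag):
    """Hypothetical variant of parse_news that tolerates missing pieces."""
    link = news_tag.find('a')
    image = news_tag.find_parent().find('img')
    has_href = link is not None and link.has_attr('href')
    return {
        'source': text_or_none(news_tag.find('div')),
        'headline': text_or_none(link),
        # urljoin resolves relative links and keeps absolute ones as they are.
        'url': urljoin(BASE_URL, link['href']) if has_href else None,
        'content': text_or_none(news_tag.find('p')),
        'image': image.get('src') if image is not None else None,
    }


# Made-up item with no <p> summary and no thumbnail.
html = '<li><div class="item"><div>Reuters</div><a href="/news/a.html">Title</a></div></li>'
item = BeautifulSoup(html, 'html.parser').find('div', {'class': 'item'})
record = parse_news_safely(item)
print(record)
```

Missing values then show up as empty cells in the CSV instead of stopping the whole run.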

### 1.4 Save the extracted information to a CSV file

This is the last step of this section. We are going to use the Python library [`pandas`](https://pandas.pydata.org/docs/) to save the data in CSV format. Install and then import the pandas library.

In [19]:
!pip install pandas --upgrade --quiet

In [20]:
import pandas as pd

Now we will create one final function that uses all the previously created helper functions.<br>
The `get_page` function downloads the HTML page, then we pass the result to `get_news_tags` to get the list of `<div>` tags for news.<br>
After that we use a [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp) to parse each `<div>` tag using `parse_news`; the output is a `list` of `dictionaries`.<br>
Finally, we use the `DataFrame` constructor to create a pandas [dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) and the `to_csv` method to store the data in CSV format.

In [21]:
def scrape_yahoo_news(url, path=None):
    """Get the yahoo finance market news and write them to CSV file """
    if path is None:
        path = 'stock-market-news.csv'
        
    print('Requesting html page')
    doc = get_page(url)

    print('Extracting news tags')
    news_list = get_news_tags(doc)

    print('Parsing news tags')
    news_data = [parse_news(news_tag) for news_tag in news_list]

    print('Save the data to a CSV')
    news_df = pd.DataFrame(news_data)
    news_df.to_csv(path, index=False)  # write to the requested path, not a hard-coded name
    
    # This return statement is optional; we use it just to analyze the final output
    return news_df 

It's time to test the `scrape_yahoo_news` function 

In [22]:
YAHOO_NEWS_URL = BASE_URL+'/topic/stock-market-news/'
news_df = scrape_yahoo_news(YAHOO_NEWS_URL)

Requesting html page
Extracting news tags
Parsing news tags
Save the data to a CSV


The file "stock-market-news.csv" should now be available under File --> Open. You can download the file or open it directly in the browser. Please verify the file content and compare it with the actual information available on the web page.

You can also check the data by grabbing a few rows from the data frame returned by the `scrape_yahoo_news` function.

In [23]:
news_df[:5]

Unnamed: 0,source,headline,url,content,image
0,Bloomberg,U.S. Futures Rise as Europe Stocks Pare Drop: ...,https://finance.yahoo.com/news/stocks-drop-bon...,(Bloomberg) -- U.S. futures rose and European ...,https://s.yimg.com/uu/api/res/1.2/vYIFOtsagzMY...
1,News Direct,Freshworks alums launch Growfin targeting the ...,https://finance.yahoo.com/news/freshworks-alum...,Growfin’s AI-powered platform enables business...,https://s.yimg.com/uu/api/res/1.2/rI2S.krfLjVN...
2,Bloomberg,OPEC+ Sits on Market Sideline as Russian Invas...,https://finance.yahoo.com/news/opec-sits-marke...,(Bloomberg) -- OPEC+ ministers gather on Wedne...,https://s.yimg.com/uu/api/res/1.2/1dLNVZTPb81j...
3,Reuters,"World's major companies lag on climate, some m...",https://finance.yahoo.com/news/worlds-major-co...,The corporate world remains far from being ali...,https://s.yimg.com/uu/api/res/1.2/MLsYBX7UYX2T...
4,Business Insider,State of the Union: Joe Biden ignores Congress...,https://finance.yahoo.com/news/state-union-joe...,"President Barack Obama, his former boss, calle...",https://s.yimg.com/uu/api/res/1.2/nsUkhGK.mIYo...


**Summary** : Hopefully I was able to explain this simple but very powerful Python technique for scraping the Yahoo! Finance market news. These steps can be used to scrape any web page; you just have to do a little research to identify the required tags and use the relevant Python methods to collect the data.

## 2. Scrape **Trending Tickers**

In phase one we were able to scrape the [yahoo market news](https://finance.yahoo.com/topic/stock-market-news/) web page. However, as you may have noticed, more news appears at the bottom of the page as we scroll down. This is called dynamic page loading. The previous technique is a basic Python method that is useful for scraping static data. Scraping dynamically loaded data requires a different method, which we are going to discuss in this phase.

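This phase uses `selenium`, which drives a real browser and can therefore read content that the page renders with JavaScript. We locate elements with Selenium's `find_element` and `find_elements` functions and XPath expressions; see this guide:

https://www.browserstack.com/guide/find-element-by-xpath-in-selenium

To find the XPath of an element, right-click the element in the browser's developer tools and use the copy XPath option; see the guide below:
https://analyticsindiamag.com/comprehensive-guide-to-web-scraping-with-selenium/

We first look at how to use the different methods and then write the code.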

### Prerequisites

For a local environment:
- `!pip install webdriver-manager --upgrade --quiet`
- download the required chromedriver and place it in the project path

For Colab, install `chromium-chromedriver` with `apt` instead, as done in the cell below.

In [24]:
print('Installation')
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
    !apt update --quiet
    !apt install chromium-chromedriver --quiet
else:
    print('Not running on CoLab')
    !pip install webdriver-manager --upgrade --quiet
print('Common Installation')    
!pip install selenium --quiet

Installation
Not running on CoLab
Common Installation


In [25]:
print('Library Import')
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
else:
    print('Not running on CoLab')
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

print('Common Library Import')
from selenium import webdriver
from selenium.webdriver.common.by import By

Library Import
Not running on CoLab
Common Library Import


In [26]:
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
    def get_driver():
        colab_options = webdriver.ChromeOptions()
        colab_options.add_argument('--no-sandbox')
        colab_options.add_argument('--disable-dev-shm-usage')
        colab_options.add_argument('--headless')
        driver = webdriver.Chrome(options=colab_options)
        driver.get(YAHOO_FINANCE_URL)
        return driver
else:
    print('Not running on CoLab')
    def get_driver():
        chrome_options = Options()
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument('--headless')
        serv = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(options=chrome_options, service=serv)
        driver.get(YAHOO_FINANCE_URL)
        return driver

Not running on CoLab



In [27]:
def get_table_rows(driver):
    """Return the number of rows in the data table on the current page"""
    TABLE_CLASS = "W(100%)"  
    tablerows = len(driver.find_elements(By.XPATH, value="//table[@class= '{}']/tbody/tr".format(TABLE_CLASS)))
    return tablerows

In [28]:
def get_table_header(driver):
    """Return the text of the first 10 column headers of the table"""
    header = driver.find_elements(By.TAG_NAME, value= 'th')
    header_list = [item.text for item in header[:10]]
    return header_list

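The XPath `//tr[{rownum}]/td[{colnum}]` addresses a single table cell by its row and column position. For example, the following reads the cell in row 1, column 3: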
```
rownum = 1
colnum = 3
driver.find_element(By.XPATH, value="//tr[{}]/td[{}]".format(rownum,colnum)).text
```
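
XPath positions start at 1, not 0. To try the row and column indexing without launching a browser, the sketch below applies the same kind of path to a small made-up table. It uses the limited XPath support of the standard library's `xml.etree.ElementTree` rather than Selenium, and `cell_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up table with two rows and three columns.
table = ET.fromstring(
    '<table><tbody>'
    '<tr><td>AAA</td><td>Alpha Inc.</td><td>10.50</td></tr>'
    '<tr><td>BBB</td><td>Beta Corp.</td><td>20.75</td></tr>'
    '</tbody></table>'
)


def cell_text(rownum, colnum):
    """Return the text of the cell at the given 1-based row and column."""
    return table.find('.//tr[{}]/td[{}]'.format(rownum, colnum)).text


first_price = cell_text(1, 3)
second_name = cell_text(2, 2)
print(first_price, second_name)
```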

In [29]:
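def parse_table_rows(rownum, driver, header_list):
    """Return one table row as a dictionary keyed by column header"""
    row_dictionary = {}
    for index , item in enumerate(header_list):
        # XPath positions are 1-based, so header index 0 maps to td[1]
        row_dictionary[item] = driver.find_element(By.XPATH, value="//tr[{}]/td[{}]".format(rownum, index+1)).text
    return row_dictionary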

In [33]:
YAHOO_FINANCE_URL = 'https://finance.yahoo.com/cryptocurrencies'#'https://finance.yahoo.com/trending-tickers'

print('Creating driver')
driver = get_driver()



Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome


Creating driver


Driver [/Users/vinoddhole/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


In [34]:
header_list = get_table_header(driver)

In [35]:
TOTAL_PAGES_TO_SCRAPE = 10
table_data = []
next_button_class = '//*[@id="scr-res-table"]/div[2]/button[3]'
for page in range(1, TOTAL_PAGES_TO_SCRAPE + 1):
    
    print('Getting Table row count for Page : {}'.format(page))
    table_rows = get_table_rows(driver)
    
    print('Parsing Page : {}'.format(page))
    table_data += [parse_table_rows(i, driver, header_list) for i in range (1, table_rows + 1)]
    
    print('Clicking Next Button')
    next_button = driver.find_elements(By.XPATH, value = next_button_class)    
    next_button[0].click()

Getting Table row count for Page : 1
Parsing Page : 1
Clicking Next Button
Getting Table row count for Page : 2
Parsing Page : 2
Clicking Next Button
Getting Table row count for Page : 3
Parsing Page : 3


StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: headless chrome=98.0.4758.109)
Stacktrace:
0   chromedriver                        0x0000000100c94ee9 chromedriver + 5013225
1   chromedriver                        0x0000000100c201d3 chromedriver + 4534739
2   chromedriver                        0x00000001007f6a68 chromedriver + 170600
3   chromedriver                        0x00000001007f98b1 chromedriver + 182449
4   chromedriver                        0x00000001007f96d1 chromedriver + 181969
5   chromedriver                        0x00000001007f996c chromedriver + 182636
6   chromedriver                        0x0000000100825b59 chromedriver + 363353
7   chromedriver                        0x00000001008487e2 chromedriver + 505826
8   chromedriver                        0x000000010081fbf5 chromedriver + 338933
9   chromedriver                        0x00000001008488ee chromedriver + 506094
10  chromedriver                        0x000000010085b074 chromedriver + 581748
11  chromedriver                        0x00000001008486d3 chromedriver + 505555
12  chromedriver                        0x000000010081e76e chromedriver + 333678
13  chromedriver                        0x000000010081f745 chromedriver + 337733
14  chromedriver                        0x0000000100c50efe chromedriver + 4734718
15  chromedriver                        0x0000000100c6aa19 chromedriver + 4839961
16  chromedriver                        0x0000000100c701c8 chromedriver + 4862408
17  chromedriver                        0x0000000100c6b3aa chromedriver + 4842410
18  chromedriver                        0x0000000100c45a01 chromedriver + 4688385
19  chromedriver                        0x0000000100c86538 chromedriver + 4953400
20  chromedriver                        0x0000000100c866c1 chromedriver + 4953793
21  chromedriver                        0x0000000100c9c225 chromedriver + 5042725
22  libsystem_pthread.dylib             0x00007fff2065a8fc _pthread_start + 224
23  libsystem_pthread.dylib             0x00007fff20656443 thread_start + 15
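
A `StaleElementReferenceException` means that an element reference no longer belongs to the current page document. A likely cause here is that the loop starts reading the table right after clicking the Next button, while the page is still redrawing it. One common remedy is to repeat the lookup after a short pause. The sketch below is a generic, hypothetical `retry` helper written in plain Python; `FlakyLookup` only simulates a lookup that fails twice, so the example runs without a browser.

```python
import time


def retry(action, retry_on, attempts=3, delay=1.0):
    """Call action(); if it raises retry_on, wait and call it again.

    The last failure is re-raised once the attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)


class FlakyLookup:
    """Stand-in for a lookup that fails twice while the page is redrawing."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError('element went stale')
        return 'row data'


lookup = FlakyLookup()
result = retry(lookup, RuntimeError, attempts=5, delay=0.01)
print(result, lookup.calls)
```

In the notebook, the same helper could wrap the row parser, for example `retry(lambda: parse_table_rows(i, driver, header_list), StaleElementReferenceException)`, after importing the exception from `selenium.common.exceptions`. Selenium's explicit waits are an alternative worth considering.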


In [None]:
len(table_data)

In [None]:
!pip install pandas --upgrade --quiet

In [None]:
import pandas as pd 

In [None]:
print('Save the data to a CSV')
table_df = pd.DataFrame(table_data)
table_df.to_csv('cryptocurrencies.csv', index=False)

In [None]:
table_df

In [None]:
header = driver.find_elements(By.TAG_NAME, value= 'th')

In [None]:
header[0].text

In [None]:
rownum=2
txt=driver.find_element(By.XPATH, value="//tr[{}]/td[2]".format(rownum)).text
txt


**Installation**

- Anaconda: Download and install it from https://www.anaconda.com/. We will be using Jupyter Notebook for writing the code.
- Chromedriver (WebDriver for Chrome): Download it from https://chromedriver.chromium.org/downloads. No installation is needed; just copy the file into the folder where the Python file will be created. Before downloading, confirm that the driver's version matches that of the installed Chrome browser.

In [1]:
jovian.commit(project="yahoo-finance-web-scraper")


[jovian] Updating notebook "vinodvidhole/yahoo-finance-web-scraper" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/vinodvidhole/yahoo-finance-web-scraper


'https://jovian.ai/vinodvidhole/yahoo-finance-web-scraper'

**Future**

- fix timezone in market events

**To do**

- check above notes
- testing: done for normal and zero-row cases
- comments
- print statements & function docstrings
- code clean-up, if applicable
- documentation

## reference
    https://htmldog.com/guides/html/