# Historical Stock Prices

We need the historical stock prices of every company for which we have fundamental data.

We will use these prices to compute returns and fundamental metrics.

We will rely on the `yahoofinancials` package (class `YahooFinancials`) to download the prices for each ticker.

In [None]:
from yahoofinancials import YahooFinancials
import pandas as pd

In [161]:
# Load all the companies tickers and pivot by ticker_type
original_tickers_df = pd.read_csv('tickers.csv', header=None, names=["ccvm", "ticker", "ticker_type"])
tickers_df = pd.pivot(original_tickers_df, index='ccvm', columns='ticker_type', values='ticker')
tickers_df = tickers_df.add_prefix("ticker_type_")
tickers_df

ccvm,ticker_type_3,ticker_type_4,ticker_type_5,ticker_type_6
94,PATI3,PATI4,,
531,ARTR3,,,
574,ARLA3,ARLA4,,
701,BAHI3,,,
906,BBDC3,BBDC4,,
...,...,...,...,...
23612,MSRO3,,,
23710,HCBR3,,,
23728,PRCA3,,,
24783,NTCO3,,,
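
To make the reshaping step concrete, here is a minimal sketch on a hand-made sample. The three rows are taken from the output above, but the in-memory DataFrame `sample_tickers_df` is an assumption standing in for `tickers.csv`. It shows how `pd.pivot` turns one row per ticker into one row per company, and how `add_prefix` renames the resulting columns.

```python
import pandas as pd

# Hypothetical sample in the same long format as tickers.csv: one row per ticker
sample_tickers_df = pd.DataFrame(
    [(94, "PATI3", 3), (94, "PATI4", 4), (531, "ARTR3", 3)],
    columns=["ccvm", "ticker", "ticker_type"],
)

# One row per company (ccvm), one column per ticker type
sample_wide_df = pd.pivot(sample_tickers_df, index="ccvm", columns="ticker_type", values="ticker")
sample_wide_df = sample_wide_df.add_prefix("ticker_type_")
print(sample_wide_df)
```

Companies without a ticker of a given type get a missing value in that column. Note that `pd.pivot` raises a `ValueError` when a company has two tickers of the same type, so the input must hold at most one ticker per (ccvm, ticker_type) pair.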


## Ticker Types

In the cash market, a ticker is composed of four letters, a number and, in some cases, a suffix. The letters identify the listed company and the number indicates the share class, as follows:

| Number | Class | Trade name indication |
|---|---|---|
| 3 | common share | ON (*ordinária nominativa*) |
| 4 | preferred share | PN (*preferencial nominativa*) |
| 5 | preferred share class A | PNA |
| 6 | preferred share class B | PNB |

To get the stock prices, we query every ticker available for each company, starting with the ON ticker and continuing with the PN, PNA, and PNB tickers.
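
The naming rule above can be sketched as a small parser. This is only an illustration: `parse_ticker`, `SHARE_CLASSES`, and `TICKER_PATTERN` are hypothetical names, and the pattern assumes exactly four uppercase letters, as described in the text, plus an optional one-letter suffix like the one seen in tickers such as `PRPT3B`.

```python
import re

# Share classes from the table above, keyed by the ticker's numeric part
SHARE_CLASSES = {3: "ON", 4: "PN", 5: "PNA", 6: "PNB"}

# Assumption: four letters, a number, and an optional one-letter suffix
TICKER_PATTERN = re.compile(r"^([A-Z]{4})(\d{1,2})([A-Z]?)$")

def parse_ticker(ticker):
    """Split a ticker into (company code, share class, suffix); hypothetical helper."""
    match = TICKER_PATTERN.match(ticker)
    if match is None:
        raise ValueError(f"Unexpected ticker format: {ticker}")
    code, number, suffix = match.groups()
    # Numbers missing from the table are reported as "unknown" rather than guessed
    return code, SHARE_CLASSES.get(int(number), "unknown"), suffix

print(parse_ticker("BBDC4"))   # ('BBDC', 'PN', '')
print(parse_ticker("PRPT3B"))  # ('PRPT', 'ON', 'B')
```

Tickers that do not follow the four-letter convention are rejected with a `ValueError` instead of being parsed loosely, which makes unexpected entries in the ticker file easy to spot.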

In [162]:
# Initialize DSE Cluster connection
try:
    from dse.cluster import Cluster
except ImportError:
    from cassandra.cluster import Cluster

cluster = Cluster(['tfm_uoc_dse'])  # provide contact points and port
session = cluster.connect('tfm_uoc')

In [163]:
# Load the company master data (ccvm, name and CNPJ) into a DataFrame
company_rows = session.execute("select ccvm, company_name, cnpj from bovespa_company;")
original_companies_df = pd.DataFrame(
    [{
        "ccvm": int(x.ccvm),
        "company_name": x.company_name,
        "cnpj": x.cnpj}
            for x in company_rows])

In [164]:
def get_companies_accounts(session):
    """
    Compute the total number of financial accounts reported by
    each company.
    
    Returns a pandas DataFrame with the company id (ccvm)
    and the number of accounts.
    """
    import json

    solr_query = {
        "q": "*:*",
        "facet": {
            "field": "ccvm_exact",
            "limit": 10000
        }
    }

    accounts_per_company_query = \
        f"select * from bovespa_account WHERE solr_query='{json.dumps(solr_query)}'"

    accounts_per_company = session.execute(accounts_per_company_query).one()
    accounts_per_company = json.loads(accounts_per_company.facet_fields)["ccvm_exact"]
    accounts_per_company = pd.DataFrame([{
        "ccvm": int(ccvm), 
        "num_accounts": num_accounts } 
            for ccvm, num_accounts in accounts_per_company.items()])
    display("Total number of companies: {}".format(len(accounts_per_company)))
    return accounts_per_company

In [165]:
# Obtain all the companies with fundamental data 
companies_with_fundamentals_df = get_companies_accounts(session)
companies_with_fundamentals_df.head(10)

'Total number of companies: 782'

Unnamed: 0,ccvm,num_accounts
0,11070,48111
1,21067,44244
2,20010,43019
3,20931,42939
4,22020,41496
5,15253,40267
6,21636,39393
7,17450,38341
8,21490,38089
9,19569,38064
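
The facet parsing inside `get_companies_accounts` can be tried without a cluster connection. The sketch below assumes that `facet_fields` is a JSON string mapping each `ccvm_exact` value to its count, which is the shape the function's own parsing code implies; the payload itself is a hypothetical sample built from the first rows of the output above.

```python
import json
import pandas as pd

# Hypothetical facet payload, shaped like the `facet_fields` value parsed above
facet_fields = '{"ccvm_exact": {"11070": 48111, "21067": 44244, "20010": 43019}}'

# The facet maps each ccvm (as a string) to the number of matching accounts
counts_by_ccvm = json.loads(facet_fields)["ccvm_exact"]
sample_accounts_df = pd.DataFrame(
    [{"ccvm": int(ccvm), "num_accounts": num_accounts}
     for ccvm, num_accounts in counts_by_ccvm.items()]
)
print(sample_accounts_df.sort_values("num_accounts", ascending=False))
```

Casting `ccvm` to `int` matters here: the facet keys are strings, while the ticker table is indexed by integer `ccvm`, so the later merges would find no matches without the conversion.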


In [166]:
# Compute the ticker for each company (CCVM)
companies_df = pd.merge(left=companies_with_fundamentals_df, 
                        right=original_companies_df, 
                        how='left', 
                        left_on='ccvm', 
                        right_on='ccvm')

companies_df = pd.merge(left=companies_df, 
                        right=tickers_df, 
                        how='left', 
                        left_on='ccvm', 
                        right_on='ccvm')

companies_df.head(10)

Unnamed: 0,ccvm,num_accounts,company_name,cnpj,ticker_type_3,ticker_type_4,ticker_type_5,ticker_type_6
0,11070,48111,WLM INDÚSTRIA E COMÉRCIO S.A.,33.228.024/0001-51,SGAS3,SGAS4,,
1,21067,44244,MOURA DUBEUX ENGENHARIA S/A,12.049.631/0001-84,,,,
2,20010,43019,EQUATORIAL ENERGIA S/A,03.220.438/0001-73,EQTL3,,,
3,20931,42939,MINERVA S/A,67.620.377/0001-14,BEEF3,,,
4,22020,41496,JSL S.A.,52.548.435/0001-79,JSLG3,,,
5,15253,40267,ENERGISA SA,00.864.214/0001-06,ENGI3,ENGI4,,
6,21636,39393,RENOVA ENERGIA S/A,08.534.605/0001-74,RNEW3,RNEW4,,
7,17450,38341,RUMO S.A.,02.387.241/0001-60,,,,
8,21490,38089,ALUPAR INVESTIMENTO S/A,08.364.948/0001-38,ALUP3,ALUP4,,
9,19569,38064,GOL LINHAS AEREAS INTELIGENTES SA,06.164.253/0001-87,,GOLL4,,


In [167]:
companies_no_ticker_df = companies_df[(pd.isnull(companies_df["ticker_type_3"])) & 
                                      (pd.isnull(companies_df["ticker_type_4"])) &
                                      (pd.isnull(companies_df["ticker_type_5"])) &
                                      (pd.isnull(companies_df["ticker_type_6"]))]
companies_no_ticker_df = companies_no_ticker_df.sort_values(['ccvm'], ascending=[1])
print(f"Companies without Ticker Info: {companies_no_ticker_df.count()}")

Companies without Ticker Info: ccvm             454
num_accounts     454
company_name     454
cnpj             454
ticker_type_3      0
ticker_type_4      0
ticker_type_5      0
ticker_type_6      0
dtype: int64


In [168]:
companies_ticker_df = companies_df[(pd.notnull(companies_df["ticker_type_3"])) |
                                      (pd.notnull(companies_df["ticker_type_4"])) |
                                      (pd.notnull(companies_df["ticker_type_5"])) |
                                      (pd.notnull(companies_df["ticker_type_6"]))]
companies_ticker_df = companies_ticker_df.sort_values(['ccvm'], ascending=[1])
print(f"Companies with Ticker Info: {companies_ticker_df.count()}")

Companies with Ticker Info: ccvm             328
num_accounts     328
company_name     328
cnpj             328
ticker_type_3    326
ticker_type_4    111
ticker_type_5     24
ticker_type_6      1
dtype: int64
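
The two filters above test the four ticker columns one by one. A more compact variant, sketched here on a hypothetical sample with rows copied from the earlier output, builds a single boolean mask with `notna().any(axis=1)` and uses its negation for the companies without tickers.

```python
import pandas as pd

ticker_columns = ["ticker_type_3", "ticker_type_4", "ticker_type_5", "ticker_type_6"]

# Hypothetical sample with the same columns as companies_df
sample_companies_df = pd.DataFrame({
    "ccvm": [11070, 21067, 19569],
    "ticker_type_3": ["SGAS3", None, None],
    "ticker_type_4": ["SGAS4", None, "GOLL4"],
    "ticker_type_5": [None, None, None],
    "ticker_type_6": [None, None, None],
})

# True when at least one ticker column is populated for the company
has_ticker = sample_companies_df[ticker_columns].notna().any(axis=1)
with_ticker_df = sample_companies_df[has_ticker]
without_ticker_df = sample_companies_df[~has_ticker]
print(len(with_ticker_df), len(without_ticker_df))  # 2 1
```

Because both subsets come from the same mask, they always partition the original frame. In the run above the two groups contain 328 and 454 companies, which add up to the 782 companies with fundamental data.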


In [174]:
# Download the daily price history of every available ticker of each company.
company_stock_prices = {}
ticker_columns = ["ticker_type_3", "ticker_type_4", "ticker_type_5", "ticker_type_6"]
for index, row in companies_ticker_df.iterrows():
    # Keep only the ticker columns that are populated for this company
    available_tickers = [column for column in ticker_columns if pd.notna(row[column])]

    for ticker_column in available_tickers:    
        ticker = row[ticker_column]
        print(f"Getting ticker [{index}]: {ticker}.SA")
        yahoo_financials = YahooFinancials(f"{ticker}.SA")
        historical_stock_prices = yahoo_financials.get_historical_price_data('2000-01-01', '2020-01-03', 'daily')
        company_stock_prices[ticker] = historical_stock_prices[f"{ticker}.SA"]

Getting ticker [86]: BBAS3.SA
Getting ticker [459]: BGIP3.SA
Getting ticker [459]: BGIP4.SA
Getting ticker [144]: BEES3.SA
Getting ticker [144]: BEES4.SA
Getting ticker [616]: BPAR3.SA
Getting ticker [194]: BRSR3.SA
Getting ticker [194]: BRSR5.SA
Getting ticker [277]: BNBR3.SA
Getting ticker [453]: BMIN3.SA
Getting ticker [453]: BMIN4.SA
Getting ticker [175]: BMEB3.SA
Getting ticker [175]: BMEB4.SA
Getting ticker [215]: BRIV3.SA
Getting ticker [215]: BRIV4.SA
Getting ticker [77]: BDLL3.SA
Getting ticker [77]: BDLL4.SA
Getting ticker [229]: BALM3.SA
Getting ticker [229]: BALM4.SA
Getting ticker [314]: BAUH3.SA
Getting ticker [314]: BAUH4.SA
Getting ticker [131]: BMKS3.SA
Getting ticker [460]: BMTO3.SA
Getting ticker [460]: BMTO4.SA
Getting ticker [684]: BUET3.SA
Getting ticker [684]: BUET4.SA
Getting ticker [530]: CAMB3.SA
Getting ticker [530]: CAMB4.SA
Getting ticker [519]: RANI3.SA
Getting ticker [519]: RANI4.SA
Getting ticker [103]: CMIG3.SA
Getting ticker [103]: CMIG4.SA
Getting tic

Getting ticker [239]: CSRN3.SA
Getting ticker [239]: CSRN5.SA
Getting ticker [226]: CELP3.SA
Getting ticker [226]: CELP5.SA
Getting ticker [342]: PRPT3B.SA
Getting ticker [244]: GETI3.SA
Getting ticker [244]: GETI4.SA
Getting ticker [251]: GEPA3.SA
Getting ticker [251]: GEPA4.SA
Getting ticker [180]: TRPL3.SA
Getting ticker [180]: TRPL4.SA
Getting ticker [24]: IDNT3.SA
Getting ticker [126]: UGPA3.SA
Getting ticker [379]: MNZC3B.SA
Getting ticker [249]: DTCY3.SA
Getting ticker [249]: DTCY4.SA
Getting ticker [498]: SAPR3.SA
Getting ticker [498]: SAPR4.SA
Getting ticker [406]: VDNP3B.SA
Getting ticker [197]: CPFE3.SA
Getting ticker [128]: PEAB3.SA
Getting ticker [128]: PEAB4.SA
Getting ticker [160]: BRAP3.SA
Getting ticker [160]: BRAP4.SA
Getting ticker [94]: IVPR3B.SA
Getting ticker [94]: IVPR4B.SA
Getting ticker [61]: CCRO3.SA
Getting ticker [217]: CANT3B.SA
Getting ticker [217]: CANT4B.SA
Getting ticker [694]: CALA3B.SA
Getting ticker [694]: CALA4B.SA
Getting ticker [369]: CAIA3B.SA
Ge
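
The download loop makes one remote request per ticker, so a single network error stops the whole run. One possible mitigation is a small retry wrapper. The sketch below is an assumption, not part of the original notebook: `fetch_with_retry` and `flaky_fetch` are hypothetical names, and the fake fetcher replaces the real download so that the example runs without network access.

```python
import time

def fetch_with_retry(fetch, ticker, retries=3, wait_seconds=0.0):
    """Call fetch(ticker) up to `retries` times; return None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(ticker)
        except Exception as error:  # broad on purpose: the failure types are not known here
            print(f"Attempt {attempt} for {ticker} failed: {error}")
            time.sleep(wait_seconds)
    return None

# Stand-in for the real download: fails twice, then succeeds
calls = {"count": 0}

def flaky_fetch(ticker):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"ticker": ticker, "prices": []}

result = fetch_with_retry(flaky_fetch, "BBAS3.SA")
print(result)
```

In the notebook, `fetch` could wrap the same call used in the loop, for example a function that builds `YahooFinancials(ticker)` and returns `get_historical_price_data('2000-01-01', '2020-01-03', 'daily')[ticker]`. Tickers for which the wrapper returns `None` can then be logged and retried later instead of aborting the loop.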

In [287]:
# Generate the dataset with the historical prices for all the tickers.
stock_prices = []
tickers_with_prices = []
for ticker, historical_data in company_stock_prices.items():
    if "prices" in historical_data and len(historical_data["prices"]) > 0:
        tickers_with_prices.append(ticker)
        for stock_price in historical_data["prices"]:
            data = stock_price.copy()
            data["ticker"] = ticker
            data["type"] = "EQUITY"
            stock_prices.append(data)
                        
stock_prices_df = pd.DataFrame(stock_prices)
# Convert the 'formatted_date' column type from string to datetime
stock_prices_df['formatted_date'] =  pd.to_datetime(stock_prices_df['formatted_date'], format='%Y-%m-%d')            

In [288]:
print(f"We have {len(tickers_with_prices)} tickers with historical stock prices")

We have 320 tickers with historical stock prices


In [289]:
stock_prices_df.head(10)

Unnamed: 0,date,high,low,open,close,volume,adjclose,formatted_date,ticker,type,amount,data
0,946900800,2.85333,2.73333,2.8,2.73333,514800.0,1.549851,2000-01-03,BBAS3,EQUITY,,
1,946987200,2.74,2.60333,2.73333,2.60333,314100.0,1.476138,2000-01-04,BBAS3,EQUITY,,
2,947073600,2.63333,2.50667,2.6,2.63,478800.0,1.49126,2000-01-05,BBAS3,EQUITY,,
3,947160000,2.66667,2.6,2.63,2.66667,205200.0,1.512054,2000-01-06,BBAS3,EQUITY,,
4,947246400,2.66667,2.60667,2.66667,2.60667,394200.0,1.478032,2000-01-07,BBAS3,EQUITY,,
5,947505600,2.75333,2.66667,2.73,2.75,549000.0,1.559303,2000-01-10,BBAS3,EQUITY,,
6,947592000,2.85333,2.67667,2.76667,2.71333,642600.0,1.53851,2000-01-11,BBAS3,EQUITY,,
7,947678400,2.76667,2.68333,2.73333,2.75,584100.0,1.559303,2000-01-12,BBAS3,EQUITY,,
8,947764800,2.83333,2.70333,2.8,2.70333,551700.0,1.532841,2000-01-13,BBAS3,EQUITY,,
9,947851200,2.76667,2.7,2.73333,2.7,248400.0,1.530952,2000-01-14,BBAS3,EQUITY,,


In [242]:
# Save the dataset as a CSV file
stock_prices_df.to_csv('stock_prices.csv')
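
Since these prices are meant to feed the return calculations, here is a minimal sketch of a simple daily return per ticker. It assumes the same column layout as `stock_prices_df` and uses `adjclose`, on the assumption that the adjusted close is the appropriate series because it accounts for splits and dividends. The sample values are copied from the output above; `sample_prices_df` and `daily_return` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample in the same layout as stock_prices_df
sample_prices_df = pd.DataFrame({
    "ticker": ["BBAS3"] * 3,
    "formatted_date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
    "adjclose": [1.549851, 1.476138, 1.491260],
})

# Each ticker's rows must be in chronological order before differencing
sample_prices_df = sample_prices_df.sort_values(["ticker", "formatted_date"])
# Simple daily return: adjclose_t / adjclose_(t-1) - 1, computed within each ticker
sample_prices_df["daily_return"] = sample_prices_df.groupby("ticker")["adjclose"].pct_change()
print(sample_prices_df)
```

Grouping by `ticker` before calling `pct_change` keeps the first row of each ticker empty, so the last price of one ticker is never compared with the first price of the next one.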

In [255]:
# Generate the dataset with the historical events for all the tickers.
ticker_events = []
for ticker, historical_data in company_stock_prices.items():
    if "eventsData" in historical_data:        
        if "dividends" in historical_data["eventsData"] and len(historical_data["eventsData"]["dividends"]) > 0:
            for date, dividend in historical_data["eventsData"]["dividends"].items():
                data = dividend.copy()
                data["type"] = "dividend"
                data["ticker"] = ticker
                ticker_events.append(data)

        if "splits" in historical_data["eventsData"] and len(historical_data["eventsData"]["splits"]) > 0:
            for date, split in historical_data["eventsData"]["splits"].items():
                data = split.copy()
                data["type"] = "split"                
                data["ticker"] = ticker
                ticker_events.append(data)

ticker_events_df = pd.DataFrame(ticker_events)
# Convert the 'formatted_date' column type from string to datetime
ticker_events_df['formatted_date'] =  pd.to_datetime(ticker_events_df['formatted_date'], format='%Y-%m-%d')            

In [256]:
ticker_events_df.head(10)

Unnamed: 0,amount,date,formatted_date,type,ticker,numerator,denominator,splitRatio
0,0.285821,1347454800,2012-09-12,dividend,BBAS3,,,
1,0.156962,1305896400,2011-05-20,dividend,BBAS3,,,
2,0.125885,1321617600,2011-11-18,dividend,BBAS3,,,
3,0.126652,1473771600,2016-09-13,dividend,BBAS3,,,
4,0.090568,1457010000,2016-03-03,dividend,BBAS3,,,
5,0.297177,1355400000,2012-12-13,dividend,BBAS3,,,
6,0.141039,1440421200,2015-08-24,dividend,BBAS3,,,
7,0.446155,1566478800,2019-08-22,dividend,BBAS3,,,
8,0.451138,1432299600,2015-05-22,dividend,BBAS3,,,
9,0.137755,1465909200,2016-06-14,dividend,BBAS3,,,


In [257]:
# Save the dataset as a CSV file
ticker_events_df.to_csv('ticker_events.csv')
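
As an illustration of how the events dataset can be used, the sketch below adds up the dividends paid per ticker and calendar year. It runs on a hypothetical sample (`sample_events_df`) with four of the BBAS3 rows shown above; the `year` column is an addition made only for this example.

```python
import pandas as pd

# Hypothetical sample in the same layout as ticker_events_df
sample_events_df = pd.DataFrame({
    "ticker": ["BBAS3"] * 4,
    "type": ["dividend"] * 4,
    "amount": [0.285821, 0.297177, 0.156962, 0.125885],
    "formatted_date": pd.to_datetime(["2012-09-12", "2012-12-13", "2011-05-20", "2011-11-18"]),
})

# Keep dividends only (the real dataset also contains splits)
dividends_df = sample_events_df[sample_events_df["type"] == "dividend"].copy()
dividends_df["year"] = dividends_df["formatted_date"].dt.year

# Total dividend amount per ticker and calendar year
yearly_dividends = dividends_df.groupby(["ticker", "year"])["amount"].sum()
print(yearly_dividends)
```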

In [262]:
# Generate the dataset with the ticker details.
ticker_details = []
for ticker, historical_data in company_stock_prices.items():
    ticker_data = {
        "ticker": ticker        
    }
    
    if "firstTradeDate" in historical_data: 
        ticker_data["first_trade_date"] = historical_data["firstTradeDate"]["date"]
        ticker_data["first_trade_formatted_date"] = pd.to_datetime(
            historical_data["firstTradeDate"]["formatted_date"], 
            format='%Y-%m-%d')
        
    if "currency" in historical_data:
        ticker_data["currency"] = historical_data["currency"]

    if "instrumentType" in historical_data:
        ticker_data["instrument_type"] = historical_data["instrumentType"]

    if "timeZone" in historical_data:
        ticker_data["time_zone_gmt_offset"] = historical_data["timeZone"]["gmtOffset"]
    
    ticker_details.append(ticker_data)

ticker_details_df = pd.DataFrame(ticker_details)

In [263]:
ticker_details_df.head(10)

Unnamed: 0,ticker,first_trade_date,first_trade_formatted_date,currency,instrument_type,time_zone_gmt_offset
0,BBAS3,946899900.0,2000-01-03,BRL,EQUITY,-7200.0
1,BGIP3,1535720000.0,2018-08-31,BRL,EQUITY,-7200.0
2,BGIP4,1199274000.0,2008-01-02,BRL,EQUITY,-7200.0
3,BEES3,946899900.0,2000-01-03,BRL,EQUITY,-7200.0
4,BEES4,1211892000.0,2008-05-27,BRL,EQUITY,-7200.0
5,BPAR3,1556196000.0,2019-04-25,BRL,EQUITY,-7200.0
6,BRSR3,946899900.0,2000-01-03,BRL,EQUITY,-7200.0
7,BRSR5,1534424000.0,2018-08-16,BRL,EQUITY,-7200.0
8,BNBR3,946986300.0,2000-01-04,BRL,EQUITY,-7200.0
9,BMIN3,1530536000.0,2018-07-02,BRL,EQUITY,-7200.0


In [264]:
# Save the dataset as a CSV file
ticker_details_df.to_csv('ticker_details.csv')
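
The `first_trade_date` and `time_zone_gmt_offset` columns are stored as plain numbers. Assuming both are expressed in seconds (Unix epoch seconds and a GMT offset in seconds, which is consistent with the formatted dates shown above), they can be converted as sketched below. The sample rows are copied from the output; the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample in the same layout as ticker_details_df
sample_details_df = pd.DataFrame({
    "ticker": ["BBAS3", "BGIP3"],
    "first_trade_date": [946899900.0, 1535720000.0],
    "time_zone_gmt_offset": [-7200.0, -7200.0],
})

# Epoch seconds -> UTC timestamp
sample_details_df["first_trade_utc"] = pd.to_datetime(
    sample_details_df["first_trade_date"], unit="s")

# Shift by the GMT offset (assumed to be in seconds) to get the local exchange time
sample_details_df["first_trade_local"] = (
    sample_details_df["first_trade_utc"]
    + pd.to_timedelta(sample_details_df["time_zone_gmt_offset"], unit="s")
)
print(sample_details_df[["ticker", "first_trade_utc", "first_trade_local"]])
```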

In [272]:
print(f"Total data points: {stock_prices_df.count()}")

Total data points: date              924037
high              859704
low               859704
open              859704
close             859704
volume            859704
adjclose          859704
formatted_date    924037
ticker            924037
amount                 5
type                   5
data                   5
dtype: int64


In [271]:
stock_prices_aggr_df = stock_prices_df.groupby(["ticker"]).count()
stock_prices_aggr_df.mean()

date              2887.615625
high              2686.575000
low               2686.575000
open              2686.575000
close             2686.575000
volume            2686.575000
adjclose          2686.575000
formatted_date    2887.615625
amount               0.015625
type                 0.015625
data                 0.015625
dtype: float64
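
In the totals above, 924037 rows carry a date but only 859704 carry prices, so roughly 7% of the rows have no quote. A per-ticker view of that gap can be sketched as follows. The sample data and the tickers `AAAA3` and `BBBB4` are hypothetical, as are the output column names.

```python
import pandas as pd

# Hypothetical sample in the same layout as stock_prices_df; None marks a day without a quote
sample_prices_df = pd.DataFrame({
    "ticker": ["AAAA3", "AAAA3", "AAAA3", "AAAA3", "BBBB4", "BBBB4"],
    "formatted_date": pd.to_datetime(
        ["2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-03", "2000-01-04"]),
    "close": [2.73, None, 2.63, 2.67, 10.0, 10.5],
})

coverage_df = sample_prices_df.groupby("ticker").agg(
    rows=("formatted_date", "size"),       # all rows, with or without a price
    priced_rows=("close", "count"),        # count() skips missing values
    first_date=("formatted_date", "min"),
    last_date=("formatted_date", "max"),
)
# Share of rows without a closing price, per ticker
coverage_df["missing_share"] = 1 - coverage_df["priced_rows"] / coverage_df["rows"]
print(coverage_df)
```

Sorting `coverage_df` by `missing_share` is a quick way to find the tickers whose history is too sparse to be useful for return calculations.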

# Bovespa Index

We also need the historical prices of Bovespa's reference index, the IBOV.

This index is accessible through the ticker: __^BVSP__.

We will save the result in the file: __ibov.csv__.

In [281]:
finance = YahooFinancials("^BVSP")
ibov_data = finance.get_historical_price_data('2000-01-01', '2020-01-03', 'daily')

In [282]:
ibov_data

{'^BVSP': {'eventsData': {},
  'firstTradeDate': {'formatted_date': '1993-04-27', 'date': 735914700},
  'currency': 'BRL',
  'instrumentType': 'INDEX',
  'timeZone': {'gmtOffset': -7200},
  'prices': [{'date': 946900800,
    'high': 17408.0,
    'low': 16719.0,
    'open': 17098.0,
    'close': 16930.0,
    'volume': 0,
    'adjclose': 16930.0,
    'formatted_date': '2000-01-03'},
   {'date': 946987200,
    'high': 16908.0,
    'low': 15851.0,
    'open': 16908.0,
    'close': 15851.0,
    'volume': 0,
    'adjclose': 15851.0,
    'formatted_date': '2000-01-04'},
   {'date': 947073600,
    'high': 16302.0,
    'low': 15350.0,
    'open': 15871.0,
    'close': 16245.0,
    'volume': 0,
    'adjclose': 16245.0,
    'formatted_date': '2000-01-05'},
   {'date': 947160000,
    'high': 16499.0,
    'low': 15977.0,
    'open': 16237.0,
    'close': 16107.0,
    'volume': 0,
    'adjclose': 16107.0,
    'formatted_date': '2000-01-06'},
   {'date': 947246400,
    'high': 16449.0,
    'low': 161

In [297]:
# Generate the dataset with the historical prices of the index.
ibov_prices = []
for ticker, historical_data in ibov_data.items():
    if "prices" in historical_data and len(historical_data["prices"]) > 0:
        for index_price in historical_data["prices"]:
            data = index_price.copy()
            data["ticker"] = ticker
            data["type"] = "INDEX"
            ibov_prices.append(data)

# Build the DataFrame from the index prices (not from the stock prices)
ibov_prices_df = pd.DataFrame(ibov_prices)
# Convert the 'formatted_date' column type from string to datetime
ibov_prices_df['formatted_date'] = pd.to_datetime(ibov_prices_df['formatted_date'], format='%Y-%m-%d')
# Save the dataset as a CSV file
ibov_prices_df.to_csv('ibov.csv')
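
One use of a reference index is to compare a stock's return with the index return on the same day. The sketch below is an assumption about how the two datasets could be combined, not a step from the original notebook. It uses the first three BBAS3 and ^BVSP adjusted closes shown in the outputs above; `stock_df`, `index_df`, and the derived column names are hypothetical.

```python
import pandas as pd

# Hypothetical samples; values are copied from the outputs above
stock_df = pd.DataFrame({
    "formatted_date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
    "adjclose": [1.549851, 1.476138, 1.491260],
})
index_df = pd.DataFrame({
    "formatted_date": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
    "adjclose": [16930.0, 15851.0, 16245.0],
})

# Align both series on the trading date; suffixes keep the two price columns apart
merged_df = pd.merge(stock_df, index_df, on="formatted_date", suffixes=("_stock", "_ibov"))
merged_df["stock_return"] = merged_df["adjclose_stock"].pct_change()
merged_df["ibov_return"] = merged_df["adjclose_ibov"].pct_change()
# Return of the stock in excess of the index on the same day
merged_df["excess_return"] = merged_df["stock_return"] - merged_df["ibov_return"]
print(merged_df)
```

The default inner join keeps only the dates present in both datasets, so days on which the stock did not trade drop out of the comparison.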

In [298]:
GEBR10Y_data

{'^GEBR10Y': {'eventsData': {}}}