# COURSERA CAPSTONE NOTEBOOK
## Alternative NASDAQ stock price predictor

#### Background:

I never quite understood how stock prices work. When I read stock quotes on websites, I usually look at historical prices and at graphs showing price trends. However, a quote often carries other information about the stock, such as dividend yield and volume, and I have never been able to wrap my mind around how these indicators affect the price. This project aims to analyse the stock quotes of the NASDAQ-100 components and see how well a model predicts prices using indicators other than historical prices.

#### Data Description:

I will use the Ticker module of the yfinance library to obtain information on the stocks in the NASDAQ-100. The list of tickers is scraped from the Wikipedia page on the NASDAQ-100.


#### Methodology:

I will import the information dictionary into a pandas DataFrame and remove the irrelevant information. The remaining data serve as the predictors: market cap, enterprise value, enterprise value/revenue, enterprise value/EBITDA, financial data, the entire balance sheet, the entire cash flow statement, all share statistics, and all dividends & splits values except the forward values. The variable to be predicted is the stock price. I will split the data into train and test sets, train the model on the train set, and use it to predict results for the test set. Then I will run an accuracy analysis on the results to see how well the model performs.
I will also rank each variable by its correlation to the price and plot some graphs for visualization.
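
The workflow above could be sketched as follows. This is only a minimal illustration on synthetic data: the column names, the 80/20 split, the ordinary least-squares model and the R² score are all assumptions, since the notebook has not yet fixed a model or an accuracy metric.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned data (hypothetical columns, not real quotes)
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "trailingEps": rng.normal(5, 2, n),
    "bookValue": rng.normal(20, 5, n),
    "payoutRatio": rng.uniform(0, 1, n),
})
# Made-up price that depends mostly on trailingEps, plus noise
data["price"] = 15 * data["trailingEps"] + 2 * data["bookValue"] + rng.normal(0, 5, n)

# Shuffle the rows, then split 80/20 into train and test sets
shuffled = data.sample(frac=1, random_state=0)
n_train = int(0.8 * len(shuffled))
train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

features = ["trailingEps", "bookValue", "payoutRatio"]

def design_matrix(frame):
    # Feature columns plus a column of ones for the intercept
    return np.column_stack([frame[features].to_numpy(), np.ones(len(frame))])

# Fit ordinary least squares on the train set only
coef, *_ = np.linalg.lstsq(design_matrix(train), train["price"].to_numpy(), rcond=None)

# Predict the held-out prices and score them with R^2
predicted = design_matrix(test) @ coef
actual = test["price"].to_numpy()
r2 = 1 - np.sum((actual - predicted) ** 2) / np.sum((actual - actual.mean()) ** 2)
print(f"R^2 on the test set: {r2:.3f}")

# Rank the variables by absolute correlation with the price
ranking = data[features].corrwith(data["price"]).abs().sort_values(ascending=False)
print(ranking)
```

Scoring on the held-out rows only is what makes the R² an estimate of how the model would do on stocks it has not seen; the correlation ranking is computed on the full frame because it is descriptive rather than predictive.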


In [None]:
# import libraries
!pip install yfinance
import yfinance as yf
import pandas as pd
import numpy as np

In [None]:
# import more libraries
!pip install bs4 html5lib
import requests
from bs4 import BeautifulSoup

In [None]:
# website containing tickers of stocks in NASDAQ-100 for webscraping
url= "https://en.wikipedia.org/wiki/Nasdaq-100"

In [None]:
# scrape the website
html_data= requests.get(url).text

In [None]:
soup=BeautifulSoup(html_data,"html5lib")

In [None]:
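# save the tickers in a list
# (the table position and the cell offsets are hard-coded to the page layout at the time of writing)
cells=soup.find_all("tbody")[3].find_all("td")
list_of_tickers=[cells[x].text.strip() for x in range(1,min(408,len(cells)),4)]
print(list_of_tickers)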
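
The fixed offsets above (every fourth cell of the fourth `tbody`) depend on the page layout at the time of scraping. A more layout-tolerant variant could look up the column by its header text. The sketch below runs on a small made-up HTML table rather than the live page; the header name "Ticker" and the helper name `tickers_from_table` are assumptions, not something verified against the current Wikipedia article.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the constituents table (not the real page)
html = """
<table>
  <tr><th>Company</th><th>Ticker</th><th>Sector</th></tr>
  <tr><td>Apple Inc.</td><td>AAPL
</td><td>Technology</td></tr>
  <tr><td>Microsoft</td><td>MSFT</td><td>Technology</td></tr>
</table>
"""

def tickers_from_table(table, header="Ticker"):
    """Return the text of the column whose header equals `header`."""
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    column = headers.index(header)  # raises ValueError if the header is absent
    tickers = []
    for row in rows[1:]:
        cells = row.find_all("td")
        if len(cells) > column:  # skip rows that are too short
            tickers.append(cells[column].get_text(strip=True))
    return tickers

soup = BeautifulSoup(html, "html.parser")
print(tickers_from_table(soup.find("table")))
```

Failing loudly with a ValueError when the header is missing is deliberate: a silent shift of columns would otherwise fill the list with company names or sectors instead of tickers.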

In [None]:
# let's have a look at what Apple's ticker.info contains and choose the relevant keys for our analysis
AAPL=yf.Ticker("AAPL").info

In [None]:
# input the relevant data from ticker.info into our DataFrame
info_keys=['symbol',
           'marketCap',
           'enterpriseValue',
           'enterpriseToRevenue',
           'enterpriseToEbitda',
           'profitMargins',
           'trailingAnnualDividendYield',
           'payoutRatio',
           'averageDailyVolume10Day',
           'trailingAnnualDividendRate',
           'averageVolume',
           'askSize',
           'fiveYearAvgDividendYield',
           'bidSize',
           'dividendYield',
           'sharesOutstanding',
           'bookValue',
           'sharesShort',
           'sharesPercentSharesOut',
           'netIncomeToCommon',
           'trailingEps',
           'lastDividendValue',
           'shortRatio',
           'floatShares',
           'earningsQuarterlyGrowth',
           'shortPercentOfFloat',
           'sharesShortPriorMonth',
           'heldPercentInstitutions',
           'heldPercentInsiders']
rows=[]
for x in list_of_tickers[:2]:  # first two tickers only, as a quick test
    info=yf.Ticker(x).info  # fetch the dictionary once per ticker
    # keys that are missing or None are recorded as 0
    rows.append({i: 0 if info.get(i) is None else info[i] for i in info_keys})
raw_data=pd.DataFrame(rows,columns=info_keys)
print(raw_data)
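
Not every ticker reports every key, and indexing a dictionary with a missing key raises a KeyError. A minimal sketch of a lookup that tolerates both missing keys and None values, using made-up dictionaries in place of `yf.Ticker(x).info` (the helper name `value_or_zero` is hypothetical):

```python
import pandas as pd

# Made-up stand-ins for yf.Ticker(x).info; real dictionaries have many more keys
fake_info = {
    "AAA": {"symbol": "AAA", "marketCap": 1000, "dividendYield": None},
    "BBB": {"symbol": "BBB", "marketCap": 2500},  # no dividendYield key at all
}
keys = ["symbol", "marketCap", "dividendYield"]

def value_or_zero(info, key):
    # dict.get returns None for a missing key instead of raising KeyError
    value = info.get(key)
    return 0 if value is None else value

rows = [{key: value_or_zero(info, key) for key in keys} for info in fake_info.values()]
frame = pd.DataFrame(rows, columns=keys)
print(frame)
```

Recording a missing dividend yield as 0 follows the notebook's own convention; whether 0 or a missing-value marker is the better choice for the later model is left open here.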

In [None]:
# build the DataFrame for every ticker, one row per stock, indexed by symbol
info_keys=['symbol',
           'marketCap',
           'enterpriseValue',
           'enterpriseToRevenue',
           'enterpriseToEbitda',
           'profitMargins',
           'trailingAnnualDividendYield',
           'payoutRatio',
           'averageDailyVolume10Day',
           'trailingAnnualDividendRate',
           'averageVolume',
           'askSize',
           'fiveYearAvgDividendYield',
           'bidSize',
           'dividendYield',
           'sharesOutstanding',
           'bookValue',
           'sharesShort',
           'sharesPercentSharesOut',
           'netIncomeToCommon',
           'trailingEps',
           'lastDividendValue',
           'shortRatio',
           'floatShares',
           'earningsQuarterlyGrowth',
           'shortPercentOfFloat',
           'sharesShortPriorMonth',
           'heldPercentInstitutions',
           'heldPercentInsiders']
rows=[]
for x in list_of_tickers:
    print(x)
    info=yf.Ticker(x).info  # fetch the dictionary once per ticker
    # keys that are missing or None are recorded as 0
    rows.append({i: 0 if info.get(i) is None else info[i] for i in info_keys})
raw_data=pd.DataFrame(rows,columns=info_keys)
raw_data.set_index('symbol', inplace=True)
print(raw_data)

In [None]:
yf.Ticker('AAPL').financials.iloc[:,0]

In [None]:
# input the latest annual financials into our DataFrame
financial_rows=['Research Development',
                'Income Before Tax',
                'Net Income',
                'Selling General Administrative',
                'Gross Profit',
                'Ebit',
                'Operating Income',
                'Interest expenses',
                'Income Tax Expense',
                'Total Revenue',
                'Total Operating Expenses',
                'Cost of Revenue',
                'Total Other Income Expense Net',
                'Net Income From Continuing Ops',
                'Net Income Applicable To Common Shares']
rows=[]
for x in list_of_tickers[:2]:  # first two tickers only, as a quick test
    latest=yf.Ticker(x).financials.iloc[:,0]  # most recent column, fetched once per ticker
    # labels that are missing from the statement or hold no value are recorded as 0
    rows.append({i: latest.get(i) if pd.notna(latest.get(i)) else 0 for i in financial_rows})
raw_data=pd.DataFrame(rows,columns=financial_rows)
print(raw_data)

In [None]:
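# latest annual financials of every ticker, one row per stock
financial_rows=['Research Development','Income Before Tax','Net Income','Selling General Administrative',
                'Gross Profit','Ebit','Operating Income','Interest expenses','Income Tax Expense',
                'Total Revenue','Total Operating Expenses','Cost of Revenue','Total Other Income Expense Net',
                'Net Income From Continuing Ops','Net Income Applicable To Common Shares']
statements={x: yf.Ticker(x).financials for x in list_of_tickers}
# most recent column of each non-empty statement; labels a statement lacks become NaN
latest={x: s.iloc[:,0].reindex(financial_rows) for x,s in statements.items() if not s.empty}
raw_data=pd.DataFrame(latest).T
print(raw_data)

In [None]:
yf.Ticker('AAPL').balance_sheet
# add the latest balance sheet of every ticker as new columns
statements={x: yf.Ticker(x).balance_sheet for x in list_of_tickers}
latest={x: s.iloc[:,0] for x,s in statements.items() if not s.empty}
raw_data=raw_data.join(pd.DataFrame(latest).T,rsuffix=' (balance sheet)')

In [None]:
yf.Ticker('AAPL').cashflow
# add the latest cash flow statement of every ticker as new columns
statements={x: yf.Ticker(x).cashflow for x in list_of_tickers}
latest={x: s.iloc[:,0] for x,s in statements.items() if not s.empty}
raw_data=raw_data.join(pd.DataFrame(latest).T,rsuffix=' (cash flow)')
print(raw_data)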
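
The loops above aim to end up with one row per ticker and one column per statement item. A minimal sketch of that reshaping, using made-up numbers in place of the yfinance statements (the tickers, labels and values are all hypothetical): each ticker contributes a Series, a dictionary of such Series becomes a DataFrame with tickers as columns, and transposing it gives one row per ticker.

```python
import pandas as pd

# Made-up "latest column" of two statements for two hypothetical tickers
balance = {
    "AAA": pd.Series({"Total Assets": 500.0, "Total Liab": 200.0}),
    "BBB": pd.Series({"Total Assets": 900.0, "Net Income": 50.0}),
}
cashflow = {
    "AAA": pd.Series({"Net Income": 40.0, "Capital Expenditures": -10.0}),
    "BBB": pd.Series({"Net Income": 50.0, "Capital Expenditures": -30.0}),
}

# Dict of Series -> tickers as columns; transpose -> tickers as rows.
# Items a ticker does not report become NaN.
balance_frame = pd.DataFrame(balance).T
cashflow_frame = pd.DataFrame(cashflow).T

# Join on the ticker index; rsuffix disambiguates labels present in both frames
combined = balance_frame.join(cashflow_frame, rsuffix=" (cash flow)")
print(combined)
```

Because different companies report different line items, the combined frame will generally contain NaN cells that need handling before any model is trained on it.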