# Prototype Of Stock Sets

This notebook analyzes a small set of stocks to determine whether the analysis scales to larger data. At a minimum, 5 to 10 stocks are selected and prototyped so that the same approach can later be applied to a larger dataset.

The stocks chosen for this prototype are the top 10 stocks across the years 2010-2020.

In [5]:
# To install the required modules, uncomment the lines below
#!pip install pandas
#!pip install pandas_datareader

# Import Statements
import pandas as pd
from pandas import DataFrame
import pandas_datareader.data as web
import numpy as np
import scipy as sp
import csv

# For graphing purposes
import math
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib import style
%matplotlib inline

# For dates and file-system paths
import datetime as dt
import os

# Imports for collections
import collections
from collections import Counter

# Imports for Machine Learning
import sklearn
from sklearn.model_selection import train_test_split

print("Done importing")

Done importing


# Stock Analysis

An important part of stock analysis is to run a regression over years in which no significant event disrupted the market. Years to exclude are, for example, those in which political fluctuations gave the stock market cause to react.

For the purposes of this study, the years 2000-2005 are analyzed, on the assumption that no significant event affected a sector of the selected stocks during that period.
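
As a minimal sketch of the kind of regression described above, the following fits a straight line to a price series with `numpy.polyfit`. The notebook does not say which regression model it intends, so ordinary least squares against the trading-day index is assumed purely for illustration. The helper name `fit_linear_trend` is hypothetical and the prices are synthetic, not market data.

```python
import numpy as np
import pandas as pd


def fit_linear_trend(prices):
    """Fit price = slope * day_index + intercept by ordinary least squares."""
    day_index = np.arange(len(prices), dtype=float)
    slope, intercept = np.polyfit(day_index, prices.to_numpy(dtype=float), deg=1)
    return float(slope), float(intercept)


# Synthetic closing prices on business days (assumption: not real data).
dates = pd.bdate_range("2000-01-03", periods=250)
rng = np.random.default_rng(0)
synthetic_close = pd.Series(
    1.0 + 0.002 * np.arange(len(dates)) + rng.normal(0.0, 0.01, len(dates)),
    index=dates,
)

slope, intercept = fit_linear_trend(synthetic_close)
print(f"slope per trading day: {slope:.5f}, intercept: {intercept:.3f}")
```

With real data, the same helper could be called on a column such as `aapl['Adj Close']` once the download cell has run.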

In [2]:
# Date range for the stock market data
start = dt.datetime(2000, 1, 1) # Start of the 5-year window
end = dt.datetime(2005, 1, 1) # End of the 5-year window

# Selection of 5 stocks to run a base analysis on
aapl = web.DataReader('AAPL', 'yahoo', start, end)
nvda = web.DataReader('NVDA', 'yahoo', start, end)
msft = web.DataReader('MSFT', 'yahoo', start, end)
amzn = web.DataReader('AMZN', 'yahoo', start, end)
goog = web.DataReader('GOOGL', 'yahoo', start, end)

# Print each DataFrame to inspect the downloaded values
print(aapl)
print(nvda)
print(msft)
print(amzn)
print(goog)

                High       Low      Open     Close       Volume  Adj Close
Date                                                                      
2000-01-03  1.004464  0.907924  0.936384  0.999442  535796800.0   0.856887
2000-01-04  0.987723  0.903460  0.966518  0.915179  512377600.0   0.784642
2000-01-05  0.987165  0.919643  0.926339  0.928571  778321600.0   0.796124
2000-01-06  0.955357  0.848214  0.947545  0.848214  767972800.0   0.727229
2000-01-07  0.901786  0.852679  0.861607  0.888393  460734400.0   0.761677
...              ...       ...       ...       ...          ...        ...
2004-12-27  1.163393  1.122857  1.157143  1.127857  559490400.0   0.966985
2004-12-28  1.147321  1.108036  1.130357  1.146071  611755200.0   0.982601
2004-12-29  1.160357  1.135179  1.139464  1.150714  449562400.0   0.986582
2004-12-30  1.161250  1.146786  1.157321  1.157143  345340800.0   0.992094
2004-12-31  1.160714  1.143393  1.158750  1.150000  278588800.0   0.985970

[1256 rows x 6 columns]


## GICS Sector and Market Analysis

GICS (the Global Industry Classification Standard) is a methodology for assigning each company to the economic sector that best matches its business operations. There are 11 GICS sectors in total, and each stock falls under a subdivision of one of them.

The sector definitions are as follows:

* Energy
* Materials
* Industrials
* Consumer Discretionary
* Consumer Staples
* Health Care
* Financials
* Information Technology
* Real Estate
* Communication Services
* Utilities
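
To make the sector assignment concrete, the sketch below groups tickers by GICS sector. The names `TICKER_TO_SECTOR` and `group_by_sector` are hypothetical, and the sector given for each ticker is an assumption for illustration that should be checked against a current GICS listing.

```python
from collections import defaultdict

# The 11 GICS sectors listed above.
GICS_SECTORS = [
    "Energy",
    "Materials",
    "Industrials",
    "Consumer Discretionary",
    "Consumer Staples",
    "Health Care",
    "Financials",
    "Information Technology",
    "Real Estate",
    "Communication Services",
    "Utilities",
]

# Assumed sector for each ticker used in this notebook (illustrative only).
TICKER_TO_SECTOR = {
    "AAPL": "Information Technology",
    "NVDA": "Information Technology",
    "MSFT": "Information Technology",
    "AMZN": "Consumer Discretionary",
    "GOOGL": "Communication Services",
}


def group_by_sector(ticker_to_sector):
    """Return {sector: [tickers]} and reject any label that is not a GICS sector."""
    grouped = defaultdict(list)
    for ticker, sector in ticker_to_sector.items():
        if sector not in GICS_SECTORS:
            raise ValueError(f"{sector!r} is not a GICS sector")
        grouped[sector].append(ticker)
    return dict(grouped)


sectors = group_by_sector(TICKER_TO_SECTOR)
for sector, tickers in sectors.items():
    print(f"{sector}: {', '.join(tickers)}")
```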

## Stock Analysis of Volatility
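
One common way to quantify volatility, sketched here under stated assumptions, is the annualized standard deviation of daily log returns. The notebook does not specify a volatility measure, so the helper name `annualized_volatility`, the 252-trading-day convention, and the synthetic prices are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # common convention, assumed here


def annualized_volatility(prices):
    """Annualized standard deviation of daily log returns."""
    log_returns = np.log(prices / prices.shift(1)).dropna()
    return float(log_returns.std(ddof=1) * np.sqrt(TRADING_DAYS_PER_YEAR))


# Synthetic price path built from random daily log returns (not real data).
rng = np.random.default_rng(1)
daily_log_returns = rng.normal(0.0, 0.02, 500)
synthetic_prices = pd.Series(np.exp(np.cumsum(daily_log_returns)))

vol = annualized_volatility(synthetic_prices)
print(f"annualized volatility: {vol:.3f}")
```

Applied per sector, a measure like this would allow the sector files loaded below to be compared on a common scale.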

In [4]:
'''
Load the NASDAQ CSV downloads, one file per sector.
'''
# Directory that holds the downloaded CSV files
data_dir = '/Users/taylor/UH_Manoa/ICS_438/Assignments/Final-Project/big-data-stock-analysis/data'

# All stocks
nasdaq_stocks = pd.read_csv(os.path.join(data_dir, 'nasdaq_all.csv'))
# Energy
energy_nasdaq = pd.read_csv(os.path.join(data_dir, 'nasdaq_energy.csv'))
# Materials (not loaded yet)
# Industrials (not loaded yet)
# Consumer Discretionary
cons_nasdaq = pd.read_csv(os.path.join(data_dir, 'nasdaq_consumerservices.csv'))
# Consumer Staples (not loaded yet)
# Health Care
health_nasdaq = pd.read_csv(os.path.join(data_dir, 'nasdaq_healthcare.csv'))
# Financials
finance_nasdaq = pd.read_csv(os.path.join(data_dir, 'nasdaq_finance.csv'))
# Information Technology
tech_nasdaq = pd.read_csv(os.path.join(data_dir, 'nasdaq_tech.csv'))
# Real Estate (not loaded yet)
# Communication Services (not loaded yet)
# Utilities
utils_nasdaq = pd.read_csv(os.path.join(data_dir, 'nasdaq_utils.csv'))

# Store the sectors in a dictionary with a specific bin
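
One possible way to store the sectors in a dictionary, sketched under assumptions, is to key each loaded DataFrame by its sector label. The `SECTOR_FILES` mapping and the `load_sector_frames` helper are hypothetical names. The file names mirror those loaded above, while the `"data"` directory in the usage line is only a placeholder.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from sector label to CSV file name.
SECTOR_FILES = {
    "Energy": "nasdaq_energy.csv",
    "Consumer Discretionary": "nasdaq_consumerservices.csv",
    "Health Care": "nasdaq_healthcare.csv",
    "Financials": "nasdaq_finance.csv",
    "Information Technology": "nasdaq_tech.csv",
    "Utilities": "nasdaq_utils.csv",
}


def load_sector_frames(data_dir, sector_files=SECTOR_FILES):
    """Return {sector: DataFrame} for every sector whose CSV file exists."""
    frames = {}
    for sector, file_name in sector_files.items():
        csv_path = Path(data_dir) / file_name
        if csv_path.exists():  # skip sectors that have not been downloaded
            frames[sector] = pd.read_csv(csv_path)
    return frames


# "data" is a placeholder directory; sectors without a file are left out.
sector_frames = load_sector_frames("data")
print(sorted(sector_frames))
```

Keeping the frames in one dictionary lets later steps loop over the sectors instead of repeating code for each variable.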


In [14]:
# Select n stocks at random from the m available stocks
# Can do NASDAQ, NYSE, AMEX
# Downloaded from NASDAQ
# https://www.nasdaq.com/market-activity/stocks/screener?exchange=NASDAQ&render=download
num_stock_available = 500 # The total number of stocks (S&P 500)
num_stocks = 5 # The number of stocks to select (1% of the total)
np.random.seed(50) # Seed before drawing so the selection is reproducible
# sp.random was only an alias for numpy.random, so NumPy is called directly
x = np.random.uniform(low=1, high=num_stock_available, size=num_stocks)
y = [int(value) for value in x] # Truncate each draw to an integer index
unique_stocks = np.unique(y) # Sorted, with any duplicate indices removed
print(unique_stocks)
print(len(unique_stocks))

# The above is placeholder code for the actual stocks that are going to be used

[114 128 189 198 247]
5
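
Because truncating uniform draws to integers can yield repeated indices, an alternative sketch is to sample without replacement using NumPy's `Generator.choice`. The helper name `sample_stock_indices` is hypothetical, and this is offered as one possible replacement for the placeholder above rather than the notebook's own method.

```python
import numpy as np


def sample_stock_indices(num_available, num_to_select, seed=50):
    """Pick distinct indices in [0, num_available) and return them sorted."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(num_available, size=num_to_select, replace=False)
    return np.sort(picks)


indices = sample_stock_indices(num_available=500, num_to_select=5)
print(indices)
print(len(indices))
```

Sampling without replacement guarantees exactly `num_to_select` distinct stocks, so the later `np.unique` step is no longer needed.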


In [None]:
# Classify top stocks from 2000 - 2005

# Verification into different sectors

# Classify top stocks from 2012-2017