# Analyzing Stock Prices with Dictionaries

In this project, we analyze daily stock prices from `2007-01-01` to `2017-04-17` for several hundred stocks traded on the NASDAQ stock exchange. The data are stored in the `prices` folder, where each CSV file corresponds to a particular stock. The data were scraped from Yahoo Finance using the `yahoo_finance` module; the script, `download_data.py`, can be found in the same folder as this Jupyter notebook. Note that stocks are not traded on weekends or on special days such as holidays, so there are gaps between dates.

The data dictionary describing what each column represents is as follows:
* `date` -- date that the data is from.
* `close` -- the closing price on that day, which is the price when the trading day ends.
* `open` -- the opening price on that day, which is the price when the trading day starts.
* `high` -- the highest price the stock reached during trading.
* `low` -- the lowest price the stock reached during trading.
* `volume` -- the number of shares that were traded during the day.

# Reading in the Data

In [1]:
# importing the relevant libraries
import pandas as pd
import os

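# storing the data in a dictionary with the stock ticker symbol as the key and a pandas DataFrame as the value
data = {}
for fn in os.listdir("prices"):
    df = pd.read_csv(os.path.join("prices", fn))
    # os.path.splitext drops only the extension; str.rstrip(".csv") would also
    # strip any trailing "c", "s", "v" or "." characters from the symbol itself
    key = os.path.splitext(fn)[0]
    data[key] = df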
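
One detail is worth checking when deriving the key from the file name: `str.rstrip` treats its argument as a set of characters rather than a suffix. The short sketch below compares it with `os.path.splitext`. The file names are made up for illustration and are not taken from the `prices` folder.

```python
import os

# Hypothetical file names, chosen only to illustrate the difference.
filenames = ["aapl.csv", "abcs.csv"]

results = {}
for fn in filenames:
    stripped = fn.rstrip(".csv")      # strips any trailing '.', 'c', 's', 'v' characters
    stem = os.path.splitext(fn)[0]    # removes only the final extension
    results[fn] = (stripped, stem)
    print(fn, "->", repr(stripped), "vs", repr(stem))
```

For a symbol that ends in one of those characters, the two approaches disagree, which is why the extension-based approach is the safer choice here.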

In [2]:
# displaying first five rows of data corresponding to Apple's stock
data["aapl"].head()

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900
1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100
2,2007-01-05,85.049997,85.77,86.199997,84.400002,208685400
3,2007-01-08,85.47,85.959998,86.529998,85.280003,199276700
4,2007-01-09,92.570003,86.450003,92.979999,85.15,837324600
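
The gaps between trading dates mentioned in the introduction are visible even in these five rows. As a minimal sketch, assuming the `date` column holds ISO-formatted strings as displayed, the spacing between consecutive rows can be measured like this:

```python
import pandas as pd

# The five trading dates shown for AAPL in the output above.
dates = pd.to_datetime(pd.Series(
    ["2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09"]
))

# Calendar days between consecutive rows; the first entry has no predecessor.
gaps = dates.diff().dt.days
print(gaps.tolist())
```

Any value greater than 1 marks a gap; here the jump from `2007-01-05` to `2007-01-08` spans a weekend.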


# Summary Statistics for Each Stock

Next, we compute summary statistics for each stock, as well as across all stocks. We can find the following:

* The average closing price of each stock.
* The minimum average closing price over all stocks.
* The maximum average closing price over all stocks.

In [3]:
# dictionary to store the average closing prices for each stock
avg_close = {}

for stock in data:
    avg_close[stock] = data[stock]["close"].mean()

In [4]:
avg_close

{'dgica': 14.986583006177607,
 'bdge': 24.12035132432432,
 'cvco': 53.36543631042471,
 'blkb': 33.75537838185328,
 'bbox': 25.997579137451737,
 'ffb': 16.002316604633204,
 'fbiz': 22.958876448262547,
 'ffi': 86.29457917374518,
 'bdsi': 4.8207065644787646,
 'amgn': 92.2331003965251,
 'expe': 53.78315830308881,
 'expd': 42.86821235366795,
 'cur': 1.907691699604743,
 'clct': 14.4366796011583,
 'alny': 39.171486488030894,
 'evol': 5.701853281853282,
 'ahgp': 38.20530885868726,
 'dfbg': 1.4005010393822395,
 'afsi': 26.69982658918919,
 'chy': 12.45603860888031,
 'bmrn': 50.52171040733592,
 'agy': 10.303613901544402,
 'adrd': 22.51748262046332,
 'drrx': 2.352779922779923,
 'cru': 20.50549421969112,
 'brew': 8.094903474517373,
 'fbm': 14.224830092664092,
 'emcf': 22.219915054826256,
 'bsqr': 4.305370656370656,
 'csfl': 11.947644780694981,
 'car': 6.87222652972973,
 'cmcsa': 35.904505792277995,
 'capr': 2.473247462919594,
 'cmtl': 30.96300771621621,
 'elo': 11.39328957027027,
 'cplp': 9.9274822

In [9]:
# converting the dictionary to a DataFrame for a more compact display (pandas was already imported in the first cell)
avg_close_df = pd.DataFrame.from_dict(avg_close, orient='index', columns=["Average Closing Daily Price"])
avg_close_df

Unnamed: 0,Average Closing Daily Price
dgica,14.986583
bdge,24.120351
cvco,53.365436
blkb,33.755378
bbox,25.997579
...,...
banr,26.604235
an,62.325201
dynt,1.822120
eng,3.644197


In [None]:
# determine the stock symbol with the minimum average closing price
stock_min_avg_close_price = min(avg_close, key=avg_close.get)
stock_min_avg_close_price

In [None]:
min_avg_close_price = avg_close[stock_min_avg_close_price]
min_avg_close_price

In [None]:
# determine the stock symbol with the maximum average closing price
stock_max_avg_close_price = max(avg_close, key=avg_close.get)
stock_max_avg_close_price

In [None]:
max_avg_close_price = avg_close[stock_max_avg_close_price]
max_avg_close_price
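
As an alternative sketch, the same lookups can be done with a pandas `Series`, whose `idxmin` and `idxmax` methods return the label of the smallest and largest value. The `toy_avg_close` dictionary below is a small stand-in with values rounded from the output shown earlier, not the full result.

```python
import pandas as pd

# Toy subset of average closing prices (rounded, for illustration only).
toy_avg_close = {"dgica": 14.99, "bdge": 24.12, "cvco": 53.37, "cur": 1.91}

avg_series = pd.Series(toy_avg_close)

lowest_symbol = avg_series.idxmin()    # label of the minimum value
highest_symbol = avg_series.idxmax()   # label of the maximum value

print(lowest_symbol, avg_series[lowest_symbol])
print(highest_symbol, avg_series[highest_symbol])
```

This returns the symbol and its value from one structure, at the cost of converting the dictionary first.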

# Grouping Trades per Day

Let's create a dictionary where the keys are the dates and each value is a list of all the trades from all stock symbols on that trading day, with each trade stored as a tuple of `(volume, stock_symbol)`.

In [None]:
# Dictionary mapping each date to a list of (volume, stock_symbol) tuples
daily_trades = {}

for stock in data:
    for _, row in data[stock].iterrows():
        date = row["date"]
        volume = row["volume"]
        # create the list the first time a date is seen
        if date not in daily_trades:
            daily_trades[date] = []
        daily_trades[date].append((volume, stock))
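
The same grouping can be sketched with `collections.defaultdict` and `zip` over the two columns, which avoids the explicit membership check and is usually faster than `iterrows()` on large frames. The `toy_data` frames and their symbols are made up for illustration.

```python
from collections import defaultdict

import pandas as pd

# Two tiny made-up frames standing in for the per-stock DataFrames.
toy_data = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 250]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [300, 50]}),
}

# Missing dates start out as an empty list automatically.
toy_daily_trades = defaultdict(list)
for symbol, frame in toy_data.items():
    for date, volume in zip(frame["date"], frame["volume"]):
        toy_daily_trades[date].append((volume, symbol))

print(dict(toy_daily_trades))
```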

# Finding Most Traded Stock Each Day

Next, let's use the daily trades dictionary to build another dictionary that records the most traded stock on each day, where the keys are the dates and each value is a `(volume, stock_symbol)` tuple.

In [None]:
# Dictionary to store the most traded stock each day
most_traded_daily = {}

for date in daily_trades:
    daily_trades[date].sort()
    most_traded_daily[date] = daily_trades[date][-1]
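
Taking the last element after sorting works because Python compares tuples element by element, so `(volume, stock_symbol)` pairs are ordered by volume first and by symbol only on ties. A small sketch with made-up pairs shows this, and also that `max` gives the same answer without reordering the list:

```python
# Made-up (volume, symbol) pairs, including a tie on volume.
trades = [(300, "bbb"), (100, "aaa"), (300, "aaa")]

by_sort = sorted(trades)[-1]   # last element of the ascending order
by_max = max(trades)           # same element, found in a single pass

print(by_sort, by_max)
```

On a volume tie, the symbol that sorts later alphabetically is the one returned.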

In [None]:
most_traded_daily

# Searching for High Volume Days

Next, let's find the days with unusually high trading volume. To do this, we need to:
* Compute the total volume of trading for each day
* Sort and find the 10 highest volume days overall

In [None]:
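# list of (total_volume, date) tuples, one per trading day
daily_volumes = []

for day in daily_trades:
    # add up the volume of every stock traded on this day
    day_volume = sum(volume for volume, _ in daily_trades[day])
    daily_volumes.append((day_volume, day))

# sort in ascending order of total volume
daily_volumes.sort()

# the ten highest-volume days, from lowest to highest
daily_volumes[-10:]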
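
When only the top few days are needed, `heapq.nlargest` from the standard library is an alternative to sorting the whole list. The sketch below uses a made-up `toy_daily_trades` dictionary and asks for the top two days instead of ten.

```python
import heapq

# Made-up daily trades: date -> list of (volume, symbol) tuples.
toy_daily_trades = {
    "2007-01-03": [(100, "aaa"), (300, "bbb")],
    "2007-01-04": [(250, "aaa"), (50, "bbb")],
    "2007-01-05": [(10, "aaa"), (20, "bbb")],
}

# Total volume per day, stored as (total_volume, date) tuples.
toy_daily_volumes = [
    (sum(volume for volume, _ in trades), day)
    for day, trades in toy_daily_trades.items()
]

# The two highest-volume days, largest first.
top_two = heapq.nlargest(2, toy_daily_volumes)
print(top_two)
```

Unlike slicing a sorted list with `[-10:]`, `nlargest` returns its results in descending order.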

# Finding Profitable Stocks

Let's see which stocks would have been the most profitable to buy. We can do this by:

* Subtracting the initial close price (first row) from the final close price (last row), then computing a percentage relative to the initial price. This will tell us how much our initial investment would have grown or shrunk.
* Sorting all of the percentages.
* Finding the ten stocks that grew the most in the time period.
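
As a quick check of the arithmetic in the first step, here is the calculation on made-up prices. The `toy_prices` frame is hypothetical; it uses `.iloc` to pick the first and last rows by position, which does not depend on how the index is labeled.

```python
import pandas as pd

# Made-up closing prices for a single stock.
toy_prices = pd.DataFrame({"close": [50.0, 55.0, 75.0]})

initial = toy_prices["close"].iloc[0]    # first closing price
final = toy_prices["close"].iloc[-1]     # last closing price

# Percentage growth relative to the initial price.
percentage = 100 * (final - initial) / initial
print(percentage)
```

A stock that goes from 50 to 75 has grown by 50 percent; a negative result would mean the investment shrank.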

In [None]:
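# list of (percentage_change, stock_symbol) tuples
percentages = []

for stock in data:
    prices = data[stock]
    # first and last closing prices, selected by position
    initial = prices["close"].iloc[0]
    final = prices["close"].iloc[-1]
    # growth of the initial investment, in percent
    percentage = 100 * (final - initial) / initial
    percentages.append((percentage, stock))

# sort in ascending order of percentage change
percentages.sort()

# the ten stocks that grew the most, from lowest to highest
percentages[-10:]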

Based on this ranking, the most profitable stock to buy would have been ADMP.