# Analyzing Stock Prices

In this project, we'll work with stock market data downloaded from [Yahoo Finance](https://finance.yahoo.com/) using the [yahoo_finance](https://pypi.python.org/pypi/yahoo-finance) Python package. The data consists of daily stock prices from `2007-01-01` to `2017-04-17` for several hundred stock symbols traded on the [NASDAQ](http://www.nasdaq.com/) stock exchange, stored in the `prices` folder. We used the `download_data.py` script, located in the same folder as the Jupyter notebook, to download all of the stock price data. Each file in the `prices` folder is named after a stock symbol and contains the following information:

- `date`: the trading date
- `close`: the closing price on that date
- `open`: the opening price on that date
- `high`: the highest price reached during trading on that date
- `low`: the lowest price reached during trading on that date
- `volume`: the number of shares traded on that date

In each stock symbol file, the prices are sorted in ascending order by date. Stock trading doesn't happen on certain days, such as weekends and holidays, so there are gaps between dates: we only have data for days on which trading occurred.
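
As a quick illustration of those gaps, the following is a minimal sketch that flags rows preceded by more than one calendar day. The `toy` frame is a hypothetical stand-in for one symbol's file, and its `close` values are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for one symbol's file (close values are illustrative only)
toy = pd.DataFrame({
    "date": ["2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08"],
    "close": [10.0, 10.5, 10.2, 10.8],
})

dates = pd.to_datetime(toy["date"])

# The rows should already be in ascending date order
is_sorted = dates.is_monotonic_increasing

# Calendar days elapsed since the previous row (NaN for the first row)
gap_days = dates.diff().dt.days

# Dates preceded by a gap of more than one calendar day (weekend or holiday)
gaps = toy.loc[gap_days > 1, "date"].tolist()
print(is_sorted, gaps)
```

In this toy data, only the row following the weekend is flagged.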

We will read in and store all of the data using a dictionary with the stock symbols (the name of each file without the `.csv` extension) as the keys. The value associated with each key is a DataFrame storing the data from the corresponding CSV file.

In [1]:
import pandas as pd
import os

In [2]:
prices = {}
for fn in os.listdir("prices"):
    symbol = os.path.splitext(fn)[0]  # stock symbol: the file name without its extension
    df = pd.read_csv(os.path.join("prices", fn))  # read each file in prices as a DataFrame
    prices[symbol] = df  # store the DataFrame under its stock symbol

print(prices['aapl'])

            date       close        open        high         low     volume
0     2007-01-03   83.800002   86.289999   86.579999   81.899999  309579900
1     2007-01-04   85.659998   84.050001   85.949998   83.820003  211815100
2     2007-01-05   85.049997   85.770000   86.199997   84.400002  208685400
3     2007-01-08   85.470000   85.959998   86.529998   85.280003  199276700
4     2007-01-09   92.570003   86.450003   92.979999   85.150000  837324600
...          ...         ...         ...         ...         ...        ...
2585  2017-04-10  143.169998  143.600006  143.880005  142.899994   18473000
2586  2017-04-11  141.630005  142.940002  143.350006  140.059998   30275300
2587  2017-04-12  141.800003  141.600006  142.149994  141.009995   20238900
2588  2017-04-13  141.050003  141.910004  142.380005  141.050003   17652900
2589  2017-04-17  141.830002  141.479996  141.880005  140.869995   16424000

[2590 rows x 6 columns]


## Minimum and Maximum Average Closing Prices

Now that we've read in the data, we can use it to compute aggregates. For example, we can find the following:

- The average closing price of each stock
- The minimum average closing price over all stocks
- The maximum average closing price over all stocks

We can use the `mean()` method on each DataFrame's `close` column to do this. We will create a dictionary with stock symbols as keys and average closing prices as values, and then find the stocks with the minimum and maximum average closing prices.

In [3]:
# Dictionary with stock symbols as keys and average closing prices as values
avg_closing_prices = {}
for stock, data in prices.items():
    avg_closing_prices[stock] = data['close'].mean()

In [4]:
# Stock with the minimum average closing price
min_avg_stock = min(avg_closing_prices, key=avg_closing_prices.get)  # return key having min value
min_avg_price = avg_closing_prices[min_avg_stock]  # get the value of the key
print("Stock min avg closing price: ", min_avg_stock, min_avg_price)  # print both

# Stock with the maximum average closing price
max_avg_stock = max(avg_closing_prices, key=avg_closing_prices.get)  # return key having max value
max_avg_price = avg_closing_prices[max_avg_stock]  # get the value of the key
print("Stock max avg closing price: ", max_avg_stock, max_avg_price)  # print both

Stock min avg closing price:  blfs 0.8122763011583011
Stock max avg closing price:  amzn 275.13407757104255
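
The same lookup can also be sketched with a pandas `Series`, whose `idxmin()` and `idxmax()` methods return the label of the smallest and largest value. This is an alternative formulation, not the approach used in the cells above; `toy_prices` is a hypothetical stand-in for the `prices` dictionary with made-up numbers.

```python
import pandas as pd

# Hypothetical stand-in for the `prices` dictionary (made-up numbers)
toy_prices = {
    "aaa": pd.DataFrame({"close": [1.0, 2.0, 3.0]}),
    "bbb": pd.DataFrame({"close": [10.0, 20.0]}),
    "ccc": pd.DataFrame({"close": [5.0, 5.0]}),
}

# One average closing price per symbol, indexed by symbol
avg_close = pd.Series({sym: df["close"].mean() for sym, df in toy_prices.items()})

# idxmin/idxmax return the index label (the symbol), not the value
lowest = avg_close.idxmin()
highest = avg_close.idxmax()
print("min:", lowest, avg_close[lowest])
print("max:", highest, avg_close[highest])
```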


## Grouping Trades per Day

To answer the questions in the following sections, it will be easier to organize the trades by date. To do so, we'll build a dictionary where the keys are the dates and each value is a list of all trades, across all stock symbols, that occurred on that day.

For each day, we'll want a list of pairs `(volume, stock_symbol)` of all trades that occurred on that day.

In [5]:
# Dictionary with dates as keys and lists of (volume, stock symbol) pairs as values
trades_per_day = {}  # initialize empty dictionary
for stock, data in prices.items():  # iterate through the prices dictionary
    for index, row in data.iterrows():  # iterate through the rows of the data (each stock)
        if row['date'] not in trades_per_day:  # if the date is not in the dictionary
            trades_per_day[row['date']] = []  # add the date as a key with an empty list as value
        trades_per_day[row['date']].append((row['volume'], stock))  # append the volume and stock symbol to the list

In [6]:
print(trades_per_day['2017-02-09'])

[(12000, 'dgica'), (52700, 'bdge'), (32200, 'cvco'), (2637100, 'blkb'), (142300, 'bbox'), (135600, 'ffbc'), (23200, 'fbiz'), (33200, 'ffic'), (372800, 'bdsi'), (4281400, 'amgn'), (3165200, 'expe'), (760200, 'expd'), (164500, 'cur'), (51600, 'clct'), (2262800, 'alny'), (4500, 'evol'), (84200, 'ahgp'), (3300, 'dfbg'), (420100, 'afsi'), (268700, 'chy'), (810000, 'bmrn'), (33900, 'agys'), (2300, 'adrd'), (376300, 'drrx'), (1959500, 'crus'), (105300, 'brew'), (700, 'fbms'), (0, 'emcf'), (6800, 'bsqr'), (293100, 'csfl'), (1795400, 'car'), (14329400, 'cmcsa'), (63400, 'capr'), (161400, 'cmtl'), (176100, 'elos'), (354500, 'cplp'), (4800, 'eltk'), (104300, 'agii'), (31000, 'coke'), (39800, 'egan'), (28200, 'cpss'), (346800, 'adtn'), (714500, 'ffiv'), (16900, 'cspi'), (101600, 'bwen'), (319200, 'cgnx'), (1682300, 'cdns'), (12400, 'egt'), (2077600, 'cray'), (26300, 'arcw'), (195400, 'bncn'), (248900, 'admp'), (261800, 'cnsl'), (144200, 'abax'), (29700, 'aris'), (175300, 'cyrn'), (90700, 'asys'), 
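
Row-by-row iteration with `iterrows()` can be slow on several hundred files. As a hedged alternative, the sketch below builds the same kind of date-keyed dictionary by stacking the frames with `pd.concat` and iterating over `groupby("date")`. The `toy_prices` data and the `symbol` column name are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the `prices` dictionary (made-up volumes)
toy_prices = {
    "aaa": pd.DataFrame({"date": ["2017-02-08", "2017-02-09"], "volume": [100, 300]}),
    "bbb": pd.DataFrame({"date": ["2017-02-09"], "volume": [200]}),
}

# Stack every symbol's rows into one long frame, tagging each row with its symbol
long_df = pd.concat(
    [df.assign(symbol=sym) for sym, df in toy_prices.items()],
    ignore_index=True,
)

# One list of (volume, symbol) pairs per date
toy_trades_per_day = {
    date: list(zip(group["volume"].tolist(), group["symbol"].tolist()))
    for date, group in long_df.groupby("date")
}
print(toy_trades_per_day)
```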

## Finding the Most Traded Stock Each Day

Now that we've grouped the trades by day, we can find the most-traded stock each day. For that, we'll build a dictionary where the keys are the days and each value is the symbol of the most-traded stock on that day, that is, the stock with the highest trade volume (stored in the `volume` column).

Finding the most-traded stock can help us find trends in the broader market and see which companies are "hot" at which times.

In [7]:
# Dictionary with dates as keys and the symbol of the stock with the max volume on that day as values
most_traded_per_day = {}  # initialize empty dictionary
for date, trades in trades_per_day.items():  # iterate through the trades_per_day dictionary
    most_traded_per_day[date] = max(trades, key=lambda x: x[0])[1]  # get the stock symbol with the max volume for each day from tuples

In [8]:
list(most_traded_per_day.items())[:5]  # print first 5 items

[('2007-01-03', 'aapl'),
 ('2007-01-04', 'aapl'),
 ('2007-01-05', 'aapl'),
 ('2007-01-08', 'aapl'),
 ('2007-01-09', 'aapl')]

In [10]:
date = '2017-02-09'
symbol = most_traded_per_day[date]
print(symbol)  # print the most traded stock on 2017-02-09

# Finding the volume of that stock on that day
stock_data = prices[symbol]
volume_on_date = stock_data.loc[stock_data['date'] == date, 'volume'].iloc[0]
print(f"Most traded stock on {date}: {symbol} with volume {volume_on_date}")

amd
Most traded stock on 2017-02-09: amd with volume 72435600
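
If the volume is needed together with the symbol, one option is to keep the whole `(volume, symbol)` pair returned by `max` instead of only its second element, which avoids the second lookup. The sketch below assumes the same list-of-pairs layout as `trades_per_day`; the data and the name `top_trade_per_day` are hypothetical.

```python
# Hypothetical stand-in for `trades_per_day` (made-up volumes)
toy_trades_per_day = {
    "2017-02-08": [(100, "aaa"), (250, "bbb")],
    "2017-02-09": [(300, "aaa"), (200, "bbb")],
}

# Keep the full (volume, symbol) pair with the largest volume for each day
top_trade_per_day = {
    date: max(trades, key=lambda trade: trade[0])
    for date, trades in toy_trades_per_day.items()
}

volume, symbol = top_trade_per_day["2017-02-09"]
print(f"Most traded stock on 2017-02-09: {symbol} with volume {volume}")
```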


## Searching for High Volume Days

Let's search for the days with unusually high trading volume. To do this, we'll need to do the following:

- Compute total volume of trading for each day
- Sort and find the 10 highest volume days overall

We will create a dictionary where the keys are the dates and the values are the total volume of that day (over all stocks).

In [11]:
# Dictionary with date as keys and total volume (all stocks) as values
total_volume_per_day = {}  # initialize empty dictionary
for date, trades in trades_per_day.items():  # iterate through the trades_per_day dictionary
    total_volume_per_day[date] = sum(trade[0] for trade in trades)  # sum the volume of all stocks for each day

In [12]:
# Print 10 highest volume days
highest_volume_days = sorted(total_volume_per_day, key=total_volume_per_day.get, reverse=True)[:10]
print(highest_volume_days)

['2008-01-23', '2008-10-10', '2007-07-26', '2008-10-08', '2008-01-22', '2008-02-07', '2008-09-29', '2007-11-08', '2008-01-16', '2008-01-24']
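
To inspect the individual trades on the busiest days, the date-keyed dictionaries can be combined. The following is a minimal sketch on made-up data; it uses `heapq.nlargest` from the standard library as an alternative to sorting the whole dictionary, and all `toy_` names are hypothetical.

```python
import heapq

# Hypothetical stand-in for `trades_per_day` (made-up volumes)
toy_trades_per_day = {
    "2008-01-22": [(500, "aaa"), (100, "bbb")],
    "2008-01-23": [(700, "aaa"), (400, "bbb")],
    "2008-01-24": [(50, "aaa"), (60, "bbb")],
}

# Total volume per day, summed over all symbols
toy_total = {
    date: sum(volume for volume, _ in trades)
    for date, trades in toy_trades_per_day.items()
}

# The two highest-volume days, largest first
top_days = heapq.nlargest(2, toy_total, key=toy_total.get)

# Look up the underlying trades for each of those days
for day in top_days:
    print(day, toy_total[day], toy_trades_per_day[day])
```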


## Finding Profitable Stocks

Let's see which stocks would have been the most profitable to buy. We can do this as follows:

- Subtracting the initial opening price (first row) from the final closing price (last row), then computing a percentage relative to the initial price. This tells us by how much our initial investment would have grown or shrunk.
- Sorting all of the percentages.
- Finding the 10 stocks that grew the most in the time period.
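
The percentage step above can be checked on small numbers first. This is a minimal sketch with a hypothetical helper, `percent_change`, and made-up prices: an investment that goes from 10 to 25 has grown by 150%, and one that goes from 10 to 5 has shrunk by 50%.

```python
def percent_change(initial: float, final: float) -> float:
    """Return the change from `initial` to `final` as a percentage of `initial`."""
    return (final - initial) / initial * 100


# Made-up (initial, final) prices for three hypothetical symbols
toy_endpoints = {"aaa": (10.0, 25.0), "bbb": (10.0, 5.0), "ccc": (4.0, 5.0)}

toy_profit = {
    sym: percent_change(initial, final)
    for sym, (initial, final) in toy_endpoints.items()
}

# Symbols ordered from most to least profitable
ranking = sorted(toy_profit, key=toy_profit.get, reverse=True)
print(ranking, toy_profit)
```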

In [13]:
# Calculate each stock's profit, in percent, over the total time period
profit_per_stock = {}
for stock, data in prices.items():
    opening_price = data.iloc[0]['open']
    closing_price = data.iloc[-1]['close']
    profit_per_stock[stock] = (closing_price - opening_price) / opening_price * 100

In [16]:
# Identify the top 10 most profitable stocks
most_profitable = sorted(profit_per_stock, key=profit_per_stock.get, reverse=True)[:10]

# Print the profit for each of the top 10 most profitable stocks
for stock in most_profitable:
    print(f"Stock: {stock}, Profit: {profit_per_stock[stock]:.2f}%")

Stock: admp, Profit: 7483.84%
Stock: adxs, Profit: 4461.11%
Stock: arcw, Profit: 3898.60%
Stock: blfs, Profit: 2799.96%
Stock: amzn, Profit: 2231.93%
Stock: anip, Profit: 1681.90%
Stock: apdn, Profit: 1549.67%
Stock: cui, Profit: 1525.16%
Stock: axgn, Profit: 1502.74%
Stock: bcli, Profit: 1449.92%
