In this project, we mostly practice working with Python dictionaries.

We'll work with stock market data downloaded from Yahoo Finance using the `yahoo_finance` Python package. The data consists of daily stock prices from 2007-01-01 to 2017-04-17 for several hundred stock symbols traded on the NASDAQ stock exchange. Each file in the `prices` folder is named after the stock symbol whose prices it holds.

First, we import all the data into a dictionary called `stocks`, keyed by stock symbol:

In [17]:
import pandas as pd
import os

stocks = {}

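for fn in os.listdir("prices"):
    # Each file holds the price history of one stock
    df = pd.read_csv(os.path.join("prices", fn))
    # The part of the file name before the first dot is the stock symbol
    name = fn.split('.')[0]
    stocks[name] = df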

list(stocks.keys())[:5] + list(stocks.keys())[-5:]

['aal', 'aame', 'aaon', 'aapl', 'aaww', 'flir', 'fll', 'flws', 'flxs', 'fmbi']
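One caveat about deriving the symbol with `fn.split('.')[0]`: it keeps only the text before the first dot, so a symbol that itself contains a dot would be cut short. The sketch below compares it with `os.path.splitext`, which strips only the final extension. The file names are hypothetical and are not taken from the `prices` folder.

```python
import os

# Hypothetical file names, used only to illustrate the difference
file_names = ["aapl.csv", "brk.b.csv"]

# Text before the first dot
by_split = [fn.split(".")[0] for fn in file_names]

# Everything except the final extension
by_splitext = [os.path.splitext(fn)[0] for fn in file_names]

print(by_split)
print(by_splitext)
```

For file names with a single dot, as in the symbols listed above, both approaches give the same result.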

In [2]:
stocks['aal'].head()

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,56.299999,53.889999,56.919998,53.889999,2955600
1,2007-01-04,58.84,56.299999,59.150002,53.650002,2614500
2,2007-01-05,58.290001,58.830002,59.150002,57.900002,1656300
3,2007-01-08,57.93,57.299999,60.48,57.040001,2163200
4,2007-01-09,57.900002,59.439999,60.200001,57.560001,2098600


Here, we compute the average closing price of each stock, then find the stocks with the lowest and highest averages.

In [3]:
avg_close = {stock: stocks[stock]['close'].mean() for stock in stocks}

print(f"The stock with the minimum average close is '{min(avg_close, key=avg_close.get)}'.")
print(f"The stock with the maximum average close is '{max(avg_close, key=avg_close.get)}'.")


The stock with the minimum average close is 'blfs'.
The stock with the maximum average close is 'amzn'.
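The `min(..., key=avg_close.get)` idiom returns the key whose value is smallest. The same lookup can be sketched with a pandas Series, as below. The numbers are made up and only stand in for the real `avg_close` values, and `toy_avg_close` is a hypothetical name.

```python
import pandas as pd

# Made-up averages; the real values come from the avg_close dictionary
toy_avg_close = {"aaa": 12.5, "bbb": 3.2, "ccc": 48.0}

# A Series built from a dict uses the keys as its index
avg_series = pd.Series(toy_avg_close)

# idxmin / idxmax return the index label (here, the symbol) of the extreme value
lowest = avg_series.idxmin()
highest = avg_series.idxmax()

print(lowest, highest)
```

The dictionary version needs no extra data structure, while the Series version makes follow-up steps such as sorting or taking the top ten a single method call.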


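Next, we find the ten stocks that were most often the highest-volume stock of the day. We first collect each day's (volume, stock) pairs, then pick the top stock for each day and count how many days each stock came out on top.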

In [4]:
volumes = {}

for stock in stocks:
    for _, row in stocks[stock].iterrows():
        # Create the day's list on first sight, then append the (volume, stock) pair
        volumes.setdefault(row['date'], []).append((row['volume'], stock))

volumes['2007-01-03'][:5]

[(2955600, 'aal'),
 (5500, 'aame'),
 (146700, 'aaon'),
 (309579900, 'aapl'),
 (255800, 'aaww')]

In [5]:
most_traded_stock = {}

for day in volumes:
    max_volume = 0
    max_stock = None
    for volume, stock in volumes[day]:
        if volume >= max_volume:
            max_stock = stock
            max_volume = volume
    most_traded_stock[day] = (max_volume, max_stock)

print(f"The top ten most traded stocks are: \n\n\
{pd.Series([most_traded_stock[day][1] for day in most_traded_stock]).value_counts().head(10)}")

The top ten most traded stocks are: 

aapl     2235
amd       148
csco       78
cpsh       33
cmcsa      32
ebay       28
bbry       20
cpah       13
drys        7
brcd        5
dtype: int64
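The nested loops above are easy to follow, but `iterrows` is slow on several hundred data frames. The sketch below shows one possible vectorized alternative, using a tiny made-up `toy_stocks` dictionary in place of the real data. It is not the method used in this notebook. Note also that `idxmax` picks the first stock in a tie, whereas the loop above keeps the last one.

```python
import pandas as pd

dates = ["2007-01-03", "2007-01-04", "2007-01-05"]

# Made-up miniature version of the stocks dictionary
toy_stocks = {
    "aaa": pd.DataFrame({"date": dates, "volume": [100, 900, 700]}),
    "bbb": pd.DataFrame({"date": dates, "volume": [500, 200, 300]}),
    "ccc": pd.DataFrame({"date": dates, "volume": [300, 100, 100]}),
}

# Stack all frames into one long table; the dict keys become a "stock" column
combined = pd.concat(toy_stocks, names=["stock", "row"]).reset_index()

# For each date, keep the row with the largest volume
top_rows = combined.loc[combined.groupby("date")["volume"].idxmax()]

# Count how many days each stock was the most traded
days_on_top = top_rows["stock"].value_counts()

print(days_on_top)
```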


Next, we find the ten days with the highest total trading volume, summed across all stocks.

In [6]:
volumes_total = {day: 0 for day in most_traded_stock}

for stock in stocks:
    for _, row in stocks[stock].iterrows():
        volumes_total[row['date']] += row['volume']

In [7]:
print(f"The following ten days had the most trades (in order): \n\n \
{[day[0] for day in sorted(volumes_total.items(), key=lambda x: x[1], reverse=True)[:10]]}")

The following ten days had the most trades (in order): 

 ['2008-01-23', '2008-10-10', '2007-07-26', '2008-10-08', '2008-01-22', '2008-02-07', '2008-09-29', '2007-11-08', '2008-01-16', '2008-01-24']
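As a rough alternative to accumulating totals in a dictionary, the per-day sum can be sketched with `groupby` and `nlargest`. The `toy_stocks` data is made up for illustration, and the sketch keeps two days instead of ten.

```python
import pandas as pd

dates = ["2007-01-03", "2007-01-04", "2007-01-05"]

# Made-up miniature version of the stocks dictionary
toy_stocks = {
    "aaa": pd.DataFrame({"date": dates, "volume": [100, 900, 700]}),
    "bbb": pd.DataFrame({"date": dates, "volume": [500, 200, 300]}),
    "ccc": pd.DataFrame({"date": dates, "volume": [300, 100, 100]}),
}

# Put every row of every stock into one table
combined = pd.concat(list(toy_stocks.values()), ignore_index=True)

# Total volume per day across all stocks
daily_total = combined.groupby("date")["volume"].sum()

# Dates of the two largest totals, largest first
busiest_days = daily_total.nlargest(2).index.tolist()

print(busiest_days)
```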


Finally, we find the stocks whose closing price grew the most over the time frame, measured as the relative change between each stock's first and last closing price.

In [8]:
profits = {}

for stock in stocks:
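    # Assumes the rows are in chronological order, as in the sample shown earlier
    initial_price = stocks[stock]['close'].iloc[0]
    end_price = stocks[stock]['close'].iloc[-1]
    # Stored as a fraction of the initial price (0.5 means +50%)
    percent_change = (end_price - initial_price) / initial_price
    profits[stock] = percent_change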

In [9]:
print(f"The following ten stocks grew the most from the start to the end (in order): \n\n \
{[x[0] for x in sorted(profits.items(), key=lambda x: x[1], reverse=True)[:10]]}.")

The following ten stocks grew the most from the start to the end (in order): 

 ['admp', 'adxs', 'arcw', 'blfs', 'amzn', 'anip', 'apdn', 'cui', 'bcli', 'achc'].
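The calculation above takes the first and last rows of each data frame, so it relies on the rows being in date order. A minimal sketch of the same formula that sorts first is shown below. The prices are made up and `fractional_change` is a hypothetical helper name.

```python
import pandas as pd

# Made-up prices, deliberately out of chronological order
toy_prices = pd.DataFrame({
    "date": ["2007-01-05", "2007-01-03", "2007-01-04"],
    "close": [15.0, 10.0, 12.0],
})

def fractional_change(df):
    """Relative change between the earliest and latest closing price."""
    # ISO-formatted date strings sort chronologically
    ordered = df.sort_values("date")
    first = ordered["close"].iloc[0]
    last = ordered["close"].iloc[-1]
    return (last - first) / first

change = fractional_change(toy_prices)
print(f"{change:.0%}")  # prints 50%
```

Without the sort, the same frame would give (12 - 15) / 15 = -0.2, which shows why the row order matters.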
