In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
#Setting up Plotly
import plotly
plotly.tools.set_credentials_file(username='velej', api_key='4zGYqZ7w9NBJilKZ3Pnf')

# Label Columns Keys or Values

Date and symbol together are the keys: they are independent of the quantitative variables (open, close, low, high, and volume), and both are needed to uniquely identify a record, so the pair acts as an index. The quantitative variables depend on that key pair, so they can be thought of as value attributes.
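
As a rough illustration of the key/value distinction, the sketch below checks that neither column is unique on its own while the (date, symbol) pair is. The `sample` frame and its values are made up for illustration and only mimic the shape of prices.csv.

```python
import pandas as pd

# Hypothetical miniature of prices.csv; the values are made up for illustration.
sample = pd.DataFrame({
    "date": ["2016-12-29", "2016-12-29", "2016-12-30", "2016-12-30"],
    "symbol": ["AAA", "BBB", "AAA", "BBB"],
    "close": [10.0, 20.0, 10.5, 19.5],
    "volume": [1000.0, 2000.0, 1100.0, 1900.0],
})

# Neither key column identifies a row on its own...
date_is_unique = sample["date"].is_unique
symbol_is_unique = sample["symbol"].is_unique

# ...but the (date, symbol) pair does.
pair_is_unique = not sample.duplicated(subset=["date", "symbol"]).any()

# With both keys as the index, a value attribute is looked up by the pair.
indexed = sample.set_index(["date", "symbol"])
close_aaa = indexed.loc[("2016-12-30", "AAA"), "close"]
```

The same checks could be run on the full `data` frame to confirm that the pair really is unique there; that is not verified here.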

In [3]:
#import NYSE information
data = pd.read_csv('/Users/JosephVele/Downloads/nyse/prices.csv')

#verify information has been loaded correctly
data.tail()

Unnamed: 0,date,symbol,open,close,low,high,volume
851259,2016-12-30,ZBH,103.309998,103.199997,102.849998,103.93,973800.0
851260,2016-12-30,ZION,43.07,43.040001,42.689999,43.310001,1938100.0
851261,2016-12-30,ZTS,53.639999,53.529999,53.27,53.740002,1701200.0
851262,2016-12-30 00:00:00,AIV,44.73,45.450001,44.41,45.59,1380900.0
851263,2016-12-30 00:00:00,FTV,54.200001,53.630001,53.389999,54.48,705100.0


# Task

A user may be interested in verifying that the top companies as of 2016-12-30 all followed similar price trends over the years. At a high level, this task falls under consumption, and since we are trying to verify a hypothesis, it is categorized as discovery. In terms of search, the location is known but the target is not. At a low level, we want to compare targets.
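
Selecting the "top companies as of 2016-12-30" needs some care, because the `data.tail()` output shows the date written both as `2016-12-30` and as `2016-12-30 00:00:00`. The sketch below is one possible way to match both spellings; the `sample` rows are hypothetical, and truncating the string to its first ten characters is an assumption that every date starts with `YYYY-MM-DD`.

```python
import pandas as pd

# Hypothetical rows; note the two date spellings seen in the data.tail() output.
sample = pd.DataFrame({
    "date": ["2016-12-30", "2016-12-30 00:00:00", "2016-12-30", "2016-12-29"],
    "symbol": ["AAA", "BBB", "CCC", "AAA"],
    "close": [50.0, 80.0, 20.0, 49.0],
})

# Keep only the first 10 characters (YYYY-MM-DD) so both spellings compare equal.
day = sample["date"].str.slice(0, 10)

# Rows for the chosen day, then the symbols with the highest closing price.
on_day = sample[day == "2016-12-30"]
top_2 = on_day.nlargest(2, "close")["symbol"].tolist()
```

A plain string comparison against `'2016-12-30'` would have skipped the `BBB` row in this example.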

In [4]:
import plotly.graph_objs as go
import plotly.plotly as py

# Parse dates first: the raw column mixes '2016-12-30' and '2016-12-30 00:00:00',
# so a string comparison against '2016-12-30' would miss some rows
data['date'] = data['date'].astype('datetime64[ns]')

# Filter for top 5 companies (keys) as of 2016-12-30
top_5 = data[data.date == '2016-12-30'].nlargest(5, 'close').symbol.values

# Establish colors for encoding the symbol attribute
c = ['red', 'blue', 'green', 'magenta', 'gold']

trace = []
for i in range(len(top_5)):
    trace_i = go.Scatter(
        x = data[data['symbol']==top_5[i]]['date'],
        y = data[data['symbol']==top_5[i]]['close'],
        name = top_5[i],
        marker = dict(color = c[i]))
    trace.append(trace_i)

plotly_data = trace

layout= go.Layout(
    title= 'Closing Prices for Top Companies',
    hovermode= 'closest',
    showlegend= True, 
    xaxis= dict(
        title= 'Date',
        ticklen= 5,
        zeroline= False,
        gridwidth= 2,
        hoverformat = '%x'
    ),
    yaxis=dict(
        title= 'Closing Prices',
        ticklen= 5,
        gridwidth= 2,
    )
)
fig= go.Figure(data=plotly_data, layout=layout)
py.iplot(fig, filename = 'line')


## Line Plot - Interactive Plot 1 

In this plot, we encode two quantitative variables using horizontal and vertical spatial position. Per slide 17 in the Visual Encoding Presentation, this aligns with the task, as it can be used to analyze trends, correlations, and distributions. Although the line graph is typically used for one key (slide 18), we can consider that we have one key for each color. The colors have also been chosen to be distinct: "In the design of color codes, the two primary considerations must be visual distinctness, to support visual search operations, and learnability, so that particular colors come to “stand for” particular entities." (Ware, Colin. Visual Thinking: for Design (Morgan Kaufmann Series in Interactive Technologies), p. 77).

One thing I immediately noticed was the correlation between GOOG and GOOGL, but this is expected since the two symbols are the result of a stock split. The correlation between AutoZone and Amazon, however, was more interesting. Using the interactive plot, the user can inspect individual records and zoom in on specific dates. We have also limited the number of companies shown so as not to overwhelm the user (VAD p. 45) and to address the main task. The user can also reduce clutter by hiding some of the companies, which is useful for direct comparisons.
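
The correlations read off the line plot could also be checked numerically. The following is a minimal sketch, not the analysis used in this notebook: it pivots long-format prices into one column per symbol and computes pairwise Pearson correlations. The `sample` frame, its symbols, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format prices (made-up values), shaped like prices.csv.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"] * 3),
    "symbol": ["AAA"] * 3 + ["BBB"] * 3 + ["CCC"] * 3,
    "close": [10.0, 11.0, 12.0, 20.0, 22.0, 24.0, 9.0, 8.0, 7.0],
})

# One row per date, one column per symbol.
wide = sample.pivot(index="date", columns="symbol", values="close")

# Pairwise Pearson correlation between the closing-price columns.
corr = wide.corr()
```

Note that this correlates raw price levels. Two series that both trend upward can correlate strongly for unrelated reasons, so correlating daily changes (for example `wide.pct_change()`) may be a fairer comparison.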

In [5]:
import plotly.plotly as py
import plotly.graph_objs as go

top_20 = data[data.date =='2016-12-30'].nlargest(20,'close').symbol.values
   
# Select the rows for the top 20 symbols once and reuse them
top_20_data = data[data.symbol.isin(top_20)]

trace = go.Heatmap(z = top_20_data.close,
                   x = top_20_data.date,
                   y = top_20_data.symbol,
                   colorscale = 'Viridis')

layout= go.Layout(
    title= 'HeatMap of Stock Prices over Time',
    hovermode= 'closest',
    showlegend= True, 
    xaxis= dict(
        title= 'Date',
        ticklen= 5,
        zeroline= False,
        gridwidth= 2,
        hoverformat = '%x'
    ),
    yaxis=dict(
        title= 'Stock Symbol',
        ticklen= 5,
        gridwidth= 2,
    )
)
plotly_data=[trace]
fig= go.Figure(data=plotly_data, layout=layout)
py.iplot(fig, filename = 'heat')

## Heat Map - Interactive Plot 2

For this plot, I followed a recommendation from slide 16 of the Visual Encoding Presentation: because there are two keys, I implemented a heat map. The previous plot faced the issue of having a large number of categorical values and a limited number of distinct colors. This plot acts as a solution, but with a trade-off: visualizing the trend for a single symbol is not as easy, but more information is shown. One thing that immediately pops out is that the majority of the companies do not seem to close above 400, and that after the top 5 there is quite a drop. The interactivity chosen for this plot is similar to the previous one, as it is useful to a wide range of users. They can hover to see the date and closing price for a given stock symbol; this is at the item level (VAD p. 23). The zoom capability also lets the user explore any trend. For example, the user can drag the cursor across 2015-2016 to narrow down the dates.
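
To make the two-key structure of the heat map explicit, the sketch below reshapes long-format prices into a symbol-by-date grid and then restricts the columns to 2015-2016, which roughly mimics the zoom described above. The `sample` frame and its values are hypothetical, and this is an alternative way of preparing the data rather than what the cell above does.

```python
import pandas as pd

# Hypothetical long-format prices (made-up values), shaped like prices.csv.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-31", "2015-06-30", "2016-06-30"] * 2),
    "symbol": ["AAA"] * 3 + ["BBB"] * 3,
    "close": [10.0, 11.0, 12.0, 400.0, 410.0, 420.0],
})

# Rows are symbols, columns are dates, cells are closing prices.
grid = sample.pivot(index="symbol", columns="date", values="close")

# Restricting the date columns mimics zooming in on 2015-2016.
start, end = pd.Timestamp("2015-01-01"), pd.Timestamp("2016-12-31")
zoomed = grid.loc[:, start:end]
```

A grid like this could be passed to `go.Heatmap` as `z=grid.values`, `x=grid.columns`, `y=grid.index`.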