# Exploring the data

In this notebook I do the bulk of my exploratory data analysis. The work below was done both before and after modeling but, for clarity, has been organized into a single continuous notebook. Many of the visualizations I made can also be found here.

---

## Imports and data reading

In [1]:
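import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# plotly.plotly requires plotly < 4; newer versions moved it to the
# chart-studio package (import chart_studio.plotly as py).
import plotly.plotly as py
import plotly.graph_objs as go

from datetime import datetime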

In [10]:
bnb = pd.read_csv('/Users/zoenawar/DSI/RNN_LSTM_Cryptocurrency_Project/datasets/binance_modeling_dataset.csv')
btc = pd.read_csv('/Users/zoenawar/DSI/RNN_LSTM_Cryptocurrency_Project/datasets/bitcoin_modeling_dataset.csv')

btc_no_pct = pd.read_csv('/Users/zoenawar/DSI/RNN_LSTM_Cryptocurrency_Project/datasets/kraken_btc_without_pct_changes.csv')
bnb_no_pct = pd.read_csv('/Users/zoenawar/DSI/RNN_LSTM_Cryptocurrency_Project/datasets/bnb_without_pct_changes.csv')

---

# Binance Data Analysis and Visualization

> The first thing to note is that I am missing a lot of sentiment data. This is an issue I plan to fix in the future.

In [12]:
bnb['compound'].describe()  # A pretty positive subreddit for the most part.

count    601.000000
mean       0.121372
std        0.114070
min       -0.317166
25%        0.050807
50%        0.115796
75%        0.187117
max        0.726400
Name: compound, dtype: float64
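
The `count` row in `describe()` only counts non-null values, so it hints at how much sentiment is missing without saying where the gaps fall. Here is a minimal sketch of a missing-data report. It assumes the gaps are stored as `NaN` in the `compound` column (they could instead be absent rows or zeros in the real dataset), and both `missing_report` and the toy frame are hypothetical stand-ins, not part of the project code.

```python
import numpy as np
import pandas as pd


def missing_report(df, column="compound", date_column="Date"):
    """Summarize how much of `column` is missing and where the gaps fall."""
    is_missing = df[column].isna()
    dates = pd.to_datetime(df[date_column])
    return {
        "rows": len(df),
        "missing": int(is_missing.sum()),
        "missing_share": float(is_missing.mean()),
        "first_missing": dates[is_missing].min() if is_missing.any() else None,
        "last_missing": dates[is_missing].max() if is_missing.any() else None,
    }


# Toy stand-in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Date": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"],
    "compound": [0.12, np.nan, np.nan, 0.05],
})
report = missing_report(toy)
print(report)
```

On the real data this would be called as `missing_report(bnb)`.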

### Visualizing the price history of BNB

In [9]:
data = [go.Scatter(x=bnb.Date, y=bnb['Close'])]

py.iplot(data, filename = 'binance_price_change_history')


In [11]:
data = [go.Scatter(x=bnb_no_pct.Date, y=bnb_no_pct['Close'])]

py.iplot(data, filename = 'binance_price_history')


> This is a surprisingly solid coin. I would like to see what happened in January 2018.
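
To zoom in on a period like January 2018, one option is to parse the `Date` column and filter on it. This is a minimal sketch, assuming `Date` holds plain date strings without time zones. The helper `slice_dates` and the toy prices are hypothetical and made up for illustration.

```python
import pandas as pd


def slice_dates(df, start, end, date_column="Date"):
    """Return rows whose date falls between `start` and `end`, inclusive."""
    dates = pd.to_datetime(df[date_column])
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return df.loc[mask]


# Toy stand-in for bnb_no_pct (prices are invented, not real BNB data).
toy = pd.DataFrame({
    "Date": ["2017-12-31", "2018-01-01", "2018-01-15", "2018-02-01"],
    "Close": [8.6, 8.4, 20.1, 11.0],
})
january = slice_dates(toy, "2018-01-01", "2018-01-31")
print(january)

# Row with the highest close inside the window.
peak_day = january.loc[january["Close"].idxmax()]
print(peak_day["Date"])
```

On the real data the call would look like `slice_dates(bnb_no_pct, "2018-01-01", "2018-01-31")`.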

In [13]:
trace1 = go.Scatter(
    x=bnb.Date,
    y=bnb['Close'],
    name='Daily closing price'
)
trace2 = go.Scatter(
    x=bnb.Date,
    y=bnb['compound'],
    name='Daily compound sentiment',
    yaxis='y2'
)
data = [trace1, trace2]
layout = go.Layout(
    title='Comparing Price Changes with Average /r/Binance Sentiment',
    yaxis=dict(
        title='Closing percent change'
    ),
    yaxis2=dict(
        title='Sentiment',
        titlefont=dict(
            color='rgb(148, 103, 189)'
        ),
        tickfont=dict(
            color='rgb(148, 103, 189)'
        ),
        overlaying='y',
        side='right'
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='binance_vs_sentiment_pct_change')
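
A chart like this shows the two series side by side, but it can help to put a number on how they move together. Below is a minimal sketch that computes the Pearson correlation between the daily price change and the sentiment from the same day and from a few days earlier. The helper `lagged_correlations` and the toy series are hypothetical, and correlation here says nothing about causation.

```python
import pandas as pd


def lagged_correlations(price_change, sentiment, max_lag=3):
    """Correlate price_change[t] with sentiment[t - lag] for lag = 0..max_lag."""
    return pd.Series(
        {lag: price_change.corr(sentiment.shift(lag)) for lag in range(max_lag + 1)},
        name="correlation",
    )


# Toy data where the price change simply repeats yesterday's sentiment,
# so the lag-1 correlation should come out as the strongest.
sentiment = pd.Series([0.1, 0.3, 0.2, 0.5, 0.4, 0.6])
price_change = sentiment.shift(1)

result = lagged_correlations(price_change, sentiment)
print(result)
```

On the real data this would be `lagged_correlations(bnb['Close'], bnb['compound'])`. `Series.corr` skips pairs where either value is `NaN`, so days with missing sentiment drop out of the calculation.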


In [16]:
trace1 = go.Scatter(
    x=bnb.Date,
    y=bnb_no_pct['Close'],
    name='Daily Closing Price on coin'
)
trace2 = go.Scatter(
    x=bnb.Date,
    y=bnb['compound'],
    name='Daily Compound Sentiment',
    yaxis='y2'
)
data = [trace1, trace2]
layout = go.Layout(
    title='Comparing Prices with Average /r/Binance Sentiment',
    yaxis=dict(
        title='Closing Price'
    ),
    yaxis2=dict(
        title='Sentiment',
        titlefont=dict(
            color='rgb(148, 103, 189)'
        ),
        tickfont=dict(
            color='rgb(148, 103, 189)'
        ),
        overlaying='y',
        side='right'
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='bnb_and_sentiment')
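
The cell above plots dates from `bnb` against prices from `bnb_no_pct`, which only lines up if both frames have the same rows in the same order. Here is a minimal sketch of a check for that, plus a merge on `Date` as a safer alternative. The helper `frames_aligned` and the toy frames are hypothetical.

```python
import pandas as pd


def frames_aligned(left, right, date_column="Date"):
    """True when both frames have the same dates in the same row order."""
    if len(left) != len(right):
        return False
    left_dates = pd.to_datetime(left[date_column]).reset_index(drop=True)
    right_dates = pd.to_datetime(right[date_column]).reset_index(drop=True)
    return bool((left_dates == right_dates).all())


# Toy stand-ins for bnb (sentiment) and bnb_no_pct (raw prices); values invented.
toy_pct = pd.DataFrame({
    "Date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "compound": [0.10, 0.12, 0.08],
})
toy_raw = pd.DataFrame({
    "Date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "Close": [8.4, 8.9, 9.3],
})
toy_short = toy_raw.iloc[1:]  # one row fewer, so no longer aligned

print(frames_aligned(toy_pct, toy_raw))
print(frames_aligned(toy_pct, toy_short))

# Joining on Date pairs each price with the sentiment from the same day,
# regardless of row order.
merged = toy_pct.merge(toy_raw, on="Date")
print(merged)
```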


Like I said, missing some sentiment data! I've added lots of annotations and inferences in the actual Plotly interface. If you'd like to see these, check the visualizations directory in my GitHub repo.

# Bitcoin Data Analysis and Visualization

> This was arguably my favorite part of the project. It's really cool to look at price and sentiment together, especially for Bitcoin.

In [12]:
btc['compound'].describe()  # A pretty positive subreddit for the most part.

count    601.000000
mean       0.121372
std        0.114070
min       -0.317166
25%        0.050807
50%        0.115796
75%        0.187117
max        0.726400
Name: compound, dtype: float64

### Visualizing the price history of BTC

In [17]:
data = [go.Scatter(x=btc_no_pct.Date, y=btc_no_pct['Close'])]

py.iplot(data, filename = 'bitcoin_price_history')


> This is a surprisingly solid coin. I would like to see what happened in January 2018.

In [18]:
trace1 = go.Scatter(
    x=btc.Date,
    y=btc['Close'],
    name='Daily closing price'
)
trace2 = go.Scatter(
    x=btc.Date,
    y=btc['compound'],
    name='Daily compound sentiment',
    yaxis='y2'
)
data = [trace1, trace2]
layout = go.Layout(
    title='Comparing Price Changes with Average Bitcoin Subreddit Sentiment',
    yaxis=dict(
        title='Closing percent change'
    ),
    yaxis2=dict(
        title='Sentiment',
        titlefont=dict(
            color='rgb(148, 103, 189)'
        ),
        tickfont=dict(
            color='rgb(148, 103, 189)'
        ),
        overlaying='y',
        side='right'
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='bitcoin_vs_sentiment_pct_change')


In [19]:
trace1 = go.Scatter(
    x=btc.Date,
    y=btc_no_pct['Close'],
    name='Daily Closing Price on coin'
)
trace2 = go.Scatter(
    x=btc.Date,
    y=btc['compound'],
    name='Daily Compound Sentiment',
    yaxis='y2'
)
data = [trace1, trace2]
layout = go.Layout(
    title='Comparing Prices with Average Bitcoin Subreddit Sentiment',
    yaxis=dict(
        title='Closing Price'
    ),
    yaxis2=dict(
        title='Sentiment',
        titlefont=dict(
            color='rgb(148, 103, 189)'
        ),
        tickfont=dict(
            color='rgb(148, 103, 189)'
        ),
        overlaying='y',
        side='right'
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='btc_and_sentiment')


> To me, the most interesting aspects of this graph are the beginning and the 2018 bull market. The early days of the subreddit are extremely positive, which may suggest that early adopters were all in on this new technology. During the bull market, however, users appear to have grown increasingly nervous, with the peak price lining up with a daily compound score of 0.0418, nearly two standard deviations away from the mean!
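
A distance from the mean in standard deviations is a z-score, which is quick to check by hand. Here is a minimal sketch, using the mean and standard deviation from the `describe()` output printed earlier in this section. The helper `z_score` is hypothetical.

```python
def z_score(value, mean, std):
    """Number of standard deviations `value` lies from `mean`."""
    return (value - mean) / std


# Mean and std copied from the describe() output printed earlier in this section.
z = z_score(0.0418, mean=0.121372, std=0.114070)
print(round(z, 2))
```

With those printed statistics the result is roughly -0.70, so the figure quoted above is worth rechecking against `btc['compound'].mean()` and `btc['compound'].std()` computed on the actual column.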