# Data analysis using pandas & matplotlib

In this assignment, you will analyze cryptocurrency data using the pandas and matplotlib libraries.
The assignment is structured like a lab: fill in the blank cells and answer a number of questions.

Minimum to pass: 3 points (out of 10). If you fail the lab, you fail the entire course (see the slides from the 1st seminar).

 - [Pandas docs](https://pandas.pydata.org/)
 - [Matplotlib docs](https://matplotlib.org/index.html)

## 1. Data (2 points)

Let's start with the necessary preparations.

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl  
import matplotlib.pyplot as plt
import ipywidgets  # library for interactive controls in jupyter notebook

%matplotlib inline

#### Load the dataset from "coins.csv". Create a pandas.DataFrame object with name *coins* and date as index.

In [2]:
# Paste your code here.
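
To illustrate the loading pattern, here is a minimal sketch that parses a small in-memory CSV rather than the real `coins.csv`. The sample rows and the names `toy_csv` and `toy` are made up for the example, and it assumes the file has a `date` column that should become the index.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for coins.csv (not the real data).
toy_csv = io.StringIO(
    "date,price,symbol\n"
    "2013-04-28,135.30,BTC\n"
    "2013-04-29,134.44,BTC\n"
)

# parse_dates converts the column to datetimes; index_col makes it the index.
toy = pd.read_csv(toy_csv, parse_dates=["date"], index_col="date")

print(toy.index.dtype)
print(toy.loc[pd.Timestamp("2013-04-29"), "price"])
```

With a datetime index, date-range selections such as `toy.loc["2013-04-28":"2013-04-29"]` become available, which is convenient for the later tasks.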

Let's see what we've got

In [3]:
coins.head(4)

| datetime | date | price | txCount | txVolume | activeAddresses | symbol | name | open | high | low | close | volume | market |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-04-28 | 2013-04-28 | 135.3 | 41702.0 | 68798680.0 | 117984.0 | BTC | Bitcoin | 135.3 | 135.98 | 132.1 | 134.21 | 0.0 | 1500520000.0 |
| 2013-04-28 | 2013-04-28 | 4.3 | 9174.0 | 44319520.0 | 17216.0 | LTC | Litecoin | 4.3 | 4.4 | 4.18 | 4.35 | 0.0 | 73773400.0 |
| 2013-04-29 | 2013-04-29 | 134.44 | 51602.0 | 113812800.0 | 86925.0 | BTC | Bitcoin | 134.44 | 147.49 | 134.0 | 144.54 | 0.0 | 1491160000.0 |
| 2013-04-29 | 2013-04-29 | 4.37 | 9275.0 | 36478100.0 | 18395.0 | LTC | Litecoin | 4.37 | 4.57 | 4.23 | 4.38 | 0.0 | 74952700.0 |


Column descriptions:
 - date - date of measurement
 - name - coin's name
 - symbol - coin's symbol
 - price - average coin price per trading day in USD
 - txCount - the number of transactions in the coin network
 - txVolume - coin's volume transferred between the coin network addresses
 - activeAddresses - the number of addresses that transacted on the trading day in the coin network
 - open - coin price at the beginning of the trading day
 - close - coin price at the end of the trading day
 - high - highest coin price during the trading day
 - low - lowest coin price during the trading day
 - volume - trading volume of this coin on exchanges on the trading day
 - market - capitalization of this coin on the trading day

#### Let's study the data. Answer the following questions (insert cells with code and text below):
#### 1. How many different coins are present in the dataset? (0.4 points)

#### 2. For what period do we have data? (0.4 points)

#### 3. Are there gaps in the data? Of what nature are they? (0.5 points)

#### 4. Which coin had the highest price and when? (0.2 points)

#### 5. Which coin has the highest and lowest total capitalization? Build a pie chart with shares. (0.5 points)
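
The questions above map onto a handful of standard pandas calls. The sketch below shows them on a made-up toy frame in the same long format as `coins`; all values are hypothetical. Reading "total capitalization" as the sum of `market` over time is an assumption, and you may prefer another definition (for example, the value on the last day).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame in the same long format as `coins` (made-up values).
toy = pd.DataFrame(
    {
        "symbol": ["BTC", "LTC", "BTC", "LTC"],
        "price": [135.30, 4.30, 134.44, np.nan],
        "market": [1.50e9, 7.4e7, 1.49e9, 7.5e7],
    },
    index=pd.to_datetime(["2013-04-28", "2013-04-28", "2013-04-29", "2013-04-29"]),
)

n_coins = toy["symbol"].nunique()             # number of distinct coins
period = (toy.index.min(), toy.index.max())   # first and last date
missing = toy.isna().sum()                    # NaN count per column

# Dates repeat in the index, so move them into a column before idxmax;
# the result is then a unique row position instead of an ambiguous date.
flat = toy.rename_axis("date").reset_index()
top = flat.loc[flat["price"].idxmax(), ["symbol", "date", "price"]]

# Capitalization summed per coin (assumed definition of "total").
cap_by_coin = toy.groupby("symbol")["market"].sum()
```

A Series like `cap_by_coin` can be passed straight to `Series.plot.pie` to draw the shares.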

## 2. Visualization (1 point)

The most interesting part of an analyst’s work is looking carefully at well-chosen, well-constructed charts.

#### Implement a function that plots the prices of the selected coin over the selected date range.
The graph should show the opening and closing prices of each trading day, as well as the day's minimum and maximum prices. Add a title and axis labels, add a grid, and increase the figure size.
You can try `candlestick_ohlc`, but the task can be completed without it. Note that the `matplotlib.finance` module has been removed from matplotlib; the function is now distributed in the separate `mplfinance` package (under `mplfinance.original_flavor`).
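
As a rough sketch of the plotting requirements (title, axis labels, grid, larger figure), here is one possible layout without candlesticks. The three-day price history is made up, and drawing the low-high range as a shaded band is just one design choice among many.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical three-day price history for one coin (made-up numbers).
toy = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.5],
        "close": [11.0, 12.5, 12.0],
        "low": [9.5, 10.8, 11.7],
        "high": [11.4, 13.0, 12.9],
    },
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)

fig, ax = plt.subplots(figsize=(12, 6))   # larger than the default figure
ax.plot(toy.index, toy["open"], label="open")
ax.plot(toy.index, toy["close"], label="close")
# Shade the daily low-high range behind the two lines.
ax.fill_between(toy.index, toy["low"], toy["high"], alpha=0.2, label="low-high range")
ax.set_title("Toy coin price")
ax.set_xlabel("Date")
ax.set_ylabel("Price, USD")
ax.grid(True)
ax.legend()
```

In a notebook with `%matplotlib inline` the figure is rendered when the cell finishes; in a plain script you would call `plt.show()` at the end.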

In [16]:
def plot_fancy_price_action(coins, symbol, start_date, end_date):
    # Paste your code here
    pass  # placeholder so the cell parses until the body is written

Let's see:

In [None]:
plot_fancy_price_action(coins=coins, symbol='VERI', start_date='2013-06-01', end_date='2019-06-30') 

There is no data science here at all (yet). An analyst simply has to know how to build graphs and which tools to use for that.

## 3. Pump & dump (1 point)
Cryptocurrency exchanges remain a fringe venue, a kind of Wild West of the financial world. As a result, schemes for taking money in barely honest ways flourish there. One of them is pump'n'dump, which works as follows. Several large players, or many small ones, agree to buy a little-known coin with a low price and low trading volume. This causes a sharp price rise (the pump), which attracts inexperienced players hoping to profit from the growth. At that point the organizers start selling everything (the dump). The whole process takes from several minutes to several hours.

#### Your task is to find the strongest pump'n'dump of a coin in a given period. To do this, define for each day a pnd value: the ratio of the day's highest price (`high`) to the larger of the day's opening and closing prices (`open`, `close`). Find the day on which pnd was largest, and the pnd value on that day.
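
The pnd definition above can be computed column-wise. Here is a minimal sketch on a made-up three-day frame for a single coin; filtering by symbol and date range is left out, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical prices for one coin (made-up numbers).
toy = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0],
        "close": [11.0, 12.0, 11.5],
        "high": [11.5, 18.0, 12.6],
    },
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
)

# pnd = daily high / max(open, close), evaluated row by row.
pnd = toy["high"] / toy[["open", "close"]].max(axis=1)

worst_day = pnd.idxmax()   # date with the largest ratio
worst_pnd = pnd.max()      # the ratio on that date
```

On this toy data the second day stands out: the high of 18.0 is 1.5 times the larger of its open and close (12.0).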

In [18]:
def find_most_severe_pump_and_dump(coins, symbol, start_date, end_date):
    # Paste your code here
    pass  # placeholder so the cell parses until the body is written

In [None]:
find_most_severe_pump_and_dump(coins, symbol='BTC', start_date='2017-06-01', end_date='2018-06-01')

#### Compare the results for different coins

## 4. Return on Investment (1 point)

#### Calculate the return on investment (ROI) in cryptocurrencies for a given period. ROI is defined as the ratio of the change in the portfolio's value to its initial value. The portfolio's value is the total value (in USD) of all coins in the portfolio.
`investments` is a dict with coin symbols as keys and USD amounts as values.
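
A small worked example may help with the ROI definition. The start and end prices below are made up, and the sketch assumes each USD amount is converted to coins at the start price; which price column to use (`price`, `open`, or `close`) is left to you.

```python
import pandas as pd

# Hypothetical start and end prices in USD (made-up numbers).
price_start = pd.Series({"BTC": 100.0, "LTC": 5.0})
price_end = pd.Series({"BTC": 150.0, "LTC": 4.0})
investments = {"BTC": 1000, "LTC": 500}   # USD put into each coin at the start

invested = pd.Series(investments, dtype=float)
units = invested / price_start            # coins bought: 10 BTC and 100 LTC
value_start = invested.sum()              # 1500 USD
value_end = (units * price_end).sum()     # 10 * 150 + 100 * 4 = 1900 USD

roi = (value_end - value_start) / value_start
```

Here the portfolio grows from 1500 to 1900 USD, so the ROI is 400 / 1500, or about 0.27.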

In [21]:
def compute_roi(coins, investments, start_date, end_date):
    # Paste your code here
    pass  # placeholder so the cell parses until the body is written

In [None]:
compute_roi(coins, investments={'BTC': 1000, 'LTC': 500}, start_date='2018-04-04', end_date='2018-06-01')

In [None]:
compute_roi(coins, investments={'BTC': 1000, 'LTC': 500}, start_date='2013-05-28', end_date='2018-06-06')

## 5. Technical analysis (1 point)

Technical analysis is a way of predicting the behavior of a chart from auxiliary values derived from the chart itself. One of the simplest methods of technical analysis is Bollinger Bands. Some traders believe that once the price touches a band, it should reverse direction.

#### Draw a graph of price, moving average and [Bollinger bands](https://en.wikipedia.org/wiki/Bollinger_Bands) with parameters N (window) = 21, K (width) = 2.

Bollinger bands are calculated as (MA + Kσ) and (MA - Kσ), where MA is the moving average over N days and σ is the moving standard deviation over N days.

You can use the `rolling` method to calculate the moving average and standard deviation.

Don't forget to add a title and axis labels, draw a legend, and choose the best position for it.
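
The band formula above translates almost directly into `rolling` calls. This is a minimal sketch on a made-up price series with a small window so the numbers are easy to check by hand. Note that pandas computes the sample standard deviation (`ddof=1`) by default; whether to switch to `ddof=0` is your choice.

```python
import pandas as pd

# Hypothetical daily prices (made-up numbers).
prices = pd.Series(
    [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0],
    index=pd.date_range("2018-01-01", periods=7, freq="D"),
)
N, K = 3, 2   # small window for the toy series; the task uses N=21, K=2

ma = prices.rolling(window=N).mean()    # moving average over N days
sigma = prices.rolling(window=N).std()  # moving sample standard deviation
upper = ma + K * sigma                  # upper band
lower = ma - K * sigma                  # lower band
```

The first N - 1 values of each result are NaN because the window is not yet full. For the first full window (10, 11, 12) the average is 11 and the sample deviation is 1, so the bands sit at 13 and 9.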

In [24]:
def plot_bollinger_bands(coins, symbol, window, width):
    # Paste your code here
    pass  # placeholder so the cell parses until the body is written

In [None]:
# Executing this cell should result in a drawn graph
plot_bollinger_bands(coins=coins, symbol='EOS', window=21, width=2)  

#### Draw a conclusion about the Bollinger rule (does it work or not).

## 6. Capitalization as an indicator (2 points)

Many people who trade cryptocurrency like to look at capitalization. Let's understand why.

#### Draw two more graphs. The first should show the market capitalizations of Bitcoin (BTC), Ethereum (ETH), EOS (EOS), Bitcoin Cash (BCH), Stellar (XLM), and Litecoin (LTC). The second should show each coin's share of the total market capitalization. Use data starting from 2017-07-01.
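
To show one way of turning the long table into per-coin shares, here is a sketch on made-up data with a hypothetical third coin, XRP, that is not among the selected symbols. It assumes "total market capitalization" means the daily sum over every coin in the dataset, not only the selected ones.

```python
import pandas as pd

# Hypothetical capitalizations for three coins over two days (made-up numbers).
toy = pd.DataFrame(
    {
        "symbol": ["BTC", "ETH", "XRP", "BTC", "ETH", "XRP"],
        "market": [60.0, 30.0, 10.0, 50.0, 30.0, 20.0],
    },
    index=pd.to_datetime(["2017-07-01"] * 3 + ["2017-07-02"] * 3),
)
symbols = ["BTC", "ETH"]

# Daily total across every coin in the frame (assumed definition of "total").
total = toy.groupby(level=0)["market"].sum()

# Wide layout: one row per day, one column per selected coin.
selected = toy[toy["symbol"].isin(symbols)]
caps = selected.set_index("symbol", append=True)["market"].unstack("symbol")

# Share of each selected coin in the daily total.
shares = caps.div(total, axis=0)
```

Both `caps` and `shares` are wide DataFrames, so `caps.plot()` and `shares.plot()` draw one line per coin.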

In [26]:
def plot_coins_capitalizations(coins, symbols, start_date):
    # Paste your code here
    pass  # placeholder so the cell parses until the body is written

In [None]:
plot_coins_capitalizations(
    coins=coins,
    symbols=('BTC', 'ETH', 'EOS', 'BCH', 'XLM', 'LTC'),
    start_date='2017-07-01'
)

#### Analyze how the altcoins' share of capitalization depends on Bitcoin's share. What explains this trend?

## 7. Correlation of coins (2 points)

#### Now look more closely at the correlations between the coins' average capitalizations. For a set of coins, we will consider the average capitalization smoothed with coefficient alpha over the last `window` days up to the specified date.  
#### Implement a function that returns a square DataFrame whose number of rows and columns equals the number of coins considered and whose entries are the correlation values.
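
One possible reading of the smoothing step is an exponentially weighted mean (`DataFrame.ewm`) applied to the last `window` days, followed by pairwise correlations. The sketch below follows that reading on made-up data with hypothetical coins AAA, BBB and CCC. The order of windowing and smoothing is an assumption, not something the task fixes.

```python
import pandas as pd

# Hypothetical daily capitalizations, one column per coin (made-up numbers).
caps = pd.DataFrame(
    {
        "AAA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "BBB": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
        "CCC": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    },
    index=pd.date_range("2018-06-01", periods=6, freq="D"),
)
date, window, alpha = "2018-06-05", 4, 0.5

# Keep the last `window` days up to and including `date`.
recent = caps.loc[:date].tail(window)

# Exponentially weighted mean with smoothing coefficient alpha, then
# pairwise Pearson correlations between the smoothed columns.
smoothed = recent.ewm(alpha=alpha).mean()
correlations = smoothed.corr()
```

In the toy data BBB is always twice AAA and CCC mirrors AAA, so their correlations with AAA come out as 1 and -1 (up to rounding), which makes the result easy to sanity-check.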

In [28]:
def calc_coins_correlations(coins, date, symbols, window, alpha):
    # Paste your code here
    pass  # placeholder so the cell parses until the body is written

In [None]:
correlations = calc_coins_correlations(coins, date="2018-06-06",
                                       symbols=['BTC', 'ETH', 'EOS', 'BCH', 'XLM', 'LTC', 'ADA'],
                                       window=21, alpha=0.1)
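# Display the matrix as a heatmap, rounded to 2 decimals
# (Styler.set_precision was removed in pandas 2.0; format(precision=...) replaces it)
correlations.style.background_gradient(cmap='coolwarm').format(precision=2)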

It's also interesting to look at 2017-12-27:

In [None]:
correlations = calc_coins_correlations(coins, date="2017-12-27",
                                       symbols=['BTC', 'ETH', 'EOS', 'BCH', 'XLM', 'LTC', 'ADA'],
                                       window=21, alpha=0.1)
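# Display the matrix as a heatmap, rounded to 2 decimals
# (Styler.set_precision was removed in pandas 2.0; format(precision=...) replaces it)
correlations.style.background_gradient(cmap='coolwarm').format(precision=2)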