## Introduction

Bitcoin has recently drawn a lot of public attention as its price has risen. It is a decentralized digital currency whose transactions are processed by a peer-to-peer network. In this project, we try to predict whether Bitcoin's price closes up or down on a given day.

We use a logistic regression model, a linear classifier that estimates the probability of a binary outcome. Here the outcome is whether the price closes up or down on a given day. Prophet, an open-source forecasting library based on an additive regression model, and its extension NeuralProphet are alternatives for forecasting the price itself. This notebook uses the simpler logistic regression approach. The dataset is downloaded from [Yahoo Finance](https://in.finance.yahoo.com).
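Logistic regression passes a weighted sum of the features through the logistic (sigmoid) function to get a probability between 0 and 1. The sketch below uses a tiny synthetic dataset, not the Bitcoin data. It checks that scikit-learn's `predict_proba` for the positive class equals the sigmoid of `decision_function` for a binary problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset (illustrative only): one feature, binary label
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 1, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# decision_function returns the linear score w . x + b for each row
z = model.decision_function(X)

# The logistic (sigmoid) function maps the linear score to a probability in (0, 1)
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the estimated P(y = 1 | x)
p_sklearn = model.predict_proba(X)[:, 1]

print(np.allclose(p_manual, p_sklearn))  # True
```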

### Import required libraries

In [353]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [320]:
df = pd.read_csv('BTC-USD.csv')

In [321]:
df.head()

|   | Date | Open | High | Low | Close | Adj Close | Volume |
|---|------|------|------|-----|-------|-----------|--------|
| 0 | 2020-02-16 | 9889.179688 | 10053.96875 | 9722.386719 | 9934.433594 | 9934.433594 | 43374780000.0 |
| 1 | 2020-02-17 | 9936.560547 | 9938.81543 | 9507.637695 | 9690.142578 | 9690.142578 | 45998300000.0 |
| 2 | 2020-02-18 | 9691.230469 | 10161.935547 | 9632.382813 | 10141.996094 | 10141.996094 | 47271020000.0 |
| 3 | 2020-02-19 | 10143.798828 | 10191.675781 | 9611.223633 | 9633.386719 | 9633.386719 | 46992020000.0 |
| 4 | 2020-02-20 | 9629.325195 | 9643.216797 | 9507.900391 | 9608.475586 | 9608.475586 | 44925260000.0 |


In [322]:
len(df.columns)

7

In [323]:
df.isnull().any()

Date         False
Open          True
High          True
Low           True
Close         True
Adj Close     True
Volume        True
dtype: bool

In [324]:
df.isnull().sum()

Date         0
Open         4
High         4
Low          4
Close        4
Adj Close    4
Volume       4
dtype: int64

Each of the price and volume columns has only 4 missing values. That is a negligible share of the data, so we drop the rows that contain them.
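`isnull().sum()` counts missing values per column, so on its own it does not show whether the gaps fall on the same rows. The following minimal sketch lists the affected rows before dropping them. It uses a small hypothetical frame with made-up values rather than `BTC-USD.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same kind of columns as the Yahoo Finance export;
# the values are made up for illustration, not taken from BTC-USD.csv
df = pd.DataFrame({
    "Date": ["2020-04-17", "2020-04-18", "2020-04-19"],
    "Open": [7116.55, np.nan, 7260.92],
    "Close": [7096.18, np.nan, 7257.66],
    "Volume": [3.2e10, np.nan, 3.2e10],
})

# Rows that contain at least one missing value, inspected before dropping
missing_rows = df[df.isnull().any(axis=1)]
print(missing_rows)

# Drop those rows and report how many were removed
clean = df.dropna(axis=0)
print(len(df) - len(clean), "row(s) dropped")  # 1 row(s) dropped
```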

In [325]:
df = df.dropna(axis=0)

In [326]:
df.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

In [327]:
df.dtypes

Date          object
Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume       float64
dtype: object
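The `dtypes` output shows `Date` stored as `object` (plain strings). The notebook never uses dates as features, but a real datetime column makes sorting and time-based splits easier. The following sketch uses `read_csv`'s `parse_dates` argument. The first rows of the `df.head()` output are inlined so that it runs without the CSV file.

```python
import io
import pandas as pd

# First rows of the df.head() output above, inlined so the sketch runs without BTC-USD.csv
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-02-16,9889.179688,10053.96875,9722.386719,9934.433594,9934.433594,43374780000.0
2020-02-17,9936.560547,9938.81543,9507.637695,9690.142578,9690.142578,45998300000.0
2020-02-18,9691.230469,10161.935547,9632.382813,10141.996094,10141.996094,47271020000.0
"""

# parse_dates converts the Date column to a datetime dtype while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# With real dates, chronological sorting is straightforward
df = df.sort_values("Date").reset_index(drop=True)
print(df.dtypes["Date"])
```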

Next, we check whether each day closed up or down and create a binary column to predict: 0 = loss, 1 = gain.

In [328]:
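df['pos_neg'] = df['Open'] - df['Close']  # positive when the day closed below its open

In [339]:
df['Up/Down'] = np.where(df['pos_neg'] > 0, '0', '1')  # '0' = loss, '1' = gain (strings, cast to int below)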
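Comparing `Close` with `Open` directly produces the same 0/1 label as integers. That removes the need for the intermediate `pos_neg` column and the later string-to-int cast. This sketch uses the Open and Close values from the `df.head()` output. A day where Close equals Open counts as a gain, as in the original `np.where` expression.

```python
import pandas as pd

# Open/Close values copied from the df.head() output above
df = pd.DataFrame({
    "Open": [9889.179688, 9936.560547, 9691.230469, 10143.798828, 9629.325195],
    "Close": [9934.433594, 9690.142578, 10141.996094, 9633.386719, 9608.475586],
})

# 1 = gain (Close >= Open), 0 = loss; same rule as np.where(Open - Close > 0, 0, 1)
df["Up/Down"] = (df["Close"] >= df["Open"]).astype(int)
print(df["Up/Down"].tolist())  # [1, 0, 1, 0, 0]
```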

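Next we cast the label to integers, because `np.where` above produced the strings '0' and '1'. The price columns are cast to integers as well. scikit-learn does not require this, since it accepts float features, and the cast drops the fractional part of each price.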

In [330]:
df[['Open','High','Low','Close','Up/Down']] = df[['Open','High',
                                                  'Low','Close','Up/Down']].astype(int)
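`astype(int)` truncates toward zero rather than rounding. This is visible in the table below, where the High of 10053.96875 becomes 10053. The short sketch below shows the difference. If integer prices are wanted at all, rounding first is one option.

```python
import pandas as pd

# Close and High values from the first row of the df.head() output
prices = pd.Series([9934.433594, 10053.96875])

# astype(int) drops the fractional part (truncation toward zero)
truncated = prices.astype(int).tolist()
print(truncated)  # [9934, 10053]

# Rounding before the cast keeps the nearest whole dollar instead
rounded = prices.round().astype(int).tolist()
print(rounded)  # [9934, 10054]
```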

In [331]:
df.dtypes

Date          object
Open           int64
High           int64
Low            int64
Close          int64
Adj Close    float64
Volume       float64
pos_neg      float64
Up/Down        int64
dtype: object

In [338]:
df.head()

|   | Date | Open | High | Low | Close | Adj Close | Volume | pos_neg | Up/Down |
|---|------|------|------|-----|-------|-----------|--------|---------|---------|
| 0 | 2020-02-16 | 9889 | 10053 | 9722 | 9934 | 9934.433594 | 43374780000.0 | -45.253906 | 1 |
| 1 | 2020-02-17 | 9936 | 9938 | 9507 | 9690 | 9690.142578 | 45998300000.0 | 246.417969 | 0 |
| 2 | 2020-02-18 | 9691 | 10161 | 9632 | 10141 | 10141.996094 | 47271020000.0 | -450.765625 | 1 |
| 3 | 2020-02-19 | 10143 | 10191 | 9611 | 9633 | 9633.386719 | 46992020000.0 | 510.412109 | 0 |
| 4 | 2020-02-20 | 9629 | 9643 | 9507 | 9608 | 9608.475586 | 44925260000.0 | 20.849609 | 0 |


The remaining columns (Date, Adj Close, Volume and pos_neg) are not used, so we leave them out of the feature matrix.

Now let's split the data and fit a logistic regression model.

In [332]:
X = df[['Open','High','Low','Close']]
y = df['Up/Down']
X_train, X_test, y_train,y_test = train_test_split(X,y,test_size = 0.30,random_state=0)
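With `random_state=0`, `train_test_split` shuffles the rows, so later days can end up in the training set while earlier days are tested. For time-ordered data, a chronological split is a common alternative. The following sketch uses `shuffle=False` on a small synthetic frame in place of the Bitcoin data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic frame whose index stands in for date order (values are illustrative)
n = 10
X = pd.DataFrame({"Open": np.arange(n, dtype=float), "Close": np.arange(n, dtype=float) + 0.5})
y = pd.Series([1, 0] * (n // 2))

# shuffle=False keeps the original order, so the test set is the last 30% of rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=False)

print(X_train.index.max(), X_test.index.min())  # 6 7
```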

In [333]:
model_log = LogisticRegression()
model_log.fit(X_train,y_train)

LogisticRegression()
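The features here are raw prices around 10,000. With unscaled inputs of that size, the default `lbfgs` solver can converge slowly and may emit a `ConvergenceWarning`, although none is shown in this run. A common remedy is to standardize the features inside a pipeline. The following sketch runs on synthetic prices of a similar scale, not on the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic prices on a Bitcoin-like scale (~10,000), not the real dataset
open_ = rng.uniform(9000, 11000, size=500)
close = open_ + rng.normal(0, 200, size=500)
X = np.column_stack([open_, close])
y = (close >= open_).astype(int)

# StandardScaler gives each feature zero mean and unit variance before the solver runs
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.2f}")
```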

### Predicting the test set results and calculating the accuracy

In [334]:
predictions = model_log.predict(X_test)

In [343]:
print(f'Accuracy of logistic regression classifier on test set: {model_log.score(X_test, y_test):.2f}')

Accuracy of logistic regression classifier on test set: 0.98
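Accuracy is easier to interpret next to a baseline that always predicts the most common class. The following sketch rebuilds the test labels from the confusion matrix in the next section: 40 losses and 69 gains. It then scores scikit-learn's `DummyClassifier`. For illustration only, it fits the baseline on those test labels. In practice it would be fit on `y_train`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Test-set labels reconstructed from the confusion matrix: 40 losses (0), 69 gains (1)
y_test = np.array([0] * 40 + [1] * 69)
X_dummy = np.zeros((len(y_test), 1))  # this baseline ignores the features

# Always predict the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_test)
print(f"{baseline.score(X_dummy, y_test):.2f}")  # 0.63
```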


### Confusion matrix

In [337]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, predictions)  # rows = actual (0, 1), columns = predicted (0, 1)
print(cm)

[[40  0]
 [ 2 67]]
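Per-class precision and recall follow directly from this matrix. The sketch rebuilds label arrays that reproduce `[[40, 0], [2, 67]]` and passes them to scikit-learn's metric functions. All 67 predicted gains are correct, so precision is 1.0. Recall for gains is 67/69, because 2 actual gains were predicted as losses.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Labels rebuilt from [[40, 0], [2, 67]]:
# 40 actual losses predicted 0, 2 actual gains predicted 0, 67 actual gains predicted 1
y_true = np.array([0] * 40 + [1] * 2 + [1] * 67)
y_pred = np.array([0] * 40 + [0] * 2 + [1] * 67)

print(confusion_matrix(y_true, y_pred))
print(f"precision (gain): {precision_score(y_true, y_pred):.3f}")  # 1.000
print(f"recall (gain):    {recall_score(y_true, y_pred):.3f}")     # 0.971
```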


## Conclusion

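The confusion matrix shows that the model correctly classified 40 loss days (0) and 67 gain days (1). The only errors were 2 gain days predicted as losses, for an accuracy of 107/109 ≈ 0.98. Read this score with care. The label is derived directly from Open and Close, which are also inputs to the model, so the classifier only has to learn whether Close is above Open on the same day. It says little about how well the model would predict Bitcoin's direction in advance.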
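One way to turn this into a forecasting task, assuming the goal is the next day's direction, is to shift the label back by one row. Each day's prices are then paired with the following day's Up/Down. The column name `Next Up/Down` is hypothetical. The sketch uses the five rows from the `df.head()` output.

```python
import pandas as pd

# Open/Close values copied from the df.head() output (2020-02-16 .. 2020-02-20)
df = pd.DataFrame({
    "Open": [9889.179688, 9936.560547, 9691.230469, 10143.798828, 9629.325195],
    "Close": [9934.433594, 9690.142578, 10141.996094, 9633.386719, 9608.475586],
})

# Same-day label, as in the notebook: 1 if the day closed at or above its open
df["Up/Down"] = (df["Close"] >= df["Open"]).astype(int)

# Hypothetical next-day target: tomorrow's Up/Down, paired with today's prices
df["Next Up/Down"] = df["Up/Down"].shift(-1)

# The last row has no following day, so drop it, then restore the integer dtype
df = df.dropna(subset=["Next Up/Down"]).astype({"Next Up/Down": int})
print(df["Next Up/Down"].tolist())  # [0, 1, 0, 0]
```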