Bitcoin-analysis

Author: Xinbin Huang

Last updated: Dec 16, 2017

Project Overview

The price of Bitcoin has risen sharply since its invention, and a growing number of people are interested in investing in it. This makes it worthwhile to investigate which factors affect the price.

This project performs a simple analysis of the effect of two factors, block difficulty and transaction volume, on the Bitcoin price.

File structure

  • data : raw data (two CSV files bitcoin_price.csv and bitcoin_dataset.csv)
  • src : code files and analysis scripts (.R, .Rmd)
  • results : rendered documents and generated analysis results
  • doc : rendered report (bitcoin_report.md)

Research Questions

  • Does the difficulty of finding a new block affect the price of Bitcoin?
  • Does the transaction volume of Bitcoin affect the price of Bitcoin?

Hypotheses

  • The price of Bitcoin will be higher when it is more difficult to find a new block, because a lower supply (fewer new blocks) makes Bitcoin more valuable.
  • The volume of Bitcoin will positively affect the price of Bitcoin, because the higher the volume, the more investors want to buy it.

Data

The dataset contains historical price and feature data for the cryptocurrency Bitcoin. It was retrieved from the Kaggle dataset Cryptocurrency Historical Prices.

  • The downloaded files are located in the data folder.
    • bitcoin_dataset.csv: includes some features describing Bitcoin
    • bitcoin_price.csv: includes price information about Bitcoin
  • There are two .csv files (features.csv and price.csv) in the results folder for testing purposes.

Variables

  • Date records the date, from 2013-04-28 to 2017-11-07.
  • Close is the daily closing price of Bitcoin from 2013-04-28 to 2017-11-07.
  • btc_difficulty is a relative measure of the difficulty in finding a new block.
  • Volume is the volume of transactions on the given day.

Analysis Overview

I first generate a pair-plot of the variables Close, btc_difficulty and Volume to explore their relationships. I then fit a linear regression model to see whether the latter two variables affect the Bitcoin price. The following sections describe how to reproduce the analysis.

Data analysis pipeline

Dependency diagram for the analysis pipeline

Usage

  1. Get the Docker image:
docker pull xhuang09/bitcoin-analysis
  2. Clone the repo:

For HTTPS:

git clone https://github.com/xinbinhuang/bitcoin-analysis.git

For SSH:

git clone git@github.com:xinbinhuang/bitcoin-analysis.git
  3. Run the Docker image, replacing YOUR_LOCAL_DIRECTORY_OF_CLONED_REPO with the path to the cloned repo:
docker run -it --rm -v YOUR_LOCAL_DIRECTORY_OF_CLONED_REPO/:/home/bitcoin-analysis xhuang09/bitcoin-analysis /bin/bash
  4. Change directory inside the container:
cd /home/bitcoin-analysis/
  5. Run the project analysis:
make all

  6. Clean previously generated files:
make clean
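
The placeholder YOUR_LOCAL_DIRECTORY_OF_CLONED_REPO in the docker run command has to be filled in by hand. As a minimal sketch, assuming the script is started from inside the cloned repo (or that a hypothetical REPO_DIR variable, which is not part of this project, points at it), the following prints the docker run command with the host path resolved to an absolute path, which is the safest form for a -v bind mount. It only prints the command and does not start a container.

```shell
#!/bin/sh
# Sketch: print the `docker run` command from the Usage steps with the
# placeholder filled in. REPO_DIR is a hypothetical variable, not part of
# the repo; it defaults to the current directory.
set -eu

# Resolve the clone to an absolute path and print the full command.
docker_run_command() {
  host_dir="$(cd "$1" && pwd)"
  printf 'docker run -it --rm -v "%s":/home/bitcoin-analysis xhuang09/bitcoin-analysis /bin/bash\n' "$host_dir"
}

docker_run_command "${REPO_DIR:-$PWD}"
```

Copy the printed line into the terminal once the path looks right.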

Analysis script usage

Run the following commands to regenerate the analysis step by step. All commands should be run from the project root directory. Run in the order shown, these commands give the same results as running make all in the root directory.

Download the data

These commands download the two required data files to the data folder as bitcoin_dataset.csv and bitcoin_price.csv.

# first data frame
Rscript src/download-data.R https://raw.githubusercontent.com/xinbinhuang/data-bitcoin/master/bitcoin_dataset.csv data/bitcoin_dataset.csv

# second data frame
Rscript src/download-data.R https://raw.githubusercontent.com/xinbinhuang/data-bitcoin/master/bitcoin_price.csv data/bitcoin_price.csv

Merge data for analysis

This command merges the two data files into a single data frame for the subsequent analysis. The output CSV file is stored in results/merged-data.csv.

Rscript src/merge-data.R data/bitcoin_price.csv data/bitcoin_dataset.csv results/merged-data.csv
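
The later steps all read results/merged-data.csv, so a quick header check after merging can catch a failed download or merge early. The sketch below assumes the merged file keeps the column names listed under the variables section (Date, Close, btc_difficulty, Volume) and has a plain comma-separated header. The helper names has_column and check_columns are hypothetical and not part of the repo.

```shell
#!/bin/sh
# Sketch: check that a CSV header contains the columns the analysis uses.
# has_column and check_columns are illustrative helpers, not repo scripts.
set -eu

# Succeeds if the first line of CSV file $1 has a column named exactly $2.
# Assumes a comma-separated header, optionally with double-quoted names.
has_column() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -qxF -- "$2"
}

# Reports every missing column on stderr; returns non-zero if any is missing.
check_columns() {
  file="$1"
  shift
  status=0
  for col in "$@"; do
    if ! has_column "$file" "$col"; then
      echo "missing column: $col" >&2
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the functions, a call such as check_columns results/merged-data.csv Date Close btc_difficulty Volume returns a non-zero status and names the missing columns if the merge did not produce the expected header.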

Perform descriptive analysis

This command performs a descriptive analysis of the three variables (Close, btc_difficulty and Volume). The output CSV file is stored in results/descriptive-result.csv.

Rscript src/descriptive.R results/merged-data.csv results/descriptive-result.csv

Perform regression analysis

This command performs a regression analysis of Close on btc_difficulty and Volume. The output CSV file is stored in results/regression-result.csv.

Rscript src/regression.R results/merged-data.csv results/regression-result.csv

Generate pair-plot for the data

This command generates a pair-plot of the three variables from the merged data. The output PNG file is stored in results/figure/analysis-plot.png.

Rscript src/plot.R results/merged-data.csv results/figure/analysis-plot.png

Generate the report from R markdown

This command renders the R Markdown file src/bitcoin_report.Rmd to a Markdown report. The generated report can be found in the doc folder.

Rscript -e 'ezknitr::ezknit("src/bitcoin_report.Rmd", out_dir = "doc")'
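
The individual commands above can be collected into one script that runs them in order. This is only a sketch of what a manual run looks like, not the project's Makefile. The DRY_RUN switch and the run and run_pipeline helpers are hypothetical additions. With the default DRY_RUN=1 the script only prints each command, and with DRY_RUN=0 it executes them (which requires R and the packages the scripts use, for example inside the Docker container).

```shell
#!/bin/sh
# Sketch: run the analysis steps from this README in order.
# DRY_RUN is a hypothetical switch (not part of the repo): with the default
# DRY_RUN=1 each command is only printed; set DRY_RUN=0 to execute it.
set -eu

DRY_RUN="${DRY_RUN:-1}"
DATA_URL="https://raw.githubusercontent.com/xinbinhuang/data-bitcoin/master"

# Print the command, and execute it unless this is a dry run.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# The merge step must come after both downloads; every later step reads
# results/merged-data.csv.
run_pipeline() {
  run Rscript src/download-data.R "$DATA_URL/bitcoin_dataset.csv" data/bitcoin_dataset.csv
  run Rscript src/download-data.R "$DATA_URL/bitcoin_price.csv" data/bitcoin_price.csv
  run Rscript src/merge-data.R data/bitcoin_price.csv data/bitcoin_dataset.csv results/merged-data.csv
  run Rscript src/descriptive.R results/merged-data.csv results/descriptive-result.csv
  run Rscript src/regression.R results/merged-data.csv results/regression-result.csv
  run Rscript src/plot.R results/merged-data.csv results/figure/analysis-plot.png
  run Rscript -e 'ezknitr::ezknit("src/bitcoin_report.Rmd", out_dir = "doc")'
}

run_pipeline
```

Because of set -e, a real run (DRY_RUN=0) stops at the first failing step, so later steps never read a missing or partial merged file.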