yiqiao-yin/Applied-Data-Science-in-Stock-Market

This project is the beta version of the "Central Intelligence Platform" that I designed. The platform is intended for stock trading and money management.
Latest commit dc13326 Aug 10, 2019

Applied Data Science in Stock Market

This repository applies data science methods to the analysis of the stock market.

Prerequisite

This project assumes the reader has a broad range of knowledge, including but not limited to (1) financial accounting, (2) time-series analysis, (3) predictive modeling, (4) programming in R, (5) building applications with frameworks such as R Shiny, and (6) parallel computing with shell scripts.

Abstract

Project summary: What is tomorrow's stock price? In the era of big data, what search techniques can we use to extract useful information so that we minimize prediction error in a regression problem? This project studies price action in capital markets as a random walk, using limit theorems. Through explicit construction, we derive algorithms from a series of theorems that produce standardized buy signals for a given frequency at which a trader commits to participate in the market. Using the processed data, we apply an influence measure, the I-score, to select robust stock clusters for portfolio construction. Simulation results show that, under the same risk profile, a $1,000 initial investment returns $5,000, while over the same period the S&P 500 returns less than $1,500. Empirical results show an average error reduction of 97%.
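As background, here is the textbook limit-theorem result that turns a random walk into a standardized quantity. It is not necessarily the exact construction used in the paper. Suppose the log-price increments $X_1, X_2, \dots$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2 > 0$. Then the log price follows a random walk, and the central limit theorem standardizes its $n$-step change as follows:

```latex
S_n = \sum_{i=1}^{n} X_i, \qquad
Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty .
```

Reading the number of steps $n$ as the trader's committed participation frequency is our assumption for illustration only.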

Mathematical Model

Lo et al. (2002) introduced a non-parametric statistic that measures the predictivity of a cluster of variables in a discrete framework. Building on the dissertations of Huang (2004) and Hsu (2014), we adopt an extension of this methodology that measures predictivity in a continuous framework.
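The following minimal Python sketch makes the discrete statistic concrete. It computes the normalized I-score in the form commonly used in the partition-retention literature:

$$I = \frac{1}{n s_n^2}\sum_{j} n_j^2 (\bar Y_j - \bar Y)^2$$

Here the cells $j$ are the distinct value combinations of the selected discrete variables, $n_j$ is the cell size, $\bar Y_j$ is the cell mean, and $s_n^2$ is the population variance of $Y$. This normalization is our assumption, and the repository's own R implementation may differ. The function name `i_score` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def i_score(X, y):
    """Normalized discrete I-score of a cluster of variables (illustrative sketch).

    X: sequence of rows, each a tuple of discrete values of the selected variables.
    y: sequence of numeric responses, same length as X.
    Partition cells are the distinct value combinations appearing in X.
    """
    n = len(y)
    if n == 0 or len(X) != n:
        raise ValueError("X and y must be non-empty and of equal length")
    y_bar = mean(y)
    s2 = pvariance(y)  # s_n^2 = (1/n) * sum((y_i - y_bar)^2)
    if s2 == 0:
        raise ValueError("response has zero variance")
    # Group responses by the cell (value combination) each row falls into.
    cells = defaultdict(list)
    for row, yi in zip(X, y):
        cells[tuple(row)].append(yi)
    # Sum of n_j^2 * (cell mean - global mean)^2 over all cells.
    total = sum(len(c) ** 2 * (mean(c) - y_bar) ** 2 for c in cells.values())
    return total / (n * s2)
```

For example, `i_score([(0,), (0,), (1,), (1,)], [0, 0, 1, 1])` returns `2.0`, because the single variable separates the two response levels perfectly. Rearranging the variable so that each cell holds one 0 and one 1 gives `0.0`.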

The following graph, taken from Hsu (2014), illustrates how nearest neighborhoods are used to estimate the local mean in the predictivity score.
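This README does not reproduce the exact continuous definition from Hsu (2014). The Python sketch below is therefore an illustrative analogue that we assume for exposition. For each observation, it replaces the partition-cell mean with the mean response over that observation's $k$ nearest neighbors (Euclidean distance, point itself included) in the selected variables. It then sums $k(\bar Y_{N_k(i)} - \bar Y)^2$ and normalizes by $n s_n^2$. The function name `knn_i_score` and the parameter `k` are hypothetical.

```python
import math
from statistics import mean, pvariance


def knn_i_score(X, y, k):
    """Continuous analogue of the I-score using k-nearest-neighbor local means.

    Hypothetical illustration, not the exact statistic from Hsu (2014).
    X: sequence of numeric feature tuples (the selected variables).
    y: sequence of numeric responses; k: neighborhood size including the point itself.
    """
    n = len(y)
    if len(X) != n or not 1 <= k <= n:
        raise ValueError("need len(X) == len(y) and 1 <= k <= n")
    y_bar = mean(y)
    s2 = pvariance(y)
    if s2 == 0:
        raise ValueError("response has zero variance")
    total = 0.0
    for i in range(n):
        # Rank all points by Euclidean distance to X[i]; X[i] itself is at distance 0.
        order = sorted(range(n), key=lambda j: math.dist(X[i], X[j]))
        local_mean = mean(y[j] for j in order[:k])
        # Weight by k so the score matches the discrete form when neighborhoods are cells.
        total += k * (local_mean - y_bar) ** 2
    return total / (n * s2)
```

The weighting by $k$ is our design choice. When every neighborhood coincides with a partition cell of size $k$, each cell contributes $k \cdot k(\bar Y_j - \bar Y)^2 = n_j^2(\bar Y_j - \bar Y)^2$, which recovers the normalized discrete score. For example, `knn_i_score([(0.0,), (0.1,), (5.0,), (5.1,)], [0, 0, 1, 1], 2)` returns `2.0`.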

Performance and Results

We report an average error reduction of 97% on held-out test sets across the 30 component stocks of the Dow Jones Industrial Average. Below we present selected test-set results for MMM under two approaches: (1) a time-series ARMA model, and (2) regression after using the I-score for feature selection.
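The README does not spell out how "error reduction" is computed. We assume it means the relative reduction in test-set mean squared error against the ARMA baseline, $1 - \mathrm{MSE}_{\text{I-score}} / \mathrm{MSE}_{\text{ARMA}}$, averaged over the stocks. Under that assumption, the calculation can be sketched in Python as follows. The function names and toy numbers are hypothetical and are not results from this project.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def error_reduction(actual, baseline_pred, model_pred):
    """Relative reduction in test MSE of model_pred versus baseline_pred (e.g. ARMA)."""
    base = mse(actual, baseline_pred)
    if base == 0:
        raise ValueError("baseline MSE is zero; relative reduction is undefined")
    return 1 - mse(actual, model_pred) / base


def average_error_reduction(results):
    """results: dict mapping ticker -> (actual, baseline_pred, model_pred)."""
    reductions = [error_reduction(*triple) for triple in results.values()]
    return sum(reductions) / len(reductions)


if __name__ == "__main__":
    # Toy, made-up series for two hypothetical stocks.
    toy = {
        "stock_a": ([1.0, 2.0], [2.0, 3.0], [1.5, 2.5]),
        "stock_b": ([0.0, 0.0], [2.0, 2.0], [0.0, 0.0]),
    }
    print(average_error_reduction(toy))  # 0.875
```

Here stock_a's MSE drops from 1.0 to 0.25, a 75% reduction, and stock_b's drops to zero, a 100% reduction. The average of the two is 87.5%.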

Presentation

Yiqiao Yin is the designated presenter. The presentation slides serve as the main material. For details, we invite the audience to read the paper on the research site; the paper is also uploaded as a zip file in the doc folder of this GitHub repository.

Shiny App

We also built a platform as a Shiny app, which serves as a supplement to the paper and presentation. Because the Shiny server is slow when executing code that downloads data live, the app presents only limited information. The app can be accessed here.

R Notebook

In addition to the files above, we also provide an R notebook. The notebook loads the RData file saved in the doc folder and then produces graphs such as the following. Like the Shiny app above, the R notebook is one of several supplements supporting the presentation materials.