Chapter6-CodeFix

tatsath · Jul 26, 2020 · 42d11ca
1 parent 117a295

Showing 10 changed files with 600,043 additions and 22 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -1,5 +0,0 @@
-*.psd filter=lfs diff=lfs merge=lfs -text
-Chapter[[:space:]]6[[:space:]]-[[:space:]]Sup.[[:space:]]Learning[[:space:]]-[[:space:]]Classification[[:space:]]models/CaseStudy1[[:space:]]-[[:space:]]Fraud[[:space:]]Detection/creditcard.csv filter=lfs diff=lfs merge=lfs -text
-Chapter[[:space:]]6[[:space:]]-[[:space:]]Sup.[[:space:]]Learning[[:space:]]-[[:space:]]Classification[[:space:]]models/CaseStudy2[[:space:]]-[[:space:]]Loan[[:space:]]Default[[:space:]]Probability/LoansData.csv.gz filter=lfs diff=lfs merge=lfs -text
-Chapter[[:space:]]6[[:space:]]-[[:space:]]Sup.[[:space:]]Learning[[:space:]]-[[:space:]]Classification[[:space:]]models/CaseStudy3[[:space:]]-[[:space:]]Bitcoin[[:space:]]Trading[[:space:]]Strategy/BitstampData.csv filter=lfs diff=lfs merge=lfs -text
-

diff --git a/...- Sup. Learning - Classification models/CaseStudy1 - Fraud Detection/FraudDetection.ipynb b/...- Sup. Learning - Classification models/CaseStudy1 - Fraud Detection/FraudDetection.ipynb
@@ -80,11 +80,19 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 148,
+   "execution_count": 1,
    "metadata": {
     "_cell_guid": "5d8fee34-f454-2642-8b06-ed719f0317e1"
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Using TensorFlow backend.\n"
+     ]
+    }
+   ],
    "source": [
     "# Load libraries\n",
     "import numpy as np\n",
@@ -125,16 +133,26 @@
     "## 2.2. Loading the Data"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We load the data in this step.\n",
+    "\n",
+    "#### <font color='red'>Note: Due to GitHub's file size limit, only a sample of the data is included in the Jupyter notebook repository of this book. However, all subsequent results in this notebook were produced with the full data (144MB), available at https://www.kaggle.com/mlg-ulb/creditcardfraud. Load the full data if you want to reproduce the results. </font> "
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 52,
+   "execution_count": 2,
    "metadata": {
     "_cell_guid": "787e35f7-bf9e-0969-8d13-a54fa87f3519"
    },
    "outputs": [],
    "source": [
     "# load dataset\n",
-    "dataset = read_csv('creditcard.csv')"
+    "dataset = read_csv('creditcard_sample.csv')\n",
+    "#dataset = read_csv('creditcard.csv') #Load this for the actual data."
    ]
   },
   {

diff --git a/...ter 6 - Sup. Learning - Classification models/CaseStudy1 - Fraud Detection/creditcard.csv b/...ter 6 - Sup. Learning - Classification models/CaseStudy1 - Fraud Detection/creditcard.csv
diff --git a/... Sup. Learning - Classification models/CaseStudy1 - Fraud Detection/creditcard_sample.csv b/... Sup. Learning - Classification models/CaseStudy1 - Fraud Detection/creditcard_sample.csv
diff --git a/... Classification models/CaseStudy2 - Loan Default Probability/LoanDefaultProbability.ipynb b/... Classification models/CaseStudy2 - Loan Default Probability/LoanDefaultProbability.ipynb
@@ -68,7 +68,7 @@
     "after you’ve missed payments for several months. The predicted variable takes value 1\n",
     "in case of charge-off and 0 otherwise.\n",
     "\n",
-    "This case study aims to analyze data for loans through 2007-2017Q3 from Lending Club available on Kaggle (https://www.kaggle.com/przemekblo/lending-club-loan-classification/data). Dataset contains over 887 thousand observations and 150 variables among which one is describing the loan status. "
+    "This case study aims to analyze data for loans through 2007-2017Q3 from Lending Club available on Kaggle. Dataset contains over 887 thousand observations and 150 variables among which one is describing the loan status. "
    ]
   },
   {
@@ -142,17 +142,27 @@
     "## 2.2. Loading the Data"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We load the data in this step.\n",
+    "\n",
+    "#### <font color='red'>Note: Due to GitHub's file size limit, only a sample of the data is included in the Jupyter notebook repository of this book. However, all subsequent results in this notebook were produced with the full data (~1GB), available at https://www.kaggle.com/mlfinancebook/lending-club-loans-data. Load the full data if you want to reproduce the results. </font> "
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 2,
    "metadata": {
     "_cell_guid": "787e35f7-bf9e-0969-8d13-a54fa87f3519",
     "scrolled": true
    },
    "outputs": [],
    "source": [
     "# load dataset\n",
-    "loans = pd.read_csv('LoansData.csv.gz', compression='gzip', low_memory=True)"
+    "loans = pd.read_csv('LoansData_sample.csv.gz', compression='gzip', encoding='utf-8')\n",
+    "#loans = pd.read_csv('LoansData.csv.gz', compression='gzip', low_memory=True) #Use this for the actual data"
    ]
   },
   {

diff --git a/.... Learning - Classification models/CaseStudy2 - Loan Default Probability/LoansData.csv.gz b/.... Learning - Classification models/CaseStudy2 - Loan Default Probability/LoansData.csv.gz
diff --git a/...ing - Classification models/CaseStudy2 - Loan Default Probability/LoansData_sample.csv.gz b/...ing - Classification models/CaseStudy2 - Loan Default Probability/LoansData_sample.csv.gz
diff --git a/... Classification models/CaseStudy3 - Bitcoin Trading Strategy/BitcoinTradingStrategy.ipynb b/... Classification models/CaseStudy3 - Bitcoin Trading Strategy/BitcoinTradingStrategy.ipynb
@@ -137,6 +137,13 @@
     "## 2.2. Loading the Data"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### <font color='red'>Note: Due to GitHub's file size limit, only a sample of the data is included in the Jupyter notebook repository of this book. However, all subsequent results in this notebook were produced with the full data (~150MB), available at https://www.kaggle.com/mlfinancebook/bitstamp-bicoin-minutes-data. Load the full data if you want to reproduce the results. </font> "
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -146,7 +153,7 @@
    "outputs": [],
    "source": [
     "# load dataset\n",
-    "dataset = pd.read_csv('BitstampData.csv')"
+    "dataset = pd.read_csv('BitstampData_sample.csv')"
    ]
   },
   {

diff --git a/.... Learning - Classification models/CaseStudy3 - Bitcoin Trading Strategy/BitstampData.csv b/.... Learning - Classification models/CaseStudy3 - Bitcoin Trading Strategy/BitstampData.csv
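
The notebooks in this commit switch between the sample and the full dataset by commenting one `read_csv` line in or out. As a minimal sketch of an alternative (not part of the commit), the choice could be made at run time: use the full file when it has been downloaded next to the notebook, otherwise fall back to the bundled sample. The helper name `resolve_dataset_path` and its parameters are hypothetical; only the file names come from the diff above.

```python
from pathlib import Path


def resolve_dataset_path(full_name: str, sample_name: str, base_dir: str = ".") -> Path:
    """Return the full dataset path if that file exists, else the sample path.

    Hypothetical helper for illustration; it only checks for the file on disk
    and does not download anything.
    """
    base = Path(base_dir)
    full_path = base / full_name
    if full_path.is_file():
        return full_path
    return base / sample_name


if __name__ == "__main__":
    # File names taken from the FraudDetection notebook in this commit.
    path = resolve_dataset_path("creditcard.csv", "creditcard_sample.csv")
    print(f"Loading {path}")
    # The notebook would then call, for example:
    # dataset = read_csv(path)
```

With this approach a reader who downloads the full data from Kaggle reproduces the published results without editing the cell, while a fresh clone still runs on the sample.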