TrainingByPackt
diff --git a/Lesson01/.ipynb_checkpoints/Chapter_1_Case_Study-checkpoint.ipynb b/Lesson01/.ipynb_checkpoints/Chapter_1_Case_Study-checkpoint.ipynb
Lines changed: 4 additions & 4 deletions
diff --git a/Lesson01/Chapter_1_Case_Study.ipynb b/Lesson01/Chapter_1_Case_Study.ipynb
Lines changed: 4 additions & 4 deletions
diff --git a/Lesson02/.ipynb_checkpoints/Chapter_2_Case_Study-checkpoint.ipynb b/Lesson02/.ipynb_checkpoints/Chapter_2_Case_Study-checkpoint.ipynb
Lines changed: 7 additions & 10 deletions
diff --git a/Lesson02/Chapter_2_Case_Study.ipynb b/Lesson02/Chapter_2_Case_Study.ipynb
Lines changed: 7 additions & 10 deletions
diff --git a/Lesson03/.ipynb_checkpoints/Chapter_3_Case_Study-checkpoint.ipynb b/Lesson03/.ipynb_checkpoints/Chapter_3_Case_Study-checkpoint.ipynb
Lines changed: 103 additions & 101 deletions
diff --git a/Lesson03/Chapter_3_Case_Study.ipynb b/Lesson03/Chapter_3_Case_Study.ipynb
Lines changed: 103 additions & 101 deletions
diff --git a/Lesson04/.ipynb_checkpoints/Chapter_4_Case_Study-checkpoint.ipynb b/Lesson04/.ipynb_checkpoints/Chapter_4_Case_Study-checkpoint.ipynb
Lines changed: 24 additions & 9 deletions
@@ -104,7 +104,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Data consists of one month of credit card balances. A basic quality check is to make sure that we have data for as many accounts as we think we do. In particular, account IDs should all be distinct. We can check to see the number of distinct ID's with the Pandas function `.nunique()`."
+    "The data consist of one month of credit card account information, with historical data looking back six months. A basic quality check is to make sure that we have data for as many accounts as we think we do. In particular, account IDs should all be distinct. We can check to see the number of distinct ID's with the Pandas function `.nunique()`."
    ]
   },
   {
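
Note on the hunk above: the changed markdown cell describes checking that account IDs are distinct with `.nunique()`. A minimal sketch of that check follows. It is not the notebook's own code; the DataFrame contents and the column names `ID` and `credit_limit` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the case study data; real column names may differ.
df = pd.DataFrame({
    "ID": ["a1", "b2", "b2", "c3"],          # one account ID duplicated on purpose
    "credit_limit": [20000, 120000, 120000, 90000],
})

n_rows = df.shape[0]                 # number of records in the data
n_unique_ids = df["ID"].nunique()    # number of distinct account IDs

# If every account appears exactly once, the two counts match.
ids_are_distinct = (n_unique_ids == n_rows)
print(n_rows, n_unique_ids, ids_are_distinct)
```

When the two counts differ, as they do in this toy frame, some IDs occur more than once and the duplicated rows need a closer look before modeling.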
@@ -562,7 +562,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise 4: Continuing verification of data integrity"
+    "# Exercise 4: Continuing Verification of Data Integrity"
    ]
   },
   {
@@ -2688,7 +2688,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exploring the Financial History Features in the Data Set"
+    "# Exploring the Financial History Features in the Dataset"
    ]
   },
   {
@@ -3462,7 +3462,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Activity: Exploring Remaining Financial Features in the Data Set"
+    "# Activity 1: Exploring Remaining Financial Features in the Dataset"
    ]
   },
   {
 
@@ -104,7 +104,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Data consists of one month of credit card balances. A basic quality check is to make sure that we have data for as many accounts as we think we do. In particular, account IDs should all be distinct. We can check to see the number of distinct ID's with the Pandas function `.nunique()`."
+    "The data consist of one month of credit card account information, with historical data looking back six months. A basic quality check is to make sure that we have data for as many accounts as we think we do. In particular, account IDs should all be distinct. We can check to see the number of distinct ID's with the Pandas function `.nunique()`."
    ]
   },
   {
@@ -562,7 +562,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise 4: Continuing verification of data integrity"
+    "# Exercise 4: Continuing Verification of Data Integrity"
    ]
   },
   {
@@ -2688,7 +2688,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exploring the Financial History Features in the Data Set"
+    "# Exploring the Financial History Features in the Dataset"
    ]
   },
   {
@@ -3462,7 +3462,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Activity: Exploring Remaining Financial Features in the Data Set"
+    "# Activity 1: Exploring Remaining Financial Features in the Dataset"
    ]
   },
   {
 
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Explore the target variable and conclude initial exploration"
+    "# Exploring the Response Variable and Concluding the Initial Exploration"
    ]
   },
   {
@@ -85,7 +85,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Scikit-Learn intro"
+    "# Introduction to Scikit-Learn"
    ]
   },
   {
@@ -326,7 +326,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Synthetic Data"
+    "# Generating Synthetic Data"
    ]
   },
   {
@@ -437,7 +437,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise X: Linear regression in Scikit-Learn"
+    "# Exercise 8: Linear regression in Scikit-Learn"
    ]
   },
   {
@@ -805,11 +805,10 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise X: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python"
+    "# Exercise 9: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python"
    ]
   },
   {
@@ -1021,11 +1020,10 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise X: Obtaining Predicted Probabilities from a Trained Logistic Regression Model"
+    "# Exercise 10: Obtaining Predicted Probabilities from a Trained Logistic Regression Model"
    ]
   },
   {
@@ -1256,7 +1254,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1353,7 +1350,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Activity: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve"
+    "# Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve"
    ]
   },
   {
 
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Explore the target variable and conclude initial exploration"
+    "# Exploring the Response Variable and Concluding the Initial Exploration"
    ]
   },
   {
@@ -85,7 +85,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Scikit-Learn intro"
+    "# Introduction to Scikit-Learn"
    ]
   },
   {
@@ -326,7 +326,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Synthetic Data"
+    "# Generating Synthetic Data"
    ]
   },
   {
@@ -437,7 +437,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise X: Linear regression in Scikit-Learn"
+    "# Exercise 8: Linear regression in Scikit-Learn"
    ]
   },
   {
@@ -805,11 +805,10 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise X: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python"
+    "# Exercise 9: Calculating the True and False Positive and Negative Rates and Confusion Matrix in Python"
    ]
   },
   {
@@ -1021,11 +1020,10 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise X: Obtaining Predicted Probabilities from a Trained Logistic Regression Model"
+    "# Exercise 10: Obtaining Predicted Probabilities from a Trained Logistic Regression Model"
    ]
   },
   {
@@ -1256,7 +1254,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1353,7 +1350,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Activity: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve"
+    "# Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve"
    ]
   },
   {
 
@@ -17,10 +17,11 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 1. How are the coefficients and intercept of logistic regression estimated?"
+    "# Estimating the Coefficients and Intercepts of Logistic Regression"
    ]
   },
   {
@@ -88,7 +89,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise 1: Gradient descent"
+    "# Gradient Descent to Find Optimal Parameter Values"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Exercise 16: Using Gradient Descent to Minimize a Cost Function"
    ]
   },
   {
@@ -324,7 +332,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 2. Assumptions of logistic regression"
+    "# Assumptions of Logistic Regression"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note: adjust the path in the following cell to the location where you saved the cleaned data from Chapter 1."
    ]
   },
   {
@@ -724,7 +739,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 3. Motivation for regularization: the bias-variance tradeoff\n",
+    "# The Motivation for Regularization: the Bias-Variance Trade-off\n",
     "Generate quadratic data with random noise to illustrate this."
    ]
   },
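
Note on the hunk above: the cell says to generate quadratic data with random noise to illustrate the bias-variance trade-off. One way this might look is sketched below. The coefficients, noise scale, seed, and polynomial degrees are all assumptions chosen for illustration, and `numpy.polyfit` is used here as a convenient stand-in rather than as the notebook's method.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed so the sketch is repeatable

n_points = 20
X = rng.uniform(low=0.0, high=10.0, size=n_points)
noise = rng.normal(loc=0.0, scale=5.0, size=n_points)

# Hypothetical quadratic relationship plus Gaussian noise.
y = -1.0 * (X - 2.0) * (X - 9.0) + noise

# Fit polynomials of increasing degree and record the training error.
train_mse = {}
for degree in (1, 2, 6):
    coefs = np.polyfit(X, y, deg=degree)
    residuals = np.polyval(coefs, X) - y
    train_mse[degree] = float(np.mean(residuals ** 2))

print(train_mse)
```

The straight line (degree 1) cannot follow the curvature and underfits. The higher-degree fit drives the training error lower still, but part of that gain comes from fitting the noise, which is the overfitting that regularization is meant to limit.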
@@ -905,7 +920,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise 2: Generate synthetic classification data and model it"
+    "# Exercise 17: Generating and Modeling Synthetic Classification Data"
    ]
   },
   {
@@ -1111,7 +1126,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 3.2. Lasso (L1) and ridge (L2) regularization"
+    "# Lasso (L1) and Ridge (L2) Regularization"
    ]
   },
   {
@@ -1234,7 +1249,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 3.3. How to choose the regularization parameter and other hyperparameters: cross validation"
+    "# Cross Validation: Choosing the Regularization Parameter and Other Hyperparameters"
    ]
   },
   {
@@ -1398,7 +1413,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Exercise 3: Reduce overfitting on the synthetic data classification problem"
+    "# Exercise 18: Reducing Overfitting on the Synthetic Data Classification Problem"
    ]
   },
   {
@@ -1967,7 +1982,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Activity: Cross validation and feature engineering with case study data"
+    "# Activity 4: Cross-Validation and Feature Engineering with the Case Study Data"
    ]
   },
   {