
Testing 21 models (machine learning ensemble) on the WDBC data set


Thom-J-H/Capstone2_Harvard_edX


Capstone2_Harvard_edX

In partial fulfillment of the requirements for the Harvard edX: Data Science Professional Certificate, this repository contains the following files:

And the original Breast Cancer Wisconsin (Diagnostic) data set (WDBC), available from the UCI Machine Learning Repository, Center for Machine Learning and Intelligent Systems, University of California, Irvine:

The script (and RMD) import the data set from the UCI source, so there is no need to download it first.

Please note:
The RMD will take a minimum of 40 minutes, and more likely over an hour, to run. It also requires that the user has installed a number of ML packages for R, consistent with those used for ensemble modelling in the Harvard edX course on Machine Learning.

The script largely runs silently (the output is captured). Any warnings or error messages may be safely ignored. Not every model works perfectly under each testing condition/variation, which is the point of testing the various models under similar controlled conditions.

Thank you,
Thom J. Haslam
March 12, 2019

 
Run Two: Visual Overview

Update: 2019-03-14

I thank the Harvard edX peer and staff reviewers for their encouraging and helpful comments. One suggestion was to change the loading procedure in the RMD from

  • library(tidyverse)
  • library(caret) # etc

to a form that first checks whether each package is installed (please see Packages_Required_Set_up.R). This ensures that if someone is missing the needed packages, they will be installed from CRAN, so that the RMD runs without terminating in an error.
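
For readers unfamiliar with the install-if-missing pattern described above, here is a minimal sketch in R. It is not necessarily the code in Packages_Required_Set_up.R: the helper name `load_or_install` and the CRAN mirror URL are assumptions made for illustration, and only `tidyverse` and `caret` are named in the original loading lines.

```r
# Hypothetical helper (not from the repository): attach a package,
# installing it from CRAN first if it is not already available.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE instead of stopping when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
}

# Applied to the packages the report needs (uncomment to run; this may
# trigger a lengthy install on a fresh R setup):
# invisible(lapply(c("tidyverse", "caret"), load_or_install))
```

Wrapping the check in a function keeps the list of required packages in one place, so adding another ML package means adding one string to the vector.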

This is an excellent suggestion, so I will update the script and the RMD (by 15 March 2019) for future use/reference. I will also take one last crack at fixing any typos or infelicities of expression in the report, even though the project has received full marks (50 out of 50) and for all practical purposes is done: certificate earned!

Otherwise, I will leave this Machine Learning project up as an archive, and as part of what I hope will be a growing R for Data Science portfolio.

 
PCA Graphs 1-2, 4-5
