Added a module to prune boosted ensemble. #2

Closed
wants to merge 6 commits into from
13 changes: 8 additions & 5 deletions pkg/C50/DESCRIPTION
@@ -1,12 +1,15 @@
Package: C50
Type: Package
Title: C5.0 Decision Trees and Rule-Based Models
Version: 0.1.0-24
Date: 2015-03-08
Author: Max Kuhn, Steve Weston, Nathan Coulter, Mark Culp. C code for C5.0 by R. Quinlan
Version: 0.1.0-21
Date: 2014-11-18
Author: Max Kuhn, Steve Weston, Nathan Coulter. C code for C5.0 by R. Quinlan
Maintainer: Max Kuhn <mxkuhn@gmail.com>
Description: C5.0 decision trees and rule-based models for pattern recognition.
Imports: partykit
Description: C5.0 decision trees and rule-based models for pattern recognition
Depends: R (>= 2.10.0)
License: GPL-3
LazyLoad: yes
Packaged: 2014-11-18 15:51:04 UTC; kuhna03
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2014-11-18 17:16:38
6 changes: 0 additions & 6 deletions pkg/C50/NAMESPACE
@@ -1,7 +1,5 @@
useDynLib(C50)

import(partykit)

export(C5.0Control,
C5.0,
C5.0.default,
@@ -18,12 +16,8 @@ S3method(print, summary.C5.0)

S3method(print, C5.0)

S3method(plot, C5.0)

S3method(predict, C5.0)

S3method(as.party, C5.0)

S3method(QuinlanAttributes, numeric)
S3method(QuinlanAttributes, factor)
S3method(QuinlanAttributes, character)
19 changes: 12 additions & 7 deletions pkg/C50/R/C5.0.R
@@ -2,6 +2,7 @@ C5.0 <- function(x, ...) UseMethod("C5.0")

C5.0.default <- function(x, y,
trials = 1,
prunem = trials,
rules = FALSE,
weights = NULL,
control = C5.0Control(),
@@ -16,7 +17,10 @@ C5.0.default <- function(x, y,
warning("rule banding only works with rules; 'rules' was changed to TRUE")
rules <- TRUE
}

if(prunem > trials)
{
stop("'prunem' must be less than or equal to 'trials'")
}
## to do add weightings

lvl <- levels(y)
@@ -41,7 +45,10 @@ C5.0.default <- function(x, y,
maxtrials <- 100
if(trials < 1 | trials > maxtrials)
stop(paste("number of boosting iterations must be between 1 and" ,maxtrials))


if(prunem < 1 | prunem > maxtrials)
stop(paste("number of classifiers kept after pruning ('prunem') must be between 1 and", maxtrials))

if(!is.data.frame(x) & !is.matrix(x)) stop("x must be a matrix or data frame")

if(!is.null(weights) && !is.numeric(weights))
@@ -50,7 +57,6 @@ C5.0.default <- function(x, y,
## TODO: add case weights to these files when needed
namesString <- makeNamesFile(x, y, w = weights, label = control$label, comments = TRUE)
dataString <- makeDataFile(x, y, weights)

Z <- .C("C50",
as.character(namesString),
as.character(dataString),
@@ -83,28 +89,27 @@ C5.0.default <- function(x, y,
as.logical(control$fuzzyThreshold),
# -p "use the Fuzzy thresholds option" var name: PROBTHRESH
as.logical(control$earlyStopping), # toggle C5.0 to check to see if we should stop boosting early
as.integer(prunem), # number of boosted classifiers to keep after pruning; var name: PRUNEM
## the model is returned in 2 files: .rules and .tree
tree = character(1), # pass back C5.0 tree as a string
rules = character(1), # pass back C5.0 rules as a string
output = character(1), # get output that normally goes to screen
PACKAGE = "C50"
)

## Figure out how many trials were actually used.
modelContent <- strsplit(if(rules) Z$rules else Z$tree, "\n")[[1]]
entries <- grep("^entries", modelContent, value = TRUE)
if(length(entries) > 0)
{
actual <- as.numeric(substring(entries, 10, nchar(entries)-1))
} else actual <- trials

if(trials > 1)
{
boostResults <- getBoostResults(Z$output)
## This next line is here to avoid a false positive warning in R
## CMD check:
## * checking R code for possible problems ... NOTE
## C5.0.default: no visible binding for global variable 'Data'
## C5.0.default: no visible binding for global variable Data
Data <- NULL
size <- if(!is.null(boostResults)) subset(boostResults, Data == "Training Set")$Size else NA
} else {
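The `substring(entries, 10, nchar(entries) - 1)` call relies on the model text containing a line of the form `entries="N"`: the prefix `entries="` is nine characters long, so characters 10 through the second-to-last hold the number of trials that were actually built, and the closing quote is dropped. Assuming that line format (inferred from the R code above, not from C5.0 documentation), the same extraction can be sketched in C. `actual_trials` is a hypothetical name and is not part of the C50 package; when no such line exists it falls back to the requested value, mirroring the `else actual <- trials` branch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the R code: scan the model text line by
   line for the first line starting with entries=" and return the integer
   that follows. If no such line exists, return the requested number of
   trials instead. */
int actual_trials(const char *model, int requested)
{
    static const char prefix[] = "entries=\"";
    const size_t plen = sizeof prefix - 1;  /* 9 chars, hence start = 10 in R */

    assert(model != NULL);
    for (const char *line = model; line != NULL && *line != '\0'; ) {
        if (strncmp(line, prefix, plen) == 0)
            return atoi(line + plen);       /* atoi stops at the closing quote */
        line = strchr(line, '\n');          /* jump to the end of this line */
        if (line != NULL)
            line++;                         /* step past the newline */
    }
    return requested;
}
```

Returning the first match keeps the sketch simple; the R version would instead produce a vector if several `entries` lines were present.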
Empty file added pkg/C50/man/.Rhistory
13 changes: 10 additions & 3 deletions pkg/C50/man/C5.0.Rd
@@ -12,7 +12,8 @@ C5.0 algorithm
\usage{
C5.0(x, ...)

\method{C5.0}{default}(x, y, trials = 1, rules= FALSE,
\method{C5.0}{default}(x, y, trials = 1, prunem = trials,
rules= FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL, ...)
@@ -32,6 +33,9 @@ a factor vector with 2 or more levels
\item{trials}{
an integer specifying the number of boosting iterations. A value of
one indicates that a single model is used.
}
\item{prunem}{
an integer specifying how many of the boosted classifiers are kept in the ensemble after pruning. It must be between 1 and \code{trials} and defaults to \code{trials} (no pruning). A value of one indicates that a single model is used.
}
\item{rules}{
A logical: should the tree be decomposed into a rule-based model?
@@ -86,8 +90,8 @@ generated using \code{\link{predict.C5.0}}.
Internally, the code will attempt to halt boosting if it appears to be
ineffective. For this reason, the value of \code{trials} may be
different from what the model actually produced. There is an option to
turn this off in \code{\link{C5.0Control}}.

turn this off in \code{\link{C5.0Control}}. Additionally, when \code{prunem}
is smaller than \code{trials}, the boosted ensemble is pruned with Kappa pruning, which keeps \code{prunem} of the \code{trials} classifiers.
}
\value{
An object of class \code{C5.0} with elements:
@@ -125,6 +129,9 @@ a string version of the command line output
\item{predictors}{
a character vector of predictor names
}
\item{prunem}{
an integer specifying the number of classifiers in the pruned model
}
\item{rbm}{
a logical for rules
}
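The Rd text names Kappa pruning, but the diff does not show the C code that performs it. As a hedged illustration only, the sketch below follows the pair-selection scheme described by Margineantu and Dietterich ("Pruning Adaptive Boosting", 1997): compute Cohen's kappa, κ = (θ1 − θ2) / (1 − θ2), for every pair of boosted classifiers on the training cases (θ1 is observed agreement, θ2 is agreement expected by chance), sort the pairs from least to most agreement, and add classifiers from those pairs until `prunem` of them are kept. It assumes each classifier's predictions are available as integer class labels in `0..k-1`, stored row by row. `pair_kappa`, `kappa_prune`, `Pair` and `MAX_CLASSES` are hypothetical names, not part of C50, and the PR's own implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CLASSES 64  /* hypothetical upper bound on the number of classes */

typedef struct { int a, b; double kappa; } Pair;

/* Cohen's kappa between two classifiers' predictions on n cases. */
static double pair_kappa(const int *a, const int *b, int n, int k)
{
    int agree = 0, row[MAX_CLASSES] = {0}, col[MAX_CLASSES] = {0};
    assert(n > 0 && k > 0 && k <= MAX_CLASSES);
    for (int i = 0; i < n; i++) {
        if (a[i] == b[i]) agree++;
        row[a[i]]++;                 /* class counts for classifier a */
        col[b[i]]++;                 /* class counts for classifier b */
    }
    double theta1 = (double)agree / n;   /* observed agreement */
    double theta2 = 0.0;                 /* agreement expected by chance */
    for (int c = 0; c < k; c++)
        theta2 += ((double)row[c] / n) * ((double)col[c] / n);
    if (theta2 >= 1.0)
        return 1.0;  /* both predict the same single class everywhere */
    return (theta1 - theta2) / (1.0 - theta2);
}

/* qsort comparator: ascending kappa, i.e. most diverse pairs first. */
static int by_kappa(const void *x, const void *y)
{
    double kx = ((const Pair *)x)->kappa, ky = ((const Pair *)y)->kappa;
    return (kx > ky) - (kx < ky);
}

/* preds holds t rows of n predictions. Sets keep[i] = 1 for the retained
   classifiers and returns how many were kept (-1 on allocation failure). */
int kappa_prune(const int *preds, int t, int n, int k, int m, int *keep)
{
    assert(t >= 1 && m >= 1);
    if (m >= t) {                    /* nothing to prune */
        for (int i = 0; i < t; i++) keep[i] = 1;
        return t;
    }
    int npairs = t * (t - 1) / 2, p = 0, kept = 0;
    Pair *pairs = malloc((size_t)npairs * sizeof *pairs);
    if (pairs == NULL) return -1;
    for (int i = 0; i < t; i++) {
        keep[i] = 0;
        for (int j = i + 1; j < t; j++) {
            pairs[p].a = i;
            pairs[p].b = j;
            pairs[p].kappa = pair_kappa(preds + (size_t)i * n,
                                        preds + (size_t)j * n, n, k);
            p++;
        }
    }
    qsort(pairs, (size_t)npairs, sizeof *pairs, by_kappa);
    /* Walk pairs from lowest kappa, adding unseen members until m are kept. */
    for (p = 0; p < npairs && kept < m; p++) {
        if (!keep[pairs[p].a]) { keep[pairs[p].a] = 1; kept++; }
        if (kept < m && !keep[pairs[p].b]) { keep[pairs[p].b] = 1; kept++; }
    }
    free(pairs);
    return kept;
}
```

Kappa is used here instead of accuracy because the aim is a small but diverse committee. The sketch only chooses which classifiers to keep; how the retained trees vote is left to the surrounding boosting code.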
2 changes: 1 addition & 1 deletion pkg/C50/src/classify.c
@@ -16,7 +16,7 @@
/* General Public License for more details. */
/* */
/* You should have received a copy of the GNU General Public License */
/* (gpl.txt) along with C5.0 GPL Edition. If not, see */
/* (gpl.txt) along witih C5.0 GPL Edition. If not, see */
/* */
/* <http://www.gnu.org/licenses/>. */
/* */