[R] replace uses of T and F with TRUE and FALSE (dmlc#5778)
* [R-package] replace uses of T and F with TRUE and FALSE
* enable linting
* Remove skip
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
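For context, a minimal sketch (not part of this commit) of why the shorthand is fragile. In R, `TRUE` and `FALSE` are reserved words, while `T` and `F` are ordinary variables that only default to those values and can be rebound. The variable names `keep` and `res` below are illustrative assumptions.

```r
# TRUE and FALSE are reserved words; T and F are ordinary variables in the
# base namespace that merely default to TRUE and FALSE.
stopifnot(identical(T, TRUE), identical(F, FALSE))

# A user script can rebind the shorthand in the global environment...
T <- 0
keep <- T  # silently picks up the rebound value instead of TRUE
stopifnot(!isTRUE(keep))

# ...whereas assigning to the reserved word itself is an error.
res <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "error"
)
stopifnot(identical(res, "error"))

# Remove the global binding so T resolves to the base value again.
rm(T)
```

Spelling out `TRUE` and `FALSE`, as the hunks below do, avoids this class of surprise.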
--- a/R-package/demo/caret_wrapper.R
+++ b/R-package/demo/caret_wrapper.R
@@ -9,7 +9,7 @@ require(e1071)
 # Load Arthritis dataset in memory.
 data(Arthritis)
 # Create a copy of the dataset with data.table package (data.table is 100% compliant with R dataframe but its syntax is a lot more consistent and its performance are really good).
-df<- data.table(Arthritis, keep.rownames=F)
+df<- data.table(Arthritis, keep.rownames=FALSE)
 
 # Let's add some new categorical features to see if it helps. Of course these feature are highly correlated to the Age feature. Usually it's not a good thing in ML, but Tree algorithms (including boosted trees) are able to select the best features, even in case of highly correlated features.
 # For the first feature we create groups of age by rounding the real age. Note that we transform it to factor (categorical data) so the algorithm treat them as independant values.
--- a/R-package/demo/create_sparse_matrix.R
+++ b/R-package/demo/create_sparse_matrix.R
@@ -19,7 +19,7 @@ if (!require(vcd)) {
 data(Arthritis)
 
 # create a copy of the dataset with data.table package (data.table is 100% compliant with R dataframe but its syntax is a lot more consistent and its performance are really good).
 > `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **Xgboost** **R** package use `data.table`.
 test <- fread('data/test.csv', header=TRUE, stringsAsFactors = FALSE)
 ```
 > `magrittr` and `data.table` are here to make the code cleaner and much more rapid.
@@ -42,13 +42,13 @@ Let's explore the dataset.
 dim(train)
 
 # Training content
-train[1:6,1:5, with =F]
+train[1:6,1:5, with =FALSE]
 
 # Test dataset dimensions
 dim(test)
 
 # Test content
-test[1:6,1:5, with =F]
+test[1:6,1:5, with =FALSE]
 ```
 > We only display the 6 first rows and 5 first columns for convenience
 
@@ -70,7 +70,7 @@ According to its description, the **Otto** challenge is a multi class classifica
 
 ```{r searchLabel}
 # Check the content of the last column
-train[1:6, ncol(train), with = F]
+train[1:6, ncol(train), with = FALSE]
 # Save the name of the last column
 nameLastCol <- names(train)[ncol(train)]
 ```
@@ -86,7 +86,7 @@ For that purpose, we will:
 
 ```{r classToIntegers}
 # Convert from classes to numbers
-y <- train[, nameLastCol, with = F][[1]] %>% gsub('Class_','',.) %>% {as.integer(.) -1}
+y <- train[, nameLastCol, with = FALSE][[1]] %>% gsub('Class_','',.) %>% {as.integer(.) -1}
 
 # Display the first 5 levels
 y[1:5]
@@ -95,7 +95,7 @@ y[1:5]
 We remove label column from training dataset, otherwise **XGBoost** would use it to guess the labels!
 
 ```{r deleteCols, results='hide'}
-train[, nameLastCol:=NULL, with = F]
+train[, nameLastCol:=NULL, with = FALSE]
 ```
 
 `data.table` is an awesome implementation of data.frame, unfortunately it is not a format supported natively by **XGBoost**. We need to convert both datasets (training and test) in `numeric` Matrix format.
@@ -163,7 +163,7 @@ Each *split* is done on one feature only at one value.
 Let's see what the model looks like.
 
 ```{r modelDump}
-model <- xgb.dump(bst, with.stats = T)
+model <- xgb.dump(bst, with.stats = TRUE)
 model[1:10]
 ```
 > For convenience, we are displaying the first 10 lines of the model only.
--- a/doc/R-package/discoverYourData.md
+++ b/doc/R-package/discoverYourData.md
@@ -52,7 +52,7 @@ The first step is to load `Arthritis` dataset in memory and wrap it with `data.t
 
 ```r
 data(Arthritis)
-df<- data.table(Arthritis, keep.rownames=F)
+df<- data.table(Arthritis, keep.rownames=FALSE)
 ```
 
 > `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **Xgboost** **R** package use `data.table`.