New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New arguments to step umap #213
Conversation
@@ -105,6 +113,8 @@ step_umap <- | |||
metric = "euclidean", | |||
learn_rate = 1, | |||
epochs = NULL, | |||
initial = "spectral", | |||
target_weight = 0.5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would default it to zero and, if it is not zero, check to make sure that the outcome
is not NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would break backwards compatibility, computation-wise
in main
, the following code uses target_weight = 0.5
by default in uwot::umap()
.
supervised <-
recipe(Species ~ ., data = iris) %>%
step_umap(all_predictors(), outcome = vars(Species), num_comp = 2) %>%
prep()
With this proposed change, setting the outcome
variable would no longer have an effect unless they also set the new target_weight
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, go ahead and default it to 1 / 2. They ignore target_weight
unless y
is passed. This is a modified version of what is in ?umap
:
library(uwot)
#> Loading required package: Matrix
ind <- c(1:10, 51:60, 101:110)
iris30 <- iris[ind,]
set.seed(1)
iris_umap_un_1 <-
umap(
iris30[, 1:4],
n_neighbors = 5,
learning_rate = 0.5,
init = "random",
n_epochs = 20,
target_weight = 0.5
)
set.seed(1)
iris_umap_un_2 <-
umap(
iris30[, 1:4],
n_neighbors = 5,
learning_rate = 0.5,
init = "random",
n_epochs = 20,
target_weight = 0.0
)
# just checking!
all.equal(iris_umap_un_1, iris_umap_un_2)
#> [1] TRUE
set.seed(1)
iris_umap_sup <-
umap(
iris30[, 1:4],
n_neighbors = 5,
learning_rate = 0.5,
init = "random",
n_epochs = 20,
y = iris30$Species,
target_weight = 0.5
)
all.equal(iris_umap_un_1, iris_umap_sup)
#> [1] "Attributes: < Component \"scaled:center\": Mean relative difference: 0.6626731 >"
#> [2] "Mean relative difference: 0.3312773"
Created on 2024-01-12 with reprex v2.0.2
Note that in their man page, they pass the outcome in with the original data as well as with the y
argument. That seems like a mistake. My example above avoids that and you get the same results in the first two cases regardless of the target weight.
I almost forgot. Since we are adding arguments, we need to make sure the step is backwards compatible |
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue. |
This PR adds
initial
andtarget_weight
tostep_umap()