Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New arguments to step umap #213

Merged
merged 5 commits into from Jan 17, 2024
Merged

New arguments to step umap #213

merged 5 commits into from Jan 17, 2024

Conversation

EmilHvitfeldt
Copy link
Member

This PR adds initial and target_weight to step_umap()

@@ -105,6 +113,8 @@ step_umap <-
metric = "euclidean",
learn_rate = 1,
epochs = NULL,
initial = "spectral",
target_weight = 0.5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would default it to zero and, if it is not zero, check to make sure that the outcome is not NULL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would break backwards compatibility, computation-wise

in main, the following code uses target_weight = 0.5 by default in uwot::umap().

supervised <-
  recipe(Species ~ ., data = iris) %>%
  step_umap(all_predictors(), outcome = vars(Species), num_comp = 2) %>%
  prep()

With this proposed change, setting the outcome variable would no longer have an effect unless they also set the new target_weight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, go ahead and default it to 1 / 2. They ignore target_weight unless y is passed. This is a modified version of what is in ?umap:

library(uwot)
#> Loading required package: Matrix

ind <- c(1:10, 51:60, 101:110)

iris30 <- iris[ind,]

set.seed(1)
iris_umap_un_1 <-
  umap(
    iris30[, 1:4],
    n_neighbors = 5,
    learning_rate = 0.5,
    init = "random",
    n_epochs = 20,
    target_weight = 0.5
  )

set.seed(1)
iris_umap_un_2 <-
  umap(
    iris30[, 1:4],
    n_neighbors = 5,
    learning_rate = 0.5,
    init = "random",
    n_epochs = 20,
    target_weight = 0.0
  )

# just checking!
all.equal(iris_umap_un_1, iris_umap_un_2)
#> [1] TRUE

set.seed(1)
iris_umap_sup <-
  umap(
    iris30[, 1:4],
    n_neighbors = 5,
    learning_rate = 0.5,
    init = "random",
    n_epochs = 20,
    y = iris30$Species,
    target_weight = 0.5
  )

all.equal(iris_umap_un_1, iris_umap_sup)
#> [1] "Attributes: < Component \"scaled:center\": Mean relative difference: 0.6626731 >"
#> [2] "Mean relative difference: 0.3312773"

Created on 2024-01-12 with reprex v2.0.2

Note that in their man page, they pass the outcome in with the original data as well as with the y argument. That seems like a mistake. My example above avoids that and you get the same results in the first two cases regardless of the target weight.

@EmilHvitfeldt
Copy link
Member Author

I almost forgot. Since we are adding arguments, we need to make sure the step is backwards compatible

@EmilHvitfeldt EmilHvitfeldt merged commit f205fd8 into main Jan 17, 2024
9 checks passed
@EmilHvitfeldt EmilHvitfeldt deleted the new-arguments-to-step_umap branch January 17, 2024 01:41
Copy link

github-actions bot commented Feb 1, 2024

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants