predict() on a umap object with n_components=1 gets two errors -- Looks like missing drop=F #10

Closed
JenniferSLyon opened this issue Aug 21, 2019 · 2 comments

Comments

@JenniferSLyon
Contributor

Based on the example in the vignette:

iris.data = iris[, grep("Sepal|Petal", colnames(iris))]
iris.labels = iris[, "Species"]
custom.config = umap.defaults
custom.config$n_components = 1
iris.umap = umap(iris.data, config=custom.config)

set.seed(19)
iris.wnoise = iris.data + matrix(rnorm(150*40, 0, 0.1), ncol=4)
colnames(iris.wnoise) = colnames(iris.data)
iris.wnoise.umap = predict(iris.umap, iris.wnoise)

Error in colMeans(embedding[knn.indexes[i, ], ]) :
'x' must be an array of at least two dimensions

traceback()
6: stop("'x' must be an array of at least two dimensions")
5: colMeans(embedding[knn.indexes[i, ], ])
4: make.initial.spectator.embedding(umap$layout, spectator.knn$indexes)
3: implementations[[method]](object, data)
2: predict.umap(iris.umap, iris.wnoise)
1: predict(iris.umap, iris.wnoise)

Looking at make.initial.spectator.embedding, it looks like a drop=FALSE
is missing. Here is the function with the fix applied (line marked with ## <-----):

trace(umap:::make.initial.spectator.embedding, edit=T)

function (embedding, knn.indexes)
{
    result = matrix(0, nrow = nrow(knn.indexes), ncol = ncol(embedding))
    rownames(result) = rownames(knn.indexes)
    knn.indexes = knn.indexes[, 2:ncol(knn.indexes), drop = FALSE]
    for (i in 1:nrow(result)) {
        result[i, ] = colMeans(embedding[knn.indexes[i, ], ,
            drop = FALSE]) ## <------- added drop = FALSE
    }
    result
}
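
To show why the original line fails when n_components=1, here is a minimal base R sketch. The embedding values and neighbor indexes are made up for illustration and are not taken from the iris run above.

```r
# A one-column matrix standing in for a 1-D embedding (values are made up).
embedding <- matrix(c(0.1, 0.4, 0.9, 1.6, 2.5), ncol = 1)
neighbors <- c(2, 3, 5)

# Default indexing drops the one-column matrix down to a plain numeric vector.
dropped <- embedding[neighbors, ]
is.null(dim(dropped))

# colMeans() needs at least two dimensions, so this call raises an error.
failed <- inherits(try(colMeans(dropped), silent = TRUE), "try-error")

# With drop = FALSE the selection stays a 3 x 1 matrix and colMeans() works.
kept <- embedding[neighbors, , drop = FALSE]
dim(kept)
center <- colMeans(kept)
```

With two or more components the default indexing already returns a matrix, which is presumably why the problem only shows up for n_components=1.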

This change leads to a new error:

iris.wnoise.umap = predict(iris.umap, iris.wnoise)
Error in temp.embedding[, temp.index] <- result[, indeces[i]] :
incorrect number of subscripts on matrix

traceback()
4: naive.simplicial.set.embedding(graph, embedding, config,
fix.observations = V)
3: implementations[[method]](object, data)
2: predict.umap(iris.umap, iris.wnoise)
1: predict(iris.umap, iris.wnoise)

It also looks like a drop=FALSE is missing in naive.simplicial.set.embedding. Again, the function is shown with the fix applied (line marked with ## <-----):

naive.simplicial.set.embedding
function (g, embedding, config, fix.observations = NULL)
{
    if (config$n_epochs == 0) {
        return(embedding)
    }
    result = t(embedding)
    gmax = max(g$coo[, "value"])
    g$coo[g$coo[, "value"] < gmax/config$n_epochs, "value"] = 0
    g = reduce.coo(g)
    eps = cbind(g$coo, eps = make.epochs.per.sample(g$coo[, "value"],
        config$n_epochs))
    if (is.null(fix.observations)) {
        result = naive.optimize.embedding(result, config, eps)
    }
    else {
        eps = eps[eps[, "from"] > fix.observations, ]
        indeces = seq(fix.observations + 1, ncol(result))
        seeds = column.seeds(result[, indeces, drop = FALSE],
            key = config$transform_state)
        temp.index = fix.observations + 1
        temp.embedding = result[, seq_len(fix.observations +
            1), drop = FALSE] ## <----- added drop=FALSE
        temp.eps = split.data.frame(eps, eps[, "from"])
        for (i in seq_along(indeces)) {
            temp.embedding[, temp.index] = result[, indeces[i]]
            set.seed(seeds[i])
            i.eps = temp.eps[[as.character(indeces[i])]]
            if (!is.null(i.eps)) {
                i.eps[, "from"] = temp.index
                temp.result = naive.optimize.embedding(temp.embedding,
                    config, i.eps)
            }
            result[, indeces[i]] = temp.result[, temp.index]
        }
    }
    colnames(result) = g$names
    t(result)
}
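
The second error has the same cause, just transposed: t(embedding) is a one-row matrix when n_components=1, so selecting columns without drop=FALSE returns a vector, and the later two-index assignment fails. A minimal base R sketch with made-up values (the names mirror the function above, but the numbers are hypothetical):

```r
# t(embedding) for a 1-D embedding of four points: a 1 x 4 matrix.
result <- t(matrix(c(0.1, 0.4, 0.9, 1.6), ncol = 1))
fix.observations <- 2

# Without drop = FALSE, selecting columns of a one-row matrix yields a vector,
# and assigning with two subscripts into that vector raises an error.
temp.vector <- result[, seq_len(fix.observations + 1)]
assign.failed <- inherits(
  try(temp.vector[, 3] <- result[, 4], silent = TRUE),
  "try-error"
)

# With drop = FALSE the selection stays a 1 x 3 matrix and the assignment works.
temp.matrix <- result[, seq_len(fix.observations + 1), drop = FALSE]
temp.matrix[, 3] <- result[, 4]
dim(temp.matrix)
```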

With these two changes, predict() runs without error and returns values. I am not sure whether there are deeper issues with predicting with n_components=1, or whether these two changes are sufficient.

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /mnt/drive2/r-project/R-3.6.1/lib/libRblas.so
LAPACK: /mnt/drive2/r-project/R-3.6.1/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics utils datasets grDevices methods base

other attached packages:
[1] umap_0.2.3 colorspace_1.4-1

loaded via a namespace (and not attached):
[1] compiler_3.6.1 Matrix_1.2-17 tools_3.6.1 reticulate_1.13
[5] Rcpp_1.0.2 RSpectra_0.15-0 grid_3.6.1 jsonlite_1.6
[9] openssl_1.4.1 lattice_0.20-38 askpass_1.1

@tkonopka
Owner

Thanks for pointing that out. Yes, those two drop=FALSE additions will fix this. Would you like to make a pull request, or should I go ahead and edit?

@JenniferSLyon
Contributor Author

You can just go ahead and edit.
