predict() on a umap object with n_components=1 gets two errors -- Looks like missing drop=F #10

JenniferSLyon · 2019-08-21T21:46:48Z

Based on the example in the vignette:

iris.data = iris[, grep("Sepal|Petal", colnames(iris))]
iris.labels = iris[, "Species"]
custom.config = umap.defaults
custom.config$n_components = 1
iris.umap = umap(iris.data, config=custom.config)

set.seed(19)
iris.wnoise = iris.data + matrix(rnorm(150*40, 0, 0.1), ncol=4)
colnames(iris.wnoise) = colnames(iris.data)
iris.wnoise.umap = predict(iris.umap, iris.wnoise)

Error in colMeans(embedding[knn.indexes[i, ], ]) :
'x' must be an array of at least two dimensions

traceback()
6: stop("'x' must be an array of at least two dimensions")
5: colMeans(embedding[knn.indexes[i, ], ])
4: make.initial.spectator.embedding(umap$layout, spectator.knn$indexes)
3: implementations[[method]](object, data)
2: predict.umap(iris.umap, iris.wnoise)
1: predict(iris.umap, iris.wnoise)

Looking at make.initial.spectator.embedding, it looks like a drop=FALSE
is missing. Here is the function with the fix added (line marked ## <-----):

trace(umap:::make.initial.spectator.embedding, edit=T)

function (embedding, knn.indexes)
{
    result = matrix(0, nrow = nrow(knn.indexes), ncol = ncol(embedding))
    rownames(result) = rownames(knn.indexes)
    knn.indexes = knn.indexes[, 2:ncol(knn.indexes), drop = FALSE]
    for (i in 1:nrow(result)) {
        result[i, ] = colMeans(embedding[knn.indexes[i, ], ,
            drop = FALSE]) ## <------- added drop = FALSE
    }
    result
}
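
To illustrate why the missing drop=FALSE matters when n_components=1, here is a minimal sketch in base R. The matrix values and the names `embedding` and `neighbors` are made up for illustration and are not taken from the umap package; the one-column matrix only stands in for a 1-D layout.

```r
# Hypothetical one-column matrix standing in for a 1-D UMAP layout.
embedding <- matrix(c(0.5, 1.5, 2.5, 3.5), ncol = 1)
neighbors <- c(2, 3, 4)

# Default subsetting drops the single column, leaving a plain vector.
dropped <- embedding[neighbors, ]
is.null(dim(dropped))

# colMeans() needs at least two dimensions, so the vector raises an error.
dropped_error <- tryCatch(colMeans(dropped),
                          error = function(e) conditionMessage(e))

# With drop = FALSE the subset stays a 3 x 1 matrix and colMeans() works.
kept <- embedding[neighbors, , drop = FALSE]
kept_mean <- colMeans(kept)
```

With two or more components the subset would still be a matrix either way, which is presumably why the problem only shows up for a single component.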

This change leads to a new error:

iris.wnoise.umap = predict(iris.umap, iris.wnoise)
Error in temp.embedding[, temp.index] <- result[, indeces[i]] :
incorrect number of subscripts on matrix

traceback()
4: naive.simplicial.set.embedding(graph, embedding, config,
fix.observations = V)
3: implementations[[method]](object, data)
2: predict.umap(iris.umap, iris.wnoise)
1: predict(iris.umap, iris.wnoise)

And it also looks like a drop=F is missing in naive.simplicial.set.embedding:

naive.simplicial.set.embedding
function (g, embedding, config, fix.observations = NULL)
{
    if (config$n_epochs == 0) {
        return(embedding)
    }
    result = t(embedding)
    gmax = max(g$coo[, "value"])
    g$coo[g$coo[, "value"] < gmax/config$n_epochs, "value"] = 0
    g = reduce.coo(g)
    eps = cbind(g$coo, eps = make.epochs.per.sample(g$coo[, "value"],
        config$n_epochs))
    if (is.null(fix.observations)) {
        result = naive.optimize.embedding(result, config, eps)
    }
    else {
        eps = eps[eps[, "from"] > fix.observations, ]
        indeces = seq(fix.observations + 1, ncol(result))
        seeds = column.seeds(result[, indeces, drop = FALSE],
            key = config$transform_state)
        temp.index = fix.observations + 1
        temp.embedding = result[, seq_len(fix.observations +
            1), drop = FALSE] ## <----- added drop=FALSE
        temp.eps = split.data.frame(eps, eps[, "from"])
        for (i in seq_along(indeces)) {
            temp.embedding[, temp.index] = result[, indeces[i]]
            set.seed(seeds[i])
            i.eps = temp.eps[[as.character(indeces[i])]]
            if (!is.null(i.eps)) {
                i.eps[, "from"] = temp.index
                temp.result = naive.optimize.embedding(temp.embedding,
                    config, i.eps)
            }
            result[, indeces[i]] = temp.result[, temp.index]
        }
    }
    colnames(result) = g$names
    t(result)
}
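
The second error can be reproduced in isolation with a rough sketch like the one below. It assumes only base R; the values and the choice of `fix.observations <- 3` are hypothetical, and the 1 x 4 matrix merely mimics `t(embedding)` for a one-component layout.

```r
# Hypothetical transposed 1-D layout: one row, four observations.
result <- t(matrix(c(0.5, 1.5, 2.5, 3.5), ncol = 1))
fix.observations <- 3
temp.index <- fix.observations + 1

# Selecting columns from a one-row matrix without drop = FALSE gives a vector.
no_drop <- result[, seq_len(temp.index)]

# Matrix-style assignment into that vector fails.
assign_error <- tryCatch({
    no_drop[, temp.index] <- 9
    NA_character_
}, error = function(e) conditionMessage(e))

# With drop = FALSE the 1 x 4 shape is kept and the assignment succeeds.
with_drop <- result[, seq_len(temp.index), drop = FALSE]
with_drop[, temp.index] <- 9
```

Here `assign_error` should hold the same "incorrect number of subscripts on matrix" message seen in the predict() call.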

With these two changes predict() now runs without error and returns values. I am not sure if there are deeper issues with predicting with n_components=1, or if these two changes are sufficient.

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS: /mnt/drive2/r-project/R-3.6.1/lib/libRblas.so
LAPACK: /mnt/drive2/r-project/R-3.6.1/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics utils datasets grDevices methods base

other attached packages:
[1] umap_0.2.3 colorspace_1.4-1

loaded via a namespace (and not attached):
[1] compiler_3.6.1 Matrix_1.2-17 tools_3.6.1 reticulate_1.13
[5] Rcpp_1.0.2 RSpectra_0.15-0 grid_3.6.1 jsonlite_1.6
[9] openssl_1.4.1 lattice_0.20-38 askpass_1.1

tkonopka · 2019-08-22T05:55:06Z

Thanks for pointing that out. Yes, those two drop=FALSE will fix this. Would you like to make a pull request, or should I go ahead and edit?

JenniferSLyon · 2019-08-22T13:35:58Z

You can just go ahead and edit.

tkonopka closed this as completed Aug 25, 2019
