
Handling missing data in decomposition #4

Closed
nipunbatra opened this issue Apr 7, 2017 · 12 comments

@nipunbatra

Hi, one of the use cases of matrix and tensor factorization is movie recommendation, where the matrix/tensor is sparse. I tried TensorLy with missing data and it fails.

I was wondering whether missing-data handling could be added to the decomposition routines. I wrote a couple of blog posts on how to handle missing entries in matrix factorization, both with a least-squares-based implementation and with gradient-descent-based solutions.
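For context, the least-squares approach can be sketched as masked alternating least squares: each row of a factor is re-fit using only that row's observed entries. This is a minimal numpy sketch under my own naming (`masked_als` and all parameter choices here are illustrative, not TensorLy's API):

```python
import numpy as np

def masked_als(X, mask, rank=2, n_iter=50, reg=1e-3, seed=0):
    """Fit X ~ U @ V.T using only entries where mask == 1.

    Each row of U (and of V) is a small ridge-regularized least-squares
    solve restricted to the observed columns (rows) for that index.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    I = reg * np.eye(rank)  # small ridge term keeps solves well-posed
    for _ in range(n_iter):
        for i in range(m):
            obs = mask[i] == 1
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + I, Vo.T @ X[i, obs])
        for j in range(n):
            obs = mask[:, j] == 1
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + I, Uo.T @ X[obs, j])
    return U, V
```

Because only observed entries enter the normal equations, the values stored at masked positions of X are never read.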

@JeanKossaifi
Member

Nice posts. Would you like to create a pull request to add support for missing values?

Currently the Robust Tensor PCA handles missing values but Tucker and CP don't.
There are a number of things that we want to add, like having the option of choosing the solver as @ahwillia suggested.

@nipunbatra
Author

I'd love to do a PR. Sadly, at this point in time I don't know where the masking would need to be done in the code. Would you have a clue? If the factorisation can be reduced to least squares, this should be trivial.

@ahwillia

ahwillia commented Oct 19, 2017

Just FYI - I have (scipy/numpy) code that handles this (see link below). Agree it would be a nice addition to tensorly!

https://github.com/ahwillia/tensortools/blob/master/tensortools/least_squares.py
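For anyone skimming, the core idea in that file — a least-squares solve restricted to observed entries, column by column — can be sketched roughly like this (a simplified numpy sketch; the function name and signature here are illustrative, not tensortools' actual API):

```python
import numpy as np

def censored_lstsq(A, B, M):
    """Approximately solve min_X || M * (A @ X - B) ||_F^2.

    M is a binary mask over B (1 = observed). Each column of X is an
    ordinary least-squares solution using only that column's observed rows.
    """
    X = np.empty((A.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        rows = M[:, j].astype(bool)        # observed rows for column j
        X[:, j], *_ = np.linalg.lstsq(A[rows], B[rows, j], rcond=None)
    return X
```

Dropping rows rather than zero-filling is what makes the placeholder values at missing positions irrelevant.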

@JeanKossaifi
Member

Agreed, let's make that happen!

Robust tensor PCA in TensorLy already handles missing values, ideally this should be the case for all decompositions.

@ShivangiM
Contributor

Hi! Has this been worked on? If not, I would like to start working on it.

@JeanKossaifi
Member

You're welcome to take a crack at it @ShivangiM!

@JeanKossaifi
Member

Hi @ShivangiM, any luck with this?

@ShivangiM
Contributor

@JeanKossaifi not yet, I've been busy lately.

@jkjk82

jkjk82 commented Nov 21, 2018

Hi all

I am wondering whether robust_pca handles missing values as intended.

I understand the requested format for the missing-value mask, but it seems that missing entries in the underlying data array X cannot be NaN, so you have to use a numerical placeholder for the missing data points in X.

However, I have noticed that the results are sensitive to the particular numerical value used for missing points, which I don't think can be the intended behavior. Is there an assumed value that missing points must have?

Sorry for lack of code, am on a mobile as not allowed GitHub at work!
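To make the concern concrete: if the mask is applied consistently, the placeholder stored at missing positions should never influence the result. A minimal numpy sketch of that invariant (illustrative only, not TensorLy's internals) — sensitivity to the fill value would mean some term in the solver skips the mask:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 4))
mask = np.ones_like(X)
mask[1, 2] = 0              # entry (1, 2) is missing

L = rng.random((4, 4))      # some current low-rank estimate

def masked_residual(X, L, mask, fill):
    Xf = X.copy()
    Xf[mask == 0] = fill    # placeholder written into the missing entry
    return mask * (Xf - L)  # mask zeroes that entry's contribution

# With the mask applied, the placeholder value cannot affect the result:
r0 = masked_residual(X, L, mask, fill=0.0)
r9 = masked_residual(X, L, mask, fill=999.0)
assert np.allclose(r0, r9)
```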

@milanlanlan

milanlanlan commented Nov 24, 2019

Hi, I am trying to use CP decomposition in my experiment (part of the data is missing), and I notice that the function parafac provides a "mask" parameter for handling missing values.

It works when the tensor is 2-dimensional, e.g. tl.tensor([[1., 2.], [3., 4.]]), with a mask array of the same shape whose values are 0/1.

However, when I repeat this with a higher-dimensional tensor, e.g. tl.tensor([[[1., 2.], [3., 4.]], [[5., 6.], [7., 8.]]]), it doesn't work.

The error is as follows:

Traceback (most recent call last):
  File "t.py", line 40, in <module>
    factors = parafac(X, rank=2, mask=kk)
  File "/home/a/Documents/tensorly/tensorly/decomposition/candecomp_parafac.py", line 185, in parafac
    tensor = tensor*mask + tl.kruskal_to_tensor((None, factors), mask=1-mask)
  File "/home/a/Documents/tensorly/tensorly/kruskal_tensor.py", line 188, in kruskal_to_tensor
    full_tensor = T.sum(khatri_rao([factors[0]*weights]+factors[1:], mask=mask), axis=1)
  File "/home/a/Documents/tensorly/tensorly/tenalg/_khatri_rao.py", line 98, in khatri_rao
    return T.kr(matrices, weights=weights, mask=mask)
  File "/home/a/Documents/tensorly/tensorly/backend/__init__.py", line 160, in inner
    return _get_backend_method(name)(*args, **kwargs)
  File "/home/a/Documents/tensorly/tensorly/backend/numpy_backend.py", line 69, in kr
    return np.einsum(operation, *matrices).reshape((-1, n_columns))*mask
ValueError: operands could not be broadcast together with shapes (16,2) (2,2,2,2) 

The immediate problem seems to be in
np.einsum(operation, *matrices).reshape((-1, n_columns))*mask.
I suspect the matrix multiplication needs np.dot(), so I changed this line to
np.dot(np.einsum(operation, *matrices).reshape((-1, n_columns)), mask)
and the problem becomes:

Traceback (most recent call last):
  File "t.py", line 40, in <module>
    factors = parafac(X, rank=2, mask=kk)
  File "/home/a/Documents/tensorly/tensorly/decomposition/candecomp_parafac.py", line 185, in parafac
    tensor = tensor*mask + tl.kruskal_to_tensor((None, factors), mask=1-mask)
  File "/home/a/Documents/tensorly/tensorly/kruskal_tensor.py", line 190, in kruskal_to_tensor
    return fold(full_tensor, 0, shape)
  File "/home/a/Documents/tensorly/tensorly/base.py", line 77, in fold
    return T.moveaxis(T.reshape(unfolded_tensor, full_shape), 0, mode)
  File "/home/a/Documents/tensorly/tensorly/backend/__init__.py", line 160, in inner
    return _get_backend_method(name)(*args, **kwargs)
  File "<__array_function__ internals>", line 6, in reshape
  File "/home/a/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 301, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/a/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 64 into shape (2,2,2,2)

I don't really understand what is going on here. What should I do to achieve this? Any help would be much appreciated, ideally with a code example. Thanks in advance for any suggestions.
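For what it's worth, the broadcast failure is just a shape mismatch: the Khatri-Rao product has been flattened to (16, 2) while the mask still has its tensor shape (2, 2, 2, 2). One plausible fix (a numpy sketch, untested against that TensorLy version) is to flatten the mask to a column before the elementwise product, rather than switching to np.dot:

```python
import numpy as np

# Shapes taken from the traceback above.
kr = np.arange(32, dtype=float).reshape(16, 2)  # stand-in for the Khatri-Rao product
mask = np.ones((2, 2, 2, 2))
mask[0, 0, 0, 0] = 0                            # one missing entry

# Broadcasting (16, 2) against (2, 2, 2, 2) fails. Flattening the mask in
# C order matches the vectorization used by the later fold, so each row of
# the Khatri-Rao product is masked by its corresponding tensor entry.
masked = kr * mask.reshape(-1, 1)
assert masked.shape == (16, 2)
assert (masked[0] == 0).all()  # the masked entry zeroes its whole row
```

np.dot changes the semantics (it contracts the column dimension instead of masking entries), which is why it produced the second reshape error.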

@asmeurer
Member

@milanlanlan I would suggest opening a separate issue report for this. It looks like mask needs to be reshaped, or else multiplied before the reshape (I'm not sure which).

@JeanKossaifi
Member

Fixed by #173
