
Handling missing data in decomposition #4

Closed
nipunbatra opened this issue Apr 7, 2017 · 12 comments

@nipunbatra

Hi, one of the use cases of matrix and tensor factorization is movie recommendation, where the matrix/tensor is sparse. I tried TensorLy with missing data and it fails.

I was wondering whether missing-data handling could be added to the decomposition routines. I wrote a couple of blog posts on how to handle missing entries in matrix factorization, both with a least-squares-based implementation and with gradient-descent-based solutions.
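For context, the least-squares approach can be sketched as masked alternating least squares: each row of a factor is re-fit using only that row's observed entries. This is a minimal numpy sketch under my own naming (`masked_als` and all parameter choices here are illustrative, not TensorLy's API):

```python
import numpy as np

def masked_als(X, mask, rank=2, n_iter=50, reg=1e-3, seed=0):
    """Fit X ~ U @ V.T using only entries where mask == 1.

    Each row of U (and of V) is a small ridge-regularized least-squares
    solve restricted to the observed columns (rows) for that index.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    I = reg * np.eye(rank)  # small ridge term keeps solves well-posed
    for _ in range(n_iter):
        for i in range(m):
            obs = mask[i] == 1
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + I, Vo.T @ X[i, obs])
        for j in range(n):
            obs = mask[:, j] == 1
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + I, Uo.T @ X[obs, j])
    return U, V
```

Because only observed entries enter the normal equations, the values stored at masked positions of X are never read.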

@JeanKossaifi
Member

Nice posts. Would you like to create a pull request to add support for missing values?

Currently the Robust Tensor PCA handles missing values but Tucker and CP don't.
There are a number of things that we want to add, like having the option of choosing the solver as @ahwillia suggested.

@nipunbatra
Author

I'd love to do a PR. Sadly, at this point in time I don't know where the masking would need to be done in the code. Would you have a clue? If the factorisation can be reduced to least squares, this should be trivial.

@ahwillia

ahwillia commented Oct 19, 2017

Just FYI - I have (scipy/numpy) code that handles this (see link below). Agree it would be a nice addition to tensorly!

https://github.com/ahwillia/tensortools/blob/master/tensortools/least_squares.py
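For anyone skimming, the core idea in that file — a least-squares solve restricted to observed entries, column by column — can be sketched roughly like this (a simplified numpy sketch; the function name and signature here are illustrative, not tensortools' actual API):

```python
import numpy as np

def censored_lstsq(A, B, M):
    """Approximately solve min_X || M * (A @ X - B) ||_F^2.

    M is a binary mask over B (1 = observed). Each column of X is an
    ordinary least-squares solution using only that column's observed rows.
    """
    X = np.empty((A.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        rows = M[:, j].astype(bool)        # observed rows for column j
        X[:, j], *_ = np.linalg.lstsq(A[rows], B[rows, j], rcond=None)
    return X
```

Dropping rows rather than zero-filling is what makes the placeholder values at missing positions irrelevant.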

@JeanKossaifi
Member

Agreed, let's make that happen!

Robust tensor PCA in TensorLy already handles missing values, ideally this should be the case for all decompositions.

@ShivangiM
Contributor

Hi! Has this been worked on? If not, I would like to start working on it.

@JeanKossaifi
Member

You're welcome to take a crack at it @ShivangiM!

@JeanKossaifi
Member

Hi @ShivangiM, any luck with this?

@ShivangiM
Contributor

@JeanKossaifi not yet, I've been busy lately.

@jkjk82

jkjk82 commented Nov 21, 2018

Hi all

I am wondering whether robust_pca handles missing values as intended.

I understand the requested format for the missing-value mask, but it seems that missing entries in the underlying data array X cannot be NaN, so you have to use a numerical placeholder for the missing data points in X.

However, I have noticed that the results are sensitive to the particular numerical value used for missing points, which I don't think can be the intended behavior. Is there an assumed value that missing points must have?

Sorry for lack of code, am on a mobile as not allowed GitHub at work!
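To make the concern concrete: if the mask is applied consistently, the placeholder stored at missing positions should never influence the result. A minimal numpy sketch of that invariant (illustrative only, not TensorLy's internals) — sensitivity to the fill value would mean some term in the solver skips the mask:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 4))
mask = np.ones_like(X)
mask[1, 2] = 0              # entry (1, 2) is missing

L = rng.random((4, 4))      # some current low-rank estimate

def masked_residual(X, L, mask, fill):
    Xf = X.copy()
    Xf[mask == 0] = fill    # placeholder written into the missing entry
    return mask * (Xf - L)  # mask zeroes that entry's contribution

# With the mask applied, the placeholder value cannot affect the result:
r0 = masked_residual(X, L, mask, fill=0.0)
r9 = masked_residual(X, L, mask, fill=999.0)
assert np.allclose(r0, r9)
```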

@milanlanlan

milanlanlan commented Nov 24, 2019

Hi, I am trying to use CP decomposition in my experiment (part of the data is missing), and I notice that the function parafac provides a "mask" parameter for handling missing values.

It works when the tensor is 2-dimensional, e.g. tl.tensor([[1., 2.], [3., 4.]]), with a mask array of the same shape whose values are 0/1.

However, when I repeat this with a higher-dimensional tensor, e.g. tl.tensor([[[1., 2.], [3., 4.]], [[5., 6.], [7., 8.]]]), it doesn't work.

The error is as follows:

Traceback (most recent call last):
  File "t.py", line 40, in <module>
    factors = parafac(X, rank=2, mask=kk)
  File "/home/a/Documents/tensorly/tensorly/decomposition/candecomp_parafac.py", line 185, in parafac
    tensor = tensor*mask + tl.kruskal_to_tensor((None, factors), mask=1-mask)
  File "/home/a/Documents/tensorly/tensorly/kruskal_tensor.py", line 188, in kruskal_to_tensor
    full_tensor = T.sum(khatri_rao([factors[0]*weights]+factors[1:], mask=mask), axis=1)
  File "/home/a/Documents/tensorly/tensorly/tenalg/_khatri_rao.py", line 98, in khatri_rao
    return T.kr(matrices, weights=weights, mask=mask)
  File "/home/a/Documents/tensorly/tensorly/backend/__init__.py", line 160, in inner
    return _get_backend_method(name)(*args, **kwargs)
  File "/home/a/Documents/tensorly/tensorly/backend/numpy_backend.py", line 69, in kr
    return np.einsum(operation, *matrices).reshape((-1, n_columns))*mask
ValueError: operands could not be broadcast together with shapes (16,2) (2,2,2,2) 

The immediate problem seems to be in
np.einsum(operation, *matrices).reshape((-1, n_columns))*mask.
I suspect the matrix multiplication needs np.dot(), so I changed this line to
np.dot(np.einsum(operation, *matrices).reshape((-1, n_columns)), mask)
and the problem becomes:

Traceback (most recent call last):
  File "t.py", line 40, in <module>
    factors = parafac(X, rank=2, mask=kk)
  File "/home/a/Documents/tensorly/tensorly/decomposition/candecomp_parafac.py", line 185, in parafac
    tensor = tensor*mask + tl.kruskal_to_tensor((None, factors), mask=1-mask)
  File "/home/a/Documents/tensorly/tensorly/kruskal_tensor.py", line 190, in kruskal_to_tensor
    return fold(full_tensor, 0, shape)
  File "/home/a/Documents/tensorly/tensorly/base.py", line 77, in fold
    return T.moveaxis(T.reshape(unfolded_tensor, full_shape), 0, mode)
  File "/home/a/Documents/tensorly/tensorly/backend/__init__.py", line 160, in inner
    return _get_backend_method(name)(*args, **kwargs)
  File "<__array_function__ internals>", line 6, in reshape
  File "/home/a/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 301, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "/home/a/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 64 into shape (2,2,2,2)

I don't really understand what is going on here. What should I do to achieve this? Any help would be much appreciated, ideally with a code example. Thanks in advance for any suggestions.
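For what it's worth, the broadcast failure is just a shape mismatch: the Khatri-Rao product has been flattened to (16, 2) while the mask still has its tensor shape (2, 2, 2, 2). One plausible fix (a numpy sketch, untested against that TensorLy version) is to flatten the mask to a column before the elementwise product, rather than switching to np.dot:

```python
import numpy as np

# Shapes taken from the traceback above.
kr = np.arange(32, dtype=float).reshape(16, 2)  # stand-in for the Khatri-Rao product
mask = np.ones((2, 2, 2, 2))
mask[0, 0, 0, 0] = 0                            # one missing entry

# Broadcasting (16, 2) against (2, 2, 2, 2) fails. Flattening the mask in
# C order matches the vectorization used by the later fold, so each row of
# the Khatri-Rao product is masked by its corresponding tensor entry.
masked = kr * mask.reshape(-1, 1)
assert masked.shape == (16, 2)
assert (masked[0] == 0).all()  # the masked entry zeroes its whole row
```

np.dot changes the semantics (it contracts the column dimension instead of masking entries), which is why it produced the second reshape error.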

@asmeurer
Member

@milanlanlan I would suggest opening a separate issue report for this. It looks like mask needs to be reshaped, or else multiplied before the reshape (I'm not sure which).

@JeanKossaifi
Member

Fixed by #173
