
Add tensorly.contrib.sparse module #77

Merged: 14 commits from jcrist:sparse-take-2 into tensorly:master on Feb 5, 2019

Conversation

@jcrist (Contributor) commented Oct 2, 2018

This adds a tensorly.contrib.sparse module, mirroring the tensorly namespace, but with sparse functionality instead of dense. This builds on #76, only the last commit is for this PR.

I'm not 100% happy with the mechanism here, but it's easy with #76 (which I think is a good change regardless of whether it's used for this). The gist is that the sparse functions are simple wrappers around the normal tensorly functions, using the relevant sparse backend instead of the dense backend. This means that you need to use tensorly.contrib.sparse.unfold instead of tensorly.unfold, but the sparse versions are just wrapped versions of the dense ones.
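To illustrate the idea, here is a rough sketch of the wrapping pattern (this is not the code in this PR; `wrap` and `using_sparse_backend` are hypothetical names standing in for whatever backend-switching mechanism #76 provides):

    import functools
    import tensorly

    def wrap(dense_func):
        """Return a version of `dense_func` that runs against the sparse backend."""
        @functools.wraps(dense_func)
        def wrapper(*args, **kwargs):
            # Hypothetical context manager that temporarily activates the
            # sparse variant of the current backend.
            with using_sparse_backend():
                return dense_func(*args, **kwargs)
        return wrapper

    # tensorly.contrib.sparse.unfold would then be roughly:
    unfold = wrap(tensorly.unfold)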

So far only the numpy backend implements sparse functionality, using the pydata/sparse library (note that you need master to try this). All backend methods are supported, as well as all top-level tensorly methods (i.e. everything wrapped in the tensorly.contrib.sparse namespace).
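Usage looks the same as the dense API. A minimal sketch (assuming pydata/sparse master and this branch are installed):

    import numpy as np
    import sparse
    from tensorly.contrib.sparse import unfold

    # A 3 x 3 x 3 tensor with three nonzeros on the diagonal.
    coords = np.array([[0, 1, 2],
                       [0, 1, 2],
                       [0, 1, 2]])
    values = np.array([1.0, 2.0, 3.0])
    tensor = sparse.COO(coords, values, shape=(3, 3, 3))

    # Same call signature as tensorly.unfold; the result should stay sparse.
    matrix = unfold(tensor, 0)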

I also included tucker decomposition in the tensorly.contrib.sparse.decomposition namespace. For now this is a wrapper around the dense version (as above), but could be replaced with a sparse-specific implementation later.

Supersedes #64.


Diff of only the changes in this PR: aa076e5

- Add docstrings for all public methods
- A few style cleanups
- Explicitly import things into top-level namespace
- Remove a few backend methods that did not need to be backend-specific.

- Create `tensorly.testing`
- Move all test imports to `tensorly.testing`.
- Use absolute imports for test imports. For tests this makes more sense
than relative imports, and is standard practice in the numerical Python
ecosystem.
- Use classes to hopefully make the backend implementations clearer for
others.
- Add the ability to set the backend for all threads. The default is still
thread/context local, but we may want to change that later (a rough sketch
of this pattern follows).
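A rough sketch of the thread-local-with-global-fallback pattern (the names here are illustrative, not tensorly's actual internals):

    import threading

    _GLOBAL_BACKEND = 'numpy'      # default shared by all threads
    _local = threading.local()     # per-thread override

    def set_backend(name, threadsafe=True):
        """Set the backend for the current thread only, or for all threads."""
        global _GLOBAL_BACKEND
        if threadsafe:
            _local.backend = name      # visible only in the calling thread
        else:
            _GLOBAL_BACKEND = name     # becomes the new default everywhere

    def get_backend():
        # Fall back to the global default when no thread-local value is set.
        return getattr(_local, 'backend', _GLOBAL_BACKEND)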
@coveralls commented Oct 2, 2018

Coverage Status

Coverage decreased (-4.05%) to 92.418% when pulling aa076e5 on jcrist:sparse-take-2 into 5ef625f on tensorly:master.

@jcrist (Contributor, Author) commented Oct 2, 2018

Example

In [1]: %load frostt.py
   ...: import requests
   ...: import os
   ...: import gzip
   ...: import numpy as np
   ...: import sparse
   ...:
   ...:
   ...: DATA_DIR = './_tensor-data/'
   ...:
   ...:
   ...: def download_file(url, local_path=DATA_DIR):
   ...:     local_filename = url.split('/')[-1]
   ...:     path = local_path + local_filename
   ...:     r = requests.get(url, stream=True)
   ...:     total_size = int(r.headers.get('content-length', 0))
   ...:     with open(path, 'wb') as f:
   ...:         chunk_size = 32 * 1024
   ...:         for chunk in r.iter_content(chunk_size):
   ...:             if chunk:
   ...:                 f.write(chunk)
   ...:     return path
   ...:
   ...:
   ...: def frostt(descriptor, data_dir=DATA_DIR):
   ...:     if data_dir == DATA_DIR:
   ...:         try:
   ...:             os.makedirs(DATA_DIR)
   ...:         except FileExistsError:
   ...:             pass
   ...:     files = os.listdir(data_dir)
   ...:     if descriptor + '.tns.gz' in files:
   ...:         return read_dataset(data_dir + descriptor + '.tns.gz',
   ...:                             format=lambda coords, values: (coords - 1, values))
   ...:     prefix = 'https://s3.us-east-2.amazonaws.com/frostt/frostt_data/'
   ...:     url = prefix + descriptor + '/' + descriptor + '.tns.gz'
   ...:     download_file(url, local_path=data_dir)
   ...:     return frostt(descriptor, data_dir=data_dir)
   ...:
   ...:
   ...: def read_dataset(filename, format=None):
   ...:     with gzip.open(filename, 'rb') as f:
   ...:         raw = f.readlines()
   ...:     first_row = [float(x) for x in raw[0].split(b' ')]
   ...:     num_coords = len(first_row) - 1
   ...:     medium_rare = list(map(lambda line: line.strip(b'\n').split(b' '), raw))
   ...:     coords = (int(x) for line in medium_rare for x in line[:-1])
   ...:     values = (float(line[-1]) for line in medium_rare)
   ...:     coords = np.fromiter(coords, dtype=int).reshape(-1, num_coords)
   ...:     values = np.fromiter(values, dtype=float)
   ...:     if format:
   ...:         coords, values = format(coords, values)
   ...:     return sparse.COO(coords.T, data=values)

In [2]: data = frostt('nips')

In [3]: data.nbytes / 1e9  # Sparse memory used in GB
Out[3]: 0.12406436

In [4]: data.size * 8 / 1e9  # Memory used if a dense array
Out[4]: 13559.812193664

In [5]: from tensorly.contrib.sparse.decomposition import partial_tucker

In [6]: %%time
   ...: core, factors = partial_tucker(data, [1, 2],
   ...:                                rank=[5, 5, 100, 17],
   ...:                                verbose=True, init='random',
   ...:                                tol=1e-3)
   ...:
reconstruction error=0.9934568578063805, variation=0.00014067785852067693.
converged in 2 iterations.
CPU times: user 3min 4s, sys: 53.6 s, total: 3min 58s
Wall time: 3min 25s

In [7]: core
Out[7]: <COO: shape=(2482, 5, 5, 17), dtype=float64, nnz=62050, fill_value=0.0>

In [8]: factors
Out[8]:
[array([[-5.91222626e-18,  2.89827035e-18, -2.75691272e-19,
         -8.78649988e-18,  3.94111419e-18],
        [-5.02446340e-18, -1.55209758e-09,  2.90628869e-08,
          1.34380988e-06,  1.68727465e-05],
        [ 7.39479895e-18, -1.79811939e-18,  9.82153292e-19,
          1.40760063e-17, -2.35939229e-18],
        ...,
        [-4.79033419e-18, -4.76625667e-08,  6.99296165e-09,
          5.31161114e-06, -2.72788286e-06],
        [-4.90639724e-18, -9.82272650e-18,  2.91640399e-19,
         -2.72238732e-18, -1.30575311e-17],
        [-7.96563209e-18, -6.38409970e-16,  2.77030659e-16,
          2.90861929e-13, -1.38441587e-13]]),
 array([[-2.17508562e-04, -2.20062345e-04,  1.71913582e-04,
         -2.69793852e-04,  4.48704964e-04],
        [-5.76951439e-04, -7.41918691e-04,  5.09510767e-04,
         -6.76461602e-04, -8.46511610e-04],
        [-5.74204776e-05, -6.89232270e-05,  5.43752349e-05,
         -1.39790241e-05, -3.00200107e-05],
        ...,
        [-1.17397904e-09, -1.19689398e-09,  9.17598542e-10,
         -1.32693291e-09, -5.93684923e-10],
        [-1.59895405e-04, -1.37067503e-04,  1.10396298e-05,
         -3.79328786e-05,  1.03508722e-04],
        [-9.26775928e-07, -1.01001915e-06,  5.39128131e-07,
         -9.26510976e-07,  1.09824611e-06]])]

@jcrist (Contributor, Author) commented Oct 12, 2018

I gave a demo of this functionality today; the notebook used can be found here if you're interested: https://gist.github.com/jcrist/f7f0682ed01f12e96f9a40d8862b2477.

@JeanKossaifi (Member) commented

Thanks for sharing - awesome notebook!
Looking forward to having this merged in TensorLy :)

@asmeurer mentioned this pull request Nov 19, 2018
@JeanKossaifi merged commit ff75e67 into tensorly:master Feb 5, 2019
@JeanKossaifi mentioned this pull request Apr 8, 2019