Add tensorly.contrib.sparse module #77
Merged
Conversation
- Add docstrings for all public methods
- A few style cleanups
- Explicitly import things into the top-level namespace
- Remove a few unnecessary backend methods; having these as backend-specific was unnecessary
- Create `tensorly.testing`
- Move all test imports to `tensorly.testing`
- Use absolute imports for test imports. For tests this makes more sense than relative imports, and is standard practice in the numerical python ecosystem
- Use classes, to hopefully make the backend implementations clearer for others
- Add the ability to set the backend for all threads. The default is still thread/context-local, but we may want to change that later
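The last bullet describes a thread/context-local default backend with an optional process-wide override. A minimal sketch of that mechanism, with hypothetical names (this is not tensorly's actual implementation):

```python
import threading

_GLOBAL_BACKEND = 'numpy'   # process-wide default, shared by all threads
_local = threading.local()  # optional per-thread override

def set_backend(name, threadsafe=True):
    """Set the backend for the calling thread only, or for all threads."""
    global _GLOBAL_BACKEND
    if threadsafe:
        _local.backend = name       # visible only in the calling thread
    else:
        _GLOBAL_BACKEND = name      # visible in every thread with no override

def get_backend():
    # Fall back to the global default when this thread never set one.
    return getattr(_local, 'backend', _GLOBAL_BACKEND)
```

Worker threads that never call `set_backend` see the global default, which is why flipping the global value is useful for code that spawns threads internally.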
Example:

```python
In [1]: %load frostt.py
   ...: import requests
   ...: import os
   ...: import gzip
   ...: import numpy as np
   ...: import sparse
   ...:
   ...:
   ...: DATA_DIR = './_tensor-data/'
   ...:
   ...:
   ...: def download_file(url, local_path=DATA_DIR):
   ...:     local_filename = url.split('/')[-1]
   ...:     path = local_path + local_filename
   ...:     r = requests.get(url, stream=True)
   ...:     total_size = int(r.headers.get('content-length', 0))
   ...:     with open(path, 'wb') as f:
   ...:         chunk_size = 32 * 1024
   ...:         for chunk in r.iter_content(chunk_size):
   ...:             if chunk:
   ...:                 f.write(chunk)
   ...:     return path
   ...:
   ...:
   ...: def frostt(descriptor, data_dir=DATA_DIR):
   ...:     if data_dir == DATA_DIR:
   ...:         try:
   ...:             os.makedirs(DATA_DIR)
   ...:         except FileExistsError:
   ...:             pass
   ...:     files = os.listdir(data_dir)
   ...:     if descriptor + '.tns.gz' in files:
   ...:         return read_dataset(data_dir + descriptor + '.tns.gz',
   ...:                             format=lambda coords, values: (coords - 1, values))
   ...:     prefix = 'https://s3.us-east-2.amazonaws.com/frostt/frostt_data/'
   ...:     url = prefix + descriptor + '/' + descriptor + '.tns.gz'
   ...:     download_file(url, local_path=data_dir)
   ...:     return frostt(descriptor, data_dir=data_dir)
   ...:
   ...:
   ...: def read_dataset(filename, format=None):
   ...:     with gzip.open(filename, 'rb') as f:
   ...:         raw = f.readlines()
   ...:     first_row = [float(x) for x in raw[0].split(b' ')]
   ...:     num_coords = len(first_row) - 1
   ...:     medium_rare = list(map(lambda line: line.strip(b'\n').split(b' '), raw))
   ...:     coords = (int(x) for line in medium_rare for x in line[:-1])
   ...:     values = (float(line[-1]) for line in medium_rare)
   ...:     coords = np.fromiter(coords, dtype=int).reshape(-1, num_coords)
   ...:     values = np.fromiter(values, dtype=float)
   ...:     if format:
   ...:         coords, values = format(coords, values)
   ...:     return sparse.COO(coords.T, data=values)

In [2]: data = frostt('nips')

In [3]: data.nbytes / 1e9  # Sparse memory used in GB
Out[3]: 0.12406436

In [4]: data.size * 8 / 1e9  # Memory used if a dense array
Out[4]: 13559.812193664

In [5]: from tensorly.contrib.sparse.decomposition import partial_tucker

In [6]: %%time
   ...: core, factors = partial_tucker(data, [1, 2],
   ...:                                rank=[5, 5, 100, 17],
   ...:                                verbose=True, init='random',
   ...:                                tol=1e-3)
   ...:
reconsturction error=0.9934568578063805, variation=0.00014067785852067693.
converged in 2 iterations.
CPU times: user 3min 4s, sys: 53.6 s, total: 3min 58s
Wall time: 3min 25s

In [7]: core
Out[7]: <COO: shape=(2482, 5, 5, 17), dtype=float64, nnz=62050, fill_value=0.0>

In [8]: factors
Out[8]:
[array([[-5.91222626e-18,  2.89827035e-18, -2.75691272e-19,
         -8.78649988e-18,  3.94111419e-18],
        [-5.02446340e-18, -1.55209758e-09,  2.90628869e-08,
          1.34380988e-06,  1.68727465e-05],
        [ 7.39479895e-18, -1.79811939e-18,  9.82153292e-19,
          1.40760063e-17, -2.35939229e-18],
        ...,
        [-4.79033419e-18, -4.76625667e-08,  6.99296165e-09,
          5.31161114e-06, -2.72788286e-06],
        [-4.90639724e-18, -9.82272650e-18,  2.91640399e-19,
         -2.72238732e-18, -1.30575311e-17],
        [-7.96563209e-18, -6.38409970e-16,  2.77030659e-16,
          2.90861929e-13, -1.38441587e-13]]),
 array([[-2.17508562e-04, -2.20062345e-04,  1.71913582e-04,
         -2.69793852e-04,  4.48704964e-04],
        [-5.76951439e-04, -7.41918691e-04,  5.09510767e-04,
         -6.76461602e-04, -8.46511610e-04],
        [-5.74204776e-05, -6.89232270e-05,  5.43752349e-05,
         -1.39790241e-05, -3.00200107e-05],
        ...,
        [-1.17397904e-09, -1.19689398e-09,  9.17598542e-10,
         -1.32693291e-09, -5.93684923e-10],
        [-1.59895405e-04, -1.37067503e-04,  1.10396298e-05,
         -3.79328786e-05,  1.03508722e-04],
        [-9.26775928e-07, -1.01001915e-06,  5.39128131e-07,
         -9.26510976e-07,  1.09824611e-06]])]
```
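Each line of a FROSTT `.tns` file holds the 1-based coordinates of one nonzero followed by its value. A self-contained illustration of the parsing logic in `read_dataset`, run on a tiny in-memory sample rather than a gzipped file (and without the numpy/sparse conversion at the end):

```python
# Three nonzeros of a 2 x 2 x 2 tensor in FROSTT's .tns layout:
# coord coord coord value, coordinates 1-based.
raw = [b'1 1 1 1.0\n', b'1 2 2 2.5\n', b'2 1 2 4.0\n']

# Same strip/split step as read_dataset above:
rows = [line.strip(b'\n').split(b' ') for line in raw]
num_coords = len(rows[0]) - 1  # all columns but the last are coordinates

# Shift to 0-based indices, matching the `coords - 1` format callback:
coords = [[int(x) - 1 for x in row[:-1]] for row in rows]
values = [float(row[-1]) for row in rows]

print(num_coords)  # 3
print(coords)      # [[0, 0, 0], [0, 1, 1], [1, 0, 1]]
print(values)      # [1.0, 2.5, 4.0]
```

In `read_dataset` these coordinate and value lists are then fed to `np.fromiter` and `sparse.COO`; the sketch stops before that step so it stays dependency-free.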
I gave a demo of this functionality today; the notebook used can be found here if you're interested: https://gist.github.com/jcrist/f7f0682ed01f12e96f9a40d8862b2477.
Thanks for sharing, awesome notebook!
This adds a `tensorly.contrib.sparse` module, mirroring the `tensorly` namespace, but with sparse functionality instead of dense. This builds on #76; only the last commit is for this PR.

I'm not 100% happy with the mechanism here, but it's easy with #76 (which I think is a good change regardless of whether it's used for this). The gist is that the sparse functions are simple wrappers around the normal tensorly functions, using the relevant sparse backend instead of the dense backend. This means that you need to use `tensorly.contrib.sparse.unfold` instead of `tensorly.unfold`, but the sparse versions are just wrapped versions of the dense ones.

So far only the numpy backend implements sparse functionality, using the pydata/sparse library (note that you need master to try this). All backend methods are supported, as well as all top-level tensorly methods (everything is wrapped in the `tensorly.contrib.sparse` namespace).

I also included Tucker decomposition in the `tensorly.contrib.sparse.decomposition` namespace. For now this is a wrapper around the dense version (as above), but it could be replaced with a sparse-specific implementation later.

Supersedes #64.
Diff of only the changes in this PR: aa076e5