|
1 | 1 | # Declaring Tensors |
2 | 2 |
|
3 | | -`pytaco.Tensor` objects correspond to mathematical tensors. You can can declare a new tensor by specifying its name, a vector with the size of each dimension and the [storage format](pytensors.md#defining-tensor-formats) that will be used to store the tensor and a [datatype](pytensors.md#tensor-datatypes): |
| 3 | +`pytaco.tensor` objects, which represent mathematical tensors, form the core of |
| 4 | +the TACO Python library. You can declare a new tensor by specifying the |
| 5 | +size of each dimension, the [format](pytensors.md#defining-tensor-formats) |
| 6 | +that will be used to store the tensor, and the |
| 7 | +[datatype](pytensors.md#tensor-datatypes) of the tensor's nonzero elements: |
4 | 8 |
|
5 | 9 | ```python |
6 | | -# Import the pytaco library |
| 10 | +# Import the TACO Python library |
7 | 11 | import pytaco as pt |
8 | | -# Import the storage formats to save some typing |
9 | 12 | from pytaco import dense, compressed |
10 | 13 |
|
11 | | -# Declare a new tensor "A" of double-precision floats with dimensions |
| 14 | +# Declare a new tensor of double-precision floats with dimensions |
12 | 15 | # 512 x 64 x 2048, stored as a dense-sparse-sparse tensor |
13 | | -A = pt.tensor("A", [512, 64, 2048], pt.format([dense, compressed, compressed]), pt.float64) |
14 | | -``` |
15 | | - |
16 | | -The name of the tensor can be omitted, in which case taco will assign an arbitrary name to the tensor: |
17 | | -```python |
18 | | -import pytaco as pt |
19 | | -from pytaco import dense, compressed |
20 | | - |
21 | | -# Declare a tensor with the same dimensions, storage format and type as before |
22 | 16 | A = pt.tensor([512, 64, 2048], pt.format([dense, compressed, compressed]), pt.float64) |
23 | 17 | ``` |
24 | 18 |
|
25 | | -The [datatype](pytensors.md#tensor-datatypes) can also be omitted in which case taco will default to using `pt.float32`: |
26 | | -```python |
27 | | -import pytaco as pt |
28 | | -from pytaco import dense, compressed |
| 19 | +The datatype can be omitted, in which case TACO will default to using |
| 20 | +`pt.float32` to store the tensor's nonzero elements: |
29 | 21 |
|
30 | | -# Declare a tensor with the same dimensions and storage format as before |
| 22 | +```python |
| 23 | +# Declare the same tensor as before |
31 | 24 | A = pt.tensor([512, 64, 2048], pt.format([dense, compressed, compressed])) |
32 | 25 | ``` |
33 | 26 |
|
34 | | -A single format can be given to create a tensor where all dimensions have that format: |
35 | | -```python |
36 | | -import pytaco as pt |
37 | | -from pytaco import dense, compressed |
| 27 | +Instead of specifying a format that is tied to the number of dimensions that a |
| 28 | +tensor has, we can simply specify whether all dimensions are dense or sparse: |
38 | 29 |
|
39 | | -# Declare a dense tensor |
| 30 | +```python |
| 31 | +# Declare a tensor where all dimensions are dense |
40 | 32 | A = pt.tensor([512, 64, 2048], dense) |
41 | 33 |
|
42 | | -# Declare a compressed tensor |
| 34 | +# Declare a tensor where all dimensions are sparse |
43 | 35 | B = pt.tensor([512, 64, 2048], compressed) |
44 | 36 | ``` |
45 | 37 |
|
46 | | -Scalars, which are treated as order-0 tensors, can be declared and initialized with some arbitrary value as demonstrated below: |
47 | | -```python |
48 | | -import pytaco as pt |
49 | | -from pytaco import dense, compressed |
| 38 | +Scalars, which correspond to tensors that have zero dimensions, can be declared |
| 39 | +and initialized with an arbitrary value as demonstrated below: |
50 | 40 |
|
| 41 | +```python |
51 | 42 | # Declare a scalar |
52 | 43 | alpha = pt.tensor(42.0) |
53 | 44 | ``` |
54 | 45 |
|
55 | 46 | # Defining Tensor Formats |
56 | 47 |
|
57 | | -Conceptually, you can think of a tensor as a tree with each level (excluding the root) corresponding to a dimension of the tensor. Each path from the root to a leaf node represents a tensor coordinate and its corresponding value. Which dimension each level of the tree corresponds to is determined by the order in which dimensions of the tensor are stored. |
| 48 | +Conceptually, you can think of a tensor as a tree where each level (excluding |
| 49 | +the root) corresponds to a dimension of the tensor. Each path from the root |
| 50 | +to a leaf node represents the coordinates of a tensor element and its |
| 51 | +corresponding value. Which dimension of the tensor each level of the tree |
| 52 | +corresponds to is determined by the order in which tensor dimensions are |
| 53 | +stored. |
| 54 | + |
| 55 | +TACO uses a novel scheme that can describe different storage formats for a |
| 56 | +tensor by specifying the order in which tensor dimensions are stored and |
| 57 | +whether each dimension is sparse or dense. A sparse (compressed) dimension |
| 58 | +stores only the subset of the dimension that contains nonzero values, using |
| 59 | +index arrays like those found in the compressed sparse row (CSR) matrix format. |
| 60 | +A dense dimension, on the other hand, conceptually stores both zeros and |
| 61 | +nonzeros. This scheme is flexible enough to express many commonly used |
| 62 | +tensor storage formats: |
58 | 63 |
|
59 | | -taco uses a novel scheme that can describe different storage formats for any tensor by specifying the order in which tensor dimensions are stored and whether each dimension is sparse or dense. A sparse dimension stores only the subset of the dimension that contains non-zero values and is conceptually similar to the index arrays used in the compressed sparse row (CSR) matrix format, while a dense dimension stores both zeros and non-zeros. As demonstrated below, this scheme is flexibile enough to express many commonly-used matrix storage formats. |
60 | | - |
61 | | -You can define a new tensor storage format by creating a `pytaco.format` object. The constructor for `pytaco.format` takes as arguments a list specifying the type of each dimension and (optionally) a list specifying the order in which dimensions are to be stored, as seen below: |
62 | 64 | ```python |
63 | 65 | import pytaco as pt |
64 | | -from pytaco import dense, compressed, format |
65 | | -dm = format([dense, dense]) # (Row-major) dense matrix |
66 | | -csr = format([dense, compressed]) # Compressed sparse row matrix |
67 | | -csc = format([dense, compressed], [1, 0]) # Compressed sparse column matrix |
68 | | -dcsr = format([compressed, compressed], [1, 0]) # Doubly compressed sparse column matrix |
69 | | -``` |
70 | | - |
71 | | -```pytaco``` provides common formats (csr, csc and csf) by default and can be used by simply typing ```pt.csr```, ```pt.csc``` or ```pt.csf```. |
72 | | - |
73 | | -# Tensor Datatypes |
74 | | - |
75 | | -Tensors can be of 10 different datatypes. The following are the possible tensor datatypes: |
76 | | - |
77 | | -Signed Integers: |
78 | | - |
79 | | -```pytaco.int8``` |
80 | | - |
81 | | -```pytaco.int16``` |
82 | | - |
83 | | -```pytaco.int32``` |
84 | | - |
85 | | -```pytaco.int64``` |
86 | | - |
87 | | -Unsigned Integers: |
88 | | - |
89 | | -```pytaco.uint8``` |
90 | | - |
91 | | -```pytaco.uint16``` |
92 | | - |
93 | | -```pytaco.uint32``` |
94 | | - |
95 | | -```pytaco.uint64``` |
| 66 | +from pytaco import dense, compressed |
96 | 67 |
|
97 | | -Floating point precision: |
| 68 | +dm = pt.format([dense, dense]) # (Row-major) dense matrix format |
| 69 | +csr = pt.format([dense, compressed]) # Compressed sparse row matrix format |
| 70 | +csc = pt.format([dense, compressed], [1, 0]) # Compressed sparse column matrix format |
| 71 | +dcsc = pt.format([compressed, compressed], [1, 0]) # Doubly compressed sparse column matrix format |
| 72 | +csf = pt.format([compressed, compressed, compressed]) # Compressed sparse fiber tensor format |
| 73 | +``` |
98 | 74 |
|
99 | | -```pytaco.float32``` |
| 75 | +As demonstrated above, you can define a new tensor storage format by creating a |
| 76 | +`pytaco.format` object. This requires specifying whether each tensor dimension |
| 77 | +is dense or sparse as well as (optionally) the order in which dimensions should |
| 78 | +be stored. TACO also predefines some common tensor formats (including |
| 79 | +```pt.csr```, ```pt.csc``` and ```pt.csf```) that you can use out of the box. |
100 | 80 |
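To make the dense/compressed distinction concrete, here is a minimal pure-Python sketch of what a `[dense, compressed]` (CSR-style) layout stores for a small matrix. It does not use pytaco, and the helper name `to_csr` and the array names `pos`, `crd`, and `vals` are illustrative assumptions rather than TACO's actual internals:

```python
def to_csr(dense_rows):
    """Store a 2-D list with a dense row level over a compressed column level."""
    pos = [0]   # pos[i]..pos[i+1] delimits the stored entries of row i
    crd = []    # column coordinate of each stored nonzero
    vals = []   # the nonzero values themselves
    for row in dense_rows:
        for j, v in enumerate(row):
            if v != 0:
                crd.append(j)
                vals.append(v)
        # The dense row level records an entry for every row, even empty ones
        pos.append(len(crd))
    return pos, crd, vals


matrix = [
    [5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 7.0, 9.0],
]
pos, crd, vals = to_csr(matrix)
print(pos, crd, vals)
```

The row level is dense because `pos` has an entry for every row, including the empty one, while the column level is compressed because only the coordinates of nonzeros appear in `crd`.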
|
101 | | -```pytaco.float``` |
| 81 | +# Initializing Tensors |
102 | 82 |
|
103 | | -Double precision: |
| 83 | +You can initialize a tensor by calling its `insert` method to add a |
| 84 | +nonzero element to the tensor. The `insert` method takes two |
| 85 | +arguments: a list specifying the coordinates of the nonzero element |
| 86 | +to be added and the value to be inserted at those |
| 87 | +coordinates: |
104 | 88 |
|
105 | | -```pytaco.float64``` |
| 89 | +```python |
| 90 | +# Declare a sparse tensor |
| 91 | +A = pt.tensor([512, 64, 2048], compressed) |
106 | 92 |
|
107 | | -```pytaco.double``` |
| 93 | +# Set A(0, 1, 0) = 42.0 |
| 94 | +A.insert([0, 1, 0], 42.0) |
| 95 | +``` |
108 | 96 |
|
109 | | -# Initializing Tensors |
| 97 | +If multiple elements are inserted at the same coordinates, they are summed |
| 98 | +together: |
110 | 99 |
|
111 | | -Tensors can be made by using python indexing syntax. For example, one may write the following: |
112 | 100 | ```python |
113 | | -import pytaco as pt |
114 | | -from pytaco import dense, compressed |
115 | | - |
116 | | -# Declare a dense tensor |
| 101 | +# Declare a sparse tensor |
117 | 102 | A = pt.tensor([512, 64, 2048], compressed) |
118 | 103 |
|
119 | | -# Set location (0, 1, 0) in A to 42.0 |
120 | | -A[0, 1, 0] = 42.0 |
| 104 | +# Set A(0, 1, 0) = 42.0 + 24.0 = 66.0 |
| 105 | +A.insert([0, 1, 0], 42.0) |
| 106 | +A.insert([0, 1, 0], 24.0) |
121 | 107 | ``` |
122 | 108 |
|
123 | | -The insert operator adds the inserted non-zeros to a temporary buffer. Before a tensor can actually be used in a computation, it is automatcally packed. |
124 | | - |
125 | | -For most cases, this is not necessary but you may also invoke the `pack` method to compress the tensor into the storage format that was specified after all values have been inserted. |
126 | | - |
127 | | -NOTE: Multidimensional indexing (as used with lists) are NOT supported. For example, the following is invalid code: |
| 109 | +The `insert` method adds the inserted nonzero element to a temporary buffer. |
| 110 | +Before a tensor can be used in a computation, it must be packed into the |
| 111 | +storage format that was specified when the tensor was first declared. TACO |
| 112 | +does this automatically immediately before the tensor is used in a |
| 113 | +computation, but you can also invoke the `pack` method manually if you need |
| 114 | +full control over when packing happens: |
128 | 115 |
|
129 | 116 | ```python |
130 | | -import pytaco as pt |
131 | | -from pytaco import dense, compressed |
| 117 | +A.pack() |
| 118 | +``` |
132 | 119 |
|
133 | | -# Declare a dense tensor |
134 | | -A = pt.tensor([512, 64, 2048], compressed) |
| 120 | +You can then iterate over the nonzero elements of the tensor as follows: |
135 | 121 |
|
136 | | -# INVALID STATEMENT |
137 | | -A[0][1][0] = 42.0 |
| 122 | +```python |
| 123 | +for elem in A: |
| 124 | +    print(elem) |
138 | 125 | ``` |
139 | 126 |
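The insert-then-pack behavior described in this section can be modeled with a small pure-Python sketch. The `BufferedTensor` class below is a hypothetical stand-in written for illustration only; it is not part of pytaco and makes no claim about how TACO implements its buffer. It only mirrors the documented semantics: inserts go to a temporary buffer, and values inserted at the same coordinates are summed when the tensor is packed:

```python
from collections import defaultdict


class BufferedTensor:
    """Hypothetical model of insert/pack semantics (not the pytaco API)."""

    def __init__(self):
        self.buffer = []   # (coordinates, value) pairs awaiting packing
        self.packed = {}   # coordinates -> value, filled in by pack()

    def insert(self, coords, value):
        # Inserts only append to the buffer; nothing is merged yet
        self.buffer.append((tuple(coords), value))

    def pack(self):
        # Merge buffered entries into packed storage, summing duplicates
        sums = defaultdict(float, self.packed)
        for coords, value in self.buffer:
            sums[coords] += value
        self.packed = dict(sorted(sums.items()))
        self.buffer.clear()

    def __iter__(self):
        # Iterate over (coordinates, value) pairs of the packed nonzeros
        return iter(self.packed.items())


A = BufferedTensor()
A.insert([0, 1, 0], 42.0)
A.insert([0, 1, 0], 24.0)
A.insert([1, 1, 1], 77.0)
A.pack()
elements = list(A)
print(elements)
```

In this model, iterating before `pack` is called yields nothing, which is one way to see why packing has to happen before a tensor is used in a computation.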
|
140 | | -# Loading Tensors from File |
| 127 | +# File I/O |
141 | 128 |
|
142 | | -Rather than manually invoking building a tensor, you can load tensors directly from file by calling `pytaco.read` as demonstrated below: |
| 129 | +Rather than manually constructing a tensor, you can load tensors directly from |
| 130 | +a file by invoking the `pytaco.read` function: |
143 | 131 |
|
144 | 132 | ```python |
145 | | -import pytaco as pt |
146 | | -from pytaco import dense, compressed, format |
147 | | - |
148 | | -# Load a dense-sparse-sparse tensor from file A.tns |
149 | | -A = pt.read("A.tns", format([dense, compressed, compressed])) |
| 133 | +# Load a dense-sparse-sparse tensor from file "A.tns" |
| 134 | +A = pt.read("A.tns", pt.format([dense, compressed, compressed])) |
150 | 135 | ``` |
151 | 136 |
|
152 | | -By default, `pytaco.read` returns a packed tensor. You can optionally pass a Boolean flag as an argument to indicate whether the returned tensor should be packed or not: |
| 137 | +By default, `pytaco.read` returns a tensor that has already been packed into |
| 138 | +the specified storage format. You can optionally pass a Boolean flag as an |
| 139 | +argument to indicate whether the returned tensor should be packed or not: |
153 | 140 |
|
154 | 141 | ```python |
155 | | -import pytaco as pt |
156 | | -from pytaco import dense, compressed, format |
157 | | - |
158 | | -# Load an unpacked tensor from file A.tns |
| 142 | +# Load an unpacked tensor from file "A.tns" |
159 | 143 | A = pt.read("A.tns", pt.format([dense, compressed, compressed]), False) |
160 | 144 | ``` |
161 | | -NOTE: the tensor will be packed anyway before any computation is actually performed. |
162 | | - |
163 | 145 |
|
164 | | -Currently, taco supports loading from the following matrix and tensor file formats: |
| 146 | +The loaded tensor will then remain unpacked until the `pack` method is manually |
| 147 | +invoked or a computation that uses the tensor is performed. |
165 | 148 |
|
166 | | -* [Matrix Market (Coordinate) Format (.mtx)](http://math.nist.gov/MatrixMarket/formats.html#MMformat) |
167 | | -* [Rutherford-Boeing Format (.rb)](https://www.cise.ufl.edu/research/sparse/matrices/DOC/rb.pdf) |
168 | | -* [FROSTT Format (.tns)](http://frostt.io/tensors/file-formats.html) |
169 | | - |
170 | | -# Writing Tensors to Files |
171 | | - |
172 | | -You can also write a (packed) tensor directly to file by calling `pytaco.write`, as demonstrated below: |
| 149 | +You can also write a tensor directly to a file by invoking the `pytaco.write` |
| 150 | +function: |
173 | 151 |
|
174 | 152 | ```python |
175 | | -import pytaco as pt |
176 | | - |
177 | | -A = pt.tensor([512, 64, 2048], compressed) |
178 | | -A[0, 1, 0] = 42.0 |
179 | | -A[1, 1, 1] = 77 |
180 | | -pt.write("A.tns", A); # Write tensor A to file A.tns |
| 153 | +# Write tensor A to file "A.tns" |
| 154 | +pt.write("A.tns", A) |
181 | 155 | ``` |
182 | 156 |
|
183 | | -`pytaco.write` supports the same set of matrix and tensor file formats as `pytaco.read`. |
| 157 | +TACO supports loading tensors from and storing tensors to the following file |
| 158 | +formats: |
| 159 | + |
| 160 | +* [Matrix Market (Coordinate) Format (.mtx)](http://math.nist.gov/MatrixMarket/formats.html#MMformat) |
| 161 | +* [Rutherford-Boeing Format (.rb)](https://www.cise.ufl.edu/research/sparse/matrices/DOC/rb.pdf) |
| 162 | +* [FROSTT Format (.tns)](http://frostt.io/tensors/file-formats.html) |
184 | 163 |
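As a rough illustration of what a `.tns` file contains, the sketch below parses FROSTT-style text by hand. It assumes the usual FROSTT layout: one nonzero per line, whitespace-separated 1-based coordinates followed by the value, and `#` for comment lines. The function name `read_tns` is hypothetical and unrelated to how `pytaco.read` is implemented:

```python
import io


def read_tns(stream):
    """Parse FROSTT-style lines into (0-based coordinates, value) pairs."""
    entries = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        *coords, value = line.split()
        # FROSTT coordinates are 1-based; shift them to 0-based indices
        entries.append((tuple(int(c) - 1 for c in coords), float(value)))
    return entries


sample = io.StringIO(
    "# a 3-dimensional tensor with two nonzeros\n"
    "1 2 1 42.0\n"
    "2 2 2 77.0\n"
)
entries = read_tns(sample)
print(entries)
```

For real files, `pytaco.read` remains the intended entry point; the sketch is only meant to show the shape of the data on disk.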
|
185 | | -# I/O with Numpy or Scipy |
| 164 | +# NumPy and SciPy I/O |
186 | 165 |
|
187 | | -Tensors can be initialized with either numpy arrays or scipy sparse CSC or CSR matrices. As such, we can use the I/O from numpy and scipy and feed the data into pytaco by initializing a tensor. |
| 166 | +Tensors can also be initialized with either NumPy arrays or SciPy sparse (CSR |
| 167 | +or CSC) matrices: |
188 | 168 |
|
189 | 169 | ```python |
190 | 170 | import pytaco as pt |
191 | 171 | import numpy as np |
192 | 172 | import scipy.sparse |
193 | 173 |
|
194 | | -# Assuming matrix is CSR |
| 174 | +# Assuming SciPy matrix is stored in CSR |
195 | 175 | sparse_matrix = scipy.sparse.load_npz('sparse_matrix.npz') |
196 | 176 |
|
197 | | -# Pass data into taco for use |
198 | | -taco_tensor = pt.from_scipy_csr(sparse_matrix) |
| 177 | +# Cast the matrix as a TACO tensor (also stored in CSR) |
| 178 | +taco_tensor = pt.from_sp_csr(sparse_matrix) |
199 | 179 |
|
200 | | -# We can also load a numpy array |
| 180 | +# We can also load a NumPy array |
201 | 181 | np_array = np.load('arr.npy') |
202 | 182 |
|
203 | | -# And initialize a tensor from this array |
204 | | -dense_tensor = pt.from_numpy_array(np_array) |
| 183 | +# And initialize a TACO tensor from this array |
| 184 | +dense_tensor = pt.from_array(np_array) |
205 | 185 | ``` |
206 | 186 |
|
| 187 | +We can also export TACO tensors to either NumPy arrays or SciPy sparse |
| 188 | +matrices: |
207 | 189 |
|
| 190 | +```python |
| 191 | +# Convert the tensor to a SciPy CSR matrix |
| 192 | +sparse_matrix = taco_tensor.to_sp_csr() |
208 | 193 |
|
209 | | - |
| 194 | +# Convert the tensor to a NumPy array |
| 195 | +np_array = dense_tensor.to_array() |
| 196 | +``` |