Skip to content
This repository
Fetching contributors…

Cannot retrieve contributors at this time

file 265 lines (233 sloc) 10.793 kb
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265
/// Pending Issues

ID: 2
General Area: Priority Queues
Priority: Medium
Description: TPIE is lacking a good implementation of a general purpose
external priority queue. There is an implementation from TerraFlow, but
this is not in the main TPIE source and there are some concerns about its
performance. It would be nice to examine the performance of the TFlow PQ,
possibly implement some other EM PQueues and put at least one decent
implementation into the main TPIE source
Status: Open as of 1 April 2005

ID: 6
General Area: Priority Queue
Priority: ?
Description: From Herman Haverkort "TPIE philosophy is, apparently, that
the user does not have to worry about how memory is used. This is fine as
long as you have one thing running at a time. But if, for example, you
want to run an algorithm that uses two priority queues in parallel, you
really want to be able to give them each a share of the main memory,
rather then seeing the first one to be initialized take all."
Perhaps this can be solved--in the case of the priority queue--by having
an option in the constructor that specifies the amount of memory to use.
It could default to all available memory. We would need to check other
data structures for this too. Perhaps the B-tree and related tree
structures. I don't think this is a problem for sorting as you can't do
anything else while you are sorting and sort cleans up its memory when it
is done.
Status: Open as of 21 April 2005

ID: 17
General Area: Priority Queue
Kevin writes "can you add the following to the wish list of pqueue: Currently
it uses both x < y and x.getPriorty() < y.getPriority(), which are assumed to
be the same. Ideally it should only use x < y. Then the user can change the
definition of x < y so that we can also have a max pqueue easily"

I agree with Kevin that is does not make sense to require both "<" and
"getPriority()", but I don't agree that we should overload "<" to mean ">"
In this case it makes sense to support a version of the PQ that supports a
comparison operator "<" and a version that supporst a comparison object or
functor (). This is what is done in STL and TPIE sort has a similar
mechanism, except the comparison object has a function called compare().
In the next version of TPIE, this should probably be changed to match
STL-style, but that is a separate wish-item.

Status: Open as of 10 Feb 2006

ID: 10
General Area: Priority Queue
Description: Regarding the TerraFlow Priority Queue, messages are
displayed with printf to standard out. This should probably be sent to a
log file and it should be possible to turn off messages all together.
See Also: wish-list ID 2,11,12
Status: Open as of 21 April 2005

ID: 11
General Area: Priority Queue
Description: Regarding the TerraFlow Priority Queue, it leaves a lot of
temporary files persistent. They should be deleted if not needed.
See Also: wish-list ID 2,10,12
Status: Open as of 21 April 2005

ID: 12
General Area: Priority Queue
Description: Regarding the TerraFlow Priority Queue, Herman Haverkort
suspects that it crashes on low memory limits.
See Also: wish-list ID 2,10,12
Status: Open as of 21 April 2005

ID: 3
General Area: Generalized Streaming
Priority: Medium
Description: A common sorting paradigm is decorate-sort-undecorate or DSU.
We decorate a stream with some keys to sort by, sort the stream, and then
later remove the keys from the stream. The output of TPIE sort is stream
of the same type as the input, i.e., a stream with keys. Removing the keys
requires an additional scan over the stream. However, one could imagine
adding a option in TPIE sort to apply some function object to each item in
the final merge phase when the keys are no longer needed. Because an
external sort is typically equivalent to about three scans, bundling the
undecorate phase and saving a fourth scan is a considerable savings.
Status: Open as of 1 April 2005

ID: 5
General Area: Streams with variable length items
Priority: Medium
Description: AMI_streams currently maintain a stream of fixed sized items,
but it would often be useful to sort/scan variable sized items. A sample
data layout could be that each item could have a fixed sized header
including at a minimum the size of the variable length item. While this
might be difficult to do in a general setting, in most applications I have
encountered, the variable length is bounded, so perhaps one could create a
stream that could support scanning and sorting of variable length items
where the length of the item is bounded.
Status: Open as of 1 April 2005

ID: 8
General Area: Sorting
Description: It should be possible that sort feeds its output to some
arbitrary object that has a write_item type of function. This object
should not be restricted to being a converter or an "undecorator": it may
do complicated stuff with the objects, keep an internal state, and maybe
it does not even output anything in a one-to-one relation with the
objects fed to it.
Status: Open as of 21 April 2005

ID: 16
General Area: Pipelining, BTE
Priority: ?
Description: Support some form of pipelining or FIFO buffers, so output of
one algorithm can be used directly by another. This is likely a large
change and may even require threading, so a frist step could be something
like overloading the first/last scan in merge-sort, a sort of pseudo
pipeling where we allow pre/post-processing of certain applications to be
integrated better
Status: Open as of Dec 2005

/// Resolved Issues

ID: 1
General Area: Sorting
Priority: Very High
Description: TPIE sometimes returns memory errors on sorting.
If the memory manager is set to enforce_memory_limit() the program will
crash. I have an email from Jan 13 2005 describing the problem in detail.
We need to calculate the space used by all "new" calls during the sorting
phase. I have identified several objects whose space is not accounted for.
There are also some circular dependecies in calculating memory usage.
Until these memory calculation is resolved, I suggest setting the memory
manager to warn_memory_limit() or ignore_memory_limit() before sorting.
Files: apm_dh.h
Status: Problem fixed by A.Danner in Summer 2005

ID: 4
General Area: Substreams
Priority: High
Description: Herman Haverkort writes: "when making a substream of an
AMI_STREAM, I get an AMI_stream_base, which is not an AMI_STREAM but
something different. So different, in fact, that AMI_sort does not take it.
So this concept of making substreams is not as transparent as I would like
it to be." This should be addressed somehow.
Status: Issue resolved by merging AMI_stream_single and AMI_stream_base
        November 2005 (J. Vahrenhold)

ID: 7
General Area: Streams
Description: From Herman Haverkort "Include a member function that returns
the current position of the read/write pointer. Without such a thing, you
cannot detect end of file except by attempting to read past it. This is
annoying. In the Pfafstetter code, I handled this issue by putting a
wrapper around the AMI stream that counts the number of times an item has
been read---thus probably keeping the same count three times (once in the
wrapper, once as the internal read/write pointer of the stream, and once
in the statistics of the stream)..." Comment from Andrew Danner: "There is
the tell() function, but as Herman notes, it is not defined in
ami_stream_base, but it IS defined in ami_stream_single"
Status: Issue resolved by merging AMI_stream_single and AMI_stream_base
        November 2005 (J. Vahrenhold)

ID: 9
General Area: Sorting
Description: Include a progress indicator for sorting. Users don't like it
when the software seems to stall for ten minutes.
Status: Issue resolved by progress_indicator_base and subclasses
        November 2005 (J. Vahrenhold)
ID: 13
General Area: Sorting, Code cleanup
Description: There are three main sort implementations in the TPIE source, but
only one seems to be used. Find out which one it is, document it. Remove
or update older code and get rid of _dh tags if the _dh code is really the
default sort.
Status: Resolved by A. Danner Nov-Dec 2005

ID: 14
General Area: Streams, Code cleanup
Description: Can ami_stream_base and ami_stream_single be merged? Are we
seriously ever going to have another stream implementation other than
ami_stream_single? If not, the code should be merged.
See Also: wish-list ID 7
Status: Issue resolved by merging AMI_stream_single and AMI_stream_base
        November 2005 (J. Vahrenhold)

ID: 15
General Area: Memory management/STL
Priority: ?
Description: the TPIE memory manager cannot keep track of memory
allocated in STL correctly. It seems that it can capture allocation but
not deallocation. For example the following piece of code will report a
memory leak.

cout << "Memory available = " << MM_manager.memory_available() << endl;
vector<int> *s = new vector<int>;
delete s;
cout << "Memory available = " << MM_manager.memory_available() << endl;

The real problem is related to this lengthy document:

A very quick summary:
 The STL uses its own memory allocation methods. They could be just
 new/delete or something else. You can also build your own and pass it as
 a template parameter like

 vector<my_data_type, my_memory_allocator> foo

 Looking at GCC 3.3 the vector library uses "new" to allocate space, but
 does not necessarily use "delete" to free space. What a nightmare. I
 don't have any quick solutions in mind. Perhaps we can overload the
 default STL allocator much like we overloaded new/delete, but I doubt
 this will improve performance of the STL.

 If one wants to see some more evidence, take a look at c
 ++/bits/stl_vector.h and c++/bits/stl_construct.h. You'll see a "new"
 but no "delete"

 We need to decide what is the best way to handle this.
 Status: Resolved by setting GLIBCPP_FORCE_NEW and extending
         the memory manager to provide a pause/resume
         mechanism for registering allocation sized
         November 2005 (J. Vahrenhold)

Something went wrong with that request. Please try again.