`TMBitset` checklist

**Benchmark:**
* [x] `Route::clinched_by_traveler`: branching vs indirection
  *Branching is faster.*
* [x] Graph generation with different `unit`s
  *Bigger is faster.*
* [x] Explicitly inline `matching_vertices_and_edges` (Change name? It finds travelers too)
  *Slightly faster on all machines except lab2. (Noise?)*
* [x] Try branchless variant of `eaa6`:
  ```c++
  for (size_t index = 0; index < traveler_lists.size(); ++index)
  	code[index/4] += clinched_by(traveler_lists[index]) << index%4;
  ```
  *Branching is faster.*
* [x] Revert `eaa6`; apply `4e05` directly to the previous commit.
  For *Computing Stats*, `96e9`...
  * is THE top performer so far for lab1, lab3 and lab4.
  * outperforms `4e05` on BiggaTomato.
  * on lab2, lags behind top performer `4e05` by only 3.8 ms. Just noise? A very tight race here.
* [x] `5c76` is inconclusive, but it does appear slightly slower for userlogs, very consistently. Try:
  * [x] comparing against `6922` for a more apples-to-apples comparison.
  * [x] comparing against the final selected commit once all the dust settles (see `4a64`).
  * [x] Out of curiosity, is there a diff if I build with GCC instead of clang? **Yes.**
* [x] What if `traveler_lists` is nixed altogether?
  As we iterate through `traveler_set`, just keep a count instead of creating a vector.
  The trade-off is doing one more iteration through `traveler_set` at the end of the traveled graph for traveler names.
  Is this outweighed by not having to construct the vector and do a modest number of allocations/reallocations?
  **Preliminary results:** helps on BiggaTomato; hurts on lab1. It will be interesting to see results on bsdlab, at higher thread counts, and after more work on the RAM bandwidth bottleneck.
  **Final:** No speed advantage. Leaving as-is. **Maybe re-examine** after more RAM bandwidth improvements are implemented.
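
For the `eaa6` branchless experiment above, a minimal self-contained sketch of the two forms being compared. It assumes, hypothetically, that `clinched[i]` stands in for `clinched_by(traveler_lists[i])` and that `code` packs four travelers per `uint8_t`, bit `i % 4` of `code[i / 4]`, as in the snippet; `pack_branching` and `pack_branchless` are illustrative names, not functions from the codebase.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: clinched[i] plays the role of clinched_by(traveler_lists[i]).
// Four travelers per byte: traveler i lives in bit (i % 4) of code[i / 4].

// Branching form: only write to code[] when the traveler has clinched the segment.
std::vector<uint8_t> pack_branching(const std::vector<bool>& clinched) {
    std::vector<uint8_t> code((clinched.size() + 3) / 4, 0);
    for (size_t index = 0; index < clinched.size(); ++index)
        if (clinched[index])
            code[index / 4] |= uint8_t(1u << (index % 4));
    return code;
}

// Branchless form: always add the 0-or-1 result, shifted into place.
std::vector<uint8_t> pack_branchless(const std::vector<bool>& clinched) {
    std::vector<uint8_t> code((clinched.size() + 3) / 4, 0);
    for (size_t index = 0; index < clinched.size(); ++index)
        code[index / 4] += uint8_t(clinched[index]) << (index % 4);
    return code;
}
```

Both produce identical output; the branchless form trades a conditional jump for an unconditional read-modify-write on every traveler, which is consistent with it losing when most travelers have not clinched a given segment.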

**Try out:**
* [x] [What happens to traveled graphs if a devel system comes before concurrent a/p in systems.csv?](https://github.com/TravelMapping/DataProcessing/issues/588)
* [x] How much does initial `HGEdge` construction slow down if I force an active/preview canonical `HighwaySegment`?
  *\~0.01–0.02 s on BiggaTomato, i.e. 0.9–1.9% more time.*
* [x] Simple iteration optimization (with different `unit`s)
  The trade-off: Larger units = skip back farther but less often.
  * 64-bit is clearly suboptimal.
  * 8-, 16- & 32-bit differ from one another by about the margin of error. Maybe a slight edge for 32-bit; it makes sense to use it anyway because of its good `|=` performance.
* [x] Complex iteration optimization
  * 8-bit `f749` underperforms 8-32-bit simple iteration on lab{1..4}; slight lead on BiggaTomato.
  * 16-bit `ec4e` is 1st place for BiggaTomato & lab1; underperforms `f749` & even 64-bit `f8cf` on lab2.
  Other machines **TBD**.
    * [x] `[bits >> 1]` solution
    * [x] ternaries
      Both underperform `ec4e` on BiggaTomato & lab1. Successively better on lab2 though, with ternaries 1st place overall & `[bits >> 1]` falling between 16 & 32-bit SIO2. Other machines **TBD**.
  * [x] How much time does the 15-bit lookup table take to construct? Does this outweigh the advantage of using 16-bit ComItOpt?
    * About 0.3 ms on BiggaTomato.
    * N/A -- there *isn't* an advantage to using 16-bit ComItOpt.
  * ~LOL what if I `constexpr` the damn thing by brute force?~ Never mind. Nothing to be gained by doing this.
  * [x] 32k is 1/16 of 512k. Will `ec4e` perform poorly on Epoch? **No.** Performs well; outperforms SIO2. Similar to BiggaTomato.
* [x] Pointer punning `|=`
  Try simplified versions for larger units:
  * [x] 64-32-16-bit
  * [x] 64-16-bit
  * [x] 64-32-bit
* [x] Replace `TravelerList::traveler_num` with one
  `unsigned int* traveler_nums = new unsigned int[TravelerList::allusers.size()];` per thread.
  Index via `for (TravelerList *t : traveler_lists) traveler_nums[t-TravelerList::allusers.data()] = travnum++;`
* [x] ~A variant of `eaa6` with TMBitset `[]` operator?~ LOLNOPE. Different vectors (everything vs subset); indices don't match up.
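
For the *Simple iteration optimization* item above, one plausible reading (an assumption, not the actual TMBitset code) is that an all-zero `unit` is skipped in a single step while iterating set bits. A larger `unit` skips more bits per step but finds an all-zero unit less often when set bits are scattered. `set_bits` is a hypothetical name.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: collect the indices of all set bits in a bitset stored
// as a vector of `unit`s, skipping any unit that is entirely zero.
template <typename unit>
std::vector<size_t> set_bits(const std::vector<unit>& units) {
    constexpr size_t ubits = 8 * sizeof(unit);  // bits per unit
    std::vector<size_t> out;
    for (size_t u = 0; u < units.size(); ++u) {
        unit bits = units[u];
        if (!bits) continue;  // empty unit: skip ubits bits at once
        // Walk the unit, stopping as soon as no higher set bits remain.
        for (size_t b = 0; bits; ++b, bits >>= 1)
            if (bits & 1)
                out.push_back(u * ubits + b);
    }
    return out;
}
```

The same logical bitset yields the same indices regardless of `unit` width, so the width can be benchmarked independently of correctness.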
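
For the *Complex iteration optimization* items above, the `[bits >> 1]` solution and the 15-bit lookup table suggest finding the lowest set bit of a 16-bit unit by testing bit 0 directly and using only the upper 15 bits as a table index. That reading is an assumption, and `lowest_bit_table` / `lowest_set_bit` are hypothetical names. The table has 32768 one-byte entries (32 KiB).

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 15-bit table: entry i holds the index of the lowest set bit of (i << 1),
// i.e. 1 + the trailing-zero count of i. Entry 0 is never read.
static std::array<uint8_t, 1 << 15> build_table() {
    std::array<uint8_t, 1 << 15> t{};
    for (size_t i = 1; i < t.size(); ++i)
        t[i] = (i & 1) ? 1 : t[i >> 1] + 1;  // i >> 1 < i, so already filled in
    return t;
}

static const std::array<uint8_t, 1 << 15> lowest_bit_table = build_table();

// Index of the lowest set bit of a nonzero 16-bit unit.
inline unsigned lowest_set_bit(uint16_t bits) {
    assert(bits != 0 && "no set bit to find");
    return (bits & 1) ? 0 : lowest_bit_table[bits >> 1];
}
```

Handling bit 0 with a branch halves the table relative to a full 16-bit index, at the cost of one extra test per lookup.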
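
For the *Pointer punning `|=`* item above, a sketch of the "64-16-bit" shape: OR eight bytes at a time, then two, then any odd trailing byte. It assumes, hypothetically, that both bitsets are stored as `std::vector<uint8_t>`, and it deliberately swaps pointer casts for `std::memcpy`, because reading `uint8_t` storage through a `uint64_t*` breaks strict-aliasing rules; GCC and Clang typically compile a fixed-size `memcpy` like this down to a single load or store. `or_assign` is a hypothetical name.

```c++
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical "64-16-bit" |=: OR src into dst in 8-byte, then 2-byte, then 1-byte steps.
// memcpy stands in for pointer punning to stay within strict-aliasing rules.
void or_assign(std::vector<uint8_t>& dst, const std::vector<uint8_t>& src) {
    const size_t n = dst.size() < src.size() ? dst.size() : src.size();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 64-bit chunks
        uint64_t a, b;
        std::memcpy(&a, dst.data() + i, 8);
        std::memcpy(&b, src.data() + i, 8);
        a |= b;
        std::memcpy(dst.data() + i, &a, 8);
    }
    for (; i + 2 <= n; i += 2) {  // 16-bit chunks
        uint16_t a, b;
        std::memcpy(&a, dst.data() + i, 2);
        std::memcpy(&b, src.data() + i, 2);
        a |= b;
        std::memcpy(dst.data() + i, &a, 2);
    }
    if (i < n) dst[i] |= src[i];  // at most one byte left
}
```

The 64-32-16-bit and 64-32-bit variants would differ only in which tail loops are present.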
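
For the `traveler_nums` item above, a self-contained sketch of the per-thread lookup. `TravelerList` here is a hypothetical minimal stand-in whose `allusers` is contiguous storage, which the pointer subtraction `t - TravelerList::allusers.data()` requires; `number_travelers` is also hypothetical. The sketch wraps the `new unsigned int[...]` allocation in `std::unique_ptr` so each thread's array is freed automatically.

```c++
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the real class: only what the sketch needs.
struct TravelerList {
    std::string traveler_name;
    static std::vector<TravelerList> allusers;  // contiguous, so pointer differences are indices
};
std::vector<TravelerList> TravelerList::allusers;

// Per-thread map: global user index -> position within this thread's traveler subset.
// Entries for travelers not in traveler_lists are left uninitialized and must not be read.
std::unique_ptr<unsigned int[]> number_travelers(const std::vector<TravelerList*>& traveler_lists) {
    std::unique_ptr<unsigned int[]> traveler_nums(new unsigned int[TravelerList::allusers.size()]);
    unsigned int travnum = 0;
    for (TravelerList* t : traveler_lists)
        traveler_nums[t - TravelerList::allusers.data()] = travnum++;
    return traveler_nums;
}
```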

**Kinda both:**
* ~Init TravelerLists at beginning; init segments with size not capacity~
* [x] ~Or better yet, init segments with `TravelerList::ids.size()`~
  Segments sized via `TMArray::size`, which is set via `TravelerList::ids.size()`.
* [x] Timestamp for first task, with & without. Python too?
* [x] `TMArray<TravelerList>` means threaded construction in place (via placement new) without separate read_list function.
* #251

**Clean up:**
* [x] [fix comment](https://github.com/TravelMapping/DataProcessing/blob/80f27243bf26579c53c6c7a215a5aeb22347317a/siteupdate/cplusplus/classes/TravelerList/TravelerList.h#L43-L44)
* [x] Constexpr the static edge format constants
* [x] Template specialization
* [x] As of `ecff`, cmath no longer explicitly needed in HighwayGraph.cpp
* [x] Extraneous `else` after `continue`; parens can be clarified https://github.com/yakra/DataProcessing/blob/80f27243bf26579c53c6c7a215a5aeb22347317a/siteupdate/cplusplus/classes/GraphGeneration/HighwayGraph.cpp#L112-L116
* [x] extraneous `visibility == 1` check just before that
* [x] (**`4db4`**) `maxbits` less useful in `SimItOpt2`. Delete it; replace with `8*sizeof(unit)`; let `-1` and `+1` cancel out.
* ~`bits` should be `unsigned char` in `pun` branch. Dumb luck that it ran without errors. Fixed for `ComItOpt`.~
  Never mind. Using a branch that retains `unit` instead.
* [x] `uint8_t` etc.
* [ ] Explicit instantiation?
* [x] Inlining MV&E (`2d1b`) means `#include "../../templates/TMBitset.cpp"` not needed in HighwayGraph.h. Can lose the include guard too LOL. :cowboy_hat_face:
* [ ] `segments` SQL table: only iterate `clinched_by` for active/preview systems
* [x] (**`c8b9`**) Check for diffs due to constant folding: `8/sizeof(unit)`
* [x] Comment for `!=` and `|=` operators
* [x] Review TMBitset variable names
* [x] Cast `(unit)1` before `<<`, lest the unsigned long bug make a reappearance. See `f8cf` on `SimItOpt2` branch.
* [x] (**`4db4`**) Switch: are additions in `for` loop reordered? Make it look pretty!
* [x] nix `()` and `add_value` until needed
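
For the `(unit)1` item above, a small sketch of why the cast matters. A bare `1` is an `int`, so `1 << n` is an `int` shift and is undefined for `n >= 32` where `int` is 32 bits; a `1ul` literal is likewise only 32 bits wide on some platforms. Casting to `unit` first makes the shift happen at the unit's own width. `unit_bit` is a hypothetical helper name, not part of TMBitset.

```c++
#include <cstdint>

// Hypothetical helper: a unit with only bit n set (requires n < 8 * sizeof(unit)).
// Casting the literal to `unit` before shifting keeps 64-bit shifts 64 bits wide;
// the outer cast narrows back for units smaller than int, which get promoted.
template <typename unit>
constexpr unit unit_bit(unsigned n) {
    return static_cast<unit>(static_cast<unit>(1) << n);
}

static_assert(unit_bit<uint64_t>(40) == 0x10000000000ULL, "shift performed in 64 bits");
static_assert(unit_bit<uint8_t>(7) == 0x80, "8-bit unit");
```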

```c++
	else if ((w->vertex->incident_c_edges.front() == w->vertex->incident_t_edges.front()
	&& w->vertex->incident_c_edges.back() == w->vertex->incident_t_edges.back())
	|| (w->vertex->incident_c_edges.front() == w->vertex->incident_t_edges.back()
	&& w->vertex->incident_c_edges.back() == w->vertex->incident_t_edges.front()))
		new HGEdge(w->vertex, HGEdge::collapsed | HGEdge::traveled);
```
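
The `else if` condition above asks whether the collapsed edge pair and the traveled edge pair are the same two edges, in either order. One way to make the parenthesization self-evident is a small helper; `same_pair` is hypothetical, and `c` / `t` in the comment are shorthand for `w->vertex->incident_c_edges` / `incident_t_edges`.

```c++
// Hypothetical helper: true if {a1, a2} and {b1, b2} hold the same two elements,
// in either order. The condition above could then read
//   else if (same_pair(c.front(), c.back(), t.front(), t.back()))
template <typename T>
bool same_pair(const T& a1, const T& a2, const T& b1, const T& b2) {
    return (a1 == b1 && a2 == b2) || (a1 == b2 && a2 == b1);
}
```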

`TMBitset` checklist #245

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

TMBitset checklist #245

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

`TMBitset` checklist #245