
HighwaySegment::clin_mtx #253

Closed
yakra opened this issue Jul 12, 2023 · 1 comment · Fixed by TravelMapping/DataProcessing#619

yakra commented Jul 12, 2023

Carved out of #221:

Can the mutex be made more efficient?

Eliminating HighwaySegment::clin_mtx would get sizeof(HighwaySegment) down to 72 B, with potential to get even smaller. This could potentially speed up multiple tasks.
Calls to add_clinched_by could be removed, instead calling TMBitset::add_index directly and locking/unlocking Route::mtx before/after, whichever way is most efficient.

Overall pros & cons:

  • Pro: Smaller sizeof(HighwaySegment).
  • Pro: Less mutex overhead; fewer lock/unlock ops.
  • Con: More mutex contention; multiple travelers could be trying to add the same route even if not the same segment.
  • Con: Indirection.

Route::store_traveled_segments could handle all the segments it needs to with one lock/unlock per call.

  • Con: More contention; mutex also used for marking labels in use. Faster to add another?
  • Con: This puts t->clinched_segments.push_back(hs); within the lock. Possible things to do include:
    • Leave it there and accept the performance hit. Is there a net gain?
    • Create an array of bools to track insertion success, iterate it, then push? Prolly not; sounds almost as expensive as just pushing within the lock.
    • Allow redundant data to be added to t->clinched_segments, creating a performance hit when augmenting concurrencies. Either...
      • Iterate again after the lock, pushing everything
      • Have TravelerList::clinched_segments store pairs of waypoints. Push beginning & end once per call.
        • Pro: Less pushing, less reallocation.
        • Meh: Less iteration thru vector is replaced by more iteration from 1st to last segments.
          Slightly more complex overall? For example, single-segment entries with 1st & last segments equal.
        • Con: Potential for redundant data. How much will this bog down concurrency augments?
        • Pro: Less stuff to pull in from memory.

For concurrency augments, multiple successive segments are likely to have the same route due to being added in series by Route::store_traveled_segments. We could...

  • Detect whether the route has changed since previous segment; unlock & lock accordingly.
    • Con: A little overhead in managing variables for that. Branching.
  • If using begin/end pairs as suggested above, optionally skip the check for route change.
    • Pro: We already have groups of segments from one route. Lose the overhead.
    • Con: There will often be multiple non-contiguous entries from the same route; we lose whatever advantage that'd bring. Fewer checks necessary -- between groups of, rather than individual, segments.
  • Nope. The source route has all those successive segments. The target segment needs the lock. We don't know what our concurrent route is until iterating the concurrency list. Concurrencies of 3+ routes guarantee different successive routes. A previous route check may still benefit 2-plexes, but there's added complexity. Would it save time?

Or alternately, any of a number of more esoteric, incompletely-thought-out ideas to go mutex-free, involving 2D arrays, a separate threaded task for storing traveler with segments, etc. How about this?

  • For each ~~HighwaySystem~~ Region,
    clinched_segments = new std::vector<std::pair<HighwaySegment*,size_t>>[args::numthreads];
  • Convert TravelerList::clinched_segments to a TMBitset.
    Nope. Not all stored in contiguous memory. Would have to do one of...
    • Use an unordered_set. Yecch.
    • Move all segments into contiguous memory after creation, then set up a TMSlice. Would the benefit be worth the cost?
    • LOL pointers to pointers. Admittedly a half-baked idea. How would I get these, let alone inexpensively?
    • Accept some redundancy. Would it be so bad? There's already redundancy in the .list files, and it's taken care of. Redundant segments comprise only 0.34% of segments .listed. Plus, this opens the door to using begin/end pairs.
  • Get threadnum into Route::store_traveled_segments, however I wanna do it.
  • When insertion succeeds, system->clinched_segments[threadnum].emplace_back(hs, index);
  • New threaded task iterates thru ~~HighwaySystems~~ Regions. For each,
    • Iterate through the vectors in clinched_segments, then thru the pairs in each vector.
      si.first->clinched_by.add_index(si.second);
    • delete[] clinched_segments;
  • Why regions?
    • For initial storage, a segment can be in only one region & thus only one thread. No mutex needed.
    • For concurrency augments, concurrencies across different systems are commonplace. In theory, a concurrency list should never span multiple regions outside of testing a region split. But it can happen: By accident, like that one Israel/Palestine route that one time. Or in theory, routes along a border like AB/SK or Texarkana. Sure, these cases have the border segments in just one region, but it could happen. For robustness' sake, best to still use a lock. So while we'll still have the overhead, this will minimize contention.

This all comes with its own overhead, and may or may not be faster than whatever the fastest option above is.

For siteupdateST, skip this extra step and do things more or less as done now.


See also TravelMapping#151.

yakra commented Mar 28, 2024

> • Con: More mutex contention; multiple travelers could be trying to add the same route even if not the same segment.

No noticeable difference. With so many routes in the system, the odds of 2+ travelers trying to add the same one at the same time are already vanishingly small. Performs better.

> • Con: Indirection.

No, scratch that. Same amount.

Still not enough contention to degrade performance.
On the contrary, 1 mutex performs better than 2, whether it's smaller sizeof, just noise in the data, or whatever.
So let's keep the code simple & save RAM.

t->clinched_segments.push_back(hs); yadda yadda

Leaving as is & accepting "the performance hit" is already a big performance boost.
Plus, ConcAugThread & TravelerList::clinched_segments can be removed entirely.
See TravelMapping#151 & 0b6711d on BiggaTomato.


Deleted branch on BiggaTomato

y253 @ e7dad8c, more locks/unlocks, fewer ternaries
Benchmark. Does this perform better or worse?

Appears to perform ever-so-slightly worse. Plus, the parent had cleaner code.

Undid a few things from parent, 90f08a1:

  • Removed 1 ternary each from beginning & ending chopped route
  • Restored 5 equivalent lines (incl. else block) to forward or backward? block just above.
    Plus mutexes: r1->mtx.lock(); r1->mark_label_in_use(fields[5]); r1->mtx.unlock(); etc.
