RFC 2: Add Fiber::ExecutionContext::MultiThreaded #15517
Conversation
Introduces the second EC scheduler, which runs in multiple threads and uses the thread-safe queues (Runnables, GlobalQueue). Unlike the ST scheduler, the MT scheduler needs to actively park threads in addition to waiting on the event loop, because only one thread is allowed to run the event loop.
# Starts a context with a maximum number of threads. Threads aren't started
# right away, but will be started as needed to increase parallelism up to
# the configured maximum.
def self.new(name : String, size : Range(Nil, Int32)) : self
  new(name, 0..size.end, hijack: false)
end
thought: This overload feels odd because it's semantically equivalent to the next one (or rather a subset of it), the only difference being that the minimum number `0` is represented as `nil`.
So technically this shouldn't need a separate overload, just a value conversion (this would be easy with clamping from #15106). Merging both would be a bit rough on the type restriction of `size`, though.
Also, the current overloads don't account for a nilable lower bound (`Range(Int32?, Int32)`), so I guess it's all a bit of a mess. (This would be easy with bounded free variables such as `size : Range(T, Int32) forall T <= Int32?`, see #3803.)
Perhaps we could simplify to a single method with a `size : Range` type restriction and manually coalesce `nil` to `0` on the range's begin?
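A minimal sketch of what that single method could look like, assuming the inner constructor keeps its current shape (nothing below is from the PR itself, and exclusive ends are ignored here):

# Sketch only: accept any Range and coalesce a nil lower bound to 0 before
# delegating to the existing constructor. Exclusive ends are not handled.
def self.new(name : String, size : Range) : self
  new(name, (size.begin || 0)..size.end, hijack: false)
end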
# Starts a context with a maximum number of threads. Threads aren't started
# right away, but will be started as needed to increase parallelism up to
# the configured maximum.
def self.new(name : String, size : Int32) : self
suggestion: Rename the parameter to make clear it's the maximum size, not a fixed size.
- def self.new(name : String, size : Int32) : self
+ def self.new(name : String, max_size : Int32) : self
protected def self.default(size : Int32) : self
  new("DEFAULT", 1..size, hijack: true)
suggestion: ditto
- protected def self.default(size : Int32) : self
-   new("DEFAULT", 1..size, hijack: true)
+ protected def self.default(max_size : Int32) : self
+   new("DEFAULT", 1..max_size, hijack: true)
# right away, but will be started as needed to increase parallelism up to
# the configured maximum.
def self.new(name : String, size : Int32) : self
  new(name, 0..size, hijack: false)
issue: The range coalesce isn't entirely correct: it doesn't account for exclusive ranges.
A safe implementation should be trivial with #15106.
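Not the #15106 approach the comment has in mind, but for illustration, one manual way to account for an exclusive end when rebuilding the range (a sketch only, applied to the `Range(Nil, Int32)` overload quoted earlier):

# Sketch: normalize an exclusive end before building the internal inclusive
# range, so a range that excludes its end yields one fewer maximum thread.
def self.new(name : String, size : Range(Nil, Int32)) : self
  max = size.excludes_end? ? size.end - 1 : size.end
  new(name, 0..max, hijack: false)
end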
private def start_schedulers(hijack)
  @size.end.times do |index|
    @schedulers << Scheduler.new(self, "#{@name}-#{index}")
  end
end
suggestion: The hijack parameter doesn't seem to have any purpose in this method. Should we drop it?
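For reference, dropping it would leave the body of the hunk above untouched; only the parameter goes away:

private def start_schedulers
  @size.end.times do |index|
    @schedulers << Scheduler.new(self, "#{@name}-#{index}")
  end
end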
@size.begin.times do |index|
  scheduler = @schedulers[index]

  if hijack && index == 0
    @threads << hijack_current_thread(scheduler, index)
  else
    @threads << start_thread(scheduler, index)
  end
end
suggestion: Simplify loop logic.
- @size.begin.times do |index|
-   scheduler = @schedulers[index]
-   if hijack && index == 0
-     @threads << hijack_current_thread(scheduler, index)
-   else
-     @threads << start_thread(scheduler, index)
-   end
- end
+ offset = 0
+ if hijack
+   @threads << hijack_current_thread(@schedulers[0], 0)
+   offset = 1
+ end
+ offset.upto(@size.begin - 1) do |index|
+   @threads << start_thread(@schedulers[index], index)
+ end
private def start_schedulers(hijack)
  @size.end.times do |index|
    @schedulers << Scheduler.new(self, "#{@name}-#{index}")
  end
end
question: Why do we initialize all schedulers from the start? We only start the minimal number of threads, and we might never actually need all schedulers if the number of threads never reaches the max.
Could we initialize them lazily?
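For illustration only, lazy creation could look roughly like the following, assuming a mutex guards the array against concurrent growth (the `@mutex` ivar and the helper name are hypothetical, not part of the PR):

# Hypothetical sketch: create schedulers on demand instead of eagerly
# creating @size.end of them up front.
private def scheduler_at(index : Int32) : Scheduler
  @mutex.synchronize do
    while @schedulers.size <= index
      @schedulers << Scheduler.new(self, "#{@name}-#{@schedulers.size}")
    end
    @schedulers[index]
  end
end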
# Starts a new `Thread` and attaches *scheduler*. Runs the scheduler loop
# directly in the thread's main `Fiber`.
private def start_thread(scheduler, index) : Thread
suggestion: `scheduler` is `@schedulers[index]` in all calls. Perhaps we could simplify that, passing only the index and accessing `@schedulers` here.
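Something along these lines, sketched with a placeholder body since the rest of the method isn't shown in this hunk (`run` is a hypothetical name, not the PR's actual API):

# Sketch: only the signature changes; the scheduler is looked up from the
# index instead of being passed in by every caller.
private def start_thread(index : Int32) : Thread
  scheduler = @schedulers[index]
  Thread.new do
    scheduler.run # hypothetical call standing in for the existing thread body
  end
end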
# Picks a scheduler at random then iterates all schedulers to try to steal
# fibers from.
protected def steal(& : Scheduler ->) : Nil
  return if size == 1

  i = @rng.next_int
  n = @schedulers.size

  n.times do |j|
    if scheduler = @schedulers[(i &+ j) % n]?
      yield scheduler
    end
  end
end
note: This would also yield the current scheduler. It makes no sense to try to steal from yourself. I noticed this case is handled at the call site, where the current scheduler is directly accessible.
But this makes me wonder whether this method should be an internal helper method in `Scheduler` instead. It only deals with schedulers and it's only called from `Scheduler`, so I don't think there's a compelling reason why it should be on `ExecutionContext`. We can access the `ExecutionContext`'s instance variables from `Scheduler`.
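A rough sketch of that shape from `Scheduler`'s side, mirroring the hunk above (the method name, the `@execution_context` ivar and the location of `@rng` are assumptions, not the PR's actual API):

# Hypothetical sketch in Scheduler: iterate the context's schedulers from a
# random offset, skipping the current scheduler.
private def each_victim(& : Scheduler ->) : Nil
  schedulers = @execution_context.@schedulers
  n = schedulers.size
  return if n == 1

  i = @rng.next_int
  n.times do |j|
    if scheduler = schedulers[(i &+ j) % n]?
      yield scheduler unless scheduler.same?(self)
    end
  end
end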
Introduces the LAST EC scheduler, which runs in multiple threads with work stealing, so any thread can resume any runnable fiber in the context (no more starving threads).
Unlike the ST scheduler, the MT scheduler needs to actively park threads since only one thread in the context can run the event loop (no parallel runs).
Having a single event loop for the whole context, instead of one per thread, avoids situations where fibers wait in an event loop but aren't processed because that particular thread happens to be busy, causing delays. With a single event loop, as soon as a thread is starving it can check the event loop and enqueue the runnable fibers, which can be resumed (and stolen) immediately.
NOTE: we can start running the specs in this context, though they sometimes segfault. Maybe because of issues in spec helpers that used to expect fibers not to switch, or maybe because of issues in the stdlib for the same reason (for example libxml).
Kept in draft until #15511 and #15513 are merged.
refs #15342