Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run-fibers with #:drain #t does not wait for dependent threads #30

Closed
amirouche opened this issue Nov 6, 2019 · 4 comments
Closed

run-fibers with #:drain #t does not wait for dependent threads #30

amirouche opened this issue Nov 6, 2019 · 4 comments

Comments

@amirouche
Copy link
Contributor

When trying to emulate a threadpool for blocking operations (cpu intensive or embedded database like wiredtiger) I hit a bug with #drain #t where run-fibers returns before the thread created with call-with-new-thread has the time the finish its block call in suspend of operations.scm:

(define (suspend)
;; Two cases. If there is a current fiber, then we suspend the
;; current fiber and arrange to restart it when the operation
;; succeeds. Otherwise we block the current thread until the
;; operation succeeds, to allow for communication between fibers
;; and foreign threads.
(if (current-scheduler)
((suspend-current-task
(lambda (sched k)
(define (resume thunk)
(schedule-task sched (lambda () (k thunk))))
(block sched resume))))
(let ((k #f)
(thread (current-thread))
(mutex (make-mutex))
(condvar (make-condition-variable)))
(define (resume thunk)
(cond
((eq? (current-thread) thread)
(set! k thunk))
(else
(call-with-blocked-asyncs
(lambda ()
(lock-mutex mutex)
(set! k thunk)
(signal-condition-variable condvar)
(unlock-mutex mutex))))))
(lock-mutex mutex)
(block #f resume)
(let lp ()
(cond
(k
(unlock-mutex mutex)
(k))
(else
(wait-condition-variable condvar mutex)
(lp)))))))

Since in the thread, there is no reference to the "parent" scheduler, it is not possible to notify that a thread is waiting/blocking for a operation rendez-vous.

When creating the thread in the fiber, the program hits the issue #21.

Here is a test program:

(import (fibers))
(import (fibers channels))

(import (ice-9 threads))


(define mutex (make-mutex))
(define channel (make-channel))

(let loop ((index (- (current-processor-count) 7)))
  (unless (zero? index)
    (call-with-new-thread
     (lambda ()
       (let continue ((message (get-message channel)))
         (let ((thunk (car message))
               (return (cdr message)))
           (let ((out (thunk)))
             (put-message return out)))
         (continue (get-message channel))))
     pk)
    (loop (- index 1))))

(define (fib n)
  (cond
    ((= n 0) 0)
    ((= n 1) 1)
    (else (+ (fib (- n 1)) (fib (- n 2))))))

(define (exec thunk)
  (let ((return (make-channel)))
    (put-message channel (cons thunk return))
    (pk 'getting)
    (get-message return)))

(define (compute)
  (pk 'out (exec (lambda () (fib (expt 2 5))))))

(define (main)
  (let loop ((index 1))
    (unless (zero? index)
      (spawn-fiber (lambda () (compute)))
      (loop (- index 1))))
  (pk 'main-end)
  #;(sleep 10))


(run-fibers main #:parallelism 1 #:drain? #t #:hz 0)
(pk 'program-end)

uncomment (sleep 10) to have see the program complete.

@amirouche
Copy link
Contributor Author

That said, for my usecase, run-fibers does not return, so this issue does not show up.

There is a workaround where one can properly manage the life of the threads created with call-with-new-thread and wait for the count of threads to become zero in the init procedure of run-fiber.

@amirouche
Copy link
Contributor Author

Maybe a parent-thread-scheduler parameter can help to solve the issue.

@wingo
Copy link
Owner

wingo commented Nov 8, 2019

run-fibers #:drain? #t means to run until schedule-work-pending? returns false. scheduler-work-pending? returns true if there are pending timeouts or pending runnable tasks. Neither is the case for fibers waiting on external events; in this case messages on channels. From the point of view of scheduler-work-pending?, this program is undistinguishable from:

(define (main)
  (spawn-fiber (lambda () (get-message channel))))

I think in general it's hard to have a general definition of what it means to be done. scheduler-work-pending? has the advantage that it's a simple and clear definition, but it is not universal. If you need another definition of "done", for example your definition above, you need to arrange to have the initial fiber outlast the tasks you are concerned with.

@amirouche
Copy link
Contributor Author

I think in general it's hard to have a general definition of what it means to be done. scheduler-work-pending?

My idea to workaround the above behavior was the change the definitions of scheduler-work-pending? to with another clause in the and that looks like (zero? (scheduler-waiting-thread scheduler)) where scheduler-waiting-thread is a count of threads that are blocked inside get-message or put-message.

scheduler-work-pending? has the advantage that it's a simple and clear definition, but it is not universal. If you need another definition of "done", for example your definition above, you need to arrange to have the initial fiber outlast the tasks you are concerned with.

That is what I documented in the previous comment.

Thanks for the feedback! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants