Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: allow threading backend to be replaced by caller #2255

Closed
wants to merge 3 commits into from

Conversation

stevengj
Copy link
Contributor

@stevengj stevengj commented Sep 12, 2019

This PR adds a new function openblas_set_threads_callback that allows the caller to register a callback in order to change the threading backend at runtime — instead of spawning its own threads and passing work to them, OpenBLAS will pass an array of work (an array of blas_queue_t) to the callback, which can execute the work in serial or in parallel using whatever mechanism it wants.

Motivation: this allows OpenBLAS threading to be composable with caller multithreading, instead of having OpenBLAS and the caller fight over the same CPUs. In particular, this enables OpenBLAS threading to be composable with:

Because registering the callback is done at runtime, people wanting to use this feature won't need to install a separate compiled OpenBLAS binary, but can instead use an existing threaded binary.

The design and implementation is very similar to the new pluggable threading backend for FFTW (FFTW/fftw3#175) and Blosc (Blosc/c-blosc2#81), which worked out very well for us. My expectation is that the code additions will be fairly minimal—a few dozen lines plus some trivial refactoring—with a practical overhead of a single if statement in the exec_blas function for people not using this feature.

This PR is an early work-in-progress, but I wanted to get some early feedback. To do:

  • Testing
  • Add to pthreads backend (blas_server.c)
  • Add to win32 backend (blas_server_win32.c)
  • Documentation

@stevengj
Copy link
Contributor Author

stevengj commented Sep 12, 2019

In particular, I'm thinking the prototypical user callback could look like:

void callback(void *callback_data, int sync, void (*dojob)(int, void *, void *), int numjobs, size_t jobdata_elsize, void *jobdata, void *dojob_data) {
    int i;
    for (i = 0; i < numjobs; ++i) // in parallel
        dojob(THREADID, ((char *) jobdata) + ((unsigned) i)*jobdata_elsize, dojob_data);
    if (sync)
     // wait for dojob calls to complete
}

where the for loop is executed in parallel in some way determined by the caller, and THREADID is some caller code for determining the current thread number (0 to nthreads-1, where nthreads was the value passed to omp_set_num_threads).

@stevengj
Copy link
Contributor Author

(Note that this will also allow people using OpenMP to link to a pthreads OpenBLAS binary without recompiling OpenBLAS — with a 10-line callback using #pragma omp parallel for, they can tell the pthreads OpenBLAS to use their OpenMP threads instead of its own pthreads.)

@martin-frbg
Copy link
Collaborator

That's an intriguing concept... I just hope it does not wake any sleeping bugs.

@StefanKarpinski
Copy link
Contributor

Is further work required to move this forward? Thanks for pushing composable threading, @stevengj!

@martin-frbg
Copy link
Collaborator

IF this is expected to work on Win32, it will need adjustments in blas_server_win32.c similar to those in/for blas_server_omp.c (this is what causes one of the two remaining Travis failures, the other is an unrelated issue with powerpc)

@stevengj
Copy link
Contributor Author

stevengj commented Sep 21, 2019

Is further work required to move this forward?

Yes, a significant amount of work: see un-checked to-do items above. It looks like the most effort is required to adapt blas_server.c, which I'm currently working on now.

I haven't had a chance to work on it for a few days, but I'm hoping to get back to it next week.

@StefanKarpinski
Copy link
Contributor

Right, somehow I missed the checklist at the bottom of your post 😬

@h-vetinari
Copy link
Contributor

@stevengj: I haven't had a chance to work on it for a few days, but I'm hoping to get back to it next week.

Any update on this? :)

@h-vetinari
Copy link
Contributor

@stevengj
18 month later, just checking in where things stand on this... 🙃

@StefanKarpinski
Copy link
Contributor

Would be great to get this done!

@jeremiedbb
Copy link

Looks very interesting. Do you think it would provide a solution to this issue #3187 ?

@martin-frbg
Copy link
Collaborator

rebased for convenience (it is not immediately clear to me how to apply the proposed changes to the non-OMP server files)
2255rebased.txt

@goplanid
Copy link
Contributor

goplanid commented Jul 12, 2023

@stevengj Any update on this? Can you please share your email id, would like to discuss few doubts. Your help is highly appreciated.

@martin-frbg
Copy link
Collaborator

closing as superseded by #4577 - nevertheless an important contribution, and I regret not having been able to advance it at the time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants