Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
DataFusion performs CPU bound work within async closures. This causes issues if running IO on the same async runtime, as the cooperative nature of such schedulers allows the CPU bound work to starve servicing of IO. This leads to errors such as apache/arrow-rs-object-store#272.
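As a minimal sketch of the separation being asked for, the following uses only the Rust standard library rather than DataFusion or tokio APIs. The names `cpu_bound_sum` and `run_on_cpu_pool` are hypothetical and exist only for illustration. The idea is that CPU-heavy jobs run on their own dedicated threads, while the calling thread (the one that would otherwise service IO) only enqueues work and waits on a channel. In a tokio application the analogous move would presumably be a second runtime or thread pool reserved for CPU work, as the linked example PRs suggest.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stand-in for CPU-bound query work (hypothetical, not a DataFusion API):
/// sums the squares of 0..n with wrapping arithmetic.
fn cpu_bound_sum(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x)))
}

/// Runs each job on a dedicated pool of `workers` threads and returns the
/// results in the same order as `jobs`. The caller never executes the
/// CPU-bound function itself; it only sends jobs and receives results.
fn run_on_cpu_pool(jobs: Vec<u64>, workers: usize) -> Vec<u64> {
    let (job_tx, job_rx) = mpsc::channel::<(usize, u64)>();
    let (res_tx, res_rx) = mpsc::channel::<(usize, u64)>();
    // std's Receiver is single-consumer, so the workers share it via a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the lock is held while dequeuing but not while computing.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((idx, n)) => res_tx.send((idx, cpu_bound_sum(n))).unwrap(),
                    Err(_) => break, // job queue closed and drained
                }
            })
        })
        .collect();
    // Drop the original sender so only the workers keep the result channel open.
    drop(res_tx);

    let count = jobs.len();
    for (idx, n) in jobs.into_iter().enumerate() {
        job_tx.send((idx, n)).unwrap();
    }
    // Closing the job queue lets the workers exit once it is empty.
    drop(job_tx);

    // Results arrive in completion order; the index restores job order.
    let mut results = vec![0u64; count];
    for (idx, value) in res_rx {
        results[idx] = value;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}
```

Calling `run_on_cpu_pool(vec![4, 0, 3], 2)` distributes three jobs over two worker threads. In an async setting the blocking wait on the result channel would be replaced by awaiting a handle or an async channel, so the IO runtime's threads stay free to poll sockets while the CPU pool does the heavy work.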
Describe the solution you'd like
I think at the very least this needs to be better documented; I couldn't find any mention of this in the DataFusion documentation following a cursory search.
I also think a more holistic approach would be valuable here. As it stands, the use of async within DataFusion acts as a massive footgun that encourages users to intermix IO and CPU work in a way that is at best inefficient, but this can be tracked as a separate follow-on task.
Describe alternatives you've considered
No response
Additional context
No response
Development
- Add example for using a separate threadpool for CPU bound work (apache/datafusion)
- Alternate example of using two thread pools to run DataFusion IO and CPU operations (apache/datafusion)
- Example for using a separate threadpool for CPU bound work (try 2) (apache/datafusion)
- Example for using a separate threadpool for CPU bound work (try 3) (apache/datafusion)
Activity
error decoding response body after upgrade to object store 0.10 (apache/arrow-rs-object-store#272)
alamb commented on Sep 9, 2024
I recommend two things:
Title changed from "Document DataFusion Threading" to "Document DataFusion Threading (and how to separate IO and CPU bound work)"
ozankabak commented on Oct 7, 2024
I think it'd be great to have good documentation on this.
alamb commented on Oct 25, 2024
100% agree -- @itsjunetime and @tustvold are working on a bit of it in apache/arrow-rs#6612. I'll try and help with the documentation as well
Title changed from "Document DataFusion Threading (and how to separate IO and CPU bound work)" to "Document DataFusion Threading / tokio runtimes (how to separate IO and CPU bound work)"
alamb commented on Nov 16, 2024
Documentation
I hope to work on the example a bit more shortly