
display: enable offloading render process to background thread #40

Conversation

danieldegrasse
Contributor

Description of the feature or fix

Enable offloading of the display_write call to a background thread for color displays. This feature is opt-in, as it may not offer significant performance gains for every display pipeline.

When enabled, display_write and lv_disp_flush_ready will be called from a background thread. This means that while the display driver waits for the hardware to write out the framebuffer, the LVGL rendering thread will not be blocked.
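For context, a minimal sketch of the pattern this PR introduces, using the names that appear in the discussion below (flush_queue, lvgl_flush_thread_entry); the flush_request layout, queue depth, and stack size are assumptions for illustration, not the PR's exact code:

```c
#include <zephyr/kernel.h>
#include <zephyr/drivers/display.h>
#include <lvgl.h>

/* Hypothetical request layout; the actual PR may differ. */
struct flush_request {
	const struct device *display;
	uint16_t x;
	uint16_t y;
	struct display_buffer_descriptor desc;
	void *data;
	lv_disp_drv_t *disp_drv;
};

K_MSGQ_DEFINE(flush_queue, sizeof(struct flush_request), 2, 4);

static void lvgl_flush_thread_entry(void *p1, void *p2, void *p3)
{
	struct flush_request req;

	while (1) {
		k_msgq_get(&flush_queue, &req, K_FOREVER);
		/* Blocks on the display hardware here, without holding
		 * up the LVGL rendering thread.
		 */
		display_write(req.display, req.x, req.y, &req.desc, req.data);
		lv_disp_flush_ready(req.disp_drv);
	}
}

/* Runs at a higher priority (lower number) than the rendering thread. */
K_THREAD_DEFINE(lvgl_flush, 1024, lvgl_flush_thread_entry,
		NULL, NULL, NULL, 0, 0, 0);
```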

@jakkra

jakkra commented Jul 2, 2023

Hi @danieldegrasse, I have tried this out but ran into some issues. How are you calling lv_task_handler in your testing code (what thread and priority)?

My issue is that I'm running lv_task_handler from the system workqueue, and I use LVGL only from the system workqueue (LVGL is not thread safe). As I understand the proposed solution, lv_task_handler has to run from a preemptible thread that is preempted directly after

k_msgq_put(&flush_queue, request, K_FOREVER);

by lvgl_flush_thread_entry. Otherwise, what I see is only a ping-pong between the thread running lv_task_handler and the new lvgl_flush_thread_entry, with no gain from double buffering (same FPS as with a single buffer).
The way I "fixed" it was by adding a k_yield(); after k_msgq_put to let lvgl_flush_thread_entry run before rendering into the next buffer, and got about 50% more FPS with this PR.
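Concretely, the workaround lands in the LVGL flush callback right after the queue put; this sketch continues the hypothetical names from the description above, and display_dev is an assumed binding, not part of the PR:

```c
/* Assumed bound elsewhere, e.g. DEVICE_DT_GET(DT_CHOSEN(zephyr_display)). */
static const struct device *display_dev;

static void lvgl_flush_cb(lv_disp_drv_t *disp_drv, const lv_area_t *area,
			  lv_color_t *color_p)
{
	struct flush_request request = {
		.display = display_dev,
		.x = area->x1,
		.y = area->y1,
		.desc = {
			.width = lv_area_get_width(area),
			.height = lv_area_get_height(area),
			.pitch = lv_area_get_width(area),
			.buf_size = lv_area_get_size(area) * sizeof(lv_color_t),
		},
		.data = color_p,
		.disp_drv = disp_drv,
	};

	k_msgq_put(&flush_queue, &request, K_FOREVER);
	/* Workaround: yield so lvgl_flush_thread_entry can start the
	 * display_write before we render into the next buffer; without
	 * this, an equal-priority caller ping-pongs with the flush
	 * thread and double buffering gains nothing.
	 */
	k_yield();
}
```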

Another thing I thought about that may be a possible issue (please correct me if I'm wrong):
Say the display driver uses SPI. For many SPI displays you need more than one SPI transaction per write. For example: https://github.com/zephyrproject-rtos/zephyr/blob/f6eeb9dc84bec63dea231a49e97992875b659614/drivers/display/display_ili9xxx.c#L111C1-L111C1
Often the driver first sets X and Y and only then sends the pixel bytes (including separate SPI transactions for each command, as Zephyr does not support hardware control of the DCX pin). So there may be 5+ SPI transactions before the large chunk of data is actually sent.
My concern is that display_write, while waiting for one of those small SPI transactions to finish, may get "preempted" (while waiting on the SPI-completion mutex), and we may get back to rendering into the other buffer before the big chunk of data is actually being transmitted. Then we lose the purpose of double buffering again.

@danieldegrasse
Contributor Author

danieldegrasse commented Jul 3, 2023

Hi @danieldegrasse, I have tried this out but ran into some issues. How are you calling lv_task_handler in your testing code (what thread and priority)?

Currently from the main thread, which should use priority 0. I am testing this with the LVGL benchmarking and widgets samples on an RT1060 EVK. I see what you mean that k_yield is needed when using a cooperative thread, but as I mention below, I think this feature will only work correctly if the thread running LVGL rendering is preemptible. Do you think adding documentation of that within the Kconfig help strings would be useful?

My concern is that display_write, while waiting for one of those small SPI transactions to finish, may get "preempted" (while waiting on the SPI-completion mutex), and we may get back to rendering into the other buffer before the big chunk of data is actually being transmitted. Then we lose the purpose of double buffering again.

display_write being preempted is OK, and frankly desirable (we want LVGL to start rendering into the second framebuffer while the first is written to the display). If using cooperative threads, the LVGL rendering thread may run an entire render before the framebuffer can be streamed out. I don't see a great solution for this besides making the thread running LVGL rendering preemptible and running the display_write thread at a higher priority. For your use case, could you use a workqueue running from a thread with a set priority, or set CONFIG_SYSTEM_WORKQUEUE_PRIORITY to a preemptible value?
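For instance, a prj.conf fragment along these lines (the flush-thread symbols follow the Kconfig names discussed in this PR and are illustrative, as are the values; in Zephyr a lower number means higher priority, and negative priorities are cooperative):

```
# Run LVGL rendering (the system workqueue here) at a preemptible,
# i.e. non-negative, priority...
CONFIG_SYSTEM_WORKQUEUE_PRIORITY=5
# ...and the flush thread above it so it can preempt rendering.
CONFIG_LV_Z_FLUSH_THREAD=y
CONFIG_LV_Z_FLUSH_THREAD_PRIO=0
```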

@jakkra

jakkra commented Jul 3, 2023

Do you think adding documentation of that within the Kconfig help strings would be useful?

Yes, I think it should be documented that LVGL rendering has to run in a preemptible thread and that LV_Z_FLUSH_THREAD_PRIO needs to be a higher priority. Otherwise this feature will not work, and people enabling it will be confused.

For your use case, could you use a workqueue running from a thread with a set priority, or set CONFIG_SYSTEM_WORKQUEUE_PRIORITY to a preemptible value?

Unfortunately, due to some Kconfig options such as BT and IPC, my CONFIG_SYSTEM_WORKQUEUE_PRIORITY has to be a cooperative priority.
And yes, I can run rendering from a separate workqueue thread with a preemptible priority, and this works. Still, it's very handy (saves lots of extra code) to be able to use LVGL from the system workqueue context, as many subsystems and drivers already deliver their callbacks in system workqueue context and the system workqueue is available from anywhere. That also ignores the extra "wasted" ~3K of RAM needed for the stack of a new workqueue thread to do rendering.

For now I think I'll just patch the k_yield into my application manually, unless you think this would be something that makes sense to add a Kconfig entry for, but maybe it's a bit "hackish".

@danieldegrasse
Contributor Author

Yes, I think it should be documented that LVGL rendering has to run in a preemptible thread and that LV_Z_FLUSH_THREAD_PRIO needs to be a higher priority. Otherwise this feature will not work, and people enabling it will be confused.

I've added some documentation to the latest push highlighting this, thanks for pointing it out.

For now I think I'll just patch the k_yield into my application manually, unless you think this would be something that makes sense to add a Kconfig entry for, but maybe it's a bit "hackish".

Truthfully, I think that adding a k_yield is harmless (the preemptible thread would already have been forced to yield if another thread of higher priority could run). I'll add it to this PR, since it seems you still see some performance gains with it.

@danieldegrasse
Contributor Author

@carlescufi are you aware of anyone else who can review this PR?

@faxe1008
Contributor

faxe1008 commented Jul 17, 2023

I tried running the stress demo on an RT1060-EVKB with an RK043FN66HS-CTG, using CONFIG_LV_USE_PERF_MONITOR. With the display refresh period changed to 10 ms, I observe roughly 50-65 FPS without the change and ~70-100 FPS with the flush thread. For anyone else testing, you'll need to adjust LV_DISP_DEF_REFR_PERIOD.
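For anyone reproducing this, the relevant prj.conf options would look something like the following (assuming the LVGL Kconfig symbols for the settings named above; the 10 ms value matches this test):

```
# Show LVGL's FPS/CPU overlay and shorten the default refresh period
# (30 ms) so the refresh timer is not the bottleneck.
CONFIG_LV_USE_PERF_MONITOR=y
CONFIG_LV_DISP_DEF_REFR_PERIOD=10
```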

@jakkra

jakkra commented Aug 14, 2023

This PR is such an amazing improvement for FPS; it would be very nice if it could get merged, either before #45 (which would be easier) or after (in which case I guess this PR needs to move to the main repo).

Since nothing has happened for a month, maybe we could ping it in the Discord "pr-help" channel to get more reviewers, if that is needed?

@fabiobaltieri
Member

fabiobaltieri commented Aug 15, 2023

@danieldegrasse I just merged #45. Let's get #61300 in; then you should be able to just move this over, and we should not have PR visibility issues on this anymore.

I'll do a sweep of the open lvgl PRs as well after that.

@fabiobaltieri
Member

zephyrproject-rtos/zephyr#61300 is in; feel free to move this upstream.

@danieldegrasse
Contributor Author

Closing this, as the changes will be integrated into the Zephyr module glue code.
