New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LCDIF and DCNANO performance improvements for LVGL #57421
LCDIF and DCNANO performance improvements for LVGL #57421
Conversation
The following west manifest projects have been modified in this Pull Request:
Note: This message is automatically posted and updated by the Manifest GitHub Action. |
ab0dcba
to
387fb0a
Compare
/* Wait for frame send to complete */ | ||
k_sem_take(&dev_data->sem, K_FOREVER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting it here at the end will force the thread to always suspend, right? Shouldn't we keep this at the start to let, for example LVGL, continue running?
EDIT: I guess this is needed if there's only a single buffer, which we can't use until the display has synced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and this should still work with a multi-buffer setup. Do you know if LVGL will be blocked by this? Based on the lv_disp_flush_ready
call required when display flushes complete, I assumed that LVGL would not call display_write
from the rendering thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think display_write
will be called from the thread that runs lv_task_handler
. A single thread should be used with LVGL as it explicitly states that it isn't thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
display_write
will be called from the thread that runslv_task_handler
That is a bit concerning- the reason this code currently blocks where it does is that when I did not LVGL would often start writing to the current display buffer while it was still being sent to the display- even in a double buffering situation.
Seems like what LVGL might expect is the user to call lv_disp_flush_ready
from an ISR context, when the frame render completes? I'm not sure how else LVGL could get a notification that the framebuffer is now ready, and not block within the display write callback. I suppose from the Zephyr port, we could submit display render requests to a message queue, and have a separate thread manage those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I was bothered by this so I took a look into it- it seems that LVGL may be designed to work like so:
- LVGL calls
flush_cb
, which writes to the display - At this point, the
flushing
field of the draw buffer is set totrue
(Callinglv_disp_flush_ready
will clear this field) - When LVGL refreshes again, if
flushing
is still set, then LVGL will use a callback calledwait_cb
to wait for theflushing
field to be clear
We would probably need a background thread to call display_write
, then call lv_disp_flush_ready
. The background thread could then give to a semaphore. Our wait_cb
would then wait on this semaphore, so that if LVGL attempted to refresh before the buffer was available, it would be blocked.
I think this is probably worth adding to our LVGL display port, and I will likely look at it when I get bandwidth to do so. For the time being though, I think this PR needs to block display_write
the way it currently does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, maybe create an enhancement issue to keep track of this?
@danieldegrasse module PR merged, please update revision with SHA |
Update LVGL revision with performance tuning options, including the following: - Enable forcing LVGL to perform a full display refresh on every update - Enable custom alignment and linker section location of LVGL framebuffers. Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
387fb0a
to
27f0685
Compare
27f0685
to
0394a21
Compare
boards/arm/mimxrt595_evk/board.c
Outdated
|
||
#ifdef CONFIG_LV_Z_VBD_CUSTOM_SECTION | ||
extern char __flexspi2_start[]; | ||
extern char __flexspi2_end[]; | ||
|
||
static int init_lvgl_bufs(void) | ||
{ | ||
/* | ||
* LVGL rendering buffers will be stored in FlexSPI2 linker section. | ||
* Zero out BSS section. | ||
*/ | ||
memset(__flexspi2_start, 0, __flexspi2_end - __flexspi2_start); | ||
return 0; | ||
} | ||
|
||
#endif /* CONFIG_LV_Z_VBD_CUSTOM_SECTION */ | ||
|
||
#if CONFIG_REGULATOR | ||
/* PMIC setup is dependent on the regulator API */ | ||
SYS_INIT(board_config_pmic, POST_KERNEL, CONFIG_APPLICATION_INIT_PRIORITY); | ||
#endif | ||
|
||
#ifdef CONFIG_LV_Z_VBD_CUSTOM_SECTION | ||
/* LVGL buffers should be setup after PSRAM is initialized but before LVGL init */ | ||
SYS_INIT(init_lvgl_bufs, POST_KERNEL, CONFIG_APPLICATION_INIT_PRIORITY); | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does not look right to me. There should be no third-party project-specific code in the boards/soc area. It could be dc_ram/dc_ram.ld, which would be used by any display subsystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have renamed the linker file to dc_ram.ld
, and the function call to init_psram_framebufs
Enable the ELCDIF driver to directly write the framebuffer using hardware, when an entire framebuffer update is requested. This will enable better performance for applications that avoid partial display updates. Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Update LVGL settings to improve performance with ELCDIF. This update requires changes to the LVGL module, to introduce Kconfig settings for enabling full display refresh, and setting display buffer alignment. Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Update LVGL settings to optimize RT1170 performance with LCDIF. The display hardware will perform best when LVGL always uses full refreshes, as this avoids memory copies required with partial display refreshes. Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Enable direct framebuffer rendering using DCNANO LCDIF, to improve performance when the call to display_write is attempting to refresh the full display. Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Add FLEXSPI1 and FLEXSPI2 memory sections for RT5xx SOC. These sections can be used by the user's application as part of a custom linker script, if the application needs to relocate a buffer to external SRAM connected to FLEXSPI1 or FLEXSPI2 Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
Configure LVGL to improve performance on the RT595 EVK, with the following changes: - Allocate two rendering buffers for LVGL, both the entire size of the display - Force LVGL full refresh bit - Locate LVGL rendering buffers in external PSRAM (since they will not fit on onboard SRAM) Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
0394a21
to
16e97ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of nitpicks, otherwise lgtm
} | ||
|
||
#if defined(CONFIG_HAS_MCUX_CACHE) && defined(CONFIG_MCUX_DCNANO_LCDIF_MAINTAIN_CACHE) | ||
CACHE64_InvalidateCacheByRange((uint32_t) data->fb[next_fb_idx], | ||
CACHE64_CleanCacheByRange((uint32_t) data->active_fb, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: can't cache.h API be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future it should be, yes. Currently CONFIG_CACHE_MANAGEMENT
is not enabled for the RT5xx platform this IP block is present on, as we do not have a cache API driver for the CACHE64 peripheral
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I think we should standardize this part in the near future as we now have APIs, cc: @carlocaione
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree- I'm going to create an issue to highlight this, because this is something that we've also discussed internally and the change needs to be made
Edit: created as an enhancement: #58324. Can always be moved to a bug if this is something that we think needs to be addressed before the release
ELCDIF_EnableInterrupts(config->base, | ||
kELCDIF_CurFrameDoneInterruptEnable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to use Zephyr IRQ API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IRQ enable is called down here: https://github.com/nxp-zephyr/zephyr/blob/feature/lcdif_performance/drivers/display/display_mcux_elcdif.c#L341. This call enables the peripheral itself to generate the interrupt signal for the NVIC, and has no Zephyr IRQ API equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks for the details
This PR includes the following updates:
display_write
call directly to hardware when possible (ie when the buffer is the same size as the display). This enables the calling application to avoid the expensive memory copies associated with partial display updates by always callingdisplay_write
with a buffer the same size as the connected displayThis results in the following performance improvements in the weighted FPS of the LVGL benchmark sample:
RT595:
3 FPS -> 7 FPS
RT1170:
3 FPS -> 34 FPS
RT1060:
18 FPS -> 65 FPS