bindesc: Add initial support for binary descriptors #54464
@yonsch, I like the idea. Regarding the format of the binary info entries, maybe this could be reworked into a tag, length, value format, as this would allow arbitrary binary data to be stored. I am also interested in any remarks regarding portability. |
UPDATE: I've created another sample that reads its own descriptors and ran it on QEMU (I don't know of any way to run both a bootloader and an app on the same emulation). I could verify that this works for ARM, RISC-V (32 and 64 bit), x86 and ARC. |
This looks good, thanks for your effort!
The data can be either an integer or a pointer to a string somewhere in RODATA.
I'd prefer for this to be a contiguous and self-contained block. Once you've found the start, you can walk through the rest of it without pulling data from elsewhere or seeking into the binary. (i.e: no pointers)
Tooling to extract and display this information on a PC would also be incredibly useful.
I'd also be very keen for this to gain an equivalent to GNU's --build-id=sha1 (see #51532, ref), which is a hash over all "interesting" parts of the binary - giving confidence when matching running code to build artifacts, for debugging, etc...
That's non-trivial to get right, so being able to expand into that is fine.
@attie-argentum Thanks for your comments.
|
Love this idea! |
@ddavidebor thanks, I will add more thorough documentation once the general idea is approved. I would say that environment variables should make their way into the image through cmake: |
Yeah, some useful variables could be
|
Interesting concept @yonsch . |
Yes, the "functional" pieces.
I'm not sure how exactly
It would also be handy to have the "application" repository's commit ID too... and perhaps clean/dirty flags for the sources.
I include the following JSON BLOB (or similar) in my applications - put together using a script, and with tools to extract the info from a binary/ihex. Some of the same information is also available on the console output, though not currently parsed from the JSON at runtime.
I tend to use my application's repository as the "authority", with its own The release name/date comes from The commit info is the full commit ID, with a
{
"product": $product_name,
"build": {
"date": $build_date,
"gcc": $build_gcc_version,
"zephyr-sdk": $build_zephyr_sdk_version
},
"release": {
"name": $release_name,
"date": $release_date
},
"firmware": {
"version": $firmware_version,
"config": $firmware_config
},
"source": {
"application": $commit_application,
"zephyr": $commit_zephyr,
"hal/cmsis": $commit_hal_cmsis,
"hal/atmel": $commit_hal_atmel,
"fs/fatfs": $commit_fs_fatfs
}
}
Supporting Notes
For completeness, this is converted into a string of hex values in
static const char firmware_info[]
__attribute__((used))
__attribute__((section(".firmware_info")))
= { /* fwv= */ 0x66, 0x77, 0x76, 0x3d, /* {...} */ FIRMWARE_JSON_BLOB, /* '\0' */ 0x00 };
zephyr_linker_sources(ROM_START SORT_KEY z_firmware_info src/firmware_info.ld) |
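The conversion step described above could be sketched in Python roughly as follows. The helper names `json_to_c_initializer` and `extract_firmware_info` and the sample field values are hypothetical; only the `fwv=` marker and the trailing NUL byte come from the snippet above.

```python
import json

def json_to_c_initializer(info: dict) -> str:
    """Render a dict as comma-separated hex bytes for a C array initializer."""
    # Compact separators keep the blob small; sort_keys makes it reproducible.
    blob = json.dumps(info, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return ", ".join(f"0x{b:02x}" for b in blob)

def extract_firmware_info(image: bytes) -> dict:
    """Find the 'fwv=' marker in a raw image and decode the JSON that follows."""
    start = image.index(b"fwv=") + len(b"fwv=")
    end = image.index(b"\x00", start)  # the blob is NUL-terminated
    return json.loads(image[start:end])

# Hypothetical values, for illustration only.
info = {"product": "demo", "firmware": {"version": "1.2.3"}}
FIRMWARE_JSON_BLOB = json_to_c_initializer(info)
```

The string in `FIRMWARE_JSON_BLOB` would be passed to the compiler as a macro definition, and `extract_firmware_info` is the matching host-side reader for a raw binary.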
Maybe this can be the hash of one of the intermediate ELFs or some of their content. Then we can generate the hash as a *.c or *.o file and link it into the final binary. Thanks for the other ideas, I'll add IDs for them |
It doesn't allocate any space for the data in advance. The data is linked into the binary. As long as the architecture can jump over that section (through a jump instruction or reset vector) it would fit. |
Pinging some of the participants of #51532 , if the general idea is approved I'll get this finalized for a review: |
The underlying storage has quite a few parts in common with a data retention API I am developing, I wonder if they could be merged. I wonder if putting the position into dts would be a better fit also, the position might want to be configurable by someone, e.g. not to be at the beginning of an application but somewhere else, or they might want multiple sections at different offsets. I like the idea! |
@yonsch, as you know I like the idea. It could be used by the build system to find out image properties when combining images. However, as I was thinking about using the feature I stumbled on a problem. To allow getting, for example, the version of the bootloader, the properties are placed behind the vector table (on ARM). When using this with encrypted images, this means that the properties would become encrypted and would no longer be readable by external programs that don't know the encryption key. There is a solution to this: placing the properties before the vector table (so that they can be left unencrypted). But this means that the properties can no longer be used to find out the version of the bootloader (as the bootloader cannot add image data before the vector table). Then again, maybe it is not bad to disallow reading from the bootloader altogether, as this might provide a means to discover bootloader secrets. What is your opinion on placing the binary descriptors before the vector table? |
@Laczen - I think you'll find that the subject of "before/after vector table" is not directly compatible with other architectures... Placing this information as a block at "the front" of the image (generally speaking) would permit a small amount to be decrypted before locating this information in full. If you need to inspect details of an encrypted image without the keys, then I'd suggest this becomes a packaging issue - i.e: a disposable header followed by the encrypted payload / firmware image would be a preferable approach... thoughts? |
@attie-argentum I guess you are right, it would be possible to extract the (limited amount of) needed data from an image and put it in a (disposable) header. It would duplicate the data, but since it is extracted from the image it is certain not to differ. |
The problem I see with this is that any app trying to read the descriptors would also need the dts to know where to look for them. If the descriptors are always at the same location you always know where to find them. @Laczen @attie-argentum |
That's no different than now. You can't build mcuboot and an mcuboot image without that otherwise how does mcuboot know where the image is, what the slot size is, etc.? |
Architecture WG:
|
@yonsch given the feedback above, I would recommend you take this out of draft. |
1599b25 to ef3b1a3
subsys/bindesc/CMakeLists.txt
Outdated
# done to ensure that the timestamp is always up to date.
add_custom_target(
  bindesc_time_force_rebuild
  COMMAND ${CMAKE_COMMAND} -U BUILD_TIME_DUMMY ${CMAKE_BINARY_DIR}
follow up to comment here: #54464 (comment)
if it is purely a matter of re-invoking CMake, then there is no need for the -U, and this proposal should be sufficient:
- COMMAND ${CMAKE_COMMAND} -U BUILD_TIME_DUMMY ${CMAKE_BINARY_DIR}
+ COMMAND ${CMAKE_COMMAND} ${CMAKE_BINARY_DIR}
that said, I'm generally not fond of build system implementations where the following cannot be achieved when there are no changes to the system:
$ ninja
ninja: no work to do.
but in this case, where the behavior is guarded by CONFIG_BINDESC_BUILD_TIME_ALWAYS_REBUILD, it is acceptable.
Add an ARCH_SUPPORTS_ROM_START kconfig symbol to mark architectures that support ROM_START as an argument to zephyr_linker_sources. This was added so that features relying on this feature could depend on this kconfig symbol. Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add three macros: sys_uint{16,32,64}_to_array, to convert integers to byte arrays in a byte order aware manner. For example, sys_uint16_to_array(0x0123) evaluates to: {0x01, 0x23} for big endian machines, and {0x23, 0x01} for little endian machines. Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
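The byte-order behaviour described in the commit message above can be illustrated in Python with `int.to_bytes`. This is not the C macro itself: the helper name `uint_to_array` is hypothetical, and the byte order is passed explicitly here rather than taken from the target.

```python
def uint_to_array(value: int, width: int, byteorder: str) -> list:
    """Split an unsigned integer into `width` bytes in the given byte order."""
    return list(value.to_bytes(width, byteorder))

big = uint_to_array(0x0123, 2, "big")        # [0x01, 0x23]
little = uint_to_array(0x0123, 2, "little")  # [0x23, 0x01]
```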
Add tests for sys_uint*_to_array macros to the byteorder suite. Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Binary descriptors are data objects stored at a known location of a binary image. They can be read by an external tool or image, and are used mostly for build information: version, build time, host information, etc. This commit adds initial support for defining such descriptors. Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
ef3b1a3 to 18c5688
scripts/west_commands/bindesc.py
Outdated
self.bindesc_gen_tag(self.TYPE_UINT, 0x801): 'APP_VERSION_MAJOR',
self.bindesc_gen_tag(self.TYPE_UINT, 0x802): 'APP_VERSION_MINOR',
self.bindesc_gen_tag(self.TYPE_UINT, 0x803): 'APP_VERSION_PATCHLEVEL',
self.bindesc_gen_tag(self.TYPE_UINT, 0x803): 'APP_VERSION_NUMBER',
*0x804
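A repeated key in a Python dict literal is not an error: the last value silently wins, so the duplicate 0x803 would have dropped APP_VERSION_PATCHLEVEL from the table. One possible guard is to build the table from a list of pairs and reject repeats. In this sketch the tag layout in `gen_tag` (type in the upper 4 bits, ID in the lower 12) and the value of `TYPE_UINT` are assumptions for illustration, not taken from the diff, and both function names are hypothetical.

```python
TYPE_UINT = 0x0  # assumed type code, for illustration only

def gen_tag(type_code: int, tag_id: int) -> int:
    """Combine a type code and an ID into one 16-bit tag (assumed layout)."""
    return ((type_code & 0xF) << 12) | (tag_id & 0xFFF)

def build_tag_table(pairs):
    """Build {tag: name}, raising if the same tag appears twice."""
    table = {}
    for tag, name in pairs:
        if tag in table:
            raise ValueError(f"tag {tag:#06x} used by {table[tag]} and {name}")
        table[tag] = name
    return table

fixed = build_tag_table([
    (gen_tag(TYPE_UINT, 0x801), "APP_VERSION_MAJOR"),
    (gen_tag(TYPE_UINT, 0x802), "APP_VERSION_MINOR"),
    (gen_tag(TYPE_UINT, 0x803), "APP_VERSION_PATCHLEVEL"),
    (gen_tag(TYPE_UINT, 0x804), "APP_VERSION_NUMBER"),
])
```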
Heh, good catch!
Very nice indeed
Added the bindesc command to west, for working with binary descriptors. Currently it supports dump, list and search subcommands, for bin, hex, elf and uf2 file types. Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add the hello_bindesc sample which shows the basic usage of binary descriptors. Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add documentation for binary descriptors under "OS Services" Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Added tests for the bindesc subsystem, testing definition of binary descriptors on several qemu architectures, and using several C standards (c99, c11, etc.). Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
0f9e34a
18c5688 to 0f9e34a
.. code-block:: bash

   west bindesc dump build/zephyr/zephyr.bin
would be nice to have more details in the readme, at least the output of dump...
(note that when testing with native_posix, zephyr.bin does not exist)
Thanks everyone! |
This looks really great - well done all involved, especially @yonsch.
Apologies for missing the merge - my only real comment is about timestamps... can we explicitly use UTC, and ISO 8601 format?
config BINDESC_BUILD_DATE_TIME_STRING_FORMAT
	depends on BINDESC_BUILD_DATE_TIME_STRING
	string "Date-Time format"
	default "%Y/%m/%d %H:%M:%S"
I'd have a preference for ISO 8601 timestamps and explicitly UTC... i.e: %Y-%m-%dT%H:%M:%SZ
Same for BINDESC_BUILD_DATE_STRING_FORMAT (%Y-%m-%d) and BINDESC_BUILD_TIME_STRING_FORMAT (%H:%M:%SZ).
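For reference, this is what the suggested format strings produce when applied to a UTC time with Python's `strftime`. The sample instant is arbitrary and fixed only so that the output is reproducible.

```python
from datetime import datetime, timezone

# Arbitrary sample instant in UTC.
build_time = datetime(2023, 5, 17, 9, 30, 5, tzinfo=timezone.utc)

date_time = build_time.strftime("%Y-%m-%dT%H:%M:%SZ")  # "2023-05-17T09:30:05Z"
date_only = build_time.strftime("%Y-%m-%d")            # "2023-05-17"
time_only = build_time.strftime("%H:%M:%SZ")           # "09:30:05Z"
```

The trailing Z is a literal character in the format string, so it is only truthful when the timestamp itself is generated in UTC. CMake's string(TIMESTAMP) accepts a UTC argument for that purpose.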
macro(gen_build_time_int_definition def_name format)
  if(CONFIG_BINDESC_${def_name})
    string(TIMESTAMP ${def_name} ${format})
Add UTC?
macro(gen_build_time_str_definition def_name format)
  if(CONFIG_BINDESC_${def_name})
    string(TIMESTAMP ${def_name} ${${format}})
Add UTC?
Binary Descriptors
Binary Descriptors are strings and integers that describe a binary executable. With a bit of linker trickery, these descriptors are placed at a known location in the binary. This allows a host tool to read these descriptors, even if only a bin file is present. In the future it may allow different images on the same device to read each other's descriptors. This is useful mostly for interaction between an app and a bootloader - a bootloader reading the app's version, for example.
Binary Descriptors were inspired by pico-sdk's binary_info.
Working principles
Binary descriptors are implemented with a TLV (tag, length, value) header linked to a known offset in the binary image. This offset may vary between architectures, but generally the descriptors are linked as close to the beginning of the image as possible. In architectures where the image must begin with a vector table (such as ARM), the descriptors are linked right after the vector table. The reset vector points to the beginning of the text section, which is after the descriptors. In architectures where the image must begin with executable code (e.g. x86), a jump instruction is injected at the beginning of the image, in order to skip over the binary descriptors, which are right after the jump instruction.
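As a rough sketch, a host tool could walk such a TLV block like this once it has located the start. The layout used here is an assumption made for illustration (little-endian 16-bit tag, 16-bit length, value padded to a 4-byte boundary, tag 0xFFFF as terminator) and is not taken from this PR. The names `walk_tlv` and `make_entry` and the sample tag values are hypothetical.

```python
import struct

TERMINATOR = 0xFFFF  # assumed end-of-descriptors tag

def walk_tlv(blob: bytes, offset: int = 0):
    """Yield (tag, value) pairs from a contiguous TLV block."""
    while offset + 4 <= len(blob):
        tag, length = struct.unpack_from("<HH", blob, offset)
        if tag == TERMINATOR:
            return
        offset += 4
        value = blob[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated descriptor")
        yield tag, value
        # Skip the value plus padding up to the next 4-byte boundary.
        offset += (length + 3) & ~3

def make_entry(tag: int, value: bytes) -> bytes:
    """Encode one entry in the assumed layout (used to build sample data)."""
    pad = -len(value) % 4
    return struct.pack("<HH", tag, len(value)) + value + b"\x00" * pad

# Sample block: one string, one 32-bit integer, then the terminator.
blob = (make_entry(0x1800, b"1.0.0\x00")
        + make_entry(0x0801, struct.pack("<I", 1))
        + struct.pack("<HH", TERMINATOR, 0))
entries = list(walk_tlv(blob))
```

Because every entry carries its own length, the reader never follows a pointer out of the block, which is what makes the descriptors readable from a bare bin file.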
A user can define any ID for any use case, but this will most likely be used for version strings and build timestamps. For these typical use cases, IDs are defined in bindesc.h. Also, default declarations for timestamp and version are provided and can be enabled with Kconfig (see bindesc_build_time.c as an example).
Platform support
Binary descriptors have proven portable across various QEMU machines, but more testing is needed in this area.
C standard support
Toolchain support
Signed-off-by: Yonatan Schachter yonatan.schachter@gmail.com