Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bindesc: Add initial support for binary descriptors #54464

Merged
merged 8 commits into from Sep 28, 2023

Conversation

yonsch
Copy link
Contributor

@yonsch yonsch commented Feb 5, 2023

Binary Descriptors

Binary Descriptors are strings and integers that describe a binary executable. With a bit of linker trickery, these descriptors are placed at a known location in the binary. This allows a host tool to read these descriptors, even if only a bin file is present. In the future it may allow different images on the same device to read each other's descriptors. This is useful mostly for interaction between an app and a bootloader - a bootloader reading the app's version, for example.

Binary Descriptors were inspired by pico-sdk's binary_info.

Working principles

Binary descriptors are implemented with a TLV (tag, length, value) header linked to a known offset in the binary image. This offset may vary between architectures, but generally the descriptors are linked as close to the beginning of the image as possible. In architectures where the image must begin with a vector table (such as ARM), the descriptors are linked right after the vector table. The reset vector points to the beginning of the text section, which is after the descriptors. In architectures where the image must begin with executable code (e.g. x86), a jump instruction is injected at the beginning of the image, in order to skip over the binary descriptors, which are right after the jump instruction.

A user can define any ID for any use case, but this will most likely be used for version strings and build timestamps. For these typical use cases, IDs are defined in bindesc.h. Also, default declarations for timestamp and version are provided and can be enabled with Kconfig (see bindesc_build_time.c as an example).

Platform support

Binary descriptors have been proven to be pretty portable on various QEMU machines, but more testing needs to be done in this area.

Platform Builds west bindesc works Runs (renode/qemu) Runs (hardware)
ARM Cortex-M ✔️ ✔️ ✔️ ✔️
native_posix ✔️ ✔️ N/A ✔️
RISC-V ✔️ ✔️ ✔️
ARC ✔️ ✔️ ✔️
x86 ✔️ ✔️ ✔️

C standard support

Standard Supported
c99 ✔️
c11 ✔️
c17 ✔️
gnu99 ✔️
gnu11 ✔️
gnu17 ✔️
c++17 ✔️

Toolchain support

Toolchain Supported
zephyr-sdk (GCC) ✔️

Signed-off-by: Yonatan Schachter yonatan.schachter@gmail.com

@Laczen
Copy link
Collaborator

Laczen commented Feb 6, 2023

@yonsch, I like the idea.

Regarding the format of the binary info entries maybe this could be reworked to a tag, length, value format as this would allow whatever binary data to be stored.

I am also interested in any remarks regarding portability.

@yonsch
Copy link
Contributor Author

yonsch commented Feb 6, 2023

UPDATE: I've created another sample that reads its own descriptors and ran it on QEMU (I don't know of any way to run both a bootloader and an app on the same emulation). I could verify that this works for ARM, RISC-V (32 and 64 bit), x86 and ARC.
For x86 where execution starts at the beginning of the executable, a jump instruction is added that skips over the binary descriptors.

Copy link
Member

@attie-argentum attie-argentum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks for your effort!

The data can be either an integer or a pointer to a string somewhere in RODATA.

I'd prefer for this to be a contiguous and self-contained block. Once you've found the start, you can walk through the rest of it without pulling data from elsewhere or seeking into the binary. (i.e: no pointers)

Tooling to extract and display this information on a PC would also be incredibly useful.


I'd also be very keen for this to gain an equivalent to GNU's --build-id=sha1 (see #51532, ref), which is a hash over all "interesting" parts of the binary - giving confidence when matching running code to build artifacts, for debugging, etc...

That's non-trivial to get right, so being able to expand into that is fine.

include/zephyr/bin_descriptors.h Outdated Show resolved Hide resolved
include/zephyr/bin_descriptors.h Outdated Show resolved Hide resolved
subsys/bin_descriptors/bin_descriptors.ld Outdated Show resolved Hide resolved
@yonsch
Copy link
Contributor Author

yonsch commented Feb 7, 2023

@attie-argentum Thanks for your comments.

  • I've changed the structure to id/len/data. It introduces some more alignment issues, but I also think it's better (unless this would cause a problem down the road where the header would get too large).

  • A python script for displaying the descriptors was also added. It's very clumsy but I'll make it better when we're settled on the header format. I hope to have this as a west extension, and in the future to connect it with a runner so that it would be possible to tell what's currently flashed on a device.

  • A function for searching a specific descriptor was added.

  • Regarding the hash - is that a hash over the entire image? I don't think it's possible to calculate this at link time, so perhaps we can fill it with zeros, and then have a post build script replace it with the correct hash.

@ddavidebor
Copy link

Love this idea!
If you end up merging it, please consider adding documentation that explains how to sneak in these Binary Descriptors from environment variables.

@yonsch
Copy link
Contributor Author

yonsch commented Feb 11, 2023

@ddavidebor thanks, I will add a more thorough documentation once the general idea is approved. I would say that environment variables should make their way into the image through cmake: zephyr_compile_definitions(VAR="$ENV{VAR}") and then DECLARE_BIN_DESC_STRING(name, VAR).
Were you thinking of any particular variable? I might add it to the common IDs...

@ddavidebor
Copy link

@yonsch

Yeah, some useful variables could be

  • toolchain version
  • some sort of computer ID
  • GitHub commit hash
  • CI build number

@JonathanVanWin
Copy link
Contributor

Interesting concept @yonsch .
We also use something similar to this at my place, exactly for storing the compiled version of the binary and other stuff.
How much place do we allocate for this IDs?
What if we need more place, can we change how much memory is allocated for this?

@attie-argentum
Copy link
Member

attie-argentum commented Feb 13, 2023

@yonsch

Regarding the hash - is that a hash over the entire image?

Yes, the "functional" pieces.

I don't think it's possible to calculate this at link time, so perhaps we can fill it with zeros, and then have a post build script replace it with the correct hash.

I'm not sure how exactly ld calculates the "Build ID", but it's an approach that would be worth reproducing. Zeroes with binary patch sounds plausible, but a little risky, especially if you're acting on a single file (rather than different filenames for pre/post patch).

@ddavidebor

GitHub commit hash

It would also be handy to have the "application" repository's commit ID too... and perhaps clean/dirty flags for the sources.


I include the following JSON BLOB (or similar) in my applications - put together using a script, and with tools to extract the info from a binary/ihex. Some of the same information is also available on the console output, though not currently parsed from the JSON at runtime.

I tend to use my application's repository as the "authority", with it's own west.yml, etc... so all sources can be found if built from a clean tree. I think the commit ID and clean/dirty indication of supporting projects is also important.

The release name/date comes from git describe --tag --exact-match --match 'v*.*' --dirty='+' from the application (not Zephyr).

The commit info is the full commit ID, with a + appended for dirty trees.

{
	"product": $product_name,
	"build": {
		"date":        $build_date,
		"gcc":         $build_gcc_version,
		"zephyr-sdk":  $build_zephyr_sdk_version
	},
	"release": {
		"name":        $release_name,
		"date":        $release_date,
	},
	"firmware": {
		"version":     $firmware_version,
		"config":      $firmware_config
	},
	"source": {
		"application": $commit_application,
		"zephyr":      $commit_zephyr,
		"hal/cmsis":   $commit_hal_cmsis,
		"hal/atmel":   $commit_hal_atmel,
		"fs/fatfs":    $commit_fs_fatfs
	}
}
Supporting Notes

For completeness, this is converted into a string of hex values in src/firmware_info.h, as FIRMWARE_JSON_BLOB below...

src/firmware_info.c:

static const char firmware_info[]
__attribute__((used))
__attribute__((section(".firmware_info")))
= { /* fwv= */ 0x66, 0x77, 0x76, 0x3d, /* {...} */ FIRMWARE_JSON_BLOB, /* '\0' */ 0x00 };

src/firmware_info.ld:

/* firmware JSON info */
KEEP(*(.firmware_info))

CMakeLists.txt:

zephyr_linker_sources(ROM_START SORT_KEY z_firmware_info src/firmware_info.ld)

@yonsch
Copy link
Contributor Author

yonsch commented Feb 14, 2023

@attie-argentum

Zeroes with binary patch sounds plausible, but a little risky, especially if you're acting on a single file (rather than different filenames for pre/post patch).

Maybe this can be the hash of one of the intermediate ELFs or some of their content. Then we can generate the hash as a *.c or *.o file and link it into the final binary.

Thanks for the other ideas, I'll add IDs for them

@yonsch
Copy link
Contributor Author

yonsch commented Feb 14, 2023

@JonathanVanWin

How much place do we allocate for this IDs?
What if we need more place, can we change how much memory is allocated for this?

It doesn't allocate any space for the data in advance. The data is linked into the binary. As long as the architecture can jump over that section (through a jump instruction or reset vector) it would fit.

@yonsch
Copy link
Contributor Author

yonsch commented Feb 20, 2023

Pinging some of the participants of #51532 , if the general idea is approved I'll get this finalized for a review:
@JordanYates @nordicjm @ceolin @tejlmand @stephanosio

@nordicjm
Copy link
Collaborator

The underlying storage has quite a few parts in common with a data retention API I am developing, I wonder if they could be merged. I wonder if putting the position into dts would be a better fit also, the position might want to be configurable by someone, e.g. not to be at the beginning of an application but somewhere else, or they might want multiple sections at different offsets.

I like the idea!

@Laczen
Copy link
Collaborator

Laczen commented Feb 21, 2023

@yonsch, as you know I like the idea. It could be used by the build system to find out image properties when combining images.

However as I was thinking into using the feature I stumbled on a problem. To allow getting for example the version of the bootloader the properties are placed behind the vector table (on arm). When using this with encrypted images this means that the properties would become encrypted and are no longer readable by external programs that don't have an idea of the encryption key. There is a solution to this by placing the properties before the vector table (so that they can be left unencrypted), but this means that the properties can no longer be used to find out the version of the bootloader (as the bootloader cannot add image data before the vector table). But then again maybe it is not bad to disallow reading from the bootloader al together as this might provide means to discover bootloader secrets.

What is your opinion on placing the binary descriptors before the vector table ?

@attie-argentum
Copy link
Member

attie-argentum commented Feb 21, 2023

What is your opinion on placing the binary descriptors before the vector table ?

@Laczen - I think you'll find that the subject of "before/after vector table" is not directly compatible with other architectures... Placing this information as a block at "the front" of the image (generally speaking) would permit a small amount to be decrypted before locating this information in full.

If you need to inspect details of an encrypted image without the keys, then I'd suggest this becomes a packaging issue - i.e: a disposable header followed by the encrypted payload / firmware image would be a preferable approach... thoughts?

@Laczen
Copy link
Collaborator

Laczen commented Feb 21, 2023

What is your opinion on placing the binary descriptors before the vector table ?

If you need to inspect details of an encrypted image without the keys, then I'd suggest this becomes a packaging issue - i.e: a disposable header followed by the encrypted payload / firmware image would be a preferable approach... thoughts?

@attie-argentum I guess you are right, it would be possible to extract the (limited amount of) needed data from a image and put it in a (disposable) header. It would duplicate the data but when it is extracted from the image it is certain not to be different.

@yonsch
Copy link
Contributor Author

yonsch commented Feb 21, 2023

@nordicjm

I wonder if putting the position into dts would be a better fit also, the position might want to be configurable by someone, e.g. not to be at the beginning of an application but somewhere else, or they might want multiple sections at different offsets.

The problem I see with this is that any app trying to read the descriptors would also need the dts to know where to look for them. If the descriptors are always at the same location you always know where to find them.

@Laczen @attie-argentum
I really haven't thought about encryption, and my knowledge in that area is pretty limited. I suppose that whatever tool is used to encrypt the image skips the first X bytes of the image, as they might include a header and the vector table. In that case, we only have to tell that tool to also skip the descriptors. But again, I've never done this before so I have no idea how hard this might be.

@nordicjm
Copy link
Collaborator

The problem I see with this is that any app trying to read the descriptors would also need the dts to know where to look for them. If the descriptors are always at the same location you always know where to find them.

That's no different than now. You can't build mcuboot and an mcuboot image without that otherwise how does mcuboot know where the image is, what the slot size is, etc.?

@carlescufi
Copy link
Member

carlescufi commented Feb 28, 2023

Architecture WG:

  • @henrikbrixandersen mentions that this functionality is implemented in their downstream, with the objective of being able to find out the mcuboot version from the app
  • @Laczen mentions that it might be a security risk to be able to read the bootloader. So one of the biggest use cases, being able to get which mcuboot image version, is by default insecure because it requires read access to the bootloader. That said, @Laczen mentions that he thinks it's a good proposal

@carlescufi
Copy link
Member

Architecture WG:

  • @henrikbrixandersen mentions that this functionality is implemented in their downstream, with the objective of being able to find out the mcuboot version from the app
  • @Laczen mentions that it might be a security risk to be able to read the bootloader. So one of the biggest use cases, being able to get which mcuboot image version, is by default insecure because it requires read access to the bootloader. That said, @Laczen mentions that he thinks it's a good proposal

@yonsch given the feedback above, I would recommend you take this out of draft.

@nordicjm
Copy link
Collaborator

@yonsch I've created this PR for adding an application version Kconfig option which will work with imgtool signing and might be of use for this too, #55814 can you have a look and see if you think if it would be a good fit?

# done to ensure that the timestamp is always up to date.
add_custom_target(
bindesc_time_force_rebuild
COMMAND ${CMAKE_COMMAND} -U BUILD_TIME_DUMMY ${CMAKE_BINARY_DIR}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow up to comment here: #54464 (comment)

if it is purely a matter of re-invoking CMake, then there is no need for the -U, and this proposal should be sufficient:

Suggested change
COMMAND ${CMAKE_COMMAND} -U BUILD_TIME_DUMMY ${CMAKE_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} ${CMAKE_BINARY_DIR}

that said, then i'm generally not fond of build system implementation where the following cannot be achieved when there are no changes to the system:

$ ninja
ninja: no work to do.

but in this case where the behavior is guarded by CONFIG_BINDESC_BUILD_TIME_ALWAYS_REBUILD then it is accepted.

Add an ARCH_SUPPORTS_ROM_START kconfig symbol to mark architectures
that support ROM_START as an argument to zephyr_linker_sources.
This was added so that features relying on this feature could
depend on this kconfig symbol.

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add three macros: sys_uint{16,32,64}_to_array, to convert
integers to byte arrays in a byte order aware manner.
For example, sys_uint16_to_array(0x0123) evaluates to:
{0x01, 0x23} for big endian machines, and {0x23, 0x01} for
little endian machines.

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add tests for sys_uint*_to_array macros to the byteorder suite.

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Binary descriptors are data objects stored at a known location
of a binary image. They can be read by an external tool or image,
and are used mostly for build information: version, build time,
host information, etc.
This commit adds initial support for defining such descriptors.

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
tejlmand
tejlmand previously approved these changes Sep 27, 2023
carlescufi
carlescufi previously approved these changes Sep 27, 2023
self.bindesc_gen_tag(self.TYPE_UINT, 0x801): 'APP_VERSION_MAJOR',
self.bindesc_gen_tag(self.TYPE_UINT, 0x802): 'APP_VERSION_MINOR',
self.bindesc_gen_tag(self.TYPE_UINT, 0x803): 'APP_VERSION_PATCHLEVEL',
self.bindesc_gen_tag(self.TYPE_UINT, 0x803): 'APP_VERSION_NUMBER',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*0x804

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice indeed

de-nordic
de-nordic previously approved these changes Sep 27, 2023
Added the bindesc command to west, for working with binary
descriptors. Currently it supports dump, list and search
subcommands, for bin, hex, elf and uf2 file types.

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add the hello_bindesc sample which shows the basic usage of
binary descriptors.

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Add documentation for binary descriptors under "OS Services"

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>
Added tests for the bindesc subsystem, testing definition of
binary descriptors on several qemu architectures, and using
several C standards (c99, c11, etc.).

Signed-off-by: Yonatan Schachter <yonatan.schachter@gmail.com>

.. code-block:: bash

west bindesc dump build/zephyr/zephyr.bin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have more details in the readme, at least the output of dump...

(note that when testing with native_posix, zephyr.bin does not exist)

@nashif nashif merged commit 859be8c into zephyrproject-rtos:main Sep 28, 2023
43 checks passed
@yonsch
Copy link
Contributor Author

yonsch commented Sep 28, 2023

Thanks everyone!

@yonsch yonsch deleted the bindesc_demo branch September 28, 2023 12:47
Copy link

@attie attie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great - well done all involved, especially @yonsch.

Apologies for missing the merge - my only real comment is about timestamps... can we explicitly use UTC, and ISO 8601 format?

config BINDESC_BUILD_DATE_TIME_STRING_FORMAT
depends on BINDESC_BUILD_DATE_TIME_STRING
string "Date-Time format"
default "%Y/%m/%d %H:%M:%S"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have a preference for ISO 8601 timestamps and explicitly UTC... i.e: %Y-%m-%dT%H:%M:%SZ

Same for BINDESC_BUILD_DATE_STRING_FORMAT (%Y-%m-%d) and BINDESC_BUILD_TIME_STRING_FORMAT (%H:%M:%SZ).


macro(gen_build_time_int_definition def_name format)
if(CONFIG_BINDESC_${def_name})
string(TIMESTAMP ${def_name} ${format})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add UTC?


macro(gen_build_time_str_definition def_name format)
if(CONFIG_BINDESC_${def_name})
string(TIMESTAMP ${def_name} ${${format}})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add UTC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM ARM (32-bit) Architecture area: Base OS Base OS Library (lib/os) area: Coding Guidelines Coding guidelines and style area: Continuous Integration area: Documentation area: Kernel area: West West utility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet