Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New hostfxr hosting API design - discussion #112984

Open
ivanpovazan opened this issue Feb 27, 2025 · 28 comments
Open

New hostfxr hosting API design - discussion #112984

ivanpovazan opened this issue Feb 27, 2025 · 28 comments
Labels
area-Host discussion os-android os-browser Browser variant of arch-wasm os-ios Apple iOS os-wasi Related to WASI variant of arch-wasm
Milestone

Comments

@ivanpovazan
Copy link
Member

Description

It is needless to say that startup performance is critical for mobile applications. As we are exploring the support of CoreCLR for mobile we have identified several inefficiencies with currently available hosting APIs which need to be addressed in order to achieve best performance possible.

One example of such inefficiency is related to how the runtime properties can be passed to the runtime during initialization.
If we consider CoreCLR's private hosting APIs, like coreclr_initialize, runtime configuration properties' values are accepted as UTF8 strings. Which means that pointers need to be converted to UTF8 strings which are internally converted back to pointers. The conversion pointer-string-pointer seems redundant and negatively impacts the performance during startup. Additionally, this conversion could break tagged pointers on ARMv8+.
On the other hand, the public hosting hostfxr APIs support passing runtime configuration properties as a json file, which is also not ideal.

Going forward, we would like to avoid using private hosting APIs and use hostfxr APIs instead.

The goal of this issue is to open up a discussion between our hosting and iOS/Android/WASM experts in order to collect list of requirements specific to each platform and from that design new highly performant hosting API to successfully support CoreCLR on them.

@ivanpovazan ivanpovazan added area-Host discussion os-android os-browser Browser variant of arch-wasm os-ios Apple iOS os-wasi Related to WASI variant of arch-wasm labels Feb 27, 2025
@ivanpovazan ivanpovazan added this to the 10.0.0 milestone Feb 27, 2025
Copy link
Contributor

Tagging subscribers to 'arch-android': @vitek-karas, @simonrozsival, @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Tagging subscribers to 'os-ios': @vitek-karas, @kotlarmilos, @ivanpovazan, @steveisok, @akoeplinger
See info in area-owners.md if you want to be subscribed.

@ivanpovazan
Copy link
Member Author

@rolfbjarne
Copy link
Member

Going forward, we would like to avoid using private hosting APIs and use hostfxr APIs instead.

What's wrong with the private hosting APIs?

From https://github.com/dotnet/runtime/blob/main/docs/design/features/native-hosting.md:

hostfxr and hostpolicy APIs [...] These can execute application, determine available SDKs, determine native dependency locations, resolve component dependencies and so on.

That sounds like a massive overkill for what we need it for. We know which SDK to use (it's embedded in the app), we know where all the native dependencies are (also all in the app), etc.

Our startup needs are really rather simple:

  1. First we need to call a specific managed method, which will initialize all our managed logic.
  2. Optionally call the managed Main method.
    • App extensions don't have a managed Main method, the entry point into managed code can be any API we've exposed to Objective-C.

Note that for NativeAOT we don't neither of those two points, because we expose both our initialization method + the managed Main method using [UnmanagedCallersOnly] entry points we just call from native code.

So I think our needs can be stated rather simply: we can figure out a lot at build time, so we need a hosting API that takes advantage of that fact.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2025

Which means that pointers need to be converted to UTF8 strings which are internally converted back to pointers.

This assumes that the source of truth for the property values are UTF16 strings. Is it really the case?

In general, file formats and the Unix APIs are UTF8 that makes UTF8/UTF16 conversions very common. A few more should not be a big deal.

BTW: Our C/C++ implementation of UTF8/UTF16 conversions seems to be leaving perf on the table, but it have not showed up on the radar except for this PR micro-benchmark: #102424 (comment)

Additionally, this conversion could break tagged pointers on ARMv8+.

I do not see a problem here.

list of requirements

My list would be:

@rolfbjarne
Copy link
Member

Which means that pointers need to be converted to UTF8 strings which are internally converted back to pointers.

This assumes that the source of truth for the property values are UTF16 strings. Is it really the case?

No, for some of the properties the source of truth is a pointer - some properties are function pointers.

One example is PINVOKE_OVERRIDE, we do:

char *pinvokeOverride = strdup_printf ("%p", &xamarin_pinvoke_override); // xamarin_pinvoke_override is a C function
coreclr_initialize (..., pinvokeOverride, ...); // very pseudo code here

@rolfbjarne
Copy link
Member

For iOS apps we generate C code, so we can provide TRUSTED_PLATFORM_ASSEMBLIES in pretty much any format.

One idea could be a constant presorted list of constant (UTF8 or UTF16) strings the runtime can do a binary search in for instance.

@steveisok
Copy link
Member

This should be resurrectable. Had a hard time with the hosting tests if I remember correctly.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2025

No, for some of the properties the source of truth is a pointer - some properties are function pointers.
One example is PINVOKE_OVERRIDE, we do:

Ah ok, this one. I disagree that this negatively impacts the performance during startup. The overhead of these roundtrips is miniscule. I agree that this is not pretty, and it is fine to consider it during the design.

For iOS apps we generate C code, so we can provide TRUSTED_PLATFORM_ASSEMBLIES in pretty much any format.
One idea could be a constant presorted list of constant (UTF8 or UTF16) strings the runtime can do a binary search in for instance.

The runtime does not actually need the list. The runtime just needs a callback that returns the payload (file name, pointer to mapped memory, etc.) for the given assembly name.

@grendello
Copy link
Contributor

No, for some of the properties the source of truth is a pointer - some properties are function pointers.
One example is PINVOKE_OVERRIDE, we do:

Ah ok, this one. I disagree that this negatively impacts the performance during startup. The overhead of these roundtrips is miniscule. I agree that this is not pretty, and it is fine to consider it during the design.

Let me describe the process here:

  1. Host converts a pointer to a hexadecimal UTF8 string (or an integer one, more on that later)
  2. coreclr_initialize first must walk over the entire collection of properties (30 at this point for a dotnet new android app), using strcmp over potentially each of them to find the pointer property.
  3. coreclr_initialie converts the string to Unicode - which requires two passes over the string, one to determine the desired destination buffer length and the other to do the actual conversion.
  4. Destination string is malloc-ed
  5. Unicode string is sent to another API which then immediately converts it back to UTF8
  6. The UTF8 string is converted back to an integer by calling strtoull

This process is repeated for up to 3 pointers currently (if the host optimizes it, it's just a single pointer).

None of those steps are necessary, and it's a 100% waste of time. Dismissing such "insignificant" delays is what leads to software that underperforms. It is especially important on mobile to not ignore such time and resource waste.

@grendello
Copy link
Contributor

I fully agree with @rolfbjarne that we need simple, directly callable API and that hoxstfxr is a serious overkill for both iOS and Android apps. We have a huge opportunity to optimize for zero-cost operations for many things here. Properties, for instance, could be pre-converted at build time to Unicode and their values sent to the runtime not as strings, but as concrete (limited) types (e.g. bool, int, float) - other property values could be sent as Unicode strings.

@grendello
Copy link
Contributor

Which means that pointers need to be converted to UTF8 strings which are internally converted back to pointers.

This assumes that the source of truth for the property values are UTF16 strings. Is it really the case?

In general, file formats and the Unix APIs are UTF8 that makes UTF8/UTF16 conversions very common. A few more should not be a big deal.

BTW: Our C/C++ implementation of UTF8/UTF16 conversions seems to be leaving perf on the table, but it have not showed up on the radar except for this PR micro-benchmark: #102424 (comment)

Additionally, this conversion could break tagged pointers on ARMv8+.

I do not see a problem here.

Here's an example of what happens when a tagged pointer is broken:

02-17 20:44:59.545 25657 25657 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
02-17 20:44:59.545 25657 25657 F DEBUG   : Build fingerprint: 'google/shiba/shiba:15/AP4A.250105.002/12701944:user/release-keys'
02-17 20:44:59.545 25657 25657 F DEBUG   : Revision: 'MP1.0'
02-17 20:44:59.545 25657 25657 F DEBUG   : ABI: 'arm64'
02-17 20:44:59.545 25657 25657 F DEBUG   : Timestamp: 2025-02-17 20:44:59.470501659+0100
02-17 20:44:59.545 25657 25657 F DEBUG   : Process uptime: 1s
02-17 20:44:59.545 25657 25657 F DEBUG   : Cmdline: com.xamarin.mauiperftest
02-17 20:44:59.545 25657 25657 F DEBUG   : pid: 25635, tid: 25635, name: in.mauiperftest  >>> com.xamarin.mauiperftest <<<
02-17 20:44:59.545 25657 25657 F DEBUG   : uid: 10297
02-17 20:44:59.545 25657 25657 F DEBUG   : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
02-17 20:44:59.545 25657 25657 F DEBUG   : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
02-17 20:44:59.545 25657 25657 F DEBUG   : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
02-17 20:44:59.545 25657 25657 F DEBUG   : Abort message: 'Pointer tag for 0x758bc5e7d5 was truncated, see 'https://source.android.com/devices/tech/debug/tagged-pointers'.'
02-17 20:44:59.545 25657 25657 F DEBUG   :     x0  0000000000000000  x1  0000000000006423  x2  0000000000000006  x3  0000007fea9445a0
02-17 20:44:59.545 25657 25657 F DEBUG   :     x4  7260761f34633664  x5  7260761f34633664  x6  7260761f34633664  x7  7f7f7f7f7f7f7f7f
02-17 20:44:59.545 25657 25657 F DEBUG   :     x8  00000000000000f0  x9  34f7a19f88f3c120  x10 0000000000000001  x11 00000078b3e89a80
02-17 20:44:59.545 25657 25657 F DEBUG   :     x12 0000000067b391bb  x13 000000007fffffff  x14 00000000003e02dc  x15 000000cc33ac402a
02-17 20:44:59.545 25657 25657 F DEBUG   :     x16 00000078b3ef3058  x17 00000078b3edd590  x18 00000078dc4fc000  x19 0000000000006423
02-17 20:44:59.545 25657 25657 F DEBUG   :     x20 0000000000006423  x21 00000000ffffffff  x22 0000007fea9451b8  x23 00000078db7e8a80
02-17 20:44:59.545 25657 25657 F DEBUG   :     x24 000000758e994690  x25 0000007fea945300  x26 0000007fea945248  x27 0000000000000001
02-17 20:44:59.545 25657 25657 F DEBUG   :     x28 0000007fea945ca0  x29 0000007fea944620
02-17 20:44:59.545 25657 25657 F DEBUG   :     lr  00000078b3e72098  sp  0000007fea9445a0  pc  00000078b3e720bc  pst 0000000000001000
02-17 20:44:59.545 25657 25657 F DEBUG   : 3 total frames
02-17 20:44:59.545 25657 25657 F DEBUG   : backtrace:
02-17 20:44:59.545 25657 25657 F DEBUG   :       #00 pc 000000000005e0bc  /apex/com.android.runtime/lib64/bionic/libc.so (abort+156) (BuildId: d607b2dd86e0ffc603529ce13afab7fa)
02-17 20:44:59.545 25657 25657 F DEBUG   :       #01 pc 0000000000045508  /apex/com.android.runtime/lib64/bionic/libc.so (free+104) (BuildId: d607b2dd86e0ffc603529ce13afab7fa)
02-17 20:44:59.545 25657 25657 F DEBUG   :       #02 pc 00000000000b800c  /memfd:doublemapper (deleted) (offset 0x161000)
02-17 20:44:59.554  1633 25661 I DropBoxManagerService: add tag=data_app_native_crash isTagEnabled=true flags=0x2

It will become a more pronounced problem as more devices use ARMv8 with tagging extensions or ARMv9. We most certainly cannot ignore it nor can we risk breaking any pointers (via e.g. casting) in any way.

@grendello
Copy link
Contributor

grendello commented Feb 27, 2025

No, for some of the properties the source of truth is a pointer - some properties are function pointers.
One example is PINVOKE_OVERRIDE, we do:

Ah ok, this one. I disagree that this negatively impacts the performance during startup. The overhead of these roundtrips is miniscule. I agree that this is not pretty, and it is fine to consider it during the design.

The overhead of the entire property set (30 properties, 60 strings) conversion costs 15% of the total coreclr_initialize time in my testing, on a fast device (Pixel 8). This cost will be bigger on slower devices. On Pixel 8 it's "just" 4ms, but considering that the entire startup time of a plain Android app using Mono runtime takes 167ms, the 4ms are significant (especially that they are just one of an entire set of equally "small" amounts of time spent doing something that doesn't have to be done or doing it less efficiently that it could be done). We must not, and I can't stress it more, ignore even the smallest inefficiencies.

@grendello
Copy link
Contributor

My list would be:

* Allow efficient zero-copy loading of both IL and native code. The scheme implemented in [Allow R2R for images provided via external_assembly_probe #112934](https://github.com/dotnet/runtime/pull/112934) ends up copying all native code. It is inefficient (megabytes of memory copied), and it won't work for platforms that prohibit dynamic code generation.

100% agreed, but as long as R2R is a resource embedded in an assembly this cannot be avoided. The best approach here is to place the code in a separate shared library and request it via an API similar to extended_assembly_probe when needed. Shared libraries are, sort of, privileged denizens on Android and on iOS, I imagine, the API could just return a pointer to in-memory data, pre-allocated and filled at build time.

* No json parsing during startup in fully self-contained Android/iOS/WASM apps. As [@rolfbjarne](https://github.com/rolfbjarne) said, we can figure all of this at build time.

Android already does all the JSON processing at build time. We generate native code (via LLVM IR) which contains environment variables, system properties as well as runtime properties (those from runtimeconfig.json)

* Be lazy as much as possible, avoid materializing very long strings. `TRUSTED_PLATFORM_ASSEMBLIES` value that is extremely long string is the main offender. [Expand host_runtime_contract to get assembly names and resolve their paths #100503 (comment)](https://github.com/dotnet/runtime/pull/100503#discussion_r1550067939) has done some experiments with fixing it.

Again, 100% agreed - I would add not converting pointers to strings to pointers as the lazy optimization here :)

@jkotas
Copy link
Member

jkotas commented Feb 27, 2025

Let me describe the process here:

You are pointing out that the implementation of the whole thing is not very efficient today and it can be streamlined. I agree with that.

The overhead of the PINVOKE_OVERRIDE roundtrips required by the current contract is miniscule compared to that. It is like in the order 100 nanoseconds if the other inefficiencies are fixed.

Here's an example of what happens when a tagged pointer is broken:

CoreCLR has number of issues/bugs related with tagged pointers. This is just one of many.

100% agreed, but as long as R2R is a resource embedded in an assembly this cannot be avoided. The best approach here is to place the code in a separate shared library and request it via an API similar to extended_assembly_probe when needed.

Right, we will need something along these lines.

@grendello
Copy link
Contributor

Let me describe the process here:

You are pointing out that the implementation of the whole thing is not very efficient today and it can be streamlined. I agree with that.

I'm pointing out that conversion from pointer-to-string-to-pointer is an entirely unnecessary process. We are running in a single process, we call the API directly, there's no reason to pass a pointer as a string. And note - coreclr_initalize takes two pointers directly, why are they passed as raw pointers and not as strings? Playing devil's advocate here.

The overhead of the PINVOKE_OVERRIDE roundtrips required by the current contract is miniscule compared to that. It is like in the order 100 nanoseconds if the other inefficiencies are fixed.

The overhead is 100% wasted time - this is the main point. Dismissing such "insignificant" things isn't going to lead to better performance.

Here's an example of what happens when a tagged pointer is broken:

CoreCLR has number of issues/bugs related with tagged pointers. This is just one of many.

They must be fixed then, including this one.

@jkotas
Copy link
Member

jkotas commented Feb 27, 2025

The overhead is 100% wasted time - this is the main point. Dismissing such "insignificant" things isn't going to lead to better performance.

Discussing a few hundred nanosecond savings is a distraction in this scenario at this point. It will lead to worse performance at the end it since we will pay less attention to improving things that actually matter.

@grendello
Copy link
Contributor

The overhead is 100% wasted time - this is the main point. Dismissing such "insignificant" things isn't going to lead to better performance.

Discussing a few hundred nanosecond savings is a distraction in this scenario at this point. It will lead to worse performance at the end it since we will pay less attention to improving things that actually matter.

It's not a distraction since the code has already been written to work around it. I don't see why fixing this obvious inefficiency appears to be a problem. Maybe if I understood the reason behind this conversion I wouldn't object it. And conversion of pointer-to-string-to-pointer does actually matter - since it has the potential to break tagged pointers as we pointed out before.

@steveisok
Copy link
Member

I don't see why fixing this obvious inefficiency appears to be a problem.

I don't think there is a problem. I've maintained that we get a big picture sense of where we are and attack the most glaring areas. I feel Jan is saying something similar.

@Clockwork-Muse
Copy link
Contributor

Additionally, this conversion could break tagged pointers on ARMv8+.

Should we also attempt to look at any RISC-V specific concerns as part of this?

I don't know if there are any, not do I have any of the relevant experience, but I do know it's being worked on here, so if there are architecture concerns like this, it seems like it should be brought up....

@am11
Copy link
Member

am11 commented Feb 28, 2025

Should we also attempt to look at any RISC-V specific concerns as part of this?

Currently there is no riscv64 mobile platform support in runtime, and desktop linux-riscv64 does not use this API with non-stringly key-value arrays.

int coreclr_initialize(
const char* exePath,
const char* appDomainFriendlyName,
int propertyCount,
const char** propertyKeys,
const char** propertyValues,
void** hostHandle,
unsigned int* domainId)

on windows, corehost uses utf-16 and utf-8 on unix, there is no conversion happening on unix in corehost (from main() -> coreclr_initialize() call path). The issue raised above is about passing function pointer in propertyValues array on mobile platform, for that integer to string conversion is necessary.

(perhaps xamarin could use strdup_printf("%#" PRIxPTR and sscanf(pinvokeOverride, "%" SCNxPTR instead which covers the tags?)

@jkotas
Copy link
Member

jkotas commented Feb 28, 2025

Additionally, this conversion could break tagged pointers on ARMv8+.

Should we also attempt to look at any RISC-V specific concerns as part of this?

There should not be any architecture specific concerns to worry about for the design tracked by this issue.

Wrt. tagged pointers: Tagged pointers are 64-bit values just like regular pointers on 64-bit platforms. I do not see anything fundamental why converting the value to string and back is incompatible with it. It is no different from storing to value to a field or a static variable and reading it back. My guess is that the C/C++ compiler is getting confused where to add and validate tags on the roundtrip code path. It should be like 2-line fix somewhere to fix this confusion once it is understood what's going on. I expect that there are number of other places in coreclr with the same problem. We should have a separate issue about it if tagged data pointers are something to worry about on Android.

@jonpryor
Copy link
Member

@jkotas wrote:

I do not see anything fundamental why converting the value to string and back is incompatible with it.

Tagged pointers are incompatible with round-tripping via strings because tagged pointers are explicitly designed to be incompatible with round-tripping, for security.

See e.g. Pointer Tagging for Memory Safety:

The change involves having the CPU hold two extra tag bits to the side of each piece of 64-bit data to denote whether the data holds a code/data pointer or not. By doing this, we can prevent attackers from using ‘data’ to corrupt ‘pointers’ and cause undesired damage.

the spirit of this proposal is to make corruption of pointers as hard as possible. We achieve this by storing (non-addressable) tag bits on the side of each 64-bit word in the computer to denote whether the data is merely regular data or it actually contains a pointer and, if so, what kind of pointer (code or data).

The general spirit of this pointer tagging proposal is to make the HW aware of whether each 64-bit in the system is regular ‘data’, or whether it is a ‘data pointer’ or a ‘code pointer’

When you pass pointers through strings, you bypass all of the pointer tagging going on behind the scenes. By doing so, the pointer is different; it has the same value, but it lacks the tags, and thus using it will result in a crash.

This is all very much by design and the only way to "fix" it is to stop doing that: pointers must be passed around as pointers, not strings!

This has the "nice" benefit -- in the context of this issue -- that we need to stop using strings to store pointers, which means less parsing overhead during startup, and better performance. Win-Win!

@jkotas
Copy link
Member

jkotas commented Feb 28, 2025

This is all very much by design and the only way to "fix" it is to stop doing that: pointers must be passed around as pointers, not strings!

I disagree with that. As I have said, the roundtripping through string is no different from storing the pointer in a field or a static variable and reading it back.

Depending on the pointer tagging plan that the memory location is on, the roundtrip of storing the pointer to a field or a static variable may strip the tag on store and re-attach the tag on load; or store and load the pointer with the tag. The exact same options exist when roundtripping the pointer through a string.

@jonpryor
Copy link
Member

@jkotas wrote:

I disagree with that. As I have said, the roundtripping through string is no different from storing the pointer in a field or a static variable and reading it back.

It's nice that you disagree with that, but copying a pointer value is not the same as roundtripping through a string.

Consider this short C++ program:

#include <stdio.h>

void invoke_me ()
{
  printf("Hello, world!\n");
}

int main()
{
  void (*p)() = invoke_me;
  p(); /* works */

  void (*q)() = p;
  q(); /* works */

  char *buf = new char[100];
  snprintf(buf, sizeof(buf), "%p", p);
  void (*r)() = nullptr;
  sscanf(buf, "%p", &r);
  r(); /* boom */

  return 0;
}

I you compile it on macOS with clang using -fsanitize=address:

clang -fsanitize=address ptr-tag.cc -lc++

and run it, it crashes, big-time:

% ./a.out
a.out(28379,0x7ff84452db00) malloc: nano zone abandoned due to inability to reserve vm space.
Hello, world!
Hello, world!
AddressSanitizer:DEADLYSIGNAL
=================================================================
==28379==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000101dc (pc 0x0000000101dc bp 0x7ff7be13a5e0 sp 0x7ff7be13a4f8 T0)
==28379==The signal is caused by a READ memory access.
    #0 0x101dc  (<unknown module>)
    #1 0x7ff80144a2cc in start+0x70c (dyld:x86_64+0xfffffffffff4f2cc)

==28379==Register values:
rax = 0x00007ff7be13a520  rbx = 0x00007ff7be13a540  rcx = 0x0000100000000000  rdx = 0x0000000000000007  
rdi = 0x00000ffef7c274a4  rsi = 0x00000ffef7c274a4  rbp = 0x00007ff7be13a5e0  rsp = 0x00007ff7be13a4f8  
 r8 = 0x0000000000000000   r9 = 0xf3f3f300f1f1f100  r10 = 0x0000000000000000  r11 = 0x00007ff7be13a238  
r12 = 0x00007ff7be13a738  r13 = 0x00007ff7be13a7d0  r14 = 0x00007ff844704e70  r15 = 0x00007ff8442910a0  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>) 
==28379==ABORTING
zsh: abort      ./a.out

Why? Because that's what Clang's AddressSanitizer is for.

See also Hardware-assisted AddressSanitizer, which on arm64 uses hardware pointer tagging to implement the (software based) AddressSanitizer which caused the above crash.

Going through strings is bad.

@grendello has a variation on the above app crashing on an Android device, with real pointer tagging. Same basic scenario, similar basic crash (but with hardware support!).

@grendello
Copy link
Contributor

grendello commented Feb 28, 2025

For reference, this is the Android 15 Pixel 8 crash:

shiba:/data/local/tmp $ ./test
Hello, world!
Hello, world!
Bus error 
Fatal signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x63a26 in tid 8094 (test), pid 8094 (test)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/shiba/shiba:15/AP4A.250105.002/12701944:user/release-keys'
Revision: 'MP1.0'
ABI: 'arm64'
Timestamp: 2025-02-28 20:11:31.761527452+0100
Process uptime: 1s
Cmdline: ./test
pid: 8094, tid: 8094, name: test  >>> ./test <<<
uid: 2000
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x0000000000063a26
    x0  0000000000000001  x1  0000000000000000  x2  0000000000000010  x3  0000007fd1cfaa80
    x4  0000007fd1cfab00  x5  0000000000000004  x6  fefefefefefefeff  x7  7f7f7f7f7f7f7f7f
    x8  0000000000063a26  x9  208da78212096f96  x10 0000000000000020  x11 0000000000063a26
    x12 0000000000000010  x13 0000000000000001  x14 00000000ffffffff  x15 00000000ffffffa5
    x16 00000071ebae9878  x17 00000071eba831d0  x18 00000071ec0ec000  x19 b40000707b95eac0
    x20 00000063a2654e3f  x21 0000000000000001  x22 0000007fd1cfafc8  x23 0000007fd1cfafd0
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007fd1cfaf20
    lr  00000063a2663428  sp  0000007fd1cfaf10  pc  0000000000063a26  pst 0000000060001800
3 total frames
backtrace:
  NOTE: Function names and BuildId information is missing for some frames due
  NOTE: to unreadable libraries. For unwinds of apps, only shared libraries
  NOTE: found under the lib/ directory are readable.
  NOTE: On this device, run setenforce 0 to make the libraries readable.
  NOTE: Unreadable libraries:
  NOTE:   /data/local/tmp/test
      #00 pc 0000000000063a26  <unknown>
      #01 pc 0000000000019424  /data/local/tmp/test
      #02 pc 0000000000057854  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+116) (BuildId: d607b2dd86e0ffc603529ce13afab7fa)

@jkotas
Copy link
Member

jkotas commented Feb 28, 2025

Right, I have said: "My guess is that the C/C++ compiler is getting confused where to add and validate tags on the roundtrip code path. It should be like 2-line fix somewhere to fix this confusion once it is understood what's going on."

You are saying that the roundtripping pointers through a string is not pretty and that the way we do that today may not be compatible with pointer tagging. I agree with that.

I disagree that roundtripping a pointer through a string is fundamentally incompatible with pointer tagging.

The example that you have included in your reply has a trivial bug: snprintf(buf, sizeof(buf), "%p", p); should be snprintf(buf, 100, "%p", p);. Once you fix the bug, it works just fine with clang address sanitizer.

@AaronRobinsonMSFT
Copy link
Member

There is also the Android documentation that describes the use of upper bits in the pointer, not two extra bits along side the 64-bit pointer. See https://source.android.com/docs/security/test/tagged-pointers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Host discussion os-android os-browser Browser variant of arch-wasm os-ios Apple iOS os-wasi Related to WASI variant of arch-wasm
Projects
Status: No status
Development

No branches or pull requests

9 participants