mmap each ELF segment if CONFIG_LIBPOSIXMMAP is enabled #28

Merged
3 commits merged into unikraft:staging on Oct 19, 2023

Conversation

mogasergiu (Member) commented Sep 22, 2023

Deprecate the use of elf_load_do_fdread and uk_memalign in favor of
mmap'ing each ELF segment individually. This allows for significant
performance improvements in conjunction with on-demand paging.

The base address of the executable will be the address returned by the
first mmap call on the first PT_LOAD segment; every subsequent segment
is mapped contiguously relative to it using the MAP_FIXED flag.

Depends on lib-libelf #3.
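
For illustration, here is a minimal sketch of the mapping scheme described above. The helper name, the `first`/`bias` parameters and the fixed 4 KiB page size are assumptions for the sketch, not the actual elf_load.c API:

```c
/*
 * Minimal sketch, not the actual elf_load.c implementation: the first
 * PT_LOAD lets mmap() pick the base, every later PT_LOAD is placed at a
 * fixed offset from it with MAP_FIXED. Assumes a 4 KiB page size.
 */
#include <sys/mman.h>
#include <stdint.h>
#include <elf.h>

#define PG 4096UL

static void *map_phdr(int fd, const Elf64_Phdr *ph, int first, uintptr_t bias)
{
	off_t foff   = ph->p_offset & ~(PG - 1);  /* page-aligned file offset */
	size_t front = ph->p_offset - foff;       /* misalignment within the page */
	size_t len   = front + ph->p_filesz;
	int prot = ((ph->p_flags & PF_R) ? PROT_READ  : 0) |
		   ((ph->p_flags & PF_W) ? PROT_WRITE : 0) |
		   ((ph->p_flags & PF_X) ? PROT_EXEC  : 0);

	if (first)  /* first PT_LOAD: let mmap() choose the base address */
		return mmap(NULL, len, prot, MAP_PRIVATE, fd, foff);

	/* later PT_LOADs: map at bias + p_vaddr so the image stays contiguous */
	return mmap((void *)((bias + ph->p_vaddr) & ~(PG - 1)),
		    len, prot, MAP_PRIVATE | MAP_FIXED, fd, foff);
}
```

The caller would derive `bias` from the address returned for the first segment, so that the MAP_FIXED mappings stay contiguous relative to it.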

The `e_rawfile` of `elf_prog` already contains the entire read or
mmap'd ELF, so use that instead for computing the interpreter path.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
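
As a rough sketch of what "use `e_rawfile`" amounts to: with the whole ELF already in memory, the `PT_INTERP` path can be read straight out of that buffer instead of issuing another read (the function below and its name are illustrative, not the actual elf_load.c code):

```c
#include <elf.h>
#include <stddef.h>

/* Illustrative only: locate the interpreter path inside the raw ELF image. */
static const char *interp_path(const void *rawfile)
{
	const Elf64_Ehdr *ehdr = rawfile;
	const Elf64_Phdr *phdr = (const void *)((const char *)rawfile +
						ehdr->e_phoff);

	for (size_t i = 0; i < ehdr->e_phnum; i++)
		if (phdr[i].p_type == PT_INTERP)
			return (const char *)rawfile + phdr[i].p_offset;

	return NULL; /* statically linked: no interpreter */
}
```
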
i-Pear (Member) commented Oct 2, 2023

A quick question: it looks like we don't use p_align from the PHDRs. What happens if p_align is larger than 4K? The manual says the segments should be aligned that way in memory.

[1] https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-83432/index.html

mogasergiu (Member Author)

> A quick question: it looks like we don't use p_align from the PHDRs. What happens if p_align is larger than 4K? The manual says the segments should be aligned that way in memory.

I mmap according to the address that the linker already generated the PHDR with 🤔. Is it unsafe to assume that the linker already set up the address layout of the PHDRs according to the alignment it itself used to describe them? 🤔

I just assume that the alignment is already taken care of implicitly by the address fields of the PHDR.

i-Pear (Member) commented Oct 2, 2023

Tested with a few applications, all working well, and looks good to me.

mogasergiu (Member Author)

I am going to try to modularize the code some more. elf_load_phdr ended up being massive (~170 lines).

mogasergiu (Member Author)

OK, the code should be easier to read now. I separated the logic for the first PT_LOAD from that of the other PT_LOADs.

mogasergiu (Member Author)

@i-Pear Thank you for the very thorough review! Here's another update. I hope I integrated all of the comments we reached consensus on. Let me know if you have any other comments!

mogasergiu requested a review from i-Pear on October 6, 2023.
i-Pear (Member) commented Oct 6, 2023

Shouldn't the alignment of the first segment be the maximum alignment of all PT_LOAD segments, instead of the alignment of the first PT_LOAD segment, according to our discussion and your link [1]?

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28676

i-Pear (Member) commented Oct 12, 2023

> True, this is also how the dynamic linker does it. But is it not safe to assume that only the first segment's alignment matters? Because otherwise, the other segments' alignment is ensured by the static linker when building the PHDRs and their start addresses, right?

I don't think considering only the first alignment is safe enough, because you are performing a relocation. If the ELF is non-PIE and you load it at the same address as p_addr, that is safe enough. But if you relocate it, you have to ensure the difference in memory address is aligned to the maximum alignment.

> Is it not safe to consider that if the alignment of the first PHDR is respected, then the first PHDR's alignment + the address offset from the PHDR with the biggest alignment to this first PHDR with a smaller alignment already ensures this alignment? E.g. (from NodeJS)
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags  Align
>   LOAD           0x000000000064a000 0x000000000064a000 0x000000000064a000 0x0000000001229210 0x0000000001229210  R E    0x2000
>   LOAD           0x0000000001a00000 0x0000000001a00000 0x0000000001a00000 0x0000000000000231 0x0000000000000231  R E    0x200000
>
> Is ensuring that this first segment is aligned enough? Because then 0x0000000001a00000 would ensure for us that it will be aligned.

I still don't understand why it's safe. If the va_base is aligned to 0x200000, that will be safe.

But if we assume the va_base is 0x2000, in this case the first segment's p_align is respected, but the real vaddr of the second segment is 0x1a02000. The programmer may want the segment to use a HUGE_TLB mmap, but the unaligned va_base makes that harder.
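
To illustrate the point under discussion: the relocation bias has to be a multiple of the largest `p_align` among all PT_LOADs, not just that of the first one. The helper below is only a sketch with illustrative names:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: pick the largest p_align among the PT_LOAD segments. */
static uintptr_t max_palign(const Elf64_Phdr *phdr, size_t phnum)
{
	uintptr_t align = 4096; /* at least page alignment */

	for (size_t i = 0; i < phnum; i++)
		if (phdr[i].p_type == PT_LOAD && phdr[i].p_align > align)
			align = phdr[i].p_align;
	return align;
}
```

With the NodeJS headers quoted above: a bias of 0x2000 puts the second segment at 0x2000 + 0x1a00000 = 0x1a02000, which is no longer 0x200000-aligned, whereas any bias that is a multiple of 0x200000 preserves the alignment of both segments.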

mogasergiu (Member Author)

I decided to use the highest alignment after all, since this is what glibc also does. You're right, let's keep it safe 💯.

i-Pear (Member) left a comment

Thanks, this will be the final round of review.

mogasergiu force-pushed the smoga/mmap branch 3 times, most recently from 3673311 to 78b2a70, on October 13, 2023.
mogasergiu (Member Author)

Thanks for the very attentive review, @i-Pear. If everything is fine with you now, feel free to approve and give your Reviewed-by tag 🙏

i-Pear (Member) left a comment

Looks good to me, thanks.

Reviewed-by: Tianyi Liu <i.pear@outlook.com>

mogasergiu (Member Author)

Turns out that the TODO we placed as a comment came back to bite me for some very small apps. Thank you for pointing it out, @skuenzer! Alas, I ended up making the first mmap of the first ELF segment use the length of what will be the entire file in memory. This way we ensure we have a contiguous, big enough mapping ready to go.

mogasergiu (Member Author)

@i-Pear This is basically the difference. Let me know if this change is still fine with you. You also mentioned it some time ago, IIRC.

i-Pear (Member) left a comment

Looks better on my side; everything looks good except for the following comments:

We must ensure that the image base address is aligned according to the
highest alignment among all PT_LOAD segments.

Thus, do two things:
- define an `align` field in `struct elf_prog` that we will use to hold
this highest alignment
- make `elf_load_parse` also compute this, since it makes sense judging
by the function name and, besides, it does an initial iteration over
PT_LOADs anyway.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
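
For illustration, one common way to honor such an `align` value when `mmap()` itself only guarantees page alignment is to over-map by `align` and trim the slack. This is only a sketch with hypothetical names (and it assumes `len` and `align` are multiples of the page size), not the actual loader code:

```c
#include <sys/mman.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch: reserve `len` bytes whose start address is `align`-aligned. */
static void *map_aligned(size_t len, size_t align)
{
	uintptr_t addr, base;

	addr = (uintptr_t)mmap(NULL, len + align, PROT_NONE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == (uintptr_t)MAP_FAILED)
		return MAP_FAILED;

	base = (addr + align - 1) & ~(uintptr_t)(align - 1);
	if (base > addr)                     /* trim the slack in front */
		munmap((void *)addr, base - addr);
	if (addr + align > base)             /* trim the slack at the tail */
		munmap((void *)(base + len), addr + align - base);
	return (void *)base;
}
```
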
mogasergiu force-pushed the smoga/mmap branch 2 times, most recently from 135d9f2 to 04b559a, on October 19, 2023.
mogasergiu (Member Author)

I also realised I could set elf_prog->valen to elf_prog->upperl. But I think it's fine to still keep prog_upperl/prog_lowerl as they might prove useful in the future.

skuenzer (Member) left a comment

Thanks a lot for the update! Looks good to me.

Approved-by: Simon Kuenzer <simon@unikraft.io>

Add an additional way of loading an ELF depending on whether
CONFIG_LIBPOSIX_MMAP is used or not. If it is enabled, the ELF loader
will `mmap` the file instead of `pread`ing it. Otherwise, the previous
method of `pread`ing the file contents is used.

The base address of the executable will be the address returned by the
first `mmap` call on the first `PT_LOAD` segment; every subsequent
segment is mapped contiguously relative to it using the `MAP_FIXED` flag.

Some special handling is needed for misaligned end addresses of ELF
segments. Suppose we have `vastart` and `vaend`, corresponding to the
start and end addresses of the ELF segment respectively, and consider
memsz as the runtime memory size occupied by the loaded segment:
if vaend > vastart + memsz
	if vaend == PAGE_ALIGN_UP(vastart + memsz)
		the remaining bytes from (vastart + memsz) to vaend
		have already been mapped, so simply memset with 0
	else /* vaend > PAGE_ALIGN_UP(vastart + memsz) */
		this segment contains a section with NOBITS, such as
		.bss, which can be fairly large. Therefore:
		/* memset what is already mapped */
		memset(vastart + memsz, 0,
		       PAGE_ALIGN_UP(vastart + memsz) - (vastart + memsz));
		/* anonymously map what is left, as memsetting large
		 * sections such as .bss is very costly.
		 */
		mmap(PAGE_ALIGN_UP(vastart + memsz),
		     vaend - PAGE_ALIGN_UP(vastart + memsz),
		     ..., MAP_ANONYMOUS, ...);

NOTE: The first segment must be mmap'd with the length of the entire
file in memory and the surplus then `munmap`'d. This way we ensure we
always have a contiguous mapping that is big enough. Although future
`mmap`'s with `MAP_FIXED` should normally overwrite this surplus,
ridding us of the need to explicitly `munmap` it, in the current state
of Unikraft (0.14.1) there is a bug in `ukvmem` that may cause
overlapping VMAs to be handled incorrectly.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
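
For reference, a compilable rendition of the pseudocode above could look roughly like the sketch below; the function and variable names are illustrative and `PAGE_ALIGN_UP()` is assumed to round up to a 4 KiB page boundary:

```c
#include <sys/mman.h>
#include <string.h>
#include <stdint.h>

#define PAGE_ALIGN_UP(x)  (((x) + 4095UL) & ~4095UL)  /* assumes 4 KiB pages */

/* Sketch: zero and map the NOBITS tail (e.g. .bss) of a loaded segment. */
static int zero_nobits_tail(uintptr_t vastart, size_t memsz, uintptr_t vaend,
			    int prot)
{
	uintptr_t filled  = vastart + memsz;          /* end of the mapped bytes */
	uintptr_t aligned = PAGE_ALIGN_UP(filled);    /* end of the last mapped page */

	if (vaend <= filled)
		return 0;                             /* nothing left to zero */

	/* zero what is already mapped in the last file-backed page */
	memset((void *)filled, 0, aligned - filled);

	/* anonymously map the (potentially large) remainder; anonymous
	 * mappings are zero-filled, so no costly memset is needed */
	if (vaend > aligned &&
	    mmap((void *)aligned, vaend - aligned, prot,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		return -1;

	return 0;
}
```
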
mogasergiu (Member Author)

Realised I forgot to add the `elf_unload_vaimg` equivalent for mmap.

razvand merged commit fd7f282 into unikraft:staging on Oct 19, 2023.