Revert "MKS-Klipad50: Switch to standard support" #7883

Closed

@torte71
Contributor

torte71 commented Feb 26, 2025

This reverts commit 4bddce9.

Standard support turned out to be too complicated for me.

@github-actions bot added the labels size/small (PR with less than 50 lines), Needs review (Seeking for review) and Hardware (Hardware related like kernel, U-Boot, ...) on Feb 26, 2025
@igorpecovnik
Member

> too complicated for me

In what sense? This is still a grey zone. You are not obligated to do the heavy lifting or resolve bugs. If the device generally works well and most of the general rules apply, why not try at least until the next release and then decide?

@igorpecovnik added the Discussion label ("Being discussed - Voice your opinions :)") on Feb 27, 2025
@torte71
Contributor Author

torte71 commented Feb 28, 2025

I was panicking. :)

After the change from .csc to .conf, only the nightly trixie/jammy images got created, and apt.armbian.com did not receive any packages (though beta.armbian.com still does). I had wrongly assumed that stable images would have been generated by now.
This, together with your "website status is still on manual change" comment from #7851 (comment) (which I still do not fully understand), made me think that I had driven the project into a dead end because of some unmet requirements, so I wanted to revert.

But now it sounds as if that setup is correct and sufficient, i.e. a stable image and packages on apt.armbian.com will follow in 05/25 (or with an earlier release, if there is one). That would be perfectly fine.

You are right, I'll drop this PR and wait for the next release.

Thanks for the clarification.

@torte71 torte71 closed this Feb 28, 2025
@igorpecovnik
Member

igorpecovnik commented Feb 28, 2025

No worries.

> After the change from .csc to .conf, only the nightly trixie/jammy images got created and apt.armbian.com did not receive any packages

Let me clarify this better.

Stable images are done manually, one by one, even though we have "build them all" automation, as some families either have known serious problems or weren't finished by the release date. It's a quality control feature: better old images than broken new ones. We can't hold back the others, so I think it's better to release the majority and release the rest once ready. This is happening right now, today, tomorrow. It's a lot of manual work, first fixing and then releasing / testing. This involves several people, all of whom have some life happening in between. I had to spend a few days with my family, as otherwise I risk serious problems of a different kind ;) There is backup, but not for all roles and tasks ...

Since most targets got a major kernel upgrade, where trouble is more than expected, we are holding back on populating the apt repository with all packages. Selecting here is much more difficult, as a kernel is common to many boards. Luckily we don't have just one kernel, so we can hold back Rockchip but release Allwinner ... but that only adds more manual work and the risk of introducing bugs in the process.

> your "website status is still on manual change"

When images are pushed to the download servers, the index on the download pages is updated automatically, but not the status itself (supported / csc ...). The status also slightly affects how images are displayed. What we are looking for here is a script that would adjust the WordPress database to match changes in Git, as Git is the source of truth. This still has to be switched by hand.

If a board was previously .csc, it will keep its community-supported targets on the download pages after the change to .conf until the next recompilation (which happens weekly, though sometimes it fails due to compilation issues and has to be fixed and run again). A .conf board gets daily images and a daily repo, but not community ones ... Perhaps too many complications.

> That would be perfectly fine.

Yes, everything should show up properly within a week. If not, then something could be wrong somewhere, and you can open a ticket, ping me, ...

@torte71
Contributor Author

torte71 commented Feb 28, 2025

Many thanks for that detailed explanation. Getting the whole picture from reading the docs and the workflow galore is a bit demanding for me, being new to this project.
But that really cleared things up. 👍

@igorpecovnik
Member

igorpecovnik commented Feb 28, 2025

> But that really cleared things up.

Great! I know it's overwhelming for anyone who wants to jump into the loop. We try our best.

Stable images (.conf): you can prepare them on your own and tell me, once they are tested, to move them to the download folder. You have the rights for that (once you accept the invitation to join the .org), and here is most of the related documentation: https://docs.armbian.com/Process_CI/

When you manage.

@torte71
Contributor Author

torte71 commented Feb 28, 2025

Strange: the noble-minimal image goes into a boot loop; the kernel runs into the "Synchronous abort" handler directly after U-Boot starts the kernel.
(I've checked the .asc and .sha checksums; the image is correct. I tried different image-writing programs and a different eMMC card. It's not a download or card error.)
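
For anyone wanting to repeat the checksum part of that check, here is a minimal Python sketch. It assumes the `.sha` file follows the common `sha256sum` layout (hex digest first, then the file name), which is an assumption and not something stated in this thread; the function names are hypothetical. It does not cover the `.asc` signature.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha_file(image_path, sha_path):
    """Compare an image against a checksum file.

    Assumption: the first whitespace-separated token in the checksum
    file is the expected SHA-256 hex digest.
    """
    expected = Path(sha_path).read_text().split()[0].lower()
    return sha256_of(image_path) == expected
```

A matching digest only rules out a corrupted download; it says nothing about whether the image was built correctly.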

It has something to do with the initrd - if I replace it with the initrd from noble-server, it boots up OK.
Regenerating initrd on noble-minimal does not solve it.
Comparing the extracted initrd contents shows that noble-minimal lacks some quite basic libraries, e.g. libz.so.1, liblzo2.so.2, libfuse3.so.3. That might be a lead.
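
A comparison like this can be scripted. The sketch below is one possible way to do it in Python, assuming both initrds have already been unpacked into two directories; the function names are hypothetical and this is not the method actually used in this thread.

```python
import os


def relative_files(root):
    """Collect the paths of all non-directory entries below root, relative to root.

    os.walk does not descend into symlinked directories by default,
    so entries reachable only through such links are not listed.
    """
    found = set()
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(directory, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_from(reference_root, candidate_root):
    """Return the sorted paths present in the reference tree but absent in the candidate."""
    return sorted(relative_files(reference_root) - relative_files(candidate_root))
```

Calling `missing_from(server_dir, minimal_dir)` would list files that exist only in the first tree.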

I'll investigate that further, but I wonder if this really only affects the mks-klipad50?

Edit: Both other images (bookworm-minimal and noble-server) work fine.

@igorpecovnik
Member

igorpecovnik commented Feb 28, 2025

This is strange, huh. Check the CI build logs for anything odd, like a qemu crash. Images were assembled on an x86 machine; we can force them to use aarch64 runners ...

Build logs for broken image:
https://paste.armbian.de/gigazitici

Build logs for OKish Noble server:
https://paste.armbian.de/emehuriqar

> I wonder if this really only affects the mks-klipad50?

We need to find that out.

@igorpecovnik
Member

igorpecovnik commented Feb 28, 2025

Nothing obvious. This was built from trunk, and there are some commits after the release that could cause trouble. I would propose a quick workaround: removing the broken image (in progress) until we find out why it broke.

@torte71
Contributor Author

torte71 commented Feb 28, 2025

No need to hurry.
If it affects more boards, then fixing it for all has priority. If it affects only my board, then it still feels better to get things sorted out beforehand.
More tomorrow; now it's family time.

@torte71
Contributor Author

torte71 commented Mar 1, 2025

A working uInitrd (without bootloop) gets generated after installing "fuse3".
On noble-minimal (with "uInitrd" copied from noble-server to allow boot):

  • update-initramfs -y && reboot
    • creates bootloop
  • apt install fuse3 && reboot
    • (automatically runs update-initramfs -y)
    • boots OK
  • apt remove fuse3 && reboot
    • (automatically runs update-initramfs -y)
    • bootloop

A locally built noble-minimal from 25.05-trunk (this morning) has no problems booting, but also has no fuse3 installed.
Digging further into it...

Edit: The initrd from noble-minimal DOES have fuse3, even though the normal filesystem doesn't.
After an update-initramfs with fuse3 installed, the resulting initrd contains the same files as the original (non-booting) noble-minimal, but there are binary differences in libfuse3.so.3, libfuse3.so.3.14.0 and mount.fuse3 (although they have the same version numbers).
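
To find such same-name, different-content files without checking each one by hand, one option is to hash every regular file in both unpacked trees. This is only a sketch with hypothetical function names, assuming both initrds are already extracted into directories. Symlinks are skipped on purpose, so a link such as libfuse3.so.3 would be represented by whatever real file it points to.

```python
import hashlib
import os


def file_digests(root):
    """Map each regular file below root (relative path) to its SHA-256 digest."""
    digests = {}
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(directory, name)
            # Skip symlinks and special files: an absolute symlink inside an
            # unpacked initrd could otherwise resolve to a file on the host.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            with open(full, "rb") as handle:
                digests[os.path.relpath(full, root)] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def differing_files(root_a, root_b):
    """Return the sorted paths that exist in both trees but differ in content."""
    a, b = file_digests(root_a), file_digests(root_b)
    return sorted(path for path in a.keys() & b.keys() if a[path] != b[path])
```

Files that exist in only one of the two trees are ignored here; this only reports content differences.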

@torte71
Contributor Author

torte71 commented Mar 3, 2025

Not nailed down yet. Current test status for the record (in addition to my prior post):

  • Other *-minimal images (bookworm, trixie, plucky) don't have this problem (but they also have no fuse3 installed)
  • I tried to reproduce the build locally exactly as in the workflow raw log (same git hash, same compile options), but the result does not show the boot problem. There must be some more logic to it, though, as the "armbian-images" parameter is unknown to compile.sh from armbian/build.
  • I tried debugging the initrd, but even with "break=top" I don't get a shell before the boot loop
  • I cannot explain why/how fuse3 should be required at this early stage:
    • Running "ldd" on all initrd files: no dependency on libfuse
    • Grepping all initrd files for fuse3, fusermount and mount.fuse: nothing found
    • Steps that "/init" (from the initrd) performs before checking "break=top" (apart from setting variables):
      • mount sysfs, proc, devtmpfs, devpts, tmpfs - none should require fuse
      • executed binaries (non-shell-builtin): mount (see above), mkdir, cat, ln, hostname - none should require fuse
  • Every other image (with a rockchip kernel, running on mksklipad) shows this directly after u-boot emits "Starting kernel...":
    efi_free_pool: illegal free 0x000000003cf20040
    efi_free_pool: illegal free 0x000000003cf1d040
    efi_free_pool: illegal free 0x000000003cf1b040

That happens when u-boot loads the initramfs into an EFI memory region (this should have been fixed in later versions): https://lore.kernel.org/all/d3f3fc7f-b29a-4503-9fe0-97468bbe1f71@gmx.de/
The broken noble-minimal shows only the first two "efi_free_pool" errors, then the Abort handler kicks in:

    efi_free_pool: illegal free 0x000000003cf20040
    efi_free_pool: illegal free 0x000000003cf1d040
    "Synchronous Abort" handler, esr 0x96000004

So maybe that illegal free leads to execution of some random code, which in case of installed fuse3 just happens to take a non-fatal pathway (out of sheer luck)?
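
When comparing several serial console captures, it can help to reduce each one to the two signals discussed above: the addresses of the illegal frees and whether the abort handler fired. The following is a small sketch, assuming the log lines look exactly like the excerpts quoted in this comment; the function name is hypothetical.

```python
import re

# Patterns modeled on the console excerpts quoted in this thread.
ILLEGAL_FREE = re.compile(r"efi_free_pool: illegal free (0x[0-9a-fA-F]+)")
ABORT = re.compile(r'"Synchronous Abort" handler, esr (0x[0-9a-fA-F]+)')


def summarize_boot_log(text):
    """Extract the illegal-free addresses and the abort ESR value (if any) from a console log."""
    abort = ABORT.search(text)
    return {
        "illegal_frees": ILLEGAL_FREE.findall(text),
        "abort_esr": abort.group(1) if abort else None,
    }
```

This only classifies captures; it does not explain why the abort happens.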

My other assumption is that this is caused by some bug in an upstream package.
@igorpecovnik
I'll try one new workflow run when armbian/os is not busy to see if that changes the behaviour. So please don't be surprised if I rerun it without any prior code change.


Are other boards affected? Probably not, but unsure:

  • I hoped to get some positive/negative replies on discord, if other people with rockchip64 boards had a similar problem with 25.2 stable noble-minimal images, but got no reaction so far. Not sure if others are affected.
  • I tested the same release for RasPi (rpi4b) noble-minimal, and it worked correctly, so other boards are probably not affected (I don't have any other Armbian capable hardware lying around for testing, at least to my knowledge)

@torte71
Contributor Author

torte71 commented Mar 3, 2025

After rerunning the workflow:
Now "bookworm-minimal" has the bootloop.
But "noble-minimal" boots fine - even without the efi errors.
"noble-server" behaves like before: boots ok, with efi errors.

That's not my idea of a "stable" release. I'll continue searching.

@igorpecovnik
Member

> I hoped to get some positive/negative replies on discord, if other people with rockchip64 boards had a similar problem

@paolosabatino Have you experienced this on similar Rockchip RK3328 boards?

> I tested the same release for RasPi (rpi4b) noble-minimal, and it worked correctly, so other boards are probably not affected

To me it looks isolated to the Rockchip family, possibly to this SoC. There would be more reports if this were more widespread.
