Brainstorming about the design goals of a type-3 image format #1

Open
probonopd opened this issue Feb 27, 2020 · 20 comments

Comments

@probonopd

probonopd commented Feb 27, 2020

Let's first brainstorm the design goals of a type-3 image format, then spec a future type-3 format, and then start writing an implementation.

Please be aware that the type-2 spec leaves quite a lot of flexibility to implementers; e.g., (iirc) it doesn't even mandate that the payload be in squashfs format (need to double-check this one).

Brainstorming about the design goals of a future type-3 image format has been started here:
https://github.com/AppImage/AppImageKit/wiki/Brainstorming

Let's not introduce new image formats frequently. Our tools (e.g., go-appimage, AppImageLauncher, etc.) will need to support all of them to be backward compatible... maybe it is possible to achieve some (or even many) of the goals with merely a new type-2 runtime?

@TheAssassin
Owner

The type 2 specification requires the magic bytes to be in the position they are in right now. Please stop claiming that this could be fixed in the existing type 2. It cannot. It never could.

Nobody has suggested moving away from squashfs (except for you, actually), and neither has anyone proposed larger changes. The use of ELF sections and all that stuff, however, brings more problems than it's worth. This time, we have all the freedom we want, since we can simply change the specification; we have to break compatibility anyway.

Overly broad specifications, by the way, make implementing support a lot harder than it has to be. The type 2 specification is incomplete and was updated/amended frequently. That's a situation I want to avoid. I've come up with a simple but powerful versioning mechanism which allows adding new features while retaining full backwards compatibility. It's very easy to implement, especially since I intend to provide a C(++) library for the new "AppImage header" format.

@probonopd
Author

The type 2 specification requires the magic bytes in the position they are right now.

That's correct.

Please stop telling that this could be fixed in the existing type 2. It can not. It never could.

Actually I didn't mean to imply this.

Did we find a new, more suitable location for the magic bytes yet?

@TheAssassin
Owner

Regarding the "doesn't mandate squashfs" point, that's one core issue. All the tools have to assume squashfs, or they have to use super slow subprocessing. Actually, you'd have to implement both: attempt to use squashfs by default, then fall back to extracting with the runtime. The latter, however, is not even possible in many cases, as you might not want to trust the embedded runtime. Inspection tools like appimagelint, for instance, would then have to refuse service.
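A tool that does not want to run the embedded runtime has to recognize the payload on its own. A minimal sketch of that "assume squashfs, otherwise fall back" decision, using the well-known magic numbers of squashfs, zip and ISO 9660; it assumes the caller has already located the payload, which is type-specific and not shown here:

```python
# Minimal sketch: guess the payload format from well-known magic numbers.
# Assumption: the caller has already located the payload (e.g., everything
# after the runtime); finding that offset is type-specific and not shown here.

SQUASHFS_MAGIC = b"hsqs"        # squashfs superblock magic 0x73717368, little-endian
ZIP_MAGIC = b"PK\x03\x04"       # signature of a zip local file header
ISO9660_MAGIC = b"CD001"        # identifier of the first ISO 9660 volume descriptor
ISO9660_MAGIC_OFFSET = 0x8001   # 16 reserved 2048-byte sectors + 1 type byte


def detect_payload_format(payload: bytes) -> str:
    """Return 'squashfs', 'zip', 'iso9660' or 'unknown'."""
    if payload.startswith(SQUASHFS_MAGIC):
        return "squashfs"
    if payload.startswith(ZIP_MAGIC):
        return "zip"
    end = ISO9660_MAGIC_OFFSET + len(ISO9660_MAGIC)
    if payload[ISO9660_MAGIC_OFFSET:end] == ISO9660_MAGIC:
        return "iso9660"
    # Unknown format: the tool can only fall back to running the embedded
    # runtime (if it trusts it) or refuse service.
    return "unknown"
```

The "unknown" branch is exactly the case a narrower spec would eliminate.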

The work in this repository fixes the problem by narrowing the specification, which allows other tools to implement their own mechanisms. All they have to do is check whether the revision identifier in the AppImage's header is at least the minimum revision they require. That should be the case for the vast majority of applications, since all features from type 2 should/will be supported. New features are added very infrequently, if at all, and then are probably only really relevant for edge-case applications.

Simple example: AppImage type 3 revision 100 added update information. A year later, when revision 200 is current, an application developer wants to build a tool which uses update information. They look in the changelog to see when update information was added and find that it was revision 100. Now they can check whether the AppImages they want to work with provide at least revision 100. That'll be the case for almost any AppImage built after revision 100. So if, e.g., an AppImage's revision 200 exceeds the minimum required revision 100, the feature can be expected to be supported. Pretty simple, isn't it? There's no need for any strings; those data are never evaluated by humans, only by tools. A simple monotonic ID is sufficient.
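The check described above boils down to one integer comparison. A minimal sketch; the feature table is hypothetical, and revision 100 for update information only mirrors the example in this comment, not a published changelog:

```python
# Hypothetical changelog: feature name -> first header revision that provides it.
# Revision 100 mirrors the example above; it is not a real, published value.
FEATURE_INTRODUCED_IN = {
    "update_information": 100,
}


def supports(header_revision: int, feature: str) -> bool:
    """True if an AppImage whose header declares `header_revision` can be
    expected to provide `feature` (revisions only ever add features)."""
    required = FEATURE_INTRODUCED_IN.get(feature)
    if required is None:
        return False  # feature unknown to this tool
    return header_revision >= required
```

An AppImage at revision 200 passes `supports(200, "update_information")`, one at revision 99 does not.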

The lack of versioning and the overly broad specification, by the way, prevent runtimes from being used interchangeably. You recently found another zip-style runtime which might conform to the spec, but I cannot use an arbitrary runtime with it. That's really a problem security-wise. And it's something that can be fixed now.

Did we find a new, more suitable location for the magic bytes yet?

There are several suggestions in the AppImageKit repository's issue tracker. I haven't investigated them yet, as my time is constrained. But this is the most important thing to solve in type 3.

For a while, by the way, I just wanted to rebrand the type 2 runtime and only change the magic bytes location. But given the quality of the code, the dependency problems (license- and code-wise) and the problematic specification, which also doesn't do any versioning, I concluded it's best to move on. If we provide a very good specification plus software libraries, I think it'll be much easier to work with this type.

I also intend to convert old AppImages into type 3. Since the payload part won't have to change (at least for now), we can just append the old payload to the new runtime. That'll be a great plus.
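A minimal sketch of such a conversion, assuming the type-2 payload starts where the runtime's ELF image ends. The offset calculation is simplified (section header table offset plus its size, as type-2 tooling does; real implementations may additionally consider the end of the last section), and the new runtime and header are passed in as opaque bytes because their format is not settled here:

```python
import struct


def elf_size(data: bytes) -> int:
    """Size of the ELF image at the start of `data`: section header table
    offset + table size. Simplified: assumes the table is the last thing in
    the ELF file (real tooling may also check the end of the last section)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if data[5] == 1 else ">"  # e_ident[EI_DATA]: 1 = little-endian
    if data[4] == 2:    # ELFCLASS64
        (shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 0x3A)
    elif data[4] == 1:  # ELFCLASS32
        (shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 0x2E)
    else:
        raise ValueError("unknown ELF class")
    return shoff + shentsize * shnum


def convert_type2_to_type3(old_appimage: bytes, new_runtime: bytes, header: bytes) -> bytes:
    """Keep the old payload byte-for-byte; only replace what precedes it."""
    payload = old_appimage[elf_size(old_appimage):]
    return new_runtime + header + payload
```

Because the payload is copied unchanged, no decompression or recompression is involved.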

@TheAssassin
Owner

Oh, one more thing: this work follows a "divide and conquer" approach. Right now, everything in AppImage is way too monolithic. There's tight coupling everywhere, and tight coupling is evil.

This work separates the pure runtime from the metadata. The metadata can be generated separately, so appimagetool no longer has to patch the bundled runtime to include data in some ELF sections. The metadata sits between the runtime and the payload. It's versioned, follows a simple scheme and is easy to implement. It supports scalar and variable-length data types. Reading metadata from an AppImage has never been this easy.
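To illustrate how a versioned header with scalar and variable-length fields can be written and read, here is a minimal sketch. The layout (placeholder magic, field IDs, integer widths, a total-size field) is entirely hypothetical and is not the format defined in this repository's include/ header:

```python
import struct

# Hypothetical layout for illustration only; NOT the format in include/.
# A fixed prefix of scalar fields, followed by variable-length entries.
MAGIC = b"AIH3"                    # placeholder magic bytes
PREFIX = struct.Struct("<4sBII")   # magic, type (u8), revision (u32), total header size (u32)
ENTRY = struct.Struct("<HI")       # field id (u16), value length in bytes (u32)


def build_header(appimage_type: int, revision: int, fields: dict[int, bytes]) -> bytes:
    body = b"".join(ENTRY.pack(fid, len(val)) + val for fid, val in sorted(fields.items()))
    return PREFIX.pack(MAGIC, appimage_type, revision, PREFIX.size + len(body)) + body


def parse_header(data: bytes) -> tuple[int, int, dict[int, bytes], int]:
    """Parse a header at the start of `data`. Also returns the header size,
    i.e. the offset at which the payload begins."""
    magic, appimage_type, revision, size = PREFIX.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an AppImage header (in this hypothetical layout)")
    fields: dict[int, bytes] = {}
    pos = PREFIX.size
    while pos < size:
        if pos + ENTRY.size > size:
            raise ValueError("truncated entry")
        fid, length = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        if pos + length > size:
            raise ValueError("entry exceeds header bounds")
        fields[fid] = data[pos:pos + length]
        pos += length
    return appimage_type, revision, fields, size
```

Storing the total size up front lets a reader skip straight to the payload without understanding every field, which is what keeps older tools working when newer revisions add entries.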

This allows for one really nice bonus feature: we can officially sign the runtime, and by validating the signature, people can be sure the runtime executable was built by us. This might come in very handy for future sandboxing purposes, as a sandbox would then not have to ship its own runtime. As a developer, I can save myself from downloading an upstream runtime and extracting an unknown AppImage with it, since I know the embedded runtime is at least safe to use.
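Python's standard library has no public-key signature API, so the sketch below swaps in a simpler technique, hash pinning: a tool keeps SHA-256 digests of runtime builds it trusts and compares the runtime bytes (the part in front of the header) against them. The digest set is a hypothetical placeholder:

```python
import hashlib
from collections.abc import Container

# Stand-in for signature verification (hash pinning, not signatures).
# The empty digest set is a hypothetical placeholder to be filled by the tool.
TRUSTED_RUNTIME_SHA256: frozenset[str] = frozenset()


def runtime_is_trusted(runtime: bytes, trusted: Container[str] = TRUSTED_RUNTIME_SHA256) -> bool:
    """True if the runtime bytes match one of the pinned SHA-256 digests."""
    return hashlib.sha256(runtime).hexdigest() in trusted
```

Pinning requires every tool to update its list for each new runtime build; a real signature only requires the public key, which is why signing is the more attractive option described above.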

@probonopd
Author

probonopd commented Feb 27, 2020

Sounds reasonable to me, as long as we get clarity about what needs to be spec'd (e.g., filesystem format; also the compression type?) and what does not need to be spec'd but can intentionally be left open to the implementer (e.g., choice of programming language, whether to link statically, etc.).

@probonopd
Author

probonopd commented Feb 27, 2020

appimagetool no longer has to patch the bundled runtime to include data in some ELF sections

As explained before, an explicit design goal of mine has been that one should be able to, e.g., move AppImages to a different server (= URLs change) without having to redo the whole AppImage (= decompress and recompress the whole filesystem image). The ELF section approach does this just fine, and after all we have already implemented it in many (all?) of our tools...

But somehow our current runtime implementation ended up using more ELF sections than the spec describes, with checksums that require things to be calculated and written in a certain, undocumented order, and that made it a bit complicated in the end.

@TheAssassin
Owner

clarity about what needs to be spec'd

The approach is: everything is specified as precisely as necessary with every revision. What I mean by divide-and-conquer is that we can enforce this scheme for the AppImage header format. The programming language of the runtime then doesn't matter any more. Runtimes just need to be built for the right header format (or support multiple revisions).

The entire runtime specification then references this AppImage header sub-specification, and can be much broader. Divide and conquer.

The ELF section approach does this just fine

Just have a look at the AppImage header... well... header in include/. It was inspired by things like the ELF sections. It allows for arbitrary-size buffers for dynamic-length strings. It specifies that a string is terminated either by a zero byte or by the end of its buffer. You can either allocate a very large buffer (e.g., 16 kiB for the signature public key) to make it future-proof, or you can simply re-do the AppImage header, prepend the runtime and append the filesystem image.
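A minimal sketch of the string rule described above, plus an in-place rewrite of a fixed-size buffer, which is what makes it possible to change, e.g., a URL without touching the payload. The field offset and size are passed in explicitly here; in practice they would come from the header, and the helper names are hypothetical:

```python
def read_string_field(buf: bytes) -> str:
    """A string ends at the first zero byte, or at the end of its buffer if there is none."""
    end = buf.find(b"\x00")
    return (buf if end == -1 else buf[:end]).decode("utf-8")


def write_string_field(data: bytearray, offset: int, size: int, value: str) -> None:
    """Overwrite a fixed-size string buffer in place, padding with zero bytes.
    Nothing else in the file moves, so the payload stays untouched."""
    encoded = value.encode("utf-8")
    if len(encoded) > size:
        raise ValueError("value does not fit into the reserved buffer")
    data[offset:offset + size] = encoded.ljust(size, b"\x00")
```

If a new value doesn't fit, the fallback is the second option mentioned above: regenerate the header and concatenate runtime, header and filesystem image again.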

We had discussed using files inside AppImages before, but as you say, that adds more problems than it solves. However, ELF is a format I don't really want to stretch too far spec-wise. We've seen where that leads (magic bytes in a location that is usually ignored but may be checked, leading to execution problems). Therefore I separated the runtime from the metadata and the payload.

An AppImage now is basically: runtime binary + AppImage header + payload

@TheAssassin
Owner

Please stop editing your posts to add more content. That renders discussions useless. Fixing typos and other minor things is fine, but adding entire sections is super counterproductive. If you really want to do that, please prefix the addition with "Edit:" or similar. That way, someone reading the discussion knows that this block might or might not have been known to the authors of the following comments.

@CosmicToast

re: header going from say, 100 to 200
this is all fine and dandy as long as everything is backwards compatible, but wanting to break compatibility is something that sometimes happens
there's no easy way to encode a compatibility break in that flat scheme
is there any specific reasoning behind avoiding established standards like semver for header versioning?

@TheAssassin
Owner

@5paceToast IMO breaking changes will require a type 4 then. Therefore I didn't see an explicit need to differentiate between patch and minor releases. Given my experience with type 2, adding features happens much more often than fixing issues.

The overall type (in semver terms, the major version) is actually contained as a value in the header. To some extent, we could therefore say we have a major.minor versioning scheme, lacking the .patch suffix.
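Read as major.minor, the compatibility rule described here becomes: the type must match exactly, and the revision must be at least what the tool needs. A minimal sketch with hypothetical names:

```python
from typing import NamedTuple


class FormatVersion(NamedTuple):
    appimage_type: int  # acts as the major version: a new type means a breaking change
    revision: int       # monotonic, acts as the minor version: only adds features


def is_compatible(found: FormatVersion, required: FormatVersion) -> bool:
    """Same type, and at least the revision that introduced what the tool needs."""
    return (found.appimage_type == required.appimage_type
            and found.revision >= required.revision)
```

A tool requiring `FormatVersion(3, 100)` accepts type 3 revision 200 but rejects type 3 revision 50 and any type 4 file.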

@CosmicToast

@TheAssassin major.minor sounds good to me.
I don't know why there would need to be a type 4 runtime when the actual runtime part basically wouldn't change (just some behavior, as is usually the case with major changes).

@TheAssassin
Owner

A breaking change would usually imply that we, e.g., drop the AppDir format for the payload and invent something new. Changes in the runtime behavior usually are bug fixes, and I don't think that's the kind of stuff we need to add version tags for. What needs versioning are things like the payload format (e.g., squashfs, zip, ISO9660,...), metadata fields (which ones are available), etc.: the stuff programmers need to know when trying to deal with AppImages. The runtime behavior should explicitly not be part of this versioning in the AppImage header. As suggested by me and @probonopd, it would be beneficial if the runtime were interchangeable anyway, so you could use different ones written in different languages. If a runtime does something very wrong (e.g., overflows because the boundaries of header fields are not respected), that's not a reason to bump any version identifier.
Tools working with AppImages aren't interested in the runtime anyway. I think we need to separate format versioning from runtime versioning.

Such tools implement data extraction etc. themselves, or they can just ship a "known good" runtime that works for them. The latter has been practiced by AppImageLauncher with great success (but had to be removed to get it included in distros). It's still used by appimagelint, for example, which downloads its own runtime from GitHub to mount AppImages somewhat more safely than by using the embedded runtime.

@CosmicToast

What needs versioning are things like payload format (e.g., squashfs, zip, ISO9660,...), metadata fields (which ones are available), etc.

Yes, I understood that.
Consider, for example, changing the meaning or format of one of the metadata fields.
The reasons for doing it could be arbitrary or well-justified (e.g., a known problem with one of the formats).
It's effectively a compat-breaking change, and you cannot safely assume that it will behave the same as in the previous release.
i.e., it would be a "major" bump, but it'd only be represented as a "minor" bump, potentially breaking behavior.
There is no need for the runtime or image format to change in any other way because they're still fundamentally the same.

@TheAssassin
Owner

In that case you need to bump the type, obviously. It's a breaking change which means we have to bump the major version.

Considering your example, let's play that through with update information. Update information currently has a string-based format, using | as a separator to allow for multiple fields. If we ever changed that, we could, for example, leave that field empty in the future and add a new field that fixes the problem. Of course, tools which are not aware of the new field cannot know about it and will not be able to use the update information. However, it won't break their behavior.
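A minimal sketch of why emptying the old field is safe: an older tool that only knows the `|`-separated update information string simply finds nothing to parse. The successor key `update_information_v2` and the dictionary representation of header fields are hypothetical:

```python
from typing import Optional


def parse_update_information(value: str) -> Optional[tuple[str, list[str]]]:
    """Split a type-2 style update information string at '|' into the update
    method and its arguments. An empty string means "no update information"."""
    if not value:
        return None
    method, *args = value.split("|")
    return method, args


def read_update_info(fields: dict[str, str]) -> Optional[tuple[str, list[str]]]:
    """What an older tool would do: it only knows 'update_information'. If a
    future revision leaves that field empty and moves the data into a new field
    (e.g. a hypothetical 'update_information_v2'), the tool just sees no update
    information instead of misparsing something."""
    return parse_update_information(fields.get("update_information", ""))
```

For a string such as `gh-releases-zsync|probono|AppImages|latest|Subsurface-*x86_64.AppImage.zsync`, this yields the method `gh-releases-zsync` and the four remaining arguments.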

The assumption I make is that we can work around this kind of situation efficiently; we've done it before. I think that situations in which you'd really have to break compatibility in a way that requires bumping the type are very, very rare. Actually, the first one I ever experienced is the magic bytes problem. Of course this assumption is a trade-off, but I think in this particular case simplicity wins over more complex versioning.

It's also not too hard to go for a new type. The header format already allows for this; you only have to increment the AppImage version field. Not many modifications are needed.

After all, AppImage types aren't really something users want to know about. The core UX, make executable and run, will work for any AppImage.

@CosmicToast

I don't think the versioning would be that much more complex (again, this is a fairly well-established scheme), but I'm fine postponing that until a type 4 happens, if ever.

@TheAssassin
Owner

@5paceToast what are the concrete benefits of an AppImage type 3 revision 1.0.1 compared to an AppImage type 3 revision 120? I'm trying to understand what other cases we could cover by using real semver. I mean, to some extent, I actually do use it.

What I want to avoid is making the versioning as complicated as having four version identifiers. We already have a type; to me that is (and always has been) the major version number.

By the way, I'm not opposed to actually splitting the revision into minor and patch versions. That's entirely possible. For now, I've only made a suggestion for what this header format could look like and how it could work. I have only added a few fields as examples; they're not the final set that will be supported.

@CosmicToast

My understanding is that we are versioning multiple "things":

  1. The runtime(s).
  2. The header format.
  3. The image type.

The runtimes control their own versions, so this isn't a concern.
The image type represents things that are fundamentally incompatible - e.g wanting to add a different header.
The header format represents that which is inside of the header.
i.e., the header format can change without the image type necessarily changing, and vice versa (you can make a type 4 image format that would have the exact same header but, for example, different features elsewhere).

What this fundamentally means is that incompatibilities in the header metadata and in the image format aren't the same thing.
As such, I think it makes sense to track them separately.
Because header metadata and co. don't really get "behavior-only patches" (since there is no behavior), having major.minor in the image format makes sense to me.
However, tying the header metadata and co. compatibility versioning to the image format doesn't, since (again) they can actually change independently.

Another example could be the converse case.
Imagine a new image format that adds another entity to the three being discussed here, but the header format is unmodified.
There shouldn't be a need to reset compatibility metrics for the contents of the header just because the image looks a bit different.

@TheAssassin
Owner

@probonopd what do you think about this sort of versioning? I do see @5paceToast's point, but I also fear the increased complexity of version compatibility checks. Personally, I think it's a bit of over-engineering to track header and AppImage type with semantic versioning. We're reasonably unlikely to change the AppImage format any time soon. If we have a type 4, for instance, we can recycle old goods, but IMO it's easier to understand if we just reset its revision to 0, even if it were compatible with an AppImage type 3 header revision 20 or so.

Anyway, let's move versioning to another issue to not bloat this one.

Today I thought: "hey, if we make it fully static, we can just build for the oldest ISA our target CPUs may run". TL;DR: we can probably ship our runtime as a 32-bit executable for x86 and ARM based CPUs, and it should be able to run on their 64-bit successors. I know that arm64 can run armhf binaries (Debian naming scheme), for instance. And we have all run i386 binaries on x86_64 before.
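To verify which ISA a runtime binary was actually built for, the ELF header is enough: `e_ident[EI_CLASS]` gives the bitness and `e_machine` the architecture. A minimal sketch covering only the four architectures discussed here:

```python
import struct

# A few e_machine values from the ELF specification
# (<elf.h>: EM_386, EM_ARM, EM_X86_64, EM_AARCH64).
MACHINES = {3: "i386", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_arch(data: bytes) -> tuple[int, str]:
    """Return (bitness, machine) from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]          # e_ident[EI_CLASS]
    endian = "<" if data[5] == 1 else ">"  # e_ident[EI_DATA]: 1 = little-endian
    (machine,) = struct.unpack_from(endian + "H", data, 18)  # e_machine
    return bits, MACHINES.get(machine, f"unknown ({machine})")
```

Whether a 64-bit system actually runs such a 32-bit binary still depends on the kernel's 32-bit compatibility support and, on arm64, on the CPU implementing AArch32, which not all do.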

@probonopd
Author

probonopd commented Mar 7, 2020

So, let's see, @5paceToast suggests to version:

  1. The file format. I call those "types" to avoid confusion. My line of thought was that something is either compatible with a certain type definition or it isn't, and I wanted to keep complexity low, hence we are using ints. Also, please understand that the types are not versions; they co-exist. There may be reasons to still use type-1 images, and I think it is good practice for our tools to support all image formats out there. Hence I am super conservative when it comes to adding new image types.
  2. The implementation of the runtime. For this, we are currently using build hashes/numbers. E.g., --appimage-extract is not specified by the type; it is merely a feature of the runtime that was added in some version of the runtime implementation.
  3. What do you mean by the "header format" and how would it be different from 1.?

@probonopd
Author

probonopd commented Mar 7, 2020

Today, I thought, "hey, if we make it fully static, we can just build for the oldest ISA our target CPUs may run". TL;DR: we can probably ship our runtime as a 32-bit executable for x86 and ARM based CPUs, it should be able to run on the succeeding 64-bit ones. I know that arm64 can run armhf binaries (Debian naming scheme) for instance. And we all have run i386 on x86_64 before.

Indeed. WINE AppImages do this all the time to run 32-bit Windows apps on 64-bit Linux machines. But of course you give up the advantages of the 64-bit architecture for the app inside that AppImage.
