Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding malicious symbolic links to an EPUB publication #2322

Closed
iherman opened this issue Jun 3, 2022 · 19 comments · Fixed by #2337
Closed

Adding malicious symbolic links to an EPUB publication #2322

iherman opened this issue Jun 3, 2022 · 19 comments · Fixed by #2337
Labels
Cat-Security Grouping label for all security related issues EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation

Comments

@iherman
Copy link
Member

iherman commented Jun 3, 2022

As discussed in the meeting with @GJFR (see EPUB meeting) a malicious EPUB publication may include a symbolic link in its content pointing 'outside' the container to the local file system. This threat should be documented in the §15.2 Thread model of the reading system.

We may also consider explicitly disallowing such files in the content document, but only if epubcheck is able to find those in the content. @rdeltour @mattgarrish ?

@iherman iherman added Cat-Security Grouping label for all security related issues Agenda+ Issues that should be discussed during the next working group call. labels Jun 3, 2022
@mattgarrish
Copy link
Member

I'd expect epubcheck would report the link as invalid without needing to detect that it's a symlink. Now that we're defining a unique origin for each publication, a relative symlink won't escape the structure of the publication, so there won't be a resource at the location. And if we disallow file URLs, absolute symbolic links will be flagged.

The same would be true on the reading system side, so long as it follows the new requirements.

@rdeltour
Copy link
Member

rdeltour commented Jun 6, 2022

Now that we're defining a unique origin for each publication, a relative symlink won't escape the structure of the publication, so there won't be a resource at the location. And if we disallow file URLs, absolute symbolic links will be flagged.

I may be wrong, but symlinks would not be impacted by anything we say about URLs, would they? Symlinks are basically OS-dependent special files.

We may also consider explicitly disallowing such files in the content document, but only if epubcheck is able to find those in the content.

I'll look if EPUBCheck can easily detect that.

@iherman
Copy link
Member Author

iherman commented Jun 6, 2022

Now that we're defining a unique origin for each publication, a relative symlink won't escape the structure of the publication, so there won't be a resource at the location. And if we disallow file URLs, absolute symbolic links will be flagged.

I may be wrong, but symlinks would not be impacted by anything we say about URLs, would they? Symlinks are basically OS-dependent special files.

I agree, @rdeltour. Epubcheck should, if it is able to, detect that they are a very special file on the file system in the zip content; they are os dependent.

@mattgarrish
Copy link
Member

Symlinks are basically OS-dependent special files.

Ya, but you can't put a symlink in an epub publication, can you? How would that work if the properties aren't set to make it a symbolic link?

I thought this was about referencing a symlink to access a location on the local filesystem, like a path starting with /etc/. That can't happen if the reference is resolved relative to the base url of the publication.

I don't know how we stop it from a script, though.

@rdeltour
Copy link
Member

rdeltour commented Jun 6, 2022

Ya, but you can't put a symlink in an epub publication, can you? How would that work if the properties aren't set to make it a symbolic link?

ZIPs can be created to store a symlink (eg. with the zip -y command).

@mattgarrish
Copy link
Member

ZIPs can be created to store a symlink

Ah, right, my mistake. I was thinking we blocked that in OCF, but nope.

@iherman
Copy link
Member Author

iherman commented Jun 10, 2022

The issue was discussed in a meeting on 2022-06-09

List of resolutions:

  • Resolution No. 3: Add symbolic links to the threat model, investigate whether EPUBCheck can detect symlinks.
View the transcript

3. Adding malicious symbolic links to an EPUB application.

See github issue epub-specs#2322.

Dave Cramer: symbolic link outside the epub pointing to local fs, should we discuss this in threat model and/or explicitly disallow.
… not sure how this would be done in practice.

Brady Duga: there is a parameter to include symlink in zip.

Dave Cramer: yes, -y.
… but i didn't get a chance to try this.
… should we add the idea to the threat model?.

Wendy Reid: i think so.

Brady Duga: this one is more obscure, easy for people not to think of this. So maybe just a notice to watch out for this, and similar threats of files referencing other files.
… and this wouldn't be a MUST or a SHOULD?.

Matt Garrish: could it be a MUST though? "RS must not preserve symbolic link when unpacking" something like that.

Brady Duga: it depends on the fs, the OS... not sure how I would tell people not to do this, and I don't want to limit this to the way you would do it using zip.

Dave Cramer: and if you have a really bad RS that allows scripting, and the script asks the local fs to do stuff.

Ben Schroeter: is epubcheck looking into this issue?.

Matt Garrish: romain is looking into whether epubcheck could detect such symlinks.

Proposed resolution: Add symbolic links to the threat model, investigate whether EPUBCheck can detect symlinks. (Wendy Reid)

Brady Duga: +1.

Wendy Reid: +1.

Ben Schroeter: +1.

Matt Garrish: +1.

Matthew Chan: +1.

Dave Cramer: +1.

Shinya Takami (高見真也): +1.

Toshiaki Koike: +1.

Masakazu Kitahara: +1.

Resolution #3: Add symbolic links to the threat model, investigate whether EPUBCheck can detect symlinks.

@dauwhe
Copy link
Contributor

dauwhe commented Jun 10, 2022

So, I tried to make an EPUB with a symbolic link. Made a symbolic link to one of my spine items in MacOS. Zipped with the -y flag. EPUBCheck was not happy:

Validating using EPUB version 3.2 rules.
FATAL(RSC-016): ..epub/OPS/s001-BookTitlePage-01.xhtml(1,1): Fatal Error while parsing file: Content is not allowed in prolog.
ERROR(RSC-005): ..epub/OPS/s001-BookTitlePage-01.xhtml(-1,-1): Error while parsing file: Content is not allowed in prolog.

Check finished with errors
Messages: 1 fatal / 1 error / 0 warnings / 0 infos

EPUBCheck completed

Reading systems were also not happy. Thorium opened the book but had an error message for that spine item: error on line 1 at column 1: Document is empty. iBooks seemed to be weirded out by the whole idea, and just wouldn't navigate to that page.

I expect what I found is very specific to a particular operating system.

@iherman
Copy link
Member Author

iherman commented Jun 11, 2022

I expect what I found is very specific to a particular operating system.

... but also shows that such a restriction can be implemented and tested. Ie, it makes sense to add explicitly to the spec.

Thanks! And keep your code, you got yourself a test :-)

@iherman
Copy link
Member Author

iherman commented Jun 14, 2022

There are actually several similar notions:

  1. Unix, MacOS: Symbolic Link
  2. MacOS: File Alias
  3. Windows: shortcut files
  4. Windows: junction points

I am familiar with (1) and (2), not really with (3) and (4). But they all strike me as somewhat similar.

Before getting into spec text, there are some questions to answer

  1. Which are the type of files we want to mention in the thread model and/or explicitly forbid in EPUB (my answer would be: all four)
  2. If we decide to forbid them normatively, can EPUBcheck, as well as a Reading System, detect those entries? (my answer is: probably yes. Symbolic link can be detected in nodejs via the fs.lstat call, and I found a reference to a nodejs package for macos alias that does this. I have no idea about the window shortcut files and junction points, though.)
  3. stupid editorial thing: if we put a normative no-no into spec, are there better references to, say, the definition of a symbolic link than a wikipedia link or a reference to some microsoft internals?

@iherman
Copy link
Member Author

iherman commented Jun 14, 2022

Looking at #2322 (comment) we may get this in the wrong order. What about:

OCF ZIP containers MUST only contain (in UNIX terminology) ordinary files or directories.

MUST treat any OCF containers that contain any other resource than ordinary files or directories as in error

We then avoid the necessity to list the various aliases like shortcut files or symbolic links, but we also avoid other possibly thorny issues like files representing sockets, special files like print device, and other hairy thing that the operating system may come up with. I do not think this would unduly restrict any deployed publication out there, and would greatly reduce the security risk.

Is that a realistic restriction from the point of view of epubcheck or a reading system?

Cc @danielweck @rdeltour @bduga @GJFR

@rdeltour
Copy link
Member

There are actually several similar notions:

  1. Unix, MacOS: Symbolic Link
  2. MacOS: File Alias
  3. Windows: shortcut files
  4. Windows: junction points

About EPUBCheck:

  1. symbolic links can be detected with ZIP or file APIs.
  2. macOS aliases seem to have a magic number, so we can detect it.
  3. Windows shortcut files I don't know. We can check the 'lnk' extension but that seems brittle.
  4. I'm not familiar with junction points, but are they even an issue? When you ZIP a junction point, I expect its content to be included in the ZIP like with any other directory, no? (to be verified)

What about:

OCF ZIP containers MUST only contain (in UNIX terminology) ordinary files or directories.

"ordinary files or directories" sound a bit vague. An alias file is really an ordinary file, just having some special behavior on a specific OS.
What's an "ordinary file"?

Also, there are other shortcut-like files out there. For instance .desktop files on some linux distros. Possibly others.
Is it realistic to forbid them all?

@iherman
Copy link
Member Author

iherman commented Jun 15, 2022

"ordinary files or directories" sound a bit vague. An alias file is really an ordinary file, just having some special behavior on a specific OS.
What's an "ordinary file"?

Yes, I was fighting with this myself. I took the terminology in the unix world; the closest to a more formal definition that I found was in https://www.geeksforgeeks.org/unix-file-system/. There may be some ways of referencing this better but I did not find it; is there an authoritative reference to Linux and/or Unix that one could use? (But your example for the MacOS alias file is discomforting...)

I am a bit worried about starting by enumerating the type of file we do not want to allow. That is also a fragile approach... (what happens if someone has the bad idea to put a file of the sort /dev/print into the zip file?)

@dauwhe dauwhe added Agenda+ Issues that should be discussed during the next working group call. and removed Agenda+ Issues that should be discussed during the next working group call. labels Jun 15, 2022
@mattgarrish
Copy link
Member

Can't we exclude them based on their all having the same function:

The EPUB container MUST NOT contain files and folders that affect pathname resolution. These include POSIX and UNIX-like symbolic links, MacOS aliases, and Windows shortcuts and junction points.

@iherman
Copy link
Member Author

iherman commented Jun 15, 2022

The approach in #2322 (comment) might work, but I am not sure how we would clearly specify "affecting pathname resolution". And it would still allow for some other nasty files like /dev/print in the zip file.

Just for the records, I was also looking at the references to make #2322 (comment) cleaner, and I could locate at least a stable UNIX reference, namely

The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
Copyright © 2001-2018 IEEE and The Open Group

which has terminology definitions for some file related concepts. One alternative is to refer to those and say that we accept only regular files and directories an nothing else.

3.129 Directory
A file that contains directory entries. No two directory entries in the same directory have the same name.

3.130 Directory Entry (or Link)
An object that associates a filename with a file. Several directory entries can associate names with the same file.

3.164 File
An object that can be written to, or read from, or both. A file has certain attributes, including access permissions and type. File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory. Other types of files may be supported by the implementation.

3.323 Regular File
A file that is a randomly accessible sequence of bytes, with no further structure imposed by the system.

I am not arguing this is the way we should go, just putting out there for discussion... (and we should be sure that MacOS alias files are also excluded...)

@bduga
Copy link
Collaborator

bduga commented Jun 15, 2022

I think this is going to be hard to spec given that it depends on implementation details. For instance, referencing a symbolic link may or may not change the pathname. On *nix platforms you could have a hard link - do we want to forbid that? Probably not. My feeling was that this is a novel attack and one implementors may not be aware of so it seems worth mentioning in the security section for Reading Systems, but not with a hard requirement. Something more like "Attackers may try to gain access to non-publication resources using file indirection techniques (for example, symbolic links or file aliases). It is the Reading Systems responsibility to ensure that malicious content cannot gain access to non-publication resources using this attack." Or maybe just the first sentence of that.

@mattgarrish
Copy link
Member

I am not sure how we would clearly specify "affecting pathname resolution"

If we can't clearly specify symbolic links, then how do we specify the superset of files that have "no further structure imposed by the system"?

From what I've read, storing symbolic links of any type is non-standard and not univerally supported. There are implementations that use the external file attribute in the central directory record to store the information, but I'm not even sure that's universal given that mac aliases work on an identifiable byte sequence in the file.

In other words, I'm fine with only noting this in the reading system specification, since attempts to ban the practice on the authoring side are probably going to be incomplete, and won't account for reading systems loading unpacked epubs.

Maybe we just informatively warn authors that they shouldn't include these kinds of files or expect them to function as expected. Short of hackery in the package document to hide them as foreign resources with fallbacks, isn't the only way to legitimately exploit symbolic links to include the files without any reference to them and then use a script to read whatever data is loaded? (i.e., hide them as data files.) It's not really a practice you can innocently stumble on.

@iherman
Copy link
Member Author

iherman commented Jun 16, 2022

Just trying to find a way forward (also before our call tomorrow to have something clear to be voted upon)

  1. We add a text to the RS security section along the lines of @bduga's proposal in Adding malicious symbolic links to an EPUB publication #2322 (comment)
  2. We add a note in the Core (probably in §4.2.2 ZIP file requirements) along the lines you propose, but also a reference to possible security issues. We may also want to add that the -y of the zip program should not be used.
  3. We do not add anything normative, neither in Core nor in RS (because we can't).

I am wondering whether we can do something slightly stronger in place of (3) using SHOULD NOT instead of MUST NOT. The only reason is that, per #2322 (comment), epubcheck could filter out some cases, and that is probably a useful thing to legitimately warn the user about...

@iherman
Copy link
Member Author

iherman commented Jun 17, 2022

The issue was discussed in a meeting on 2022-06-17

List of resolutions:

View the transcript

1. Malicious Symbolic Links in EPUB.

See github issue epub-specs#2322.

Dave Cramer: this was raised as possible security issue as a way to create links to the local fs. Because these things are OS dependent, we have been working on how to define this requirement, and how stringent to be. Also investigating how epubcheck can help.

Dave Cramer: See proposal in a comment to close the issue.

Brady Duga: the SHOULD NOT is a little weird because the purpose of SHOULD is that 'we don't think you should do this, but there are exceptional cases where it might make sense'.
… we're using it to me 'we're not sure how to express this so we'll just say SHOULD NOT'.

Matt Garrish: why not make it make it a MUST NOT and accept that epubcheck can't catch everything?.

Brady Duga: problem with a MUST is that we are trying to define behaviour over an undefined concept, so it's hard to put normative statements around that.

Ivan Herman: agreed. I've tried to find language to fit this, but I couldn't..
… I would love to say 'MUST NOT' but not sure how to define the prohibited behaviour.
… and I agree that using SHOULD NOT is vague for the same reasons.

Brady Duga: this isn't a real problem, there aren't a lot of epubs with symlinks in it. If you do put symlink it, your epub will probably break on most RS..
… real risk here is that someone finds out that there is a key secret file in a RS, and they make an epub with a symlink to the file, and then exfiltrate the data.
… we just want to make sure RS devs know that this is a threat.
… in practice, people aren't doing this.

Dave Cramer: and we're already catching some of this. I tested this in an epub, and epubcheck threw an error.
… so even if we can't define these things, they don't match the signature of the things we do allow.

Matt Garrish: you could set the symlink as a data file to prevent it from being checked, but if someone is up to no good there isn't a guaranteed way to find it.

Dave Cramer: i could see it being done accidentally, like if you had an alias on your desktop and accidentally put it in your epub.
… i'm also okay with doing this even if we don't have an iron clad definition of what it is.

Proposed resolution: proceed as described in #2322 (comment) and then close the issue. (Ivan Herman)

Dave Cramer: +1.

Wendy Reid: +1.

Ivan Herman: +1.

Brady Duga: +1.

Masakazu Kitahara: +1.

Matthew Chan: +1.

Zheng Xu (徐征): +1.

Toshiaki Koike: +1.

Charles LaPierre: +1.

Resolution #1: proceed as described in #2322 (comment) and then close the issue.

@wareid wareid removed the Agenda+ Issues that should be discussed during the next working group call. label Jun 22, 2022
@mattgarrish mattgarrish added the EPUB33 Issues addressed in the EPUB 3.3 revision label Jul 2, 2022
@mattgarrish mattgarrish added the Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation label Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cat-Security Grouping label for all security related issues EPUB33 Issues addressed in the EPUB 3.3 revision Spec-EPUB3 The issue affects the core EPUB 3.3 Recommendation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants