Non Recognizable formats #9

Closed
EphDoering opened this issue Jun 1, 2015 · 16 comments

@EphDoering

Currently there is no mechanism for a web app to use the clipboard to interface with a native application that uses a format not recognized by the browser.

Many (most?) native applications use custom formats that allow them to copy and paste feature-rich content, either within the application or across to other applications (think MS Office figures or, the case I'm interested in, Mathcad formulas).

It's likely that many of these formats will never be given MIME types, especially for small applications, and even if they were, browsers would have to recognize the format in order to present it to the web app. Given the sheer number of native applications and their custom formats, this problem should be passed on to the web developer, who wants to interface with one application and thus only has to understand that application to write the code.

Web applications could submit their own mappings for use on that page. For example, Mathcad uses 13 different registered formats on Windows. So if a web app wanted to use 2 of those formats it could call a function:

registerClipboardFormats({
   "application/vnd.mathsoft.equation+xml":[
        "XMCD Format", //Windows handle
        "MCD Format", //old Windows handle with the same format
        "XMCD_FORMAT", //Linux handle
        "com.mathsoft.xml"], // OSX handle
   "application/vnd.mathsoft.equation.meta+xml":[
        "Provenance Data", //windows handle
        "PROVENANCE_DATA", //Linux handle
        "com.mathsoft.pd"] // OSX handle
});
//returns:
//{
//   "application/vnd.mathsoft.equation+xml":true,
//   "application/vnd.mathsoft.equation.meta+xml":true
//}

Then whenever any of the OS handles are on the clipboard during a paste event, those objects could be exposed via the MIME type listed. Additionally, during a copy event, if one of the newly registered MIME types is set, the browser could check whether any of the given OS handles are registered, and set those that are to the data provided.

This custom registration would allow small applications to have their data types recognized in a cross-platform way. The only additional coding required to support more operating systems would be to add the handle name (a five-minute job). If they so desired, browsers could even collect this page-specific registration data and use it to augment the default registered data types when a valid registered MIME type is used.

An additional benefit of this custom registration is that the currently undefined behavior around registering custom formats could be sidestepped by only setting formats that have already been registered by other applications. The registration function could even report which formats matched an already registered format and which were effectively rejected. This would allow developers to notify the user that they might need to copy something with the native app and then retry; the web app could then try to register the formats again after the native app had done the registration. This would prevent every website from registering its own formats on the native clipboard, potentially maxing out the number of registration slots on Windows.
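The feedback-and-retry flow described above might look roughly like the sketch below. `registerClipboardFormats` is only the function proposed in this issue, not an existing browser API, so it is simulated here; `osRegisteredHandles` is a hypothetical stand-in for whatever the OS would report as already registered.

```javascript
// Hypothetical stand-in for the format handles the OS already knows about.
const osRegisteredHandles = new Set(["Provenance Data"]);

// Simulation of the proposed function: a MIME type is accepted only if at
// least one of its handles has already been registered by a native app.
function registerClipboardFormats(mappings) {
  const result = {};
  for (const [mimeType, handles] of Object.entries(mappings)) {
    result[mimeType] = handles.some((h) => osRegisteredHandles.has(h));
  }
  return result;
}

const wanted = {
  "application/vnd.mathsoft.equation+xml": ["XMCD Format", "MCD Format"],
  "application/vnd.mathsoft.equation.meta+xml": ["Provenance Data"],
};

// First attempt: the equation format has not been registered natively yet.
const firstTry = registerClipboardFormats(wanted);
// The page would ask the user to copy something in the native app for these.
const rejected = Object.keys(firstTry).filter((m) => !firstTry[m]);

// Simulate the native app registering its format, then retry.
osRegisteredHandles.add("XMCD Format");
const secondTry = registerClipboardFormats(wanted);
```

Because the page never creates a native format itself, it cannot use up registration slots; it can only attach to formats a native application has already created.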

@EphDoering
Author

Also related: https://www.w3.org/Bugs/Public/show_bug.cgi?id=21699

This discusses exposing which formats are available to write and raises potential fingerprinting issues. Exposing which formats can be written or read might therefore be bad.

@evanw

evanw commented Jun 3, 2015

If that's a concern, then it makes this a good target for a permission in my opinion. Our users want this feature so much they're willing to install something natively to bypass the browser, so I'm sure they'll grant us permission to use their clipboard if the browser prompts them to. We're currently considering offering a native download that runs a web server over localhost that serves the real clipboard to our domain just to get around the lack of a real clipboard API in the browser. Please fix the web so we don't have to do hacks like that.

@hallvors
Contributor

hallvors commented Jun 5, 2015

The registerClipboardFormats() proposal has a pretty nice simplicity to it. I guess the UA could also have UI that let users map MIME types to description strings manually, so that users on platforms this web app doesn't "support" can work around the app's impoliteness..? Or install extensions that define common formats.

Some questions:

  1. As written, we assume that no string exists which identifies one type of content on OS A but another type of content on OS B. (We're not saying, for example, registerClipboardFormats({'text/html': {'windows': 'CF_HTML', 'mac': ['HTML Text', 'HTML', 'text/html'], 'linux': ...}}).) Is this assumption safe enough, or might it become a significant problem that your app says "MCD Format" is a Mathcad type but on some platform you aren't familiar with it's defining a "Movie Content Description" data type?
  2. What happens if several sites/tabs call this method with different values? Overwrite, append?

@EphDoering
Author

  1. I think each page should have a fresh default registered type list (perhaps augmented by an extension or UI), and then each page can register the formats it wants to use, and they will only be active for that page. This prevents pages from overwriting each other and causing unexpected behavior. One remaining question is whether overwriting default types should be possible. This may allow a site to improve default functionality, but that seems unlikely, and it may interfere with default user interactions, causing undesirable consequences. Thus I think the default formats should not be overwritten, and I don't think a page would need the ability to overwrite its own registrations, so overwriting could be prohibited.

  2. This has been bothering me for days... I was trying to avoid using OS names. To work across browsers, the names would need to be written into the spec, which may hinder support for new OSs. Requiring OS names would also prevent happy-coincidence support: a native application using the same handle on different OSs would be supported even if the developer didn't specifically seek out the duplicate. I believe the risk of this happening is very low given the per-page scoping described in the previous point, as any one page will likely only be dealing with a handful of native applications, and the likelihood of a handful of apps using conflicting handles across OSs seems low to me. That said, as long as the conflicts weren't circular, it would be possible to define priorities based on the order in which handles and MIME types were added, which could resolve simple conflicts.
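As a rough illustration of the two points above (page-scoped registration that cannot overwrite defaults, and first-added priority for conflicting handles), here is a minimal sketch. The `PageFormatRegistry` class, the `DEFAULTS` table, and the "application/x-movie-description" type are hypothetical names made up for this example, not part of any spec or browser.

```javascript
// Hypothetical default mapping shipped by the browser.
const DEFAULTS = { "text/html": ["CF_HTML"] };

class PageFormatRegistry {
  constructor() {
    // Every page starts from a fresh copy of the defaults.
    this.handlesByMime = new Map(
      Object.entries(DEFAULTS).map(([mime, handles]) => [mime, [...handles]])
    );
    // Handle -> MIME type; the first registration of a handle wins.
    this.mimeByHandle = new Map();
    for (const [mime, handles] of this.handlesByMime) {
      for (const h of handles) this.mimeByHandle.set(h, mime);
    }
  }

  // Returns the handles that were accepted (empty if nothing was registered).
  register(mimeType, handles) {
    // Defaults and earlier registrations cannot be overwritten.
    if (this.handlesByMime.has(mimeType)) return [];
    // Handles already claimed by an earlier registration keep their mapping.
    const accepted = handles.filter((h) => !this.mimeByHandle.has(h));
    if (accepted.length === 0) return [];
    for (const h of accepted) this.mimeByHandle.set(h, mimeType);
    this.handlesByMime.set(mimeType, accepted);
    return accepted;
  }

  mimeForHandle(handle) {
    return this.mimeByHandle.get(handle);
  }
}

const pageA = new PageFormatRegistry();
const pageB = new PageFormatRegistry(); // unaffected by pageA's registrations

const first = pageA.register("application/vnd.mathsoft.equation+xml", [
  "XMCD Format",
  "MCD Format",
]);
// A later, conflicting claim on "MCD Format" loses to the earlier one.
const second = pageA.register("application/x-movie-description", ["MCD Format"]);
const overwrite = pageA.register("text/html", ["Some Other Handle"]);
```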

Somewhat unrelated: I think there's no advantage to registering multiple formats at once. Also, for string-based formats it would be nice to get strings vs. files, so it should probably be registerClipboardFormat('mime/type', 'string/file', ['osFormatName1', 'osFormatName2', ...])

@hallvors
Contributor

@FrederickDoering I'm not sure what you mean by "strings vs files" - a separate MIME type for files? File extensions?

@hallvors
Contributor

What happens "under the hood" here is basically:

  • The UA has a registry mapping the supported native clipboard types to the MIME types used for the clipboard API. For example, on Windows one of the entries is probably 'text/html' -> 'CF_HTML'
  • When registerClipboardFormat() is called, for each MIME type (key) in the dictionary, it will check with the OS if the native clipboard knows about each format description on the list.
  • The first matching format description (if any) will be added to the mapping table

On reading from the clipboard, all data with the descriptions given in our mapping table will be exposed to JS with the corresponding MIME type.

On writing to the clipboard, all data with a MIME type listed in the mapping table will be placed on the clipboard with the corresponding native description.

Right?
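The three steps and the read/write behaviour summarized above can be sketched as follows. Everything here is a simulation under stated assumptions: `nativeClipboard`, `knownToOs`, `readClipboard`, and `writeClipboard` are hypothetical stand-ins, and `registerClipboardFormat` is the proposed function rather than a real API.

```javascript
// Hypothetical stand-in for the native clipboard: format description -> data.
const nativeClipboard = new Map([["XMCD Format", "<equation/>"]]);
// Hypothetical stand-in for the format descriptions the OS knows about.
const knownToOs = new Set(["XMCD Format", "CF_HTML"]);

// UA registry: MIME type -> native format description.
const mappingTable = new Map([["text/html", "CF_HTML"]]);

function registerClipboardFormat(formats) {
  for (const [mimeType, descriptions] of Object.entries(formats)) {
    // Only the first description the OS recognizes is added to the table.
    const match = descriptions.find((d) => knownToOs.has(d));
    if (match !== undefined) mappingTable.set(mimeType, match);
  }
}

// Reading: expose native data under the mapped MIME type.
function readClipboard() {
  const items = {};
  for (const [mimeType, description] of mappingTable) {
    if (nativeClipboard.has(description)) {
      items[mimeType] = nativeClipboard.get(description);
    }
  }
  return items;
}

// Writing: place data under the mapped native description.
// MIME types without a mapping are silently dropped in this sketch.
function writeClipboard(items) {
  for (const [mimeType, data] of Object.entries(items)) {
    const description = mappingTable.get(mimeType);
    if (description !== undefined) nativeClipboard.set(description, data);
  }
}

registerClipboardFormat({
  "application/vnd.mathsoft.equation+xml": ["MCD Format", "XMCD Format"],
});
const pasted = readClipboard();
writeClipboard({ "text/html": "<b>hi</b>", "application/x-unknown": "ignored" });
```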

I'm not sure if implementors are happy about this. There's some pushback against allowing web contents to write binary data to the clipboard for security reasons - if you have vulnerabilities in local software (which is of course a when, not an if) an attacker can trick you into pasting some exploit payload into the vulnerable software.

@EphDoering
Author

@hallvors
Regarding file vs. string: a data transfer item has a "kind" of either "string" or "file". There needs to be some way of specifying which kind will be returned by a paste operation. Additionally, the encoding of strings would need to be specified. The way I solved this in the Chrome extension I made as a proof of concept (currently only working on Windows) is to specify the encoding on a per-osHandle basis, so the syntax is:

registerClipboardFormat(String mimeType, Object osHandles);

where osHandles is a dictionary that uses the OS-specific handle names as keys and an encoding enum as values, where the encodings currently supported are:

ASCII, UTF_8, UTF_16, and BINARY

The binary encoding always returns files, while the other three return strings (as standard UTF-16 JS strings; the encoding only describes how the string is stored on the OS clipboard).

So for my Mathcad usage I called:

registerClipboardFormat("application/vnd.mathsoft.equation+xml",{
    "XMCD Format":registerClipboardFormat.formats.UTF_16,
    "com.mathsoft.xml":registerClipboardFormat.formats.UTF_16
});
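To make the encoding enum concrete, here is a minimal sketch of how raw clipboard bytes might be turned into a "string" or "file" item depending on the encoding. It is not the extension's actual code: `decodeClipboardBytes`, the numeric enum values, and the little-endian assumption for UTF_16 are all assumptions made for illustration.

```javascript
// Encoding names follow the enum described above; the numeric values are arbitrary.
const formats = Object.freeze({ ASCII: 0, UTF_8: 1, UTF_16: 2, BINARY: 3 });

// Converts raw clipboard bytes into the item a paste would expose.
function decodeClipboardBytes(bytes, encoding) {
  switch (encoding) {
    case formats.ASCII:
      // Each byte maps directly to one character.
      return {
        kind: "string",
        data: Array.from(bytes, (b) => String.fromCharCode(b)).join(""),
      };
    case formats.UTF_8:
      return { kind: "string", data: new TextDecoder("utf-8").decode(bytes) };
    case formats.UTF_16:
      // Assumes little-endian storage, as is usual on Windows.
      return { kind: "string", data: new TextDecoder("utf-16le").decode(bytes) };
    case formats.BINARY:
      // Binary data is handed over untouched, as a file-like item.
      return { kind: "file", data: bytes };
    default:
      throw new RangeError("unknown encoding");
  }
}

const utf16Item = decodeClipboardBytes(
  new Uint8Array([0x68, 0x00, 0x69, 0x00]), // "hi" in UTF-16LE
  formats.UTF_16
);
const binaryItem = decodeClipboardBytes(new Uint8Array([0xde, 0xad]), formats.BINARY);
```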

@EphDoering
Author

@hallvors
Yes, your explanation covers the general idea. Details like conflicts, ordering, and the scope of the map are irrelevant to native application safety.

While I can see that there would indeed be potential for a native app to be exploitable through pasting arbitrary data to it, I don't see that excluding "binary" data would grant much if any additional protection.

The data on the Windows clipboard is all just bytes. So writing a "string" to the clipboard still just results in the bytes of the string being written to the clipboard. Unless there's a filter that disallows certain characters such as null or backspace, the ability to write binary data grants no additional capabilities beyond what was already possible.
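A small sketch of that argument: any even-length byte sequence can be packed into a JS string and comes back unchanged if the string is stored as raw UTF-16 code units. The helper names are hypothetical, and the sketch assumes the browser writes code units verbatim with no filtering or normalization, which a real implementation might not do.

```javascript
// Pack arbitrary bytes into a JS string, two bytes per UTF-16 code unit.
// Assumes an even number of bytes.
function bytesToString(bytes) {
  let s = "";
  for (let i = 0; i < bytes.length; i += 2) {
    s += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return s;
}

// What a clipboard write would produce if it stored the string as raw
// UTF-16LE code units without any filtering.
function stringToUtf16leBytes(s) {
  const out = new Uint8Array(s.length * 2);
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    out[2 * i] = unit & 0xff;
    out[2 * i + 1] = unit >> 8;
  }
  return out;
}

// Includes a null, a backspace, and an unpaired surrogate (0xD800).
const payload = new Uint8Array([0x00, 0x00, 0x08, 0x00, 0x00, 0xd8, 0xff, 0xfe]);
const roundTripped = stringToUtf16leBytes(bytesToString(payload));
```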

That said, I hadn't considered the possibility of exploiting native apps in my security considerations. So I'll need to think about that some more, but here are some of my initial thoughts:

I believe current clipboard support already allows arbitrary data to be put on the clipboard under the text/plain format. However, applications expect arbitrary data in that format and thus will hopefully already be hardened against attacks through that channel. The same cannot be said for formats that are used primarily by one vendor. For example, the Windows clipboard handle Biff12 is used by Excel, and Excel might only be expecting valid formatting when pasting from Biff12, so if Excel was programmed poorly it might be possible to place something on Biff12 that does something malicious when it's pasted into Excel.

While this exploit is already available to any other native app on the computer, it's much harder to get a malicious native app onto a machine than to get a user to copy something from a malicious (or just insecure and therefore unintentionally malicious) website and paste it into Excel (or any other exploitable native app).

Thus, allowing writes to arbitrary clipboard formats increases the user's exposure.

I think this is a good argument for requiring permissions for this capability.

Do you have a reference for the "pushback" you mentioned so I can inform myself further on this matter?

@hallvors
Contributor

@FrederickDoering For "pushback" see https://lists.w3.org/Archives/Public/public-webapps/2015AprJun/0819.html and later discussion in that thread. I posted about your proposal on the public-webapps mailing list and (as expected) got feedback about the security problems it would likely cause: https://lists.w3.org/Archives/Public/public-webapps/2015JulSep/0211.html

@EphDoering
Author

@hallvors Thanks.

I'm disappointed, but I can't say I'm surprised, that Windows allows a form of code execution from CBF_DATA. I can see how the ability to write to the clipboard can be dangerous even for formats that one might initially assume are benign, like RTF. However, I believe this proposal would work for reading from the native clipboard and writing to a private, in-browser-only clipboard. That way, pages that want to cooperate and use the same MIME type can register formats and have the clipboard work as expected. Pages could even receive data from native applications, but would only be able to write a very limited set of things to the native clipboard (text/plain, text/html, etc.).

I think to eliminate any security concerns with writing to the clipboard, for each format, the following steps would have to be taken:

  1. Verify that there are no known intentional or unintentional ways of exploiting that format, assuming the payload is properly formed. (very difficult)
  2. Every time that format is written to, verify that the payload is properly formed. (not as difficult, though possibly still awful depending on the format)

I believe the first step has already been taken for the "required" mime-types, but the second step has not yet been implemented and there was a suggestion to remove some of the mime-types from the required list as a solution to eliminate exploits from malformed payloads.

In conclusion, I don't see a security risk from reading arbitrary formats from the native clipboard, and for writing to the native clipboard I see the need to verify the payloads are well formed, which cannot be done for arbitrary formats, but should be done for all formats that support writing.
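One way to picture this split between a verified native write path and a private in-browser clipboard is the sketch below. The `validators` table, the two clipboard maps, and `writeFormat` are hypothetical, and the validators shown are placeholders; checking that a real payload is well formed would need a full parser for each format.

```javascript
// Hypothetical validators: only formats listed here may reach the native clipboard.
const validators = new Map([
  ["text/plain", (data) => typeof data === "string"],
  // Placeholder check only, not a real well-formedness test for HTML.
  ["text/html", (data) => typeof data === "string" && !data.includes("\u0000")],
]);

const nativeClipboard = new Map(); // stand-in for the OS clipboard
const privateClipboard = new Map(); // stand-in for an in-browser-only clipboard

// Returns where the data ended up: "native", "private", or "rejected".
function writeFormat(mimeType, data) {
  const validate = validators.get(mimeType);
  if (validate === undefined) {
    // Unverified format: keep it inside the browser so cooperating pages can
    // still exchange it, but native apps never see it.
    privateClipboard.set(mimeType, data);
    return "private";
  }
  // Verified format: the payload must pass its check on every write.
  if (!validate(data)) return "rejected";
  nativeClipboard.set(mimeType, data);
  return "native";
}

const plain = writeFormat("text/plain", "hello");
const custom = writeFormat("application/vnd.mathsoft.equation+xml", "<equation/>");
const malformed = writeFormat("text/html", "a\u0000b");
```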

@Arcnor

Arcnor commented Nov 16, 2015

Was there any decision on this matter not reflected here, or did the question just die? I found myself today wanting to interact with some native apps (drawing apps, copying data in com.adobe.pdf format) and hit this wall. In my case, being able to read and write would be nice, but just being able to read would be better than nothing.

Is there anything currently implemented in any browser that would allow me to register my interest in a particular format, or any other way of getting the binary data? (Right now, I'm getting an "image/png" file that I can't do anything with.)

@hallvors
Contributor

@Arcnor the state of this issue is "we'd love to enable this, but it seems we can't do it safely". Unfortunately. ;(

@EphDoering
Author

@hallvors All of the pushback was based on concerns about writing to the clipboard... There were no mentions of any safety concerns about reading the clipboard. Reading the clipboard would still be useful even without the ability to write to it. Should I start a separate issue just for read access to arbitrary formats?

@hallvors
Contributor

Please :)

@dominiccooney

Did that issue get filed?

@EphDoering
Author

I did not. Feel free to open one; I got pulled onto other projects.
