Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web reality: Newline normalization in form payloads and in FormData objects #6247

Closed
andreubotella opened this issue Dec 24, 2020 · 4 comments · Fixed by #6287
Closed

Web reality: Newline normalization in form payloads and in FormData objects #6247

andreubotella opened this issue Dec 24, 2020 · 4 comments · Fixed by #6287

Comments

@andreubotella
Copy link
Member

andreubotella commented Dec 24, 2020

I previously opened whatwg/url#562 for this, but it soon became clear that it's not something limited to urlencoded.

There are differences between the newline normalization in forms in the different browsers and the behavior mandated by the spec. This is most apparent with filenames as serialized in the urlencoded and text/plain enctypes, since all browsers agree to normalize the newlines, but the spec would have them without normalization.

For some context about the spec's behavior, newline normalization in forms happens in the "append an entry" algorithm, which normalizes names and string values, but doesn't touch the filename or data of File values. This algorithm is called when constructing the entry list, except in the case of form-associated custom elements, for which it's called for string or File submission values but not for FormData submission values.

Additionally, step 7 in "constructing the entry list" algorithm fires a formdata event whose event object contains a FormData object whose entry list is the form's entry list, and that object is live (i.e., modifying it from JS modifies the form's entry list). Those modifications are not then normalized.

Since the urlencoded and text/plain enctypes only support file values, their serialization algorithms explicitly handle File values by replacing them with the filename, but they perform no further normalization.


Test results and discussion:

For discussing these results, it's useful to distinguish between two types of "sources" for form entries, which I'll be calling "String/File-sourced" and "FormData-sourced" (these are ad-hoc names), distinguished by whether those entries were added through the spec's "append an entry" algorithm. String/File-sourced entries are entries which come from:

  • <input type="text">, <input type="hidden">, etc. including <input type="file">
  • Form-associated custom elements whose submission value is a string or a File object.

FormData-sourced entries come from creating or modifying a FormData object from JS code and then serializing it as a form payload. As far as I can tell, that is only possible in the following ways:

  • The FormData object is set as the submission value of a form-associated custom element.
  • The FormData object is the formData attribute of FormDataEvent and is modified from JS.
  • A Request or Response object is constructed with the FormData object as its body.

Note that not all browsers implement all of the above: form-associated custom elements are only implemented in Chrome, Safari doesn't support the formdata event, Firefox in multipart/form-data replaces newlines in names and filenames with a space (see #3223 #6282). However, for each group of cases below, every browser supports at least one of the possible ways to test it.

"Normalized" here means following the normalization in the "append an entry" algorithm. When used for serializations, it means all names and values are serialized, including string values that come from filenames in the entry list.

  • new FormData(formElement) and FormDataEvent#formData:
    • String/File-sourced entries:
      • Spec, Chrome: names and string values are normalized
      • Firefox, Safari: unchanged
    • FormData-sourced entries: unchanged
  • application/x-www-form-urlencoded and text/plain serializations:
    • Spec:
      • String/File-sourced entries: names and values which derive from strings are normalized
      • FormData-sourced entries: unchanged
    • Chrome on application/x-www-form-urlencoded, Firefox, Safari: normalized
    • Chrome on text/plain:
      • String/File-sourced entries: normalized
      • FormData-sourced entries: unchanged except for values which derive from an original filename, which are normalized. This is almost certainly a bug.
  • multipart/form-data serialization:
    • String/File-sourced entries:
      • Spec, Chrome, Safari: names and string values are normalized, filenames are unchanged
      • Firefox: string values are normalized (names and filenames N/A)
    • FormData-sourced entries:
      • Spec, Chrome: unchanged
      • Firefox: string values are normalized (names and filenames N/A)
      • Safari: names and string values are normalized, filenames are unchanged

It seems to me like we should change the spec to have urlencoded and text/plain normalize everything, and the rest can stay as currently in the spec (but with wpt tests and bugs filed on browsers).

Note that for urlencoded at least, this normalization has to take place at some point before the serializer is invoked, since URLSearchParams objects interoperably don't normalize entries. This would also remove the need for the urlencoded and text/plain serializers to have to deal with files.

@whatwg/forms any thoughts?

@annevk
Copy link
Member

annevk commented Jan 4, 2021

I guess the one thing I might not do is normalize FormData-sourced entries only for urlencoded and text/plain. I'd rather we handle them consistently across types.

@andreubotella
Copy link
Member Author

I guess the one thing I might not do is normalize FormData-sourced entries only for urlencoded and text/plain. I'd rather we handle them consistently across types.

I suppose that would mean going the Safari route for multipart/form-data, rather than changing urlencoded and text/plain. That'd be fine by me.

I'd like to hear from other people in @whatwg/forms though.

andreubotella pushed a commit to andreubotella/html that referenced this issue Jan 13, 2021
When entries are added to a form's entry list through the "append an
entry" algorithm, their newlines are normalized, but entries can be
added to an entry list through other means. This change adds a final
newline normalization before serializing the form payload, since "append
an entry" cannot be changed because its results are observable through
the `FormData` object or through the `formdata` event.

This change additionally changes the input passed to the
`application/x-www-form-urlencoded` and `text/plain` serializes to be a
list of name-value pairs, where the values are strings rather than
`File` objects. This simplifies the serializer algorithms.

Closes whatwg#6247. Closes whatwg/url#562.
andreubotella pushed a commit to andreubotella/html that referenced this issue Jan 13, 2021
When entries are added to a form's entry list through the "append an
entry" algorithm, their newlines are normalized, but entries can be
added to an entry list through other means. This change adds a final
newline normalization before serializing the form payload, since "append
an entry" cannot be changed because its results are observable through
the `FormData` object or through the `formdata` event.

This change additionally changes the input passed to the
`application/x-www-form-urlencoded` and `text/plain` serializers to be a
list of name-value pairs, where the values are strings rather than
`File` objects. This simplifies the serializer algorithms.

Closes whatwg#6247. Closes whatwg/url#562.
domenic pushed a commit to web-platform-tests/wpt that referenced this issue Jan 13, 2021
See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jan 15, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Jan 16, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142

UltraBlame original commit: 4283b11039b1ee9cc93eb5289dcdfeb9c6d06972
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jan 19, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Jan 25, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142

UltraBlame original commit: b70b8b891ecd9535c03560c6a06c4a1972bcd51e
@mfreed7
Copy link
Contributor

mfreed7 commented Feb 26, 2021

Thanks for the great summary of the current state of the implementations and spec. Generally, it sounds like this'll be a good area to bring the implementations closer together. I'm a bit concerned about compat, but given the disjoint state of the implementations, it probably isn't a huge risk. But non-zero either - we've been bitten in the past by seemingly benign changes to form logic in Chromium. Do you have any thoughts on the risk here?

I've read through the attached issues, and this one a few times, and I think I understand what's being proposed. But just to be sure, spec-wise everything will stay roughly as is, with the exception of FormData-sourced entries for all serializations, where we will normalize everything but filenames. Is that right?

Are all of the WPT tests updated and landed for this proposed change? If not, I'd support landing them, to help the implementations see where we stand.

Also, you mentioned this:

application/x-www-form-urlencoded and text/plain serializations:

  • Chrome on text/plain:
    • FormData-sourced entries: unchanged except for values which derive from an original filename, which are normalized. This is almost certainly a bug.

Do you happen to know if there's a chromium bug filed for this one?

@andreubotella
Copy link
Member Author

andreubotella commented Feb 27, 2021

Thanks for the great summary of the current state of the implementations and spec. Generally, it sounds like this'll be a good area to bring the implementations closer together. I'm a bit concerned about compat, but given the disjoint state of the implementations, it probably isn't a huge risk. But non-zero either - we've been bitten in the past by seemingly benign changes to form logic in Chromium. Do you have any thoughts on the risk here?

I suspect most server-side frameworks and languages pass the form data without doing any newline normalization, so some sites might be affected by this. But considering that submitting forms with the FormData API is relatively rare and that the only thing (other than text/plain) that Chromium needs to change is to have it match Webkit, I doubt there's too much to worry about. After all, the current difference in newline behavior between String/File-sourced entries and FormData-sourced entries isn't well known at all, and it's more surprising than the alternative. But that of course doesn't rule out sites depending on it.

I've read through the attached issues, and this one a few times, and I think I understand what's being proposed. But just to be sure, spec-wise everything will stay roughly as is, with the exception of FormData-sourced entries for all serializations, where we will normalize everything but filenames. Is that right?

Everything but filenames in multipart/form-data serializations. And it additionally would normalize filenames in urlencoded and text/plain serializations, even in string/File-sourced entries, to make the spec match every implementation (see whatwg/url#562).

Are all of the WPT tests updated and landed for this proposed change? If not, I'd support landing them, to help the implementations see where we stand.

They haven't landed yet, but I think they're updated.

Also, you mentioned this:

application/x-www-form-urlencoded and text/plain serializations:

  • Chrome on text/plain:

    • FormData-sourced entries: unchanged except for values which derive from an original filename, which are normalized. This is almost certainly a bug.

Do you happen to know if there's a chromium bug filed for this one?

Other than https://bugs.chromium.org/p/chromium/issues/detail?id=1167095, no. Or at least there wasn't one when I filed it.

annevk pushed a commit that referenced this issue May 20, 2021
User agents normalize newlines when serializing form data to text/plain, application/x-www-form-urlencoded, and multipart/form-data. (This can be observed through FormData or the formdata event.)

This additionally changes the input passed to the application/x-www-form-urlencoded and text/plain serializers to be a
list of name-value pairs, where the values are always strings rather than potentially File objects.

Tests: web-platform-tests/wpt#26740.

Follow-up: #6624 & #6697.

Closes #6247. Helps with whatwg/url#562.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants