Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing multi form data requires CRLF termination #2411

Closed
imRentable opened this issue Aug 29, 2023 · 11 comments · Fixed by #2862
Closed

Parsing multi form data requires CRLF termination #2411

imRentable opened this issue Aug 29, 2023 · 11 comments · Fixed by #2862
Labels
💎 Bounty bug Something isn't working 💰 Rewarded

Comments

@imRentable
Copy link

imRentable commented Aug 29, 2023

Describe the bug
When a client sends a multi form request whose body is terminated by CR instead of CRLF (after the closing boundary), the parsing logic may not parse the body completely. I.e. some form fields are missing.

To Reproduce
Steps to reproduce the behaviour:
Execute the following test suite with the latest ZIO HTTP version (3.0.0-RC2):

object DemoSuite extends ZIOSpecDefault {
  val CR = '\r'
  val CRLF = "\r\n"
  override def spec: Spec[TestEnvironment with Scope, Any] = suite("Demo")(
    test("encoding") {
      val body = Chunk.fromArray(
        s"""|--(((AaB03x)))${CR}
            |Content-Disposition: form-data; name="hocon-data"${CR}
            |Content-Type: text/plain${CR}
            |${CR}
            |foos: []${CR}
            |--(((AaB03x)))${CR}
            |Content-Disposition: form-data; name="json-data"${CR}
            |Content-Type: text/plain${CR}
            |${CR}
            |{ "bars": [] }${CR}
            |--(((AaB03x)))--${CR}""".stripMargin.getBytes(),
      )

      val form = Form(
        FormField.textField("hocon-data", "foos: []", MediaType.text.`plain`),
        FormField.textField("json-data", """{ "bars": [] }""", MediaType.text.`plain`),
      )

      val boundary = Boundary("(((AaB03x)))")

      val actualByteStream = form.multipartBytes(boundary)

      for {
        form2       <- Form.fromMultipartBytes(body)
        actualBytes <- actualByteStream.runCollect
      } yield assertTrue(
        actualBytes == body,
        form2 == form,
      )
    }
  )
}

It will only yield the first entry hocon-data.

Expected behaviour
It yields both form data entries hocon-data as well as json-data.

Additional context

A client library that will send their multi form data bodies this way is STTP. Example:

import sttp.client3.httpclient.zio.{HttpClientZioBackend, send}
import sttp.client3.{Identity, RequestT, UriContext, basicRequest, multipart}
import zio.Scope
import zio.test.{Spec, TestEnvironment, ZIOSpecDefault, assertTrue}

object SttpDemo extends ZIOSpecDefault {
  override def spec: Spec[TestEnvironment with Scope, Any] = test("STTP Demo") {
    val request: RequestT[Identity, Either[String, String], Any] = basicRequest
      .post(uri"http://www.example.com")
      .multipartBody(
        multipart("hocon-data", """foos: []"""),
        multipart("json-data", """{ "bars": [] }""")
      )
    send(request).map(response => assertTrue(response.is200))
  }.provide(HttpClientZioBackend.layer())
}

Unfortunately, I have not been able to find precise specification on the matter.

@imRentable imRentable added the bug Something isn't working label Aug 29, 2023
@fsvehla
Copy link

fsvehla commented Aug 29, 2023

It’s RFC 1867

See ¶ 5.9

As with all MIME transmissions, CRLF is used as the separator for
lines in a POST of the data in multipart/form-data.

@imRentable
Copy link
Author

@fsvehla I am unsure whether this includes the very end of this kind of bodies since there is already a dedicated terminator for it (closing boundary). The much newer RFC 7578 states:

(...) the parts are delimited with a
boundary delimiter, constructed using CRLF, "--", and the value of
the "boundary" parameter.

This does not necessarily mean to me that the end of a body is a part that needs to be separated.

@seakayone
Copy link
Contributor

seakayone commented Apr 17, 2024

I am currently in progress of adding multipart support for tapir with zio-http as the backend and ran into this issue.

Let me also quote from the RFC 2046 (Multipart in MIME) referenced by RFC 7578 (Returning Values from Forms: multipart/form-data):

The encapsulation boundary following the last body part is a distinguished delimiter that indicates that no further body parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:

 `--gc0p4Jq0M2Yt08jU534c0p--` 

There appears to be room for additional information prior to the first encapsulation boundary and following the final boundary. These areas should generally be left blank, and implementations should ignore anything that appears before the first boundary or after the last one.

This means that the last boundary is terminated with two more hyphens -- rather than a CRLF and also that everything after that should be ignored. Albeit this is for MIME but nevertheless referenced by RFC 7578.

The test fixtures in zio-http (zio.http.forms.Fixtures) contain both, the terminating two hyphens at the end of the last boundary as well as a CRLF.

IMHO the CRLF after the last boundary should be ignored and not be mandatory.

@seakayone
Copy link
Contributor

Interestingly enough in v3.0.0-RC6 the CRLF after the closing boundary is not mandatory if the the last part is a binary/streaming ContentType, e.g. "image/png" or "application/octet-stream".

@jdegoes
Copy link
Member

jdegoes commented Jun 4, 2024

/bounty $250

Copy link

algora-pbc bot commented Jun 4, 2024

💎 $250 bounty • ZIO

Steps to solve:

  1. Start working: Comment /attempt #2411 with your implementation plan
  2. Submit work: Create a pull request including /claim #2411 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Additional opportunities:

  • 🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment /livestream once live

Thank you for contributing to zio/zio-http!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @seakayone Jun 10, 2024, 8:12:03 AM #2862
🟢 @fliiiix #2862
🟢 @grigala #2862

@seakayone
Copy link
Contributor

seakayone commented Jun 10, 2024

/attempt #2411

I would like to finalize #2862

The idea is to detect not only the line breaks but also the closing boundary in the FormStateBuffer by remembering whether or not the buffer has seen the boundary bytes.

The PR is a first draft and shows the general approach should work.

Copy link

algora-pbc bot commented Jun 10, 2024

💡 @seakayone, @fliiiix and @grigala submitted a pull request that claims the bounty. You can visit your bounty board to reward.

Copy link

algora-pbc bot commented Jun 12, 2024

🎉🎈 @seakayone has been awarded $83.34! 🎈🎊

Copy link

algora-pbc bot commented Jun 12, 2024

🎉🎈 @grigala has been awarded $83.33! 🎈🎊

Copy link

algora-pbc bot commented Jun 12, 2024

🎉🎈 @fliiiix has been awarded $83.33! 🎈🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💎 Bounty bug Something isn't working 💰 Rewarded
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants