Delete content-encoding header after decompressing #190

tanguilp · 2023-06-19T16:41:17Z

When a response is decompressed, we should remove the content-encoding header because other downstream steps may find themselves confused by an erroneous %Req.Response{} struct where:

the body is uncompressed
but the content-encoding header say otherwise

wojtekmach · 2023-06-19T18:12:44Z

Thank you for the PR. This is the first time a built-in step changes headers so I want to be super careful before setting that precedent. I think it's fine conceptually, we have steps that change response.body after all.

Do you know if any other http clients do this?

Does decode_body remove content-encoding?

tanguilp · 2023-06-19T18:53:42Z

Tesla doesn't seem to do this. I'm not a Rubyist but I've found a library that does so.

And you might have noticed that we missed to recalculate the content-length! So yes, changing headers seems to be normal.

I'll submit a PR for updating content-length.

I'm trying to implement something similar to https://github.com/tanguilp/tesla_http_cache for Req, and this issue with the content-encoding not matching the body encoding somehow confuses http_cache, but I'm not sure why. Going to investigate.

Does decode_body remove content-encoding?
I didn't notice it, but format doesn't use content-encoding but content-type, so that's a different issue.

I'd be cautious with decompressing compressed content by default: https://fly.io/phoenix-files/can-phoenix-safely-use-the-zip-module/

tanguilp · 2023-06-20T10:12:26Z

I can confirm the issue:

one first request, the response is returned, uncompressed and then cached by custom ReqHTTPCache I'm currently working one. Body is uncompressed, and content-encoding is set to gzip
one second request, the cached response is returned by ReqHTTPCache and the response steps start being executed
&Req.Steps.decompress_body/1 is called, and tries decompressing an uncompressed body because content-encoding is still set to gzip

What I could do is to halt processing with Req.Request.halt/1. However, it might not return the expected result (for instance, when response is returned from cache, the json payload is not decoded).

For information here is the struct with steps:

%Req.Request{
  method: :get,
  url: URI.parse(""),
  headers: [],
  body: nil,
  options: %{http_cache: %{store: :http_cache_store_process}},
  registered_options: MapSet.new([:finch, :location_trusted, :path_params,
   :pool_timeout, :raw, :user_agent, :cache, :form, :redirect_log_level,
   :max_redirects, :http_errors, :range, :decode_json, :follow_redirects,
   :output, :base_url, :compress_body, :compressed, :retry, :plug, :retry_delay,
   :retry_log_level, :finch_request, :decode_body, :json, :http_cache,
   :max_retries, :receive_timeout, :cache_dir, :params, :connect_options,
   :extract, :unix_socket, :auth]),
  halted: false,
  adapter: &Req.Steps.run_finch/1,
  request_steps: [
    put_user_agent: &Req.Steps.put_user_agent/1,
    compressed: &Req.Steps.compressed/1,
    encode_body: &Req.Steps.encode_body/1,
    put_base_url: &Req.Steps.put_base_url/1,
    auth: &Req.Steps.auth/1,
    put_params: &Req.Steps.put_params/1,
    put_path_params: &Req.Steps.put_path_params/1,
    put_range: &Req.Steps.put_range/1,
    cache: &Req.Steps.cache/1,
    put_plug: &Req.Steps.put_plug/1,
    compress_body: &Req.Steps.compress_body/1,
    read_from_http_cache: #Function<0.104066892/1 in ReqHTTPCache."-fun.read_from_http_cache/1-">
  ],
  response_steps: [
    retry: &Req.Steps.retry/1,
    follow_redirects: &Req.Steps.follow_redirects/1, 
    decompress_body: &Req.Steps.decompress_body/1,
    decode_body: &Req.Steps.decode_body/1,
    handle_http_errors: &Req.Steps.handle_http_errors/1,
    output: &Req.Steps.output/1,
    cache_response: #Function<1.104066892/1 in ReqHTTPCache."-fun.cache_response/1-">
  ],
  error_steps: [retry: &Req.Steps.retry/1],
  private: %{}
}

I'll update this PR later.

wojtekmach

Thank you! Please un-draft it and mention the behaviour in docs. :)

tanguilp · 2023-06-23T13:48:49Z

I've rebased this branch from the one to be merged by #192, which is why I'll wait for it to be merged before opening this PR.

wojtekmach · 2023-06-26T07:19:20Z

@sneako @ericmj I have a favour to ask, do you think this (and #192) make sense, i.e. some steps changing some headers? I think in these particular cases, adjusting content-length and removing content-encoding, makes sense. But I wouldn't change content-type or any other headers at the moment. Just double-checking. And no rush. :)

sneako · 2023-06-26T08:50:20Z

Hey! I can see the reasoning behind this idea, but it looks like curl does not do this. and IIRC we have been following their conventions when in doubt.

curl https://api.github.com/repos/wojtekmach/req -v
Returns the uncompressed body with content-length: 5340 and no content-encoding

curl -H 'accept-encoding: gzip' https://api.github.com/repos/wojtekmach/req -v
Returns the compressed body with content-length: 1366 and content-encoding: gzip

curl https://api.github.com/repos/wojtekmach/req -v --compressed
Returns the decompressed body with content-length: 1366 and content-encoding: gzip

yordis · 2023-08-10T21:27:35Z

Hey @teamon, just wanted to give you an update on the situation at Tesla PR 606

I've already created a PR, but I need your help with updating the headers since the middleware modifies the response body. It's really important that the data integration aligns with the permutation.

I know @sneako mentioned using curl, but we have to keep in mind that it's not on the same level of composition as Req and Tesla. We shouldn't compromise on data integrity.

wojtekmach · 2023-08-18T08:57:05Z

I couldn't find definite reference for this behaviour (if someone does, please chime in!) but empirically this behaviour is also found in Fetch API:

$ deno

> let res = await fetch("http://httpbin.org/gzip"); [res.headers, await res.json()]
[
  Headers {
  "access-control-allow-credentials": "true",
  "access-control-allow-origin": "*",
  connection: "keep-alive",
  "content-type": "application/json",
  date: "Fri, 18 Aug 2023 08:31:38 GMT",
  server: "gunicorn/19.9.0"
},
  {
    gzipped: true,
    headers: {
      Accept: "*/*",
      "Accept-Encoding": "gzip, br",
      "Accept-Language": "*",
      Host: "httpbin.org",
      "User-Agent": "Deno/1.36.1",
      "X-Amzn-Trace-Id": "Root=1-64df2c69-7b747ad735dcd8e0696ade8f"
    },
    method: "GET",
    origin: "89.73.7.111"
  }
]

And in reqwest:

fn main() -> Result<(), reqwest::Error> {
    let res = reqwest::blocking::Client::builder()
        .gzip(true)
        .build()?
        .get("http://httpbin.org/gzip")
        .send()?;

    eprintln!("Response: {:?} {}", res.version(), res.status());
    eprintln!("Headers: {:#?}\n", res.headers());
    println!("{}", res.text()?);
    Ok(())
}

$ cargo run --features gzip,blocking --example simple http://httpbin.org/gzip

Response: HTTP/1.1 200 OK
Headers: {
    "date": "Fri, 18 Aug 2023 08:31:09 GMT",
    "content-type": "application/json",
    "access-control-allow-origin": "*",
    "connection": "keep-alive",
    "server": "gunicorn/19.9.0",
    "access-control-allow-credentials": "true",
}

{
  "gzipped": true,
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip",
    "Host": "httpbin.org",
    "X-Amzn-Trace-Id": "Root=1-64df2c4b-69c3bc432e4a14781b0d8a33"
  },
  "method": "GET",
  "origin": "89.73.7.111"
}

it's even documented as such in reqwest gzip() method:

Enable auto gzip decompression by checking the Content-Encoding response header.

If auto gzip decompression is turned on:

When sending a request and if the request’s headers do not already contain an Accept-Encoding and Range values, the Accept-Encoding header is set to gzip. The request body is not automatically compressed.

When receiving a response, if its headers contain a Content-Encoding value of gzip, both Content-Encoding and Content-Length are removed from the headers’ set. The response body is automatically decompressed.

The last bit in the docs is very interesting. Both Fetch and reqwest remove content-length header. I was able to find some reference for this, from RFC 9112 § 6.3:

An intermediary that chooses to forward the message MUST first remove the received Content-Length field and process the Transfer-Encoding (as described below) prior to forwarding the message downstream.

It's about Transfer-Encoding, not Content-encoding, but I'd expect the semantics to be similar but it of course needs double-checking. Obviously this is relevant for #192.

@tanguilp I'm keen on moving forward with this. Apologies for delay but it might have been worth it, some references mentioned here might be useful to others. Please un-draft this!

tanguilp · 2023-08-18T10:33:50Z

@tanguilp I'm keen on moving forward with this. Apologies for delay but it might have been worth it, some references mentioned here might be useful to others. Please un-draft this!

No worries, I’m myself on holidays. Will take a look next week. Interesting references, thanks 🙏

wojtekmach · 2023-08-18T10:36:51Z

Yup no rush, enjoy your holiday!

wojtekmach · 2023-08-25T09:27:59Z

@tanguilp I'm clearing the backlog and have cherry-picked this branch with minor modifications as ac45a34. I've also added 1947528 on top. Thank you!

tanguilp · 2023-08-25T09:53:04Z

Thanks! Funny thing is we were working on it at the exact same moment this morning, up to the same function names. Imagine my surprise when I tried to rebase :D

Thanks again and keep up the great work!

wojtekmach · 2023-08-25T10:06:23Z

oh no haha, sorry about that!

wojtekmach force-pushed the main branch from de858a2 to 778d58a Compare June 19, 2023 17:46

tanguilp marked this pull request as draft June 20, 2023 10:12

wojtekmach approved these changes Jun 23, 2023

View reviewed changes

tanguilp and others added 5 commits June 23, 2023 14:51

Add failing test

6d3115d

Recalculate content-length after decompressing the body

40525ee

Apply suggestions from code review

54a8795

Update Req.Steps.decompress_body/2 doc

38c7cdc

Fix typo

0ce7e1e

tanguilp mentioned this pull request Jun 23, 2023

Set content-length when decompressing #192

Merged

tanguilp added 5 commits June 23, 2023 15:14

Add failing tests

50a64af

Simplify test

a0ee9b8

Implement Req.Response.delete_header/2

ccd2d9e

Delete content-encoding after decompression

484943d

Update doc

2fe1380

tanguilp force-pushed the fix_decompress_and_delete_content-encoding branch from 639c035 to 2fe1380 Compare June 23, 2023 13:45

tanguilp marked this pull request as ready for review June 26, 2023 08:38

tanguilp marked this pull request as draft June 26, 2023 08:38

yordis mentioned this pull request Aug 10, 2023

feat: improve decompression middleware elixir-tesla/tesla#606

Merged

wojtekmach closed this Aug 25, 2023

wojtekmach mentioned this pull request Sep 3, 2023

decompress_body: Do we want to recalculate content-length? #239

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delete content-encoding header after decompressing #190

Delete content-encoding header after decompressing #190

tanguilp commented Jun 19, 2023

wojtekmach commented Jun 19, 2023

tanguilp commented Jun 19, 2023

tanguilp commented Jun 20, 2023

wojtekmach left a comment

tanguilp commented Jun 23, 2023

wojtekmach commented Jun 26, 2023

sneako commented Jun 26, 2023

yordis commented Aug 10, 2023

wojtekmach commented Aug 18, 2023

tanguilp commented Aug 18, 2023

wojtekmach commented Aug 18, 2023

wojtekmach commented Aug 25, 2023

tanguilp commented Aug 25, 2023

wojtekmach commented Aug 25, 2023

Delete content-encoding header after decompressing #190

Delete content-encoding header after decompressing #190

Conversation

tanguilp commented Jun 19, 2023

wojtekmach commented Jun 19, 2023

tanguilp commented Jun 19, 2023

tanguilp commented Jun 20, 2023

wojtekmach left a comment

Choose a reason for hiding this comment

tanguilp commented Jun 23, 2023

wojtekmach commented Jun 26, 2023

sneako commented Jun 26, 2023

yordis commented Aug 10, 2023

wojtekmach commented Aug 18, 2023

tanguilp commented Aug 18, 2023

wojtekmach commented Aug 18, 2023

wojtekmach commented Aug 25, 2023

tanguilp commented Aug 25, 2023

wojtekmach commented Aug 25, 2023