Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Req exits with :badarg when redirecting to invalid URL #327

Closed
bfolkens opened this issue Mar 21, 2024 · 5 comments
Closed

Req exits with :badarg when redirecting to invalid URL #327

bfolkens opened this issue Mar 21, 2024 · 5 comments

Comments

@bfolkens
Copy link
Contributor

bfolkens commented Mar 21, 2024

When redirect encounters an invalid URL, Req will exit with :badarg due to Finch exiting with the same. The following error is an example:

iex> Req.new(method: :get, url: "http://mob.muchong.com/html/201502/8544958.html") |> Req.Request.run()

 [debug] redirecting to //muchong.com/html/201502/8544958.html
** (exit) :badarg
    (finch 0.18.0) lib/finch/http1/pool.ex:96: Finch.HTTP1.Pool.request/6
    (finch 0.18.0) lib/finch.ex:472: anonymous fn/4 in Finch.request/3
    (req 0.4.13) lib/req/steps.ex:979: Req.Steps.run_finch_request/3
    (req 0.4.13) lib/req/steps.ex:837: Req.Steps.run_finch/4
    (req 0.4.13) lib/req/request.ex:993: Req.Request.run_request/1
    (req 0.4.13) lib/req/request.ex:938: Req.Request.run/1
    (req 0.4.13) lib/req/steps.ex:1977: Req.Steps.redirect/1
    (req 0.4.13) lib/req/request.ex:1010: anonymous fn/2 in Req.Request.run_response/2
    (elixir 1.15.6) lib/enum.ex:4830: Enumerable.List.reduce/3
    (elixir 1.15.6) lib/enum.ex:2564: Enum.reduce_while/3
    (req 0.4.13) lib/req/request.ex:938: Req.Request.run/1

The request above fails after it redirects to //muchong.com/html/201502/8544958.html. As it turns out, After URI.merge/2 in Req.Steps.build_redirect_request/2 - the URL port may become nil if merged with relative scheme ("//somehost.com").

URI.merge(URI.parse("https://good.url/"), URI.parse("//test")).port == nil

In other words, the following triggers the failure:

Req.new(method: :get, url: %URI{scheme: "http", host: "muchong.com", path: "/html/201502/8544958.html"}) |> Req.Request.run()

And the following succeeds:

Req.new(method: :get, url: %URI{scheme: "http", port: 80, host: "muchong.com", path: "/html/201502/8544958.html"}) |> Req.Request.run()

Initially I thought to just monkeypatch URI.merge/2, however, since there are other cases where it seems we're encountering invalid URLs, what I did internally for our use case was to add a URL validation step around line 2005 after the merge, and then halt the pipe with an error if there is an invalid URL.

Another thought would be if we should just add a URL validation "step" instead of putting this into the redirect step explicitly.

Happy to submit a patch if that's acceptible for the library...

@wojtekmach
Copy link
Owner

wojtekmach commented Mar 21, 2024

Thank you for the report.

I'm not sure if it's standardised but I believe skipping the scheme, //example.com as opposed to http[s]://example.com, implies the scheme that is currently used. I think I maybe saw it in HTML referencing assets like CSSes and images.

In other words, if it's standardised (ideally in https://www.rfc-editor.org/rfc/rfc9110.html) that redirects without scheme should imply current scheme, we should fix it in Req. For example, if scheme and port is nil but host is set, fill them in.

Could you investigate?

@bfolkens
Copy link
Contributor Author

bfolkens commented Mar 21, 2024

The function URI.merge/2 correctly merges the URI scheme, but leaves port nil which is what causes the :badarg downstream.

iex(1)> URI.merge(URI.parse("https://good.url/"), URI.parse("//relative"))
%URI{
  scheme: "https",
  authority: "relative",
  userinfo: nil,
  host: "relative",
  port: nil,
  path: nil,
  query: nil,
  fragment: nil
}

In other words, if it's standardised (ideally in https://www.rfc-editor.org/rfc/rfc9110.html) that redirects without scheme should imply current scheme, we should fix it in Req. For example, if scheme and port is nil but host is set, fill them in.

Yes, RFC 9110 15.4 Redirection 3xx step 1 indicates that a relative Location header should be resolved relative to the original request URI.

However, RFC 3986 5.2 Relative Resolution doesn't seem to prescribe any steps for including the port in the merge steps. It looks like that's covered in RFC 3986 6.2.3 Scheme-Based Normalization, where the spec indicates an empty port should be assigned that of the default one for the scheme (as we would expect).

Since the URI.merge/2 documentation suggests that they're trying to follow RFC 3986 5.2 Relative Resolution, it seems that we would then need some sort of "normalize URI" step prior to exiting Req.Steps.build_redirect_step/2.

While something of the above will solve the nil port problem, we still experience issues with redirects where URI's contain characters that fail :inet_parse.visible_string/1, so I think we should also include some sort of halt and exception instead of allowing the :bardarg from Finch once we solve the above issue.

@bfolkens
Copy link
Contributor Author

Something like the following solves the nil port problem:

diff --git a/lib/req/steps.ex b/lib/req/steps.ex
index 712a8a5..222c28c 100644
--- a/lib/req/steps.ex
+++ b/lib/req/steps.ex
@@ -1964,7 +1964,10 @@ defmodule Req.Steps do
           request.options[:redirect_trusted]
       end

-    location_url = URI.merge(request.url, URI.parse(location))
+    location_url =
+      request.url
+      |> URI.merge(URI.parse(location))
+      |> normalize_redirect_uri()

     request
     # assume put_params step already run so remove :params option so it's not applied again
@@ -1980,6 +1983,10 @@ defmodule Req.Steps do
     Logger.log(level, ["redirecting to ", location])
   end

+  defp normalize_redirect_uri(%URI{scheme: "http", port: nil} = uri), do: %URI{uri | port: 80}
+  defp normalize_redirect_uri(%URI{scheme: "https", port: nil} = uri), do: %URI{uri | port: 443}
+  defp normalize_redirect_uri(%URI{} = uri), do: uri
+
   # https://www.rfc-editor.org/rfc/rfc9110#name-301-moved-permanently and 302:
   #
   # > Note: For historical reasons, a user agent MAY change the request method from

@bfolkens
Copy link
Contributor Author

This seems related sneako/finch#186

@wojtekmach
Copy link
Owner

Fixed in #328

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants