Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can I reference an ESI URL that includes an encoded HTML entity with Varnish? #2034

Closed
pkaeding opened this issue Aug 8, 2016 · 7 comments

Comments

@pkaeding
Copy link

pkaeding commented Aug 8, 2016

I am using Varnish 4.1.2, and trying to include ESI content. Sometimes, the URL for the included content might include encoded HTML entities, such as ' (').

For example, I have the following in the HTML returned by my app for the outer shell:

<esi:include src="/esi/map/alice&#x27;s%20house"/>

Expected Behavior

This should result in an ESI request for a path (/esi/map/alice's%20house) with 3 segments and no query string, which when fully decoded, result in:

  • esi
  • map
  • alice's house

The apostrophe in the third segment is HTML encoded because it is being transmitted in an HTML document. The space is URL encoded because it is part of a URL.

Current Behavior

However, Varnish seems to not decode the HTML encoded entity before making the ESI request. It sends a request for /esi/map/alice&#x27;s%20house, which is a bad request, because & is not legal in the path or a URL.

Steps to Reproduce

  1. Include the following in a response that should be interpreted for ESI:
    <esi:include src="/esi/map/alice&#x27;s%20house"/>
  1. Watch varnishlog with something like the following, to obeserve what requests are made to fulfill the ESI:
varnishlog -i ReqMethod,ReqURL,ReqStatus,ReqHeader,RespHeader,BereqURL,BerespStatus,BereqHeader,BerespHeader

Context

This is preventing me from using ESI to serve content that uses user-submitted data for one path segment of the URL.

Your Environment

  • Version used: 4.1.2
  • Operating System and version: Alpine Linux 3.3 (in Docker)
@fgsch
Copy link
Member

fgsch commented Aug 8, 2016

I'm not sure we should expand html entities. Why not urlencode the ' ?

@slimhazard
Copy link
Contributor

fgsch, +1. HTML entities do no form legal URLs, and wouldn't work any better in A HREF or IMG links. URL encoding as with the %20 in the example is the right solution.

More to the point, Varnish shouldn't slow down everything else to run a decoder over what is best understood as an error in the src link.

pkaeding, the best solution would be for your app to decode HTML entities and re-encode them with URL encoding before passing along the string.

@nigoroll
Copy link
Member

nigoroll commented Aug 8, 2016

Because https://www.w3.org/TR/esi-lang is not particularly clear about this, I'll seek clarification and we'll follow up then

@pkaeding
Copy link
Author

That's unfortunate, but I understand the pragmatism of the performance argument. (I would argue that HTML entities do work in A HREF or IMG links when the user agent is set up to properly interpret HTML. I gather that Varnish is not really interpreting the HTML, rather just looking for the specific ESI tokens that it cares about. This seems like a reasonable trade-off, all things considered.)

Thanks for the clarification! I will modify my app to work around this.

@mnot
Copy link

mnot commented Aug 11, 2016

ESI is an XML-based language, so &-encoded character escapes are allowed there syntactically.

However, semantically that attribute is a URI, not an IRI. Putting non-ascii content in that field isn't specified. Really, it should create some sort of error condition, but the outcome would be the same -- "don't do that."

Understand that the ESI spec is getting close to two decades old, was created before internationalisation was even partly understood in Internet protocols, and hasn't been standardised or publicly maintained. So it's not surprising that these sorts of ambiguities pop up.

@nigoroll
Copy link
Member

Thank you @mnot
@pkaeding Mark is the Editor of the ESI spec so his answer should be the most authoritative we can get.
To summarize: Use url-encoding

@hermunn
Copy link
Member

hermunn commented Aug 12, 2016

Backport review: Not a bug. The ESI parsing will stay as it is, so there is nothing to backport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants