-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wai-app-static generates non-standard (unquoted) ETag headers... #815
Comments
Are you interested in opening up a PR with the change you're hoping to see?
…On Sun, Sep 20, 2020, 11:23 AM Viktor Dukhovni ***@***.***> wrote:
It also compares ETag values in a way that does not conform to the the
specification.
The code in question
<https://github.com/yesodweb/wai/blob/master/wai-app-static/Network/Wai/Application/Static.hs#L155-L174>
just takes the base64-encoded bytestring as the "ETag" header value, but RFC7232,
Section 2.3 <https://tools.ietf.org/html/rfc7232#section-2.3> requires
quoted ETag values.
- Quotes should be added around the default "strong" ETags after
base64 encoding.
- The need to add quotes should be documented as a requirement on
custom ETagLookup methods provided by application developers.
- When quoting custom ETag values, application developers need to
prefix W/ if they choose to generate *weak* ETag values.
(If compatibility were not a concern, the ETagLookup type could be
changed to have a sum-type return value with "weak" and "strong" variants,
and then the quoting and W/ prefixing done in the library based on the
etag variant).
Then, when comparing ETag values in the if-none-match case, the *weak*
comparison criteria from RFC7242 Section 2.3.2
<https://tools.ietf.org/html/rfc7232#section-2.3.2> should be used to
ignore any W/ prefix of an ETag of the form:
W/"<visible-ascii-string-sans-quotes>".
Finally, I have (for POSIX system) a potential candidate for a much more
efficient ETag that hashes just file metadata (device, inode, size, ctime
seconds, ctime nanoseconds), that should be good enough for the vast
majority of users. It is only somewhat weak on systems where the ctime
resolution is only 1 second (nanoseconds always zero) and when files are
rapidly cycled, perhaps multiple times within the same second, possibly
replacing a file's content with content of the same size. Barring such
exotic conditions, the below is much faster than a content hash.
It could be used as a *weak* Etag, by including in the hash the current
time rounded to a multiple of the cache maxAge (if specified), which
would ensure that the ETag changes every ~maxAge seconds.
module Etag (getEtag) where
import qualified Crypto.Hash as Himport qualified Data.ByteArray as BAimport qualified Data.ByteArray.Encoding as BAimport qualified Foreign.Marshal.Alloc as Fimport qualified Foreign.Ptr as Fimport qualified Foreign.Storable as Fimport qualified System.IO.Error as Eimport qualified System.Posix.Files as Pimport Control.Monad (guard)import Data.Bifunctor (second)import Data.ByteString (ByteString)import Data.Word (Word64)
(|.>) :: (a -> b) -> (b -> c) -> (a -> c)(|.>) = flip (.)infixl 1 |.>
getEtag :: FilePath -> IO (Maybe ByteString)
getEtag path = E.catchIOError
do stat2Etag path
do const $ pure Nothing
stat2Etag :: FilePath -> IO (Maybe ByteString)
stat2Etag path = do
stat <- P.getFileStatus path
let (sec, nsec) = timeParts $ P.statusChangeTimeHiRes stat
meta = [ fromIntegral $ P.deviceID stat, fromIntegral $ P.fileID stat
, fromIntegral $ P.fileSize stat, sec, nsec ]
(guard (P.isRegularFile stat) >>) . pure <$> hashWords meta
where
hashWords :: [Word64] -> IO ByteString
hashWords = hashMany (H.hashInit @H.SHA256)
|.> fmap H.hashFinalize
|.> fmap (BA.convertToBase BA.Base64URLUnpadded)
timeParts = second (round . (* 1000000000)) . properFraction
-- | Hash a list of storable items in some fixed order.
hashMany ctx ss = go ss []
where
go (x:xs) acc = do
F.alloca \ptr -> do
F.poke ptr x
let v = BA.MemView (F.castPtr ptr) (F.sizeOf x)
go xs (v:acc)
go [] acc = pure $! H.hashUpdates ctx acc
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#815>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAMCB7TQ5YLKWGICHF2PKLSGW3W7ANCNFSM4RTQHIKQ>
.
|
Sure, if there are no objections. But first, I would like feedback on whether that should be the slightly kludgy backwards-compatible approach, with quoting and "W/" prefixing left to each implementor of data EtagValue =
StrongEtag ByteString |
WeakEtag ByteString with the quoting and "W/" prefixed done internally in In the backwards-compatible variant, the default In the (I think cleaner) approach, the default implementation returns a Doing the "more efficient" file-metadata etags is probably a separate PR if there's interest and agreement that these are useful. |
I'm OK with the backwards-incompatible approach. For the more efficient etag: yes, it should be a separate PR, and ideally have a flag to control the behavior. I'm OK if the new behavior is the default though. |
Looking more closely at the code and the specification, I see that there is not yet any handling of multi-valued "If-None-Match" headers, or the wildcard "*" value. So correct handling of "If-None-Match" will require parsing the header. Which brings me to my next question. What parser combinator libraries if any are acceptable/preferred dependencies in I did not see any parser combinators already in use in the library, and the parsing of some other headers seems to be hand-rolled ad-hoc splitting on commas and the like, which may be incorrect in the presence of quotes of various sorts. I was hoping to follow prior art, but that may not be possible. Should parsing of "If-None-Match" also be hand-rolled, or should it use |
I'd prefer not adding a new dependency in this library |
I think this means a best-effort hand-rolled parser, or perhaps an internal small combinator module, with just the essentials, specialised to parsing (ByteString) header values. Mostly it would just need to recognise a few forms of quoting, so that splitting on various delimiters does not happen inside quotes. |
It also compares
ETag
values in a way that does not conform to the the specification.The code in question just takes the base64-encoded bytestring as the "ETag" header value, but RFC7232, Section 2.3 requires quoted
ETag
values.ETags
after base64 encoding.ETagLookup
methods provided by application developers.W/
if they choose to generate weak ETag values.(If compatibility were not a concern, the
ETagLookup
type could be changed to have a sum-type return value with "weak" and "strong" variants, and then the quoting andW/
prefixing done in the library based on the etag variant).Then, when comparing ETag values in the
if-none-match
case, the weak comparison criteria from RFC7242 Section 2.3.2 should be used to ignore anyW/
prefix of an ETag of the form:W/"<visible-ascii-string-sans-quotes>"
.Finally, I have (for POSIX system) a potential candidate for a much more efficient ETag that hashes just file metadata (device, inode, size, ctime seconds, ctime nanoseconds), that should be good enough for the vast majority of users. It is only somewhat weak on systems where the
ctime
resolution is only 1 second (nanoseconds always zero) and when files are rapidly cycled, perhaps multiple times within the same second, possibly replacing a file's content with content of the same size. Barring such exotic conditions, the below is much faster than a content hash.It could be used as a weak Etag, by including in the hash the current time rounded to a multiple of the cache
maxAge
(if specified), which would ensure that theETag
changes every ~maxAge
seconds.The text was updated successfully, but these errors were encountered: