wai-app-static generates non-standard (unquoted) ETag headers... #815

vdukhovni · 2020-09-20T08:22:59Z

It also compares ETag values in a way that does not conform to the the specification.

The code in question just takes the base64-encoded bytestring as the "ETag" header value, but RFC7232, Section 2.3 requires quoted ETag values.

Quotes should be added around the default "strong" ETags after base64 encoding.
The need to add quotes should be documented as a requirement on custom ETagLookup methods provided by application developers.
When quoting custom ETag values, application developers need to prefix W/ if they choose to generate weak ETag values.

(If compatibility were not a concern, the ETagLookup type could be changed to have a sum-type return value with "weak" and "strong" variants, and then the quoting and W/ prefixing done in the library based on the etag variant).

Then, when comparing ETag values in the if-none-match case, the weak comparison criteria from RFC7242 Section 2.3.2 should be used to ignore any W/ prefix of an ETag of the form: W/"<visible-ascii-string-sans-quotes>".

Finally, I have (for POSIX system) a potential candidate for a much more efficient ETag that hashes just file metadata (device, inode, size, ctime seconds, ctime nanoseconds), that should be good enough for the vast majority of users. It is only somewhat weak on systems where the ctime resolution is only 1 second (nanoseconds always zero) and when files are rapidly cycled, perhaps multiple times within the same second, possibly replacing a file's content with content of the same size. Barring such exotic conditions, the below is much faster than a content hash.

It could be used as a weak Etag, by including in the hash the current time rounded to a multiple of the cache maxAge (if specified), which would ensure that the ETag changes every ~maxAge seconds.

module Etag (getEtag) where

import qualified Crypto.Hash as H
import qualified Data.ByteArray as BA
import qualified Data.ByteArray.Encoding as BA
import qualified Foreign.Marshal.Alloc as F
import qualified Foreign.Ptr as F
import qualified Foreign.Storable as F
import qualified System.IO.Error as E
import qualified System.Posix.Files as P
import Control.Monad (guard)
import Data.Bifunctor (second)
import Data.ByteString (ByteString)
import Data.Word (Word64)

(|.>) :: (a -> b) -> (b -> c) -> (a -> c)
(|.>) = flip (.)
infixl 1 |.>

getEtag :: FilePath -> IO (Maybe ByteString)
getEtag path = E.catchIOError
    do stat2Etag path
    do const $ pure Nothing

stat2Etag :: FilePath -> IO (Maybe ByteString)
stat2Etag path = do
    stat <- P.getFileStatus path
    let (sec, nsec) = timeParts $ P.statusChangeTimeHiRes stat
        meta = [ fromIntegral $ P.deviceID stat, fromIntegral $ P.fileID stat
               , fromIntegral $ P.fileSize stat, sec, nsec ]
    (guard (P.isRegularFile stat) >>) . pure <$> hashWords meta
  where
    hashWords :: [Word64] -> IO ByteString
    hashWords = hashMany (H.hashInit @H.SHA256)
        |.> fmap H.hashFinalize
        |.> fmap (BA.convertToBase BA.Base64URLUnpadded)

    timeParts = second (round . (* 1000000000)) . properFraction

    -- | Hash a list of storable items in some fixed order.
    hashMany ctx ss = go ss []
      where
        go (x:xs) acc = do
            F.alloca \ptr -> do
                F.poke ptr x
                let v = BA.MemView (F.castPtr ptr) (F.sizeOf x)
                go xs (v:acc)
        go [] acc = pure $! H.hashUpdates ctx acc

The text was updated successfully, but these errors were encountered:

snoyberg · 2020-09-20T17:06:16Z

Are you interested in opening up a PR with the change you're hoping to see?

…

On Sun, Sep 20, 2020, 11:23 AM Viktor Dukhovni ***@***.***> wrote: It also compares ETag values in a way that does not conform to the the specification. The code in question <https://github.com/yesodweb/wai/blob/master/wai-app-static/Network/Wai/Application/Static.hs#L155-L174> just takes the base64-encoded bytestring as the "ETag" header value, but RFC7232, Section 2.3 <https://tools.ietf.org/html/rfc7232#section-2.3> requires quoted ETag values. - Quotes should be added around the default "strong" ETags after base64 encoding. - The need to add quotes should be documented as a requirement on custom ETagLookup methods provided by application developers. - When quoting custom ETag values, application developers need to prefix W/ if they choose to generate *weak* ETag values. (If compatibility were not a concern, the ETagLookup type could be changed to have a sum-type return value with "weak" and "strong" variants, and then the quoting and W/ prefixing done in the library based on the etag variant). Then, when comparing ETag values in the if-none-match case, the *weak* comparison criteria from RFC7242 Section 2.3.2 <https://tools.ietf.org/html/rfc7232#section-2.3.2> should be used to ignore any W/ prefix of an ETag of the form: W/"<visible-ascii-string-sans-quotes>". Finally, I have (for POSIX system) a potential candidate for a much more efficient ETag that hashes just file metadata (device, inode, size, ctime seconds, ctime nanoseconds), that should be good enough for the vast majority of users. It is only somewhat weak on systems where the ctime resolution is only 1 second (nanoseconds always zero) and when files are rapidly cycled, perhaps multiple times within the same second, possibly replacing a file's content with content of the same size. Barring such exotic conditions, the below is much faster than a content hash. It could be used as a *weak* Etag, by including in the hash the current time rounded to a multiple of the cache maxAge (if specified), which would ensure that the ETag changes every ~maxAge seconds. module Etag (getEtag) where import qualified Crypto.Hash as Himport qualified Data.ByteArray as BAimport qualified Data.ByteArray.Encoding as BAimport qualified Foreign.Marshal.Alloc as Fimport qualified Foreign.Ptr as Fimport qualified Foreign.Storable as Fimport qualified System.IO.Error as Eimport qualified System.Posix.Files as Pimport Control.Monad (guard)import Data.Bifunctor (second)import Data.ByteString (ByteString)import Data.Word (Word64) (|.>) :: (a -> b) -> (b -> c) -> (a -> c)(|.>) = flip (.)infixl 1 |.> getEtag :: FilePath -> IO (Maybe ByteString) getEtag path = E.catchIOError do stat2Etag path do const $ pure Nothing stat2Etag :: FilePath -> IO (Maybe ByteString) stat2Etag path = do stat <- P.getFileStatus path let (sec, nsec) = timeParts $ P.statusChangeTimeHiRes stat meta = [ fromIntegral $ P.deviceID stat, fromIntegral $ P.fileID stat , fromIntegral $ P.fileSize stat, sec, nsec ] (guard (P.isRegularFile stat) >>) . pure <$> hashWords meta where hashWords :: [Word64] -> IO ByteString hashWords = hashMany (H.hashInit @H.SHA256) |.> fmap H.hashFinalize |.> fmap (BA.convertToBase BA.Base64URLUnpadded) timeParts = second (round . (* 1000000000)) . properFraction -- | Hash a list of storable items in some fixed order. hashMany ctx ss = go ss [] where go (x:xs) acc = do F.alloca \ptr -> do F.poke ptr x let v = BA.MemView (F.castPtr ptr) (F.sizeOf x) go xs (v:acc) go [] acc = pure $! H.hashUpdates ctx acc — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#815>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAMCB7TQ5YLKWGICHF2PKLSGW3W7ANCNFSM4RTQHIKQ> .

vdukhovni · 2020-09-21T01:10:37Z

Are you interested in opening up a PR with the change you're hoping to see?
…

Sure, if there are no objections. But first, I would like feedback on whether that should be the slightly kludgy backwards-compatible approach, with quoting and "W/" prefixing left to each implementor of ETagLookup, or whether (major version bump) the return type should be changed to a sum type:

data EtagValue =
    StrongEtag ByteString |
    WeakEtag ByteString

with the quoting and "W/" prefixed done internally in wai.

In the backwards-compatible variant, the default ETagLookup is updated to add quotes, and documentation is added telling others to do the same if they choose to roll their own...

In the (I think cleaner) approach, the default implementation returns a StrongEtag and callers get to update their code to signal which type of ETag they are generating. Then the library does the right thing...

Doing the "more efficient" file-metadata etags is probably a separate PR if there's interest and agreement that these are useful.

snoyberg · 2020-09-21T02:31:59Z

I'm OK with the backwards-incompatible approach. For the more efficient etag: yes, it should be a separate PR, and ideally have a flag to control the behavior. I'm OK if the new behavior is the default though.

vdukhovni · 2020-09-21T04:23:33Z

I'm OK with the backwards-incompatible approach. For the more efficient etag: yes, it should be a separate PR, and ideally have a flag to control the behavior. I'm OK if the new behavior is the default though.

Looking more closely at the code and the specification, I see that there is not yet any handling of multi-valued "If-None-Match" headers, or the wildcard "*" value. So correct handling of "If-None-Match" will require parsing the header. Which brings me to my next question. What parser combinator libraries if any are acceptable/preferred dependencies in wai?

I did not see any parser combinators already in use in the library, and the parsing of some other headers seems to be hand-rolled ad-hoc splitting on commas and the like, which may be incorrect in the presence of quotes of various sorts. I was hoping to follow prior art, but that may not be possible.

Should parsing of "If-None-Match" also be hand-rolled, or should it use attoparsec (possibly its Zepto module, given the simple grammar) or similar?

snoyberg · 2020-09-21T04:38:56Z

I'd prefer not adding a new dependency in this library

vdukhovni · 2020-09-21T04:45:18Z

I'd prefer not adding a new dependency in this library

I think this means a best-effort hand-rolled parser, or perhaps an internal small combinator module, with just the essentials, specialised to parsing (ByteString) header values. Mostly it would just need to recognise a few forms of quoting, so that splitting on various delimiters does not happen inside quotes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wai-app-static generates non-standard (unquoted) ETag headers... #815

wai-app-static generates non-standard (unquoted) ETag headers... #815

vdukhovni commented Sep 20, 2020

snoyberg commented Sep 20, 2020 via email

vdukhovni commented Sep 21, 2020

snoyberg commented Sep 21, 2020

vdukhovni commented Sep 21, 2020

snoyberg commented Sep 21, 2020

vdukhovni commented Sep 21, 2020

wai-app-static generates non-standard (unquoted) ETag headers... #815

wai-app-static generates non-standard (unquoted) ETag headers... #815

Comments

vdukhovni commented Sep 20, 2020

snoyberg commented Sep 20, 2020 via email

vdukhovni commented Sep 21, 2020

snoyberg commented Sep 21, 2020

vdukhovni commented Sep 21, 2020

snoyberg commented Sep 21, 2020

vdukhovni commented Sep 21, 2020