Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wai-app-static generates non-standard (unquoted) ETag headers... #815

Open
vdukhovni opened this issue Sep 20, 2020 · 6 comments
Open

wai-app-static generates non-standard (unquoted) ETag headers... #815

vdukhovni opened this issue Sep 20, 2020 · 6 comments

Comments

@vdukhovni
Copy link

It also compares ETag values in a way that does not conform to the the specification.

The code in question just takes the base64-encoded bytestring as the "ETag" header value, but RFC7232, Section 2.3 requires quoted ETag values.

  • Quotes should be added around the default "strong" ETags after base64 encoding.
  • The need to add quotes should be documented as a requirement on custom ETagLookup methods provided by application developers.
  • When quoting custom ETag values, application developers need to prefix W/ if they choose to generate weak ETag values.

(If compatibility were not a concern, the ETagLookup type could be changed to have a sum-type return value with "weak" and "strong" variants, and then the quoting and W/ prefixing done in the library based on the etag variant).

Then, when comparing ETag values in the if-none-match case, the weak comparison criteria from RFC7242 Section 2.3.2 should be used to ignore any W/ prefix of an ETag of the form: W/"<visible-ascii-string-sans-quotes>".

Finally, I have (for POSIX system) a potential candidate for a much more efficient ETag that hashes just file metadata (device, inode, size, ctime seconds, ctime nanoseconds), that should be good enough for the vast majority of users. It is only somewhat weak on systems where the ctime resolution is only 1 second (nanoseconds always zero) and when files are rapidly cycled, perhaps multiple times within the same second, possibly replacing a file's content with content of the same size. Barring such exotic conditions, the below is much faster than a content hash.

It could be used as a weak Etag, by including in the hash the current time rounded to a multiple of the cache maxAge (if specified), which would ensure that the ETag changes every ~maxAge seconds.

module Etag (getEtag) where

import qualified Crypto.Hash as H
import qualified Data.ByteArray as BA
import qualified Data.ByteArray.Encoding as BA
import qualified Foreign.Marshal.Alloc as F
import qualified Foreign.Ptr as F
import qualified Foreign.Storable as F
import qualified System.IO.Error as E
import qualified System.Posix.Files as P
import Control.Monad (guard)
import Data.Bifunctor (second)
import Data.ByteString (ByteString)
import Data.Word (Word64)

(|.>) :: (a -> b) -> (b -> c) -> (a -> c)
(|.>) = flip (.)
infixl 1 |.>

getEtag :: FilePath -> IO (Maybe ByteString)
getEtag path = E.catchIOError
    do stat2Etag path
    do const $ pure Nothing

stat2Etag :: FilePath -> IO (Maybe ByteString)
stat2Etag path = do
    stat <- P.getFileStatus path
    let (sec, nsec) = timeParts $ P.statusChangeTimeHiRes stat
        meta = [ fromIntegral $ P.deviceID stat, fromIntegral $ P.fileID stat
               , fromIntegral $ P.fileSize stat, sec, nsec ]
    (guard (P.isRegularFile stat) >>) . pure <$> hashWords meta
  where
    hashWords :: [Word64] -> IO ByteString
    hashWords = hashMany (H.hashInit @H.SHA256)
        |.> fmap H.hashFinalize
        |.> fmap (BA.convertToBase BA.Base64URLUnpadded)

    timeParts = second (round . (* 1000000000)) . properFraction

    -- | Hash a list of storable items in some fixed order.
    hashMany ctx ss = go ss []
      where
        go (x:xs) acc = do
            F.alloca \ptr -> do
                F.poke ptr x
                let v = BA.MemView (F.castPtr ptr) (F.sizeOf x)
                go xs (v:acc)
        go [] acc = pure $! H.hashUpdates ctx acc
@snoyberg
Copy link
Member

snoyberg commented Sep 20, 2020 via email

@vdukhovni
Copy link
Author

Are you interested in opening up a PR with the change you're hoping to see?

Sure, if there are no objections. But first, I would like feedback on whether that should be the slightly kludgy backwards-compatible approach, with quoting and "W/" prefixing left to each implementor of ETagLookup, or whether (major version bump) the return type should be changed to a sum type:

data EtagValue =
    StrongEtag ByteString |
    WeakEtag ByteString

with the quoting and "W/" prefixed done internally in wai.

In the backwards-compatible variant, the default ETagLookup is updated to add quotes, and documentation is added telling others to do the same if they choose to roll their own...

In the (I think cleaner) approach, the default implementation returns a StrongEtag and callers get to update their code to signal which type of ETag they are generating. Then the library does the right thing...

Doing the "more efficient" file-metadata etags is probably a separate PR if there's interest and agreement that these are useful.

@snoyberg
Copy link
Member

I'm OK with the backwards-incompatible approach. For the more efficient etag: yes, it should be a separate PR, and ideally have a flag to control the behavior. I'm OK if the new behavior is the default though.

@vdukhovni
Copy link
Author

I'm OK with the backwards-incompatible approach. For the more efficient etag: yes, it should be a separate PR, and ideally have a flag to control the behavior. I'm OK if the new behavior is the default though.

Looking more closely at the code and the specification, I see that there is not yet any handling of multi-valued "If-None-Match" headers, or the wildcard "*" value. So correct handling of "If-None-Match" will require parsing the header. Which brings me to my next question. What parser combinator libraries if any are acceptable/preferred dependencies in wai?

I did not see any parser combinators already in use in the library, and the parsing of some other headers seems to be hand-rolled ad-hoc splitting on commas and the like, which may be incorrect in the presence of quotes of various sorts. I was hoping to follow prior art, but that may not be possible.

Should parsing of "If-None-Match" also be hand-rolled, or should it use attoparsec (possibly its Zepto module, given the simple grammar) or similar?

@snoyberg
Copy link
Member

I'd prefer not adding a new dependency in this library

@vdukhovni
Copy link
Author

I'd prefer not adding a new dependency in this library

I think this means a best-effort hand-rolled parser, or perhaps an internal small combinator module, with just the essentials, specialised to parsing (ByteString) header values. Mostly it would just need to recognise a few forms of quoting, so that splitting on various delimiters does not happen inside quotes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants