Anti replay #1005

Closed
wants to merge 6 commits into from

ekr
Contributor

@ekr ekr commented May 4, 2017

No description provided.

ekr added 4 commits April 30, 2017 16:22
ticket is associated with a different PSK. This provides somewhat
increased security in cases where you have multiple PSKs for the
same connection and one PSK is compromised.

The motivation here is that in cases where the server maintains a
session database rather than self-encrypted tickets, the server might
delete tickets as they are used. This change provides FS for
connections which have been used, even if there are other outstanding
tickets in the session cache associated with the same original
connection.
Colm MacCarthaigh. Specifically:

- Describe both one time tickets and client hello storage
  ("strike register") mechanisms and SHOULD-level require
  people to do them.

- Provide a security considerations section describing the
  threats.
@ekr
Contributor Author

ekr commented May 4, 2017

@colmmacc please review.

time. The server can determine its view of the age of the ticket by
subtracting the time the ticket was issued from the current
time. If the client and server clocks were running at the same rate,
the client's view of would be shorter than the actual time elapsed on


Something missing after client's view of...

Contributor Author

oops "ticket_age"

described in {{stateless-anti-replay}}.
See {{replay-0rtt}} for more information on the limitations
of these mechanisms.


Perhaps mention that clients and application protocols must assume in their security profile (and in terms of what they are willing to send via 0-RTT) that servers are only implementing stateless-anti-replay? (When the "server" spans multiple clusters with non-trivial latency between them, stateless-anti-replay is the only one that works and it has the weakest guarantees.)


(This is mentioned below, but repeating it here as well may be worthwhile.)

Contributor

It's also possible to use an instance of Client Hello Recording per cluster in a distributed setup, effectively reducing the number of possible replays from a "Large Number" to 1 per cluster.

Contributor

Aren't the three alternatives in the SHOULD all equal?

Servers need not permit 0-RTT at all, but those which do SHOULD implement either Single-Use Tickets {{single-use-tickets}}, Client Hello Recording {{client-hello-recording}}, or the stateless mechanism described in {{stateless-anti-replay}}.

Contributor Author

This is a WG decision ultimately, but IMO they are not. The first two are clearly stronger.

Contributor

I'd go further: the third method doesn't work - it doesn't prevent replays. It permits thousands to billions of replays, depending on the amount of bandwidth and hosts available, and it doesn't mitigate several of the attacks. It should be taken out - it's insecure.

this enables a variety of attacks via side channels such
as cache timing or measuring the speed of cryptographic
operations {{Mac17}}.

@enygren enygren May 4, 2017

Perhaps say "idempotent and side-effect free" rather than just "idempotent"? DELETE and PUT are idempotent but do have side-effects and without additional application layer controls an attacker doing 0-RTT replay could reorder them.

Contributor

+1 to side-effect free.

Contributor

First of all, I would talk about "actions" rather than requests. The things that the server does in response to receiving 0-RTT are what will be exploited.

Side-effect free is useful, but forbidding that doesn't really cover it. In @enygren's example, the side effects are relevant, but the primary effect (creation/update of a resource vs. removal) is what we're really concerned with. The only way to ensure that this is perfectly safe is to use the "safe" definition in HTTP - that is the request does nothing but generate a response. And even then, it's rare that such a request is ever free from side effects or side channels.

This risks us defining something that is very HTTP-centric. I would prefer that we instead say that idempotency is desirable for the actions that the server takes, but that idempotency could be insufficient. That is more or less what the text here is getting at.

Contributor Author

I rewrote this a bit, but note that I'm not sure that any anti-replay mechanism we have considered will handle this in the face of sufficient client and server complicity.

Consider the following case:

  • Client sends CH + Early data
  • Attacker injects SH + RST
  • Client retries with a fresh connection (and a fresh CH)
  • Server processes the data
  • Attacker replays CH + Early Data

Ugh

server nodes, it may be hard to achieve high rates of PSK and and
0-RTT success when compared with self-encrypted tickets which do not
require consistent server-side storage for basic functionality but
only for 0-RTT anti-replay.


Perhaps replace "but only for 0-RTT anti-replay" with ", but may require such storage for 0-RTT anti-replay" or something else that makes the sense clearer.

Contributor

There should probably be some more clarification that 0-RTT success and PSK success are partially independent, and that tickets can still be used for PSK even if the single-use property cannot be guaranteed; that is, PSK can succeed even in cases where 0-RTT must be rejected for safety. (Unless I misunderstand?)

{{stateless-anti-replay}} only prevents replay outside the
time window.



Add something like this which covers the fundamental requirement and responsibilities:

"The onus is on clients not to send messages in 0-RTT data which are not safe to have replayed and which they would not be willing to retry across multiple 1-RTT connections. The onus is on servers to protect themselves against attacks employing 0-RTT data replication."

(or "___ have responsibility to" instead of "the onus is on"?)

Contributor

This is pretty obvious, but worth stating, I think.

Contributor

Yeah, worth stating again. Maybe also that the application profile should tell the client to do so.

Contributor Author

Added it to an earlier location


2. If the ClientHello matches an existing ClientHello, then
abort the handshake using an "illegal_parameter" alert
(this should never happen in a functional system).
Contributor

To make this more efficient implementations can use some hashing/bloom filter rather than storing the entire client hello. This will have false-positives, in which case zero RTT data should just be rejected, not the connection aborted.
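A minimal sketch of the Bloom filter idea above, assuming the server has already chosen some byte string that identifies the ClientHello. The names `BloomFilter`, `add_if_absent`, and `handle_early_data` are hypothetical, and the sketch does not expire entries at the end of the window. A hit may be a false positive, so it only rejects 0-RTT and never aborts the handshake.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (illustrative sizes)."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing a counter byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add_if_absent(self, item):
        """Record item. Return True if it was definitely new,
        False if it was possibly seen before (may be a false positive)."""
        positions = list(self._positions(item))
        seen = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return not seen


def handle_early_data(bloom, client_hello_id):
    # A possible duplicate only loses 0-RTT; the handshake continues as 1-RTT.
    if bloom.add_if_absent(client_hello_id):
        return "accept_0rtt"
    return "reject_0rtt_fall_back_to_1rtt"
```

The trade-off is memory for certainty: the filter never misses a true replay that it recorded, but an unlucky fresh ClientHello occasionally pays one extra round trip.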

Contributor

Ahh hipster crypto to the rescue.

Efficient storage isn't so much the problem as global synchronization within reasonable time frames.

Contributor

Sure the entire client hello can be stored, but why not do it more efficiently?

I don't think this state is global (ie one client hello cache per cluster).

(this should never happen in a functional system).

3. Otherwise, store the ClientHello during the window
and accept 0-RTT.
Contributor

It might be wise to explicitly recommend storing the PSK binder rather than the entire ClientHello. This has the benefit of essentially being a compact token cryptographically tied to the 0-RTT key (also preventing someone from polluting the replay cache with random data).

Contributor

If the ClientHello needs to be valid, then polluting the cache is as trivial as just creating new and different ClientHello values. In a way, the binder is just a way of having the other side calculate your hash for you.

Contributor Author

Kyle is right that you need to validate the binder, because it changes the cost of polluting the cache to the cost of getting a new PSK (and you can use PSK-specific filtering to blacklist bad actors).

Maybe it's obvious to others, but we also can't use a hash of the packet because if the CH contains two PSKs, then the attacker can corrupt the second binder without detection and potentially pollute the cache. So, I think you want either CH.Random or the binder.

Contributor

@colmmacc colmmacc left a comment

<8

target: https://github.com/tlswg/tls13-spec/issues/1001
author:
-
ins: C. MacCarthaigh
Contributor

Technically it's "MacCárthaigh", but maybe RFCs have to be ascii. And now my first comment gets to be super vain! Oh man.

Contributor Author

Correct, ASCII only. Feel free to supply some other flattening :)

Contributor

We're almost able to put people's names in RFCs, but the road is a long one (for reasons that I won't burden you with).

Contributor

Ah, the 8th fallacy of naming! (Not all names are representable in Unicode.) ((Number probably wrong; I made it up.))

all 0-RTT data and for PSK usage when PSK is used without DH.

Because this mechanism requires sharing the session database between
server nodes, it may be hard to achieve high rates of PSK and and
Contributor

Minor mistake: "and and"

Contributor

"between server nodes in environments with multiple servers acting as endpoints for the same service"

this can be estimated by measuring the time between sending
the Finished message and receiving the first message in the
client's second flight, or potentially using information
from the operating system.
Contributor

I think lines 3600 - 3604 are probably impractical. If folks do want to use STEK-encrypted tickets for global resumption, then the RTT to one location will be very different from the RTT to another. Even absent that, RTTs can vary quite a lot for mobile users. As an implementor, I'd just use a global tolerance value (like 500ms or something).

Contributor

I think it would make more sense to just say the server should store one thing:

  • The time that the server generated the session ticket, offset by an estimate of the round trip time between client and server.

That also makes it more clear that only information in the ticket can be used for these calculations (any RTT information in the resumption handshake can not be trusted).

client sent the ClientHello as:

~~~~
creation time + (client's view - RTT estimate/2)
Contributor

I don't think the division by two is necessary. The ticket gets delayed by half an RTT on the way to the client in the first place, and half an RTT on the way back to the server. So it nets out to one RTT of difference. Means we also needn't worry about any asymmetry between the two directions.

Contributor Author

I thought that initially, but I believe that's wrong, because we're interested in the client's claimed sending time.

Consider the case where clocks are globally synchronized and we have a 200 ms RTT.

  • We send NST at 0
  • The client receives NST at 100ms
  • The client waits 1000ms and then sends CH (at 1100)
  • The server receives CH at 1200

Now, if the client had put the absolute time in CH, it would have been 1100, but it puts in relative time so that's 1000. When we add 1/2 RTT, we get 1100. If we were to add RTT, we would get 1200, which is wrong, because that's when the server got it.

This is different from below where we are interested in the mismatch.

Contributor

Oh you're right! Wait a second, no. I'm still wrong. But you're not totally right either. If we want to get very pedantic about it (pedantic on a crypto spec? no!) , all we know is that the client is some portion of the RTT behind. We don't know that it's 1/2. The RTT might be asymmetric. Imagine it's 20ms server -> client, but 180ms client to server.

  • We send NST at 0
  • The client receives NST at 20ms
  • The client waits 1000ms and then sends CH (at 1020)
  • The server receives at 1200
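The two timelines in this exchange can be checked with a small simulation. This is only a sketch under the thread's own assumptions (globally synchronized clocks, times in milliseconds); the function name `simulate` and its parameters are hypothetical. The server's estimate follows the arithmetic in the symmetric example, issue time plus the client's reported age plus half the RTT.

```python
def simulate(down_ms, up_ms, wait_ms, issue_time_ms=0):
    """Return (actual CH send time, CH receive time, server's estimate).

    down_ms: one-way delay server -> client (NewSessionTicket)
    up_ms:   one-way delay client -> server (ClientHello)
    wait_ms: how long the client holds the ticket before using it
    """
    rtt = down_ms + up_ms
    nst_received = issue_time_ms + down_ms
    ch_sent = nst_received + wait_ms
    ch_received = ch_sent + up_ms
    # The client reports a relative age, not an absolute time.
    ticket_age = ch_sent - nst_received
    # The server can only assume the RTT splits evenly.
    estimate = issue_time_ms + ticket_age + rtt // 2
    return ch_sent, ch_received, estimate


symmetric = simulate(down_ms=100, up_ms=100, wait_ms=1000)
asymmetric = simulate(down_ms=20, up_ms=180, wait_ms=1000)
print(symmetric)   # (1100, 1200, 1100)
print(asymmetric)  # (1020, 1200, 1100)
```

With a 20 ms / 180 ms split the estimate is 80 ms later than the real send time, which is the error the asymmetric example points at: the server sees the same RTT and the same reported age in both cases and cannot tell them apart.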

Contributor Author

Oh, totally. It's just the best we can do.

Remember that we're not trusting this value, we're just using it to have the best chance of getting the time within the window we are saving CH for.

Contributor Author

Hmm... Well, we are sort of trusting this, because we are using it to distinguish between hard fail and forced 1-RTT. OTOH, the attacker can always force us into that posture by just delaying the packet until it's out of window, so I don't think that it's an issue. But it might be easiest to just use "issue time + obfuscated ticket age"....

Contributor

Sorry @ekr , the comments are so deep I got lost on which section we're in. I thought this was about the stateless anti-replay.


It may be that you always want to force 1-RTT and never hard-fail on detecting replays. There may be some legitimate scenarios to receive a replay. For example, TCP FastOpen (or QUIC) combined with 0RTT and packet duplication in the network. Hard-fail could result in some weird race conditions here. Having hard-fail (fatal alert, I assume?) as distinct from forcing 1-RTT could also just give an attacker more information.

Contributor

TCP FO is still TCP - a duplicate packet will be rejected by the TCP state machine and shouldn't make it as far as TLS.

Contributor

I tend to agree with Nygren about not giving the attacker more information -- that is, don't hard-fail and just fall back to 1-RTT [unless you're under attack and need to shed load]. The 1-RTT will fail for the attacker's replays, of course.

Contributor Author

  1. I'm persuaded by knekrit's argument
  2. MT, cute trick, but it doesn't work if you want the window to be on arrival time, so only for the stateless technique, not strike registers.
  3. enygren: I was more ambivalent about whether to abort in this round


3. Otherwise, store the ClientHello during the window
and accept 0-RTT.

Contributor

It might be worth adding something here, a la ...

It is also critical to be sure that the record of ClientHellos that led to accepted 0-RTT sessions from a given window is complete before accepting any new 0-RTT sessions for that same window. For example, if the system recording ClientHellos crashes with no durable record of the ClientHellos previously accepted, then the system needs to wait at least one full window before accepting any ClientHellos.

Contributor

That's probably not going to fly. The window is large and waiting that long would kill all the benefits that 0-RTT provides. I would instead recognize that challenges exist in synchronizing state across participating nodes. See Erik Nygren's comments on the list that amount to basically "not gonna happen", which I agree with.

It's fine to recommend this design, but the need to have globally consistent state is a massive hurdle.

Contributor

The window here is just the clock skew tolerance window (ie 10s of seconds). Waiting 1 window before starting to accept 0-RTT data sounds very reasonable to me.

I believe the intent of these is to have 1 of these client hello record caches per cluster, rather than a global state.

Contributor

Sorry, my wording confused things. All I was referring to is that when you restart a strike register with a clean slate, you need to wait the window period of time before accepting any new entries. The reason is because of this race:

T1: Strike register accepts key K
T2: Strike register crashes, loses all memory of key K
T3: Strike register restarts
T4: Strike register accepts K again

To avoid that, the register needs to have a pause on start-up. Or it can record everything durably, but that's very slow. The register can still respond within microseconds during ordinary operation.
Clearly, I need better wording.
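The T1 to T4 race and the start-up pause can be sketched as follows. This is an illustrative in-memory register, not text from the draft; `StrikeRegister`, `try_accept`, and the time parameters are hypothetical names, and times are plain seconds supplied by the caller. It assumes each ClientHello carries an estimated send time, so that anything older than the window is refused regardless of what the register remembers.

```python
class StrikeRegister:
    """In-memory strike register that refuses 0-RTT for one window after start."""

    def __init__(self, window_s, start_time_s):
        self.window_s = window_s
        # After a (re)start, memory is empty, so wait out one full window.
        self.ready_at_s = start_time_s + window_s
        self.seen = {}  # key -> estimated send time of the ClientHello

    def try_accept(self, key, ch_time_s, now_s):
        """Return True to accept 0-RTT, False to reject it (fall back to 1-RTT)."""
        if now_s < self.ready_at_s:
            return False  # start-up pause: earlier acceptances may be forgotten
        if not (0 <= now_s - ch_time_s <= self.window_s):
            return False  # outside the window the register covers
        # Drop entries that have aged out of the window.
        for old in [k for k, t in self.seen.items() if now_s - t > self.window_s]:
            del self.seen[old]
        if key in self.seen:
            return False  # replay within the window
        self.seen[key] = ch_time_s
        return True


# T1: the register accepts key K.
first = StrikeRegister(window_s=10, start_time_s=0)
accepted_at_t1 = first.try_accept("K", ch_time_s=100, now_s=101)

# T2/T3: crash and restart with a clean slate at t=102.
restarted = StrikeRegister(window_s=10, start_time_s=102)

# T4: the replay of K is refused during the pause, and is stale afterwards.
replay_during_pause = restarted.try_accept("K", ch_time_s=100, now_s=103)
replay_after_pause = restarted.try_accept("K", ch_time_s=100, now_s=112)
```

The pause works because by the time the register is ready again, every ClientHello it could have accepted before the crash has fallen outside the window.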

weaker anti-replay defense because of the difficulty reliably
storing and retrieving the received ClientHello messages.

### Stateless Time-Based Anti-Replay {#stateless-anti-replay}
Contributor

I think stateless mitigation is pointless, as it doesn't bound the number of replays, but some notes anyway ...

Contributor

"Limited" should probably be in the subsection heading.

handshake. Clock skew distributions are not
symmetric, so the optimal tradeoff may involve an asymmetric range
of permissible mismatch values.

Contributor

Here I would suggest adding something that says:

Note that while stateless anti-replay can bound how long a packet may be replayed, the total number of replays tolerated is bounded only by bandwidth and system capacity. This can be thousands to billions of replays in real-world settings.

And I'd argue for adding this too:

Stateless anti-replay SHOULD NOT be used in environments without strong assurance of application and system behavior and MUST NOT be used in environments that must interoperate with third-party systems and applications.

Contributor Author

I added the clarification but not the normative requirement.

described in {{stateless-anti-replay}}.
See {{replay-0rtt}} for more information on the limitations
of these mechanisms.

Contributor

Here I would argue for something stronger:

"0-RTT server implementations that must interoperate with third party systems and applications MUST implement a robust anti-replay mechanism".

My reasoning here is that a CDN or TLS accelerator that enables 0-RTT without robust anti-replay will break other downstream systems. (For example, upstream 0-RTT leading to throttle exhaustion downstream.) I want clear and strong language so that there can be no ambiguity when a CVE is requested against the upstream component. It's not OK, IMO, to break a basic assumption about the internet like that.

Contributor

@martinthomson martinthomson left a comment

I'd have preferred to see the nonce in a different PR.

duplicates. However, recording all ClientHellos causes state to grow
without bound, so in practice the server must instead record
ClientHellos within a given time window based on the
"obfuscated_ticket_age" value provided by the client.
Contributor

I would instead say "Recording all ClientHellos causes state to grow without bound. A server can instead record ClientHellos within a given time window and use the "obfuscated_ticket_age" to ensure that tickets aren't reused outside that window."

Contributor

MT's version is better, yes.

client sent the ClientHello as:

~~~~
creation time + (client's view - RTT estimate/2)
Contributor

It's half an RTT for sure, but I'm not clear on why you would want to bucket based on when the client creates the ClientHello as opposed to when you receive it. For any given ticket, the RTT value is a constant, so all you are doing is adding work.

BTW, there's a simpler model for handling ticket ages in general. The server stores the value now()+RTT-ticket_age_add in the ticket (in ekr's example, this is 200-taa). Then when the server receives the ticket, it extracts that, adds the obfuscated elapsed time value (1000+taa) and compares that to the current time (1200) with whatever allowance for slop you want.
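The simpler model can be written out roughly like this. It is a sketch, assuming millisecond timestamps and the 32-bit wraparound of `ticket_age_add` and the obfuscated age; the function names and the `slop_ms` parameter are hypothetical. The numbers reuse the earlier example: ticket issued at 0, RTT 200 ms, client waits 1000 ms.

```python
MOD = 1 << 32  # ticket_age_add and the obfuscated age are 32-bit values


def ticket_base(now_ms, rtt_ms, ticket_age_add):
    """Value the server stores in the ticket at issue time."""
    return (now_ms + rtt_ms - ticket_age_add) % MOD


def expected_arrival(base, obfuscated_ticket_age):
    """Adding the client's obfuscated age cancels ticket_age_add."""
    return (base + obfuscated_ticket_age) % MOD


def within_window(base, obfuscated_ticket_age, now_ms, slop_ms):
    """True if the ClientHello arrived within slop_ms of the expected time."""
    diff = (expected_arrival(base, obfuscated_ticket_age) - now_ms) % MOD
    if diff >= MOD // 2:  # interpret as a signed 32-bit difference
        diff -= MOD
    return abs(diff) <= slop_ms


taa = 0x12345678                      # arbitrary ticket_age_add for the example
base = ticket_base(now_ms=0, rtt_ms=200, ticket_age_add=taa)
obfuscated = (1000 + taa) % MOD       # what the client sends after 1000 ms
print(expected_arrival(base, obfuscated))  # 1200
```

The server never needs to recover the un-obfuscated age or keep the RTT separately; one addition and one comparison against the current time are enough.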


fall back to 1-RTT and process the data upon application layer
replay. The scale of this attack is limited by the client's
willingness to replay and therefore only allows a small number of
replays, which will also use different encryption keys.
Contributor

I would remove the "therefore only allows a small number of replays". That's all up to the client. I don't consider 10 to be small, which is where we are at in Firefox.

Contributor

Again I agree with MT

Contributor Author

I changed "small" to "limited"

@martinthomson
Contributor

Hmm, it seems like some of my replies look like new comments. I clearly fail at GitHib.

@ekr
Contributor Author

ekr commented May 5, 2017

@martinthomson: I guess I'm writing badly, but you're misunderstanding "store". You accept 0-RTT and then you process the 0-RTT data, but you store a copy of the CH (or the hash) during the window so you don't accept a replay. This doesn't involve delaying the processing of the CH or the EarlyData at all.

@martinthomson
Contributor

re: store, I was responding to @colmmacc, where he suggested that you have to wait for propagation of storage attempts from all nodes in the cluster.

and these PSKs are deleted upon use, then connections established
using one PSK enjoy forward security with respect to other PSKs
established on the same connection. This is a security advantage for
all 0-RTT data and for PSK usage when PSK is used without DH.
Contributor

I'm not sure this part really fits in the current exposition as-is. (Also, doesn't the addition of a per-ticket nonce into the PSK ticket derivation give self-contained tickets the same forward secrecy property?)

Contributor Author

No, because compromise of the STEK leads to compromise of all tickets.

Contributor

Oh, database-key tickets certainly have better forward security properties than self-contained tickets. I'm just not sure about what the "with respect to other PSKs established on the same connection" means.

Contributor Author

I'll see what I can do to rewrite it.
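
For readers following this thread, a rough sketch of the database-ticket behavior under discussion may help: each ticket maps to its own PSK, and the entry is deleted when it is used, so a used PSK no longer exists on the server while sibling tickets from the same original connection remain valid. `SingleUseTicketStore`, `issue`, and `redeem` are hypothetical names, and a real deployment would use a shared session database rather than a local dict.

```python
from typing import Optional


class SingleUseTicketStore:
    """Hypothetical session database where each ticket can be redeemed once."""

    def __init__(self) -> None:
        self._psks: dict[bytes, bytes] = {}  # ticket label -> PSK

    def issue(self, ticket: bytes, psk: bytes) -> None:
        # Each ticket is associated with a different PSK.
        self._psks[ticket] = psk

    def redeem(self, ticket: bytes) -> Optional[bytes]:
        # pop() looks up and deletes in one step, so a second attempt
        # (for example, a replayed ClientHello) finds nothing.
        return self._psks.pop(ticket, None)


store = SingleUseTicketStore()
store.issue(b"ticket-A", b"psk-A")
store.issue(b"ticket-B", b"psk-B")
used = store.redeem(b"ticket-A")
reused = store.redeem(b"ticket-A")
```

This contrasts with self-encrypted tickets, where (as noted above) compromise of the STEK exposes every ticket.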

server nodes, it may be hard to achieve high rates of PSK and
0-RTT success when compared with self-encrypted tickets which do not
require consistent server-side storage for basic functionality but
only for 0-RTT anti-replay.
Contributor

There should probably be some more clarification that 0-RTT success and PSK success are partially independent, and that tickets can still be used for PSK even if the single-use property cannot be guaranteed; that is, PSK can succeed even in cases where 0-RTT must be rejected for safety. (Unless I misunderstand?)

duplicates. However, recording all ClientHellos causes state to grow
without bound, so in practice the server must instead record
ClientHellos within a given time window based on the
"obfuscated_ticket_age" value provided by the client.
Contributor

MT's version is better, yes.

client sent the ClientHello as:

~~~~
creation time + (client's view - RTT estimate/2)
Contributor

I tend to agree with Nygren about not giving the attacker more information -- that is, don't hard-fail and just fall back to 1-RTT [unless you're under attack and need to shed load]. The 1-RTT will fail for the attacker's replays, of course.

server SHOULD reject early data and fall back to a full 1-RTT
handshake. Clock skew distributions are not
symmetric, so the optimal tradeoff may involve an asymmetric range
of permissible mismatch values.
Contributor

Once the discussion above about storing received ClientHello(-related stuff) settles, we should probably normalize this text with what we end up with there.

Also, a server could rate-limit how often it accepts 0-RTT, to provide some reduction in the amount of replay possible. The amount of reduction gained probably is not enough to make it worth doing, but I'll toss it out there.

TLS-using applications. Specifically, if applications are not
engineered to be idempotent, then duplication of requests
may cause side effects (e.g., purchasing an item or transferring
money) to be duplicated, thus harming the site or the user.
Contributor

I think this is some HTTPS mindset sneaking in. (Which is not necessarily wrong, just something to be aware of.)

fall back to 1-RTT and process the data upon application layer
replay. The scale of this attack is limited by the client's
willingness to replay and therefore only allows a small number of
replays, which will also use different encryption keys.
Contributor

Again I agree with MT

have multiple copies of the data be accepted during the replication
window. The stateless mechanism described in
{{stateless-anti-replay}} only prevents replay outside the
time window.
Contributor

probably should reiterate that this can be tens or hundreds of thousands of replays.

Contributor Author

I added that separately.

Contributor

In the main body text, not the security considerations, if I'm reading the line numbers correctly.
The security considerations might do well to reiterate the consequences of billions of replays (e.g., Colm's analysis).

{{stateless-anti-replay}} only prevents replay outside the
time window.


Contributor

Yeah, worth stating again. Maybe also that the application profile should tell the client to do so.

@@ -3593,6 +3548,127 @@ appropriate application traffic key as described in {{updating-traffic-keys}}.
In particular, this includes any alerts sent by the
server in response to client Certificate and CertificateVerify messages.

## 0-RTT and Anti-Replay {#replay-time}

Not that it matters much, but should this section be labeled {#anti-replay} or something else instead? The old {#replay-time} section was named that when it only described the stateless time-mismatch mechanism.

Contributor

@kaduk kaduk left a comment

In general I'm happy with the direction this is going in, but think a few more tweaks are in order.

of these mechanisms.

Clients are unable to determine which, if any, of these mechanisms
servers actually implement and therefore MUST only send early
Contributor

This statement may be a little stronger than reality, in that a client can intentionally replay things and see whether it gets accepted right away, and after 20 seconds. The first two methods should not be distinguishable to the client, of course, though.

all 0-RTT data and for PSK usage when PSK is used without DH.

Because this mechanism requires sharing the session database between
server nodes, in environments with multiple distributed servers
Contributor

The sharing is only necessary in such environments, so maybe
"between server nodes in environments with multiple distributed servers, in such environments it may be hard"?

the round trip time between client and server. I.e.,

~~~~
adjusted_creation_time = creation time + estimated RTT
Contributor

Hmm, I bet there are subtle things an attacker could do to influence the estimated RTT that do not require delaying the actual NST and/or ClientHello messages, which could throw a wrench into this.

expected_arrival_time = adjusted_creation_time + client's ticket age
~~~~
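
As a worked illustration of the two formulas quoted above, the following sketch computes the expected arrival time and compares it with the server's clock. The function names, the millisecond units, and the asymmetric `early_ms`/`late_ms` tolerances are assumptions made for the example; the draft text only gives the two sums and notes that the permissible mismatch range may be asymmetric.

```python
def expected_arrival_time_ms(creation_time_ms: int,
                             estimated_rtt_ms: int,
                             client_ticket_age_ms: int) -> int:
    """Mirror the quoted formulas, with all values in milliseconds."""
    adjusted_creation_time = creation_time_ms + estimated_rtt_ms
    return adjusted_creation_time + client_ticket_age_ms


def within_window(now_ms: int, expected_ms: int,
                  early_ms: int, late_ms: int) -> bool:
    """Assumed check: is the ClientHello close enough to when it was expected?

    early_ms and late_ms are separate tolerances because clock skew
    distributions are not symmetric.
    """
    mismatch = now_ms - expected_ms
    return -early_ms <= mismatch <= late_ms


expected = expected_arrival_time_ms(creation_time_ms=1_000,
                                    estimated_rtt_ms=50,
                                    client_ticket_age_ms=5_000)
fresh = within_window(now_ms=6_100, expected_ms=expected,
                      early_ms=1_000, late_ms=10_000)
stale = within_window(now_ms=30_000, expected_ms=expected,
                      early_ms=1_000, late_ms=10_000)
```

When the check fails, the behavior suggested elsewhere in this review is to reject early data and fall back to a full 1-RTT handshake rather than abort the connection.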

For a given storage window, the server implements anti-replay as
Contributor

Do we need to introduce the phrase "storage window" somehow (or reword)?

For a given storage window, the server implements anti-replay as
follows.

1. Verify the PSK binder.
Contributor

"What do I do if the verification fails?"

of permissible mismatch values.

Note that while stateless anti-replay can bound over how long in time
a packet may be replayed, the total amount of replays tolerated is
Contributor

"amount" is not pluralizable; use just "amount of replay tolerated"

Contributor

Actually, "amount" is pluralizable, just quirky in how it's applied.
A quick search result: https://english.stackexchange.com/a/254372

I agree that "amounts" isn't valid here, however with the potential to use "replays" to refer to each replayed message and "replay" to refer to the concept of replay, both the current text and this suggestion are valid. English can be weird. I sorta lean slightly towards kaduk's suggested wording, but whatever ekr prefers or a coin flip to choose is fine by me. ;)

(PSK) binding between the ticket value and the resumption master secret.
At any time after the server has received the client Finished message,
it MAY send a NewSessionTicket message. This message creates a
pre-shared key (PSK) binding between the ticket value and a secret
Contributor

"PSK binding" is perhaps a confusing term to use, given that we have "PSK binders" that are different.
Maybe "association" or "relationship"?

as cache timing or measuring the speed of cryptographic
operations {{Mac17}}.

The limited anti-replay described in {{anti-replay}} are intended to
Contributor

"anti-replay" is singular and would take "is", but I think this is supposed to be "anti-replay mechanisms".

(which processes the data immediately) and another cluster which will
fall back to 1-RTT and process the data upon application layer
replay. The scale of this attack is limited by the client's
willingness to replay and therefore only allows a limited number of
Contributor

Is the different-cluster attack limited by the client's willingness to replay?
I guess it relies on the fallback to 1-RTT when the 0-RTT attempt is blackholed, so that's a "yes".


that an application is always aware that it is sending or receiving
data that might be replayed.

Clients MUST NOT send messages in early data which are not safe to

Is the target for this requirement really clients? How will they know? I think this is a requirement being placed upon protocols (as below) in how they specify client behaviour.

Contributor

Two examples:

  • a client browser decides to only send 0-RTT messages for GET requests to the domain root
  • a client Facebook mobile application decides to only send 0-RTT messages to request new timeline changes

I don't see how you could do that outside the client.

Contributor

I think that this should bind both how protocols specify client behavior and actual client behavior.

@ekr ekr closed this Jul 3, 2017