[Proposal] DoS mitigations #258
What stops an attacker from generating a high-difficulty Blake2b hash once and using it many times?
I guess there would need to be some sort of cache that would include blocks passing Blake2b but failing RandomX.
Yes, that's also an option.
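A minimal sketch of the cache idea discussed above: remember blocks whose outer Blake2b hash met the difficulty target but whose full RandomX verification failed, so a replayed precomputed hash is rejected without re-running the VM. The class name, method names, and the LRU eviction policy are all hypothetical choices for illustration.

```python
from collections import OrderedDict

class FailedPowCache:
    """Bounded LRU cache of outer hashes that passed Blake2b but failed RandomX."""

    def __init__(self, max_entries=10000):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # outer hash -> True, oldest first

    def seen(self, outer_hash: bytes) -> bool:
        if outer_hash in self._entries:
            self._entries.move_to_end(outer_hash)  # refresh LRU position
            return True
        return False

    def add(self, outer_hash: bytes) -> None:
        self._entries[outer_hash] = True
        self._entries.move_to_end(outer_hash)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
```

A node would consult `seen()` before spending a RandomX verification on a block whose outer hash it has already disproved.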
A cache of failing hashes would be safer from a cryptographic point of view. I'm not sure about giving direct access to the final hash like this:
Why would it be unsafe? It's actually a good idea to thwart precomputed DoS attacks because the attacker cannot know the hashing blobs for future blocks.
I'm not a cryptography expert and I don't see any shortcuts in this approach right now; my concern is that adding the original input blob to the final hash could create shortcuts in the PoW if Blake2 gets broken, for example.
I don't think we can make any statements about the security of RandomX if Blake2 is broken, regardless of this proposal.
Missing input to a hash function is what can cause vulnerabilities. I don't think it's possible to cause a vulnerability by adding more context to a hash function. Moreover, this inner/outer PoW is not a new idea: it was used by Ethereum and is currently also used by Grin.
I know, and I don't see any problems with it myself. I'm just saying it will need another audit from cryptography experts to be 100% sure.
I don't think we need an audit for a 1-line change in specs and perhaps 8 lines of C++ code. MRL has enough capacity for this. |
So the proposal is:
Afaict this is a good solution to the stated problem. Note that it only solves DoS in cases where computing the hash is the expensive step. I believe the only way to mitigate DoS (at the validation site) in cases that involve multiple validation steps with non-trivial cost and no pre-validation (e.g. validating a transaction or set of transactions) is to randomize the order of validation. Then you can get a 50% reduction in average validation cost for invalid, maximally efficient DoS statements.
@UkoeHB yes, this is the gist of the proposal. The third step involves invoking the RandomX virtual machine and is therefore quite expensive, but the attacker needs to perform a substantial amount of work to pass step 2.
No, the register file used in the hash is the final output of the RandomX VM.
This proposal only focuses on DoS using the PoW algorithm. However, I don't think randomization is the best way. An attacker can trivially submit blocks that violate all rules. In that case, the optimal strategy is to validate from the cheapest to the most expensive rule (where the cost is measured e.g. by CPU cycles needed to validate the rule).
Right, I just mean if validating PoW is always done as part of block validation, and then you have some other validation step after the PoW check, the DoSer can just target that final step.
Randomization is the best way if the attacker knows your validation strategy, because he can always target the last step in a fixed validation sequence. You can increase the cost to the attacker to execute a max-cost attack by ordering validation steps by cost, but that means increasing your own local cost to validate the attacker's statements (relative to dynamically randomizing validation steps). When statement-creation and statement-validation costs are massively disproportionate, as in PoW, then it makes sense to move the disproportionate steps to the end and order them by cost (assuming you can guarantee that the attacker pays statement-creation costs like this proposal does). Randomization is basically academic though, it's probably a serious pain to implement correctly.
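The two strategies contrasted above can be sketched side by side. Here `checks` is a hypothetical list of `(estimated_cpu_cost, check_fn)` pairs; both function names and the cost representation are illustrative assumptions, not anything from the RandomX codebase.

```python
import random

def validate_cheapest_first(block, checks):
    # Fixed order: run the cheapest rules first, so blocks that violate
    # cheap rules are rejected with minimal wasted work.
    for _, check in sorted(checks, key=lambda c: c[0]):
        if not check(block):
            return False
    return True

def validate_randomized(block, checks, rng=random):
    # Randomized order: an attacker who knows the validation sequence can
    # no longer craft statements that always fail only the final (most
    # expensive) step.
    shuffled = list(checks)
    rng.shuffle(shuffled)
    for _, check in shuffled:
        if not check(block):
            return False
    return True
```

With the fixed order, an attacker targets the last rule; with shuffling, the expected cost of validating a maximally efficient invalid statement is roughly halved, as noted above.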
I'm trying to design a better system to defend against invalid blocks, called "Proof of Attack" (PoA). I'm trying to see whether we can exploit Dandelion++ to reverse-attack (triage) malicious nodes that are submitting invalid blocks. Here's a draft example of the "Proof of Attack" (PoA) defence system:
16: End result: The benefit of this method is that triaging doesn't actually prevent invalid blocks from being submitted; it just triages / sorts them last, so weak nodes can simply ignore them, leaving the powerful nodes to crunch the numbers. So the network quickly heals itself. This is just a draft idea I came up with; not sure if it'll work or not.
I do not agree with step 10. It basically reads like "it will work because I say so" ("because of entropy" is not a valid argument unless you explain what it means). AFAICT, it is very vulnerable to Sybils, in the form of simple nodes that do not have to perform any kind of work. Spin up enough nodes, and you can "prove" any arbitrary other node "attacked" by sending an invalid block, claiming it came from your target. PoW is how bitcoin/monero/etc get out of this Sybil issue, since spinning up more nodes stops being so cheap. If you can explain point 10 to people's satisfaction, it can be looked at again. Also, Dandelion works on unmined txes, not blocks, doesn't matter here though.
They send this information to the node they received the block/tx from? Wouldn't that node be the malicious one? Otherwise they wouldn't have transmitted an invalid statement. No need for triage, just block IPs that send you invalid stuff. What you could do is form graphs of trusted nodes that report malicious node IPs to each other. A very complicated system to build though.
Re: Step 10,
PoA does not target the malicious node directly, because the best the proof can do is narrow the attack down to just 2 nodes. The malicious node could be either the last one that claims responsibility (who may be lying) or the one that denies responsibility (who may also be lying), so it's a 50% possibility.

To explain the entropy: very quickly, after several cycles (e.g. with DoS), entropy builds up, and as more nodes share the PoA, the list of recurring IP addresses rapidly narrows down the source. E.g. 50%, then 75%, then 87.5%, then ~94%, soon reaching 99.9% after just a few hops, and the malicious nodes (no matter how many there are) soon run out of IP addresses, because in a random environment the source node will always have a higher count of malicious data against it than the honest nodes. It's like how contaminated water disperses from an approximately single point: when the vectors cross, you can triangulate the probable source very quickly.

So for example, if a Sybil attacker spins up multiple nodes and tells a huge bunch of lies by sending false blocks, only the participating nodes (the attackers) are affected, because only they can produce a Proof of Attack. After just one malicious piece of data, the set of suspects is reduced from 10+ nodes to just 2 nodes.

Also, I should emphasize that the punitive action is merely "triaging", i.e. optimization tagging. The key thing to remember is that this is "immunization" by optimization and triaging, not blocking / banning an IP per se (that's optional). So although many honest nodes (>90%) may get on the 'naughty list', they are given the benefit of the doubt, while the malicious nodes have the most doubt against them (they appear on the most naughty lists, due to the entropy demonstrated above), are almost always at the bottom, and are affected the most, but still function normally as a failsafe. I'll have to create a diagram to demonstrate this, as it's very messy in words.
I apologise, this is just a draft.
The nodes store the malicious data, and when provided with a Proof of Attack (PoA), two-way communication can occur among the honest nodes to run a traceroute back to the source, who will ultimately lie about it.
Dandelion is used to propagate mempool transactions, not blocks. Blocks must be propagated as fast as possible.
No such proof is possible unless you do some sort of PKI and have all nodes sign their messages. Impractical.
Invalid blocks are not propagated. If a node receives an invalid block, the sending node is almost certainly malicious and can be immediately blocked.
Explain that to someone running a node on a Raspberry Pi with 1 GB of RAM. In any case, this discussion has gone off topic. I suggest @MoneroChan open a separate discussion in the Monero repository.
I apologise, I was getting off track and appear to have misunderstood the issue. Back on topic:
Hope this helps. I'm sorry if I misunderstood the issue.
Even if you immediately ban the malicious peer, you have already wasted the CPU time needed to calculate the hash, which might be significant. The IP ban can be circumvented by attackers with access to many addresses (botnets etc.). Monero developers were clearly aware of the DoS potential of the PoW algorithm, hence the harsh penalty for nodes that submit invalid PoW. For Bitcoin, the PoW is much cheaper to verify and bitcoin nodes have a more lenient policy for misbehavior. They mark misbehaving peers as "discouraged" and only accept them as incoming connections if their inbound peer slots are not fully occupied (source).
This is a recipe for a chain split. You cannot reject blocks that might be valid.
Yes, but my main proposal was: "Blocks that exceed the timeout (and the corresponding nodes that sent them) are put on a low-priority 'to do later' list, processed when free CPU cycles are available." So there is a fixed amount of time that can be wasted by the CPU (adjustable, set by the node operator). For example, if a quantum computer exploits a zero-day vulnerability to maliciously craft a block that takes 100 years to solve, the weak Raspberry Pi node only works on it when it has spare CPU cycles and doesn't cause a DoS traffic-jam backlog. (The "discarded or blocked" part you are worried about can be made optional; but sooner or later you will have to discard it if the maliciously crafted block takes infinity to solve.)
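The "to do later" list described above could be sketched as a simple FIFO of deferred blocks, drained only when the node is otherwise idle. All names here are hypothetical; this is an illustration of the commenter's draft idea, not anything implemented in Monero.

```python
import heapq
import itertools
import time

class DeferredValidationQueue:
    """Holds blocks whose validation exceeded the timeout; oldest first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps

    def defer(self, block):
        # Record arrival time so the oldest deferred block is retried first.
        heapq.heappush(self._heap, (time.monotonic(), next(self._counter), block))

    def next_when_idle(self):
        # Called only when the node has spare CPU cycles; None if empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

As the replies below point out, deferring a possibly-valid block risks a chain split, so this remains a sketch of the idea rather than a recommended design.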
True. However, if we think about this logically: if every node connected to the Raspberry Pi is a malicious botnet, then the poor Raspberry Pi has zero chance of getting a valid block no matter what checks you add to the algorithm, because all links to the actual chain are cut off to begin with. So for the sake of a practical simulation of this DoS issue, we will assume there is at least one good node the Raspberry Pi is connected to, which on average sends blocks that are computable within an acceptable timeframe, like a bootstrapper. With this in mind, as I proposed, the Raspberry Pi simply prioritizes blocks from that known-good node before other nodes. The key point I'm suggesting is that this timeout method would still protect you in the event of a future unknown zero-day DoS vulnerability that may not be codable for. E.g. looking at the 0.20.1 release notes from Bitcoin, they may still be vulnerable because their method requires confirming the block is invalid before acting on the node. Does the network already do this? If I'm mistaken about something, please let me know.
Implemented in #265
Network nodes typically verify RandomX hashes in the light mode. One hash calculation takes around 15 ms on high-end machines and up to 1 second on low-end 32-bit machines (Raspberry Pi Zero). This might be a denial-of-service (DoS) vector if malicious nodes started flooding the network with invalid blocks. I don't know if any actual attacks exploiting this have happened, but it might be a good idea to mitigate this.
As per the specification, the RandomX result is calculated as a Blake2b hash of the final register file (with the constant registers replaced by the scratchpad hash). We could include some intermediate data in the block blob so that the final result can be quickly calculated with one call to the Blake2b compression function. This would require a change of the block format, but since Seraphis will bring large changes to the transaction format anyways, adding one field to the block blob would be a comparatively small change.
If we wanted to make hashes backwards compatible, it would require at least 192 bytes to be included in the block: 128 bytes for the last Blake2b block and 64 bytes of the internal hash state (see blake2b.c).
However, even such a "backwards compatible" change would require all mining software to be updated to also return the intermediate state. So I don't think we need to restrict ourselves to backwards compatible solutions.
A much simpler solution would be to change step 14 of the algorithm from
R = Hash256(RegisterFile)
to
R = Hash256(Hash256(RegisterFile))
This would produce different hashes, but the intermediate state would be reduced to just 32 bytes. This change is also very easily proven to be secure (a double hash is as secure as a single hash), and it would be a non-contentious fork because the performance implications are negligible (an additional ~0.3 μs on top of the full ~1600 μs calculation in full mode) and no changes are needed to the RandomX virtual machine.
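The proposed change can be sketched directly with Python's built-in Blake2b, assuming Hash256 is Blake2b with a 32-byte digest (as in the RandomX specification). The 256-byte register file here is a zero-filled placeholder standing in for the VM's actual final output.

```python
import hashlib

def hash256(data: bytes) -> bytes:
    # Hash256 from the RandomX spec: Blake2b with a 256-bit output
    return hashlib.blake2b(data, digest_size=32).digest()

# Placeholder for the VM's 256-byte final register file
register_file = bytes(256)

inner = hash256(register_file)  # 32-byte intermediate state shipped in the block
outer = hash256(inner)          # final result R = Hash256(Hash256(RegisterFile))

# A verifier holding only `inner` recomputes the final PoW result with a
# single cheap Blake2b call, without invoking the RandomX VM at all.
assert outer == hash256(inner)
```

The VM run is only needed afterwards, to confirm that `inner` really is the hash of the register file produced for this block.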
If network nodes first validated the difficulty of the final Blake2b hash, it would become infeasible to use the PoW algorithm to DoS the network. At the current network difficulty of ~300G, it takes about 1 minute on a high-end GPU to produce a Blake2b hash that is below the target.
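The cheap pre-check could look like the following sketch. It assumes, as Monero's verification does, that the PoW hash is compared as a little-endian 256-bit integer against a target derived from the difficulty; the function names and the 32-byte `inner_hash` field are hypothetical.

```python
import hashlib

def meets_target(pow_hash: bytes, target: int) -> bool:
    # Interpret the 32-byte hash as a little-endian integer (assumption
    # about Monero's convention; sketch only).
    return int.from_bytes(pow_hash, "little") < target

def quick_pow_precheck(inner_hash: bytes, target: int) -> bool:
    # One Blake2b call replaces the expensive RandomX VM invocation for
    # blocks that fail the difficulty check.
    outer = hashlib.blake2b(inner_hash, digest_size=32).digest()
    return meets_target(outer, target)
```

Only blocks passing this check would proceed to full RandomX verification, so an attacker must pay roughly a GPU-minute of Blake2b grinding per invalid block that costs a node any RandomX time.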