Postmortem On The Lightning Replacement Cycling Attack
So a lot of noise has been made around the Lightning vulnerability recently disclosed by Antoine Riard. Many people are claiming the sky is falling, that Lightning is fundamentally broken, and nothing could be further from the truth. I think part of the problem is that people don’t really understand how this vulnerability works, firstly, and secondly many people don’t understand how this individual vulnerability overlaps with other known issues on the Lightning Network that have known solutions.
So first, let’s go through and try to understand the vulnerability itself. When a Lightning payment is routed across the network, one thing that is key to understand is how the timelocks for refunding a failed payment work. The hop closest to the receiver has a timelock of ‘x’, and every hop going back to the sender has one of ‘x+1’, ‘x+2’, and so on. The timelocks get progressively longer as you go each hop from the receiver back towards the sender. The reason for this is that if a payment reaches the receiver, but some problem stops the preimage from propagating all the way back to the sender, the hop where it stopped has time to enforce it on-chain, and put the preimage there that all preceding hops need to confirm the payment. Otherwise someone in the middle, where the failure happens, could have their outgoing hop claim the funds with the preimage, and the hop that forwarded it to them claim it with their refund path, and leave that person in the middle shit out of luck having lost funds.
The Replacement Cycling Attack is a complicated way to try and accomplish exactly that undesired outcome, the target node losing money by having the outgoing hop claim the funds with a success transaction, and the incoming hop claiming funds through the refund transaction. This necessitates stalling out the victim node, and preventing them from seeing the preimage in the success transaction on one side until after the timelock expires on the other side, so they can claim the refund there.
This requires a very targeted and complicated game of manipulating the victim’s mempool. Let’s look at the actual transaction structure involved here. You have the commitment transaction, which is the main transaction representing the Lightning channel state. It has an output for each side of the channel representing funds completely under the control of one member or the other, and outputs for each HTLC in the process of being routed. These outputs are the ones we are concerned with. Each HTLC output can be spent either immediately at any time with the preimage from the receiver, or after the timelock expires on the refund.
The attack requires that a malicious party, or two conspiring parties, have a channel on both sides of the victims node routing a payment. So Bob, the victim, has a channel with Alice and Carol, the attackers, and payment routed from Carol to Bob to Alice. Now remember, the timelock refund path between Alice and Bob will expire and become valid before the refund between Carol and Bob.
The attackers route a payment through Bob, and then Alice will refuse to send Bob the preimage to finalize the payment when she receives it. What Bob will do now is wait until the timelock window expires between himself and Alice, and go to broadcast the channel commitment transaction and refund transaction to get it confirmed before the timelock window expires. What Alice will do is then go to spend the preimage transaction to claim the funds with an output unrelated to the channel, and right afterwards doublespend the second input in the preimage success transaction. The goal here is to evict Bob’s timeout transaction from the mempool, but also evict the preimage success transaction so Bob doesn’t see it. If he does, he will learn the preimage and can simply claim the funds in his channel with Carol before her timeout transaction is valid to spend.
Alice and Carol have to do this on a consistent basis, everytime Bob rebroadcasts his timeout transaction with Alice, until the blockheight passes where Carol’s timeout transaction is valid. Then they can submit the success transaction on Alice’s side, and the timeout transaction on Carol’s side, and leave Bob holding the bag having lost the value of the payment he was routing.
The problem with this is two fold. Firstly, the victim’s Bitcoin Core node must be specifically targeted to ensure that at no time does the preimage success transaction propagate into their mempool where their Lightning node can acquire the preimage. Secondly, if the second transaction Alice uses to evict the preimage transaction is confirmed, Alice incurs a cost (remember, the idea is to replace the timeout transaction with the preimage, so that is evicted from the mempool, then replace the preimage transaction with the second one double-spending the additional input in the preimage transaction). That means every time Bob re-broadcasts his timeout transaction, Alice has to pay a higher fee to re-evict it, and when that is confirmed she actually incurs a cost.
So Bob can force Alice to incur a big cost simply by regularly rebroadcasting his timeout transaction with a higher fee, meaning if the payment HTLC output is not worth significantly more than the fees Alice could incur, the attack isn’t economically worthwhile to pull off. It would also be possible to prevent the attack completely by changing how HTLC success and timeout transactions are constructed. By using the SIGHASH_ALL flag, which means the signature commits to the entirety of the transaction and becomes invalid if the tiniest detail (like adding the new input in the preimage transaction required for this attack) is changed. This wouldn’t work with current version of Lightning channels using anchor outputs, but it would solve the issue entirely. Peter Todd has also proposed a new consensus feature that would entirely solve the issue, essentially a reverse timelock, where the transaction would become invalid after a certain time or blockheight instead of becoming valid after. Going that far however is not necessary in my opinion.
Simply rebroadcasting your transaction regularly with a slight fee bump is a massive mitigation of the attack, but there are also numerous dynamics that just make it not a serious issue regardless. First, if you aren’t a routing node, this isn’t really a serious issue. So most end users are safe from this attack. Secondly, there are many reasons why nodes do not allow any random person to open channels to them. Large nodes are very selective about who they peer with, as random channels not managed efficiently or professionally have a cost in the form of sunk or wasted capital in unused channels. So any large node that would make a juicy target for this attack is not trivial to even get connected with in the first place, let alone connect to them with multiple channels to pull off the attack in the first place. Lastly, as I’ve written about in the past, other unrelated attacks possible on the network are already necessitating filters and restrictions in how nodes choose to handle HTLCs they could forward. I.e. limits on the size of payments they will forward, how many they will allow at any given time, etc. So even if you can open a channel with a node worth attacking, as the network evolves there will be more thought through criteria and filters for deciding whether to even forward a payment in the first place.
Overall, this is a legitimate issue and possible attack, but both in terms of direct mitigations, and how the attack will interact with solutions to other issues over the long term, this is not an unsolvable problem. It is a legitimate issue, and dismissing it as purely FUD is not an accurate reaction, but to claim the sky is falling and the Lightning Network as a protocol is doomed is far overblowing the issue.
Time will march on, we will run into problems, and we will fix those problems as they come. Like we always have.
24 October 2023 16:30