Exploiting The Lightning Bug Was The Ethical Choice
By publicly exploiting a bug on Lightning that could have put users’ funds at risk, the developer was acting in the best interests of Bitcoin node runners.
This is an opinion editorial by Shinobi, a self-taught educator in the Bitcoin space and tech-oriented Bitcoin podcast host.
For the second time in roughly a month, btcd and LND have had a bug exploited that caused them to deviate in consensus from Bitcoin Core. Once again, Burak was the developer who triggered the vulnerability (this time clearly intentionally), and once again it was an issue with code for parsing Bitcoin transactions above the consensus layer. As I discussed in my piece on the prior bug that Burak triggered, before Taproot there were limits on how large the script and witness data in a transaction could be. With the activation of Taproot, those limits were removed, leaving the block size limit itself as the only cap on these parts of an individual transaction. The problem with the last bug was that, although the consensus code in btcd was properly upgraded to reflect this change, the code handling peer-to-peer transmission, which parses data before sending it or when receiving it, was not. So the code that processes blocks and transactions before handing them off for consensus validation rejected the data and never passed it to the consensus validation logic, and the block in question was never validated.
A very similar thing happened this time. Another limit in the peer-to-peer section of the codebase was enforcing a restriction on the witness data incorrectly, limiting it to a maximum of 1/8 of the block size as opposed to the full block size. Burak crafted a transaction with witness data just a single weight unit over the strict limit and once again stalled btcd and LND nodes at that block height. This transaction was a non-standard transaction, which means that even though it is perfectly valid by consensus rules, it is not valid according to default mempool policy and therefore nodes will not relay it across the network. It is perfectly possible to get it mined into a block, but the only way to do so is to provide it directly to a miner, which is what Burak did with the help of F2Pool.
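To make the failure mode concrete, here is a minimal sketch of a parser that sits in front of consensus validation and enforces a stricter limit than consensus does. The function names, the structure and the exact constants are assumptions for illustration and are not taken from btcd; the 1/8 ratio simply mirrors the description above, and the consensus rule is reduced to "the witness must fit in a block."

```python
# Illustrative constants. The real btcd values are not given in the article.
MAX_BLOCK_WEIGHT = 4_000_000                 # consensus cap on a whole block
STALE_WITNESS_LIMIT = MAX_BLOCK_WEIGHT // 8  # outdated cap in the wire parser


class ParseError(Exception):
    """Raised by the hypothetical wire parser, before consensus code runs."""


def parse_witness(size: int, parser_limit: int) -> int:
    """Wire-layer sanity check on one transaction's witness size."""
    if size > parser_limit:
        raise ParseError(f"witness of {size} exceeds parser limit {parser_limit}")
    return size


def consensus_accepts(size: int) -> bool:
    """Simplified consensus rule: the witness only has to fit in a block."""
    return size <= MAX_BLOCK_WEIGHT


def node_accepts_block(witness_sizes, parser_limit: int) -> bool:
    """Return True if the node validates the block, False if it stalls on it."""
    try:
        parsed = [parse_witness(s, parser_limit) for s in witness_sizes]
    except ParseError:
        # The block is dropped here; consensus_accepts() is never called.
        return False
    return all(consensus_accepts(s) for s in parsed)


oversized = STALE_WITNESS_LIMIT + 1  # one unit over the stale limit
buggy = node_accepts_block([oversized], STALE_WITNESS_LIMIT)
fixed = node_accepts_block([oversized], MAX_BLOCK_WEIGHT)
print(buggy, fixed)
```

The consensus check is identical in both runs. Only the limit in the pre-consensus parser differs, and that alone decides whether the node follows the chain or stalls.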
This really drives home the point that any piece of code whose purpose is to parse and validate Bitcoin data must be heavily audited to ensure it is in line with what Bitcoin Core will do. It doesn’t matter whether that code is the consensus engine for a node implementation or just a piece of code passing transactions around for a Lightning node. This second bug sat literally right above last month’s bug in the codebase. It wasn’t even discovered by anyone at Lightning Labs. AJ Towns reported it on October 11, two days after the original bug was triggered by Burak’s 998-of-999 multisig transaction. The report was publicly posted on GitHub for 10 hours before being deleted. A fix was then made, but not released, with the intention of quietly patching the issue in the next release of LND.
Now, this is pretty standard procedure for a serious vulnerability, especially with a project like Bitcoin Core where such a vulnerability can actually cause serious damage to the base-layer network and protocol. But in this specific case, it presented a serious risk to LND users’ funds, and given that it was literally right next to the prior bug that carried the same risks, the chances that it would be found and exploited were very high, as Burak demonstrated. This raises the question of whether the quiet-patch approach is the way to go for vulnerabilities like this, which can leave users open to theft of funds (because a stalled node cannot detect old channel states being broadcast and penalize the peer who broadcast them).
As I went into in my piece on the last bug, if a malicious actor had found the bugs before a well-intentioned developer, they could have tactically opened new channels to vulnerable nodes, routed the entire contents of those channels back to themselves and then exploited the bug. From there, they would have had those funds under their control and also been able to close the channel with the initial state, literally doubling their money. Ironically, by actively exploiting this issue, Burak actually protected LND users from such an attack.
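The "doubling their money" arithmetic can be spelled out in a few lines. This is a simplified model under stated assumptions: fees, channel reserves and timelocks are ignored, the attacker funds the whole channel, and the function name and parameters are hypothetical.

```python
def attacker_final_balance(capacity: int, victim_sees_chain: bool) -> int:
    """Sats the attacker controls after the attack (simplified model)."""
    # Step 1: the attacker opens a channel to the victim and funds all of it,
    # so the initial state pays the full capacity to the attacker.
    initial_attacker_side = capacity

    # Step 2: the attacker routes the full balance through the victim to
    # another node they control, so they still hold those sats elsewhere.
    received_elsewhere = capacity

    # Step 3: the attacker broadcasts the old (revoked) initial state.
    if victim_sees_chain:
        # A healthy node notices the revoked state and claims the whole
        # channel with a penalty transaction.
        on_chain_payout = 0
    else:
        # A stalled node never sees the broadcast, so the old state stands.
        on_chain_payout = initial_attacker_side

    return received_elsewhere + on_chain_payout


stalled = attacker_final_balance(1_000_000, victim_sees_chain=False)
healthy = attacker_final_balance(1_000_000, victim_sees_chain=True)
print(stalled, healthy)
```

Against a node that is following the chain, the attacker ends up with what they started with. Against a stalled node, they end up with twice that, and the difference comes out of the victim's pocket.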
Once it was exploited, users were open to such attacks from preexisting peers with whom they already had open channels, but they were no longer capable of being targeted specifically with new channels. Their nodes were stalled and would never recognize or process payments through channels someone tried to open after the block that stalled their node. So while it didn’t completely remove the risk of users being exploited, it did limit that risk to people they already had a channel with. Burak’s action mitigated it. Personally I think this type of action in response to the bug made sense; it limited the damage, made users aware of the risk and led to it being quickly patched.
LND was also not the only thing affected. Liquid’s pegging process was also broken, requiring updates to the federation’s functionaries to fix it. Older versions of Rust Bitcoin were affected as well, which caused the stall to affect some block explorers and electrs instances (an implementation of the backend server for Electrum Wallet). Now, with the exception of Liquid’s peg eventually exposing funds to the emergency recovery keys held by Blockstream after a timelock expiry — and, realistically in the heist-style movie plot where Blockstream stole these funds, everyone knows exactly who to go after — these other issues never put anyone’s funds at risk at any point. Also, Rust Bitcoin had actually patched this specific bug in newer versions, which apparently didn’t lead to any communication with maintainers of other codebases to highlight the potential for such issues. It was only the active exploitation of the bug live on the network that widely exposed that the issue existed in multiple codebases.
This brings up some big issues when it comes to vulnerabilities like this in Layer 2 software on Bitcoin. First, there is the seriousness with which these codebases are audited for security bugs, and how that is prioritized versus the integration of new features. I think it is very telling that this second bug was not even found by the maintainers of the codebase where it was present, even though it was literally right next to the initial bug discovered last month; it suggests security is not always the priority. After one major bug that put users’ funds at risk, was no internal audit of that codebase done? It took someone from outside the project to discover it? That does not demonstrate that safeguarding users’ funds takes priority over building new features to draw in more users. Second, the fact that this issue was already patched in Rust Bitcoin demonstrates a lack of communication among maintainers of different codebases with regard to bugs like this. This is pretty understandable: when codebases are completely different, someone who finds a bug in one does not immediately think, “I should contact other teams writing similar software in totally different programming languages to warn them about the potential for such a bug.” You don’t find a bug in Windows and then immediately think to go report the bug to Linux kernel maintainers. Bitcoin as a protocol for distributed consensus across a global network is a very different beast, however; maybe Bitcoin developers should start to think along those lines when it comes to vulnerabilities in Bitcoin software, especially when it comes to parsing and interpreting data that is consensus related.
Lastly, maybe when it comes to protocols like Lightning, which depend on observing the blockchain at all times to be able to react to old channel states in order to maintain security, independent parsing and verification of data should be kept to an absolute minimum — if not removed entirely and delegated to Bitcoin Core or data directly derived from it. Core Lightning is architected in this way, connecting to an instance of Bitcoin Core and depending entirely on that for validation of blocks and transactions. If LND worked the same way, neither of these bugs in btcd would have affected LND users in a way that put their funds at risk.
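As a rough sketch of what delegating validation looks like, the channel watcher below never parses raw blocks or transactions itself; it only asks a backend which outputs were spent at each height. Everything here is hypothetical: ChainBackend, FakeFullNode and ChannelWatcher are illustrative names, not Core Lightning's actual interface or Bitcoin Core's RPC, and outpoints are plain strings for brevity.

```python
from typing import Dict, Iterable, List, Protocol, Set


class ChainBackend(Protocol):
    """Hypothetical view of a full node that has already validated the chain."""

    def best_height(self) -> int: ...
    def spent_outpoints(self, height: int) -> Iterable[str]: ...


class FakeFullNode:
    """In-memory stand-in for a validating full node."""

    def __init__(self, blocks: Dict[int, Set[str]]) -> None:
        self._blocks = blocks  # height -> outpoints spent in that block

    def best_height(self) -> int:
        return max(self._blocks)

    def spent_outpoints(self, height: int) -> Iterable[str]:
        return self._blocks.get(height, set())


class ChannelWatcher:
    """Tracks channel funding outputs using only data from the backend."""

    def __init__(self, backend: ChainBackend, funding_outpoints: Set[str],
                 start_height: int) -> None:
        self.backend = backend
        self.funding_outpoints = set(funding_outpoints)
        self.next_height = start_height

    def sync(self) -> List[str]:
        """Return funding outpoints spent since the last sync."""
        closed: List[str] = []
        tip = self.backend.best_height()
        while self.next_height <= tip:
            spent = set(self.backend.spent_outpoints(self.next_height))
            closed.extend(sorted(spent & self.funding_outpoints))
            self.next_height += 1
        return closed


node = FakeFullNode({100: {"aa:0"}, 101: {"bb:1", "cc:0"}})
watcher = ChannelWatcher(node, {"bb:1"}, start_height=100)
closed = watcher.sync()
print(closed)
```

With this shape, the watcher has no parsing limits of its own to fall out of date. If the full node accepts a block, the watcher sees its spends; the trade-off is that it is exactly as correct as the node behind it.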
Whichever way things are handled — either outsourcing validation entirely or simply minimizing internal validation and approaching it with much more care — this incident shows that something needs to change in approaching the issue of how Layer 2 software handles interacting with consensus-related data. Once again, everyone is very lucky that this was not exploited by a malicious actor, but instead by a developer proving a point. That being said, Bitcoin cannot count on getting lucky or hoping that malicious actors do not exist.
Developers and users should be focused on improving the processes to prevent incidents like this from happening again, and not playing the game of tossing around blame like a hot potato.
This is a guest post by Shinobi. Opinions expressed are entirely their own and do not necessarily reflect those of BTC Inc or Bitcoin Magazine.
14 November 2022 01:00