Article released on Oct 14, 2022
On August 3rd, security researchers Jinu and Allen found a vulnerability in the THORChain TSS module and submitted it via ImmuneFi. A malicious node operator (with a small number of malicious nodes) could launch a DoS and blame other, innocent node operators for it.
Understanding TSS & Usage
First, we need to understand what TSS (a threshold signature scheme) actually is. Basically, each operator in a group holds a share of the ECDSA private key, and with enough participants they can sign a message together without revealing their shares. THORChain used this to sign outbound transactions.
The cryptography inside TSS has been improving in terms of efficiency, security, and other properties over the years. The paper https://eprint.iacr.org/2021/060 presents a method where if the TSS signing fails, the system can find out which signer had caused the failure.
This is called “Identifiable Abort” - and this is great as now the system can penalize the signer who caused the failure. For example, they can remove the rewards for block generation.
This is exactly what THORChain did.
Attack Idea & Input Validation
This also means that if we can cause a signing failure while impersonating other node operators, we can not only cause a DoS, but also do so while keeping all of our block rewards and placing the blame on others. Let’s take a look at the input validation.
The sender recorded in wireMsg.Routing.From.Id is cryptographically validated, so forging that value would require forging an ECDSA signature for it.
However, the nodes do not treat the cryptographically validated wireMsg.Routing.From.Id value as the message sender. Instead, wireMsg.Message is unmarshalled, and each resulting msg carries its own Routing.From.Id, which is not validated. That inner value is what is actually used.
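The gap between the validated sender and the sender actually used can be illustrated with a small, self-contained sketch. It does not reproduce the go-tss code: the envelope and innerMsg types, the P-256 keys, and the function names are all assumptions made for illustration, and the strict check is only one plausible way to close the gap, not necessarily the fix the team shipped.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// innerMsg is a hypothetical stand-in for one message unmarshalled from
// wireMsg.Message; FromID plays the role of msg.Routing.From.Id.
type innerMsg struct {
	FromID  string
	Payload string
}

// envelope is a hypothetical stand-in for the signed wire message; FromID
// plays the role of wireMsg.Routing.From.Id.
type envelope struct {
	FromID string
	Inner  innerMsg
	Sig    []byte
}

// digest hashes the inner message so that it can be signed and verified.
func digest(m innerMsg) []byte {
	sum := sha256.Sum256([]byte(m.FromID + "|" + m.Payload))
	return sum[:]
}

// seal signs the inner message with the envelope sender's private key.
func seal(fromID string, key *ecdsa.PrivateKey, inner innerMsg) (envelope, error) {
	sig, err := ecdsa.SignASN1(rand.Reader, key, digest(inner))
	if err != nil {
		return envelope{}, err
	}
	return envelope{FromID: fromID, Inner: inner, Sig: sig}, nil
}

// attributeLoose mirrors the flaw described above: the signature is checked
// against the envelope sender, but the message is attributed to the
// unvalidated inner sender.
func attributeLoose(e envelope, keys map[string]*ecdsa.PublicKey) (string, bool) {
	pub, known := keys[e.FromID]
	if !known || !ecdsa.VerifyASN1(pub, digest(e.Inner), e.Sig) {
		return "", false
	}
	return e.Inner.FromID, true
}

// attributeStrict additionally requires the inner sender to match the
// authenticated envelope sender (an assumed mitigation).
func attributeStrict(e envelope, keys map[string]*ecdsa.PublicKey) (string, bool) {
	id, ok := attributeLoose(e, keys)
	if !ok || id != e.FromID {
		return "", false
	}
	return id, true
}

// forgeDemo has the attacker sign a message whose inner sender is the victim
// and reports how each check attributes it.
func forgeDemo() (looseID string, looseOK bool, strictOK bool, err error) {
	attackerKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", false, false, err
	}
	victimKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", false, false, err
	}
	keys := map[string]*ecdsa.PublicKey{
		"attacker": &attackerKey.PublicKey,
		"victim":   &victimKey.PublicKey,
	}
	forged, err := seal("attacker", attackerKey, innerMsg{FromID: "victim", Payload: "invalid proof"})
	if err != nil {
		return "", false, false, err
	}
	looseID, looseOK = attributeLoose(forged, keys)
	_, strictOK = attributeStrict(forged, keys)
	return looseID, looseOK, strictOK, nil
}

func main() {
	looseID, looseOK, strictOK, err := forgeDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("loose check attributes message to:", looseID, looseOK)
	fmt.Println("strict check accepts message:", strictOK)
}
```

In this sketch the attacker never needs the victim's key: the envelope is honestly signed by the attacker, and only the inner sender field is forged.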
Now we can impersonate others. All that remains is to cause a DoS, which is easily done by sending invalid ZK proofs in the first stage of the signing process. There were no logs that would reveal the real culprit. We’ll see this in the proof of concept section below.
Proof of Concept
The below section is from the report we sent to the THORChain team.
We modified the tests the THORChain team wrote to demonstrate an exploit.
There are four peers in the test. Their partyID is as follows.
Index 0, ID 1 ← this peer will be our “victim”, wrongly accused of delaying signing
Index 1, ID 0 ← this peer is “just a peer”
Index 2, ID 2 ← this peer is “just a peer”
Index 3, ID 3 ← this will be the “attacker peer”
In the first round of TSS signing, each peer should submit an appropriate NIZK proof to others. Here, the attacker peer will send an invalid NIZK proof to the two “just a peer”s while impersonating the “victim”. This can be done by simply modifying the tss.MessageRouting part of the message. Meanwhile, the attacker rewrites the ProcessOutCh function as ProcessOutChAttack, which implements the sendBulkMsg function differently, i.e. as sendBulkMsgAttack. In the sendBulkMsgAttack function, we set the wireMsg.Routing.From as the attacker peer’s party ID information, which is required to pass the signature check. Now the other “just a peer”s will check the signature with the attacker peer’s public key, yet think the invalid NIZK proofs originated from the “victim”, causing the damage.
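The scenario above can be modelled as a small simulation to show where the accusations land. This is a toy model only, not the patched test code: the party and round1Msg types and the accuse function are hypothetical, and the only values taken from the report are the index/ID pairs of the four peers.

```go
package main

import "fmt"

// party mirrors the index/ID pairs listed in the report.
type party struct {
	Index int
	ID    string
}

// round1Msg is a hypothetical model of a first-round message.
type round1Msg struct {
	EnvelopeFrom party // sender that passes the signature check
	ClaimedFrom  party // unvalidated sender taken from the inner message
	To           party
	ProofValid   bool
}

// accuse returns the party a receiver would blame when it trusts the
// unvalidated ClaimedFrom field, and false if the proof is valid.
func accuse(m round1Msg) (party, bool) {
	if m.ProofValid {
		return party{}, false
	}
	return m.ClaimedFrom, true
}

// runPoC sends one forged message to each honest peer and counts the
// accusations per accused party ID.
func runPoC() map[string]int {
	victim := party{Index: 0, ID: "1"}
	peerA := party{Index: 1, ID: "0"}
	peerB := party{Index: 2, ID: "2"}
	attacker := party{Index: 3, ID: "3"}

	accusations := map[string]int{}
	for _, receiver := range []party{peerA, peerB} {
		forged := round1Msg{
			EnvelopeFrom: attacker, // signed with the attacker's own key
			ClaimedFrom:  victim,   // forged inner routing information
			To:           receiver,
			ProofValid:   false, // deliberately invalid NIZK proof
		}
		if accused, blamed := accuse(forged); blamed {
			accusations[accused.ID]++
		}
	}
	return accusations
}

func main() {
	for id, n := range runPoC() {
		fmt.Printf("party ID %s accused %d time(s)\n", id, n)
	}
}
```

In this model both honest peers accuse the party with ID 1 (the victim), while the party with ID 3 (the attacker) is never accused.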
Note that the test function calls SignMessage(), but in the real scenario KeySign() will be called first - but this is not a problem as the situation is equivalent anyway. We note that every peer receives the request for KeySign() and participants all know each other’s peer Index, ID, and signature public key.
Running the PoC
Patch the files with the submitted diffs:
$ mkdir thorchain_poc_0803
$ cd thorchain_poc_0803
$ wget https://gist.githubusercontent.com/HaechiAudit/e0bc3c0033d9593f4bb4d97d06677f59/raw/af3337d193a85b73a5d741de83edb3c8e983a974/go-tss.diff
$ wget https://gist.githubusercontent.com/HaechiAudit/e0bc3c0033d9593f4bb4d97d06677f59/raw/af3337d193a85b73a5d741de83edb3c8e983a974/tss-lib.diff
$ git clone https://gitlab.com/thorchain/tss/go-tss.git
$ cd go-tss
$ git checkout b87ff91ddc3cdc1f05143c9ce36501e0abb48f70
$ patch -p1 < ../go-tss.diff
$ cd ..
$ git clone https://gitlab.com/thorchain/tss/tss-lib.git
$ cd tss-lib
$ git checkout e1fed6a07f266d96b5c3d33b9ae29a9adef46edc
$ patch -p1 < ../tss-lib.diff
$ cd ..
Run the test TestSignMessage in go-tss/keysign/keysign_test.go:
$ cd go-tss
$ go mod download
$ cd keysign
$ go test -c gitlab.com/thorchain/tss/go-tss/keysign -o keysign.test
# wait 90s (KeySignTimeout)
$ ./keysign.test -test.v -test.paniconexit0 -check.f '^TestSignMessage$' -check.vv
We see that all the malicious party accusations are against our victim.
The below section is from the report we sent to the THORChain team.
There are two cases - whether the set of operators that join the signing process is given, or not.
1. all participants that have a key share join in signing process
In the signing process, the attacker can cause a Denial of Service by sending an invalid proof while impersonating others in Round 1 of the TSS signing process. Recall that in Threshold ECDSA with Identifiable Abort (https://eprint.iacr.org/2021/060), the participants can identify those who caused the signing to fail. By impersonating others, the attacker can not only cause a DoS, but also do so while placing the blame on an innocent node operator. This means that the other operators may lose income as well. Overall, this is a very critical result.
2. only some participants (in req.SignerPubKeys) join in the signing process
In this case, the attacker-controlled node must be in req.SignerPubKeys to perform the attack, so we can only temporarily delay the signing process: eventually req.SignerPubKeys will not contain the attacker-controlled node.
However, the other operators will still be blamed for the signing failure, which may lead to loss of income. Also, if the attacker controls a reasonable percentage of nodes (still much less than 1/3), they may still be able to cause a DoS. Let’s crunch the numbers.
The threshold is 2/3 of the number of participants, so the system would need at least 62 nodes to join the signing process.
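The figure of 62 can be reproduced with a one-line calculation. The sketch below assumes a total of 92 nodes, which is the total implied by the binomial coefficients used in this article, and assumes the requirement is simply 2/3 of the participants rounded up. The signersNeeded helper is hypothetical and is not the function used in go-tss.

```go
package main

import "fmt"

// signersNeeded returns ceil(2n/3) using integer arithmetic only.
// It is an illustrative helper, not the go-tss threshold function.
func signersNeeded(n int) int {
	return (2*n + 2) / 3
}

func main() {
	total := 92 // assumed node count, inferred from binom(92, 62)
	fmt.Printf("%d of %d nodes must join the signing\n", signersNeeded(total), total)
}
```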
Therefore, if the attacker had just 3 nodes, the probability that the node set doesn’t contain any attacker-controlled node is just binom(89, 62) / binom(92, 62), which is around 3%.
This already slows down the system by a factor of about 30.
If the attacker controlled 10 nodes, the probability would be on the scale of 10^-6.
This would be enough to be considered a full DoS, achieved with roughly 10% of the nodes.
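These probabilities can be checked with exact integer arithmetic. The sketch below assumes the signer set is drawn uniformly at random from 92 nodes, which is the model behind the binomial ratio above. The probNoAttacker helper is hypothetical. For 3 attacker nodes the ratio simplifies to (30 · 29 · 28) / (92 · 91 · 90), about 0.032.

```go
package main

import (
	"fmt"
	"math/big"
)

// probNoAttacker returns C(total-attackers, signers) / C(total, signers):
// the chance that a uniformly random signer set of the given size contains
// no attacker-controlled node.
func probNoAttacker(total, signers, attackers int64) float64 {
	if signers > total || total-attackers < signers {
		return 0 // no attacker-free signer set of that size exists
	}
	clean := new(big.Int).Binomial(total-attackers, signers)
	all := new(big.Int).Binomial(total, signers)
	p, _ := new(big.Rat).SetFrac(clean, all).Float64()
	return p
}

func main() {
	fmt.Printf("3 attacker nodes:  %.4f\n", probNoAttacker(92, 62, 3))
	fmt.Printf("10 attacker nodes: %.2e\n", probNoAttacker(92, 62, 10))
}
```

The reciprocal of the first value gives the expected number of signing attempts before an attacker-free set is drawn, which is where the slowdown of roughly 30 times comes from.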
Once again, note that it’s impossible to find out the real culprit due to the nature of this vulnerability.
This was regarded as a low severity bug, and we received 2K RUNE for this. We believed that the “impossible to find the real attacker” part was very dangerous, but the team asserted that the $1M minimum bond made the attack hard enough. The bug is now patched, and we are safe.
KALOS is a flagship service of HAECHI LABS, the leader of the global blockchain industry. We bring together the best Web2 and Web3 experts. Security Researchers with expertise in cryptography, leaders of the global best hacker team, and blockchain/smart contract experts are responsible for securing your Web3 service.
We have secured over $60b worth of crypto assets across 400+ global crypto projects — L1/L2 projects, defi protocols, P2E games, and bridges — notably 1inch, SushiSwap, Badger DAO, SuperRare, Klaytn and Chainsafe. KALOS is the only blockchain technology company selected for the Samsung Electronics Startup Incubation Program in recognition of our expertise. We have also received technology grants from the Ethereum Foundation and Ethereum Community Fund.
Secure your smart contracts with KALOS.