Beware of scams impersonating Jump Trading Group. We only communicate through our official accounts.
- Firedancer
- Thinking
- Connect
Statistical Attacks on Proof of Solvency
Nihar Shah
TL;DR
- For proof of solvency mechanisms to prevent an exchange from misappropriating consumer deposits, consumers must check that their deposits are included in the exchange's reported list of deposits.
- In theory, only a handful of random checks on the exchange are needed to keep an exchange honest — but this fails in practice for two reasons.
- An exchange can likely predict which consumers will check, and an exchange can also likely suppress a handful of failed checks — which means it can weaken or undermine the probabilistic security that proof of solvency offers.
- Thus, exchanges and users should be thoughtful about the mechanism for users to launch such checks and to raise potential issues to restore such guarantees.
Introduction
In the wake of FTX's collapse, many prominent crypto exchanges have announced plans around "proof of solvency." Indeed, exchanges as important and diverse as Binance, BitMEX, ByBit, Gate.io, Kraken, OKX, and more have already rolled out infrastructure to support such checks.[1] Through some combination of professional and user-powered audits, exchanges use proof of solvency to attest that they hold the funds that users custody with them.
But, proof of solvency is not perfect. The crypto community already understands some of the flaws. From a verifiability perspective, exchanges may not control the on-chain addresses that they claim. From a financial perspective, proof of solvency does not guarantee actual corporate solvency, as exchanges hold other assets and liabilities on their balance sheet (ranging from the mundane, like honoring salary obligations, to the complex, like pledging assets as collateral). From a technical perspective, proof of solvency is not necessarily plug-and-play and requires care in selecting the appropriate approach (discussed by Chalkias, Chatzigiannis, and Ji among others). These are ongoing problems that the community is working to solve.
This article explores a less-discussed problem: the proof of solvency mechanism risks being undermined by a malicious exchange in statistical ways. Critically, if exchanges can predict future attestations or sow doubt on failed attestations, they can successfully misappropriate consumer funds. The strong probability guarantees behind proof of solvency in theory are remarkably brittle in practice.
There are a few solutions that a well-intentioned exchange can implement to ease this problem, but the solutions ultimately require care and future development. Indeed, most simple solutions that fix one part of the problem (and come with good optics on face value) actually worsen other parts of the problem — and a malicious exchange can use this to its advantage. It is ultimately up to the crypto community to hold exchanges to account on the thoughtfulness of their roadmap. Well-intentioned exchanges will surely welcome those standards.
Proof of Solvency's Strength, in Theory
There is a substantial literature on proof of solvency from a technical perspective (e.g. a primer by Vitalik), and so we will not rehash it here. Our focus is on the "proof of liabilities" component of the process, in which an exchange reports the sum total of customer deposits. (This is then balanced by the exchange proving the sum total of their reserves, for an on-chain representation of solvency).[2]
To operationalize and attest to this, exchanges typically pick one (or both) of two ways. They can present an anonymized list of deposits to an external auditor, or they can publish an anonymized tree of deposits (as a Merkle or Verkle tree). These have different trust assumptions and technical specifications, but they are equivalent for our purposes.
This system works as long as users check their deposits and ensure that they are included in the total sum. A malicious exchange may try to exclude or under-report deposits for a subset of customers, and then misappropriate those funds. Neither an auditor nor a tree will intrinsically know that Alice held 5 BTC with an exchange, unless Alice herself checks her inclusion. Thus, at face value, a successful check on the exchange holistically requires each and every user to check his or her deposit's inclusion.
And of course, most users will not check. The median crypto user does not know how to interact with a Merkle tree, which involves some technical complexity. The barrier to entry is lower for checking inclusion in an auditor's list, but it is still non-trivial. And more generally, we should expect fewer users to check than hoped for, due to the externality problem. A user who checks their deposits is providing a diffuse benefit to the community by keeping the exchange honest, and pays a concentrated cost (their time) to do so.[3]
But in theory, this does not matter. The math behind proof of liabilities appears wonderfully robust at first glance, and it works well with just a few good citizens policing the exchange. The probability that an exchange gets away with some form of misappropriating deposits is just the probability that every one of those depositors does not check. By contrast, in theory, an exchange can be caught if just a single misappropriated depositor checks on his or her deposit. To state this formally, if we denote the probability of a random user checking as \(\pi\), this simplifies to the following expression for an exchange that misappropriates \(n\) deposits.
\[P(\text{Not Caught}) = \prod_{i=1}^n P(\text{Depositor $i$ doesn't check}) = (1 - \pi)^n\]
For instance, if an exchange misappropriates the deposits of just sixty-nine users and each user checks with a 1% probability, it has a 50-50 chance of getting caught versus getting away with its actions. By the time an exchange targets even five-hundred users, it has basically a zero chance of getting away with its actions. For exchanges, which have user numbers in the millions, these are miniscule targets and not worth the risk.
Proof of Solvency's Brittleness, in Practice
As seductive as this framework is, it is remarkably less effective in practice. There are two critical issues:
- An exchange may be able to predict which customers are likely to check their deposits. In the simplest form, an exchange may use some basic demographics to distinguish crypto-savvy users (who are comfortable with downloading Merkle trees and verifying inclusion) and crypto-naive users (who are not).
- A single failed attestation is unlikely to trigger immediate action, for two reasons. First and foremost, there will be some false positive rate due to user error. Lists and trees represent snapshots of deposits in time — and users may misinterpret the output. Second, a single user likely will not generate enough reach to alert the community, particularly if the exchange quickly remedies or obfuscates the issue (e.g. by misappropriating a different user's deposit).
Thus, we augment the model above in a few critical ways. First, we partition the space of customers into two types: one that is sufficiently savvy to check their balances (denoted by \(\theta_1\)) and one that is not (denoted by \(\theta_2\)).[4] The first type checks on their deposit with some probability \(\pi\), while the second type never checks.
Second, a malicious exchange actively tries to predict which consumers are which type. The exchange only wants to misappropriate the deposits of the second type (i.e. the non-checking), as they can do so with zero risk if successful.[5] To represent this knowledge, we introduce the parameter \(\phi \in [0,1]\), which an exchange uses to get an updated probability of a given targeted consumer's type.
\[P(\theta_1 | \phi) = P(\theta_1) (1 - \phi)~~~\text{and}~~~P(\theta_2 | \phi) = P(\theta_2) (1 - \phi) + \phi\]
Notice that when \(\phi = 0\), the exchange has no predictive capabilities, and so their conditional guess of a given consumer's type is just the baseline probability of the two types. When \(\phi = 1\), the exchange knows the consumer's type perfectly and thus can guarantee they are only misappropriating the deposit of the second (non-checking) type.
Third, the exchange only gets caught if more than a certain number of people have failed attestations. A single failed attestation may not be enough to land the exchange in trouble, and so we generalize the parameter to requiring \(j\) failed attestations.
With these three complications, we can update the probability of the exchange getting away from misappropriating \(n\) deposits as the following equation. A full derivation can be found in the appendix.
\[P(\text{Not Caught}) = \ \sum_{k = 0}^{j - 1} {n \choose k} \left[\pi P(\theta_1) (1 - \phi)\right]^k \left[(1 - \pi) P(\theta_1) (1 - \phi) + P(\theta_2) (1 - \phi) + \phi\right]^{n - k}\]
This changes the math radically. To illustrate, we parameterize the equation and compute the 90-10 point, i.e. the number of depositors that an exchange could misappropriate from to face a 90% chance of getting away and only a 10% chance of getting caught (informally, a good chance of getting away). In this simulation, we set \(\pi = 0.2\) and \(P(\theta_1) = 0.05\), i.e. the savvy population makes up 5% of the total population and checks 20% of the time, for an aggregate 1% of the population checking. We try a variety of values for \(\phi\) and \(j\) and compute the following values for the number \(n\) consumers to target.
N for 90% Escape | \(j = 1\) | \(j = 10\) | \(j = 20\) | \(j = 50\) |
---|---|---|---|---|
\(\phi = 0.00\) | 10 | 624 | 1,455 | 4,122 |
\(\phi = 0.50\) | 21 | 1,246 | 2,907 | 8,240 |
\(\phi = 0.90\) | 105 | 6,223 | 14,527 | 41,183 |
\(\phi = 0.99\) | 1,054 | 62,214 | 145,255 | 411,795 |
We can visualize these same results below, focusing on \(\phi = 0\) and \(\phi = 0.99\) (corresponding to no knowledge versus high knowledge, on the part of the exchange) across a wider range of \(j\) values (the number of failed attestations before an exchange is caught). We plot the number of consumers needed for this 90% threshold on a logarithmic scale.
The results are striking, in that they show the strong theoretical guarantees around the proof of solvency mechanism fail quickly in practice. In the strongest version of the theoretical model, an exchange can only misappropriate ten accounts before facing a non-trivial chance of getting caught. By contrast, in the most extreme cases in practice, the exchange can misappropriate nearly half a million accounts for that same risk, if they do well predicting types and squashing intermittent reports. While we do not claim the plausibility of this particular parameterization, this table shows that either component — predictability of checkers or critical mass of failed attestations — can weaken any theoretical security offered by proof of solvency. Together, they can render it functionally useless.
Risks in the Status Quo
The concerns in this article may seem academic, but in fact exchanges are currently well-positioned to weaken proof of solvency in both ways. The existing designs around proof of solvency lend themselves to unusually strong forms of predictability and opaque forms of adjudicating failed attestations.
Consider first the predictability of the checking type. Exchanges could certainly use demographic information, as suggested earlier, for predictive abilities. But they actually can do something much better. Exchanges know exactly the subset of people likely to check: the users who navigate over to the portion of the exchange's native website or application to download the tree, check their proof, or get their address within the tree. Regardless of where the tree or list is ultimately stored, a user's initial interactions with proof of solvency always start on the exchange's platform, which is a highly revealing signal — and one that is trivial for the exchange to collect.
Indeed, a malicious exchange could take this to the extreme. Consider a simple design in which the exchange takes some time (e.g. twenty-four hours) to send a Merkle tree after a depositor requests an updated copy. Over that lag, the exchange re-tags deposits from non-checking customers to checking customers, regenerates the tree, and pushes it forward. If operationalized correctly, an exchange could misappropriate almost every deposit on its platform and yet "pass" every proof of solvency check.[6]
Even with trees that are readily downloadable, there is an attack surface. In particular, an exchange can still collect signals about users and weaponize those signals over long time scales. For instance, an exchange may (reasonably) figure that an account that has not checked on its deposit since inception is unlikely to check in the next month — and may misappropriate those tokens accordingly.
Second, consider the adjudication mechanism. The status quo implementations by exchanges generally assume that the proof of solvency checks succeed. If a check fails, there are often no official mechanisms to escalate or verify, leaving users to publicize it on Twitter or other social channels. This is problematic for two reasons. First, users will have naturally high error rates, as deposits are reported at a given point in time and users may forget about trades (i.e. changes to their deposits) made since then. The lack of an official adjudication strategy makes it hard to correct these false positives, which obfuscates true positives. Second, a lone voice, or handful of voices arguing on Twitter, can easily be mistaken for FUD. Again, a malicious exchange could easily lean into this narrative, critiquing such users as engagement farmers and convincing their userbases to ignore them.
Note that many exchanges currently work with external auditors, which helps these two issues slightly. First, auditors offer a separate surface for users to check their inclusion in a list or tree (although those users will likely still need to interact with the original exchange to learn how to do so). Second, auditors may also be better equipped to adjudicate concerns from users who find a failed check. But, auditors do not entirely solve either problem, particularly when the checking and adjudication workflows are opaque. Moreover, auditors introduce new trust assumptions, as they may be compromised.
No Easy Solutions
There are a variety of potential solutions to stop exchanges from predicting users who check and to process failed attestations reliably, but not all of them are ideal. Indeed, many solutions that improve one problem worsen the second problem, or introduce new problems or complications of their own.
At the very least, though, exchanges should take two direct steps. First, they should move as much of the mechanism off its native platform as possible. Second, they should set up clear adjudication processes for failed attestations. These steps may rely on auditors or other centralized third parties for now, but they at least allow for more transparency and clarity than the current designs in many cases. Off-platform processes allow for obvious signals to be hidden from the exchange. Clear adjudication processes allow for obvious false positives to be removed and the remaining cases to be investigated further.
But, the two problems outlined in this article will persist, and they will require more creative solutions to solve fully. We discuss five hypothetical fixes here. They are not perfect, but they indicate some potential directions for future work:
- Exchanges could hold the hands of users in checking proof of solvency (and many already do). This is good because it increases the baseline rate of users checking, which makes it easier to spot issues. But this is problematic for the two core reasons noted in this article. The more extensive the process, the more signals that an exchange acquires on its users. Moreover, a poorly-written process may confuse crypto-naive users (who would have otherwise not performed checks), and this increases the frequency of false positives.
- Exchanges could offer bounties to users who find correct failed attestations. Again, this is good because it increases the baseline rate of users checking. However, by making incentives overly high-powered, this may strongly increase the false positive rate and overwhelm any adjudicators (particularly if there are no penalties on users for leveling accusations).
- Exchanges could push the total tree or user-specific proofs to all of its users automatically, which almost entirely blocks its ability to predict which users check. However, this will increase the false positive rate, by making the information (which is non-trivial to understand correctly) widely disseminated. This may also scare new users to crypto unnecessarily.
- Exchanges could generate trees or proofs faster and more frequently. This will lower the false positive rate because those lists will more accurately reflect current balances (and so users will be less prone to misinterpreting the results). However, faster generation can be a double-edged sword, because it also makes it easier for exchanges to generate "corrected" proofs and trees after a suspicious consumer starts to investigate.
- Exchanges could ask auditing firms or regulatory bodies to employ "undercover" auditors, i.e. auditors masquerading as ordinary consumers. This can be effective because such individuals are far less prone to false positives and have the ability to investigate an exchange after even one or two failed attestations. They may also be savvier about covering their tracks. However, as a whole, consumers may not trust processes that rely heavily on an arms-length relationship between an exchange and another centralized institution.
In the short run, the best solution likely involves some mix of all these processes. Neither problem introduced in this article can ever be fully eliminated, but the optimal solution will mitigate both issues (around predicting checkers and handling failed attestations) to reasonable degrees.
In the long run, though, the crypto community can truly be creative in the solution space. For instance, the team at Jump Crypto has theorized tools like browser extensions that perform proof of solvency checks automatically in the background. Such hypothetical tools would also keep track of a given account's trades, and thus could automatically pair high verification rates (both in the number of users checking and the frequency of those users checking) with low false positive rates. This is just one potential idea amongst many others, and with time and technology, these problems may recede.[7]
Conclusion
This article is not a critique of exchanges, which are rapidly building up their proof of solvency infrastructures. These are commendable and timely efforts, and we anticipate that these mechanisms will become more commonplace and mature over time.
But, proof of solvency is also no silver bullet. We already understand the gaps that the mechanism leaves when it comes to financial health, technical security, and verifiability. This article adds to the growing set of issues by pointing out gaps that can emerge when the technology behind proof of solvency is embedded within a shoddy mechanism for interacting with users. Malicious exchanges can use this to launch statistical or probabilistic attacks and impair the technology in moderate to severe degrees.
The onus is thus on exchanges to develop robust mechanisms that undermine their predictive signals and process failed attestations. The onus is also on users to hold exchanges to that standard. Indeed, we believe honest exchanges would likely welcome such efforts, as it makes it easier for them to credibly distinguish themselves from malicious ones. With these and parallel efforts, proof of solvency can be the tool that restores trust in exchanges during this crypto winter.
Please let us know if we missed anything or got anything wrong. Thanks to the research team at Jump Crypto, especially to Rahul Maganti and Don Beaver for feedback. This article does not constitute financial advice.
Appendix
In this section, we explain the derivation of the model.
First, we note that the probability of an exchange not getting caught can be modeled through a simple Binomial distribution. If an exchange must fail \(j\) attestations or more to get caught (out of \(n\) consumer deposits misappropriated), then the probability of not getting caught is simply \(j - 1\) or fewer consumers reporting failed attestations, where all terms are conditioned on the knowledge \(\phi\) that the exchange has about a user's type.
\[P(\text{Not Caught}) = \sum_{k = 0}^{j-1} {n \choose k} P(\text{Check} | \phi)^k P(\text{No Check} | \phi)^{n - k}\]
In turn, each of these terms can be decomposed into fundamentals. First, we estimate the probability of checking by conditioning on the two types of users. As a reminder, only the first type of user actually performs checks.
\[P(\text{Check} | \phi) = P(\text{Check} | \theta_1) P(\theta_1 | \phi) + P(\text{Check} | \theta_2) P(\theta_2 | \phi)\]
\[P(\text{Check} | \phi) = \pi P(\theta_1) (1 - \phi)\]
Second, we estimate the probability of not checking by conditioning on the two types of users. Both types of users do not check, although with different probabilities.
\[P(\text{No Check} | \phi) = P(\text{No Check} | \theta_1) P(\theta_1 | \phi) + P(\text{No Check} | \theta_2) P(\theta_2 | \phi)\]
\[P(\text{No Check} | \phi) = (1 - \pi) P(\theta_1) (1 - \phi) + P(\theta_2) (1 - \phi) + \phi\]
These two terms can be substituted into the original equation for the expression we use. Notice that this nests the simple model that does not partition users into two types, by setting \(\phi = 0\).
There is heterogeneity in how these various exchanges operationalize proof of solvency. Those details are not particularly important for the article. In general, though, most exchanges utilize an external auditor to audit deposits and intermediate user interactions, although many exchanges also let users directly access the Merkle tree of deposits. ↩︎
Some of the terms in this domain are overloaded. For instance, Jesse Powell of Kraken has referred to the overall solvency check as "proof of reserves." In this article, we hew to the convention presented in Dagher et al (2015), Chalkias et al (2020), and other academic work — in which proof of reserves refers to assets, proof of liabilities refers to deposits, and proof of solvency refers to the balance of the two. ↩︎
This externality problem would be mitigated if exchanges segmented each user's deposit from other users' deposits. However, exchanges do not do this from a practical perspective. Moreover, even if they did, it is unclear if bankruptcy proceedings would recognize this segmentation as meaningful. ↩︎
This may seem like an assumption, but it is just a re-framing of the problem. Any space can be partitioned into disjoint and exhaustive types, e.g. the world is comprised of Joe Bidens and non-Joe Bidens. ↩︎
In this stylized model, consumers have homogeneous deposit distributions, and so it is always a strictly dominant strategy for an exchange to target a \(\theta_2\) consumer. In practice, there may be heterogeneity in deposit size and that would complicate the targeting decision. As one example, we expect institutional clients — who are both professional and well-capitalized — to be checking their deposits constantly, meaning that an exchange would likely leave their deposits untouched unless they have very sophisticated adversarial strategies. ↩︎
There is an even more alarming attack vector available, which is where an exchange sends different trees to different individuals, allowing them to even bypass the step of regenerating trees. Fortunately, this particular vector can be mitigated as long as exchanges publish the hash of the Merkle root on their platform. ↩︎
One novel solution that has been discussed by Vitalik and others is non-custodial exchanges. But, this represents a complete paradigm shift in the purpose of an exchange, and so we do not consider it as a direct solution to the problems outlined in this article. ↩︎
Share
Stay up to date with the latest from Jump_
More articles
Disclaimer
The information on this website and on the Brick by Brick podcast or Ship Show Twitter spaces is provided for informational, educational, and entertainment purposes only. This information is not intended to be and does not constitute financial advice, investment advice, trading advice, or any other type of advice. You should not make any decision – financial, investment, trading or otherwise – based on any of the information presented here without undertaking your own due diligence and consulting with a financial adviser. Trading, including that of digital assets or cryptocurrency, has potential rewards as well as potential risks involved. Trading may not be suitable for all individuals. Recordings of podcast episodes or Twitter spaces events may be used in the future.