Jump vs the Speed of Light
Aug 16 · 5 min read
“The speed of light is too slow.”
This was my answer when, toward the end of interviews over a decade ago, Jump Trading’s CIO Albert Saplitski asked, “What do you think is the greatest challenge facing Jump?”
His reaction suggested my answer was atypical. While this answer probably correctly categorized me as “Kevin Bowers - The Blue-Sky, Sci-Fi Guy,” it was based on my experiences creating algorithms, software, hardware, and networks for the largest supercomputers in the world to simulate laser fusion at Los Alamos, do protein origami at D.E. Shaw Research, and model materials processing plasmas at Berkeley, among others.
In pushing supercomputers to their limits, it became blindingly obvious that Einstein’s speed-of-light limit and Shannon’s channel-capacity limit strongly constrain computing. Due to basic physics and information theory, at that time there was a dramatic mismatch between the ability of computers to crunch numbers and their ability to move numbers around. Arithmetic was practically free relative to data motion. To scale up codes, it was orders of magnitude more important to optimize data choreography than to optimize compute. Since then, improvements in chip manufacturing have only widened this mismatch.
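A back-of-envelope "machine balance" calculation illustrates the mismatch. The peak figures below are hypothetical round numbers chosen for illustration, not measurements of any particular machine:

```python
# Hypothetical machine: the peak arithmetic rate and memory bandwidth
# below are assumed round figures, not a specific product's specs.
PEAK_FLOPS = 1e12       # 1 TFLOP/s of arithmetic throughput (assumed)
MEM_BW = 100e9          # 100 GB/s of memory bandwidth (assumed)
BYTES_PER_DOUBLE = 8

# Doubles the memory system can deliver per second at full bandwidth:
doubles_per_s = MEM_BW / BYTES_PER_DOUBLE

# Floating-point operations the chip can execute in the time it takes
# to load a single 8-byte double from memory:
flops_per_double = PEAK_FLOPS / doubles_per_s
print(flops_per_double)  # 80.0
```

On such a machine, any kernel doing fewer than roughly 80 operations per value loaded is bandwidth-bound: the arithmetic units sit idle waiting on data motion, which is why optimizing data choreography pays off far more than optimizing compute.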
The puzzled looks I got when rattling off rules of thumb like “the speed of light is about a foot per nanosecond” to developers from more traditional backgrounds indicated they had been discouraged from even musing about the existence of physical limits. Courses, books, tutorials, languages, operating systems, development tools, and hardware architectures were oblivious at best (and conspired at worst) to make compute look expensive and data motion look cheap. With rare exceptions (e.g., CUDA, and this is a major factor in its success), developers could not easily express spatial locality, data flow, and concurrency needs. Worse, these details were often deliberately hidden, as development environments strove to present the fiction that modern computers were just faster versions of a PC from 1982.
Unsurprisingly, these developers usually achieved results that only scratched the surface of what was physically possible. When they needed more performance, they were encouraged to buy more hardware. When their immediate bottleneck was throughput, this could work for a while. But the ultimate bottleneck is the speed of light.
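The “foot per nanosecond” rule of thumb, and the latency floor it implies for geographically separated systems, can be sketched numerically. The city distance and fiber refractive index below are rough assumed values for illustration only:

```python
C = 299_792_458.0   # speed of light in vacuum, m/s
FOOT = 0.3048       # meters per foot

# Rule of thumb: light travels roughly one foot per nanosecond.
ft_per_ns = C * 1e-9 / FOOT
print(round(ft_per_ns, 2))  # 0.98

# Lower bound on the round-trip time between two venues over optical
# fiber. Light in silica fiber travels at roughly c / n.
N_FIBER = 1.47               # assumed refractive index of silica fiber
distance_m = 1_140_000.0     # assumed ~straight-line New York <-> Chicago

rtt_s = 2 * distance_m * N_FIBER / C
print(round(rtt_s * 1e3, 1))  # ~11.2 milliseconds
```

No amount of additional hardware lowers this floor; real routes are longer than the straight line, so actual round-trip latencies only exceed it.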
It was immediately evident in my interviews that Jump was already bumping up against these limits, struggling with these environments, and uncertain how to move forward. The intense competition between market participants, driven by the mechanics of how exchanges match orders between buyers and sellers, was forcing Jump to evolve.
Over the years, I have helped Jump transition from conventional C++ trading software in a conventional open-source Linux environment running on conventional commodity servers connected by conventional networking and backed by conventional file systems to a highly customized technology stack implemented from transistor and fiber optic up. Though most principles from my prior supercomputing work carried over directly, there were differences, including:
Traditional supercomputers are localized to just one data center.
Jump’s systems work at global scale. Exchanges are scattered throughout the world. Exchange matching policies combined with speed of light limits require a physical presence near each one to be competitive.
Traditional supercomputers tend to be procured all at once, “big-bang” style, resulting in a highly homogeneous technology stack.
Jump’s systems work in a radically heterogeneous world. Different markets have different technical and regulatory requirements for receiving market data and sending orders. Data centers have wildly varying space, power, and cooling restrictions. Communication links between exchanges and data centers have different latencies, bandwidths, reliabilities, costs, physical layers, and network protocols.
Traditional supercomputers tend to be retired and replaced whole at their end of life.
Jump’s systems evolve incrementally. New technologies are introduced, existing technologies keep running, and old uncompetitive technologies are retired - all concurrently. Even if we wanted to temporarily shut down trading in a market to upgrade the technology stack, we are often required by regulation to keep trading.
If a job on a traditional supercomputer takes seconds to achieve full speed, it does not matter if it will be running for hours or more.
Jump’s systems are “prompt” at levels off-the-shelf solutions do not even try to contemplate. Taking microseconds to spin up a machine learning calculation in response to new market data is already orders-of-magnitude too slow to be competitive in many modern markets.
On a traditional supercomputer, a misbehaving job is typically killed to give other jobs a chance to run, then tried again after a postmortem debugging.
Jump’s systems are transparently fault tolerant. At global scale, equipment failures are random but routine. Worse, the competitive nature of markets results in emergent behavior akin to an unending denial-of-service attack by multiple determined sophisticated adversaries. At the same time, misbehaving systems can have severe real-world legal and financial consequences. We need to have a low bug rate to limit system downtime, the ability to fail gracefully to mitigate risks, real-time monitoring to detect errant behaviors, and transparency for operators so they can quickly resolve issues.
The developer and the user are often the same for traditional supercomputers.
Jump’s systems, on the other hand, have been adopted by a broad community. Traders are often not from high performance computing backgrounds but nevertheless need low time-to-market for competitive implementations of their strategies. Time spent retraining as a supercomputing expert is time not trading (i.e., money lost). If it takes years to adopt innovative technology, that technology might as well not exist.
Traditional supercomputers rarely have strict data retention requirements.
Jump’s systems are accountable over long time scales. We might be asked detailed questions many years in the future about what our systems were doing on a particular day and why. This creates logging, reliability, and persistence requirements far beyond those of traditional supercomputers.
Most importantly, Jump cannot buy its way out of technology bottlenecks. Exchange matching rules heavily incentivize being the first to market with the best mutually beneficial deals. Scaling up uncompetitive hardware would leave us in second (and third and …) place, expensively offering best-deal benefits to no one, least of all us.
Most commodity technology vendors are oblivious to these dynamics. Outside of rare niches like quantitative finance and extreme supercomputing, developer cost matters more than machine cost, throughput matters more than latency - if performance is a concern at all - and problems often have an embarrassingly parallel flavor.
Unsurprisingly, technology vendors have been optimizing the ease of scaling up hardware with little concern for the impact on latency.
Jump’s approach was, and had to be, different. We were already limited by the speed of light when I joined. The only option to stay competitive was to develop our own technologies.
Today, Jump’s systems for production trading and quantitative research run at the limits of physics and information theory at planetary scale. This includes custom networking links, network protocols, network switching technology, network interfaces, ASICs, FPGAs, time synchronization, lock-free algorithms, file systems, … - all designed with real-world physical limits in mind.
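As one hedged illustration of the kind of algorithm listed above, a single-producer single-consumer (SPSC) ring buffer passes data between two threads using only index updates, with no locks on the hot path. This is a simplified sketch of the general idea, not Jump's implementation; a production version would be written against a real hardware memory model with explicit memory-ordering fences, a detail Python's interpreter hides:

```python
class SpscRing:
    """Single-producer single-consumer ring buffer (illustrative sketch).

    Only the producer writes `head` and only the consumer writes `tail`,
    so with one writer per index no lock is needed. A real lock-free
    implementation must also control memory ordering (e.g., acquire /
    release semantics); treat this purely as an illustration.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.cap = capacity
        self.head = 0  # next slot the producer will fill
        self.tail = 0  # next slot the consumer will drain

    def push(self, item):  # called from the producer thread only
        if self.head - self.tail == self.cap:
            return False               # full: report, never block
        self.buf[self.head % self.cap] = item
        self.head += 1                 # publish only after the write
        return True

    def pop(self):  # called from the consumer thread only
        if self.tail == self.head:
            return None                # empty
        item = self.buf[self.tail % self.cap]
        self.tail += 1                 # release the slot
        return item
```

Bounded, non-blocking structures like this keep latency predictable under load: a full queue is reported immediately rather than stalling the hot path, which matters when the system must stay prompt at microsecond scales.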
In short, Jump is as much a technology firm as a trading firm. And much of the tech is at the edge of science fiction.
What does this have to do with Solana? Everything.
Solana needs an independent interoperable second validator to support its community's long-term needs.
This validator needs to work at global scale in a radically heterogeneous world.
It needs to evolve incrementally from the existing validator.
It needs to be transparently fault tolerant.
It needs to be easily adoptable by a broad community.
In short, Solana is facing the same problems Jump has been facing. And while the latency requirements for this validator are not as stringent today as they are in quantitative finance, ideally the validator would position Solana for even greater long-term impact by using the latest algorithms, software, hardware, and networking technologies to increase transaction throughput and drive the cost per transaction as low as possible.
We have named this second validator Firedancer. We are developing it in the open and welcome both feedback and participation from the Solana community. Further, we are tailoring the development process to enable others to produce additional independent interoperable validators.
Some might look at this project and think it impossible. It is ambitious, with many potential pitfalls but even more far-reaching potential benefits. It is not out of our reach, and we have never backed down from a challenge. Even if that challenge is the speed of light itself.
We believe we can do this.
Because we have done this before.
Chief Scientist Dr. Bowers leads a high performance computing and networking research team at Jump. He has also done award-winning research at D.E. Shaw, Los Alamos, Bell Labs, and Berkeley, among others.