Thursday, March 19, 2009

Software is Low Latency's Weakest Link...Unless You're Running Red Hat Enterprise MRG

There's an article in today's Wall Street & Technology, "Software is Low Latency's Weakest Link." The article states:

The weakest link in the low-latency value chain is older software or poorly written code, not market data feeds, lack of ultra-fast processors or older networks, according to experts at Wall Street & Technology's Accelerating Wall Street 2009 conference.
Furthermore,
"Within the applications, we see the greatest opportunity to improve latency," said Rob Wallos, global head of market data architecture at Citi during the session The Complete Low Latency Value Chain. "Hardware can give you a generic 20 percent improvement in performance, but there is only so far you can go with hardware."
This is certainly true in general. We have found that even moving up to exotic hardware like InfiniBand provides only incremental gains in performance unless you optimize your application to take advantage of that hardware. That's why we've been working with hardware manufacturers and optimizing Red Hat Enterprise MRG for the latest hardware as well as for Linux.


For example, MRG includes RDMA drivers to achieve extremely low latency on InfiniBand. Compare the results of MRG Messaging running on InfiniBand with standard TCP versus with our RDMA drivers:


This chart illustrates latency using 1K messages at a sustained throughput of 50,000 messages/second using MRG Messaging. This is for fully reliable application-level latency between three peers: a producer sends a message through a broker to a client, and that message is acknowledged back. So, there are four network hops in this case. As you can see, with standard TCP (the red line) the latency is nothing ground-breaking at around 1 millisecond, even though we're using InfiniBand. In other words, moving the application to InfiniBand doesn't automatically improve latency significantly.

With our RDMA drivers, however, MRG can achieve latencies an order of magnitude better--about 70 microseconds on the same hardware. And, remember, this is for 1K message sizes and 4 network hops! So, yes, it is true that in general, low latency bottlenecks are in software. However, we have been developing MRG to take full advantage of modern hardware and also Linux, so we've pushed the bottlenecks back to the hardware domain in many cases.
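As a back-of-the-envelope check on these figures, here is a small Python sketch. It assumes the end-to-end latency is split evenly across the four hops, which is only an illustration: the numbers above are end-to-end measurements, and per-hop cost was not measured separately.

```python
# Rough arithmetic on the figures quoted in this post.
# Assumption: latency divides evenly across hops (not measured per hop).

HOPS = 4
TCP_LATENCY_US = 1000.0   # "around 1 millisecond" with standard TCP
RDMA_LATENCY_US = 70.0    # "about 70 microseconds" with the RDMA drivers
MESSAGE_RATE = 50_000     # sustained messages per second


def per_hop(latency_us, hops=HOPS):
    """Average latency per network hop, in microseconds."""
    return latency_us / hops


# How many times lower the RDMA latency is than the TCP latency.
speedup = TCP_LATENCY_US / RDMA_LATENCY_US

# Time between consecutive sends at the sustained rate, in microseconds.
send_interval_us = 1_000_000 / MESSAGE_RATE

print(f"TCP per hop:   {per_hop(TCP_LATENCY_US):.1f} us")
print(f"RDMA per hop:  {per_hop(RDMA_LATENCY_US):.1f} us")
print(f"Speedup:       {speedup:.1f}x")
print(f"Send interval: {send_interval_us:.1f} us")
```

Under that even-split assumption, the TCP path averages about 250 microseconds per hop and the RDMA path about 17.5, which is where the "order of magnitude" comes from.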

I should also note that our next release of MRG will include a cross-memory driver for co-locating our messaging broker with an application client. This would remove one of the network links from the latency path and should allow us to halve our latency.
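To see why removing one link could halve latency, here is a hypothetical Python model of the message path. The peer names, the even per-hop split, and the idea that a cross-memory transfer costs nothing are all assumptions for illustration. None of this reflects MRG internals or measured results.

```python
# Hypothetical model of the message path described in this post.
# Assumptions: every network hop costs the same, and a hop between
# co-located peers costs nothing. Neither has been measured.

NETWORK_HOPS = [
    ("producer", "broker", "msg"),
    ("broker", "producer", "ack"),
    ("broker", "consumer", "msg"),
    ("consumer", "broker", "ack"),
]


def remaining_network_hops(hops, colocated):
    """Keep only the hops that still cross the network.

    A hop is dropped when both of its endpoints are in `colocated`.
    """
    return [h for h in hops if not {h[0], h[1]} <= colocated]


def estimated_latency_us(hops, per_hop_us):
    """Estimated end-to-end latency under the equal-cost assumption."""
    return len(hops) * per_hop_us


PER_HOP_US = 70.0 / len(NETWORK_HOPS)  # from the ~70 us RDMA figure

# Co-locate the broker with the consuming client.
after = remaining_network_hops(NETWORK_HOPS, {"broker", "consumer"})

print(len(NETWORK_HOPS), "hops before,", len(after), "hops after")
print("estimated latency:", estimated_latency_us(after, PER_HOP_US), "us")
```

One network link carries two hops (the message and its acknowledgment), so in this model co-locating the broker with one client takes the path from four hops to two.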

6 comments:

Unknown said...

InfiniBand is "exotic hardware?" Go tell Larry Ellison that his HP Oracle Database Machine and Exadata Storage server (all running InfiniBand) is exotic hardware...let me know what he says...

Rmush said...
This comment has been removed by the author.
aRman said...

Could you please help me understand how you are counting the "4 hops" while measuring latency? Are these just the application servers, or does the path between the application server and the network switch count as well?

Bryan Che said...

These are the 4 network hops:

producer --msg--> broker
producer <--ack-- broker

broker --msg--> consumer
broker <--ack-- consumer

Judy Misbin-May said...

I think that Network is your hot spot. Most top traders are within microseconds in the algo models - that leaves them to the one who gets there fastest. Check out www.customfibernetworks.com to learn more about the importance of the network in the low latency race. See the White Paper
