Low-Latency Trading Infrastructure 101:
Latency Limits of Software - The PCIe Bus
An idealised (zero-latency) engine
As a thought experiment, imagine an ideal trading engine – one that responds with an order immediately upon receipt of market data.
The engine is idealised to the point that its latency is zero. What is the lowest possible wire-to-wire, tick-to-trade latency achievable?
Tracing the path the market data takes from the network to the central processing unit (CPU) answers this.
Network Interface Card transmission time
Electrical (or, over fibre, optical) signals representing each bit of the market data arrive from the end of the wire (or fibre-optic cable) at the network interface card (NIC). The time it takes for all of a packet's bits to arrive is known as the transmission time. For a 120-byte packet this is around 96 nanoseconds at a 10 Gbit/s (10G) line rate: 960 bits divided by 10 Gbit/s.
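The transmission-time figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name is hypothetical, and it counts only the bytes it is given, so any additional on-wire overhead is left out.

```python
def transmission_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Time for every bit of a packet to be clocked across the wire, in ns."""
    bits = packet_bytes * 8
    # Multiply before dividing to keep the floating-point result tidy.
    return bits * 1e9 / line_rate_bps


if __name__ == "__main__":
    # 120-byte packet at a 10 Gbit/s line rate, as in the text.
    print(f"{transmission_time_ns(120, 10e9):.1f} ns")
```

Because transmission time scales linearly with packet size, halving the packet or doubling the line rate halves this component of the latency.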
The NIC processes these bits as they arrive to determine whether they form part of an Ethernet frame, and de-scrambles the wire format into usable data for the frame.
These de-scrambled bytes are then verified against the frame's checksum, and the frame is discarded if the check fails. Frames that pass are finally put on the Peripheral Component Interconnect Express (PCIe) bus so they can be stored in memory and read by the CPU.
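As a rough illustration of the checksum step, the sketch below appends and verifies a CRC-32 trailer in software, in the style of the Ethernet frame check sequence. A real NIC does this in hardware as the bits stream in. The helper names are hypothetical, and the least-significant-byte-first ordering of the trailer is an assumption of this sketch.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the frame bytes."""
    crc = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    # Assumption: the CRC is appended least-significant byte first.
    return frame_without_fcs + struct.pack("<I", crc)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the trailer."""
    if len(frame) < 4:
        return False
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer


if __name__ == "__main__":
    good = append_fcs(b"market data update" * 4)
    # Flip a single bit to simulate corruption on the wire.
    bad = bytes([good[0] ^ 0x01]) + good[1:]
    print(fcs_ok(good), fcs_ok(bad))
```

A frame that fails the check is dropped at the NIC and never reaches the PCIe bus, so the CPU only sees data that passed verification.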
Sending data across the PCIe bus
The PCIe bus connects the server CPU to peripheral hardware, including any screens, keyboards, and importantly for a server, disks and the NIC.
PCIe requires data to be divided into packets and encoded, and the resulting transaction to be processed. The packets then pass through flow-control logic, arbitration, and scheduling before they are physically transmitted.
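The packetisation step can be pictured with a small sketch that splits a buffer into PCIe-style packets and totals the bytes that would cross the link. Both constants are illustrative assumptions, not figures from this article or from any particular device: the maximum payload size is negotiated per system, and the per-packet overhead depends on the PCIe generation and header format. Flow control, arbitration, and scheduling are not modelled.

```python
from dataclasses import dataclass
from typing import List

# Illustrative assumptions only; real values vary by system and PCIe generation.
MAX_PAYLOAD_BYTES = 256    # assumed maximum payload per packet
PACKET_OVERHEAD_BYTES = 24  # assumed header + framing + CRC bytes per packet


@dataclass
class PciePacket:
    offset: int     # position of this chunk within the original buffer
    payload: bytes  # the chunk of data carried by this packet


def split_into_packets(data: bytes,
                       max_payload: int = MAX_PAYLOAD_BYTES) -> List[PciePacket]:
    """Divide a buffer into chunks no larger than max_payload bytes."""
    return [PciePacket(off, data[off:off + max_payload])
            for off in range(0, len(data), max_payload)]


def bytes_on_link(packets: List[PciePacket],
                  overhead: int = PACKET_OVERHEAD_BYTES) -> int:
    """Total bytes transmitted: every packet pays the fixed overhead."""
    return sum(len(p.payload) + overhead for p in packets)


if __name__ == "__main__":
    packets = split_into_packets(bytes(120))
    print(len(packets), bytes_on_link(packets))
```

Under these assumptions a 120-byte market data packet fits in a single PCIe packet, so the fixed per-packet overhead is a noticeable fraction of what crosses the bus.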
To the CPU and back
After crossing the PCIe bus, the market data is available to the CPU, showing what has changed in the order book. In an idealised engine, zero time passes before data is sent back to the NIC for transmission, following the same path in reverse.
Beyond a thought experiment: the real-world latency cost
In real-world trading, it takes time for market data to be processed by the NIC, cross the PCIe bus, be read by the CPU, and travel back via the PCIe bus and NIC to be transmitted.
A simple way to measure this latency is to run a ping-pong test on a properly set-up server and compare timestamps.
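The idea of a ping-pong test can be sketched in software as follows. This is only an illustration over the loopback interface using operating-system sockets and software timestamps; it does not touch a physical NIC or wire, so its numbers are not comparable to a hardware measurement. The function names, round count, and 120-byte payload are assumptions chosen for the example.

```python
import socket
import statistics
import threading
import time


def run_echo_server(sock: socket.socket, rounds: int) -> None:
    """Send each received datagram straight back (the "pong")."""
    for _ in range(rounds):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_half_rtt_ns(rounds: int = 1000, payload: bytes = b"x" * 120) -> float:
    """Median half round-trip time over UDP loopback, in nanoseconds."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)
    thread = threading.Thread(target=run_echo_server,
                              args=(server, rounds), daemon=True)
    thread.start()
    samples = []
    try:
        for _ in range(rounds):
            t_send = time.perf_counter_ns()   # timestamp before the "ping"
            client.sendto(payload, server.getsockname())
            client.recvfrom(2048)
            t_recv = time.perf_counter_ns()   # timestamp after the "pong"
            samples.append((t_recv - t_send) / 2)
    finally:
        thread.join(timeout=1.0)
        client.close()
        server.close()
    return statistics.median(samples)


if __name__ == "__main__":
    print(f"median half RTT: {measure_half_rtt_ns():.0f} ns")
```

Taking the median rather than the mean keeps occasional scheduler hiccups from dominating the result. A production measurement would typically rely on timestamps taken as close to the wire as possible rather than in application code.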
Cisco (formerly Exablaze) optimised a server containing a Cisco Nexus K3P-S NIC and recorded a one-half round-trip time (½ RTT) latency of 750 nanoseconds.