AI/ML: Distributed Training
LT3™ eliminates transport bottlenecks in model training and inference, dramatically reducing idle GPU time, increasing network throughput, and shortening overall training time. No hardware or fabric changes required.
Built for collective-scale AI workloads
AI model training clusters depend on fast, synchronized communication between GPUs. But protocols like RoCEv2 struggle with packet loss, flow collisions, and congestion, leading to idle GPUs and slower time-to-train.
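To see why even modest loss hurts, note that a synchronized collective finishes only when its slowest flow does. The illustrative Python sketch below is our simplified model, not BitRipple code; the packet count, loss rate, and retransmission timeout are hypothetical values chosen to show how a 0.1% loss rate can inflate completion time several-fold across 256 ranks.

```python
# Illustrative sketch (not BitRipple code): a synchronized collective is gated
# by its slowest flow, so rare per-flow losses compound across many ranks.
import random

def flow_time(packets=1000, loss_rate=0.001, pkt_time_us=1.0, rto_us=1000.0):
    """One flow's completion time: serialization time plus one RTO per lost packet."""
    losses = sum(1 for _ in range(packets) if random.random() < loss_rate)
    return packets * pkt_time_us + losses * rto_us

def collective_time(ranks=256, **kwargs):
    """A synchronized collective finishes only when its slowest flow finishes."""
    return max(flow_time(**kwargs) for _ in range(ranks))

random.seed(0)
lossless = collective_time(loss_rate=0.0)
lossy = collective_time(loss_rate=0.001)
print(f"no loss  : {lossless:8.0f} us")
print(f"0.1% loss: {lossy:8.0f} us ({lossy / lossless:.1f}x slower)")
```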
Instead of relying on expensive, complex hardware scheduling to avoid contention, LT3™ applies a real-time erasure-coding resiliency protocol at the source. By spraying packets granularly across all available paths and decoding them out of order without retransmissions, LT3™ delivers breakthrough results: lower collective completion time (CCT), higher GPU utilization, and faster model convergence.
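As a conceptual illustration of the idea (not BitRipple's actual protocol or code), the sketch below uses a toy single-parity erasure code: a block of packets plus one parity packet is sprayed across several paths, and the receiver reconstructs the block from whatever arrives, out of order, without requesting a retransmission. A production system would use a far more capable erasure code.

```python
# Toy erasure-coded packet spraying: k data packets + 1 XOR parity packet can
# survive the loss of any one packet with no retransmission.
from functools import reduce
import random

def xor_packets(packets):
    """Bytewise XOR of equal-sized packets."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def encode(block):
    """Append one XOR parity packet to k equal-sized data packets."""
    return block + [xor_packets(block)]

def decode(received, k):
    """Recover the k data packets from any k of the k + 1 coded packets."""
    missing = [i for i in range(k) if i not in received]
    if missing:
        # Exactly one data packet was lost: XOR of everything received restores it.
        received[missing[0]] = xor_packets(list(received.values()))
    return [received[i] for i in range(k)]

# Spray a 4-packet block plus parity across 5 hypothetical paths; lose one path.
k = 4
block = [bytes([i]) * 8 for i in range(k)]
coded = list(enumerate(encode(block)))
random.shuffle(coded)            # packets arrive out of order
lost_idx, _ = coded.pop()        # one path drops its packet
received = dict(coded)
assert decode(received, k) == block   # block recovered, no retransmission
print(f"recovered block despite losing packet {lost_idx}")
```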
Using this approach, we have achieved CCTs within 2% of the theoretical lower bound.
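For context on what "theoretical lower bound" means here, the sketch below computes the standard bandwidth-only bound for an all-reduce, 2(N-1)/N * M/B, for a cluster like the one described in the results that follow; the 1 GiB message size is a hypothetical example, not a BitRipple benchmark parameter.

```python
# Bandwidth-only lower bound on all-reduce completion time: each node must move
# at least 2 * (N - 1) / N * M bytes, so with per-node bandwidth B the
# collective cannot finish faster than 2 * (N - 1) / N * M / B.
def allreduce_cct_lower_bound(nodes: int, msg_bytes: float, nic_gbps: float) -> float:
    """Lower bound on collective completion time, in seconds."""
    bytes_per_sec = nic_gbps * 1e9 / 8
    return 2 * (nodes - 1) / nodes * msg_bytes / bytes_per_sec

# 256 nodes, 200 Gbps NICs, hypothetical 1 GiB gradient all-reduce.
print(f"{allreduce_cct_lower_bound(256, 1 << 30, 200) * 1e3:.1f} ms")  # ~85.6 ms
```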
BitRipple LT3™ slashes idle GPU time and improves throughput by tackling transport issues at the source. In simulated training jobs on 256-node clusters with 200 Gbps NICs:
38.3% lower CCT compared to RoCEv2 + ECMP
more compute, less wait
Performance within 2% of the theoretical minimum
Packet loss doesn’t slow or stall your data.
Lightweight, low-power, ARM-compatible.
Transparent to transfer applications and protocols.
Supports satellite, LTE, mesh, fiber, and fallback handoffs.
LT3™ was built for scale, both intra- and inter-datacenter. It's time to remove the final bottleneck in your AI model training stack.