Tracing NCCL AllReduce on Real GPU Hardware with eBPF
In my previous post, I traced RDMA AllReduce on SoftRoCE using eBPF. SoftRoCE implements RDMA in software on top of the Linux kernel network stack, so eBPF programs could attach to kernel probes and observe the data path directly. This post...