Journal issue: 2020.1
Article title: Shared memory based MPI broadcast algorithm for NUMA systems
The broadcast collective communication operation is used by many scientific applications and tends to limit the overall scalability of parallel applications. As the number of cores per compute node keeps increasing, it becomes important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimization of the Bcast operation for SMP/NUMA nodes. We describe a Bcast algorithm that takes advantage of NUMA-specific placement of message-transfer queues in memory. On Xeon Nehalem and Xeon Broadwell NUMA nodes, our implementation achieves an average speedup of 20–60% over Open MPI and MVAPICH.
M. Kurnosov, E. Tokmasheva
Bcast, broadcast, MPI, NUMA, collective communication, parallel programming, high-performance computing