Shared memory based MPI broadcast algorithm for NUMA systems
About the Authors
M. .. Kurnosov
Russian Federation
E. .. Tokmasheva
Russian Federation
References
1. Thakur R., Rabenseifner R., Gropp W. Optimization of Collective Communication Operations in MPICH // High Performance Computing Applications. 2005. V. 19 (1). P. 49-66.
2. Sanders P., Speck J., Traff J. L. Two-Tree Algorithms for Full Bandwidth Broadcast, Reduction and Scan // Parallel Computing. 2009. V. 35 (12). P. 581-594.
3. Traff J., Ripke A. Optimal Broadcast for Fully Connected Processor-node Networks // Parallel and Distributed Computing. 2008. V. 68 (7). P. 887-901.
4. Bin Jia. Process Cooperation in Multiple Message Broadcast // Parallel Computing. 2009. V. 35. P. 572-580.
5. Lameter C. NUMA (Non-Uniform Memory Access): An Overview // ACM Queue. 2013. V. 11 (7). P. 1-12.
6. Li S., Hoefler T., Snir M. NUMA-Aware Shared Memory Collective Communication for MPI // Proc. of the 22nd Int. Symposium on High-Performance Parallel and Distributed Computing, 2013. P. 85-96.
7. Wu M., Kendall R., Aluru S. Exploring Collective Communications on a Cluster of SMPs // Proc. of the HPCAsia, 2004. P. 114-117.
8. MVAPICH: MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE // URL: http://mvapich.cse.ohio-state.edu/ (accessed: 12.12.2019).
9. Graham R. L., Shipman G. MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives // Proc. of the 15th European PVM/MPI Users' Group Meeting, 2008. P. 130-140.
10. Jain S., Kaleem R., Balmana M., Langer A., Durnov D., Sannikov A., Garzaran M. Framework for Scalable Intra-Node Collective Operations using Shared Memory // Proc. of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC-2018), 2018. P. 374-385.
For citations:
Kurnosov M..., Tokmasheva E... Shared memory based MPI broadcast algorithm for NUMA systems. The Herald of the Siberian State University of Telecommunications and Information Science. 2020;(1):42-59. (In Russ.)