The Herald of the Siberian State University of Telecommunications and Information Science

Barrier Optimization on Asymmetrical NUMA Subsystems

https://doi.org/10.55648/1998-6920-2021-15-1-36-49

Abstract

The MinNumaDist algorithm for selecting the root process of a barrier is proposed. The root process allocates the memory pages for the shared counters and flags from its own NUMA node. The algorithm selects the root so that the total distance from its node to all NUMA nodes is minimal (closeness centrality). MinNumaDist reduces barrier time by 10-35% on asymmetrical NUMA subsystems, that is, when NUMA nodes host different numbers of processes or when a different number of NUMA nodes is used on each socket.
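The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `min_numa_dist_root`, the tie-breaking rule, the restriction of candidates to the nodes in use, and the distance values are all assumptions made for illustration. On Linux, a matrix of this kind can be read from `/sys/devices/system/node/node*/distance` or printed with `numactl --hardware`.

```python
def min_numa_dist_root(dist, used_nodes=None):
    """Pick the NUMA node with the smallest total distance to the nodes in use.

    dist       -- square matrix, dist[i][j] is the distance from node i to node j
    used_nodes -- ids of nodes that host barrier processes (default: all nodes)
    """
    nodes = list(range(len(dist))) if used_nodes is None else list(used_nodes)
    # Total distance from each candidate node to every node in use.
    totals = {c: sum(dist[c][n] for n in nodes) for c in nodes}
    # Assumption: ties go to the lowest node id so the choice is deterministic.
    return min(nodes, key=lambda c: (totals[c], c))


# Hypothetical distance matrix for a four-node system (values are illustrative).
dist = [
    [10, 16, 32, 33],
    [16, 10, 25, 32],
    [32, 25, 10, 16],
    [33, 32, 16, 10],
]

# Asymmetrical case: only nodes 0, 1 and 2 host processes.
root = min_numa_dist_root(dist, used_nodes=[0, 1, 2])
print("root NUMA node:", root)
```

In this sketch, the process elected as root would run on the returned node and allocate the shared counters and flags there. Weighting each distance by the number of processes on the target node would be a natural extension, but the abstract does not say whether the algorithm does this.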

About the Authors

M. .. Kurnosov
Siberian State University of Telecommunications and Information Science; Rzhanov Institute of Semiconductor Physics, Siberian Branch of the Russian Academy of Sciences
Russian Federation


E. .. Tokmasheva
Siberian State University of Telecommunications and Information Science
Russian Federation


For citations:


Kurnosov M..., Tokmasheva E... Barrier Optimization on Asymmetrical NUMA Subsystems. The Herald of the Siberian State University of Telecommunications and Information Science. 2021;(1):36-49. (In Russ.) https://doi.org/10.55648/1998-6920-2021-15-1-36-49

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6920 (Print)