Algorithms of fault-tolerant resources management of geographically distributed computer systems
Abstract
About the Authors
A. Y. Polyakov
Russian Federation
O. V. Moldovanova
Russian Federation
A. A. Paznikov
Russian Federation
M. G. Kernosov
Russian Federation
S. N. Mamoilenko
Russian Federation
A. V. Efimov
Russian Federation
References
1. Khoroshevsky V.G. Distributed computer systems with programmable structure // Vestnik SibGUTI. 2010. No. 2 (10). P. 3-41. (In Russ.)
2. Torque Resource Manager [Electronic resource]. Available at: http://www.adaptivecomputing.com/products/open-source/torque/ (accessed 03.09.2014).
3. Yoo A., Jette M., Grondona M. SLURM: Simple Linux Utility for Resource Management // Job Scheduling Strategies for Parallel Processing. Lecture Notes in Computer Science. Vol. 2862. Springer-Verlag, 2003. P. 44-60.
4. Huedo E., Montero R.S., Llorente I.M. A framework for adaptive execution on grids // Software - Practice and Experience (SPE). 2004. Vol. 34. P. 631-651.
5. Berman F., Wolski R., Casanova H. Adaptive computing on the grid using AppLeS // IEEE Trans. on Parallel and Distributed Systems. 2003. Vol. 14. P. 369-382.
6. Cooper K., Dasgupta A., Kennedy C.K. [et al.] New Grid Scheduling and Rescheduling Methods in the GrADS Project // Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04). 2004. Vol. 34. P. 199-206.
7. Buyya R., Abramson D., Giddy J. Nimrod/G: An architecture for a resource management and scheduling system in a global computational Grid // Proc. of the 4th International Conference on High Performance Computing in Asia-Pacific Region. 2000. P. 283-289.
8. Frey J., Tannenbaum T., Livny M. [et al.] Condor-G: A computation management agent for multi-institutional grids // Cluster Computing. 2001. Vol. 5. P. 237-246.
9. Andreetto P., Borgia S., Dorigo A. Practical approaches to grid workload and resource management in the EGEE project // In CHEP ’04: Proceedings of the Conference on Computing in High Energy and Nuclear Physics. 2004. Vol. 2. P. 899-902.
10. Wijngaards N., Overeinder B., Steen M., Brazier F. Supporting internet-scale multi-agent systems // Data & Knowledge Engineering. 2002. Vol. 41. P. 229-245.
11. Caron E., Garonne V., Tsaregorodtsev A. Evaluation of Meta-scheduler Architectures and Task Assignment Policies for High Throughput Computing // Technical report No. 5576. Institut National de Recherche en Informatique et en Automatique. 2005. 16 p.
12. Deelman E., Singh G., Su M.-H. [et al.] Pegasus: A framework for mapping complex scientific workflows onto distributed systems // Scientific Programming. 2005. Vol. 13(3). P. 219-237.
13. Hull D., Wolstencroft K., Stevens R. [et al.] Taverna: a tool for building and running workflows of services // Nucleic Acids Research. 2006. Vol. 34. P. 729-732.
14. Matthew I., Shields M., Wang I., Philp R. Grid Enabling Applications Using Triana // In Workshop on Grid Applications and Programming Tools. 2003. 11 p.
15. Fahringer T., Prodan R., Duan R. [et al.] ASKALON: A Grid Application Development and Computing Environment // 6th IEEE/ACM International Workshop on Grid Computing. 2005. P. 122-131.
16. Young L., Mcgough S., Newhouse S., Darlington J. Scheduling Architecture and Algorithms within the ICENI Grid Middleware // In UK e-Science All Hands Meeting. 2003. P. 5-12.
17. Altintas I., Berkley C., Jaeger E. [et al.] Kepler: An Extensible System for Design and Execution of Scientific Workflows // International Conference on Scientific and Statistical Database Management. 2004. P. 21-23.
18. Kurnosov M., Paznikov A. Efficiency analysis of decentralized grid scheduling with job migration and replication // ACM International Conference on Ubiquitous Information Management and Communication. 2013. 7 p.
19. Feitelson D.G. [et al.] Theory and practice in parallel job scheduling // Job Scheduling Strategies for Parallel Processing. 1997. Vol. 1291. P. 1-34.
20. Shmueli E., Feitelson D.G. Backfilling with lookahead to optimize the packing of parallel jobs // J. Parallel & Distributed Comput. 2005. Vol. 65. Iss. 9. P. 1090-1107.
21. Cirne W., Grande C., Berman F. When the herd is smart: aggregate behavior in the selection of job request // IEEE Transactions on Parallel and Distributed Systems. 2003. Vol. 14. P. 181-192.
22. Mamoilenko S.N., Efimov A.V. Algorithms for scheduling the solution of scalable tasks on distributed computer systems // Vestnik GOU VPO «SibGUTI». 2010. No. 2. P. 66-78. (In Russ.)
23. Cirne W., Berman F. A model for moldable supercomputer jobs // 15th Intl. Parallel & Distributed Processing Symp. 2001. URL: http://cseweb.ucsd.edu/~walfredo/papers/moldability-model.pdf (accessed 29.09.2014).
24. Elnozahy E.N. [et al.] A survey of rollback-recovery protocols in message-passing systems // ACM Computing Surveys. 2002. Vol. 34. No. 3. P. 375-408.
25. Ansel J., Arya K., Cooperman G. DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop // IEEE International Parallel and Distributed Processing Symposium (IPDPS'09). 2009. 12 p.
26. Polyakov A.Y. On restoring programs from a checkpoint // Vestnik YuUrGU. Series «Mathematical Modelling and Programming». 2010. No. 35(211). P. 91-103. (In Russ.)
For citations:
Polyakov A.Y., Moldovanova O.V., Paznikov A.A., Kernosov M.G., Mamoilenko S.N., Efimov A.V. Algorithms of fault-tolerant resources management of geographically distributed computer systems. The Herald of the Siberian State University of Telecommunications and Information Science. 2014;(4):11-29. (In Russ.)