Enhanced Coordinated Checkpointing in Distributed System
|Title||Enhanced Coordinated Checkpointing in Distributed System|
|Publication Type||Journal Article|
|Year of Publication||2015|
|Authors||Meroufel, B, Belalem, G|
|Journal||International Journal of Applied Mathematics and Informatics (IJAMI)|
|Keywords||atomicity, checkpointing, collective I/O, consistency, coordination, data sieving, fault tolerance, I/O, initiator, overhead, rollback|
Coordinated checkpointing is a well-known method for achieving fault tolerance in distributed computing systems. This type of checkpointing selects an initiator to manage the checkpointing process and ensure that it completes. Most existing work ignores the role and importance of this initiator. The work presented in this paper is divided into two parts. In the first part, we examine the impact of the choice of initiator on different types of coordinated checkpointing and show its importance in terms of performance. We also propose a simple and effective strategy for selecting the best initiator in each checkpointing round. In the second part, we focus on soft checkpointing and strengthen the role of the initiator by adding a storage manager that ensures the atomicity and speed of storing checkpoint files using a smart I/O strategy.