Fault tolerance is an important property for large scale computational systems, where geographically distributed nodes co-operate to execute tasks. Due to the number of components and their diversity, nodes, networks, disks and applications frequently fail, restart, disappear and behave unexpectedly. As the number of Cloud system components increases, the probability of failures becomes higher than that in a traditional parallel or distributed computing environment. Hence, support for the development of fault tolerant applications has been identified as one of the major technical challenges to address for the successful deployment of computational. Since the failure of resources affects job execution fatally, fault tolerance service is essential to satisfy QoS requirement in Cloud Computing. Commonly utilized techniques for providing fault tolerance are job checkpointing, load balancing and replication.To study this problem, we proposed a dynamic colored graph for representing a Cloud infrastructure.
There is currently no content classified with this term.