Academic Journal

Fault Tolerance in Distributed Systems: Global States and Checkpointing

Because processors behave autonomously and communication delays are arbitrary, no single processor in a distributed system can capture the complete system state instantaneously. Solving many problems in distributed systems may therefore require gathering the states of processes on different processors together with the states of the communication channels. An algorithm that gathers such information from the whole system is called a global state detection algorithm, and the information it gathers is called a global state. This paper describes the classification of global states, algorithms for detecting global states, and fault-tolerant schemes based on coordinated checkpointing, which establishes useful global states.
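A gathered global state is only useful if it could have occurred in a real execution. The sketch below checks the standard consistent-cut condition: a cut is consistent if every message whose receipt it records was also sent within it, and messages sent but not yet received inside the cut make up the channel state. The `Message` type, the `classify_cut` function, and the representation of a cut as per-process event counts are illustrative assumptions, not the paper's notation; the paper's own classification of global states may use different categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical representation (assumption): each message records the
    # sending process and the 1-based local index of its send event, plus
    # the receiving process and the local index of its receive event.
    sender: int
    send_event: int
    receiver: int
    recv_event: int


def classify_cut(cut, messages):
    """Return (is_consistent, in_transit) for a cut.

    cut[p] is the number of events of process p included in the cut,
    i.e. the cut contains events 1..cut[p] of process p.
    """
    orphans = []
    in_transit = []
    for m in messages:
        sent = m.send_event <= cut[m.sender]
        received = m.recv_event <= cut[m.receiver]
        if received and not sent:
            # Receipt recorded without the matching send: an orphan
            # message, which makes the cut inconsistent.
            orphans.append(m)
        elif sent and not received:
            # Sent but not yet received: part of the channel state.
            in_transit.append(m)
    return (not orphans, in_transit)


# Usage: two processes (0 and 1) exchanging two messages.
msgs = [Message(0, 2, 1, 1), Message(1, 3, 0, 4)]
print(classify_cut([2, 0], msgs)[0])  # True: msgs[0] is in transit
print(classify_cut([1, 1], msgs)[0])  # False: msgs[0] received but not sent
```

Recording in-transit messages alongside the process states is what lets a consistent cut serve as a complete global state, as the abstract's mention of channel states suggests.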

1. Introduction

2. Space Time Model of Distributed Computations

3. The Classification of Global States

4. Global State Detection Algorithms

5. Checkpointing Strategy

6. Requirements for Efficient Checkpointing and Recovery

7. Tightly Coordinated Checkpointing and Recovery

8. Loosely Coordinated Checkpointing and Recovery

9. Conclusion
