Handling of Failures - Coordinator Failure
▪ If the coordinator fails while the commit protocol for T is executing, then the participating sites must decide on T's fate:
  1. If an active site contains a <commit T> record in its log, then T must be committed.
  2. If an active site contains an <abort T> record in its log, then T must be aborted.
  3. If some active participating site does not contain a <ready T> record in its log, then the failed coordinator Ci cannot have decided to commit T. The active sites can therefore abort T.
  4. If none of the above cases holds, then all active sites must have a <ready T> record in their logs, but no additional control records (such as <abort T> or <commit T>). In this case the active sites must wait for Ci to recover to find out the decision.
▪ Blocking problem: active sites may have to wait for the failed coordinator to recover.
Database System Concepts - 7th Edition 23.12 ©Silberschatz, Korth and Sudarshan
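The four rules for coordinator failure can be sketched as a single decision function. This is a minimal illustration, not a full protocol implementation: the names `Outcome` and `decide_fate`, and the representation of each site's log as a set of record kinds for T, are assumptions made for the example.

```python
from enum import Enum


class Outcome(Enum):
    COMMIT = "commit"
    ABORT = "abort"
    BLOCK = "block"  # must wait for the coordinator to recover


def decide_fate(active_site_logs):
    """Decide T's fate from the logs of the active sites.

    active_site_logs maps a site name to the set of record kinds that
    the site's log holds for T: any of 'ready', 'commit', 'abort'.
    (Hypothetical representation, chosen only for this sketch.)
    """
    logs = list(active_site_logs.values())
    # Rule 1: some active site logged <commit T>.
    if any("commit" in log for log in logs):
        return Outcome.COMMIT
    # Rule 2: some active site logged <abort T>.
    if any("abort" in log for log in logs):
        return Outcome.ABORT
    # Rule 3: some active site never logged <ready T>, so the
    # coordinator cannot have decided to commit.
    if any("ready" not in log for log in logs):
        return Outcome.ABORT
    # Rule 4: every active site has only <ready T>; the decision is unknown.
    return Outcome.BLOCK


if __name__ == "__main__":
    print(decide_fate({"S1": {"ready"}, "S2": {"ready"}}))
```

The rules are checked in the order the slide lists them, so the blocking case is reached only when no active site can rule out either outcome.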
Handling of Failures - Network Partition
▪ If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol.
▪ If the coordinator and its participants belong to several partitions:
  • Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol to deal with failure of the coordinator.
    ▪ No harm results, but sites may still have to wait for the decision from the coordinator.
  • The coordinator and the sites that are in the same partition as the coordinator think that the sites in the other partitions have failed, and follow the usual commit protocol.
    ▪ Again, no harm results.
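The two partition cases can be sketched as a small classification of sites. The function name `partition_actions` and the two action labels are hypothetical, introduced only to make the case split concrete.

```python
def partition_actions(partitions, coordinator):
    """Map each site to the procedure it follows after a partition.

    partitions is a list of sets of site names; coordinator is the
    name of the coordinator site. (Hypothetical representation.)
    """
    actions = {}
    for part in partitions:
        if coordinator in part:
            # These sites still reach the coordinator; sites elsewhere
            # look failed to them, and the usual protocol continues.
            label = "usual_commit_protocol"
        else:
            # These sites cannot reach the coordinator and treat it
            # as failed.
            label = "coordinator_failure_procedure"
        for site in part:
            actions[site] = label
    return actions


if __name__ == "__main__":
    print(partition_actions([{"C", "S1"}, {"S2", "S3"}], "C"))
```

Sites labelled `coordinator_failure_procedure` may still end up blocked, which matches the remark that they may have to wait for the coordinator's decision.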
Recovery and Concurrency Control
▪ In-doubt transactions have a <ready T> log record, but neither a <commit T> nor an <abort T> log record.
▪ The recovering site must determine the commit-abort status of such transactions by contacting other sites; this can slow and potentially block recovery.
▪ Recovery algorithms can note lock information in the log.
  • Instead of <ready T>, write out <ready T, L>, where L is the list of locks held by T when the log record is written (read locks can be omitted).
  • For every in-doubt transaction T, all the locks noted in the <ready T, L> log record are reacquired.
▪ After lock reacquisition, transaction processing can resume; the commit or rollback of in-doubt transactions is performed concurrently with the execution of new transactions.
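Lock reacquisition from <ready T, L> records can be sketched as a single pass over the log. The tuple-based log format and the name `reacquire_in_doubt_locks` are assumptions for illustration; a real recovery manager works on its own log structures.

```python
def reacquire_in_doubt_locks(log):
    """Return a lock table holding the locks of in-doubt transactions.

    log is a list of tuples in log order (hypothetical format):
      ('ready', txn, [items])  for <ready T, L>
      ('commit', txn)          for <commit T>
      ('abort', txn)           for <abort T>
    The result maps each locked data item to the in-doubt transaction
    that holds its (write) lock.
    """
    ready_locks = {}   # txn -> lock list L from its <ready T, L> record
    resolved = set()   # transactions with a commit or abort record
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "ready":
            ready_locks[txn] = list(record[2])
        elif kind in ("commit", "abort"):
            resolved.add(txn)

    lock_table = {}
    for txn, items in ready_locks.items():
        if txn not in resolved:          # in-doubt: ready, no outcome
            for item in items:
                lock_table[item] = txn   # reacquire the noted lock
    return lock_table


if __name__ == "__main__":
    log = [("ready", "T1", ["a", "b"]), ("commit", "T1"),
           ("ready", "T2", ["c"])]
    print(reacquire_in_doubt_locks(log))
```

Once this table is rebuilt, new transactions can run against every item not listed in it while the fate of the in-doubt transactions is resolved in the background.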
Avoiding Blocking During Consensus
▪ The blocking problem of 2PC is a serious concern.
▪ Idea: involve multiple nodes in the decision process, so that failure of a few nodes does not cause blocking as long as a majority do not fail.
▪ More general form: the distributed consensus problem
  • A set of n nodes need to agree on a decision.
  • Inputs to make the decision are provided to all the nodes, and then each node votes on the decision.
  • The decision should be made in such a way that all nodes will "learn" the same value for the decision, even if some nodes fail during the execution of the protocol or there are network partitions.
  • Further, the distributed consensus protocol should not block, as long as a majority of the participating nodes remain alive and can communicate with each other.
▪ Several consensus protocols exist; Paxos and Raft are popular.
  • More later in this chapter.
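The non-blocking condition is a simple majority test, sketched below. The helper names `majority` and `can_make_progress` are hypothetical; the test captures only the liveness condition stated above, not a consensus protocol.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n nodes."""
    return n // 2 + 1


def can_make_progress(n, reachable_alive):
    """True if enough nodes are alive and mutually reachable.

    n is the total number of participating nodes; reachable_alive is
    the number that are alive and can communicate with each other.
    """
    return reachable_alive >= majority(n)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), n - majority(n))
```

Under this condition, a group of n nodes tolerates n - majority(n) failures: one failure for n = 3 or n = 4, and two for n = 5.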
Using Consensus to Avoid Blocking
▪ After getting responses from the 2PC participants, the coordinator can initiate a distributed consensus protocol by sending its decision to a set of participants, who then use the consensus protocol to commit the decision.
  • If the coordinator fails before informing all consensus participants:
    ▪ Choose a new coordinator, which follows the 2PC protocol for a failed coordinator.
    ▪ If a commit/abort decision was made, then as long as a majority of consensus participants are accessible, the decision can be found without blocking.
  • If the consensus process fails (e.g., split vote), restart the consensus.
    ▪ A split vote can happen if a coordinator sends its decision to some participants and then fails, and a new coordinator sends a different decision.
▪ The three-phase commit protocol (3PC) is an extension of 2PC which avoids blocking under certain assumptions.
  • Ideas are similar to distributed consensus.
▪ Consensus is also used to ensure consistency of replicas of a data item.
  • Details later in the chapter.
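What "split vote" means can be illustrated by tallying the values that consensus participants have accepted. This is a deliberately simplified sketch: the name `tally` and the dictionary format are assumptions, and real protocols such as Paxos and Raft additionally use ballot or term numbers, which are not modelled here.

```python
from collections import Counter


def tally(accepted, n):
    """Return the decided value, or None if the round must be restarted.

    accepted maps a consensus participant to the value it accepted
    ('commit' or 'abort'); participants that accepted nothing are
    absent. n is the total number of consensus participants.
    A value is decided only if a strict majority of all n accepted it.
    """
    counts = Counter(accepted.values())
    for value, count in counts.items():
        if count > n // 2:
            return value
    return None  # no majority, e.g. a split vote


if __name__ == "__main__":
    # Old coordinator reached P1, P2; new coordinator reached P3, P4.
    split = {"P1": "commit", "P2": "commit", "P3": "abort", "P4": "abort"}
    print(tally(split, 5))
```

In the split case neither value reaches three of five participants, so no decision exists yet and the consensus round is restarted.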