千家信息网

怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm

发表于:2024-09-22 作者:千家信息网编辑
千家信息网最后更新 2024年09月22日,这篇文章主要讲解了"怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm",文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢
千家信息网最后更新 2024年09月22日怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm

这篇文章主要讲解了"怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm",文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢深入,一起来研究和学习"怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm"吧!

一、The Deadlock Detection Algorithm

The Deadlock Detection Algorithm--------------------------------死锁检测算法Since we allow user transactions to request locks in any order, deadlockis possible.  We use a deadlock detection/breaking algorithm that isfairly standard in essence, but there are many special considerationsneeded to deal with Postgres' generalized locking model.PG允许事务以乱序的方式申请锁,因此会出现死锁的可能.PG使用的检测算法相当标准,但为了适配PG的锁模型因此有不少特别的考虑.A key design consideration is that we want to make routine operations(lock grant and release) run quickly when there is no deadlock, andavoid the overhead of deadlock handling as much as possible.  We do thisusing an "optimistic waiting" approach: if a process cannot acquire thelock it wants immediately, it goes to sleep without any deadlock check.But it also sets a delay timer, with a delay of DeadlockTimeoutmilliseconds (typically set to one second).  If the delay expires beforethe process is granted the lock it wants, it runs the deadlockdetection/breaking code. Normally this code will determine that there isno deadlock condition, and then the process will go back to sleep andwait quietly until it is granted the lock.  But if a deadlock conditiondoes exist, it will be resolved, usually by aborting the detectingprocess' transaction.  In this way, we avoid deadlock handling overheadwhenever the wait time for a lock is less than DeadlockTimeout, whilenot imposing an unreasonable delay of detection when there is an error.一个设计上考虑的关键点是在没有死锁的情况下可以让锁授予和释放可以执行得更快.通过使用一种"乐观等待"机制来实现:如果进程不能马上获取请求的锁,则马上休眠而不执行任何死锁检测.但同时设置了延迟时钟,延迟时长为DeadlockTimeout毫秒(典型值是1s).如果在进程被授予锁前延迟过期,则执行死锁检测和中断代码.通常来说,这些代码会确定是否存在死锁条件,然后进程会重新休眠并等待直至可以获取锁.但如果死锁条件确实存在,那需解决此问题,通过来说会回滚执行检测的事务.通过这种方法,避免锁等待时间小于DeadlockTimeout时的死锁处理开销,而在出现错误时则不执行不合理的延迟检测.Lock acquisition (routines LockAcquire and ProcSleep) follows these rules:锁获取(LockAcquire和ProcSleep子过程)遵循以下原则:1. A lock request is granted immediately if it does not conflict withany existing or waiting lock request, or if the process already holds aninstance of the same lock type (eg, there's no penalty to acquire a readlock twice).  Note that a process never conflicts with itself, eg onecan obtain read lock when one already holds exclusive lock.1.对于如无冲突或者进程已持有相同类型的锁的锁请求(如获取两次读锁),则马上授予锁.注意进程永远不会与自己冲突,比如已获取独占锁的情况下可以获取读锁.2. Otherwise the process joins the lock's wait queue.  Normally it willbe added to the end of the queue, but there is an exception: if theprocess already holds locks on this same lockable object that conflictwith the request of any pending waiter, then the process will beinserted in the wait queue just ahead of the first such waiter.  (If wedid not make this check, the deadlock detection code would adjust thequeue order to resolve the conflict, but it's relatively cheap to makethe check in ProcSleep and avoid a deadlock timeout delay in this case.)Note special case when inserting before the end of the queue: if theprocess's request does not conflict with any existing lock nor anywaiting request before its insertion point, then go ahead and grant thelock without waiting.2.否则,进程会加入到锁等待队列.通常来说,会加入到队列的末尾,但存在例外情况:如果进程已在对象上持有锁但与其他等待者的请求存在冲突,进程会插入到队列中这些waiter的前面.(如果不执行该检查,死锁检测代码会调整队列顺序以解决冲突,但在ProcSleep中执行该检查成本相对会低一点,同时在这种情况下可以避免一次死锁检测超时延迟)注意插入时在队列末尾的特殊情况:如果进程在插入点前的请求与现存锁或其他请求不存在冲突,则放在队列的最前面同时无需等待马上授予锁.When a lock is released, the lock release routine (ProcLockWakeup) scansthe lock object's wait queue.  Each waiter is awoken if (a) its requestdoes not conflict with already-granted locks, and (b) its request doesnot conflict with the requests of prior un-wakable waiters.  Rule (b)ensures that conflicting requests are granted in order of arrival. Thereare cases where a later waiter must be allowed to go in front ofconflicting earlier waiters to avoid deadlock, but it is notProcLockWakeup's responsibility to recognize these cases; instead, thedeadlock detection code will re-order the wait queue when necessary.锁释放时,ProcLockWakeup会扫描锁定对象的等待队列.唤醒满足(a)锁请求与现存锁不存在冲突(b)请求与先前未唤醒的waiter不存在冲突 这两个条件的waiter.规则(b)确保存在冲突的请求必须按到达的先后顺序授予.避免在允许后来者可在出现冲突的waiter前可能出现的死锁,但这不是ProcLockWakeup的职责,相反,死锁检测代码会在需要时重组等待队列.To perform deadlock checking, we use the standard method of viewing thevarious processes as nodes in a directed graph (the waits-for graph orWFG).  There is a graph edge leading from process A to process B if Awaits for B, ie, A is waiting for some lock and B holds a conflictinglock.  There is a deadlock condition if and only if the WFG contains acycle.  We detect cycles by searching outward along waits-for edges tosee if we return to our starting point.  There are three possibleoutcomes:为了执行死锁检测,使用标准方法将各个进程视为有向图(WFG)中的节点.图中,如果A进程等待B,那么会有存在一条从A指向B的边.当且仅当如出现循环时,则会出现死锁.沿着等待的边进行检索看看是否会回到出发点来判断是否出现循环,这里有3种可能的情况:1. All outgoing paths terminate at a running process (which has nooutgoing edge).1. 所有出发的路径都终止于一个正在运行的进程(而没有从该进程出发的边).2. A deadlock is detected by looping back to the start point.  Weresolve such a deadlock by canceling the start point's lock request andreporting an error in that transaction, which normally leads totransaction abort and release of that transaction's held locks.  Notethat it's sufficient to cancel one request to remove the cycle; we don'tneed to kill all the transactions involved.2. 通过判断是否回到开始点来进行死锁检测.通过取消开始点的锁请求并在该事务反馈一个错误来解决死锁,该事务会取消并释放持有的锁.注意:取消一个锁就可以消除循环了,而不需要杀掉所有相关的事务.3. Some path(s) loop back to a node other than the start point.  Thisindicates a deadlock, but one that does not involve our startingprocess. We ignore this condition on the grounds that resolving such adeadlock is the responsibility of the processes involved --- killing ourstart-point process would not resolve the deadlock.  So, cases 1 and 3both report "no deadlock".3. 某些路径回到某些节点而不是开始点.这意味着死锁,但不涉及到开始进程.基于由相关进程来解决死锁这一原则,PG会忽略此条件(杀掉开始点进程无助于解决死锁).因此,第1种和第3种情况会反馈"no deadlock".Postgres' situation is a little more complex than the standard discussionof deadlock detection, for two reasons:PG的情况比起标准的死锁检查有一点点的复杂,有2点原因:1. A process can be waiting for more than one other process, since theremight be multiple PROCLOCKs of (non-conflicting) lock types that allconflict with the waiter's request.  This creates no real difficultyhowever; we simply need to be prepared to trace more than one outgoingedge.1. 进程可等待超过1个其他进程,因为存在多个与等待请求相冲突的锁类型相应的PROCLOCKs.这不会造成实际的困难,仅需要跟踪多个出发的边即可.2. If a process A is behind a process B in some lock's wait queue, andtheir requested locks conflict, then we must say that A waits for B, sinceProcLockWakeup will never awaken A before B.  This creates additionaledges in the WFG.  We call these "soft" edges, as opposed to the "hard"edges induced by locks already held.  Note that if B already holds anylocks conflicting with A's request, then their relationship is a hard edgenot a soft edge.2. 如果进程A在等待队列中在B进程之后,而它们的请求锁冲突,这时候我们会认为A等待B,因为ProcLockWakeup在B之前不会唤醒A.这会在WFG中产生额外的我们称之为soft(相对于实际已持有的锁而言)的边.注意如果B已持有所有与A请求冲突的锁,那么它们的关系是硬边而不是软边.A "soft" block, or wait-priority block, has the same potential forinducing deadlock as a hard block.  However, we may be able to resolvea soft block without aborting the transactions involved: we can insteadrearrange the order of the wait queue.  This rearrangement reverses thedirection of the soft edge between two processes with conflicting requestswhose queue order is reversed.  If we can find a rearrangement thateliminates a cycle without creating new ones, then we can avoid an abort.Checking for such possible rearrangements is the trickiest part of thealgorithm."soft"阻塞或者等待优先级阻塞,与硬阻塞的处理一样.但是,我们可以不需要取消事务而解决软阻塞:重新排列等待队列的顺序即可.重排会调转相关进程的优先顺序.如果能够重排而解决循环,那么可以避免取消事务.检查是否存在这样的重排是算法中最棘手的部分.The workhorse of the deadlock detector is a routine FindLockCycle() whichis given a starting point process (which must be a waiting process).It recursively scans outward across waits-for edges as discussed above.If it finds no cycle involving the start point, it returns "false".(As discussed above, we can ignore cycles not involving the start point.)When such a cycle is found, FindLockCycle() returns "true", and as itunwinds it also builds a list of any "soft" edges involved in the cycle.If the resulting list is empty then there is a hard deadlock and theconfiguration cannot succeed.  However, if the list is not empty, thenreversing any one of the listed edges through wait-queue rearrangementwill eliminate that cycle.  Since such a reversal might create cycleselsewhere, we may need to try every possibility.  Therefore, we need tobe able to invoke FindLockCycle() on hypothetical configurations (waitorders) as well as the current real order.死锁检测器主要有例程FindLockCycle实现,该例程输入为起始点过程(正在等待的进程).如上所述,递归扫描指向到其他进程的边.如果从开始点没有发现循环,返回"false".(正如上述所讨论的,忽略与开始点无关的循环).如果发现了存在循环,FindLockCycle例程会返回"true",在展开(unwinds)时,它还构建了一个包含涉及所有软边的链表.如果结果链表为空,那么只存在硬死锁,重排不会成功.但是,如果链表非空,递归重排链表中的边检查是否可以消除循环.由于这样的重排可能会在其他地方产生新的循环,因此需要尝试各种可能.因此,我们需要具备在假设配置和实际顺序调用FindLockCycle的能力.The easiest way to handle this seems to be to have a lookaside table thatshows the proposed new queue order for each wait queue that we areconsidering rearranging.  This table is checked by FindLockCycle, and itbelieves the proposed queue order rather than the real order for each lockthat has an entry in the lookaside table.看起来最简单的方法是使用一个后备表,该表显示了我们正在考虑队列重排的每个建议的新顺序.该表通过FindLockCycle检测,并且该例程相信建议的队列顺序而不是实际顺序.We build a proposed new queue order by doing a "topological sort" of theexisting entries.  Each soft edge that we are currently consideringreversing creates a property of the partial order that the topological sorthas to enforce.  We must use a sort method that preserves the inputordering as much as possible, so as not to gratuitously break arrivalorder for processes not involved in a deadlock.  (This is not true of thetsort method shown in Knuth, for example, but it's easily done by a simpledoubly-nested-loop method that emits the first legal candidate at eachstep.  Fortunately, we don't need a highly efficient sort algorithm, sincethe number of partial order constraints is not likely to be large.)  Notethat failure of the topological sort tells us we have conflicting orderingconstraints, and therefore that the last-added soft edge reversalconflicts with a prior edge reversal.  We need to detect this case toavoid an infinite loop in the case where no possible rearrangement willwork: otherwise, we might try a reversal, find that it still leads toa cycle, then try to un-reverse the reversal while trying to get rid ofthat cycle, etc etc.  Topological sort failure tells us the un-reversalis not a legitimate move in this context.对每一存在的条目使用"topological sort"创建可能的新队列顺序.我们目前考虑反转的每个软边都创建了拓扑排序必须强制执行的偏序属性.我们必须使用一种尽可能保留输入顺序的排序方法,这样就不会无缘无故破坏不涉及死锁进程的到达顺序.注意拓扑排序如果失败则意味着存在冲突排序约束,因此最好添加的软边反转与先前的反转存在冲突.需要检测这种情况以避免出现死循环.可以试着反转,如果发现它仍然会导致循环,那么再反转,试图摆脱那个循环,等等.拓扑排序失败意味着在这种情况下反向操作是不合法的.So, the basic step in our rearrangement method is to take a list ofsoft edges in a cycle (as returned by FindLockCycle()) and successivelytry the reversal of each one as a topological-sort constraint added towhatever constraints we are already considering.  We recursively searchthrough all such sets of constraints to see if any one eliminates allthe deadlock cycles at once.  Although this might seem impossiblyinefficient, it shouldn't be a big problem in practice, because therewill normally be very few, and not very large, deadlock cycles --- ifany at all.  So the combinatorial inefficiency isn't going to hurt us.Besides, it's better to spend some time to guarantee that we've checkedall possible escape routes than to abort a transaction when we didn'treally have to.因此,重排方法的基础步骤是获取循环中的软边链表(FindLockCycle返回),依次尝试将每个约束的反转作为拓扑排序约束添加到已经考虑的其他约束中.递归检索所有这样的约束集合来看看是否存在可以消除循环的可能.虽然这看来不太可能很有效,但在实践中也没有什么问题,因为死锁循环的数量通常很小.因此这样的组合并不会有什么问题.话又说回来,最好花点时间来保证已检测所有可能的路径而不是什么都不做就取消事务.Each edge reversal constraint can be viewed as requesting that the waitingprocess A be moved to before the blocking process B in the wait queue theyare both in.  This action will reverse the desired soft edge, as well asany other soft edges between A and other processes it is advanced over.No other edges will be affected (note this is actually a constraint on ourtopological sort method to not re-order the queue more than necessary.)Therefore, we can be sure we have not created any new deadlock cycles ifneither FindLockCycle(A) nor FindLockCycle(B) discovers any cycle.  Giventhe above-defined behavior of FindLockCycle, each of these searches isnecessary as well as sufficient, since FindLockCycle starting at theoriginal start point will not complain about cycles that include A or Bbut not the original start point.每个边的反转约束都可以看做是请求等待进程A移到等待队列的阻塞进程B的前面.这样的做法会反转有向软边,现对于在A和其他进程之间的其他软边,它是advanced over的.没有其他边受影响,因此可以确保不会出现新的死锁循环.根据以上定义的FindLockCycle行为,这些搜索都是必要的,也是充分的,因为从起始点开始的FindLockCycle不会认为包含A或B但不包含初始起始点的循环有问题.In short then, a proposed rearrangement of the wait queue(s) is determinedby one or more broken soft edges A->B, fully specified by the output oftopological sorts of each wait queue involved, and then tested by invokingFindLockCycle() starting at the original start point as well as each ofthe mentioned processes (A's and B's).  If none of the tests detect acycle, then we have a valid configuration and can implement it byreordering the wait queues per the sort outputs (and then applyingProcLockWakeup on each reordered queue, in case a waiter has become wakable).If any test detects a soft cycle, we can try to resolve it by adding eachsoft link in that cycle, in turn, to the proposed rearrangement list.This is repeated recursively until we either find a workable rearrangementor determine that none exists.  In the latter case, the outer levelresolves the deadlock by aborting the original start-point transaction.简短的来说,等待队列的重排通过打破一个或多个A->B这样的软边来确定,由所涉及的每个等待队列的拓扑排序的输出完全指定,然后通过调用FindLockCycle进行测试,该函数从原始的起始点以及上面提到的每个进程(A&B)开始.如果没有一个测试检测到循环,那么我们有了一个有效的配置,通过重排每个重新排序的队列来实现这一点.如果测试发现了软循环,可以尝试通过将该循环中的每个软链接依次添加到建议的重排链表中来解决.递归处理直至找到了可用的重排或者确定重排不存在.在后续的情况中,外层通过取消开始点事务来解决死锁.The particular order in which rearrangements are tried depends on theorder FindLockCycle() happens to scan in, so if there are multipleworkable rearrangements of the wait queues, then it is unspecified whichone will be chosen.  What's more important is that we guarantee to tryevery queue rearrangement that could lead to success.  (For example,if we have A before B before C and the needed order constraints areC before A and B before C, we would first discover that A before Cdoesn't work and try the rearrangement C before A before B.  This wouldeventually lead to the discovery of the additional constraint B before C.)尝试重排的特定顺序取决于FindLockCycle碰巧扫描进来的顺序,因此如果存在多个等待队列上的重排,那么选择哪一个就不确定.更重要的是我们确保尝试每个队列重排可能会成功.(比如,如果A在B和C的前面,B在C的前面,排序约束是C在A之前和B在C之前,首先会发现A在C之前是不行的,这时候会重排C在A和B之前.这会导致额外的约束B在C之前)

感谢各位的阅读,以上就是"怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm"的内容了,经过本文的学习后,相信大家对怎么理解PostgreSQL Locks中的The Deadlock Detection Algorithm这一问题有了更深刻的体会,具体使用情况还需要大家实践验证。这里是,小编将为大家推送更多相关知识点的文章,欢迎关注!

0