千家信息网

PostgreSQL 源码解读(33)- 查询语句#18(查询优化-表达式预处理#3)

发表于:2025-01-22 作者:千家信息网编辑
千家信息网最后更新 2025年01月22日,本节简单介绍了PG查询优化表达式预处理中的规范化过程。规范化具体的做法一是忽略NULL以及OR中的False,And中的True(实现函数find_duplicate_ors),二是拉平谓词(实现函数
千家信息网最后更新 2025年01月22日PostgreSQL 源码解读(33)- 查询语句#18(查询优化-表达式预处理#3)

本节简单介绍了PG查询优化表达式预处理中的规范化过程。规范化具体的做法一是忽略NULL以及OR中的False,And中的True(实现函数find_duplicate_ors),二是拉平谓词(实现函数:pull_ors/pull_ands),三是清除重复的ORs(实现函数process_duplicate_ors)。这些函数位于文件src/backend/optimizer/prep/prepqual.c中。

一、布尔代数基础

规范化处理基于布尔/逻辑代数运算的相关基本定律:
幂等律 
A∪A = A
A∩A = A
交换律
A∪B = B∪A
A∩B = B∩A
结合律 
A∪(B∪С) = (A∪B) ∪ С
A∩(B∩С) = (A∩B)∩ С
吸收律 
A∪(A∩B) = A
A∩(A∪B) = A
分配律 
A∪(B∩С)=(A∪B)∩(A∪С)
A∩(B∪С)=(A∩B)∪(A∩С)
幺元律 
0∪A = A
1∩A = A
1∪A = 1
0∩A = 0
补余律 
A∪A' = 1
A∩A' = 0

二、基本概念

PG源码对规范化表达式的注释如下:

 /*  * canonicalize_qual  *    Convert a qualification expression to the most useful form.  *    转换为最常规形式的表达式  *  * This is primarily intended to be used on top-level WHERE (or JOIN/ON)  * clauses.  It can also be used on top-level CHECK constraints, for which  * pass is_check = true.  DO NOT call it on any expression that is not known  * to be one or the other, as it might apply inappropriate simplifications.  * 规范化注意用于最高层的Where/Check(输入参数:is_check = true)语句中,如果不是这类语句,不要使用否则会产生不匹配的简化(PG这里加入了DEBUG信息)  *   * The name of this routine is a holdover from a time when it would try to  * force the expression into canonical AND-of-ORs or OR-of-ANDs form.  * Eventually, we recognized that that had more theoretical purity than  * actual usefulness, and so now the transformation doesn't involve any  * notion of reaching a canonical form.  *  * NOTE: we assume the input has already been through eval_const_expressions  * and therefore possesses AND/OR flatness.  Formerly this function included  * its own flattening logic, but that requires a useless extra pass over the  * tree.  *  * Returns the modified qualification.  */

忽略NULL以及OR中的False/AND中的TRUE
Where条件语句中的NULL/FALSE/TRUE,如能忽略,忽略之.如:NULL OR FALSE OR dwbh = '1001',则忽略NULL/ FALSE

testdb=# explain verbose select * from t_dwxx where NULL OR FALSE OR dwbh = '1001' ;                           QUERY PLAN                           ---------------------------------------------------------------- Seq Scan on public.t_dwxx  (cost=0.00..12.00 rows=1 width=474)   Output: dwmc, dwbh, dwdz   Filter: ((t_dwxx.dwbh)::text = '1001'::text)(3 rows)

拉平谓词
SQL语句中的X1 OR/AND (X2 OR/AND X3),拉平简化为OR/AND(X1,X2,X3)
X1 OR/AND (X2 OR/AND X3),在查询树中为树状结构,第一层节点是BoolExpr,该Node中的args链表有2个元素,args->1=X1,args->2=BoolExpr,args->2->1=X2,args->2->2=X3,组成树状结构.简化后args->1/2/3=X1/X2/X3,所有条件处于同一个层次上,并不是树状结构.

testdb=# explain select * from t_dwxx where dwbh = '1001' OR (dwbh = '1002' OR dwbh = '1003');                                                 QUERY PLAN                                                  ------------------------------------------------------------------------------------------------------------- Seq Scan on t_dwxx  (cost=0.00..12.80 rows=3 width=474)   Filter: (((dwbh)::text = '1001'::text) OR ((dwbh)::text = '1002'::text) OR ((dwbh)::text = '1003'::text))(2 rows)

清除重复ORs
清除重复ORs的数学基础是布尔(逻辑)代数:
(X1 AND X2) OR (X1 AND X3) 应用分配律可以改写为 X1 AND (X2 OR X3),这样改写的目的是把X1抽取出来,为后续下推谓词X1作准备.
如(dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003')条件,会改写为dwbh = '1001' AND (dwbh = '1002' OR dwbh = '1003')

testdb=# explain verbose select * from t_dwxx where (dwbh = '1001' AND dwbh = '1002')  OR (dwbh = '1001' AND dwbh = '1003');                                                             QUERY PLAN                                                              ------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on public.t_dwxx  (cost=0.00..12.80 rows=1 width=474)   Output: dwmc, dwbh, dwdz   Filter: (((t_dwxx.dwbh)::text = '1001'::text) AND (((t_dwxx.dwbh)::text = '1002'::text) OR ((t_dwxx.dwbh)::text = '1003'::text)))(3 rows)

三、源码解读

主函数入口:
subquery_planner

/*输入:    glob-PlannerGlobal    parse-Query结构体指针    parent_root-父PlannerInfo Root节点    hasRecursion-是否递归?    tuple_fraction-扫描Tuple比例输出:    PlannerInfo指针*/ PlannerInfo * subquery_planner(PlannerGlobal *glob, Query *parse,                  PlannerInfo *parent_root,                  bool hasRecursion, double tuple_fraction) {     PlannerInfo *root;//返回值     List       *newWithCheckOptions;//     List       *newHaving;//Having子句     bool        hasOuterJoins;//是否存在Outer Join?     RelOptInfo *final_rel;//     ListCell   *l;//临时变量      /* Create a PlannerInfo data structure for this subquery */     root = makeNode(PlannerInfo);//构造返回值     root->parse = parse;     root->glob = glob;     root->query_level = parent_root ? parent_root->query_level + 1 : 1;     root->parent_root = parent_root;     root->plan_params = NIL;     root->outer_params = NULL;     root->planner_cxt = CurrentMemoryContext;     root->init_plans = NIL;     root->cte_plan_ids = NIL;     root->multiexpr_params = NIL;     root->eq_classes = NIL;     root->append_rel_list = NIL;     root->rowMarks = NIL;     memset(root->upper_rels, 0, sizeof(root->upper_rels));     memset(root->upper_targets, 0, sizeof(root->upper_targets));     root->processed_tlist = NIL;     root->grouping_map = NULL;     root->minmax_aggs = NIL;     root->qual_security_level = 0;     root->inhTargetKind = INHKIND_NONE;     root->hasRecursion = hasRecursion;     if (hasRecursion)         root->wt_param_id = SS_assign_special_param(root);     else         root->wt_param_id = -1;     root->non_recursive_path = NULL;     root->partColsUpdated = false;      /*      * If there is a WITH list, process each WITH query and build an initplan      * SubPlan structure for it.      */     if (parse->cteList)         SS_process_ctes(root);//处理With 语句      /*      * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try      * to transform them into joins.  Note that this step does not descend      * into subqueries; if we pull up any subqueries below, their SubLinks are      * processed just before pulling them up.      */     if (parse->hasSubLinks)         pull_up_sublinks(root); //上拉子链接      /*      * Scan the rangetable for set-returning functions, and inline them if      * possible (producing subqueries that might get pulled up next).      * Recursion issues here are handled in the same way as for SubLinks.      */     inline_set_returning_functions(root);//      /*      * Check to see if any subqueries in the jointree can be merged into this      * query.      */     pull_up_subqueries(root);//上拉子查询      /*      * If this is a simple UNION ALL query, flatten it into an appendrel. We      * do this now because it requires applying pull_up_subqueries to the leaf      * queries of the UNION ALL, which weren't touched above because they      * weren't referenced by the jointree (they will be after we do this).      */     if (parse->setOperations)         flatten_simple_union_all(root);//扁平化处理UNION ALL      /*      * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can      * avoid the expense of doing flatten_join_alias_vars().  Also check for      * outer joins --- if none, we can skip reduce_outer_joins().  And check      * for LATERAL RTEs, too.  This must be done after we have done      * pull_up_subqueries(), of course.      */     //判断RTE中是否存在RTE_JOIN?     root->hasJoinRTEs = false;     root->hasLateralRTEs = false;     hasOuterJoins = false;     foreach(l, parse->rtable)     {         RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);          if (rte->rtekind == RTE_JOIN)         {             root->hasJoinRTEs = true;             if (IS_OUTER_JOIN(rte->jointype))                 hasOuterJoins = true;         }         if (rte->lateral)             root->hasLateralRTEs = true;     }      /*      * Preprocess RowMark information.  We need to do this after subquery      * pullup (so that all non-inherited RTEs are present) and before      * inheritance expansion (so that the info is available for      * expand_inherited_tables to examine and modify).      */      //预处理RowMark信息     preprocess_rowmarks(root);      /*      * Expand any rangetable entries that are inheritance sets into "append      * relations".  This can add entries to the rangetable, but they must be      * plain base relations not joins, so it's OK (and marginally more      * efficient) to do it after checking for join RTEs.  We must do it after      * pulling up subqueries, else we'd fail to handle inherited tables in      * subqueries.      */      //展开继承表     expand_inherited_tables(root);      /*      * Set hasHavingQual to remember if HAVING clause is present.  Needed      * because preprocess_expression will reduce a constant-true condition to      * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.      */     //是否存在Having表达式     root->hasHavingQual = (parse->havingQual != NULL);      /* Clear this flag; might get set in distribute_qual_to_rels */     root->hasPseudoConstantQuals = false;      /*      * Do expression preprocessing on targetlist and quals, as well as other      * random expressions in the querytree.  Note that we do not need to      * handle sort/group expressions explicitly, because they are actually      * part of the targetlist.      */     //预处理表达式:targetList(投影列)     parse->targetList = (List *)         preprocess__expression(root, (Node *) parse->targetList,                               EXPRKIND_TARGET);      /* Constant-folding might have removed all set-returning functions */     if (parse->hasTargetSRFs)         parse->hasTargetSRFs = expression_returns_set((Node *) parse->targetList);      newWithCheckOptions = NIL;     foreach(l, parse->withCheckOptions)//witch Check Options     {         WithCheckOption *wco = lfirst_node(WithCheckOption, l);          wco->qual = preprocess__expression(root, wco->qual,                                           EXPRKIND_QUAL);         if (wco->qual != NULL)             newWithCheckOptions = lappend(newWithCheckOptions, wco);     }     parse->withCheckOptions = newWithCheckOptions;     //返回列信息returningList     parse->returningList = (List *)         preprocess__expression(root, (Node *) parse->returningList,                               EXPRKIND_TARGET);     //预处理条件表达式     preprocess_qual_conditions(root, (Node *) parse->jointree);     //预处理Having表达式     parse->havingQual = preprocess__expression(root, parse->havingQual,                                               EXPRKIND_QUAL);     //窗口函数     foreach(l, parse->windowClause)     {         WindowClause *wc = lfirst_node(WindowClause, l);          /* partitionClause/orderClause are sort/group expressions */         wc->startOffset = preprocess__expression(root, wc->startOffset,                                                 EXPRKIND_LIMIT);         wc->endOffset = preprocess__expression(root, wc->endOffset,                                               EXPRKIND_LIMIT);     }     //Limit子句     parse->limitOffset = preprocess__expression(root, parse->limitOffset,                                                EXPRKIND_LIMIT);     parse->limitCount = preprocess__expression(root, parse->limitCount,                                               EXPRKIND_LIMIT);     //On Conflict子句     if (parse->onConflict)     {         parse->onConflict->arbiterElems = (List *)             preprocess__expression(root,                                   (Node *) parse->onConflict->arbiterElems,                                   EXPRKIND_ARBITER_ELEM);         parse->onConflict->arbiterWhere =             preprocess__expression(root,                                   parse->onConflict->arbiterWhere,                                   EXPRKIND_QUAL);         parse->onConflict->onConflictSet = (List *)             preprocess__expression(root,                                   (Node *) parse->onConflict->onConflictSet,                                   EXPRKIND_TARGET);         parse->onConflict->onConflictWhere =             preprocess__expression(root,                                   parse->onConflict->onConflictWhere,                                   EXPRKIND_QUAL);         /* exclRelTlist contains only Vars, so no preprocessing needed */     }     //集合操作(AppendRelInfo)     root->append_rel_list = (List *)         preprocess__expression(root, (Node *) root->append_rel_list,                               EXPRKIND_APPINFO);     //RTE     /* Also need to preprocess expressions within RTEs */     foreach(l, parse->rtable)     {         RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);         int         kind;         ListCell   *lcsq;          if (rte->rtekind == RTE_RELATION)         {             if (rte->tablesample)                 rte->tablesample = (TableSampleClause *)                     preprocess__expression(root,                                           (Node *) rte->tablesample,                                           EXPRKIND_TABLESAMPLE);//数据表采样语句         }         else if (rte->rtekind == RTE_SUBQUERY)//子查询         {             /*              * We don't want to do all preprocessing yet on the subquery's              * expressions, since that will happen when we plan it.  But if it              * contains any join aliases of our level, those have to get              * expanded now, because planning of the subquery won't do it.              * That's only possible if the subquery is LATERAL.              */             if (rte->lateral && root->hasJoinRTEs)                 rte->subquery = (Query *)                     flatten_join_alias_vars(root, (Node *) rte->subquery);         }         else if (rte->rtekind == RTE_FUNCTION)//函数         {             /* Preprocess the function _expression(s) fully */             kind = rte->lateral ? EXPRKIND_RTFUNC_LATERAL : EXPRKIND_RTFUNC;             rte->functions = (List *)                 preprocess__expression(root, (Node *) rte->functions, kind);         }         else if (rte->rtekind == RTE_TABLEFUNC)//TABLE FUNC         {             /* Preprocess the function _expression(s) fully */             kind = rte->lateral ? EXPRKIND_TABLEFUNC_LATERAL : EXPRKIND_TABLEFUNC;             rte->tablefunc = (TableFunc *)                 preprocess__expression(root, (Node *) rte->tablefunc, kind);         }         else if (rte->rtekind == RTE_VALUES)//VALUES子句         {             /* Preprocess the values lists fully */             kind = rte->lateral ? EXPRKIND_VALUES_LATERAL : EXPRKIND_VALUES;             rte->values_lists = (List *)                 preprocess__expression(root, (Node *) rte->values_lists, kind);         }          /*          * Process each element of the securityQuals list as if it were a          * separate qual _expression (as indeed it is).  We need to do it this          * way to get proper canonicalization of AND/OR structure.  Note that          * this converts each element into an implicit-AND sublist.          */         foreach(lcsq, rte->securityQuals)         {             lfirst(lcsq) = preprocess__expression(root,                                                  (Node *) lfirst(lcsq),                                                  EXPRKIND_QUAL);         }     }      ...//其他          return root; } 

preprocess_expression

 /*  * preprocess_expression  *      Do subquery_planner's preprocessing work for an expression,  *      which can be a targetlist, a WHERE clause (including JOIN/ON  *      conditions), a HAVING clause, or a few other things.  */ static Node * preprocess__expression(PlannerInfo *root, Node *expr, int kind) {     /*      * Fall out quickly if expression is empty.  This occurs often enough to      * be worth checking.  Note that null->null is the correct conversion for      * implicit-AND result format, too.      */     if (expr == NULL)         return NULL;      /*      * If the query has any join RTEs, replace join alias variables with      * base-relation variables.  We must do this first, since any expressions      * we may extract from the joinaliasvars lists have not been preprocessed.      * For example, if we did this after sublink processing, sublinks expanded      * out from join aliases would not get processed.  But we can skip this in      * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since      * they can't contain any Vars of the current query level.      */     if (root->hasJoinRTEs &&         !(kind == EXPRKIND_RTFUNC ||           kind == EXPRKIND_VALUES ||           kind == EXPRKIND_TABLESAMPLE ||           kind == EXPRKIND_TABLEFUNC))         expr = flatten_join_alias_vars(root, expr);//扁平化处理joinaliasvars,上节已介绍      /*      * Simplify constant expressions.      *      * Note: an essential effect of this is to convert named-argument function      * calls to positional notation and insert the current actual values of      * any default arguments for functions.  To ensure that happens, we *must*      * process all expressions here.  Previous PG versions sometimes skipped      * const-simplification if it didn't seem worth the trouble, but we can't      * do that anymore.      *      * Note: this also flattens nested AND and OR expressions into N-argument      * form.  All processing of a qual expression after this point must be      * careful to maintain AND/OR flatness --- that is, do not generate a tree      * with AND directly under AND, nor OR directly under OR.      */     expr = eval_const_expressions(root, expr);//简化常量表达式      /*      * If it's a qual or havingQual, canonicalize it.      */     if (kind == EXPRKIND_QUAL)     {         expr = (Node *) canonicalize_qual((Expr *) expr, false);//表达式规范化  #ifdef OPTIMIZER_DEBUG         printf("After canonicalize_qual()\n");         pprint(expr); #endif     }      /* Expand SubLinks to SubPlans */     if (root->parse->hasSubLinks)//扩展子链接为子计划         expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));      /*      * XXX do not insert anything here unless you have grokked the comments in      * SS_replace_correlation_vars ...      */      /* Replace uplevel vars with Param nodes (this IS possible in VALUES) */     if (root->query_level > 1)         expr = SS_replace_correlation_vars(root, expr);//使用Param节点替换上层的Vars      /*      * If it's a qual or havingQual, convert it to implicit-AND format. (We      * don't want to do this before eval_const_expressions, since the latter      * would be unable to simplify a top-level AND correctly. Also,      * SS_process_sublinks expects explicit-AND format.)      */     if (kind == EXPRKIND_QUAL)         expr = (Node *) make_ands_implicit((Expr *) expr);      return expr; }

canonicalize_qual

 /*  * canonicalize_qual  *    Convert a qualification expression to the most useful form.  *  * This is primarily intended to be used on top-level WHERE (or JOIN/ON)  * clauses.  It can also be used on top-level CHECK constraints, for which  * pass is_check = true.  DO NOT call it on any expression that is not known  * to be one or the other, as it might apply inappropriate simplifications.  *  * The name of this routine is a holdover from a time when it would try to  * force the expression into canonical AND-of-ORs or OR-of-ANDs form.  * Eventually, we recognized that that had more theoretical purity than  * actual usefulness, and so now the transformation doesn't involve any  * notion of reaching a canonical form.  *  * NOTE: we assume the input has already been through eval_const_expressions  * and therefore possesses AND/OR flatness.  Formerly this function included  * its own flattening logic, but that requires a useless extra pass over the  * tree.  *  * Returns the modified qualification.  */ Expr * canonicalize_qual(Expr *qual, bool is_check)//规范化表达式 {     Expr       *newqual;      /* Quick exit for empty qual */     if (qual == NULL)         return NULL;      /* This should not be invoked on quals in implicit-AND format */     Assert(!IsA(qual, List));      /*      * Pull up redundant subclauses in OR-of-AND trees.  We do this only      * within the top-level AND/OR structure; there's no point in looking      * deeper.  Also remove any NULL constants in the top-level structure.      */     newqual = find_duplicate_ors(qual, is_check);//执行实际处理逻辑      return newqual; }

find_duplicate_ors

 /*  * find_duplicate_ors  *    Given a qualification tree with the NOTs pushed down, search for  *    OR clauses to which the inverse OR distributive law might apply.  *    Only the top-level AND/OR structure is searched.  *  * While at it, we remove any NULL constants within the top-level AND/OR  * structure, eg in a WHERE clause, "x OR NULL::boolean" is reduced to "x".  * In general that would change the result, so eval_const_expressions can't  * do it; but at top level of WHERE, we don't need to distinguish between  * FALSE and NULL results, so it's valid to treat NULL::boolean the same  * as FALSE and then simplify AND/OR accordingly.  Conversely, in a top-level  * CHECK constraint, we may treat a NULL the same as TRUE.  *  * Returns the modified qualification.  AND/OR flatness is preserved.  */ static Expr * find_duplicate_ors(Expr *qual, bool is_check) {     if (or_clause((Node *) qual))//OR语句     {         List       *orlist = NIL;         ListCell   *temp;          /* Recurse */         foreach(temp, ((BoolExpr *) qual)->args)//遍历args链表         {             Expr       *arg = (Expr *) lfirst(temp);//获取链表中的元素              arg = find_duplicate_ors(arg, is_check);//递归调用              /* Get rid of any constant inputs */             if (arg && IsA(arg, Const))//arg为常量             {                 Const      *carg = (Const *) arg;                  if (is_check)//Check语句                 {                     /* Within OR in CHECK, drop constant FALSE */                     if (!carg->constisnull && !DatumGetBool(carg->constvalue))                         continue;//不为NULL而且为FALSE,继续循环                     //arg为NULL或者TURE,即NULL OR TRUE,简化为TRUE                     /* Constant TRUE or NULL, so OR reduces to TRUE */                     return (Expr *) makeBoolConst(true, false);                 }                 else//Where条件语句                 {                     /* Within OR in WHERE, drop constant FALSE or NULL */                     if (carg->constisnull || !DatumGetBool(carg->constvalue))                         continue;//arg为NULL或者FALSE,继续循环                     //arg不为NULL而且为TRUE,即TRUE常量,直接返回arg                     /* Constant TRUE, so OR reduces to TRUE */                     return arg;                 }             }             //加入结果链表             orlist = lappend(orlist, arg);         }          /* Flatten any ORs pulled up to just below here */         orlist = pull_ors(orlist);//扁平化ORs          /* Now we can look for duplicate ORs */         return process_duplicate_ors(orlist);//处理重复的ORs     }     else if (and_clause((Node *) qual))//AND语句     {         List       *andlist = NIL;         ListCell   *temp;          /* Recurse */         foreach(temp, ((BoolExpr *) qual)->args)//遍历链表         {             Expr       *arg = (Expr *) lfirst(temp);              arg = find_duplicate_ors(arg, is_check);              /* Get rid of any constant inputs */             if (arg && IsA(arg, Const))             {                 Const      *carg = (Const *) arg;                  if (is_check)                 {                     /* Within AND in CHECK, drop constant TRUE or NULL */                     if (carg->constisnull || DatumGetBool(carg->constvalue))                         continue;//CHECK语句                     /* Constant FALSE, so AND reduces to FALSE */                     //不为空且值为FALSE,返回FALSE                     return arg;                 }                 else                 {                     /* Within AND in WHERE, drop constant TRUE */                     if (!carg->constisnull && DatumGetBool(carg->constvalue))                         continue;                     /* Constant FALSE or NULL, so AND reduces to FALSE */                     //NULL OR FALSE,返回FALSE                     return (Expr *) makeBoolConst(false, false);                 }             }              andlist = lappend(andlist, arg);//加入到结果列表         }          /* Flatten any ANDs introduced just below here */         andlist = pull_ands(andlist);//扁平化处理AND          /* AND of no inputs reduces to TRUE */         if (andlist == NIL)//为空指针             return (Expr *) makeBoolConst(true, false);//返回TRUE          /* Single-expression AND just reduces to that expression */         if (list_length(andlist) == 1)//单个表达式             return (Expr *) linitial(andlist);//返回此表达式链表          /* Else we still need an AND node */         return make_andclause(andlist);//否则返回结果链表     }     else         return qual;//非AND/OR语句,直接返回结果 }

process_duplicate_ors

 /*  * process_duplicate_ors  *    Given a list of exprs which are ORed together, try to apply  *    the inverse OR distributive law.  *  * Returns the resulting _expression (could be an AND clause, an OR  * clause, or maybe even a single subexpression).  */ static Expr * process_duplicate_ors(List *orlist) {     List       *reference = NIL;     int         num_subclauses = 0;     List       *winners;     List       *neworlist;     ListCell   *temp;      /* OR of no inputs reduces to FALSE */     if (orlist == NIL)         return (Expr *) makeBoolConst(false, false);      /* Single-expression OR just reduces to that expression */     if (list_length(orlist) == 1)         return (Expr *) linitial(orlist);      /*      * Choose the shortest AND clause as the reference list --- obviously, any      * subclause not in this clause isn't in all the clauses. If we find a      * clause that's not an AND, we can treat it as a one-element AND clause,      * which necessarily wins as shortest.      */     //遍历OR链表,找到AND语句中约束条件最少的那个表达式     //与求解最小公约数同理,公共的谓词只可能在此最小的表达式中产生      foreach(temp, orlist)     {         Expr       *clause = (Expr *) lfirst(temp);          if (and_clause((Node *) clause))//AND语句         {             List       *subclauses = ((BoolExpr *) clause)->args;             int         nclauses = list_length(subclauses);              if (reference == NIL || nclauses < num_subclauses)             {                 reference = subclauses;                 num_subclauses = nclauses;             }         }         else//单个约束条件或者带有表达式的约束条件,如(X1 AND X2) OR X3等         {             reference = list_make1(clause);             break;         }     }      /*      * Just in case, eliminate any duplicates in the reference list.      */     reference = list_union(NIL, reference);//去掉重复的谓词      /*      * Check each element of the reference list to see if it's in all the OR      * clauses.  Build a new list of winning clauses.      */     winners = NIL;     foreach(temp, reference)//遍历链表     {         Expr       *refclause = (Expr *) lfirst(temp);         bool        win = true;         ListCell   *temp2;          foreach(temp2, orlist)         {             Expr       *clause = (Expr *) lfirst(temp2);              if (and_clause((Node *) clause))//该谓词是否在链表中存在?             {                 if (!list_member(((BoolExpr *) clause)->args, refclause))                 {                     win = false;                     break;                 }             }             else//该谓词是否与单个条件表达式等价?             {                 if (!equal(refclause, clause))                 {                     win = false;                     break;                 }             }         }          if (win)//找到了公共的谓词             winners = lappend(winners, refclause);//加入到结果中     }      /*      * If no winners, we can't transform the OR      */     if (winners == NIL)         return make_orclause(orlist);//如果找到,原样返回      /*      * Generate new OR list consisting of the remaining sub-clauses.      *      * 找到了,产生新的条件      *      * If any clause degenerates to empty, then we have a situation like (A      * AND B) OR (A), which can be reduced to just A --- that is, the      * additional conditions in other arms of the OR are irrelevant.      *      * Note that because we use list_difference, any multiple occurrences of a      * winning clause in an AND sub-clause will be removed automatically.      */     neworlist = NIL;     foreach(temp, orlist)//遍历OR链表     {         Expr       *clause = (Expr *) lfirst(temp);          if (and_clause((Node *) clause))//AND语句         {             List       *subclauses = ((BoolExpr *) clause)->args;//获取条件语句参数              subclauses = list_difference(subclauses, winners);//剔除相同部分             if (subclauses != NIL)//成功剔除,产生新的AND语句             {                 if (list_length(subclauses) == 1)                     neworlist = lappend(neworlist, linitial(subclauses));                 else                     neworlist = lappend(neworlist, make_andclause(subclauses));             }             else             {                 neworlist = NIL;    /* degenerate case, see above */                 break;             }         }         else//不是AND语句         {             if (!list_member(winners, clause))//单个条件语句,直接添加条件                 neworlist = lappend(neworlist, clause);             else             //OR语句?公共部分已提出,无需加入其他条件,直接返回             //根据吸收律,X AND (X OR B) 等价于X             {                 neworlist = NIL;    /* degenerate case, see above */                 break;             }         }     }      /*      * Append reduced OR to the winners list, if it's not degenerate, handling      * the special case of one element correctly (can that really happen?).      * Also be careful to maintain AND/OR flatness in case we pulled up a      * sub-sub-OR-clause.      */     if (neworlist != NIL)//新产生的链表     {         if (list_length(neworlist) == 1)             winners = lappend(winners, linitial(neworlist));         else             winners = lappend(winners, make_orclause(pull_ors(neworlist)));//拉平OR     }      /*      * And return the constructed AND clause, again being wary of a single      * element and AND/OR flatness.      */     //返回结果      if (list_length(winners) == 1)         return (Expr *) linitial(winners);     else         return make_andclause(pull_ands(winners));//拉平AND }

pull_ors

 /*  * pull_ors  *    Recursively flatten nested OR clauses into a single or-clause list.  *  * Input is the arglist of an OR clause.  * Returns the rebuilt arglist (note original list structure is not touched).  */ static List * pull_ors(List *orlist) {     List       *out_list = NIL;     ListCell   *arg;      foreach(arg, orlist)     {         Node       *subexpr = (Node *) lfirst(arg);          /*          * Note: we can destructively concat the subexpression's arglist          * because we know the recursive invocation of pull_ors will have          * built a new arglist not shared with any other expr. Otherwise we'd          * need a list_copy here.          */         if (or_clause(subexpr))             out_list = list_concat(out_list,                                    pull_ors(((BoolExpr *) subexpr)->args));//递归拉平         else             out_list = lappend(out_list, subexpr);     }     return out_list; }

pull_ands

 /*  * pull_ands  *    Recursively flatten nested AND clauses into a single and-clause list.  *  * Input is the arglist of an AND clause.  * Returns the rebuilt arglist (note original list structure is not touched).  */ static List * pull_ands(List *andlist) {     List       *out_list = NIL;     ListCell   *arg;      foreach(arg, andlist)     {         Node       *subexpr = (Node *) lfirst(arg);          /*          * Note: we can destructively concat the subexpression's arglist          * because we know the recursive invocation of pull_ands will have          * built a new arglist not shared with any other expr. Otherwise we'd          * need a list_copy here.          */         if (and_clause(subexpr))             out_list = list_concat(out_list,                                    pull_ands(((BoolExpr *) subexpr)->args));//递归拉平         else             out_list = lappend(out_list, subexpr);     }     return out_list; }

四、跟踪分析

测试脚本:

select * from t_dwxx where (FALSE OR dwbh = '1001')        AND ((dwbh = '1001' AND dwbh = '1002')               OR (dwbh = '1001' AND dwbh = '1003'));

该语句经规范化后与以下SQL语句无异:

select * from t_dwxx where  dwbh = '1001' AND (dwbh='1002' OR dwbh='1003');-- 执行计划testdb=# explain verbose select * from t_dwxx where (FALSE OR dwbh = '1001')        AND ((dwbh = '1001' AND dwbh = '1002')               OR (dwbh = '1001' AND dwbh = '1003'));                                                             QUERY PLAN                                                              ------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on public.t_dwxx  (cost=0.00..12.80 rows=1 width=474)   Output: dwmc, dwbh, dwdz   Filter: (((t_dwxx.dwbh)::text = '1001'::text) AND (((t_dwxx.dwbh)::text = '1002'::text) OR ((t_dwxx.dwbh)::text = '1003'::text)))(3 rows)

gdb跟踪:

(gdb) b find_duplicate_orsBreakpoint 1 at 0x7811b4: file prepqual.c, line 418.(gdb) cContinuing.Breakpoint 1, find_duplicate_ors (qual=0x2727528, is_check=false) at prepqual.c:418418   if (or_clause((Node *) qual))#输入参数,BoolExpr#args链表有两个元素,一个是(FALSE OR dwbh = '1001'),#另外一个是((dwbh = '1001' AND dwbh = '1002')               OR (dwbh = '1001' AND dwbh = '1003'))(gdb) p *(BoolExpr *)qual$2 = {xpr = {type = T_BoolExpr}, boolop = AND_EXPR, args = 0x2726c88, location = -1}(gdb) p *((BoolExpr *)qual)->args$3 = {type = T_List, length = 2, head = 0x2726c68, tail = 0x2727508}...#进入AND分支462   else if (and_clause((Node *) qual))...#获得链表第一个元素470       Expr     *arg = (Expr *) lfirst(temp);#再次进入find_duplicate_ors#FALSE OR dwbh='1001'在此前已被简化为dwbh='1001',所以在这里直接返回Breakpoint 1, find_duplicate_ors (qual=0x2726bc8, is_check=false) at prepqual.c:418418   if (or_clause((Node *) qual))(gdb) n462   else if (and_clause((Node *) qual))(gdb) #515     return qual;(gdb) n516 }(gdb) 475       if (arg && IsA(arg, Const))...497       andlist = lappend(andlist, arg);(gdb) n468     foreach(temp, ((BoolExpr *) qual)->args)(gdb) n470       Expr     *arg = (Expr *) lfirst(temp);(gdb) 472       arg = find_duplicate_ors(arg, is_check);#链表的下一个元素,即#((dwbh = '1001' AND dwbh = '1002')               OR (dwbh = '1001' AND dwbh = '1003'))#由两个BoolExpr组成(gdb) p *(BoolExpr *)arg$23 = {xpr = {type = T_BoolExpr}, boolop = OR_EXPR, args = 0x27270d8, location = -1}(gdb) p *((BoolExpr *)arg)->args$24 = {type = T_List, length = 2, head = 0x27270b8, tail = 0x27274b8}(gdb) p *(Node *)((BoolExpr *)arg)->args->head->data.ptr_value$25 = {type = T_BoolExpr}(gdb) p *(Node *)((BoolExpr *)arg)->args->head->next->data.ptr_value$26 = {type = T_BoolExpr}...#args左右两边的的arg处理完毕424     foreach(temp, ((BoolExpr *) qual)->args)(gdb) 457     orlist = pull_ors(orlist);...#进入process_duplicate_ors460     return process_duplicate_ors(orlist);(gdb) stepprocess_duplicate_ors (orlist=0x2727858) at prepqual.c:529529   List     *reference = NIL;...#获得最少谓词的链表(gdb) p *reference$36 = {type = T_List, length = 2, head = 0x27278a8, tail = 0x27278f8}...#获得了公共的谓词winner!即dwbh = '1001'(gdb) p *winners$40 = {type = T_List, length = 1, head = 0x2727918, tail = 0x2727918}...(gdb) canonicalize_qual (qual=0x2727528, is_check=false) at prepqual.c:309309   return newqual;(gdb) p *newqual$43 = {type = T_BoolExpr}(gdb) p *(BoolExpr *)newqual$45 = {xpr = {type = T_BoolExpr}, boolop = AND_EXPR, args = 0x2727c18, location = -1}#DONE!

五、小结

1、优化的数学基础:布尔代数以及相关的定律;
2、表达式规范化的过程:表达式扁平化处理以及公共谓词提取等的处理逻辑。

六、参考资料

布尔代数
prepqual.c

0