PostgreSQL 源码解读(33)- 查询语句#18(查询优化-表达式预处理#3)
本节简单介绍了PG查询优化表达式预处理中的规范化过程。规范化具体的做法一是忽略NULL以及OR中的False,And中的True(实现函数find_duplicate_ors),二是拉平谓词(实现函数:pull_ors/pull_ands),三是清除重复的ORs(实现函数process_duplicate_ors)。这些函数位于文件src/backend/optimizer/prep/prepqual.c中。
一、布尔代数基础
规范化处理基于布尔/逻辑代数运算的相关基本定律:
幂等律
A∪A = A
A∩A = A
交换律
A∪B = B∪A
A∩B = B∩A
结合律
A∪(B∪С) = (A∪B) ∪ С
A∩(B∩С) = (A∩B)∩ С
吸收律
A∪(A∩B) = A
A∩(A∪B) = A
分配律
A∪(B∩С)=(A∪B)∩(A∪С)
A∩(B∪С)=(A∩B)∪(A∩С)
幺元律
0∪A = A
1∩A = A
1∪A = 1
0∩A = 0
补余律
A∪A' = 1
A∩A' = 0
二、基本概念
PG源码对规范化表达式的注释如下:
/* * canonicalize_qual * Convert a qualification expression to the most useful form. * 转换为最常规形式的表达式 * * This is primarily intended to be used on top-level WHERE (or JOIN/ON) * clauses. It can also be used on top-level CHECK constraints, for which * pass is_check = true. DO NOT call it on any expression that is not known * to be one or the other, as it might apply inappropriate simplifications. * 规范化注意用于最高层的Where/Check(输入参数:is_check = true)语句中,如果不是这类语句,不要使用否则会产生不匹配的简化(PG这里加入了DEBUG信息) * * The name of this routine is a holdover from a time when it would try to * force the expression into canonical AND-of-ORs or OR-of-ANDs form. * Eventually, we recognized that that had more theoretical purity than * actual usefulness, and so now the transformation doesn't involve any * notion of reaching a canonical form. * * NOTE: we assume the input has already been through eval_const_expressions * and therefore possesses AND/OR flatness. Formerly this function included * its own flattening logic, but that requires a useless extra pass over the * tree. * * Returns the modified qualification. */
忽略NULL以及OR中的False/AND中的TRUE
Where条件语句中的NULL/FALSE/TRUE,如能忽略,忽略之.如:NULL OR FALSE OR dwbh = '1001',则忽略NULL/ FALSE
testdb=# explain verbose select * from t_dwxx where NULL OR FALSE OR dwbh = '1001' ; QUERY PLAN ---------------------------------------------------------------- Seq Scan on public.t_dwxx (cost=0.00..12.00 rows=1 width=474) Output: dwmc, dwbh, dwdz Filter: ((t_dwxx.dwbh)::text = '1001'::text)(3 rows)
拉平谓词
SQL语句中的X1 OR/AND (X2 OR/AND X3),拉平简化为OR/AND(X1,X2,X3)
X1 OR/AND (X2 OR/AND X3),在查询树中为树状结构,第一层节点是BoolExpr,该Node中的args链表有2个元素,args->1=X1,args->2=BoolExpr,args->2->1=X2,args->2->2=X3,组成树状结构.简化后args->1/2/3=X1/X2/X3,所有条件处于同一个层次上,并不是树状结构.
testdb=# explain select * from t_dwxx where dwbh = '1001' OR (dwbh = '1002' OR dwbh = '1003'); QUERY PLAN ------------------------------------------------------------------------------------------------------------- Seq Scan on t_dwxx (cost=0.00..12.80 rows=3 width=474) Filter: (((dwbh)::text = '1001'::text) OR ((dwbh)::text = '1002'::text) OR ((dwbh)::text = '1003'::text))(2 rows)
清除重复ORs
清除重复ORs的数学基础是布尔(逻辑)代数:
(X1 AND X2) OR (X1 AND X3) 应用分配律可以改写为 X1 AND (X2 OR X3),这样改写的目的是把X1抽取出来,为后续下推谓词X1作准备.
如(dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003')条件,会改写为dwbh = '1001' AND (dwbh = '1002' OR dwbh = '1003')
testdb=# explain verbose select * from t_dwxx where (dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on public.t_dwxx (cost=0.00..12.80 rows=1 width=474) Output: dwmc, dwbh, dwdz Filter: (((t_dwxx.dwbh)::text = '1001'::text) AND (((t_dwxx.dwbh)::text = '1002'::text) OR ((t_dwxx.dwbh)::text = '1003'::text)))(3 rows)
三、源码解读
主函数入口:
subquery_planner
/*输入: glob-PlannerGlobal parse-Query结构体指针 parent_root-父PlannerInfo Root节点 hasRecursion-是否递归? tuple_fraction-扫描Tuple比例输出: PlannerInfo指针*/ PlannerInfo * subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root, bool hasRecursion, double tuple_fraction) { PlannerInfo *root;//返回值 List *newWithCheckOptions;// List *newHaving;//Having子句 bool hasOuterJoins;//是否存在Outer Join? RelOptInfo *final_rel;// ListCell *l;//临时变量 /* Create a PlannerInfo data structure for this subquery */ root = makeNode(PlannerInfo);//构造返回值 root->parse = parse; root->glob = glob; root->query_level = parent_root ? parent_root->query_level + 1 : 1; root->parent_root = parent_root; root->plan_params = NIL; root->outer_params = NULL; root->planner_cxt = CurrentMemoryContext; root->init_plans = NIL; root->cte_plan_ids = NIL; root->multiexpr_params = NIL; root->eq_classes = NIL; root->append_rel_list = NIL; root->rowMarks = NIL; memset(root->upper_rels, 0, sizeof(root->upper_rels)); memset(root->upper_targets, 0, sizeof(root->upper_targets)); root->processed_tlist = NIL; root->grouping_map = NULL; root->minmax_aggs = NIL; root->qual_security_level = 0; root->inhTargetKind = INHKIND_NONE; root->hasRecursion = hasRecursion; if (hasRecursion) root->wt_param_id = SS_assign_special_param(root); else root->wt_param_id = -1; root->non_recursive_path = NULL; root->partColsUpdated = false; /* * If there is a WITH list, process each WITH query and build an initplan * SubPlan structure for it. */ if (parse->cteList) SS_process_ctes(root);//处理With 语句 /* * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try * to transform them into joins. Note that this step does not descend * into subqueries; if we pull up any subqueries below, their SubLinks are * processed just before pulling them up. */ if (parse->hasSubLinks) pull_up_sublinks(root); //上拉子链接 /* * Scan the rangetable for set-returning functions, and inline them if * possible (producing subqueries that might get pulled up next). * Recursion issues here are handled in the same way as for SubLinks. */ inline_set_returning_functions(root);// /* * Check to see if any subqueries in the jointree can be merged into this * query. */ pull_up_subqueries(root);//上拉子查询 /* * If this is a simple UNION ALL query, flatten it into an appendrel. We * do this now because it requires applying pull_up_subqueries to the leaf * queries of the UNION ALL, which weren't touched above because they * weren't referenced by the jointree (they will be after we do this). */ if (parse->setOperations) flatten_simple_union_all(root);//扁平化处理UNION ALL /* * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can * avoid the expense of doing flatten_join_alias_vars(). Also check for * outer joins --- if none, we can skip reduce_outer_joins(). And check * for LATERAL RTEs, too. This must be done after we have done * pull_up_subqueries(), of course. */ //判断RTE中是否存在RTE_JOIN? root->hasJoinRTEs = false; root->hasLateralRTEs = false; hasOuterJoins = false; foreach(l, parse->rtable) { RangeTblEntry *rte = lfirst_node(RangeTblEntry, l); if (rte->rtekind == RTE_JOIN) { root->hasJoinRTEs = true; if (IS_OUTER_JOIN(rte->jointype)) hasOuterJoins = true; } if (rte->lateral) root->hasLateralRTEs = true; } /* * Preprocess RowMark information. We need to do this after subquery * pullup (so that all non-inherited RTEs are present) and before * inheritance expansion (so that the info is available for * expand_inherited_tables to examine and modify). */ //预处理RowMark信息 preprocess_rowmarks(root); /* * Expand any rangetable entries that are inheritance sets into "append * relations". This can add entries to the rangetable, but they must be * plain base relations not joins, so it's OK (and marginally more * efficient) to do it after checking for join RTEs. We must do it after * pulling up subqueries, else we'd fail to handle inherited tables in * subqueries. */ //展开继承表 expand_inherited_tables(root); /* * Set hasHavingQual to remember if HAVING clause is present. Needed * because preprocess_expression will reduce a constant-true condition to * an empty qual list ... but "HAVING TRUE" is not a semantic no-op. */ //是否存在Having表达式 root->hasHavingQual = (parse->havingQual != NULL); /* Clear this flag; might get set in distribute_qual_to_rels */ root->hasPseudoConstantQuals = false; /* * Do expression preprocessing on targetlist and quals, as well as other * random expressions in the querytree. Note that we do not need to * handle sort/group expressions explicitly, because they are actually * part of the targetlist. */ //预处理表达式:targetList(投影列) parse->targetList = (List *) preprocess__expression(root, (Node *) parse->targetList, EXPRKIND_TARGET); /* Constant-folding might have removed all set-returning functions */ if (parse->hasTargetSRFs) parse->hasTargetSRFs = expression_returns_set((Node *) parse->targetList); newWithCheckOptions = NIL; foreach(l, parse->withCheckOptions)//witch Check Options { WithCheckOption *wco = lfirst_node(WithCheckOption, l); wco->qual = preprocess__expression(root, wco->qual, EXPRKIND_QUAL); if (wco->qual != NULL) newWithCheckOptions = lappend(newWithCheckOptions, wco); } parse->withCheckOptions = newWithCheckOptions; //返回列信息returningList parse->returningList = (List *) preprocess__expression(root, (Node *) parse->returningList, EXPRKIND_TARGET); //预处理条件表达式 preprocess_qual_conditions(root, (Node *) parse->jointree); //预处理Having表达式 parse->havingQual = preprocess__expression(root, parse->havingQual, EXPRKIND_QUAL); //窗口函数 foreach(l, parse->windowClause) { WindowClause *wc = lfirst_node(WindowClause, l); /* partitionClause/orderClause are sort/group expressions */ wc->startOffset = preprocess__expression(root, wc->startOffset, EXPRKIND_LIMIT); wc->endOffset = preprocess__expression(root, wc->endOffset, EXPRKIND_LIMIT); } //Limit子句 parse->limitOffset = preprocess__expression(root, parse->limitOffset, EXPRKIND_LIMIT); parse->limitCount = preprocess__expression(root, parse->limitCount, EXPRKIND_LIMIT); //On Conflict子句 if (parse->onConflict) { parse->onConflict->arbiterElems = (List *) preprocess__expression(root, (Node *) parse->onConflict->arbiterElems, EXPRKIND_ARBITER_ELEM); parse->onConflict->arbiterWhere = preprocess__expression(root, parse->onConflict->arbiterWhere, EXPRKIND_QUAL); parse->onConflict->onConflictSet = (List *) preprocess__expression(root, (Node *) parse->onConflict->onConflictSet, EXPRKIND_TARGET); parse->onConflict->onConflictWhere = preprocess__expression(root, parse->onConflict->onConflictWhere, EXPRKIND_QUAL); /* exclRelTlist contains only Vars, so no preprocessing needed */ } //集合操作(AppendRelInfo) root->append_rel_list = (List *) preprocess__expression(root, (Node *) root->append_rel_list, EXPRKIND_APPINFO); //RTE /* Also need to preprocess expressions within RTEs */ foreach(l, parse->rtable) { RangeTblEntry *rte = lfirst_node(RangeTblEntry, l); int kind; ListCell *lcsq; if (rte->rtekind == RTE_RELATION) { if (rte->tablesample) rte->tablesample = (TableSampleClause *) preprocess__expression(root, (Node *) rte->tablesample, EXPRKIND_TABLESAMPLE);//数据表采样语句 } else if (rte->rtekind == RTE_SUBQUERY)//子查询 { /* * We don't want to do all preprocessing yet on the subquery's * expressions, since that will happen when we plan it. But if it * contains any join aliases of our level, those have to get * expanded now, because planning of the subquery won't do it. * That's only possible if the subquery is LATERAL. */ if (rte->lateral && root->hasJoinRTEs) rte->subquery = (Query *) flatten_join_alias_vars(root, (Node *) rte->subquery); } else if (rte->rtekind == RTE_FUNCTION)//函数 { /* Preprocess the function _expression(s) fully */ kind = rte->lateral ? EXPRKIND_RTFUNC_LATERAL : EXPRKIND_RTFUNC; rte->functions = (List *) preprocess__expression(root, (Node *) rte->functions, kind); } else if (rte->rtekind == RTE_TABLEFUNC)//TABLE FUNC { /* Preprocess the function _expression(s) fully */ kind = rte->lateral ? EXPRKIND_TABLEFUNC_LATERAL : EXPRKIND_TABLEFUNC; rte->tablefunc = (TableFunc *) preprocess__expression(root, (Node *) rte->tablefunc, kind); } else if (rte->rtekind == RTE_VALUES)//VALUES子句 { /* Preprocess the values lists fully */ kind = rte->lateral ? EXPRKIND_VALUES_LATERAL : EXPRKIND_VALUES; rte->values_lists = (List *) preprocess__expression(root, (Node *) rte->values_lists, kind); } /* * Process each element of the securityQuals list as if it were a * separate qual _expression (as indeed it is). We need to do it this * way to get proper canonicalization of AND/OR structure. Note that * this converts each element into an implicit-AND sublist. */ foreach(lcsq, rte->securityQuals) { lfirst(lcsq) = preprocess__expression(root, (Node *) lfirst(lcsq), EXPRKIND_QUAL); } } ...//其他 return root; }
preprocess_expression
/* * preprocess_expression * Do subquery_planner's preprocessing work for an expression, * which can be a targetlist, a WHERE clause (including JOIN/ON * conditions), a HAVING clause, or a few other things. */ static Node * preprocess__expression(PlannerInfo *root, Node *expr, int kind) { /* * Fall out quickly if expression is empty. This occurs often enough to * be worth checking. Note that null->null is the correct conversion for * implicit-AND result format, too. */ if (expr == NULL) return NULL; /* * If the query has any join RTEs, replace join alias variables with * base-relation variables. We must do this first, since any expressions * we may extract from the joinaliasvars lists have not been preprocessed. * For example, if we did this after sublink processing, sublinks expanded * out from join aliases would not get processed. But we can skip this in * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since * they can't contain any Vars of the current query level. */ if (root->hasJoinRTEs && !(kind == EXPRKIND_RTFUNC || kind == EXPRKIND_VALUES || kind == EXPRKIND_TABLESAMPLE || kind == EXPRKIND_TABLEFUNC)) expr = flatten_join_alias_vars(root, expr);//扁平化处理joinaliasvars,上节已介绍 /* * Simplify constant expressions. * * Note: an essential effect of this is to convert named-argument function * calls to positional notation and insert the current actual values of * any default arguments for functions. To ensure that happens, we *must* * process all expressions here. Previous PG versions sometimes skipped * const-simplification if it didn't seem worth the trouble, but we can't * do that anymore. * * Note: this also flattens nested AND and OR expressions into N-argument * form. All processing of a qual expression after this point must be * careful to maintain AND/OR flatness --- that is, do not generate a tree * with AND directly under AND, nor OR directly under OR. */ expr = eval_const_expressions(root, expr);//简化常量表达式 /* * If it's a qual or havingQual, canonicalize it. */ if (kind == EXPRKIND_QUAL) { expr = (Node *) canonicalize_qual((Expr *) expr, false);//表达式规范化 #ifdef OPTIMIZER_DEBUG printf("After canonicalize_qual()\n"); pprint(expr); #endif } /* Expand SubLinks to SubPlans */ if (root->parse->hasSubLinks)//扩展子链接为子计划 expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL)); /* * XXX do not insert anything here unless you have grokked the comments in * SS_replace_correlation_vars ... */ /* Replace uplevel vars with Param nodes (this IS possible in VALUES) */ if (root->query_level > 1) expr = SS_replace_correlation_vars(root, expr);//使用Param节点替换上层的Vars /* * If it's a qual or havingQual, convert it to implicit-AND format. (We * don't want to do this before eval_const_expressions, since the latter * would be unable to simplify a top-level AND correctly. Also, * SS_process_sublinks expects explicit-AND format.) */ if (kind == EXPRKIND_QUAL) expr = (Node *) make_ands_implicit((Expr *) expr); return expr; }
canonicalize_qual
/* * canonicalize_qual * Convert a qualification expression to the most useful form. * * This is primarily intended to be used on top-level WHERE (or JOIN/ON) * clauses. It can also be used on top-level CHECK constraints, for which * pass is_check = true. DO NOT call it on any expression that is not known * to be one or the other, as it might apply inappropriate simplifications. * * The name of this routine is a holdover from a time when it would try to * force the expression into canonical AND-of-ORs or OR-of-ANDs form. * Eventually, we recognized that that had more theoretical purity than * actual usefulness, and so now the transformation doesn't involve any * notion of reaching a canonical form. * * NOTE: we assume the input has already been through eval_const_expressions * and therefore possesses AND/OR flatness. Formerly this function included * its own flattening logic, but that requires a useless extra pass over the * tree. * * Returns the modified qualification. */ Expr * canonicalize_qual(Expr *qual, bool is_check)//规范化表达式 { Expr *newqual; /* Quick exit for empty qual */ if (qual == NULL) return NULL; /* This should not be invoked on quals in implicit-AND format */ Assert(!IsA(qual, List)); /* * Pull up redundant subclauses in OR-of-AND trees. We do this only * within the top-level AND/OR structure; there's no point in looking * deeper. Also remove any NULL constants in the top-level structure. */ newqual = find_duplicate_ors(qual, is_check);//执行实际处理逻辑 return newqual; }
find_duplicate_ors
/* * find_duplicate_ors * Given a qualification tree with the NOTs pushed down, search for * OR clauses to which the inverse OR distributive law might apply. * Only the top-level AND/OR structure is searched. * * While at it, we remove any NULL constants within the top-level AND/OR * structure, eg in a WHERE clause, "x OR NULL::boolean" is reduced to "x". * In general that would change the result, so eval_const_expressions can't * do it; but at top level of WHERE, we don't need to distinguish between * FALSE and NULL results, so it's valid to treat NULL::boolean the same * as FALSE and then simplify AND/OR accordingly. Conversely, in a top-level * CHECK constraint, we may treat a NULL the same as TRUE. * * Returns the modified qualification. AND/OR flatness is preserved. */ static Expr * find_duplicate_ors(Expr *qual, bool is_check) { if (or_clause((Node *) qual))//OR语句 { List *orlist = NIL; ListCell *temp; /* Recurse */ foreach(temp, ((BoolExpr *) qual)->args)//遍历args链表 { Expr *arg = (Expr *) lfirst(temp);//获取链表中的元素 arg = find_duplicate_ors(arg, is_check);//递归调用 /* Get rid of any constant inputs */ if (arg && IsA(arg, Const))//arg为常量 { Const *carg = (Const *) arg; if (is_check)//Check语句 { /* Within OR in CHECK, drop constant FALSE */ if (!carg->constisnull && !DatumGetBool(carg->constvalue)) continue;//不为NULL而且为FALSE,继续循环 //arg为NULL或者TURE,即NULL OR TRUE,简化为TRUE /* Constant TRUE or NULL, so OR reduces to TRUE */ return (Expr *) makeBoolConst(true, false); } else//Where条件语句 { /* Within OR in WHERE, drop constant FALSE or NULL */ if (carg->constisnull || !DatumGetBool(carg->constvalue)) continue;//arg为NULL或者FALSE,继续循环 //arg不为NULL而且为TRUE,即TRUE常量,直接返回arg /* Constant TRUE, so OR reduces to TRUE */ return arg; } } //加入结果链表 orlist = lappend(orlist, arg); } /* Flatten any ORs pulled up to just below here */ orlist = pull_ors(orlist);//扁平化ORs /* Now we can look for duplicate ORs */ return process_duplicate_ors(orlist);//处理重复的ORs } else if (and_clause((Node *) qual))//AND语句 { List *andlist = NIL; ListCell *temp; /* Recurse */ foreach(temp, ((BoolExpr *) qual)->args)//遍历链表 { Expr *arg = (Expr *) lfirst(temp); arg = find_duplicate_ors(arg, is_check); /* Get rid of any constant inputs */ if (arg && IsA(arg, Const)) { Const *carg = (Const *) arg; if (is_check) { /* Within AND in CHECK, drop constant TRUE or NULL */ if (carg->constisnull || DatumGetBool(carg->constvalue)) continue;//CHECK语句 /* Constant FALSE, so AND reduces to FALSE */ //不为空且值为FALSE,返回FALSE return arg; } else { /* Within AND in WHERE, drop constant TRUE */ if (!carg->constisnull && DatumGetBool(carg->constvalue)) continue; /* Constant FALSE or NULL, so AND reduces to FALSE */ //NULL OR FALSE,返回FALSE return (Expr *) makeBoolConst(false, false); } } andlist = lappend(andlist, arg);//加入到结果列表 } /* Flatten any ANDs introduced just below here */ andlist = pull_ands(andlist);//扁平化处理AND /* AND of no inputs reduces to TRUE */ if (andlist == NIL)//为空指针 return (Expr *) makeBoolConst(true, false);//返回TRUE /* Single-expression AND just reduces to that expression */ if (list_length(andlist) == 1)//单个表达式 return (Expr *) linitial(andlist);//返回此表达式链表 /* Else we still need an AND node */ return make_andclause(andlist);//否则返回结果链表 } else return qual;//非AND/OR语句,直接返回结果 }
process_duplicate_ors
/* * process_duplicate_ors * Given a list of exprs which are ORed together, try to apply * the inverse OR distributive law. * * Returns the resulting _expression (could be an AND clause, an OR * clause, or maybe even a single subexpression). */ static Expr * process_duplicate_ors(List *orlist) { List *reference = NIL; int num_subclauses = 0; List *winners; List *neworlist; ListCell *temp; /* OR of no inputs reduces to FALSE */ if (orlist == NIL) return (Expr *) makeBoolConst(false, false); /* Single-expression OR just reduces to that expression */ if (list_length(orlist) == 1) return (Expr *) linitial(orlist); /* * Choose the shortest AND clause as the reference list --- obviously, any * subclause not in this clause isn't in all the clauses. If we find a * clause that's not an AND, we can treat it as a one-element AND clause, * which necessarily wins as shortest. */ //遍历OR链表,找到AND语句中约束条件最少的那个表达式 //与求解最小公约数同理,公共的谓词只可能在此最小的表达式中产生 foreach(temp, orlist) { Expr *clause = (Expr *) lfirst(temp); if (and_clause((Node *) clause))//AND语句 { List *subclauses = ((BoolExpr *) clause)->args; int nclauses = list_length(subclauses); if (reference == NIL || nclauses < num_subclauses) { reference = subclauses; num_subclauses = nclauses; } } else//单个约束条件或者带有表达式的约束条件,如(X1 AND X2) OR X3等 { reference = list_make1(clause); break; } } /* * Just in case, eliminate any duplicates in the reference list. */ reference = list_union(NIL, reference);//去掉重复的谓词 /* * Check each element of the reference list to see if it's in all the OR * clauses. Build a new list of winning clauses. */ winners = NIL; foreach(temp, reference)//遍历链表 { Expr *refclause = (Expr *) lfirst(temp); bool win = true; ListCell *temp2; foreach(temp2, orlist) { Expr *clause = (Expr *) lfirst(temp2); if (and_clause((Node *) clause))//该谓词是否在链表中存在? { if (!list_member(((BoolExpr *) clause)->args, refclause)) { win = false; break; } } else//该谓词是否与单个条件表达式等价? { if (!equal(refclause, clause)) { win = false; break; } } } if (win)//找到了公共的谓词 winners = lappend(winners, refclause);//加入到结果中 } /* * If no winners, we can't transform the OR */ if (winners == NIL) return make_orclause(orlist);//如果找到,原样返回 /* * Generate new OR list consisting of the remaining sub-clauses. * * 找到了,产生新的条件 * * If any clause degenerates to empty, then we have a situation like (A * AND B) OR (A), which can be reduced to just A --- that is, the * additional conditions in other arms of the OR are irrelevant. * * Note that because we use list_difference, any multiple occurrences of a * winning clause in an AND sub-clause will be removed automatically. */ neworlist = NIL; foreach(temp, orlist)//遍历OR链表 { Expr *clause = (Expr *) lfirst(temp); if (and_clause((Node *) clause))//AND语句 { List *subclauses = ((BoolExpr *) clause)->args;//获取条件语句参数 subclauses = list_difference(subclauses, winners);//剔除相同部分 if (subclauses != NIL)//成功剔除,产生新的AND语句 { if (list_length(subclauses) == 1) neworlist = lappend(neworlist, linitial(subclauses)); else neworlist = lappend(neworlist, make_andclause(subclauses)); } else { neworlist = NIL; /* degenerate case, see above */ break; } } else//不是AND语句 { if (!list_member(winners, clause))//单个条件语句,直接添加条件 neworlist = lappend(neworlist, clause); else //OR语句?公共部分已提出,无需加入其他条件,直接返回 //根据吸收律,X AND (X OR B) 等价于X { neworlist = NIL; /* degenerate case, see above */ break; } } } /* * Append reduced OR to the winners list, if it's not degenerate, handling * the special case of one element correctly (can that really happen?). * Also be careful to maintain AND/OR flatness in case we pulled up a * sub-sub-OR-clause. */ if (neworlist != NIL)//新产生的链表 { if (list_length(neworlist) == 1) winners = lappend(winners, linitial(neworlist)); else winners = lappend(winners, make_orclause(pull_ors(neworlist)));//拉平OR } /* * And return the constructed AND clause, again being wary of a single * element and AND/OR flatness. */ //返回结果 if (list_length(winners) == 1) return (Expr *) linitial(winners); else return make_andclause(pull_ands(winners));//拉平AND }
pull_ors
/* * pull_ors * Recursively flatten nested OR clauses into a single or-clause list. * * Input is the arglist of an OR clause. * Returns the rebuilt arglist (note original list structure is not touched). */ static List * pull_ors(List *orlist) { List *out_list = NIL; ListCell *arg; foreach(arg, orlist) { Node *subexpr = (Node *) lfirst(arg); /* * Note: we can destructively concat the subexpression's arglist * because we know the recursive invocation of pull_ors will have * built a new arglist not shared with any other expr. Otherwise we'd * need a list_copy here. */ if (or_clause(subexpr)) out_list = list_concat(out_list, pull_ors(((BoolExpr *) subexpr)->args));//递归拉平 else out_list = lappend(out_list, subexpr); } return out_list; }
pull_ands
/* * pull_ands * Recursively flatten nested AND clauses into a single and-clause list. * * Input is the arglist of an AND clause. * Returns the rebuilt arglist (note original list structure is not touched). */ static List * pull_ands(List *andlist) { List *out_list = NIL; ListCell *arg; foreach(arg, andlist) { Node *subexpr = (Node *) lfirst(arg); /* * Note: we can destructively concat the subexpression's arglist * because we know the recursive invocation of pull_ands will have * built a new arglist not shared with any other expr. Otherwise we'd * need a list_copy here. */ if (and_clause(subexpr)) out_list = list_concat(out_list, pull_ands(((BoolExpr *) subexpr)->args));//递归拉平 else out_list = lappend(out_list, subexpr); } return out_list; }
四、跟踪分析
测试脚本:
select * from t_dwxx where (FALSE OR dwbh = '1001') AND ((dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003'));
该语句经规范化后与以下SQL语句无异:
select * from t_dwxx where dwbh = '1001' AND (dwbh='1002' OR dwbh='1003');-- 执行计划testdb=# explain verbose select * from t_dwxx where (FALSE OR dwbh = '1001') AND ((dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003')); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on public.t_dwxx (cost=0.00..12.80 rows=1 width=474) Output: dwmc, dwbh, dwdz Filter: (((t_dwxx.dwbh)::text = '1001'::text) AND (((t_dwxx.dwbh)::text = '1002'::text) OR ((t_dwxx.dwbh)::text = '1003'::text)))(3 rows)
gdb跟踪:
(gdb) b find_duplicate_orsBreakpoint 1 at 0x7811b4: file prepqual.c, line 418.(gdb) cContinuing.Breakpoint 1, find_duplicate_ors (qual=0x2727528, is_check=false) at prepqual.c:418418 if (or_clause((Node *) qual))#输入参数,BoolExpr#args链表有两个元素,一个是(FALSE OR dwbh = '1001'),#另外一个是((dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003'))(gdb) p *(BoolExpr *)qual$2 = {xpr = {type = T_BoolExpr}, boolop = AND_EXPR, args = 0x2726c88, location = -1}(gdb) p *((BoolExpr *)qual)->args$3 = {type = T_List, length = 2, head = 0x2726c68, tail = 0x2727508}...#进入AND分支462 else if (and_clause((Node *) qual))...#获得链表第一个元素470 Expr *arg = (Expr *) lfirst(temp);#再次进入find_duplicate_ors#FALSE OR dwbh='1001'在此前已被简化为dwbh='1001',所以在这里直接返回Breakpoint 1, find_duplicate_ors (qual=0x2726bc8, is_check=false) at prepqual.c:418418 if (or_clause((Node *) qual))(gdb) n462 else if (and_clause((Node *) qual))(gdb) #515 return qual;(gdb) n516 }(gdb) 475 if (arg && IsA(arg, Const))...497 andlist = lappend(andlist, arg);(gdb) n468 foreach(temp, ((BoolExpr *) qual)->args)(gdb) n470 Expr *arg = (Expr *) lfirst(temp);(gdb) 472 arg = find_duplicate_ors(arg, is_check);#链表的下一个元素,即#((dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003'))#由两个BoolExpr组成(gdb) p *(BoolExpr *)arg$23 = {xpr = {type = T_BoolExpr}, boolop = OR_EXPR, args = 0x27270d8, location = -1}(gdb) p *((BoolExpr *)arg)->args$24 = {type = T_List, length = 2, head = 0x27270b8, tail = 0x27274b8}(gdb) p *(Node *)((BoolExpr *)arg)->args->head->data.ptr_value$25 = {type = T_BoolExpr}(gdb) p *(Node *)((BoolExpr *)arg)->args->head->next->data.ptr_value$26 = {type = T_BoolExpr}...#args左右两边的的arg处理完毕424 foreach(temp, ((BoolExpr *) qual)->args)(gdb) 457 orlist = pull_ors(orlist);...#进入process_duplicate_ors460 return process_duplicate_ors(orlist);(gdb) stepprocess_duplicate_ors (orlist=0x2727858) at prepqual.c:529529 List *reference = NIL;...#获得最少谓词的链表(gdb) p *reference$36 = {type = T_List, length = 2, head = 0x27278a8, tail = 0x27278f8}...#获得了公共的谓词winner!即dwbh = '1001'(gdb) p *winners$40 = {type = T_List, length = 1, head = 0x2727918, tail = 0x2727918}...(gdb) canonicalize_qual (qual=0x2727528, is_check=false) at prepqual.c:309309 return newqual;(gdb) p *newqual$43 = {type = T_BoolExpr}(gdb) p *(BoolExpr *)newqual$45 = {xpr = {type = T_BoolExpr}, boolop = AND_EXPR, args = 0x2727c18, location = -1}#DONE!
五、小结
1、优化的数学基础:布尔代数以及相关的定律;
2、表达式规范化的过程:表达式扁平化处理以及公共谓词提取等的处理逻辑。
六、参考资料
布尔代数
prepqual.c