导航：首页 > 数据库 >

PostgreSQL中create_sort_plan函数实现逻辑是什么

发表于：2024-11-24 作者：千家信息网编辑

千家信息网最后更新 2024年11月24日，这篇文章主要讲解了"PostgreSQL中create_sort_plan函数实现逻辑是什么"，文中的讲解内容简单清晰，易于学习与理解，下面请大家跟着小编的思路慢慢深入，一起来研究和学习"Postgr

千家信息网最后更新 2024年11月24日PostgreSQL中create_sort_plan函数实现逻辑是什么

这篇文章主要讲解了"PostgreSQL中create_sort_plan函数实现逻辑是什么"，文中的讲解内容简单清晰，易于学习与理解，下面请大家跟着小编的思路慢慢深入，一起来研究和学习"PostgreSQL中create_sort_plan函数实现逻辑是什么"吧！

一、数据结构

Plan
所有计划节点通过将Plan结构作为第一个字段从Plan结构"派生"。这确保了在将节点转换为计划节点时，一切都能正常工作。(在执行器中以通用方式传递时，节点指针经常被转换为Plan *)

/* ---------------- *      Plan node * * All plan nodes "derive" from the Plan structure by having the * Plan structure as the first field.  This ensures that everything works * when nodes are cast to Plan's.  (node pointers are frequently cast to Plan* * when passed around generically in the executor) * 所有计划节点通过将Plan结构作为第一个字段从Plan结构"派生"。 * 这确保了在将节点转换为计划节点时，一切都能正常工作。 * (在执行器中以通用方式传递时，节点指针经常被转换为Plan *) * * We never actually instantiate any Plan nodes; this is just the common * abstract superclass for all Plan-type nodes. * 从未实例化任何Plan节点;这只是所有Plan-type节点的通用抽象超类。 * ---------------- */typedef struct Plan{    NodeTag     type;//节点类型    /*     * 成本估算信息;estimated execution costs for plan (see costsize.c for more info)     */    Cost        startup_cost;   /* 启动成本;cost expended before fetching any tuples */    Cost        total_cost;     /* 总成本;total cost (assuming all tuples fetched) */    /*     * 优化器估算信息;planner's estimate of result size of this plan step     */    double      plan_rows;      /* 行数;number of rows plan is expected to emit */    int         plan_width;     /* 平均行大小(Byte为单位);average row width in bytes */    /*     * 并行执行相关的信息;information needed for parallel query     */    bool        parallel_aware; /* 是否参与并行执行逻辑?engage parallel-aware logic? */    bool        parallel_safe;  /* 是否并行安全;OK to use as part of parallel plan? */    /*     * Plan类型节点通用的信息.Common structural data for all Plan types.     */    int         plan_node_id;   /* unique across entire final plan tree */    List       *targetlist;     /* target list to be computed at this node */    List       *qual;           /* implicitly-ANDed qual conditions */    struct Plan *lefttree;      /* input plan tree(s) */    struct Plan *righttree;    List       *initPlan;       /* Init Plan nodes (un-correlated expr                                 * subselects) */    /*     * Information for management of parameter-change-driven rescanning     * parameter-change-driven重扫描的管理信息.     *      * extParam includes the paramIDs of all external PARAM_EXEC params     * affecting this plan node or its children.  setParam params from the     * node's initPlans are not included, but their extParams are.     *     * allParam includes all the extParam paramIDs, plus the IDs of local     * params that affect the node (i.e., the setParams of its initplans).     * These are _all_ the PARAM_EXEC params that affect this node.     */    Bitmapset  *extParam;    Bitmapset  *allParam;} Plan;

二、源码解读

create_plan->create_plan_recurse->create_projection_plan函数创建计划树,执行投影操作并通过递归的方式为子访问路径生成执行计划。create_sort_plan函数创建Sort计划节点。

//---------------------------------------------------------------- create_projection_plan/* * create_projection_plan * *    Create a plan tree to do a projection step and (recursively) plans *    for its subpaths.  We may need a Result node for the projection, *    but sometimes we can just let the subplan do the work. *    创建计划树,执行投影操作并通过递归的方式为子访问路径生成执行计划. *    一般来说需要一个Result节点用于投影操作,但有时候可以让子计划执行此项任务. */static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags){    Plan       *plan;    Plan       *subplan;    List       *tlist;    bool        needs_result_node = false;    /*     * Convert our subpath to a Plan and determine whether we need a Result     * node.     * 转换subpath为Plan,并确定是否需要Result节点.     *     * In most cases where we don't need to project, creation_projection_path     * will have set dummypp, but not always.  First, some createplan.c     * routines change the tlists of their nodes.  (An example is that     * create_merge_append_plan might add resjunk sort columns to a     * MergeAppend.)  Second, create_projection_path has no way of knowing     * what path node will be placed on top of the projection path and     * therefore can't predict whether it will require an exact tlist. For     * both of these reasons, we have to recheck here.     * 在大多数情况下，我们不需要投影运算，creation_projection_path将设置dummypp标志，但并不总是如此。     * 首先,一些createplan.c中的函数更改其节点的tlist。     * (例如，create_merge_append_plan可能会向MergeAppend添加resjunk sort列)。     * 其次，create_projection_path无法知道将在投影路径顶部放置哪些路径节点，因此无法预测它是否需要一个确切的tlist。     * 由于这两个原因，我们不得不在这里重新检查。     */    if (use_physical_tlist(root, &best_path->path, flags))    {        /*         * Our caller doesn't really care what tlist we return, so we don't         * actually need to project.  However, we may still need to ensure         * proper sortgroupref labels, if the caller cares about those.         * 如果我们的调用者并不关心返回的结果tlist，所以实际上不需要投影运算。         * 然而，如果调用者关心sortgroupref标签，可能仍然需要确保正确的sortgroupref数据。         */        subplan = create_plan_recurse(root, best_path->subpath, 0);        tlist = subplan->targetlist;        if (flags & CP_LABEL_TLIST)            apply_pathtarget_labeling_to_tlist(tlist,                                               best_path->path.pathtarget);    }    else if (is_projection_capable_path(best_path->subpath))    {        /*         * Our caller requires that we return the exact tlist, but no separate         * result node is needed because the subpath is projection-capable.         * Tell create_plan_recurse that we're going to ignore the tlist it         * produces.         * 调用者要求返回精确的tlist，但是不需要单独的Result节点，因为子路径支持投影。         * 调用create_plan_recurse时忽略它生成的tlist。         */        subplan = create_plan_recurse(root, best_path->subpath,                                      CP_IGNORE_TLIST);        tlist = build_path_tlist(root, &best_path->path);    }    else    {        /*         * It looks like we need a result node, unless by good fortune the         * requested tlist is exactly the one the child wants to produce.         * 看起来需要一个Result节点，除非幸运的是，请求的tlist正是子节点想要生成的。         */        subplan = create_plan_recurse(root, best_path->subpath, 0);        tlist = build_path_tlist(root, &best_path->path);        needs_result_node = !tlist_same_exprs(tlist, subplan->targetlist);    }    /*     * If we make a different decision about whether to include a Result node     * than create_projection_path did, we'll have made slightly wrong cost     * estimates; but label the plan with the cost estimates we actually used,     * not "corrected" ones.  (XXX this could be cleaned up if we moved more     * of the sortcolumn setup logic into Path creation, but that would add     * expense to creating Paths we might end up not using.)     * 如果我们对是否包含一个Result节点做出与create_projection_path不同的决定，     * 我们就会做出略微错误的成本估算;     * 但是在这个计划上标上我们实际使用的成本估算，而不是"修正的"成本估算。     * (如果我们将更多的sortcolumn设置逻辑移到路径创建中，这个问题就可以解决了，     *  但是这会增加创建路径的成本，而最终可能不会使用这些路径。)     */    if (!needs_result_node)    {        /* Don't need a separate Result, just assign tlist to subplan */        //不需要单独的Result节点,把tlist赋值给subplan        plan = subplan;        plan->targetlist = tlist;        /* Label plan with the estimated costs we actually used */        //标记估算成本        plan->startup_cost = best_path->path.startup_cost;        plan->total_cost = best_path->path.total_cost;        plan->plan_rows = best_path->path.rows;        plan->plan_width = best_path->path.pathtarget->width;        plan->parallel_safe = best_path->path.parallel_safe;        /* ... but don't change subplan's parallel_aware flag */    }    else    {        /* We need a Result node */        //需要Result节点        plan = (Plan *) make_result(tlist, NULL, subplan);        copy_generic_path_info(plan, (Path *) best_path);    }    return plan;}//---------------------------------------------------------------- create_sort_plan /*  * create_sort_plan  *  *    Create a Sort plan for 'best_path' and (recursively) plans  *    for its subpaths.  *    创建Sort计划节点  */ static Sort * create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags) {     Sort       *plan;     Plan       *subplan;      /*      * We don't want any excess columns in the sorted tuples, so request a      * smaller tlist.  Otherwise, since Sort doesn't project, tlist      * requirements pass through.      * 我们不希望在排序元组中有任何多余的列，所以希望得到一个更小的tlist。      * 否则，由于Sort不执行投影运算，tlist就会通过完整的保留传递。      */     subplan = create_plan_recurse(root, best_path->subpath,                                   flags | CP_SMALL_TLIST);      /*      * make_sort_from_pathkeys() indirectly calls find_ec_member_for_tle(),      * which will ignore any child EC members that don't belong to the given      * relids. Thus, if this sort path is based on a child relation, we must      * pass its relids.      * make_sort_from_pathkeys()间接调用find_ec_member_for_tle()，它将忽略不属于给定relid的任何子EC成员。      * 因此，如果这个排序路径是基于子关系的，我们必须传递它的relids。      */     plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,                                    IS_OTHER_REL(best_path->subpath->parent) ?                                    best_path->path.parent->relids : NULL);      copy_generic_path_info(&plan->plan, (Path *) best_path);      return plan; }//------------------------------------------------ build_path_tlist/* * Build a target list (ie, a list of TargetEntry) for the Path's output. * 构建用于输出的target链表(比如TargetEntry节点链表) * * This is almost just make_tlist_from_pathtarget(), but we also have to * deal with replacing nestloop params. * 该函数几乎就是make_tlist_from_pathtarget()的实现逻辑，但还必须处理替换nestloop参数。 */static List *build_path_tlist(PlannerInfo *root, Path *path){    List       *tlist = NIL;    Index      *sortgrouprefs = path->pathtarget->sortgrouprefs;    int         resno = 1;    ListCell   *v;    foreach(v, path->pathtarget->exprs)    {        Node       *node = (Node *) lfirst(v);        TargetEntry *tle;        /*         * If it's a parameterized path, there might be lateral references in         * the tlist, which need to be replaced with Params.  There's no need         * to remake the TargetEntry nodes, so apply this to each list item         * separately.         * 如果是参数化路径，那么tlist中可能有横向引用，需要用Params替换。         * 不需要重新构建TargetEntry节点，因此可以将其单独应用于每个链表项。         */        if (path->param_info)            node = replace_nestloop_params(root, node);        tle = makeTargetEntry((Expr *) node,                              resno,                              NULL,                              false);        if (sortgrouprefs)            tle->ressortgroupref = sortgrouprefs[resno - 1];        tlist = lappend(tlist, tle);        resno++;    }    return tlist;}

三、跟踪分析

测试脚本如下

testdb=# explain select dw.*,grjf.grbh,grjf.xm,grjf.ny,grjf.je testdb-# from t_dwxx dw,lateral (select gr.grbh,gr.xm,jf.ny,jf.je testdb(#                         from t_grxx gr inner join t_jfxx jf testdb(#                                        on gr.dwbh = dw.dwbh testdb(#                                           and gr.grbh = jf.grbh) grjftestdb-# order by dw.dwbh;                                        QUERY PLAN                                        ------------------------------------------------------------------------------------------ Sort  (cost=20070.93..20320.93 rows=100000 width=47)   Sort Key: dw.dwbh   ->  Hash Join  (cost=3754.00..8689.61 rows=100000 width=47)         Hash Cond: ((gr.dwbh)::text = (dw.dwbh)::text)         ->  Hash Join  (cost=3465.00..8138.00 rows=100000 width=31)               Hash Cond: ((jf.grbh)::text = (gr.grbh)::text)               ->  Seq Scan on t_jfxx jf  (cost=0.00..1637.00 rows=100000 width=20)               ->  Hash  (cost=1726.00..1726.00 rows=100000 width=16)                     ->  Seq Scan on t_grxx gr  (cost=0.00..1726.00 rows=100000 width=16)         ->  Hash  (cost=164.00..164.00 rows=10000 width=20)               ->  Seq Scan on t_dwxx dw  (cost=0.00..164.00 rows=10000 width=20)(11 rows)

启动gdb,设置断点,进入create_projection_plan函数

(gdb) b create_projection_planBreakpoint 2 at 0x7b95a8: file createplan.c, line 1627.(gdb) cContinuing.Breakpoint 2, create_projection_plan (root=0x26c1258, best_path=0x2722d00, flags=1) at createplan.c:16271627        bool        needs_result_node = false;

转换subpath为Plan,并确定是否需要Result节点,并且判断是否需要生成Result节点

...(gdb) n1642        if (use_physical_tlist(root, &best_path->path, flags))(gdb) n1655        else if (is_projection_capable_path(best_path->subpath))(gdb) 1673            subplan = create_plan_recurse(root, best_path->subpath, 0);

查看best_path&best_path->subpath变量

(gdb) p *best_path$3 = {path = {type = T_ProjectionPath, pathtype = T_Result, parent = 0x2722998, pathtarget = 0x27226f8, param_info = 0x0,     parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, startup_cost = 20070.931487218411,     total_cost = 20320.931487218411, pathkeys = 0x26cfe98}, subpath = 0x2722c68, dummypp = true}(gdb) p *(SortPath *)best_path->subpath$16 = {path = {type = T_SortPath, pathtype = T_Sort, parent = 0x2722998, pathtarget = 0x27212d8, param_info = 0x0,     parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, startup_cost = 20070.931487218411,     total_cost = 20320.931487218411, pathkeys = 0x26cfe98}, subpath = 0x2721e60}

创建subpath(SortPath)的执行计划

(gdb) stepcreate_plan_recurse (root=0x26c1258, best_path=0x2722c68, flags=0) at createplan.c:364364     check_stack_depth();(gdb) n366     switch (best_path->pathtype)(gdb) 447             plan = (Plan *) create_sort_plan(root,

进入create_sort_plan

(gdb) stepcreate_sort_plan (root=0x26c1258, best_path=0x2722c68, flags=0) at createplan.c:17591759        subplan = create_plan_recurse(root, best_path->subpath,

SortPath的subpath是HashPath

(gdb) p best_path->subpath->type$17 = T_HashPath(gdb) p *(HashPath *)best_path->subpath$18 = {jpath = {path = {type = T_HashPath, pathtype = T_HashJoin, parent = 0x27210c0, pathtarget = 0x27212d8,       param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000,       startup_cost = 3754, total_cost = 8689.6112499999999, pathkeys = 0x0}, jointype = JOIN_INNER, inner_unique = true,     outerjoinpath = 0x2720f68, innerjoinpath = 0x26d0598, joinrestrictinfo = 0x2722068}, path_hashclauses = 0x27223c0,   num_batches = 1, inner_rows_total = 10000}

完成SortPath执行计划的构建

(gdb) 1774        return plan;(gdb) 1775    }(gdb) p *plan$20 = {plan = {type = T_Sort, startup_cost = 20070.931487218411, total_cost = 20320.931487218411, plan_rows = 100000,     plan_width = 47, parallel_aware = false, parallel_safe = true, plan_node_id = 0, targetlist = 0x2723208, qual = 0x0,     lefttree = 0x27243d0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}, numCols = 1,   sortColIdx = 0x27222a0, sortOperators = 0x2724468, collations = 0x2724488, nullsFirst = 0x27244a8}

回到上一层

(gdb) ncreate_plan_recurse (root=0x26c1258, best_path=0x2722c68, flags=0) at createplan.c:450450             break;

回到create_projection_plan函数

(gdb) n504     return plan;(gdb) 505 }(gdb) create_projection_plan (root=0x26c1258, best_path=0x2722d00, flags=1) at createplan.c:16741674            tlist = build_path_tlist(root, &best_path->path);

执行完毕,返回create_plan,结果,最外层的Plan为Sort

(gdb) 1708        return plan;(gdb) 1709    }(gdb) p *plan$22 = {type = T_Sort, startup_cost = 20070.931487218411, total_cost = 20320.931487218411, plan_rows = 100000,   plan_width = 47, parallel_aware = false, parallel_safe = true, plan_node_id = 0, targetlist = 0x2724548, qual = 0x0,   lefttree = 0x27243d0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}(gdb) ncreate_plan_recurse (root=0x26c1258, best_path=0x2722d00, flags=1) at createplan.c:504504     return plan;(gdb) p *plan$23 = {type = T_Sort, startup_cost = 20070.931487218411, total_cost = 20320.931487218411, plan_rows = 100000,   plan_width = 47, parallel_aware = false, parallel_safe = true, plan_node_id = 0, targetlist = 0x2724548, qual = 0x0,   lefttree = 0x27243d0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}(gdb) n505 }(gdb) create_plan (root=0x26c1258, best_path=0x2722d00) at createplan.c:329329     if (!IsA(plan, ModifyTable))

感谢各位的阅读，以上就是"PostgreSQL中create_sort_plan函数实现逻辑是什么"的内容了，经过本文的学习后，相信大家对PostgreSQL中create_sort_plan函数实现逻辑是什么这一问题有了更深刻的体会，具体使用情况还需要大家实践验证。这里是，小编将为大家推送更多相关知识点的文章，欢迎关注！

很赞哦！