千家信息网

PostgreSQL 源码解读(232)- 查询#125(NOT IN实现#3)

发表于:2025-02-14 作者:千家信息网编辑
千家信息网最后更新 2025年02月14日,本节介绍了PostgreSQL含有NOT IN查询语句实现ExecMaterial函数中部分依赖的函数。一、数据结构SubPlanState子计划运行期状态/* ---------------- *
千家信息网最后更新 2025年02月14日PostgreSQL 源码解读(232)- 查询#125(NOT IN实现#3)

本节介绍了PostgreSQL含有NOT IN查询语句实现ExecMaterial函数中部分依赖的函数。

一、数据结构

SubPlanState
子计划运行期状态

/* ---------------- *        SubPlanState node * ---------------- */typedef struct SubPlanState{    NodeTag        type;    SubPlan    *subplan;        /* expression plan node */    struct PlanState *planstate;    /* subselect plan's state tree */    struct PlanState *parent;    /* parent plan node's state tree */    ExprState  *testexpr;        /* 组合表达式状态;state of combining expression */    List       *args;            /* 参数表达式状态;states of argument _expression(s) */    HeapTuple    curTuple;        /* subplan最近的元组;copy of most recent tuple from subplan */    Datum        curArray;        /* most recent array from ARRAY() subplan */    /* these are used when hashing the subselect's output: */    TupleDesc    descRight;        /* 投影后的子查询描述符;subselect desc after projection */    ProjectionInfo *projLeft;    /* for projecting lefthand exprs */    ProjectionInfo *projRight;    /* for projecting subselect output */    TupleHashTable hashtable;    /* hash table for no-nulls subselect rows */    TupleHashTable hashnulls;    /* hash table for rows with null(s) */    bool        havehashrows;    /* true if hashtable is not empty */    bool        havenullrows;    /* true if hashnulls is not empty */    MemoryContext hashtablecxt; /* memory context containing hash tables */    MemoryContext hashtempcxt;    /* temp memory context for hash tables */    ExprContext *innerecontext; /* econtext for computing inner tuples */    AttrNumber *keyColIdx;        /* control data for hash tables */    Oid           *tab_eq_funcoids;    /* equality func oids for table                                     * datatype(s) */    Oid           *tab_collations; /* collations for hash and comparison */    FmgrInfo   *tab_hash_funcs; /* hash functions for table datatype(s) */    FmgrInfo   *tab_eq_funcs;    /* equality functions for table datatype(s) */    FmgrInfo   *lhs_hash_funcs; /* hash functions for lefthand datatype(s) */    FmgrInfo   *cur_eq_funcs;    /* equality functions for LHS vs. table */    ExprState  *cur_eq_comp;    /* equality comparator for LHS vs. table */} SubPlanState;

SubPlan
子查询计划

/* * SubPlan - executable expression node for a subplan (sub-SELECT) * * The planner replaces SubLink nodes in expression trees with SubPlan * nodes after it has finished planning the subquery.  SubPlan references * a sub-plantree stored in the subplans list of the toplevel PlannedStmt. * (We avoid a direct link to make it easier to copy expression trees * without causing multiple processing of the subplan.) * 查询规划器在完成子查询的规划后使用SubPlan节点替换表达式树中的SubLink节点。 * SubPlan引用了存储在高层PlannedStmt中的subplans链表中的sub-plantree。 * (避免使用直接链接,从而使得拷贝表达式树相对比较简单) * * In an ordinary subplan, testexpr points to an executable expression * (OpExpr, an AND/OR tree of OpExprs, or RowCompareExpr) for the combining * operator(s); the left-hand arguments are the original lefthand expressions, * and the right-hand arguments are PARAM_EXEC Param nodes representing the * outputs of the sub-select.  (NOTE: runtime coercion functions may be * inserted as well.)  This is just the same expression tree as testexpr in * the original SubLink node, but the PARAM_SUBLINK nodes are replaced by * suitably numbered PARAM_EXEC nodes. * 常规情况下,testexpr指向用于组合操作的可执行表达式(OpExpr、OpExprs的AND/OR树或者RowCompareExpr); * 左参数是原始的左表达式,右参数是PARAM_EXEC参数节点用以表示子查询的输出。 * 与原始SubLink节点的testexpr具有相同的表达式树,但PARAM_SUBLINK节点则使用合适的已编号PARAM_EXEC节点替代。 * * If the sub-select becomes an initplan rather than a subplan, the executable * expression is part of the outer plan's expression tree (and the SubPlan * node itself is not, but rather is found in the outer plan's initPlan * list).  In this case testexpr is NULL to avoid duplication. * 如果子查询成了initplan而不是subplan,可执行的表达式是外层plan表达式树的一部分。 * 这种情况下,testexpr为NULL以避免重复。 * * The planner also derives lists of the values that need to be passed into * and out of the subplan.  Input values are represented as a list "args" of * expressions to be evaluated in the outer-query context (currently these * args are always just Vars, but in principle they could be any expression). * The values are assigned to the global PARAM_EXEC params indexed by parParam * (the parParam and args lists must have the same ordering).  setParam is a * list of the PARAM_EXEC params that are computed by the sub-select, if it * is an initplan; they are listed in order by sub-select output column * position.  (parParam and setParam are integer Lists, not Bitmapsets, * because their ordering is significant.) * 规划器还派生了需要传入和传出子计划的值的链表。 * 输入值标识位表达式的"args"链表,在外层查询上下文中进行解析。 * (这些args通常是Vars,但原则上它们可以是任意表达式) * 这些值以parParam为索引给全局PARAM_EXEC参数赋值。 * setParam是PARAM_EXEC参数链表,通过子查询(如为initplan)计算所得。 * 它们按子查询输出列的位置进行排序组织为链表形式。 * (parParam和setParam是整型链表,而不是Bitmapsets链表) * * Also, the planner computes startup and per-call costs for use of the * SubPlan.  Note that these include the cost of the subquery proper, * evaluation of the testexpr if any, and any hashtable management overhead. * 同时,规划器计算SubPlan启动和每次调用的成本。注意:包括子查询正常解析testexpr的成本以及哈希表管理成本。 */typedef struct SubPlan{    Expr        xpr;//表达式    /* Fields copied from original SubLink: */    //从SubLink中拷贝而来    SubLinkType subLinkType;    /* see above */    /* The combining operators, transformed to an executable expression: */    //组合操作符,转换为可执行的表达式    Node       *testexpr;        /* OpExpr or RowCompareExpr expression tree */    List       *paramIds;        /* 参数IDs;IDs of Params embedded in the above */    /* Identification of the Plan tree to use: */    //Plan tree标识    int            plan_id;        /* Index (from 1) in PlannedStmt.subplans */    /* Identification of the SubPlan for EXPLAIN and debugging purposes: */    //EXPLAIN和debug目的的SubPlan标识    char       *plan_name;        /* A name assigned during planning */    /* Extra data useful for determining subplan's output type: */    //用于确定subplan输出类型的额外信息    Oid            firstColType;    /* subplan结果的第一个列类型;Type of first column of subplan result */    int32        firstColTypmod; /* 第一列的Typmod;Typmod of first column of subplan result */    Oid            firstColCollation;    /* 第一列的Collation;Collation of first column of subplan                                     * result */    /* Information about execution strategy: */    //执行阶段的相关信息    bool        useHashTable;    /* 是否使用哈希表存储子查询输出;true to store subselect output in a hash                                 * table (implies we are doing "IN") */    bool        unknownEqFalse; /* 如OK为T,如为未知则为F;快速处理null值;true if it's okay to return FALSE when the                                 * spec result is UNKNOWN; this allows much                                 * simpler handling of null values */    bool        parallel_safe;    /* 是否并行安全?is the subplan parallel-safe? */    /* Note: parallel_safe does not consider contents of testexpr or args */    /* Information for passing params into and out of the subselect: */    //用于给子查询传入和传出参数的信息    /* setParam and parParam are lists of integers (param IDs) */    //setParam和parParam是整型链表(param IDs)    List       *setParam;        /* initplan subqueries have to set these                                 * Params for parent plan */    List       *parParam;        /* indices of input Params from parent plan */    List       *args;            /* 以parParam值进行传递的表达式;exprs to pass as parParam values */    /* Estimated execution costs: */    //估算执行成本    Cost        startup_cost;    /* one-time setup cost */    Cost        per_call_cost;    /* cost for each subplan evaluation */} SubPlan;

SubLinkType
SubLink类型

/* * SubLink * * A SubLink represents a subselect appearing in an expression, and in some * cases also the combining operator(s) just above it.  The subLinkType * indicates the form of the expression represented: *    EXISTS_SUBLINK        EXISTS(SELECT ...) *    ALL_SUBLINK            (lefthand) op ALL (SELECT ...) *    ANY_SUBLINK            (lefthand) op ANY (SELECT ...) *    ROWCOMPARE_SUBLINK    (lefthand) op (SELECT ...) *    EXPR_SUBLINK        (SELECT with single targetlist item ...) *    MULTIEXPR_SUBLINK    (SELECT with multiple targetlist items ...) *    ARRAY_SUBLINK        ARRAY(SELECT with single targetlist item ...) *    CTE_SUBLINK            WITH query (never actually part of an expression) *  我们使用SubLink表示在表达式中出现的子查询,在某些情况下组合操作符会出现在SubLink之上。 *  subLinkType表示表达式的形式: *    EXISTS_SUBLINK        EXISTS(SELECT ...) *    ALL_SUBLINK            (lefthand) op ALL (SELECT ...) *    ANY_SUBLINK            (lefthand) op ANY (SELECT ...) *    ROWCOMPARE_SUBLINK    (lefthand) op (SELECT ...) *    EXPR_SUBLINK        (SELECT with single targetlist item ...) *    MULTIEXPR_SUBLINK    (SELECT with multiple targetlist items ...) *    ARRAY_SUBLINK        ARRAY(SELECT with single targetlist item ...) *    CTE_SUBLINK            WITH query (never actually part of an expression)  * * For ALL, ANY, and ROWCOMPARE, the lefthand is a list of expressions of the * same length as the subselect's targetlist.  ROWCOMPARE will *always* have * a list with more than one entry; if the subselect has just one target * then the parser will create an EXPR_SUBLINK instead (and any operator * above the subselect will be represented separately). * ROWCOMPARE, EXPR, and MULTIEXPR require the subselect to deliver at most * one row (if it returns no rows, the result is NULL). * ALL, ANY, and ROWCOMPARE require the combining operators to deliver boolean * results.  ALL and ANY combine the per-row results using AND and OR * semantics respectively. * ARRAY requires just one target column, and creates an array of the target * column's type using any number of rows resulting from the subselect. * 对于ALL,ANY和ROWCOMPARE,左操作符是与子查询目标链表长度一致的表达式链表。 * ROWCOMPARE通常有超过一个条目的链表;如果子查询刚好只有一个目标列,那么解析器会创建EXPR_SUBLINK * (同时所有在子查询之上的操作符会单独表示) * ROWCOMPARE, EXPR, 和MULTIEXPR要求子查询至少输出一行(如返回0行,则结果为NULL)。 * ALL,ANY和ROWCOMPARE要求组合操作符输出布尔型结果。 * ALL/ANY使用AND/OR语义来组合每一行的结果。 * * SubLink is classed as an Expr node, but it is not actually executable; * it must be replaced in the expression tree by a SubPlan node during * planning. * SubLink归类为Expr节点,但实际上并不是可执行的,必须在计划阶段通过SubPlan替代。 * * NOTE: in the raw output of gram.y, testexpr contains just the raw form * of the lefthand _expression (if any), and operName is the String name of * the combining operator.  Also, subselect is a raw parsetree.  During parse * analysis, the parser transforms testexpr into a complete boolean expression * that compares the lefthand value(s) to PARAM_SUBLINK nodes representing the * output columns of the subselect.  And subselect is transformed to a Query. * This is the representation seen in saved rules and in the rewriter. * 注意:在gram.y的裸输出中,testexpr只包含左表达式的裸形式,operName是组合操作符的字符串名称。 * 同时,子查询是裸parsetree。在解析分析期间, * 解析器转换testexpr为完整的布尔表达式用于比较左操作符值与PARAM_SUBLINK节点所代表的子查询输出列值。 * 子查询会转换为Query结构体。 * 在已存储的规则和重写时可见的表示形式。 * * In EXISTS, EXPR, MULTIEXPR, and ARRAY SubLinks, testexpr and operName * are unused and are always null. * 在EXISTS/EXPR/MULTEXPR/ARRAY SubLinks中,testexpr和operName不再使用通常是NULL值。 * * subLinkId is currently used only for MULTIEXPR SubLinks, and is zero in * other SubLinks.  This number identifies different multiple-assignment * subqueries within an UPDATE statement's SET list.  It is unique only * within a particular targetlist.  The output column(s) of the MULTIEXPR * are referenced by PARAM_MULTIEXPR Params appearing elsewhere in the tlist. * subLinkId当前只用于MULTIEXPR,在其他SubLinks中取值为0. * 该数字标识了在UPDATE语句SET链表中不同的多个赋值子查询。 * 只有在特定的targetlist内是唯一的。 * 出现在tlist其他地方的PARAM_MULTIEXPR参数依赖于MULTIEXPR的输出列。 * * The CTE_SUBLINK case never occurs in actual SubLink nodes, but it is used * in SubPlans generated for WITH subqueries. * CTE_SUBLINK不会出现在实际的SubLink节点中,但用于WITH子查询所产生的SubPlans中。 */typedef enum SubLinkType{    EXISTS_SUBLINK,    ALL_SUBLINK,    ANY_SUBLINK,    ROWCOMPARE_SUBLINK,    EXPR_SUBLINK,    MULTIEXPR_SUBLINK,    ARRAY_SUBLINK,    CTE_SUBLINK                    /* 仅用于SubPlans中;for SubPlans only */} SubLinkType;

SubLink
SubLink结构体

typedef struct SubLink{    Expr        xpr;    SubLinkType subLinkType;    /* see above */    int            subLinkId;        /* ID (1..n); 0 if not MULTIEXPR */    Node       *testexpr;        /* outer-query test for ALL/ANY/ROWCOMPARE */    List       *operName;        /* originally specified operator name */    Node       *subselect;        /* subselect as Query* or raw parsetree */    int            location;        /* token location, or -1 if unknown */} SubLink;

MaterialState
Material状态

/* ---------------- *     MaterialState information * *        materialize nodes are used to materialize the results *        of a subplan into a temporary file. *        materialize节点用于物化subplan的结果为临时文件。 * *        ss.ss_ScanTupleSlot refers to output of underlying plan. *        ss.ss_ScanTupleSlot指向underlyling plan的输出(subplan) * ---------------- */typedef struct MaterialState{    ScanState    ss;                /* its first field is NodeTag */    int            eflags;            /* 传递给tuplestore的capability标记;capability flags to pass to tuplestore */    bool        eof_underlying; /* 已经到达underlying plan的末尾?reached end of underlying plan? */    Tuplestorestate *tuplestorestate;} MaterialState;

二、源码解读

ExecMaterial
执行物化操作。

/* ---------------------------------------------------------------- *        ExecMaterial * *        As long as we are at the end of the data collected in the tuplestore, *        we collect one new row from the subplan on each call, and stash it *        aside in the tuplestore before returning it.  The tuplestore is *        only read if we are asked to scan backwards, rescan, or mark/restore. *      只要在tuplestore中数据收集结束时,就会在每次调用时从subplan中收集一条新行, *      并在返回之前将其保存在tuplestore中。 *      只要在往后扫描、重新扫描或标记/恢复时tuplestore才会读取。 * * ---------------------------------------------------------------- */static TupleTableSlot *            /* 从subplan中返回的结果;result tuple from subplan */ExecMaterial(PlanState *pstate){    MaterialState *node = castNode(MaterialState, pstate);//物化节点    EState       *estate;//运行期状态    ScanDirection dir;//扫描方向    bool        forward;//是否往前扫描    Tuplestorestate *tuplestorestate;//Tuplestorestate结构体指针    bool        eof_tuplestore;//是否完成?    TupleTableSlot *slot;//存储元组的slot    CHECK_FOR_INTERRUPTS();    /*     * get state info from node     * 从物化节点中获取相关信息     */    estate = node->ss.ps.state;    dir = estate->es_direction;//方向    forward = ScanDirectionIsForward(dir);//是否往前扫描    tuplestorestate = node->tuplestorestate;    /*     * If first time through, and we need a tuplestore, initialize it.     * 第一次,需要tuplestore并初始化     */    if (tuplestorestate == NULL && node->eflags != 0)    {        tuplestorestate = tuplestore_begin_heap(true, false, work_mem);        tuplestore_set_eflags(tuplestorestate, node->eflags);        if (node->eflags & EXEC_FLAG_MARK)        {            /*             * Allocate a second read pointer to serve as the mark. We know it             * must have index 1, so needn't store that.             * 分配用于mark的读指针             */            int            ptrno PG_USED_FOR_ASSERTS_ONLY;            ptrno = tuplestore_alloc_read_pointer(tuplestorestate,                                                  node->eflags);            Assert(ptrno == 1);        }        node->tuplestorestate = tuplestorestate;    }    /*     * If we are not at the end of the tuplestore, or are going backwards, try     * to fetch a tuple from tuplestore.     * 如果不在tuplestore的末尾或者正在往后扫描,尝试从tuplestore中提取一个元组     */    eof_tuplestore = (tuplestorestate == NULL) ||        tuplestore_ateof(tuplestorestate);    if (!forward && eof_tuplestore)    {        if (!node->eof_underlying)        {            /*             * When reversing direction at tuplestore EOF, the first             * gettupleslot call will fetch the last-added tuple; but we want             * to return the one before that, if possible. So do an extra             * fetch.             * 在EOF处反转方向,第一次的gettupleslot调用会提取最后添加的元组;             * 但如可能,希望返回在此之前的元组,执行额外的提取操作。             */            if (!tuplestore_advance(tuplestorestate, forward))                return NULL;    /* the tuplestore must be empty */        }        eof_tuplestore = false;    }    /*     * If we can fetch another tuple from the tuplestore, return it.     * 如能从tuplestore中提取另外一个tuple,返回     */    slot = node->ss.ps.ps_ResultTupleSlot;    if (!eof_tuplestore)    {        if (tuplestore_gettupleslot(tuplestorestate, forward, false, slot))            return slot;        if (forward)            eof_tuplestore = true;    }    /*     * If necessary, try to fetch another row from the subplan.     * 如需要(tuplestore末尾),尝试从subplan中提取另外一行     *     * Note: the eof_underlying state variable exists to short-circuit further     * subplan calls.  It's not optional, unfortunately, because some plan     * node types are not robust about being called again when they've already     * returned NULL.     */    if (eof_tuplestore && !node->eof_underlying)    {        PlanState  *outerNode;        TupleTableSlot *outerslot;        /*         * We can only get here with forward==true, so no need to worry about         * which direction the subplan will go.         */        outerNode = outerPlanState(node);        outerslot = ExecProcNode(outerNode);        if (TupIsNull(outerslot))        {            node->eof_underlying = true;            return NULL;        }        /*         * Append a copy of the returned tuple to tuplestore.  NOTE: because         * the tuplestore is certainly in EOF state, its read position will         * move forward over the added tuple.  This is what we want.         * 追加返回的元组到tuplestore中。         * 注意:因为tuplestore当前处于EOF状态,读取的位置会前移至已添加的tuple前面,这是我们希望看到的。         */        if (tuplestorestate)            tuplestore_puttupleslot(tuplestorestate, outerslot);        ExecCopySlot(slot, outerslot);        return slot;    }    /*     * Nothing left ...     */    return ExecClearTuple(slot);}

tuplestore_begin_heap
初始化tuplestore

/* * tuplestore_begin_heap * * Create a new tuplestore; other types of tuple stores (other than * "heap" tuple stores, for heap tuples) are possible, but not presently * implemented. * 创建新的tuplestore:目前仅实现了heap tuples。 * * randomAccess: if true, both forward and backward accesses to the * tuple store are allowed. * randomAccess : 如为T,支持往前和往后访问。 * * interXact: if true, the files used for on-disk storage persist beyond the * end of the current transaction.  NOTE: It's the caller's responsibility to * create such a tuplestore in a memory context and resource owner that will * also survive transaction boundaries, and to ensure the tuplestore is closed * when it's no longer wanted. * interXact : 如为T,磁盘上的存储文件在当前事务结束后也会一直保持。 * 注意:调用者有责任在事务边界内存活的内存上下文和资源拥有者中创建tuplestore并确保不再使用时销毁tuplestore。 * * maxKBytes: how much data to store in memory (any data beyond this * amount is paged to disk).  When in doubt, use work_mem. * maxKBytes:有多少数据需要存储到内存中(超长此大小的会分页到磁盘上)。 * 如存在问题,则使用work_mem。 */Tuplestorestate *tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes){    Tuplestorestate *state;    int            eflags;    /*     * This interpretation of the meaning of randomAccess is compatible with     * the pre-8.3 behavior of tuplestores.     */    eflags = randomAccess ?        (EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :        (EXEC_FLAG_REWIND);    state = tuplestore_begin_common(eflags, interXact, maxKBytes);    state->copytup = copytup_heap;    state->writetup = writetup_heap;    state->readtup = readtup_heap;    return state;}/* *        tuplestore_begin_xxx * * Initialize for a tuple store operation. * 初始化tuplestore */static Tuplestorestate *tuplestore_begin_common(int eflags, bool interXact, int maxKBytes){    Tuplestorestate *state;    state = (Tuplestorestate *) palloc0(sizeof(Tuplestorestate));    state->status = TSS_INMEM;    state->eflags = eflags;    state->interXact = interXact;    state->truncated = false;    state->allowedMem = maxKBytes * 1024L;    state->availMem = state->allowedMem;    state->myfile = NULL;    state->context = CurrentMemoryContext;    state->resowner = CurrentResourceOwner;    state->memtupdeleted = 0;    state->memtupcount = 0;    state->tuples = 0;    /*     * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;     * see comments in grow_memtuples().     */    state->memtupsize = Max(16384 / sizeof(void *),                            ALLOCSET_SEPARATE_THRESHOLD / sizeof(void *) + 1);    state->growmemtuples = true;    state->memtuples = (void **) palloc(state->memtupsize * sizeof(void *));    USEMEM(state, GetMemoryChunkSpace(state->memtuples));    state->activeptr = 0;    state->readptrcount = 1;    state->readptrsize = 8;        /* arbitrary */    state->readptrs = (TSReadPointer *)        palloc(state->readptrsize * sizeof(TSReadPointer));    state->readptrs[0].eflags = eflags;    state->readptrs[0].eof_reached = false;    state->readptrs[0].current = 0;    return state;}

tuplestore_advance
从tuplestore前进一行

/* * tuplestore_advance - exported function to adjust position without fetching * * We could optimize this case to avoid palloc/pfree overhead, but for the * moment it doesn't seem worthwhile. */booltuplestore_advance(Tuplestorestate *state, bool forward){    void       *tuple;    bool        should_free;    tuple = tuplestore_gettuple(state, forward, &should_free);    if (tuple)    {        if (should_free)            pfree(tuple);        return true;    }    else    {        return false;    }}

tuplestore_gettupleslot
获取slot

/* * tuplestore_gettupleslot - exported function to fetch a MinimalTuple * 提取MinimalTuple * * If successful, put tuple in slot and return true; else, clear the slot * and return false. * 如成功,则把元组塞进slot中并返回T,否则清空slot返回F * * If copy is true, the slot receives a copied tuple (allocated in current * memory context) that will stay valid regardless of future manipulations of * the tuplestore's state.  If copy is false, the slot may just receive a * pointer to a tuple held within the tuplestore.  The latter is more * efficient but the slot contents may be corrupted if additional writes to * the tuplestore occur.  (If using tuplestore_trim, see comments therein.) * 如copy为T,则slot会接收拷贝之后的元组,独立于tuplestore的状态. * 如copy为F,则slot可能接收到tuplestore中的元组指针. */booltuplestore_gettupleslot(Tuplestorestate *state, bool forward,                        bool copy, TupleTableSlot *slot){    MinimalTuple tuple;    bool        should_free;    tuple = (MinimalTuple) tuplestore_gettuple(state, forward, &should_free);    if (tuple)    {        if (copy && !should_free)        {            tuple = heap_copy_minimal_tuple(tuple);            should_free = true;        }        ExecStoreMinimalTuple(tuple, slot, should_free);        return true;    }    else    {        ExecClearTuple(slot);        return false;    }}

tuplestore_gettuple
返回下一个元组

/* * Fetch the next tuple in either forward or back direction. * Returns NULL if no more tuples.  If should_free is set, the * caller must pfree the returned tuple when done with it. * 往前/后返回下一个元组。 * 如无更多元组,返回NULL。如should_free有值,调用者必须在处理完毕后释放返回的元组 * * Backward scan is only allowed if randomAccess was set true or * EXEC_FLAG_BACKWARD was specified to tuplestore_set_eflags(). * 在randomAccess设置为T或者指定EXEC_FLAG_BACKWARD时才允许。 */static void *tuplestore_gettuple(Tuplestorestate *state, bool forward,                    bool *should_free){    TSReadPointer *readptr = &state->readptrs[state->activeptr];//读取指针    unsigned int tuplen;    void       *tup;    Assert(forward || (readptr->eflags & EXEC_FLAG_BACKWARD));    switch (state->status)    {        case TSS_INMEM://内存中            *should_free = false;            if (forward)            {                if (readptr->eof_reached)                    return NULL;                if (readptr->current < state->memtupcount)                {                    /* We have another tuple, so return it */                    return state->memtuples[readptr->current++];                }                readptr->eof_reached = true;                return NULL;            }            else            {                /*                 * if all tuples are fetched already then we return last                 * tuple, else tuple before last returned.                 */                if (readptr->eof_reached)                {                    readptr->current = state->memtupcount;                    readptr->eof_reached = false;                }                else                {                    if (readptr->current <= state->memtupdeleted)                    {                        Assert(!state->truncated);                        return NULL;                    }                    readptr->current--; /* last returned tuple */                }                if (readptr->current <= state->memtupdeleted)                {                    Assert(!state->truncated);                    return NULL;                }                return state->memtuples[readptr->current - 1];            }            break;        case TSS_WRITEFILE://写文件            /* Skip state change if we'll just return NULL */            //如只需要返回NULL则跳过状态变换            if (readptr->eof_reached && forward)                return NULL;            /*             * Switch from writing to reading.             * 从写切换至读             */            BufFileTell(state->myfile,                        &state->writepos_file, &state->writepos_offset);            if (!readptr->eof_reached)                if (BufFileSeek(state->myfile,                                readptr->file, readptr->offset,                                SEEK_SET) != 0)                    ereport(ERROR,                            (errcode_for_file_access(),                             errmsg("could not seek in tuplestore temporary file: %m")));            state->status = TSS_READFILE;            /* FALLTHROUGH */            //进入读文件状态的处理逻辑        case TSS_READFILE:            *should_free = true;            if (forward)            {                //往前读                if ((tuplen = getlen(state, true)) != 0)                {                    tup = READTUP(state, tuplen);                    return tup;                }                else                {                    readptr->eof_reached = true;                    return NULL;                }            }            /*             * Backward.             * 往后读             *             * if all tuples are fetched already then we return last tuple,             * else tuple before last returned.             * 如果所有元组时已提取,则返回最后一个元组,否则返回先前最后返回的元组             *             * Back up to fetch previously-returned tuple's ending length             * word. If seek fails, assume we are at start of file.             * 往回向上提取先前已返回的元组结束长度字,如检索失败,假定处于文件的开始位置.             */            if (BufFileSeek(state->myfile, 0, -(long) sizeof(unsigned int),                            SEEK_CUR) != 0)            {                /* even a failed backwards fetch gets you out of eof state */                readptr->eof_reached = false;                Assert(!state->truncated);                return NULL;            }            tuplen = getlen(state, false);            if (readptr->eof_reached)            {                readptr->eof_reached = false;                /* We will return the tuple returned before returning NULL */                //在返回NULL前返回先前已返回的元组            }            else            {                /*                 * Back up to get ending length word of tuple before it.                 * 获取结束长度字                 */                if (BufFileSeek(state->myfile, 0,                                -(long) (tuplen + 2 * sizeof(unsigned int)),                                SEEK_CUR) != 0)                {                    /*                     * If that fails, presumably the prev tuple is the first                     * in the file.  Back up so that it becomes next to read                     * in forward direction (not obviously right, but that is                     * what in-memory case does).                     */                    if (BufFileSeek(state->myfile, 0,                                    -(long) (tuplen + sizeof(unsigned int)),                                    SEEK_CUR) != 0)                        ereport(ERROR,                                (errcode_for_file_access(),                                 errmsg("could not seek in tuplestore temporary file: %m")));                    Assert(!state->truncated);                    return NULL;                }                tuplen = getlen(state, false);            }            /*             * Now we have the length of the prior tuple, back up and read it.             * Note: READTUP expects we are positioned after the initial             * length word of the tuple, so back up to that point.             * 已获得优先元组的长度,读取之.             */            if (BufFileSeek(state->myfile, 0,                            -(long) tuplen,                            SEEK_CUR) != 0)                ereport(ERROR,                        (errcode_for_file_access(),                         errmsg("could not seek in tuplestore temporary file: %m")));            tup = READTUP(state, tuplen);            return tup;        default:            elog(ERROR, "invalid tuplestore state");            return NULL;        /* keep compiler quiet */    }}

三、跟踪分析

执行SQL:

[pg12@localhost ~]$ psql -d testdbTiming is on.Expanded display is used automatically.psql (12.0)Type "help" for help.[local]:5432 pg12@testdb=# [local]:5432 pg12@testdb=# select * from tbl; id | value ----+-------  1 |     2(1 row)Time: 2.678 ms[local]:5432 pg12@testdb=# select count(*) from t_big_null;  count   ---------- 10000001(1 row)Time: 679.972 ms[local]:5432 pg12@testdb=# analyze tbl;ANALYZETime: 64.442 ms[local]:5432 pg12@testdb=# analyze t_big_null;ANALYZETime: 434.702 ms[local]:5432 pg12@testdb=# [local]:5432 pg12@testdb=# select pg_backend_pid(); pg_backend_pid ----------------          18758(1 row)Time: 1.990 ms[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);

启动gdb跟踪

(gdb) b ExecMaterialBreakpoint 1 at 0x720edb: file nodeMaterial.c, line 41.(gdb) cContinuing.Breakpoint 1, ExecMaterial (pstate=0x1230128) at nodeMaterial.c:4141        MaterialState *node = castNode(MaterialState, pstate);(gdb)

单步调试

(gdb) n49        CHECK_FOR_INTERRUPTS();(gdb) 54        estate = node->ss.ps.state;(gdb) 55        dir = estate->es_direction;(gdb) 56        forward = ScanDirectionIsForward(dir);(gdb) 57        tuplestorestate = node->tuplestorestate;(gdb) 62        if (tuplestorestate == NULL && node->eflags != 0)(gdb) 64            tuplestorestate = tuplestore_begin_heap(true, false, work_mem);(gdb) 65            tuplestore_set_eflags(tuplestorestate, node->eflags);(gdb) 66            if (node->eflags & EXEC_FLAG_MARK)(gdb) 78            node->tuplestorestate = tuplestorestate;(gdb) 85        eof_tuplestore = (tuplestorestate == NULL) ||(gdb) 86            tuplestore_ateof(tuplestorestate);(gdb) 85        eof_tuplestore = (tuplestorestate == NULL) ||(gdb) 88        if (!forward && eof_tuplestore)(gdb) p eof_tuplestore$1 = false(gdb)

进入tuplestore_gettupleslot

(gdb) n107        slot = node->ss.ps.ps_ResultTupleSlot;(gdb) 108        if (!eof_tuplestore)(gdb) 110            if (tuplestore_gettupleslot(tuplestorestate, forward, false, slot))(gdb) steptuplestore_gettupleslot (state=0x3069c18, forward=true, copy=false, slot=0x30687a8)    at tuplestore.c:10841084        tuple = (MinimalTuple) tuplestore_gettuple(state, forward, &should_free);(gdb)

进入tuplestore_gettuple

(gdb) steptuplestore_gettuple (state=0x3069c18, forward=true, should_free=0x7ffd18474ff7)    at tuplestore.c:906906        TSReadPointer *readptr = &state->readptrs[state->activeptr];(gdb)

tuplestore_gettuple->文件读写指针信息

(gdb) n910        Assert(forward || (readptr->eflags & EXEC_FLAG_BACKWARD));(gdb) p *readptr$2 = {eflags = 2, eof_reached = false, current = 0, file = 2139062143,   offset = 9187201950435737471}

tuplestore_gettuple->当前状态为TSS_INMEM

(gdb) n912        switch (state->status)(gdb) p *state$3 = {status = TSS_INMEM, eflags = 2, backward = false, interXact = false,   truncated = false, availMem = 4177896, allowedMem = 4194304, tuples = 0, myfile = 0x0,   context = 0x3067da0, resowner = 0x2fa62c8, copytup = 0xaba7bd ,   writetup = 0xaba811 , readtup = 0xaba9d9 ,   memtuples = 0x3051e90, memtupdeleted = 0, memtupcount = 0, memtupsize = 2048,   growmemtuples = true, readptrs = 0x3077f70, activeptr = 0, readptrcount = 1,   readptrsize = 8, writepos_file = 0, writepos_offset = 0}(gdb) p state->status$4 = TSS_INMEM(gdb)

tuplestore_gettuple->返回NULL

(gdb) n915                *should_free = false;(gdb) n916                if (forward)(gdb) 918                    if (readptr->eof_reached)(gdb) 920                    if (readptr->current < state->memtupcount)(gdb) p readptr->current$5 = 0(gdb) p state->memtupcount$6 = 0(gdb) n925                    readptr->eof_reached = true;(gdb) 926                    return NULL;(gdb) 1062    }(gdb)

tuplestore_gettupleslot->返回false

(gdb) ntuplestore_gettupleslot (state=0x3069c18, forward=true, copy=false, slot=0x30687a8)    at tuplestore.c:10861086        if (tuple)(gdb) 1098            ExecClearTuple(slot);(gdb) 1099            return false;(gdb)

回到ExecMaterial

(gdb) n1101    }(gdb) ExecMaterial (pstate=0x3068158) at nodeMaterial.c:112112            if (forward)(gdb) 113                eof_tuplestore = true;(gdb)

从outerPlan中获取一行(即从t_big_null中获取一行)

(gdb) n124        if (eof_tuplestore && !node->eof_underlying)(gdb) p node->eof_underlying$7 = false(gdb) n133            outerNode = outerPlanState(node);(gdb) ####define innerPlanState(node)        (((PlanState *)(node))->righttree)#define outerPlanState(node)        (((PlanState *)(node))->lefttree)###134            outerslot = ExecProcNode(outerNode);(gdb) p outerNode$8 = (PlanState *) 0x3068270(gdb) p *outerNode$9 = {type = T_SeqScanState, plan = 0x3037628, state = 0x3067eb8,   ExecProcNode = 0x6f802a , ExecProcNodeReal = 0x72b904 ,   instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0,   lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0,   ps_ResultTupleDesc = 0x3068578, ps_ResultTupleSlot = 0x0, ps_ExprContext = 0x3068388,   ps_ProjInfo = 0x0, scandesc = 0x7fab449cae98,   scanops = 0xc3e780 , outerops = 0x0, innerops = 0x0,   resultops = 0xc3e780 , scanopsfixed = true,   outeropsfixed = false, inneropsfixed = false, resultopsfixed = true, scanopsset = true,   outeropsset = false, inneropsset = false, resultopsset = true}(gdb) p *outerNode->state$10 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot = 0x2f9cd10,   es_crosscheck_snapshot = 0x0, es_range_table = 0x3042130,   es_range_table_array = 0x3068108, es_range_table_size = 2, es_relations = 0x3068130,   es_rowmarks = 0x0, es_plannedstmt = 0x3042438,   es_sourceText = 0x2f74d88 "select * from tbl a where a.id not in (select b.id from t_big_null b);", es_junkFilter = 0x0, es_output_cid = 0, es_result_relations = 0x0,   es_num_result_relations = 0, es_result_relation_info = 0x0,   es_root_result_relations = 0x0, es_num_root_result_relations = 0,   es_partition_directory = 0x0, es_tuple_routing_result_relations = 0x0,   es_trig_target_relations = 0x0, es_param_list_info = 0x0,   es_param_exec_vals = 0x30680d0, es_queryEnv = 0x0, es_query_cxt = 0x3067da0,   es_tupleTable = 0x3068540, es_processed = 0, es_top_eflags = 16, es_instrument = 0,   es_finished = false, es_exprcontexts = 0x3068448, es_subplanstates = 0x3068950,   es_auxmodifytables = 0x0, es_per_tuple_exprcontext = 0x0, es_epq_active = 0x0,   es_use_parallel_mode = false, es_query_dsa = 0x0, es_jit_flags = 25, es_jit = 0x0,   es_jit_worker_instr = 0x0}(gdb) p ((PlanState *)node)->righttree$21 = (struct PlanState *) 0x0(gdb)

回过头来看执行计划,Materialize Node的lefttree是Seq Scan on public.t_big_null b,righttree为NULL。

[local]:5432 pg12@testdb=# explain verbose select * from tbl a where a.id not in (select b.id from t_big_null b);                                         QUERY PLAN                                        -------------------------------------------------------------------------------------------- Seq Scan on public.tbl a  (cost=0.00..129156.33 rows=1 width=8)   Output: a.id, a.value   Filter: (NOT (SubPlan 1))   SubPlan 1     ->  Materialize  (cost=0.00..233310.68 rows=9999979 width=4)           Output: b.id           ->  Seq Scan on public.t_big_null b  (cost=0.00..144247.79 rows=9999979 width=4)                 Output: b.id(8 rows)Time: 7.681 ms

获取outerslot

(gdb) n135            if (TupIsNull(outerslot))(gdb) p *outerslot$16 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 0,   tts_ops = 0xc3e780 , tts_tupleDescriptor = 0x7fab449cae98,   tts_values = 0x30684f0, tts_isnull = 0x30684f8, tts_mcxt = 0x3067da0, tts_tid = {    ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, tts_tableOid = 49155}(gdb) p *outerslot->tts_values$17 = 0(gdb) p outerslot->tts_values[1]$18 = 0(gdb) p outerslot->tts_values[0]$19 = 0(gdb) p *outerslot->tts_tupleDescriptor$20 = {natts = 1, tdtypeid = 49157, tdtypmod = -1, tdrefcount = 2, constr = 0x0,   attrs = 0x7fab449caeb0}

获取outerslot后,put到tuplestore中

(gdb) p *node$22 = {ss = {ps = {type = T_MaterialState, plan = 0x3040a60, state = 0x3067eb8,       ExecProcNode = 0x720ecf , ExecProcNodeReal = 0x720ecf ,       instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0,       lefttree = 0x3068270, righttree = 0x0, initPlan = 0x0, subPlan = 0x0,       chgParam = 0x0, ps_ResultTupleDesc = 0x3068690, ps_ResultTupleSlot = 0x30687a8,       ps_ExprContext = 0x0, ps_ProjInfo = 0x0, scandesc = 0x3068578,       scanops = 0xc3e720 , outerops = 0x0, innerops = 0x0,       resultops = 0xc3e720 , scanopsfixed = true,       outeropsfixed = false, inneropsfixed = false, resultopsfixed = true,       scanopsset = true, outeropsset = false, inneropsset = false, resultopsset = true},     ss_currentRelation = 0x0, ss_currentScanDesc = 0x0, ss_ScanTupleSlot = 0x3068868},   eflags = 2, eof_underlying = false, tuplestorestate = 0x3069c18}(gdb) n146            if (tuplestorestate)(gdb) 147                tuplestore_puttupleslot(tuplestorestate, outerslot);(gdb) p outerslot->tts_values[0]$23 = 0(gdb) n149            ExecCopySlot(slot, outerslot);(gdb) p outerslot->tts_values[0]$24 = 0(gdb) n150            return slot;(gdb) p outerslot->tts_values[0]$25 = 0(gdb) p slot->tts_values[0]$26 = 0(gdb) n157    }(gdb)

继续"物化"

(gdb) nExecProcNodeFirst (node=0x3068158) at execProcnode.c:446446    }(gdb) cContinuing.Breakpoint 1, ExecMaterial (pstate=0x3068158) at nodeMaterial.c:4141        MaterialState *node = castNode(MaterialState, pstate);(gdb) n49        CHECK_FOR_INTERRUPTS();(gdb) 54        estate = node->ss.ps.state;(gdb) 55        dir = estate->es_direction;(gdb) 56        forward = ScanDirectionIsForward(dir);(gdb) 57        tuplestorestate = node->tuplestorestate;(gdb) 62        if (tuplestorestate == NULL && node->eflags != 0)(gdb) 85        eof_tuplestore = (tuplestorestate == NULL) ||(gdb) 86            tuplestore_ateof(tuplestorestate);(gdb) 85        eof_tuplestore = (tuplestorestate == NULL) ||(gdb) 88        if (!forward && eof_tuplestore)(gdb) 107        slot = node->ss.ps.ps_ResultTupleSlot;(gdb) 108        if (!eof_tuplestore)(gdb) 124        if (eof_tuplestore && !node->eof_underlying)(gdb) 133            outerNode = outerPlanState(node);(gdb) p eof_tuplestore$27 = true(gdb) n134            outerslot = ExecProcNode(outerNode);(gdb) 135            if (TupIsNull(outerslot))(gdb) 146            if (tuplestorestate)(gdb) 147                tuplestore_puttupleslot(tuplestorestate, outerslot);(gdb) 149            ExecCopySlot(slot, outerslot);(gdb) 150            return slot;(gdb) p slot->tts_values[0]$28 = 2(gdb)

第一次执行时间较久,第二次相对快2个数量级,需要继续研究。

[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b); id | value ----+-------(0 rows)Time: 3633462.666 ms (01:00:33.463) --> 包括了debug的时间,实际时间是5s左右[local]:5432 pg12@testdb=# [local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b); id | value ----+-------(0 rows)Time: 6.480 ms --> 第2+次就快很多[local]:5432 pg12@testdb=#

DONE

四、参考资料

N/A

查询 表达式 节点 状态 输出 参数 操作符 组合 一行 文件 结果 存储 信息 指针 形式 成本 标识 结构 长度 规划 数据库的安全要保护哪些东西 数据库安全各自的含义是什么 生产安全数据库录入 数据库的安全性及管理 数据库安全策略包含哪些 海淀数据库安全审计系统 建立农村房屋安全信息数据库 易用的数据库客户端支持安全管理 连接数据库失败ssl安全错误 数据库的锁怎样保障安全 超融合服务器最大特性是 sql语言编写一个数据库 软件开发是否提供源码 网络安全设备信息流程 软件开发4年工资 花点财网络技术 为什么数据库表字段要用英文 区块链技术 不是数据库 网络安全人员个人述职报告 徐汇区立体化网络技术应用范围 张掖市网络安全应急预案 在服务器制作粒子效果 软件开发需要人员和职责 用数据库管理电影票 信息网络安全宣传资料 中国计算机软件开发先驱者 黑板报题目网络安全手抄报 杭州道争网络技术有限公司 高港区环保网络技术联系方式 网络安全标语2020 数据库库基本概念 关于网络安全的一些事情 法师宝宝数据库 武汉优航网络技术有限公司 本地的东西怎么复制到服务器 小学生软件开发的步骤 进入网络安全密钥怎么连不上呢 云数据库备份技术路线 服务器普通账号资源管理器密码 软件开发进度横道图
0