
PostgreSQL Source Code Walkthrough (176) - Query #94 (Syntax Analysis: gram.y) #3

Published: 2025-01-27 Author: 千家信息网 editor

This section continues the tour of gram.y, PostgreSQL's grammar definition file, and covers its third part: Productions.
A Bison input file is laid out as follows:

    %{
    Declarations
    %}
    Definitions
    %%
    Productions
    %%
    User subroutines

I. Productions

The Productions section holds the grammar rules written by the user. In textbook notation, productions are written like this:

    S -> X \n
    X -> X + X | X - X | T_NUMBER

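S -> X \n is called a production. The left-hand symbol of the first production is the start symbol, here S.
Bison augments the grammar with one extra production placed before all the others, S' -> S (shown as $accept in Bison's own reports), and S' becomes the real start symbol. Because S' never appears on a right-hand side, reducing to it unambiguously marks the end of a successful parse.
In Bison, ":" stands for "->", alternative productions for the same nonterminal are separated by "|", and the rule ends with ";". Each production may be followed by a block of C code in braces. This code runs when the production is reduced and is called an action. When the right-hand side is ε (the empty string), it is conventionally marked with the comment /* empty */.
Nonterminals used in productions need not be declared: Bison treats every symbol that appears on the left-hand side of some production as a nonterminal. Among terminals, single-character tokens (whose token type value equals the character's ASCII code) need no declaration either; they are written directly in a production inside single quotes. All other tokens must be declared in the Definitions section, for example %token ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER. Bison assigns each such token a number and writes it to gram.h. Opening that file shows the following: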

    [root@localhost src]# vim ./include/parser/gram.h
    ...
    /* Token type.  */
    #ifndef YYTOKENTYPE
    # define YYTOKENTYPE
      enum yytokentype
      {
        IDENT = 258,
        FCONST = 259,
        SCONST = 260,
        BCONST = 261,
        XCONST = 262,
        Op = 263,
        ICONST = 264,
        PARAM = 265,
        ....

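Numbering starts at 258 and follows the order of the %token declarations in gram.y, shown below. Values up to 255 are left for single-character tokens, and Bison keeps 256 and 257 for its own error and undefined-token codes.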

    ...
    %token <str>    IDENT FCONST SCONST BCONST XCONST Op
    %token <ival>   ICONST PARAM
    %token          TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
    %token          LESS_EQUALS GREATER_EQUALS NOT_EQUALS
    %token <keyword> ABORT_P ABSOLUTE_P ACCESS ACTION ADD_P ADMIN AFTER
        AGGREGATE ALL ALSO ALTER ALWAYS ANALYSE ANALYZE AND ANY ARRAY AS ASC
        ASSERTION ASSIGNMENT ASYMMETRIC AT ATTACH ATTRIBUTE AUTHORIZATION
    ...

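These token definitions can be used directly in scan.l, because scan.l includes parser/gramparse.h, which in turn includes parser/gram.h: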

#include "parser/gramparse.h" --> #include "parser/gram.h"
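The numbering scheme can be illustrated with a small standalone C sketch. The enum copies the first few values from the gram.h excerpt above; the function toy_lex and its input handling are hypothetical stand-ins for the real scanner generated from scan.l, which is far more involved. The point is only that named tokens and single-character tokens share one integer space without colliding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* First few token codes, copied from the gram.h excerpt. */
enum toy_tokentype
{
    IDENT = 258,
    FCONST = 259,
    SCONST = 260,
    BCONST = 261,
    XCONST = 262,
    Op = 263,
    ICONST = 264,
    PARAM = 265
};

/*
 * Hypothetical miniature lexer (not PostgreSQL code).
 * Returns ICONST for a run of digits, IDENT for a run of letters,
 * 0 at end of input, and the character's own code for anything else
 * (for example ';' or '(').  *pos is advanced past the token read.
 */
static int
toy_lex(const char *input, size_t *pos)
{
    unsigned char c;

    assert(input != NULL && pos != NULL);

    while (input[*pos] == ' ')
        (*pos)++;

    c = (unsigned char) input[*pos];
    if (c == '\0')
        return 0;               /* end of input */

    if (isdigit(c))
    {
        while (isdigit((unsigned char) input[*pos]))
            (*pos)++;
        return ICONST;
    }
    if (isalpha(c))
    {
        while (isalpha((unsigned char) input[*pos]))
            (*pos)++;
        return IDENT;
    }

    (*pos)++;
    return c;                   /* single-character token: its own code */
}
```

Calling toy_lex repeatedly on "abc 42;" with a position that starts at 0 yields IDENT, ICONST, the code of ';', and finally 0.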

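From the productions and the symbol precedences, Bison builds an LALR(1) action table and writes it to gram.c. That file also contains the function generated from PG's grammar, int base_yyparse (core_yyscan_t yyscanner); which runs the LR parsing loop over the token stream produced by lexical analysis. Whenever it needs the next symbol it executes s = yylex() once. Whenever it performs a reduce, the C code attached to the production being reduced is executed first, and only then are the corresponding states popped from the stack.
Below is part of yyparse in gram.c: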

    /*----------.
    | yyparse.  |
    `----------*/

    int
    yyparse (core_yyscan_t yyscanner)
    {
    /* The lookahead symbol.  */
    int yychar;


    /* The semantic value of the lookahead symbol.  */
    /* Default value used for initialization, for pacifying older GCCs
       or non-GCC compilers.  */
    YY_INITIAL_VALUE (static YYSTYPE yyval_default;)
    YYSTYPE yylval YY_INITIAL_VALUE (= yyval_default);

    /* Location data for the lookahead symbol.  */
    static YYLTYPE yyloc_default
    # if defined YYLTYPE_IS_TRIVIAL && YYLTYPE_IS_TRIVIAL
      = { 1, 1, 1, 1 }
    # endif
    ;
    YYLTYPE yylloc = yyloc_default;

        /* Number of syntax errors so far.  */
        int yynerrs;

        int yystate;
        /* Number of tokens to shift before error messages enabled.  */
        int yyerrstatus;

        /* The stacks and their tools:
           'yyss': related to states.
           'yyvs': related to semantic values.
           'yyls': related to locations.

           Refer to the stacks through separate pointers, to allow yyoverflow
           to reallocate them elsewhere.  */

        /* The state stack.  */
        yytype_int16 yyssa[YYINITDEPTH];
        yytype_int16 *yyss;
        yytype_int16 *yyssp;

        /* The semantic value stack.  */
        YYSTYPE yyvsa[YYINITDEPTH];
        YYSTYPE *yyvs;
        YYSTYPE *yyvsp;

        /* The location stack.  */
        YYLTYPE yylsa[YYINITDEPTH];
        YYLTYPE *yyls;
        YYLTYPE *yylsp;

        /* The locations where the error started and ended.  */
        YYLTYPE yyerror_range[3];

        YYSIZE_T yystacksize;

      int yyn;
      int yyresult;
      /* Lookahead token as an internal (translated) token number.  */
      int yytoken = 0;
      /* The variables used to return semantic value and location from the
         action routines.  */
      YYSTYPE yyval;
      YYLTYPE yyloc;

    #if YYERROR_VERBOSE
      /* Buffer for error messages, and its allocated size.  */
      char yymsgbuf[128];
      char *yymsg = yymsgbuf;
      YYSIZE_T yymsg_alloc = sizeof yymsgbuf;
    #endif

    #define YYPOPSTACK(N)   (yyvsp -= (N), yyssp -= (N), yylsp -= (N))

      /* The number of symbols on the RHS of the reduced rule.
         Keep to zero when no symbol should be popped.  */
      int yylen = 0;

      yyssp = yyss = yyssa;
      yyvsp = yyvs = yyvsa;
      yylsp = yyls = yylsa;
      yystacksize = YYINITDEPTH;
    ...
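How parallel stacks like these cooperate during shift and reduce can be sketched with a hand-written loop for a toy grammar. This is not the table-driven code Bison generates: it recognizes only expr: expr '+' NUM | expr '-' NUM | NUM, a left-recursive and unambiguous variant of the example grammar from section I, and every name in it (toy_parse, SYM_EXPR and so on) is hypothetical. It keeps a symbol stack where Bison keeps a state stack, which is enough to find the handle in a grammar this small.

```c
#include <assert.h>
#include <ctype.h>

#define TOY_STACK_DEPTH 8

enum toy_symbol { SYM_NUM, SYM_PLUS, SYM_MINUS, SYM_EXPR };

/*
 * Parse and evaluate a string such as "1+2-3".
 * Returns 0 and stores the value in *result on success,
 * or -1 on a syntax error.
 */
static int
toy_parse(const char *input, int *result)
{
    enum toy_symbol syms[TOY_STACK_DEPTH];  /* symbol stack */
    int         vals[TOY_STACK_DEPTH];      /* semantic value stack */
    int         top = 0;                    /* entries on both stacks */
    const char *p = input;

    for (;;)
    {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;

        /* Shift: push the next token together with its semantic value. */
        assert(top < TOY_STACK_DEPTH);
        if (isdigit((unsigned char) *p))
        {
            int v = 0;

            while (isdigit((unsigned char) *p))
                v = v * 10 + (*p++ - '0');
            syms[top] = SYM_NUM;
            vals[top] = v;
        }
        else if (*p == '+' || *p == '-')
        {
            /* An operator is only valid directly after a complete expr. */
            if (!(top == 1 && syms[0] == SYM_EXPR))
                return -1;
            syms[top] = (*p == '+') ? SYM_PLUS : SYM_MINUS;
            vals[top] = 0;      /* operators carry no value */
            p++;
        }
        else
            return -1;          /* unknown character */
        top++;

        /* Reduce: a NUM on top must complete one of the rules. */
        if (syms[top - 1] == SYM_NUM)
        {
            if (top == 1)
            {
                /* expr: NUM  { $$ = $1; }  (the value slot is reused) */
                syms[0] = SYM_EXPR;
            }
            else if (top == 3 && syms[0] == SYM_EXPR &&
                     (syms[1] == SYM_PLUS || syms[1] == SYM_MINUS))
            {
                /*
                 * expr: expr '+' NUM | expr '-' NUM
                 * Run the action first, then pop the three RHS entries
                 * and push the LHS with the computed value.
                 */
                int value = (syms[1] == SYM_PLUS) ? vals[0] + vals[2]
                                                  : vals[0] - vals[2];

                top -= 3;
                syms[top] = SYM_EXPR;
                vals[top] = value;
                top++;
            }
            else
                return -1;      /* NUM in a position no rule allows */
        }
    }

    /* Accept only if everything was reduced to a single expr. */
    if (top == 1 && syms[0] == SYM_EXPR)
    {
        *result = vals[0];
        return 0;
    }
    return -1;
}
```

Because the rule is left-recursive, each operator and operand is reduced as soon as it arrives, so the stacks never hold more than three entries regardless of input length.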

II. Source Code

Below is part of the production definitions in gram.y:

    /*
     *    The target production for the whole parse.
     */
    stmtblock:    stmtmulti
                {
                    pg_yyget_extra(yyscanner)->parsetree = $1;
                }
            ;

    /*
     * At top level, we wrap each stmt with a RawStmt node carrying start location
     * and length of the stmt's text.  Notice that the start loc/len are driven
     * entirely from semicolon locations (@2).  It would seem natural to use
     * @1 or @3 to get the true start location of a stmt, but that doesn't work
     * for statements that can start with empty nonterminals (opt_with_clause is
     * the main offender here); as noted in the comments for YYLLOC_DEFAULT,
     * we'd get -1 for the location in such cases.
     * We also take care to discard empty statements entirely.
     */
    stmtmulti:    stmtmulti ';' stmt
                    {
                        if ($1 != NIL)
                        {
                            /* update length of previous stmt */
                            updateRawStmtEnd(llast_node(RawStmt, $1), @2);
                        }
                        if ($3 != NULL)
                            $$ = lappend($1, makeRawStmt($3, @2 + 1));
                        else
                            $$ = $1;
                    }
                | stmt
                    {
                        if ($1 != NULL)
                            $$ = list_make1(makeRawStmt($1, 0));
                        else
                            $$ = NIL;
                    }
            ;

    stmt :
                AlterEventTrigStmt
                | AlterCollationStmt
                | AlterDatabaseStmt
                | AlterDatabaseSetStmt
                | AlterDefaultPrivilegesStmt
                | AlterDomainStmt
                | AlterEnumStmt
                | AlterExtensionStmt
                | AlterExtensionContentsStmt
                | AlterFdwStmt
                | AlterForeignServerStmt
                | AlterForeignTableStmt
                | AlterFunctionStmt
                | AlterGroupStmt
                | AlterObjectDependsStmt
                | AlterObjectSchemaStmt
                | AlterOwnerStmt
                | AlterOperatorStmt
                | AlterPolicyStmt
                | AlterSeqStmt
                | AlterSystemStmt
                | AlterTableStmt
                | AlterTblSpcStmt
                | AlterCompositeTypeStmt
                | AlterPublicationStmt
                | AlterRoleSetStmt
                | AlterRoleStmt
                | AlterSubscriptionStmt
                | AlterTSConfigurationStmt
                | AlterTSDictionaryStmt
                | AlterUserMappingStmt
                | AnalyzeStmt
                | CallStmt
                | CheckPointStmt
                | ClosePortalStmt
                | ClusterStmt
                | CommentStmt
                | ConstraintsSetStmt
                | CopyStmt
                | CreateAmStmt
                | CreateAsStmt
                | CreateAssertStmt
                | CreateCastStmt
                | CreateConversionStmt
                | CreateDomainStmt
                | CreateExtensionStmt
                | CreateFdwStmt
                | CreateForeignServerStmt
                | CreateForeignTableStmt
                | CreateFunctionStmt
                | CreateGroupStmt
                | CreateMatViewStmt
                | CreateOpClassStmt
                | CreateOpFamilyStmt
                | CreatePublicationStmt
                | AlterOpFamilyStmt
                | CreatePolicyStmt
                | CreatePLangStmt
                | CreateSchemaStmt
                | CreateSeqStmt
                | CreateStmt
                | CreateSubscriptionStmt
                | CreateStatsStmt
                | CreateTableSpaceStmt
                | CreateTransformStmt
                | CreateTrigStmt
                | CreateEventTrigStmt
                | CreateRoleStmt
                | CreateUserStmt
                | CreateUserMappingStmt
                | CreatedbStmt
                | DeallocateStmt
                | DeclareCursorStmt
                | DefineStmt
                | DeleteStmt
                | DiscardStmt
                | DoStmt
                | DropAssertStmt
                | DropCastStmt
                | DropOpClassStmt
                | DropOpFamilyStmt
                | DropOwnedStmt
                | DropPLangStmt
                | DropStmt
                | DropSubscriptionStmt
                | DropTableSpaceStmt
                | DropTransformStmt
                | DropRoleStmt
                | DropUserMappingStmt
                | DropdbStmt
                | ExecuteStmt
                | ExplainStmt
                | FetchStmt
                | GrantStmt
                | GrantRoleStmt
                | ImportForeignSchemaStmt
                | IndexStmt
                | InsertStmt
                | ListenStmt
                | RefreshMatViewStmt
                | LoadStmt
                | LockStmt
                | NotifyStmt
                | PrepareStmt
                | ReassignOwnedStmt
                | ReindexStmt
                | RemoveAggrStmt
                | RemoveFuncStmt
                | RemoveOperStmt
                | RenameStmt
                | RevokeStmt
                | RevokeRoleStmt
                | RuleStmt
                | SecLabelStmt
                | SelectStmt
                | TransactionStmt
                | TruncateStmt
                | UnlistenStmt
                | UpdateStmt
                | VacuumStmt
                | VariableResetStmt
                | VariableSetStmt
                | VariableShowStmt
                | ViewStmt
                | /*EMPTY*/
                    { $$ = NULL; }
            ;

    /*****************************************************************************
     *
     * CALL statement
     *
     *****************************************************************************/

    CallStmt:    CALL func_application
                    {
                        CallStmt *n = makeNode(CallStmt);
                        n->funccall = castNode(FuncCall, $2);
                        $$ = (Node *)n;
                    }
            ;
    ...

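A brief walkthrough:
1. stmtblock
stmtblock: stmtmulti
stmtblock is the start symbol. The whole input must eventually reduce to it; otherwise a syntax error is reported.
Its action is: pg_yyget_extra(yyscanner)->parsetree = $1;
This completes parsing and stores the resulting parse tree in parsetree.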

2. stmtmulti
stmtmulti: stmtmulti ';' stmt
This is a left-recursive production: PG accepts multiple statements separated by semicolons (";"), each of which is defined by stmt.
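The bookkeeping done by the two stmtmulti actions (start location taken from the preceding semicolon, length filled in when the next semicolon arrives, empty statements dropped) can be mimicked in a standalone C sketch. Everything here is a simplified assumption rather than PostgreSQL code: ToyRawStmt models only the two location fields, the list is a plain array, and any non-blank text between semicolons counts as a statement instead of being parsed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_STMTS 8

/* Stand-in for RawStmt: only the location fields are modeled. */
typedef struct ToyRawStmt
{
    int stmt_location;  /* byte offset where the statement text starts */
    int stmt_len;       /* length in bytes; 0 means "rest of the string" */
} ToyRawStmt;

/* Models "stmt != NULL": is there anything but blanks in [start, end)? */
static int
toy_nonempty(const char *input, int start, int end)
{
    int i;

    for (i = start; i < end; i++)
        if (input[i] != ' ' && input[i] != '\t' && input[i] != '\n')
            return 1;
    return 0;
}

/*
 * Split input at semicolons following the logic of the stmtmulti actions.
 * Returns the number of entries written to out[].
 */
static int
toy_stmtmulti(const char *input, ToyRawStmt *out)
{
    int n = 0;          /* list length; 0 plays the role of NIL */
    int start = 0;      /* offset where the current stmt text begins */
    int first = 1;      /* true until the "stmtmulti: stmt" rule has fired */
    int i;

    assert(input != NULL && out != NULL);

    for (i = 0;; i++)
    {
        if (input[i] != ';' && input[i] != '\0')
            continue;

        /* The text in [start, i) is one complete stmt: time to reduce. */
        if (first)
        {
            /* stmtmulti: stmt  -> makeRawStmt($1, 0) */
            if (toy_nonempty(input, start, i))
            {
                out[n].stmt_location = 0;
                out[n].stmt_len = 0;
                n++;
            }
            first = 0;
        }
        else
        {
            /* stmtmulti: stmtmulti ';' stmt, where @2 is the ';' offset */
            int semi = start - 1;

            /* updateRawStmtEnd: set the length only if not set already */
            if (n > 0 && out[n - 1].stmt_len == 0)
                out[n - 1].stmt_len = semi - out[n - 1].stmt_location;

            if (toy_nonempty(input, start, i))
            {
                assert(n < TOY_MAX_STMTS);
                out[n].stmt_location = semi + 1;    /* @2 + 1 */
                out[n].stmt_len = 0;
                n++;
            }
        }

        if (input[i] == '\0')
            break;
        start = i + 1;
    }
    return n;
}
```

For the input "select 1;;select 2" this produces two entries: the first at location 0 with length 8, the second at location 10 with length 0, since no semicolon follows it. The empty statement between the two semicolons leaves no entry.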

3. stmt

    stmt :
                AlterEventTrigStmt
                | AlterCollationStmt
                ...
                | SelectStmt
                ...

stmt covers a large number of statement types. We look at the most common one, SelectStmt.

4. SelectStmt

    SelectStmt: select_no_parens            %prec UMINUS
                | select_with_parens        %prec UMINUS
            ;
    ...
    select_no_parens:
                simple_select                        { $$ = $1; }
                | select_clause sort_clause
                    {
                        insertSelectOptions((SelectStmt *) $1, $2, NIL,
                                            NULL, NULL, NULL,
                                            yyscanner);
                        $$ = $1;
                    }
    ...
    simple_select:
                SELECT opt_all_clause opt_target_list
                into_clause from_clause where_clause
                group_clause having_clause window_clause
                    {
                        SelectStmt *n = makeNode(SelectStmt);
                        n->targetList = $3;
                        n->intoClause = $4;
                        n->fromClause = $5;
                        n->whereClause = $6;
                        n->groupClause = $7;
                        n->havingClause = $8;
                        n->windowClause = $9;
                        $$ = (Node *)n;
                    }
                | SELECT distinct_clause target_list
    ...
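Inside an action, Bison rewrites each $k into an access relative to the top of the semantic value stack: for a rule with N right-hand-side symbols, $k becomes yyvsp[k - N]. The sketch below mimics this for the nine-symbol simple_select rule. The struct, the string-valued stack, the macros and the function name are hypothetical simplifications; the real action stores List and Node pointers in a SelectStmt node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, string-valued stand-in for the SelectStmt node. */
typedef struct ToySelectStmt
{
    const char *targetList;
    const char *intoClause;
    const char *fromClause;
    const char *whereClause;
    const char *groupClause;
    const char *havingClause;
    const char *windowClause;
} ToySelectStmt;

/* simple_select (first alternative) has nine right-hand-side symbols. */
#define TOY_RHS_LEN 9

/* What "$k" means inside the action: an offset from the stack top. */
#define TOY_DOLLAR(k) (yyvsp[(k) - TOY_RHS_LEN])

/*
 * Mimics the simple_select action.  yyvsp must point at the topmost of
 * the nine values that were pushed for the rule's right-hand side.
 */
static ToySelectStmt
toy_reduce_simple_select(const char *const *yyvsp)
{
    ToySelectStmt n;

    assert(yyvsp != NULL);

    n.targetList = TOY_DOLLAR(3);       /* opt_target_list */
    n.intoClause = TOY_DOLLAR(4);       /* into_clause */
    n.fromClause = TOY_DOLLAR(5);       /* from_clause */
    n.whereClause = TOY_DOLLAR(6);      /* where_clause */
    n.groupClause = TOY_DOLLAR(7);      /* group_clause */
    n.havingClause = TOY_DOLLAR(8);     /* having_clause */
    n.windowClause = TOY_DOLLAR(9);     /* window_clause */
    return n;
}
```

With an array of nine values standing for SELECT through window_clause, passing the address of the last element makes TOY_DOLLAR(3) resolve to the third element, which is why the action can refer to clauses purely by position.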

III. References

Flex&Bison
