千家信息网

Oracle 学习之 数据仓库(二) Dimension 的理解

发表于:2024-10-21 作者:千家信息网编辑
千家信息网最后更新 2024年10月21日,在数据仓库中,有事实表、维度表两个概念。事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。维度表是维度属性的集合。是分析问
千家信息网最后更新 2024年10月21日Oracle 学习之 数据仓库(二) Dimension 的理解

在数据仓库中,有事实表、维度表两个概念。

事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。

维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。

如图示


我们以sh用户下的sales表和times表来看,

SALES为事实表

SQL> desc sales Name                                         Null?    Type ----------------------------------------- -------- ---------------------------- PROD_ID                              NOT NULL NUMBER CUST_ID                              NOT NULL NUMBER TIME_ID                              NOT NULL DATE CHANNEL_ID                                   NOT NULL NUMBER PROMO_ID                             NOT NULL NUMBER QUANTITY_SOLD                                NOT NULL NUMBER(10,2) AMOUNT_SOLD                                  NOT NULL NUMBER(10,2)

TIMES为维度表

SQL> desc times Name                                         Null?    Type ----------------------------------------- -------- ---------------------------- TIME_ID                              NOT NULL DATE DAY_NAME                             NOT NULL VARCHAR2(9) DAY_NUMBER_IN_WEEK                           NOT NULL NUMBER(1) DAY_NUMBER_IN_MONTH                          NOT NULL NUMBER(2) CALENDAR_WEEK_NUMBER                         NOT NULL NUMBER(2) FISCAL_WEEK_NUMBER                           NOT NULL NUMBER(2) WEEK_ENDING_DAY                      NOT NULL DATE WEEK_ENDING_DAY_ID                           NOT NULL NUMBER CALENDAR_MONTH_NUMBER                        NOT NULL NUMBER(2) FISCAL_MONTH_NUMBER                          NOT NULL NUMBER(2) CALENDAR_MONTH_DESC                          NOT NULL VARCHAR2(8) CALENDAR_MONTH_ID                    NOT NULL NUMBER FISCAL_MONTH_DESC                    NOT NULL VARCHAR2(8) FISCAL_MONTH_ID                      NOT NULL NUMBER DAYS_IN_CAL_MONTH                    NOT NULL NUMBER DAYS_IN_FIS_MONTH                    NOT NULL NUMBER END_OF_CAL_MONTH                     NOT NULL DATE END_OF_FIS_MONTH                     NOT NULL DATE CALENDAR_MONTH_NAME                          NOT NULL VARCHAR2(9) FISCAL_MONTH_NAME                    NOT NULL VARCHAR2(9) CALENDAR_QUARTER_DESC                        NOT NULL CHAR(7) CALENDAR_QUARTER_ID                          NOT NULL NUMBER FISCAL_QUARTER_DESC                          NOT NULL CHAR(7) FISCAL_QUARTER_ID                    NOT NULL NUMBER DAYS_IN_CAL_QUARTER                          NOT NULL NUMBER DAYS_IN_FIS_QUARTER                          NOT NULL NUMBER END_OF_CAL_QUARTER                           NOT NULL DATE END_OF_FIS_QUARTER                           NOT NULL DATE CALENDAR_QUARTER_NUMBER              NOT NULL NUMBER(1) FISCAL_QUARTER_NUMBER                        NOT NULL NUMBER(1) CALENDAR_YEAR                                NOT NULL NUMBER(4) CALENDAR_YEAR_ID                     NOT NULL NUMBER FISCAL_YEAR                                  NOT NULL NUMBER(4) FISCAL_YEAR_ID                          NOT NULL NUMBER DAYS_IN_CAL_YEAR                     NOT NULL NUMBER DAYS_IN_FIS_YEAR                     NOT NULL NUMBER END_OF_CAL_YEAR                      NOT NULL DATE END_OF_FIS_YEAR                      NOT NULL DATE


如果我们创建一个物化视图

create materialized view sales_month_sum enable query rewrite as   SELECT t.calendar_month_id,         prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_idGROUP BY prod_id,         channel_id,         promo_id,         t.calendar_month_id;

如果我们做如下按月的分组查询

SQL> alter session set query_rewrite_enabled=true;SQL> alter session set query_rewrite_integrity=trusted;SQL> set autotrace traceonlySQL> set line 200SQL>   SELECT t.calendar_month_id,         prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_idGROUP BY prod_id,         channel_id,         promo_id,         t.calendar_month_id; 9068 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3287305789------------------------------------------------------------------------------------------------| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT       |                |  9068 |   690K|    13      (0)| 00:00:01 ||   1 |  MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13  (0)| 00:00:01 |------------------------------------------------------------------------------------------------

可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?

  SELECT t.calendar_quarter_id,prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_idGROUP BY prod_id,         channel_id,         promo_id,         t.calendar_quarter_id;

这个查看肯定是不能使用物化视图的,执行计划如下

Execution Plan----------------------------------------------------------Plan hash value: 3221963832---------------------------------------------------------------------------------------------------------| Id  | Operation                 | Name | Rows     | Bytes | Cost (%CPU)| Time    | Pstart| Pstop |---------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT        |      |  2037 | 79443 |   569   (6)| 00:00:07 |      |       ||   1 |  HASH GROUP BY                |      |  2037 | 79443 |   569   (6)| 00:00:07 |      |       ||*  2 |   HASH JOIN                |      |   918K|    34M|   546   (2)| 00:00:07 |    |       ||   3 |    PART JOIN FILTER CREATE    | :BF0000 |  1826 | 21912 |    18   (0)| 00:00:01 |    |       ||   4 |     TABLE ACCESS FULL          | TIMES        |  1826 | 21912 |    18   (0)| 00:00:01 | |       ||   5 |    PARTITION RANGE JOIN-FILTER|  |   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000||   6 |     TABLE ACCESS FULL          | SALES        |   918K|    23M|   525   (2)| 00:00:07 |:BF0000|:BF0000|---------------------------------------------------------------------------------------------------------


Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。

CREATE DIMENSION SH.TIMES_DIM  LEVEL DAY                            IS     (SH.TIMES.TIME_ID)  LEVEL MONTH                          IS     (SH.TIMES.CALENDAR_MONTH_ID)  LEVEL QUARTER                        IS     (SH.TIMES.CALENDAR_QUARTER_ID)  LEVEL YEAR                           IS     (SH.TIMES.CALENDAR_YEAR_ID)  HIERARCHY CAL_ROLLUP    (DAY                               CHILD OF     MONTH                             CHILD OF     QUARTER                           CHILD OF     YEAR);

LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。

我们再次查询

SQL> SELECT t.calendar_quarter_id,prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_idGROUP BY prod_id,         channel_id,          promo_id,         t.calendar_quarter_id;3375 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3397140165--------------------------------------------------------------------------------------------------| Id  | Operation                  | Name            | Rows  | Bytes | Cost (%CPU)| Time        |--------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT         |              |    20 |  1720 |    36  (14)| 00:00:01 ||   1 |  HASH GROUP BY                 |              |    20 |  1720 |    36  (14)| 00:00:01 ||*  2 |   HASH JOIN                 |              |   128K|    10M|    33   (7)| 00:00:01 ||   3 |    VIEW                   |              |   849 |  6792 |    19   (6)| 00:00:01 ||   4 |     HASH UNIQUE         |              |   849 |  6792 |    19   (6)| 00:00:01 ||   5 |      TABLE ACCESS FULL             | TIMES           |  1826 | 14608 |    18   (0)| 00:00:01 ||   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |--------------------------------------------------------------------------------------------------

这次是使用物化视图与times表做关联,性能更高了。

我们对比如下两个查询

SQL>   SELECT t.calendar_quarter_id,         prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769GROUP BY prod_id,         channel_id,         promo_id,         t.calendar_quarter_id; 168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3397140165--------------------------------------------------------------------------------------------------| Id  | Operation                  | Name            | Rows  | Bytes | Cost (%CPU)| Time        |--------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT         |              |     1 |    86 |    33   (7)| 00:00:01 ||   1 |  HASH GROUP BY                 |              |     1 |    86 |    33   (7)| 00:00:01 ||*  2 |   HASH JOIN                 |              |  6423 |   539K|    32   (4)| 00:00:01 ||   3 |    VIEW                   |              |    34 |   272 |    19   (6)| 00:00:01 ||   4 |     HASH UNIQUE         |              |    34 |   272 |    19   (6)| 00:00:01 ||*  5 |      TABLE ACCESS FULL          | TIMES           |    90 |   720 |    18   (0)| 00:00:01 ||   6 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |--------------------------------------------------------------------------------------------------

使用了物化视图


SQL>SELECT t.calendar_quarter_id,         prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'GROUP BY prod_id,         channel_id,         promo_id,         t.calendar_quarter_id;168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3221963832-----------------------------------------------------------------------------------------------------------------| Id  | Operation                 | Name | Rows     | Bytes |TempSpc| Cost (%CPU)| Time    | Pstart| Pstop |-----------------------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT        |      |  8146 |   373K| |   848   (2)| 00:00:11 |       |       ||   1 |  HASH GROUP BY                |      |  8146 |   373K|  3632K|   848   (2)| 00:00:11 |       |       ||*  2 |   HASH JOIN                |      | 57459 |  2637K|   |   546   (2)| 00:00:07 |       |       ||   3 |    PART JOIN FILTER CREATE    | :BF0000 |    91 |  1820 |  |    18   (0)| 00:00:01 |  |       ||*  4 |     TABLE ACCESS FULL       | TIMES        |    91 |  1820 |       |    18   (0)| 00:00:01 |  |       ||   5 |    PARTITION RANGE JOIN-FILTER|  |   918K|    23M|    |   525   (2)| 00:00:07 |:BF0000|:BF0000||   6 |     TABLE ACCESS FULL          | SALES        |   918K|    23M|    |   525   (2)| 00:00:07 |:BF0000|:BF0000|-----------------------------------------------------------------------------------------------------------------

没有使用物化视图。

其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。

但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。

我们在创建Dimension时,可以为LEVEL指定属性值。

如下

CREATE DIMENSION SH.TIMES_DIM  LEVEL DAY                            IS     (SH.TIMES.TIME_ID)  LEVEL MONTH                          IS     (SH.TIMES.CALENDAR_MONTH_ID)  LEVEL QUARTER                        IS     (SH.TIMES.CALENDAR_QUARTER_ID)  LEVEL YEAR                           IS     (SH.TIMES.CALENDAR_YEAR_ID)  HIERARCHY CAL_ROLLUP    (DAY                               CHILD OF     MONTH                             CHILD OF     QUARTER                           CHILD OF     YEAR)  ATTRIBUTE QUARTER DETERMINES     (SH.TIMES.CALENDAR_QUARTER_DESC,     SH.TIMES.DAYS_IN_CAL_QUARTER,     SH.TIMES.END_OF_CAL_QUARTER,     SH.TIMES.CALENDAR_QUARTER_NUMBER)  ATTRIBUTE YEAR DETERMINES     (SH.TIMES.CALENDAR_YEAR,     SH.TIMES.DAYS_IN_CAL_YEAR,     SH.TIMES.END_OF_CAL_YEAR);


我们再次查询

SQL> SELECT t.calendar_quarter_id,         prod_id,         channel_id,         promo_id,         SUM (quantity_sold) quantity_sold,         SUM (amount_sold) amount_sold    FROM sales s, times t   WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'GROUP BY prod_id,         channel_id,         promo_id,         t.calendar_quarter_id;  168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3290467316--------------------------------------------------------------------------------------------------| Id  | Operation                  | Name            | Rows  | Bytes | Cost (%CPU)| Time        |--------------------------------------------------------------------------------------------------|   0 | SELECT STATEMENT         |              |    20 |  2240 |    33   (7)| 00:00:01 ||   1 |  HASH GROUP BY                 |              |    20 |  2240 |    33   (7)| 00:00:01 ||*  2 |   HASH JOIN                 |              | 17191 |  1880K|    32   (4)| 00:00:01 ||   3 |    VIEW                   | VW_GBF_5        |    91 |  3094 |    19   (6)| 00:00:01 ||   4 |     HASH GROUP BY               |              |    91 |   728 |    19   (6)| 00:00:01 ||   5 |      VIEW                    |              |    91 |   728 |    19   (6)| 00:00:01 ||   6 |       HASH UNIQUE          |              |    91 |  1456 |    19   (6)| 00:00:01 ||*  7 |        TABLE ACCESS FULL       | TIMES          |    91 |  1456 |    18   (0)| 00:00:01 ||   8 |    MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM |  9068 |   690K|    13   (0)| 00:00:01 |--------------------------------------------------------------------------------------------------

这次就使用了物化视图。

0