Oracle 学习之 数据仓库(二) Dimension 的理解
发表于:2025-02-01 作者:千家信息网编辑
千家信息网最后更新 2025年02月01日,在数据仓库中,有事实表、维度表两个概念。事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。维度表是维度属性的集合。是分析问
千家信息网最后更新 2025年02月01日Oracle 学习之 数据仓库(二) Dimension 的理解
在数据仓库中,有事实表、维度表两个概念。
事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。
维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。
如图示
我们以sh用户下的sales表和times表来看,
SALES为事实表
SQL> desc sales Name Null? Type ----------------------------------------- -------- ---------------------------- PROD_ID NOT NULL NUMBER CUST_ID NOT NULL NUMBER TIME_ID NOT NULL DATE CHANNEL_ID NOT NULL NUMBER PROMO_ID NOT NULL NUMBER QUANTITY_SOLD NOT NULL NUMBER(10,2) AMOUNT_SOLD NOT NULL NUMBER(10,2)
TIMES为维度表
SQL> desc times Name Null? Type ----------------------------------------- -------- ---------------------------- TIME_ID NOT NULL DATE DAY_NAME NOT NULL VARCHAR2(9) DAY_NUMBER_IN_WEEK NOT NULL NUMBER(1) DAY_NUMBER_IN_MONTH NOT NULL NUMBER(2) CALENDAR_WEEK_NUMBER NOT NULL NUMBER(2) FISCAL_WEEK_NUMBER NOT NULL NUMBER(2) WEEK_ENDING_DAY NOT NULL DATE WEEK_ENDING_DAY_ID NOT NULL NUMBER CALENDAR_MONTH_NUMBER NOT NULL NUMBER(2) FISCAL_MONTH_NUMBER NOT NULL NUMBER(2) CALENDAR_MONTH_DESC NOT NULL VARCHAR2(8) CALENDAR_MONTH_ID NOT NULL NUMBER FISCAL_MONTH_DESC NOT NULL VARCHAR2(8) FISCAL_MONTH_ID NOT NULL NUMBER DAYS_IN_CAL_MONTH NOT NULL NUMBER DAYS_IN_FIS_MONTH NOT NULL NUMBER END_OF_CAL_MONTH NOT NULL DATE END_OF_FIS_MONTH NOT NULL DATE CALENDAR_MONTH_NAME NOT NULL VARCHAR2(9) FISCAL_MONTH_NAME NOT NULL VARCHAR2(9) CALENDAR_QUARTER_DESC NOT NULL CHAR(7) CALENDAR_QUARTER_ID NOT NULL NUMBER FISCAL_QUARTER_DESC NOT NULL CHAR(7) FISCAL_QUARTER_ID NOT NULL NUMBER DAYS_IN_CAL_QUARTER NOT NULL NUMBER DAYS_IN_FIS_QUARTER NOT NULL NUMBER END_OF_CAL_QUARTER NOT NULL DATE END_OF_FIS_QUARTER NOT NULL DATE CALENDAR_QUARTER_NUMBER NOT NULL NUMBER(1) FISCAL_QUARTER_NUMBER NOT NULL NUMBER(1) CALENDAR_YEAR NOT NULL NUMBER(4) CALENDAR_YEAR_ID NOT NULL NUMBER FISCAL_YEAR NOT NULL NUMBER(4) FISCAL_YEAR_ID NOT NULL NUMBER DAYS_IN_CAL_YEAR NOT NULL NUMBER DAYS_IN_FIS_YEAR NOT NULL NUMBER END_OF_CAL_YEAR NOT NULL DATE END_OF_FIS_YEAR NOT NULL DATE
如果我们创建一个物化视图
create materialized view sales_month_sum enable query rewrite as SELECT t.calendar_month_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_month_id;
如果我们做如下按月的分组查询
SQL> alter session set query_rewrite_enabled=true;SQL> alter session set query_rewrite_integrity=trusted;SQL> set autotrace traceonlySQL> set line 200SQL> SELECT t.calendar_month_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_month_id; 9068 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3287305789------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 9068 | 690K| 13 (0)| 00:00:01 || 1 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |------------------------------------------------------------------------------------------------
可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?
SELECT t.calendar_quarter_id,prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;
这个查看肯定是不能使用物化视图的,执行计划如下
Execution Plan----------------------------------------------------------Plan hash value: 3221963832---------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |---------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 2037 | 79443 | 569 (6)| 00:00:07 | | || 1 | HASH GROUP BY | | 2037 | 79443 | 569 (6)| 00:00:07 | | ||* 2 | HASH JOIN | | 918K| 34M| 546 (2)| 00:00:07 | | || 3 | PART JOIN FILTER CREATE | :BF0000 | 1826 | 21912 | 18 (0)| 00:00:01 | | || 4 | TABLE ACCESS FULL | TIMES | 1826 | 21912 | 18 (0)| 00:00:01 | | || 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000|| 6 | TABLE ACCESS FULL | SALES | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000|---------------------------------------------------------------------------------------------------------
Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。
CREATE DIMENSION SH.TIMES_DIM LEVEL DAY IS (SH.TIMES.TIME_ID) LEVEL MONTH IS (SH.TIMES.CALENDAR_MONTH_ID) LEVEL QUARTER IS (SH.TIMES.CALENDAR_QUARTER_ID) LEVEL YEAR IS (SH.TIMES.CALENDAR_YEAR_ID) HIERARCHY CAL_ROLLUP (DAY CHILD OF MONTH CHILD OF QUARTER CHILD OF YEAR);
LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。
我们再次查询
SQL> SELECT t.calendar_quarter_id,prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;3375 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3397140165--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 20 | 1720 | 36 (14)| 00:00:01 || 1 | HASH GROUP BY | | 20 | 1720 | 36 (14)| 00:00:01 ||* 2 | HASH JOIN | | 128K| 10M| 33 (7)| 00:00:01 || 3 | VIEW | | 849 | 6792 | 19 (6)| 00:00:01 || 4 | HASH UNIQUE | | 849 | 6792 | 19 (6)| 00:00:01 || 5 | TABLE ACCESS FULL | TIMES | 1826 | 14608 | 18 (0)| 00:00:01 || 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |--------------------------------------------------------------------------------------------------
这次是使用物化视图与times表做关联,性能更高了。
我们对比如下两个查询
SQL> SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3397140165--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 86 | 33 (7)| 00:00:01 || 1 | HASH GROUP BY | | 1 | 86 | 33 (7)| 00:00:01 ||* 2 | HASH JOIN | | 6423 | 539K| 32 (4)| 00:00:01 || 3 | VIEW | | 34 | 272 | 19 (6)| 00:00:01 || 4 | HASH UNIQUE | | 34 | 272 | 19 (6)| 00:00:01 ||* 5 | TABLE ACCESS FULL | TIMES | 90 | 720 | 18 (0)| 00:00:01 || 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |--------------------------------------------------------------------------------------------------
使用了物化视图
SQL>SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3221963832-----------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |-----------------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 8146 | 373K| | 848 (2)| 00:00:11 | | || 1 | HASH GROUP BY | | 8146 | 373K| 3632K| 848 (2)| 00:00:11 | | ||* 2 | HASH JOIN | | 57459 | 2637K| | 546 (2)| 00:00:07 | | || 3 | PART JOIN FILTER CREATE | :BF0000 | 91 | 1820 | | 18 (0)| 00:00:01 | | ||* 4 | TABLE ACCESS FULL | TIMES | 91 | 1820 | | 18 (0)| 00:00:01 | | || 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000|| 6 | TABLE ACCESS FULL | SALES | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000|-----------------------------------------------------------------------------------------------------------------
没有使用物化视图。
其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。
但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。
我们在创建Dimension时,可以为LEVEL指定属性值。
如下
CREATE DIMENSION SH.TIMES_DIM LEVEL DAY IS (SH.TIMES.TIME_ID) LEVEL MONTH IS (SH.TIMES.CALENDAR_MONTH_ID) LEVEL QUARTER IS (SH.TIMES.CALENDAR_QUARTER_ID) LEVEL YEAR IS (SH.TIMES.CALENDAR_YEAR_ID) HIERARCHY CAL_ROLLUP (DAY CHILD OF MONTH CHILD OF QUARTER CHILD OF YEAR) ATTRIBUTE QUARTER DETERMINES (SH.TIMES.CALENDAR_QUARTER_DESC, SH.TIMES.DAYS_IN_CAL_QUARTER, SH.TIMES.END_OF_CAL_QUARTER, SH.TIMES.CALENDAR_QUARTER_NUMBER) ATTRIBUTE YEAR DETERMINES (SH.TIMES.CALENDAR_YEAR, SH.TIMES.DAYS_IN_CAL_YEAR, SH.TIMES.END_OF_CAL_YEAR);
我们再次查询
SQL> SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3290467316--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 20 | 2240 | 33 (7)| 00:00:01 || 1 | HASH GROUP BY | | 20 | 2240 | 33 (7)| 00:00:01 ||* 2 | HASH JOIN | | 17191 | 1880K| 32 (4)| 00:00:01 || 3 | VIEW | VW_GBF_5 | 91 | 3094 | 19 (6)| 00:00:01 || 4 | HASH GROUP BY | | 91 | 728 | 19 (6)| 00:00:01 || 5 | VIEW | | 91 | 728 | 19 (6)| 00:00:01 || 6 | HASH UNIQUE | | 91 | 1456 | 19 (6)| 00:00:01 ||* 7 | TABLE ACCESS FULL | TIMES | 91 | 1456 | 18 (0)| 00:00:01 || 8 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |--------------------------------------------------------------------------------------------------
这次就使用了物化视图。
维度
数据
视图
查询
事实
属性
层级
仓库
两个
再次
概念
问题
分组
相同
业务
中央
中表
之间
事件
产品
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
软件开发要优化什么
负责网络技术开发的岗位
服务器主板和普通主板的区别
互联网金融科技汇
网络安全等保测评分几级
国外服务器商
专业警察网络安全
软件开发的三个主要阶段
jsp往数据库添加数据库
网络安全好还是程序猿好
软件开发商员工待遇
义乌鹏睿网络技术有限公司
国家网络安全宣传周好的做法
记事本软件开发
西安蓉强网络技术有限公司
安徽服务器电源生产商
网络技术的发展及应用
复制数据库的结构
技术运营数据库题
数据库比较子查询
网络安全维护班会总结
学院网络安全责任制细则
厦门煌隆 软件开发
上海安装软件开发解决方案
建筑能耗控制全链条软件开发
app安全加固服务器
大学网络安全项目有哪些
哪里能接到一些软件开发外单
服务器内存与普通内存
复制数据库的结构