Oracle 学习之 数据仓库(二) Dimension 的理解
发表于:2024-10-21 作者:千家信息网编辑
千家信息网最后更新 2024年10月21日,在数据仓库中,有事实表、维度表两个概念。事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。维度表是维度属性的集合。是分析问
千家信息网最后更新 2024年10月21日Oracle 学习之 数据仓库(二) Dimension 的理解
在数据仓库中,有事实表、维度表两个概念。
事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。
维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。
如图示
我们以sh用户下的sales表和times表来看,
SALES为事实表
SQL> desc sales Name Null? Type ----------------------------------------- -------- ---------------------------- PROD_ID NOT NULL NUMBER CUST_ID NOT NULL NUMBER TIME_ID NOT NULL DATE CHANNEL_ID NOT NULL NUMBER PROMO_ID NOT NULL NUMBER QUANTITY_SOLD NOT NULL NUMBER(10,2) AMOUNT_SOLD NOT NULL NUMBER(10,2)
TIMES为维度表
SQL> desc times Name Null? Type ----------------------------------------- -------- ---------------------------- TIME_ID NOT NULL DATE DAY_NAME NOT NULL VARCHAR2(9) DAY_NUMBER_IN_WEEK NOT NULL NUMBER(1) DAY_NUMBER_IN_MONTH NOT NULL NUMBER(2) CALENDAR_WEEK_NUMBER NOT NULL NUMBER(2) FISCAL_WEEK_NUMBER NOT NULL NUMBER(2) WEEK_ENDING_DAY NOT NULL DATE WEEK_ENDING_DAY_ID NOT NULL NUMBER CALENDAR_MONTH_NUMBER NOT NULL NUMBER(2) FISCAL_MONTH_NUMBER NOT NULL NUMBER(2) CALENDAR_MONTH_DESC NOT NULL VARCHAR2(8) CALENDAR_MONTH_ID NOT NULL NUMBER FISCAL_MONTH_DESC NOT NULL VARCHAR2(8) FISCAL_MONTH_ID NOT NULL NUMBER DAYS_IN_CAL_MONTH NOT NULL NUMBER DAYS_IN_FIS_MONTH NOT NULL NUMBER END_OF_CAL_MONTH NOT NULL DATE END_OF_FIS_MONTH NOT NULL DATE CALENDAR_MONTH_NAME NOT NULL VARCHAR2(9) FISCAL_MONTH_NAME NOT NULL VARCHAR2(9) CALENDAR_QUARTER_DESC NOT NULL CHAR(7) CALENDAR_QUARTER_ID NOT NULL NUMBER FISCAL_QUARTER_DESC NOT NULL CHAR(7) FISCAL_QUARTER_ID NOT NULL NUMBER DAYS_IN_CAL_QUARTER NOT NULL NUMBER DAYS_IN_FIS_QUARTER NOT NULL NUMBER END_OF_CAL_QUARTER NOT NULL DATE END_OF_FIS_QUARTER NOT NULL DATE CALENDAR_QUARTER_NUMBER NOT NULL NUMBER(1) FISCAL_QUARTER_NUMBER NOT NULL NUMBER(1) CALENDAR_YEAR NOT NULL NUMBER(4) CALENDAR_YEAR_ID NOT NULL NUMBER FISCAL_YEAR NOT NULL NUMBER(4) FISCAL_YEAR_ID NOT NULL NUMBER DAYS_IN_CAL_YEAR NOT NULL NUMBER DAYS_IN_FIS_YEAR NOT NULL NUMBER END_OF_CAL_YEAR NOT NULL DATE END_OF_FIS_YEAR NOT NULL DATE
如果我们创建一个物化视图
create materialized view sales_month_sum enable query rewrite as SELECT t.calendar_month_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_month_id;
如果我们做如下按月的分组查询
SQL> alter session set query_rewrite_enabled=true;SQL> alter session set query_rewrite_integrity=trusted;SQL> set autotrace traceonlySQL> set line 200SQL> SELECT t.calendar_month_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_month_id; 9068 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3287305789------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 9068 | 690K| 13 (0)| 00:00:01 || 1 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |------------------------------------------------------------------------------------------------
可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?
SELECT t.calendar_quarter_id,prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;
这个查看肯定是不能使用物化视图的,执行计划如下
Execution Plan----------------------------------------------------------Plan hash value: 3221963832---------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |---------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 2037 | 79443 | 569 (6)| 00:00:07 | | || 1 | HASH GROUP BY | | 2037 | 79443 | 569 (6)| 00:00:07 | | ||* 2 | HASH JOIN | | 918K| 34M| 546 (2)| 00:00:07 | | || 3 | PART JOIN FILTER CREATE | :BF0000 | 1826 | 21912 | 18 (0)| 00:00:01 | | || 4 | TABLE ACCESS FULL | TIMES | 1826 | 21912 | 18 (0)| 00:00:01 | | || 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000|| 6 | TABLE ACCESS FULL | SALES | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000|---------------------------------------------------------------------------------------------------------
Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。
CREATE DIMENSION SH.TIMES_DIM LEVEL DAY IS (SH.TIMES.TIME_ID) LEVEL MONTH IS (SH.TIMES.CALENDAR_MONTH_ID) LEVEL QUARTER IS (SH.TIMES.CALENDAR_QUARTER_ID) LEVEL YEAR IS (SH.TIMES.CALENDAR_YEAR_ID) HIERARCHY CAL_ROLLUP (DAY CHILD OF MONTH CHILD OF QUARTER CHILD OF YEAR);
LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。
我们再次查询
SQL> SELECT t.calendar_quarter_id,prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_idGROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;3375 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3397140165--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 20 | 1720 | 36 (14)| 00:00:01 || 1 | HASH GROUP BY | | 20 | 1720 | 36 (14)| 00:00:01 ||* 2 | HASH JOIN | | 128K| 10M| 33 (7)| 00:00:01 || 3 | VIEW | | 849 | 6792 | 19 (6)| 00:00:01 || 4 | HASH UNIQUE | | 849 | 6792 | 19 (6)| 00:00:01 || 5 | TABLE ACCESS FULL | TIMES | 1826 | 14608 | 18 (0)| 00:00:01 || 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |--------------------------------------------------------------------------------------------------
这次是使用物化视图与times表做关联,性能更高了。
我们对比如下两个查询
SQL> SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3397140165--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 86 | 33 (7)| 00:00:01 || 1 | HASH GROUP BY | | 1 | 86 | 33 (7)| 00:00:01 ||* 2 | HASH JOIN | | 6423 | 539K| 32 (4)| 00:00:01 || 3 | VIEW | | 34 | 272 | 19 (6)| 00:00:01 || 4 | HASH UNIQUE | | 34 | 272 | 19 (6)| 00:00:01 ||* 5 | TABLE ACCESS FULL | TIMES | 90 | 720 | 18 (0)| 00:00:01 || 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |--------------------------------------------------------------------------------------------------
使用了物化视图
SQL>SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3221963832-----------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |-----------------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 8146 | 373K| | 848 (2)| 00:00:11 | | || 1 | HASH GROUP BY | | 8146 | 373K| 3632K| 848 (2)| 00:00:11 | | ||* 2 | HASH JOIN | | 57459 | 2637K| | 546 (2)| 00:00:07 | | || 3 | PART JOIN FILTER CREATE | :BF0000 | 91 | 1820 | | 18 (0)| 00:00:01 | | ||* 4 | TABLE ACCESS FULL | TIMES | 91 | 1820 | | 18 (0)| 00:00:01 | | || 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000|| 6 | TABLE ACCESS FULL | SALES | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000|-----------------------------------------------------------------------------------------------------------------
没有使用物化视图。
其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。
但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。
我们在创建Dimension时,可以为LEVEL指定属性值。
如下
CREATE DIMENSION SH.TIMES_DIM LEVEL DAY IS (SH.TIMES.TIME_ID) LEVEL MONTH IS (SH.TIMES.CALENDAR_MONTH_ID) LEVEL QUARTER IS (SH.TIMES.CALENDAR_QUARTER_ID) LEVEL YEAR IS (SH.TIMES.CALENDAR_YEAR_ID) HIERARCHY CAL_ROLLUP (DAY CHILD OF MONTH CHILD OF QUARTER CHILD OF YEAR) ATTRIBUTE QUARTER DETERMINES (SH.TIMES.CALENDAR_QUARTER_DESC, SH.TIMES.DAYS_IN_CAL_QUARTER, SH.TIMES.END_OF_CAL_QUARTER, SH.TIMES.CALENDAR_QUARTER_NUMBER) ATTRIBUTE YEAR DETERMINES (SH.TIMES.CALENDAR_YEAR, SH.TIMES.DAYS_IN_CAL_YEAR, SH.TIMES.END_OF_CAL_YEAR);
我们再次查询
SQL> SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected.Execution Plan----------------------------------------------------------Plan hash value: 3290467316--------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |--------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 20 | 2240 | 33 (7)| 00:00:01 || 1 | HASH GROUP BY | | 20 | 2240 | 33 (7)| 00:00:01 ||* 2 | HASH JOIN | | 17191 | 1880K| 32 (4)| 00:00:01 || 3 | VIEW | VW_GBF_5 | 91 | 3094 | 19 (6)| 00:00:01 || 4 | HASH GROUP BY | | 91 | 728 | 19 (6)| 00:00:01 || 5 | VIEW | | 91 | 728 | 19 (6)| 00:00:01 || 6 | HASH UNIQUE | | 91 | 1456 | 19 (6)| 00:00:01 ||* 7 | TABLE ACCESS FULL | TIMES | 91 | 1456 | 18 (0)| 00:00:01 || 8 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |--------------------------------------------------------------------------------------------------
这次就使用了物化视图。
维度
数据
视图
查询
事实
属性
层级
仓库
两个
再次
概念
问题
分组
相同
业务
中央
中表
之间
事件
产品
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
数据库 主主 同步 冲突
网络安全法有哪些作用
数据库设置存储中文
SQL数据库的查找与筛选
php 数据库 查看
安卓软件开发工程师考试
加强网络安全防范的主题
中山大学密码学与网络安全试卷
软件开发培训照片
广州华为网络技术公司
科技巨头 网络安全
互联网科技公司a股
生物医学数据库注册
安卓手机网易账号服务器设置
西安新城区软件开发
软件开发工程师 招聘
未转变者进服务器弹出
四川省道网数据库什么时候开工
oracle数据库收费么
软件开发研发费用资本化时点
海陵区个性化网络技术哪家好
阿里云服务器 按量计费
网络安全法条例 题目
北京erp软件开发报价
分布式数据库技术大全
太仓专业网络技术
数据库英文
软件开发成本怎样衡量
京东服务器为什么这么火
app系统软件开发服务多少钱