hive on tez sql 优化
select t.id end_usr_id,
t.end_user_name end_usr_name,
t.end_user_real_name end_usr_nknm,
t.end_user_password end_usr_pswd,
t.end_user_birthday end_usr_bthd,
t.mobile mobl_num,
t.phone phone_num,
t.valid_mobile_phone_num bindg_mobl_num,
t.end_user_email end_usr_email,
substr(t.end_user_email,
instr(t.end_user_email, '@'),
instr(t.end_user_email, '.') - instr(t.end_user_email, '@')) email_class,
t.end_user_create_time creat_time,
t.end_user_last_login_date end_usr_last_login_date,
t.end_user_last_bought_date end_usr_last_buy_date,
t.end_user_login_times end_usr_login_times,
t.end_user_bought_times end_usr_buy_times,
t.end_user_bought_amount end_usr_buy_amt,
t.end_user_type end_usr_type,
t.ip ip,
t.end_user_points end_usr_intgl,
t.id_card id_card,
t.end_user_sex sex,
t.mc_site_id mc_site_id,
case
when t.is_email_activate = 1 then
1
else
0
end as email_verfd_flg,
case
when t.valid_mobile_phone_num is not null then
1
else
0
end as mobl_bding_flg,
case
when t.end_user_name like '%@sina%' then
1
when t.end_user_name like '%@pingan%' then
2
when t.end_user_name like '%@alipay%' and
not (t2.user_grade in (1, 2) and t2.user_grade_type = 1) then
3
when t.end_user_name like '%@alipay%' and t2.user_grade in (1, 2) and
t2.user_grade_type = 1 then
4
when t.end_user_name like '%@163%' then
5
when t.end_user_name like '%@kaixin001%' then
6
when t.end_user_name like '%@139%' then
7
when t.end_user_name like '%@msn.com%' then
8
when t.end_user_name like '%@anyue%' then
9
when t.end_user_name like '%@qq%' then
10
else
0
end as commn_login_type,
case
when t.end_user_name like '%@b2b%' or t.end_user_name = 'yuxiaolan' then
1
else
0
end as b2b_flg,
case
when t3.send_to is not null then
1
else
0
end as email_ret_flg,
case
when t4.email is not null then
1
else
0
end as email_blacklist_flg,
case
when t5.car = '1' then
1
else
0
end as car_flg,
t5.shopping_habit shppg_prefr,
t6.mail_level email_lvl,
'' whsl_flg,
'' xh_flg,
'' exptn_flg,
'' b2b_vip_flg,
t.member_grade end_usr_grd_id,
case
when t.co_code is null then
null
when t.co_code is not null and t7.usr_union_logon_id is null then
-999999
else
t7.usr_union_logon_id
end usr_union_logon_id,
t.co_code usr_union_logon_code,
t8.exp end_usr_grow_val
from tandem.end_user t
left outer join tandem.alipay_user t2
on t.id = t2.end_user_id
left outer join (select distinct lower(send_to) as send_to
from tandem.yhd_send_message_blacklist) t3
on lower(t.end_user_email) = t3.send_to
left outer join (select distinct lower(email) as email
from tandem.edm_error_email) t4
on lower(t.end_user_email) = t4.email
left outer join tmp_end_usr_info t5
on t.id = t5.end_usr_id
left outer join tandem.crm_user_quality t6
on t.end_user_email = t6.email
and t6.user_id is not null
left outer join dw.usr_union_logon t7
on t.co_code = t7.usr_union_logon_code
and t7.usr_union_logon_code is not null
left outer join tandem.member t8
on t.id = t8.end_user_id
and to_date(t8.update_time) = '2015-05-04'
where t.update_time >= '2015-05-04'
or length(t8.end_user_id)>0;
运行速度慢
执行计划如下:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)
Reducer 4 <- Map 11 (SIMPLE_EDGE), Map 12 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)
Reducer 6 <- Map 5 (SIMPLE_EDGE)
Reducer 8 <- Map 10 (SIMPLE_EDGE), Map 13 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)
DagName: zhoushugang_20150505143131_1821ced5-3723-4094-b6a3-c2aae136cd32:2
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: yhd_send_message_blacklist
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: send_to (type: string)
outputColumnNames: send_to
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: lower(send_to) (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Map 10
Map Operator Tree:
TableScan
alias: t5
Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: end_usr_id (type: bigint)
sort order: +
Map-reduce partition columns: end_usr_id (type: bigint)
Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE
value expressions: car (type: string), shopping_habit (type: string)
Map 11
Map Operator Tree:
TableScan
alias: t6
Statistics: Num rows: 38930100 Data size: 4515891712 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: user_id is not null (type: boolean)
Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: email (type: string)
sort order: +
Map-reduce partition columns: email (type: string)
Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE
value expressions: mail_level (type: double)
Map 12
Map Operator Tree:
TableScan
alias: t7
Statistics: Num rows: 16 Data size: 1752 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: usr_union_logon_code is not null (type: boolean)
Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: usr_union_logon_code (type: string)
sort order: +
Map-reduce partition columns: usr_union_logon_code (type: string)
Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE
value expressions: usr_union_logon_id (type: bigint)
Map 13
Map Operator Tree:
TableScan
alias: t8
Statistics: Num rows: 25660729 Data size: 3178607508 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (to_date(update_time) = '2015-05-04') (type: boolean)
Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: end_user_id (type: bigint)
sort order: +
Map-reduce partition columns: end_user_id (type: bigint)
Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE
value expressions: exp (type: int)
Map 5
Map Operator Tree:
TableScan
alias: edm_error_email
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: email (type: string)
outputColumnNames: email
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: lower(email) (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Map 7
Map Operator Tree:
TableScan
alias: t
Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: id (type: bigint)
sort order: +
Map-reduce partition columns: id (type: bigint)
Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE
value expressions: end_user_name (type: string), end_user_password (type: string), end_user_real_name (type: string), end_user_birthday (type: string), end_user_last_login_date (type: string), end_user_last_bought_date (type: string), end_user_login_times (type: double), end_user_bought_amount (type: double), end_user_bought_times (type: double), end_user_sex (type: double), end_user_create_time (type: string), end_user_type (type: double), ip (type: string), end_user_points (type: double), co_code (type: string), is_email_activate (type: double), mc_site_id (type: bigint), update_time (type: string), member_grade (type: double), end_user_email (type: string), mobile (type: string), phone (type: string), valid_mobile_phone_num (type: int), id_card (type: string)
Map 9
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: end_user_id (type: bigint)
sort order: +
Map-reduce partition columns: end_user_id (type: bigint)
Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE
value expressions: user_grade (type: double), user_grade_type (type: double)
Reducer 2
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE
Reducer 3
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
keys:
0 lower(_col107) (type: string)
1 _col0 (type: string)
2 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146
Statistics: Num rows: 788865883 Data size: 360827596199 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col107 (type: string)
sort order: +
Map-reduce partition columns: _col107 (type: string)
Statistics: Num rows: 788865883 Data size: 360827596199 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col84 (type: string), _col103 (type: double), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col131 (type: bigint), _col133 (type: int), _col145 (type: string), _col146 (type: string)
Reducer 4
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col107 (type: string)
1 email (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146, _col157
Statistics: Num rows: 867752490 Data size: 396910364421 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col72 (type: string)
1 usr_union_logon_code (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146, _col157, _col163
input vertices:
1 Map 12
Statistics: Num rows: 954527759 Data size: 436601410326 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: ((_col84 >= '2015-05-04') or (length(_col131) > 0)) (type: boolean)
Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: string), _col3 (type: string), _col2 (type: string), _col4 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col107 (type: string), substr(_col107, instr(_col107'@'), (instr(_col107'.') - instr(_col107'@'))) (type: string), _col44 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col32 (type: double), _col31 (type: double), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col111 (type: string), _col34 (type: double), _col83 (type: bigint), CASE WHEN ((_col78 = 1.0)) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col110 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col1 like '%@sina%')) THEN (1) WHEN ((_col1 like '%@pingan%')) THEN (2) WHEN (((_col1 like '%@alipay%') and (not ((_col117) IN (1, 2) and (_col118 = 1.0))))) THEN (3) WHEN ((((_col1 like '%@alipay%') and (_col117) IN (1, 2)) and (_col118 = 1.0))) THEN (4) WHEN ((_col1 like '%@163%')) THEN (5) WHEN ((_col1 like '%@kaixin001%')) THEN (6) WHEN ((_col1 like '%@139%')) THEN (7) WHEN ((_col1 like '%@msn.com%')) THEN (8) WHEN ((_col1 like '%@anyue%')) THEN (9) WHEN ((_col1 like '%@qq%')) THEN (10) ELSE (0) END (type: int), CASE WHEN (((_col1 like '%@b2b%') or (_col1 = 'yuxiaolan'))) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col145 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col146 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col125 = '1')) THEN (1) ELSE (0) END (type: int), _col126 (type: string), _col157 (type: double), '' (type: string), '' (type: string), '' (type: string), '' (type: string), _col103 (type: double), CASE WHEN (_col72 is null) THEN (null) WHEN ((_col72 is not null and _col163 is null)) THEN (-999999) ELSE (_col163) END (type: bigint), _col72 (type: string), _col133 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 6
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE
Reducer 8
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
Left Outer Join0 to 3
keys:
0 id (type: bigint)
1 end_user_id (type: bigint)
2 end_usr_id (type: bigint)
3 end_user_id (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133
Statistics: Num rows: 358575394 Data size: 164012540172 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: lower(_col107) (type: string)
sort order: +
Map-reduce partition columns: lower(_col107) (type: string)
Statistics: Num rows: 358575394 Data size: 164012540172 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col84 (type: string), _col103 (type: double), _col107 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col131 (type: bigint), _col133 (type: int)
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Tez 合并了多个join给一个reduce,避免了启动过多的job(见粗体部分)已经有所优化,但是为啥此sql执行慢呢?
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 10 ......... SUCCEEDED 3 3 0 0 0 0
Map 11 ......... SUCCEEDED 151 151 0 0 0 0
Map 12 ......... SUCCEEDED 1 1 0 0 0 0
Map 13 ......... SUCCEEDED 76 76 0 0 0 0
Map 5 .......... SUCCEEDED 11 11 0 0 0 0
Map 7 .......... SUCCEEDED 156 156 0 0 0 0
Map 9 .......... SUCCEEDED 10 10 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 3 ..... RUNNING 642 641 1 0 0 0
Reducer 4 RUNNING 1009 0 89 920 0 0
Reducer 6 ...... SUCCEEDED 3 3 0 0 0 0
Reducer 8 ...... SUCCEEDED 203 203 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 11/13 [==============>>------------] 55% ELAPSED TIME: 307.54 s
Reduce 3 中有1个任务始终RUNNING
查看SQL中最后的where 条件发现
where t.update_time>= '2015-05-04' or t8.end_user_id>0; 导致无法提前过滤掉t.update_time< '2015-05-04'的数据,导致后续join产生的reduce数据过大(见上面执行计划蓝色部分),可能产生了数据倾斜,导致任务时长拉长。
去掉or t8.end_user_id>0的时候,end_user TS返回36219736 行
Map 7
MapOperator Tree:
TableScan
alias: t
Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE
Filter Operator
predicate: (update_time >='2015-05-04') (type: boolean)
Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETEColumn stats: NONE
Reduce Output Operator
key expressions: id (type: bigint)
sort order: +
Map-reduce partition columns: id (type: bigint)
Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETEColumn stats: NONE
value expressions: end_user_name (type: string),
没有去掉or t8.end_user_id>0的时候,end_user TS返回108659208行
Map 7
MapOperator Tree:
TableScan
alias: t
Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE
Reduce Output Operator
key expressions: id (type: bigint)
sort order: +
Map-reduce partition columns: id (type: bigint)
Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE
value expressions: end_user_name (type: string),end_user_password (type: string), end_user_real_name (type: string),end_user_birthday (type: string), end_user_last_login_date (type: string),end_user_last_bought_date (type: string), end_user_login_times (type: double),end_user_bought_amount (type: double), end_user_bought_times (type: double),end_user_sex (type: double), end_user_create_time (type: string), end_user_type(type: double), ip (type: string), end_user_points (type: double), co_code(type: string), is_email_activate (type: double), mc_site_id (type: bigint),update_time (type: string), member_grade (type: double), end_user_email (type:string), mobile (type: string), phone (type: string), valid_mobile_phone_num(type: int), id_card (type: string)
业务部门说可以去掉or t8.end_user_id>0 这样就大大减少了 reduce数据的结果集。
最后附上去掉or t8.end_user_id>0的执行计划:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)
Reducer 4 <- Map 11 (SIMPLE_EDGE), Map 12 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)
Reducer 6 <- Map 5 (SIMPLE_EDGE)
Reducer 8 <- Map 10 (SIMPLE_EDGE), Map 13 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)
DagName: zhoushugang_20150505143030_fda7333f-2147-4a65-a63f-7c08c8a3630b:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: yhd_send_message_blacklist
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: send_to (type: string)
outputColumnNames: send_to
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: lower(send_to) (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE
Map 10
Map Operator Tree:
TableScan
alias: t5
Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: end_usr_id (type: bigint)
sort order: +
Map-reduce partition columns: end_usr_id (type: bigint)
Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE
value expressions: car (type: string), shopping_habit (type: string)
Map 11
Map Operator Tree:
TableScan
alias: t6
Statistics: Num rows: 38930100 Data size: 4515891712 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: user_id is not null (type: boolean)
Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: email (type: string)
sort order: +
Map-reduce partition columns: email (type: string)
Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE
value expressions: mail_level (type: double)
Map 12
Map Operator Tree:
TableScan
alias: t7
Statistics: Num rows: 16 Data size: 1752 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: usr_union_logon_code is not null (type: boolean)
Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: usr_union_logon_code (type: string)
sort order: +
Map-reduce partition columns: usr_union_logon_code (type: string)
Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE
value expressions: usr_union_logon_id (type: bigint)
Map 13
Map Operator Tree:
TableScan
alias: t8
Statistics: Num rows: 25660729 Data size: 3178607508 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (to_date(update_time) = '2015-05-04') (type: boolean)
Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: end_user_id (type: bigint)
sort order: +
Map-reduce partition columns: end_user_id (type: bigint)
Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE
value expressions: exp (type: int)
Map 5
Map Operator Tree:
TableScan
alias: edm_error_email
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: email (type: string)
outputColumnNames: email
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: lower(email) (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE
Map 7
Map Operator Tree:
TableScan
alias: t
Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (update_time >= '2015-05-04') (type: boolean)
Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: id (type: bigint)
sort order: +
Map-reduce partition columns: id (type: bigint)
Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETE Column stats: NONE
value expressions: end_user_name (type: string), end_user_password (type: string), end_user_real_name (type: string), end_user_birthday (type: string), end_user_last_login_date (type: string), end_user_last_bought_date (type: string), end_user_login_times (type: double), end_user_bought_amount (type: double), end_user_bought_times (type: double), end_user_sex (type: double), end_user_create_time (type: string), end_user_type (type: double), ip (type: string), end_user_points (type: double), co_code (type: string), is_email_activate (type: double), mc_site_id (type: bigint), member_grade (type: double), end_user_email (type: string), mobile (type: string), phone (type: string), valid_mobile_phone_num (type: int), id_card (type: string)
Map 9
Map Operator Tree:
TableScan
alias: t2
Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: end_user_id (type: bigint)
sort order: +
Map-reduce partition columns: end_user_id (type: bigint)
Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE
value expressions: user_grade (type: double), user_grade_type (type: double)
Reducer 2
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE
Reducer 3
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
keys:
0 lower(_col107) (type: string)
1 _col0 (type: string)
2 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146
Statistics: Num rows: 262955293 Data size: 120275865393 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col107 (type: string)
sort order: +
Map-reduce partition columns: _col107 (type: string)
Statistics: Num rows: 262955293 Data size: 120275865393 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col103 (type: double), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col133 (type: int), _col145 (type: string), _col146 (type: string)
Reducer 4
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col107 (type: string)
1 email (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146, _col157
Statistics: Num rows: 289250828 Data size: 132303454799 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col72 (type: string)
1 usr_union_logon_code (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146, _col157, _col163
input vertices:
1 Map 12
Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: bigint), _col1 (type: string), _col3 (type: string), _col2 (type: string), _col4 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col107 (type: string), substr(_col107, instr(_col107'@'), (instr(_col107'.') - instr(_col107'@'))) (type: string), _col44 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col32 (type: double), _col31 (type: double), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col111 (type: string), _col34 (type: double), _col83 (type: bigint), CASE WHEN ((_col78 = 1.0)) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col110 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col1 like '%@sina%')) THEN (1) WHEN ((_col1 like '%@pingan%')) THEN (2) WHEN (((_col1 like '%@alipay%') and (not ((_col117) IN (1, 2) and (_col118 = 1.0))))) THEN (3) WHEN ((((_col1 like '%@alipay%') and (_col117) IN (1, 2)) and (_col118 = 1.0))) THEN (4) WHEN ((_col1 like '%@163%')) THEN (5) WHEN ((_col1 like '%@kaixin001%')) THEN (6) WHEN ((_col1 like '%@139%')) THEN (7) WHEN ((_col1 like '%@msn.com%')) THEN (8) WHEN ((_col1 like '%@anyue%')) THEN (9) WHEN ((_col1 like '%@qq%')) THEN (10) ELSE (0) END (type: int), CASE WHEN (((_col1 like '%@b2b%') or (_col1 = 'yuxiaolan'))) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col145 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col146 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col125 = '1')) THEN (1) ELSE (0) END (type: int), _col126 (type: string), _col157 (type: double), '' (type: string), '' (type: string), '' (type: string), '' (type: string), _col103 (type: double), CASE WHEN (_col72 is null) THEN (null) WHEN ((_col72 is not null and _col163 is null)) THEN (-999999) ELSE (_col163) END (type: bigint), _col72 (type: string), _col133 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 6
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE
Reducer 8
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
Left Outer Join0 to 3
keys:
0 id (type: bigint)
1 end_user_id (type: bigint)
2 end_usr_id (type: bigint)
3 end_user_id (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133
Statistics: Num rows: 119525131 Data size: 54670846721 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: lower(_col107) (type: string)
sort order: +
Map-reduce partition columns: lower(_col107) (type: string)
Statistics: Num rows: 119525131 Data size: 54670846721 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col103 (type: double), _col107 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col133 (type: int)
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink