千家信息网

hive on tez sql 优化

发表于:2024-10-04 作者:千家信息网编辑
千家信息网最后更新 2024年10月04日,select t.id end_usr_id,t.end_user_name end_usr_name,t.end_user_real_name end_usr_nknm,t.end_user_pas
千家信息网最后更新 2024年10月04日hive on tez sql 优化

select t.id end_usr_id,

t.end_user_name end_usr_name,

t.end_user_real_name end_usr_nknm,

t.end_user_password end_usr_pswd,

t.end_user_birthday end_usr_bthd,

t.mobile mobl_num,

t.phone phone_num,

t.valid_mobile_phone_num bindg_mobl_num,

t.end_user_email end_usr_email,

substr(t.end_user_email,

instr(t.end_user_email, '@'),

instr(t.end_user_email, '.') - instr(t.end_user_email, '@')) email_class,

t.end_user_create_time creat_time,

t.end_user_last_login_date end_usr_last_login_date,

t.end_user_last_bought_date end_usr_last_buy_date,

t.end_user_login_times end_usr_login_times,

t.end_user_bought_times end_usr_buy_times,

t.end_user_bought_amount end_usr_buy_amt,

t.end_user_type end_usr_type,

t.ip ip,

t.end_user_points end_usr_intgl,

t.id_card id_card,

t.end_user_sex sex,

t.mc_site_id mc_site_id,

case

when t.is_email_activate = 1 then

1

else

0

end as email_verfd_flg,

case

when t.valid_mobile_phone_num is not null then

1

else

0

end as mobl_bding_flg,

case

when t.end_user_name like '%@sina%' then

1

when t.end_user_name like '%@pingan%' then

2

when t.end_user_name like '%@alipay%' and

not (t2.user_grade in (1, 2) and t2.user_grade_type = 1) then

3

when t.end_user_name like '%@alipay%' and t2.user_grade in (1, 2) and

t2.user_grade_type = 1 then

4

when t.end_user_name like '%@163%' then

5

when t.end_user_name like '%@kaixin001%' then

6

when t.end_user_name like '%@139%' then

7

when t.end_user_name like '%@msn.com%' then

8

when t.end_user_name like '%@anyue%' then

9

when t.end_user_name like '%@qq%' then

10

else

0

end as commn_login_type,

case

when t.end_user_name like '%@b2b%' or t.end_user_name = 'yuxiaolan' then

1

else

0

end as b2b_flg,

case

when t3.send_to is not null then

1

else

0

end as email_ret_flg,

case

when t4.email is not null then

1

else

0

end as email_blacklist_flg,

case

when t5.car = '1' then

1

else

0

end as car_flg,

t5.shopping_habit shppg_prefr,

t6.mail_level email_lvl,

'' whsl_flg,

'' xh_flg,

'' exptn_flg,

'' b2b_vip_flg,

t.member_grade end_usr_grd_id,

case

when t.co_code is null then

null

when t.co_code is not null and t7.usr_union_logon_id is null then

-999999

else

t7.usr_union_logon_id

end usr_union_logon_id,

t.co_code usr_union_logon_code,

t8.exp end_usr_grow_val

from tandem.end_user t

left outer join tandem.alipay_user t2

on t.id = t2.end_user_id

left outer join (select distinct lower(send_to) as send_to

from tandem.yhd_send_message_blacklist) t3

on lower(t.end_user_email) = t3.send_to

left outer join (select distinct lower(email) as email

from tandem.edm_error_email) t4

on lower(t.end_user_email) = t4.email

left outer join tmp_end_usr_info t5

on t.id = t5.end_usr_id

left outer join tandem.crm_user_quality t6

on t.end_user_email = t6.email

and t6.user_id is not null

left outer join dw.usr_union_logon t7

on t.co_code = t7.usr_union_logon_code

and t7.usr_union_logon_code is not null

left outer join tandem.member t8

on t.id = t8.end_user_id

and to_date(t8.update_time) = '2015-05-04'

where t.update_time >= '2015-05-04'

or length(t8.end_user_id)>0;


运行速度慢

执行计划如下:

STAGE DEPENDENCIES:

Stage-1 is a root stage

Stage-0 depends on stages: Stage-1


STAGE PLANS:

Stage: Stage-1

Tez

Edges:

Reducer 2 <- Map 1 (SIMPLE_EDGE)

Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)

Reducer 4 <- Map 11 (SIMPLE_EDGE), Map 12 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)

Reducer 6 <- Map 5 (SIMPLE_EDGE)

Reducer 8 <- Map 10 (SIMPLE_EDGE), Map 13 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)

DagName: zhoushugang_20150505143131_1821ced5-3723-4094-b6a3-c2aae136cd32:2

Vertices:

Map 1

Map Operator Tree:

TableScan

alias: yhd_send_message_blacklist

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: send_to (type: string)

outputColumnNames: send_to

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Group By Operator

keys: lower(send_to) (type: string)

mode: hash

outputColumnNames: _col0

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Map 10

Map Operator Tree:

TableScan

alias: t5

Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: end_usr_id (type: bigint)

sort order: +

Map-reduce partition columns: end_usr_id (type: bigint)

Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

value expressions: car (type: string), shopping_habit (type: string)

Map 11

Map Operator Tree:

TableScan

alias: t6

Statistics: Num rows: 38930100 Data size: 4515891712 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: user_id is not null (type: boolean)

Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: email (type: string)

sort order: +

Map-reduce partition columns: email (type: string)

Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

value expressions: mail_level (type: double)

Map 12

Map Operator Tree:

TableScan

alias: t7

Statistics: Num rows: 16 Data size: 1752 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: usr_union_logon_code is not null (type: boolean)

Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: usr_union_logon_code (type: string)

sort order: +

Map-reduce partition columns: usr_union_logon_code (type: string)

Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

value expressions: usr_union_logon_id (type: bigint)

Map 13

Map Operator Tree:

TableScan

alias: t8

Statistics: Num rows: 25660729 Data size: 3178607508 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: (to_date(update_time) = '2015-05-04') (type: boolean)

Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: end_user_id (type: bigint)

sort order: +

Map-reduce partition columns: end_user_id (type: bigint)

Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

value expressions: exp (type: int)

Map 5

Map Operator Tree:

TableScan

alias: edm_error_email

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: email (type: string)

outputColumnNames: email

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Group By Operator

keys: lower(email) (type: string)

mode: hash

outputColumnNames: _col0

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Map 7

Map Operator Tree:

TableScan

alias: t

Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: id (type: bigint)

sort order: +

Map-reduce partition columns: id (type: bigint)

Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE

value expressions: end_user_name (type: string), end_user_password (type: string), end_user_real_name (type: string), end_user_birthday (type: string), end_user_last_login_date (type: string), end_user_last_bought_date (type: string), end_user_login_times (type: double), end_user_bought_amount (type: double), end_user_bought_times (type: double), end_user_sex (type: double), end_user_create_time (type: string), end_user_type (type: double), ip (type: string), end_user_points (type: double), co_code (type: string), is_email_activate (type: double), mc_site_id (type: bigint), update_time (type: string), member_grade (type: double), end_user_email (type: string), mobile (type: string), phone (type: string), valid_mobile_phone_num (type: int), id_card (type: string)

Map 9

Map Operator Tree:

TableScan

alias: t2

Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: end_user_id (type: bigint)

sort order: +

Map-reduce partition columns: end_user_id (type: bigint)

Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

value expressions: user_grade (type: double), user_grade_type (type: double)

Reducer 2

Reduce Operator Tree:

Group By Operator

keys: KEY._col0 (type: string)

mode: mergepartial

outputColumnNames: _col0

Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

Reducer 3

Reduce Operator Tree:

Merge Join Operator

condition map:

Left Outer Join0 to 1

Left Outer Join0 to 2

keys:

0 lower(_col107) (type: string)

1 _col0 (type: string)

2 _col0 (type: string)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146

Statistics: Num rows: 788865883 Data size: 360827596199 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col107 (type: string)

sort order: +

Map-reduce partition columns: _col107 (type: string)

Statistics: Num rows: 788865883 Data size: 360827596199 Basic stats: COMPLETE Column stats: NONE

value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col84 (type: string), _col103 (type: double), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col131 (type: bigint), _col133 (type: int), _col145 (type: string), _col146 (type: string)

Reducer 4

Reduce Operator Tree:

Merge Join Operator

condition map:

Left Outer Join0 to 1

keys:

0 _col107 (type: string)

1 email (type: string)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146, _col157

Statistics: Num rows: 867752490 Data size: 396910364421 Basic stats: COMPLETE Column stats: NONE

Map Join Operator

condition map:

Left Outer Join0 to 1

keys:

0 _col72 (type: string)

1 usr_union_logon_code (type: string)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133, _col145, _col146, _col157, _col163

input vertices:

1 Map 12

Statistics: Num rows: 954527759 Data size: 436601410326 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: ((_col84 >= '2015-05-04') or (length(_col131) > 0)) (type: boolean)

Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: _col0 (type: bigint), _col1 (type: string), _col3 (type: string), _col2 (type: string), _col4 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col107 (type: string), substr(_col107, instr(_col107'@'), (instr(_col107'.') - instr(_col107'@'))) (type: string), _col44 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col32 (type: double), _col31 (type: double), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col111 (type: string), _col34 (type: double), _col83 (type: bigint), CASE WHEN ((_col78 = 1.0)) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col110 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col1 like '%@sina%')) THEN (1) WHEN ((_col1 like '%@pingan%')) THEN (2) WHEN (((_col1 like '%@alipay%') and (not ((_col117) IN (1, 2) and (_col118 = 1.0))))) THEN (3) WHEN ((((_col1 like '%@alipay%') and (_col117) IN (1, 2)) and (_col118 = 1.0))) THEN (4) WHEN ((_col1 like '%@163%')) THEN (5) WHEN ((_col1 like '%@kaixin001%')) THEN (6) WHEN ((_col1 like '%@139%')) THEN (7) WHEN ((_col1 like '%@msn.com%')) THEN (8) WHEN ((_col1 like '%@anyue%')) THEN (9) WHEN ((_col1 like '%@qq%')) THEN (10) ELSE (0) END (type: int), CASE WHEN (((_col1 like '%@b2b%') or (_col1 = 'yuxiaolan'))) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col145 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col146 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col125 = '1')) THEN (1) ELSE (0) END (type: int), _col126 (type: string), _col157 (type: double), '' (type: string), '' (type: string), '' (type: string), '' (type: string), _col103 (type: double), CASE WHEN (_col72 is null) THEN (null) WHEN ((_col72 is not null and _col163 is null)) THEN (-999999) ELSE (_col163) END (type: bigint), _col72 (type: string), _col133 (type: int)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38

Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

Statistics: Num rows: 636351838 Data size: 291067606274 Basic stats: COMPLETE Column stats: NONE

table:

input format: org.apache.hadoop.mapred.TextInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Reducer 6

Reduce Operator Tree:

Group By Operator

keys: KEY._col0 (type: string)

mode: mergepartial

outputColumnNames: _col0

Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

Reducer 8

Reduce Operator Tree:

Merge Join Operator

condition map:

Left Outer Join0 to 1

Left Outer Join0 to 2

Left Outer Join0 to 3

keys:

0 id (type: bigint)

1 end_user_id (type: bigint)

2 end_usr_id (type: bigint)

3 end_user_id (type: bigint)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col84, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col131, _col133

Statistics: Num rows: 358575394 Data size: 164012540172 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: lower(_col107) (type: string)

sort order: +

Map-reduce partition columns: lower(_col107) (type: string)

Statistics: Num rows: 358575394 Data size: 164012540172 Basic stats: COMPLETE Column stats: NONE

value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col84 (type: string), _col103 (type: double), _col107 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col131 (type: bigint), _col133 (type: int)


Stage: Stage-0

Fetch Operator

limit: -1

Processor Tree:

ListSink

Tez 合并了多个join给一个reduce,避免了启动过多的job(见粗体部分)已经有所优化,但是为啥此sql执行慢呢?

--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 10 ......... SUCCEEDED 3 3 0 0 0 0
Map 11 ......... SUCCEEDED 151 151 0 0 0 0
Map 12 ......... SUCCEEDED 1 1 0 0 0 0
Map 13 ......... SUCCEEDED 76 76 0 0 0 0
Map 5 .......... SUCCEEDED 11 11 0 0 0 0
Map 7 .......... SUCCEEDED 156 156 0 0 0 0
Map 9 .......... SUCCEEDED 10 10 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 3 ..... RUNNING 642 641 1 0 0 0
Reducer 4 RUNNING 1009 0 89 920 0 0
Reducer 6 ...... SUCCEEDED 3 3 0 0 0 0
Reducer 8 ...... SUCCEEDED 203 203 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 11/13 [==============>>------------] 55% ELAPSED TIME: 307.54 s

Reduce 3 中有1个任务始终RUNNING

查看SQL中最后的where 条件发现

where t.update_time>= '2015-05-04' or t8.end_user_id>0; 导致无法提前过滤掉t.update_time< '2015-05-04'的数据,导致后续join产生的reduce数据过大(见上面执行计划蓝色部分),可能产生了数据倾斜,导致任务时长拉长。

去掉or t8.end_user_id>0的时候,end_user TS返回36219736

Map 7

MapOperator Tree:

TableScan

alias: t

Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE

Filter Operator

predicate: (update_time >='2015-05-04') (type: boolean)

Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETEColumn stats: NONE

Reduce Output Operator

key expressions: id (type: bigint)

sort order: +

Map-reduce partition columns: id (type: bigint)

Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETEColumn stats: NONE

value expressions: end_user_name (type: string),

没有去掉or t8.end_user_id>0的时候,end_user TS返回108659208

Map 7

MapOperator Tree:

TableScan

alias: t

Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE

Reduce Output Operator

key expressions: id (type: bigint)

sort order: +

Map-reduce partition columns: id (type: bigint)

Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETEColumn stats: NONE

value expressions: end_user_name (type: string),end_user_password (type: string), end_user_real_name (type: string),end_user_birthday (type: string), end_user_last_login_date (type: string),end_user_last_bought_date (type: string), end_user_login_times (type: double),end_user_bought_amount (type: double), end_user_bought_times (type: double),end_user_sex (type: double), end_user_create_time (type: string), end_user_type(type: double), ip (type: string), end_user_points (type: double), co_code(type: string), is_email_activate (type: double), mc_site_id (type: bigint),update_time (type: string), member_grade (type: double), end_user_email (type:string), mobile (type: string), phone (type: string), valid_mobile_phone_num(type: int), id_card (type: string)


业务部门说可以去掉or t8.end_user_id>0 这样就大大减少了 reduce数据的结果集。

最后附上去掉or t8.end_user_id>0的执行计划:

STAGE DEPENDENCIES:

Stage-1 is a root stage

Stage-0 depends on stages: Stage-1


STAGE PLANS:

Stage: Stage-1

Tez

Edges:

Reducer 2 <- Map 1 (SIMPLE_EDGE)

Reducer 3 <- Reducer 2 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE), Reducer 8 (SIMPLE_EDGE)

Reducer 4 <- Map 11 (SIMPLE_EDGE), Map 12 (BROADCAST_EDGE), Reducer 3 (SIMPLE_EDGE)

Reducer 6 <- Map 5 (SIMPLE_EDGE)

Reducer 8 <- Map 10 (SIMPLE_EDGE), Map 13 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE), Map 9 (SIMPLE_EDGE)

DagName: zhoushugang_20150505143030_fda7333f-2147-4a65-a63f-7c08c8a3630b:1

Vertices:

Map 1

Map Operator Tree:

TableScan

alias: yhd_send_message_blacklist

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: send_to (type: string)

outputColumnNames: send_to

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Group By Operator

keys: lower(send_to) (type: string)

mode: hash

outputColumnNames: _col0

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 870860 Data size: 46859629 Basic stats: COMPLETE Column stats: NONE

Map 10

Map Operator Tree:

TableScan

alias: t5

Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: end_usr_id (type: bigint)

sort order: +

Map-reduce partition columns: end_usr_id (type: bigint)

Statistics: Num rows: 4414357 Data size: 67807378 Basic stats: COMPLETE Column stats: NONE

value expressions: car (type: string), shopping_habit (type: string)

Map 11

Map Operator Tree:

TableScan

alias: t6

Statistics: Num rows: 38930100 Data size: 4515891712 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: user_id is not null (type: boolean)

Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: email (type: string)

sort order: +

Map-reduce partition columns: email (type: string)

Statistics: Num rows: 19465050 Data size: 2257945856 Basic stats: COMPLETE Column stats: NONE

value expressions: mail_level (type: double)

Map 12

Map Operator Tree:

TableScan

alias: t7

Statistics: Num rows: 16 Data size: 1752 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: usr_union_logon_code is not null (type: boolean)

Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: usr_union_logon_code (type: string)

sort order: +

Map-reduce partition columns: usr_union_logon_code (type: string)

Statistics: Num rows: 8 Data size: 876 Basic stats: COMPLETE Column stats: NONE

value expressions: usr_union_logon_id (type: bigint)

Map 13

Map Operator Tree:

TableScan

alias: t8

Statistics: Num rows: 25660729 Data size: 3178607508 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: (to_date(update_time) = '2015-05-04') (type: boolean)

Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: end_user_id (type: bigint)

sort order: +

Map-reduce partition columns: end_user_id (type: bigint)

Statistics: Num rows: 12830364 Data size: 1589303692 Basic stats: COMPLETE Column stats: NONE

value expressions: exp (type: int)

Map 5

Map Operator Tree:

TableScan

alias: edm_error_email

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: email (type: string)

outputColumnNames: email

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Group By Operator

keys: lower(email) (type: string)

mode: hash

outputColumnNames: _col0

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 5137907 Data size: 513790784 Basic stats: COMPLETE Column stats: NONE

Map 7

Map Operator Tree:

TableScan

alias: t

Statistics: Num rows: 108659208 Data size: 49700768672 Basic stats: COMPLETE Column stats: NONE

Filter Operator

predicate: (update_time >= '2015-05-04') (type: boolean)

Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: id (type: bigint)

sort order: +

Map-reduce partition columns: id (type: bigint)

Statistics: Num rows: 36219736 Data size: 16566922890 Basic stats: COMPLETE Column stats: NONE

value expressions: end_user_name (type: string), end_user_password (type: string), end_user_real_name (type: string), end_user_birthday (type: string), end_user_last_login_date (type: string), end_user_last_bought_date (type: string), end_user_login_times (type: double), end_user_bought_amount (type: double), end_user_bought_times (type: double), end_user_sex (type: double), end_user_create_time (type: string), end_user_type (type: double), ip (type: string), end_user_points (type: double), co_code (type: string), is_email_activate (type: double), mc_site_id (type: bigint), member_grade (type: double), end_user_email (type: string), mobile (type: string), phone (type: string), valid_mobile_phone_num (type: int), id_card (type: string)

Map 9

Map Operator Tree:

TableScan

alias: t2

Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: end_user_id (type: bigint)

sort order: +

Map-reduce partition columns: end_user_id (type: bigint)

Statistics: Num rows: 11871989 Data size: 422420127 Basic stats: COMPLETE Column stats: NONE

value expressions: user_grade (type: double), user_grade_type (type: double)

Reducer 2

Reduce Operator Tree:

Group By Operator

keys: KEY._col0 (type: string)

mode: mergepartial

outputColumnNames: _col0

Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 435430 Data size: 23429814 Basic stats: COMPLETE Column stats: NONE

Reducer 3

Reduce Operator Tree:

Merge Join Operator

condition map:

Left Outer Join0 to 1

Left Outer Join0 to 2

keys:

0 lower(_col107) (type: string)

1 _col0 (type: string)

2 _col0 (type: string)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146

Statistics: Num rows: 262955293 Data size: 120275865393 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col107 (type: string)

sort order: +

Map-reduce partition columns: _col107 (type: string)

Statistics: Num rows: 262955293 Data size: 120275865393 Basic stats: COMPLETE Column stats: NONE

value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col103 (type: double), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col133 (type: int), _col145 (type: string), _col146 (type: string)

Reducer 4

Reduce Operator Tree:

Merge Join Operator

condition map:

Left Outer Join0 to 1

keys:

0 _col107 (type: string)

1 email (type: string)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146, _col157

Statistics: Num rows: 289250828 Data size: 132303454799 Basic stats: COMPLETE Column stats: NONE

Map Join Operator

condition map:

Left Outer Join0 to 1

keys:

0 _col72 (type: string)

1 usr_union_logon_code (type: string)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133, _col145, _col146, _col157, _col163

input vertices:

1 Map 12

Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE

Select Operator

expressions: _col0 (type: bigint), _col1 (type: string), _col3 (type: string), _col2 (type: string), _col4 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col107 (type: string), substr(_col107, instr(_col107'@'), (instr(_col107'.') - instr(_col107'@'))) (type: string), _col44 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col32 (type: double), _col31 (type: double), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col111 (type: string), _col34 (type: double), _col83 (type: bigint), CASE WHEN ((_col78 = 1.0)) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col110 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col1 like '%@sina%')) THEN (1) WHEN ((_col1 like '%@pingan%')) THEN (2) WHEN (((_col1 like '%@alipay%') and (not ((_col117) IN (1, 2) and (_col118 = 1.0))))) THEN (3) WHEN ((((_col1 like '%@alipay%') and (_col117) IN (1, 2)) and (_col118 = 1.0))) THEN (4) WHEN ((_col1 like '%@163%')) THEN (5) WHEN ((_col1 like '%@kaixin001%')) THEN (6) WHEN ((_col1 like '%@139%')) THEN (7) WHEN ((_col1 like '%@msn.com%')) THEN (8) WHEN ((_col1 like '%@anyue%')) THEN (9) WHEN ((_col1 like '%@qq%')) THEN (10) ELSE (0) END (type: int), CASE WHEN (((_col1 like '%@b2b%') or (_col1 = 'yuxiaolan'))) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col145 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col146 is not null) THEN (1) ELSE (0) END (type: int), CASE WHEN ((_col125 = '1')) THEN (1) ELSE (0) END (type: int), _col126 (type: string), _col157 (type: double), '' (type: string), '' (type: string), '' (type: string), '' (type: string), _col103 (type: double), CASE WHEN (_col72 is null) THEN (null) WHEN ((_col72 is not null and _col163 is null)) THEN (-999999) ELSE (_col163) END (type: bigint), _col72 (type: string), _col133 (type: int)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38

Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE

File Output Operator

compressed: false

Statistics: Num rows: 318175917 Data size: 145533803433 Basic stats: COMPLETE Column stats: NONE

table:

input format: org.apache.hadoop.mapred.TextInputFormat

output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Reducer 6

Reduce Operator Tree:

Group By Operator

keys: KEY._col0 (type: string)

mode: mergepartial

outputColumnNames: _col0

Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: _col0 (type: string)

sort order: +

Map-reduce partition columns: _col0 (type: string)

Statistics: Num rows: 2568953 Data size: 256895341 Basic stats: COMPLETE Column stats: NONE

Reducer 8

Reduce Operator Tree:

Merge Join Operator

condition map:

Left Outer Join0 to 1

Left Outer Join0 to 2

Left Outer Join0 to 3

keys:

0 id (type: bigint)

1 end_user_id (type: bigint)

2 end_usr_id (type: bigint)

3 end_user_id (type: bigint)

outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col22, _col24, _col30, _col31, _col32, _col34, _col44, _col67, _col68, _col69, _col72, _col78, _col83, _col103, _col107, _col108, _col109, _col110, _col111, _col117, _col118, _col125, _col126, _col133

Statistics: Num rows: 119525131 Data size: 54670846721 Basic stats: COMPLETE Column stats: NONE

Reduce Output Operator

key expressions: lower(_col107) (type: string)

sort order: +

Map-reduce partition columns: lower(_col107) (type: string)

Statistics: Num rows: 119525131 Data size: 54670846721 Basic stats: COMPLETE Column stats: NONE

value expressions: _col0 (type: bigint), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col22 (type: string), _col24 (type: string), _col30 (type: double), _col31 (type: double), _col32 (type: double), _col34 (type: double), _col44 (type: string), _col67 (type: double), _col68 (type: string), _col69 (type: double), _col72 (type: string), _col78 (type: double), _col83 (type: bigint), _col103 (type: double), _col107 (type: string), _col108 (type: string), _col109 (type: string), _col110 (type: int), _col111 (type: string), _col117 (type: double), _col118 (type: double), _col125 (type: string), _col126 (type: string), _col133 (type: int)


Stage: Stage-0

Fetch Operator

limit: -1

Processor Tree:

ListSink


0