千家信息网

Using lateral view with UDTF functions in Hive to solve a production problem

Published: 2025-01-23  Author: 千家信息网 editors
Last updated: 2025-01-23
===============================================================================
create table psn(
    id int,
    name string,
    likes array<string>,
    address map<string,string>
)
partitioned by (age int)
row format delimited
fields terminated by '\t'
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';
===============================================================================
hive> load data local inpath '/root/a.txt' overwrite into table psn partition(age=10);
Loading data to table default.psn partition (age=10)
OK
Time taken: 3.817 seconds
===============================================================================
hive> select * from psn;
OK
1   zhang3  ["sing","tennis","running"] {"beijing":"daxing"}    10
2   li4 ["sing","pingpong","swim"]  {"shanghai":"baoshan"}  10
3   wang5   ["read","joke","football"]  {"guangzou":"baiyun"}   10
===============================================================================
Requirement:
    In a single pass, count how many times each hobby appears, how many times each city appears, and how many times each district appears.
Analysis:
    This requirement resembles the Hive wordcount example. In effect it is two wordcount cases combined, except that no split is needed here.
    In the wordcount example, explode handles the count over a single column well.
    However, Hive restricts UDTF functions (such as explode): a select clause may contain only one UDTF, and that UDTF cannot be combined with other columns or functions.
    ##### Allowed: select explode(..) from emp;
    ##### Not allowed: select explode(..), explode(..) from emp;
    ##### Not allowed: select id, explode(..) from emp;
    As a result, more complex logic cannot be expressed this way, such as the wordcount over two columns described above.
    This is where lateral view comes in: it organizes the multiple rows produced by a UDTF into a virtual table.
===============================================================================
hive> select count(distinct c1), count(distinct c2), count(distinct c3) from psn
    > lateral view explode(likes) t1 as c1
    > lateral view explode(address) t2 as c2, c3;
##### t1 and t2 are the names of the virtual tables produced by the UDTF; c1/c2/c3 are column aliases.
##### explode produces one column for an array and two columns for a map.
##### count(distinct ..) returns the number of distinct hobbies, cities, and districts.
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-04-24 22:59:16,471 Stage-1 map = 0%,  reduce = 0%
2019-04-24 22:59:25,681 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.76 sec
2019-04-24 22:59:36,268 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.15 sec
MapReduce Total cumulative CPU time: 4 seconds 150 msec
Ended Job = job_1556088929464_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.15 sec   HDFS Read: 14429 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 150 msec
OK
8   3   3
Time taken: 35.986 seconds, Fetched: 1 row(s)
===============================================================================
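The count(distinct ..) query above reports how many distinct hobbies, cities, and districts exist, not how often each one occurs. As a rough sketch of how per-value counts could be obtained with the same lateral view technique, the HiveQL below groups by the exploded column. The aliases hobby, city, district, kind, item, and cnt are illustrative names that do not appear in the original, and the top-level UNION ALL assumes Hive 0.13 or later; on older versions the three selects could be run separately or wrapped in a subquery.

```sql
-- Sketch only: per-value counts against the psn table defined above.
-- All aliases (kind, item, cnt, hobby, city, district) are hypothetical.

-- How many times each hobby appears
SELECT 'hobby' AS kind, hobby AS item, count(*) AS cnt
FROM psn
LATERAL VIEW explode(likes) t1 AS hobby
GROUP BY hobby

UNION ALL

-- How many times each city (map key) appears
SELECT 'city' AS kind, city AS item, count(*) AS cnt
FROM psn
LATERAL VIEW explode(address) t2 AS city, district
GROUP BY city

UNION ALL

-- How many times each district (map value) appears
SELECT 'district' AS kind, district AS item, count(*) AS cnt
FROM psn
LATERAL VIEW explode(address) t2 AS city, district
GROUP BY district;
```

Each branch uses a single lateral view on purpose. Chaining both lateral views, as the original query does, yields one row per combination of hobby and address entry, so a plain count(*) over that result would count each city once per hobby. The original query avoids this by using count(distinct ..).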