An Introduction to Hive Built-in Functions and Query Syntax
Published: 2025-02-01  Author: 千家信息网 editors
1. Hive Built-in Functions:
Hive ships with a large number of built-in functions; see the official documentation for the full list.
- Listing and inspecting functions:
```sql
SHOW FUNCTIONS;                               -- list all built-in functions
DESCRIBE FUNCTION <function_name>;            -- show a function's description
DESCRIBE FUNCTION EXTENDED <function_name>;   -- show a function's detailed usage
```
```
hive> DESCRIBE FUNCTION case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Time taken: 0.006 seconds, Fetched: 1 row(s)

hive> DESCRIBE FUNCTION EXTENDED case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Example:
 SELECT
 CASE deptno
   WHEN 1 THEN Engineering
   WHEN 2 THEN Finance
   ELSE admin
 END,
 CASE zone
   WHEN 7 THEN Americas
   ELSE Asia-Pac
 END
 FROM emp_details
Time taken: 0.008 seconds, Fetched: 13 row(s)

# DESCRIBE can be shortened to desc
hive> desc FUNCTION EXTENDED case;
OK
# (output identical to the above)
Time taken: 0.009 seconds, Fetched: 13 row(s)
```
Let's walk through some commonly used functions:
```
# Create a one-row dual table to make testing easier
hive> create table dual(x string);
OK
Time taken: 0.11 seconds
hive> insert into table dual values('');
Query ID = hadoop_20180702100505_f0566585-06b2-4c53-910a-b6a58791fc2d
Total jobs = 3
Launching Job 1 out of 3
...
OK
Time taken: 29.535 seconds
hive> select * from dual;
OK

Time taken: 0.147 seconds, Fetched: 1 row(s)

# Current date
hive> select current_date from dual;
OK
2018-07-02
Time taken: 0.111 seconds, Fetched: 1 row(s)

# Current timestamp
hive> select current_timestamp from dual;
OK
2018-07-02 15:03:28.919
Time taken: 0.117 seconds, Fetched: 1 row(s)

# substr: extract a substring
hive> desc function extended substr;
OK
substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len or
substr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len
Synonyms: substring
pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str.
Example:
  > SELECT substr('Facebook', 5) FROM src LIMIT 1;
  'book'
  > SELECT substr('Facebook', -5) FROM src LIMIT 1;
  'ebook'
  > SELECT substr('Facebook', 5, 1) FROM src LIMIT 1;
  'b'
Time taken: 0.016 seconds, Fetched: 10 row(s)
hive> SELECT substr('helloworld',-5) FROM dual;
OK
world
Time taken: 0.171 seconds, Fetched: 1 row(s)
hive> SELECT substr('helloworld',5) FROM dual;
OK
oworld
Time taken: 0.12 seconds, Fetched: 1 row(s)
hive> SELECT substr('helloworld',5,3) FROM dual;
OK
owo
Time taken: 0.142 seconds, Fetched: 1 row(s)

# concat_ws: join strings with a separator
hive> desc function extended concat_ws;
OK
concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator.
Example:
  > SELECT concat_ws('.', 'www', array('facebook', 'com')) FROM src LIMIT 1;
  'www.facebook.com'
Time taken: 0.019 seconds, Fetched: 4 row(s)
hive> select concat_ws(".","192","168","199","151") from dual;
OK
192.168.199.151
Time taken: 0.152 seconds, Fetched: 1 row(s)

# split: split a string around matches of a regex
hive> desc function extended split;
OK
split(str, regex) - Splits str around occurances that match regex
Example:
  > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1;
  ["one", "two", "three"]
Time taken: 0.021 seconds, Fetched: 4 row(s)
hive> select split("192.168.199.151","\\.") from dual;
OK
["192","168","199","151"]
Time taken: 0.169 seconds, Fetched: 1 row(s)
```
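The substr and split semantics above are easy to cross-check outside Hive. The sketch below is plain Python; `hive_substr` is a hypothetical helper written only to illustrate Hive's 1-based/negative indexing, and the `re.split` call shows why split's regex argument needs an escaped dot (written `"\\."` in the Hive string literal, since the backslash itself must be escaped there):

```python
import re

def hive_substr(s, pos, length=None):
    # Hive's substr is 1-based; a negative pos counts back from the end of the string.
    start = pos - 1 if pos > 0 else len(s) + pos
    return s[start:start + length] if length is not None else s[start:]

print(hive_substr("helloworld", -5))    # world
print(hive_substr("helloworld", 5))     # oworld
print(hive_substr("helloworld", 5, 3))  # owo

# split(str, regex) treats its second argument as a regular expression,
# so a literal dot must be escaped -- an unescaped "." matches any character.
print(re.split(r"\.", "192.168.199.151"))  # ['192', '168', '199', '151']
```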
2. Hive Query Syntax:
- Basic SELECT syntax:
```
# Basic SELECT queries
hive> select * from emp where deptno=10;
OK
7782  CLARK   MANAGER    7839  1981-6-9    2450.0  NULL  10
7839  KING    PRESIDENT  NULL  1981-11-17  5000.0  NULL  10
7934  MILLER  CLERK      7782  1982-1-23   1300.0  NULL  10
Time taken: 0.899 seconds, Fetched: 3 row(s)

hive> select * from emp where empno <= 7800;
OK
7369  SMITH   CLERK     7902  1980-12-17  800.0   NULL    20
7499  ALLEN   SALESMAN  7698  1981-2-20   1600.0  300.0   30
7521  WARD    SALESMAN  7698  1981-2-22   1250.0  500.0   30
7566  JONES   MANAGER   7839  1981-4-2    2975.0  NULL    20
7654  MARTIN  SALESMAN  7698  1981-9-28   1250.0  1400.0  30
7698  BLAKE   MANAGER   7839  1981-5-1    2850.0  NULL    30
7782  CLARK   MANAGER   7839  1981-6-9    2450.0  NULL    10
7788  SCOTT   ANALYST   7566  1987-4-19   3000.0  NULL    20
Time taken: 0.277 seconds, Fetched: 8 row(s)

hive> select * from emp where salary between 1000 and 1500;
OK
7521  WARD    SALESMAN  7698  1981-2-22  1250.0  500.0   30
7654  MARTIN  SALESMAN  7698  1981-9-28  1250.0  1400.0  30
7844  TURNER  SALESMAN  7698  1981-9-8   1500.0  0.0     30
7876  ADAMS   CLERK     7788  1987-5-23  1100.0  NULL    20
7934  MILLER  CLERK     7782  1982-1-23  1300.0  NULL    10
Time taken: 0.187 seconds, Fetched: 5 row(s)

hive> select * from emp limit 5;
OK
7369  SMITH   CLERK     7902  1980-12-17  800.0   NULL    20
7499  ALLEN   SALESMAN  7698  1981-2-20   1600.0  300.0   30
7521  WARD    SALESMAN  7698  1981-2-22   1250.0  500.0   30
7566  JONES   MANAGER   7839  1981-4-2    2975.0  NULL    20
7654  MARTIN  SALESMAN  7698  1981-9-28   1250.0  1400.0  30
Time taken: 0.154 seconds, Fetched: 5 row(s)

hive> select * from emp where empno in(7566,7499);
OK
7499  ALLEN  SALESMAN  7698  1981-2-20  1600.0  300.0  30
7566  JONES  MANAGER   7839  1981-4-2   2975.0  NULL   20
Time taken: 0.153 seconds, Fetched: 2 row(s)

hive> select * from emp where comm is not null;
OK
7499  ALLEN   SALESMAN  7698  1981-2-20  1600.0  300.0   30
7521  WARD    SALESMAN  7698  1981-2-22  1250.0  500.0   30
7654  MARTIN  SALESMAN  7698  1981-9-28  1250.0  1400.0  30
7844  TURNER  SALESMAN  7698  1981-9-8   1500.0  0.0     30
Time taken: 0.291 seconds, Fetched: 4 row(s)
```
- Aggregate and grouping functions:
```
# Aggregate and grouping functions
# max/min/count/sum/avg: many rows go in, one row comes out

# Count the rows in department 10
hive> select count(1) from emp where deptno=10;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
3
Time taken: 38.951 seconds, Fetched: 1 row(s)

# Max, min, average, and total salary
hive> select max(salary),min(salary),avg(salary),sum(salary) from emp;
...
OK
5000.0  800.0  2073.214285714286  29025.0
Time taken: 23.748 seconds, Fetched: 1 row(s)

# Grouping with group by
# Average salary per department
# Note: any column in the SELECT list that is not wrapped in an aggregate
# function must appear in GROUP BY, otherwise the query fails
hive> select deptno,avg(salary) from emp group by deptno;
...
OK
10  2916.6666666666665
20  2175.0
30  1566.6666666666667
Time taken: 36.502 seconds, Fetched: 3 row(s)

# Highest salary per department (deptno) and job (job)
hive> select deptno,job,max(salary) from emp group by deptno,job;
...
OK
10  CLERK      1300.0
10  MANAGER    2450.0
10  PRESIDENT  5000.0
20  ANALYST    3000.0
20  CLERK      1100.0
20  MANAGER    2975.0
30  CLERK      950.0
30  MANAGER    2850.0
30  SALESMAN   1600.0
Time taken: 36.096 seconds, Fetched: 9 row(s)

# Departments whose average salary exceeds 2000 (filter groups with HAVING)
hive> select deptno,avg(salary) from emp group by deptno having avg(salary) >2000;
...
OK
10  2916.6666666666665
20  2175.0
Time taken: 24.71 seconds, Fetched: 2 row(s)

# case when ... then ... end (does not launch a MapReduce job)
hive> select ename, salary,
    > case
    >   when salary > 1 and salary <= 1000 then 'LOWER'
    >   when salary > 1000 and salary <= 2000 then 'MIDDLE'
    >   when salary > 2000 and salary <= 4000 then 'HIGH'
    >   ELSE 'HIGHEST'
    > end
    > from emp;
OK
SMITH   800.0   LOWER
ALLEN   1600.0  MIDDLE
WARD    1250.0  MIDDLE
JONES   2975.0  HIGH
MARTIN  1250.0  MIDDLE
BLAKE   2850.0  HIGH
CLARK   2450.0  HIGH
SCOTT   3000.0  HIGH
KING    5000.0  HIGHEST
TURNER  1500.0  MIDDLE
ADAMS   1100.0  MIDDLE
JAMES   950.0   LOWER
FORD    3000.0  HIGH
MILLER  1300.0  MIDDLE
Time taken: 0.096 seconds, Fetched: 14 row(s)
```
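The CASE expression is just per-row conditional logic. As a cross-check, here is a minimal Python sketch of the same salary bands (`salary_band` is a hypothetical helper; the band boundaries are copied from the query above):

```python
def salary_band(salary):
    # Same bands as the CASE ... WHEN query above.
    if 1 < salary <= 1000:
        return "LOWER"
    elif 1000 < salary <= 2000:
        return "MIDDLE"
    elif 2000 < salary <= 4000:
        return "HIGH"
    return "HIGHEST"

print(salary_band(800.0))   # LOWER
print(salary_band(2975.0))  # HIGH
print(salary_band(5000.0))  # HIGHEST
```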
- Multi-table JOIN queries:
```
# Create test tables
hive> create table a(
    > id int, name string
    > ) row format delimited fields terminated by '\t';
OK
Time taken: 0.311 seconds
hive> create table b(
    > id int, age int
    > ) row format delimited fields terminated by '\t';
OK
Time taken: 0.142 seconds

# After insert or load, the tables contain:
hive> select * from a;
OK
1  zhangsan
2  lisi
3  wangwu
hive> select * from b;
OK
1  20
2  30
4  40
Time taken: 0.2 seconds, Fetched: 3 row(s)

# Inner join (inner join = join): returns only the rows that satisfy
# the join condition in both tables
hive> select a.id,a.name,b.age from a join b on a.id=b.id;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
1  zhangsan  20
2  lisi      30
Time taken: 24.415 seconds, Fetched: 2 row(s)

# Left outer join (left join): the left table is the base; all of its rows
# appear, and unmatched right-side columns become NULL
hive> select a.id,a.name,b.age from a left join b on a.id=b.id;
...
OK
1  zhangsan  20
2  lisi      30
3  wangwu    NULL
Time taken: 26.218 seconds, Fetched: 3 row(s)

# Right outer join (right join): the right table is the base
hive> select a.id,a.name,b.age from a right join b on a.id=b.id;
...
OK
1     zhangsan  20
2     lisi      30
NULL  NULL      40
Time taken: 24.027 seconds, Fetched: 3 row(s)

# Full join: rows from both tables appear; missing values become NULL
hive> select a.id,a.name,b.age from a full join b on a.id=b.id;
...
OK
1     zhangsan  20
2     lisi      30
3     wangwu    NULL
NULL  NULL      40
Time taken: 32.94 seconds, Fetched: 4 row(s)

# Cartesian product (cross join): no join condition; every row of table a
# is paired with every row of table b
hive> select a.id,a.name,b.age from a cross join b;
Warning: Map Join MAPJOIN[7][bigTable=a] in task 'Stage-3:MAPRED' is a cross product
...
OK
1  zhangsan  20
1  zhangsan  30
1  zhangsan  40
2  lisi      20
2  lisi      30
2  lisi      40
3  wangwu    20
3  wangwu    30
3  wangwu    40
Time taken: 29.825 seconds, Fetched: 9 row(s)
```
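The four join types can be cross-checked with plain Python dictionaries. This is only a sketch using the same a/b sample data as above, with `None` standing in for Hive's NULL:

```python
a = {1: "zhangsan", 2: "lisi", 3: "wangwu"}
b = {1: 20, 2: 30, 4: 40}

# inner join: only ids present in both tables
inner = {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys())}
# left join: every id from a; missing ages become None (Hive's NULL)
left = {k: (a[k], b.get(k)) for k in a}
# full join: ids from either side; missing values become None
full = {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys())}
# cross join: every pairing, 3 x 3 = 9 rows
cross = [(ka, va, vb) for ka, va in a.items() for vb in b.values()]

print(inner)       # {1: ('zhangsan', 20), 2: ('lisi', 30)}
print(left[3])     # ('wangwu', None)
print(full[4])     # (None, 40)
print(len(cross))  # 9
```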
3. Implementing WordCount with Hive SQL:
```
# Create a table and load test data
hive> create table hive_wc(sentence string);
OK
Time taken: 0.149 seconds
[hadoop@hadoop000 ~]$ cat hive-wc.txt
hello,world,welcome
hello,welcome
hive> load data local inpath '/home/hadoop/hive-wc.txt' into table hive_wc;
Loading data to table default.hive_wc
Table default.hive_wc stats: [numFiles=1, totalSize=34]
OK
Time taken: 0.729 seconds
hive> select * from hive_wc;
OK
hello,world,welcome
hello,welcome
Time taken: 0.13 seconds, Fetched: 2 row(s)

# Extract the words in each line with split
hive> select split(sentence,",") from hive_wc;
OK
["hello","world","welcome"]
["hello","welcome"]
Time taken: 0.163 seconds, Fetched: 2 row(s)

# explode turns an array into multiple rows; combined with split:
hive> select explode(split(sentence,",")) from hive_wc;
OK
hello
world
welcome
hello
welcome
Time taken: 0.068 seconds, Fetched: 5 row(s)

# group by on the exploded words: a single statement implements WordCount
hive> select word, count(1) as c
    > from (select explode(split(sentence,",")) as word from hive_wc) t
    > group by word;
Query ID = hadoop_20180703142525_af460dc7-287b-41b2-8af3-ba27cc0ea6ce
Total jobs = 1
...
OK
hello    2
welcome  2
world    1
Time taken: 34.168 seconds, Fetched: 3 row(s)
```
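The split-then-explode-then-group-by pipeline is the classic WordCount. The same pipeline in plain Python, as a sketch using the sample lines above:

```python
from collections import Counter

lines = ["hello,world,welcome", "hello,welcome"]

# split -> explode: flatten every line into individual words
words = [w for line in lines for w in line.split(",")]
# group by word, count(1): tally each word
counts = Counter(words)

print(sorted(counts.items()))  # [('hello', 2), ('welcome', 2), ('world', 1)]
```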