[TOC]
Hive Notes (Part 3)
Hive Functions
Hive Function Categories
As in Java and MySQL, Hive functions fall into three categories.

UDF (User-Defined Function)
One row of input, one row of output, e.g. sin(30°) = 1/2.

UDAF (User-Defined Aggregation Function)
Many rows of input, one row of output, e.g. max, min, count, sum, avg.

UDTF (User-Defined Table-Generating Function)
One row of input, many rows of output, e.g. explode. A short sketch of all three categories follows below.
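A minimal HiveQL sketch of the three categories; the table `emp(name string, deptid int, salary float, hobbies array<string>)` is a hypothetical example, not one of the tables used later in these notes:

```sql
-- UDF: one row in, one row out (upper transforms each row independently)
select upper(name) from emp;

-- UDAF: many rows in, one row out per group (avg collapses each department's rows)
select deptid, avg(salary) from emp group by deptid;

-- UDTF: one row in, many rows out (explode turns the hobbies array into one row per element)
select explode(hobbies) as hobby from emp;
```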
Commonly Used Functions
```
show functions;            list the functions available in Hive
desc function func_name;   show the help text for a function
case when    ----> like switch or if/else
if           ----> like the ternary operator
explode      ----> turns the elements of an array into multiple rows:
                   a = [1, 2, 3, 4]  explode(a) ===> 1 2 3 4
split        ----> the string split function
array        ----> builds an array from its arguments
collect_set  ----> aggregates a group's values into an array (duplicates removed)
collect_list ----> same, but keeps duplicates
concat_ws    ----> joins elements with the given separator string
--------------
row_number   ----> grouped ordering / secondary sort
```
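collect_set, collect_list, and concat_ws are listed above but not demonstrated later in these notes, so here is a small sketch, assuming a hypothetical table `t_hobby(name string, hobby string)`:

```sql
-- collect_set gathers each group's hobby values into an array (duplicates removed);
-- collect_list would keep duplicates; concat_ws joins the array with the given separator
select name, concat_ws(',', collect_set(hobby)) as hobbies
from t_hobby
group by name;
```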
Function Examples
wordcount
Analysis: the test table contains three lines: hello you, hello me, hello he.

With MapReduce the processing would be:

```
step 1 -----> split("\t") ---> ["hello", "you"] ["hello", "me"] ["hello", "he"]
step 2 -----> iterate over each array and emit every element as a key with value 1:
              <"hello", 1> <"you", 1> <"hello", 1> <"me", 1> <"hello", 1> <"he", 1>
step 3 -----> shuffle ---> <"hello", [1, 1, 1]> <"you", 1> <"me", 1> <"he", 1>
step 4 -----> reduce ====> reduceByKey
```

With HQL:

```
-- step 1: split each line into an array
(mydb1)> select split(line, "\t") from test;
["hello","you"]
["hello","he"]
["hello","me"]

-- step 2: turn each array into multiple rows
(mydb1)> select explode(split(line, "\t")) from test;
hello
you
hello
he
hello
me

-- step 3: group by on top of step 2
select w.word, count(w.word) as count
from (select explode(split(line, "\t")) word from test) w
group by w.word
order by count desc;
```
case when
Use case when to display the department name corresponding to each id:
Mapping: 1 ---> 学工组, 2 ---> 行政组, 3 ---> 销售组, 4 ---> 研发组, 5 ---> 其它

```
hive (mydb1)> select * from t1;
1
2
3
4
5
```

```sql
select id,
  case id
    when 1 then "学工组"
    when 2 then "行政组"
    when 3 then "销售组"
    when 4 then "研发组"
    else "其它"
  end
from t1;
```

Result:

```
1    学工组
2    行政组
3    销售组
4    研发组
5    其它
```
row_number and Secondary Sorting
Three kinds of joins:

```
cross join            produces a Cartesian product, so it is generally avoided
inner join (equi-join) outputs only the rows that can be matched between the left and right tables
outer join            left outer join / right outer join
```

Given three tables (departments, employees, salaries):

1. Show every department's employees grouped by department (department name, employee name, employee gender [男|女], salary), sorted within each department by salary in descending order:

```sql
select e.name, if(sex == 0, '女', '男') as gender, d.name, s.salary,
       row_number() over(partition by e.deptid order by s.salary desc) rank
from t_dept d
left join t_employee e on d.id = e.deptid
left join t_salary s on e.id = s.empid
where s.salary is not null;
```

2. Show the top-2 employees by salary in each department:

```sql
select tmp.*
from (select e.name, if(sex == 0, '女', '男') as gender, d.name, s.salary,
             row_number() over(partition by e.deptid order by s.salary desc) rank
      from t_dept d
      left join t_employee e on d.id = e.deptid
      left join t_salary s on e.id = s.empid
      where s.salary is not null) tmp
where tmp.rank < 3;
```

If the query reads from a single table, the subquery can be dropped and having used instead (having rank < 3).
The following example shows how row_number is used:
```
hive (mydb2)> create table t9(
            >   id int,
            >   province string,
            >   salary float
            > );
hive (mydb2)> insert into t9 values(1,'gd',18000),(2,'gd',16000),(3,'bj',13000),(4,'gd',15000),(5,'bj',17000),(6,'bj',19000);
hive (mydb2)> select * from t9;
OK
1       gd      18000.0
2       gd      16000.0
3       bj      13000.0
4       gd      15000.0
5       bj      17000.0
6       bj      19000.0
Time taken: 0.097 seconds, Fetched: 6 row(s)
hive (mydb2)> select
            >   id,
            >   province,
            >   salary,
            >   row_number() over(partition by province order by salary desc) as rank
            > from t9;
OK
6       bj      19000.0 1
5       bj      17000.0 2
3       bj      13000.0 3
1       gd      18000.0 1
2       gd      16000.0 2
4       gd      15000.0 3
Time taken: 1.578 seconds, Fetched: 6 row(s)
```
Custom Hive Functions
Steps to Define a Custom Function
Defining a custom function follows these six steps:
1. Write a Java class that extends the UDF class.
2. Override its evaluate() method; the framework calls it for you.
3. Package the code into a jar and upload the jar to the server.
4. Load the jar from step 3 onto Hive's classpath; in the Hive terminal run: add jar jar_path;
5. Give the custom function a temporary name, i.e. create a temporary function:
   create temporary function <function_name> as '<fully qualified name of the class that implements evaluate>';
6. When you are done, you can drop the temporary function manually, or simply leave it: it is destroyed automatically when the current session ends (steps 4-6 are sketched below).
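A rough sketch of steps 4 through 6 in a Hive session; the jar path, function name, and class name here are placeholders, and a concrete run with a real UDF appears in the case study below:

```sql
-- step 4: add the jar to the classpath of the current session
add jar /path/to/my-udf.jar;

-- step 5: register a temporary function backed by the UDF class
create temporary function my_func as 'com.example.MyUDF';

-- use the function like any built-in
select my_func(some_column) from some_table;

-- step 6 (optional): drop it explicitly instead of waiting for the session to end
drop temporary function my_func;
```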
UDF example: derive the Chinese zodiac sign and the constellation from a user's birthday.
The program code is as follows:
```java
package com.uplooking.bigdata.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.util.Calendar;
import java.util.Date;

@Description(name = "z_c",
    value = "_FUNC_(param1, param2) - returns the Chinese zodiac sign or the constellation for the given date",
    extended = "param1, param2 can be the following:\n"
        + "1. param1 is a string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'.\n"
        + "2. param1 date value\n"
        + "3. param1 timestamp value\n"
        + "4. param2 0 or 1, 0 means constellation, 1 means zodiac\n"
        + "Example:\n "
        + " > SELECT _FUNC_('2009-07-30', 0) FROM src LIMIT 1;\n"
        + " 狮子座")
public class ZodicaAndConstellationUDF extends UDF {

    public Text evaluate(java.sql.Date date, int type) {
        if (type == 0) { // constellation
            return new Text(getConstellation(new Date(date.getTime())));
        } else if (type == 1) { // Chinese zodiac
            return new Text(getZodica(new Date(date.getTime())));
        }
        return null;
    }

    public String[] zodiacArr = { "猴", "鸡", "狗", "猪", "鼠", "牛", "虎", "兔", "龙", "蛇", "马", "羊" };

    public String[] constellationArr = { "水瓶座", "双鱼座", "白羊座", "金牛座", "双子座", "巨蟹座",
            "狮子座", "处女座", "天秤座", "天蝎座", "射手座", "魔羯座" };

    public int[] constellationEdgeDay = { 20, 19, 21, 21, 21, 22, 23, 23, 23, 23, 22, 22 };

    /**
     * Get the Chinese zodiac sign for a date.
     */
    public String getZodica(Date date) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        return zodiacArr[cal.get(Calendar.YEAR) % 12];
    }

    /**
     * Get the constellation for a date.
     */
    public String getConstellation(Date date) {
        if (date == null) {
            return "";
        }
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        int month = cal.get(Calendar.MONTH);
        int day = cal.get(Calendar.DAY_OF_MONTH);
        if (day < constellationEdgeDay[month]) {
            month = month - 1;
        }
        if (month >= 0) {
            return constellationArr[month];
        }
        // default: dates before Jan 20 fall in Capricorn
        return constellationArr[11];
    }
}
```
Note: the required Maven dependencies are listed at the end of these notes.
After uploading the jar to the server, load it onto Hive's classpath in the Hive terminal:
```
add jar /home/uplooking/jars/hive/udf-zc.jar
```
Create the custom function:
```
create temporary function zc as 'com.uplooking.bigdata.hive.udf.ZodicaAndConstellationUDF';
```
Create a temporary table for testing:
```
hive (mydb1)> create temporary table tmp(
            >   birthday date);
```
Insert some test data:
```
hive (mydb1)> insert into tmp values('1994-06-21');
```
Use the function in a query:
```
hive (mydb1)> select zc(birthday,0) from tmp;
OK
c0
双子座
Time taken: 0.084 seconds, Fetched: 1 row(s)
hive (mydb1)> select zc(birthday,1) from tmp;
OK
c0
狗
Time taken: 0.044 seconds, Fetched: 1 row(s)
```
Below is an even simpler UDF that can be used as a reference for testing:
```java
package cn.xpleaf.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * @author Leaf
 * @date 2018/9/18 11:11 PM
 */
@Description(name = "addUDF", value = "_FUNC_(num1, num2) - returns the sum of the two given numbers")
public class AddUDF extends UDF {

    public Text evaluate(int num1, int num2) {
        return new Text(String.valueOf(num1 + num2));
    }
}
```
Hive via JDBC
Besides the CLI covered earlier, Hive also provides a JDBC interface. To use it, the hiveserver2 service must be started first. Once it is running, you can keep operating Hive in a CLI-like way through the beeline tool that ships with Hive (note that the operations then go through the JDBC interface), or you can write Java code by hand.
Start the hiveserver2 service
```
[uplooking@uplooking01 ~]$ hiveserver2
```
Connect to hiveserver2 with beeline
```
[uplooking@uplooking01 hive]$ beeline
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/jdk/bin:/home/uplooking/bin:/home/uplooking/app/zookeeper/bin:/home/uplooking/app/hadoop/bin:/home/uplooking/app/hadoop/sbin:/home/uplooking/app/hive/bin)
ls: 无法访问/home/uplooking/app/hive/lib/hive-jdbc-*-standalone.jar: 没有那个文件或目录
Beeline version 2.1.0 by Apache Hive
beeline> !connect jdbc:hive2://uplooking01:10000/mydb1
Connecting to jdbc:hive2://uplooking01:10000/mydb1
Enter username for jdbc:hive2://uplooking01:10000/mydb1: uplooking
Enter password for jdbc:hive2://uplooking01:10000/mydb1: *********
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/uplooking/app/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/uplooking/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: uplooking is not allowed to impersonate uplooking (state=,code=0)
```
An error occurs; the solution is as follows.
The error raised when JDBC cannot reach Hive's remote ThriftServer, "uplooking is not allowed to impersonate uplooking", comes from a security policy introduced in newer versions. The uplooking user has to be configured manually so that the uplooking user known to Hadoop is allowed to proxy the uplooking user used by Hive. Add the following properties to $HADOOP_HOME/etc/hadoop/core-site.xml:

```xml
<property>
    <!-- hosts from which the uplooking proxy user may connect -->
    <name>hadoop.proxyuser.uplooking.hosts</name>
    <value>*</value>
</property>
<property>
    <!-- groups whose users the uplooking proxy user may impersonate -->
    <name>hadoop.proxyuser.uplooking.groups</name>
    <value>root</value>
</property>
```

After the change, synchronize the file to every node in the cluster. For the cluster to reload the configuration, at least HDFS has to be restarted.
After that, beeline can operate Hive normally through the JDBC interface Hive provides:
```
beeline> !connect jdbc:hive2://uplooking01:10000/mydb1
Connecting to jdbc:hive2://uplooking01:10000/mydb1
Enter username for jdbc:hive2://uplooking01:10000/mydb1: uplooking
Enter password for jdbc:hive2://uplooking01:10000/mydb1: *********
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/uplooking/app/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/uplooking/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 2.1.0)
18/03/23 08:00:15 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://uplooking01:10000/mydb1> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| mydb1          |
+----------------+--+
2 rows selected (2.164 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| t1        |
| t2        |
+-----------+--+
2 rows selected (0.118 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1> select * from t1;
+------------+--+
|  t1.line   |
+------------+--+
| hello you  |
| hello he   |
| hello me   |
+------------+--+
3 rows selected (2.143 seconds)
0: jdbc:hive2://uplooking01:10000/mydb1>
```
Connect to hiveserver2 from Java code
The program code is as follows:
```java
package com.uplooking.bigdata.hive.jdbc;

import java.sql.*;

public class HiveJDBC {
    public static void main(String[] args) throws Exception {
        // register the Hive JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // connect to hiveserver2 (url, user, password)
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://uplooking01:10000/mydb1", "uplooking", "uplooking");
        // the wordcount query from earlier in these notes
        String sql = "select t.word, count(t.word) as count from "
                + "(select explode(split(line, ' ')) as word from t1) t group by t.word";
        PreparedStatement ps = conn.prepareStatement(sql);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            String word = rs.getString("word");
            int count = rs.getInt("count");
            System.out.println(word + "\t" + count);
        }
        rs.close();
        ps.close();
        conn.close();
    }
}
```
The program output is as follows:
```
18/03/23 00:48:16 INFO jdbc.Utils: Supplied authorities: uplooking01:10000
18/03/23 00:48:16 INFO jdbc.Utils: Resolved authority: uplooking01:10000
he      1
hello   3
me      1
you     1
```
While this runs, note the output in the hiveserver2 terminal:
```
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = uplooking_20180323084825_63044683-393d-4625-a3c3-b440109c3d70
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1521765850571_0002, Tracking URL = http://uplooking02:8088/proxy/application_1521765850571_0002/
Kill Command = /home/uplooking/app/hadoop/bin/hadoop job -kill job_1521765850571_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-03-23 08:48:33,427 Stage-1 map = 0%, reduce = 0%
2018-03-23 08:48:40,864 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.54 sec
2018-03-23 08:48:48,294 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.84 sec
MapReduce Total cumulative CPU time: 6 seconds 840 msec
Ended Job = job_1521765850571_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 6.84 sec  HDFS Read: 8870  HDFS Write: 159 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 840 msec
OK
```
Fixing Garbled Chinese Comments in Hive
If garbled Chinese comments appear, try the following fix:
Run the following script against Hive's metastore database:

```sql
ALTER TABLE COLUMNS_V2 MODIFY COLUMN COMMENT VARCHAR(256) CHARACTER SET utf8;
ALTER TABLE TABLE_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
ALTER TABLE PARTITION_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
ALTER TABLE PARTITION_KEYS MODIFY COLUMN PKEY_COMMENT VARCHAR(4000) CHARACTER SET utf8;
ALTER TABLE INDEX_PARAMS MODIFY COLUMN PARAM_VALUE VARCHAR(4000) CHARACTER SET utf8;
```

At the same time, append the UTF-8 options &useUnicode=true&characterEncoding=UTF-8 to the metastore connection URL (javax.jdo.option.ConnectionURL); note that inside the XML value the & must be escaped as &amp;:

```xml
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://uplooking01:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
```
Hive Maven Dependencies
```xml
<properties>
    <!-- property names are inferred from the ${...} references below;
         the source-encoding property name is an assumption -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hive-api.version>2.1.0</hive-api.version>
    <hadoop-api.version>2.6.4</hadoop-api.version>
    <hadoop-core.version>1.2.1</hadoop-core.version>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>${hadoop-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>${hadoop-core.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-serde</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-service</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-metastore</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-common</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-cli</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>${hive-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libfb303</artifactId>
        <version>0.9.0</version>
    </dependency>
</dependencies>
```