千家信息网

hive 与 hbase 结合

发表于:2025-01-31 作者:千家信息网编辑
千家信息网最后更新 2025年01月31日,一、hive与hbase的结合Hive会经常和Hbase结合使用,把Hbase作为Hive的存储路径,所以Hive整合Hbase尤其重要。使用Hive读取Hbase中的数据,可以使用HQL语句在HBa
千家信息网最后更新 2025年01月31日hive 与 hbase 结合

一、hive与hbase的结合
Hive会经常和Hbase结合使用,把Hbase作为Hive的存储路径,所以Hive整合Hbase尤其重要。使用Hive读取Hbase中的数据,可以使用HQL语句在HBase表上进行查询、插入操作;甚至是进行Join和Union等复杂查询。此功能是从Hive 0.6.0开始引入的。Hive与HBase整合的实现是利用两者本身对外的API接口互相进行通信,相互通信主要是依靠hive-hbase-handler-*.jar工具里面的类实现的。使用Hive操作HBase中的表,只是提供了便捷性,hiveQL引擎使用的是MapReduce,对于性能上,表现不尽人意。

步骤:
1、将hbase相关jar包复制到hive/lib下,操作如下:

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-protocol-1.4.5.jar /opt/hive/lib/[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-server-1.4.5.jar /opt/hive/lib/[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-client-1.4.5.jar /opt/hive/lib/[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5.jar /opt/hive/lib/[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5-tests.jar /opt/hive/lib/[hadoop@bus-stable hive]$ 

2、在hive-site.xml文件中引用hbase,添加如下内容:

[hadoop@bus-stable hive]$ vim /opt/hive/conf/hive-site.xml       hive.aux.jars.path            file:///opt/hive/lib/hive-hbase-handler-2.3.3.jar,        file:///opt/hive/lib/hbase-protocol-1.4.5.jar,        file:///opt/hive/lib/hbase-server-1.4.5.jar,        file:///opt/hive/lib/hbase-client-1.4.5.jar,        file:///opt/hive/lib/hbase-common-1.4.5.jar,        file:///opt/hive/lib/hbase-common-1.4.5-tests.jar,        file:///opt/hive/lib/zookeeper-3.4.6.jar,        file:///opt/hive/lib/guava-14.0.1.jar        The location of the plugin jars that contain implementations of user defined functions and serdes.            hbase.zookeeper.quorum        open-stable,permission-stable,sp-stable            dfs.permissions.enabled        false    

3、启动hive:

[hadoop@bus-stable hive]$ hive -hiveconf hbase.master=oversea-stable:60000          SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.3-bin/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: trueHive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.hive> create table htest(key int,value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties('hbase.columns.mapping'=':key,f:value') tblproperties('hbase.table.name'='htest');OKTime taken: 9.376 secondshive> show databases;OKdefaultinspiryTime taken: 0.121 seconds, Fetched: 2 row(s)hive> show tables;OKhtestTime taken: 0.047 seconds, Fetched: 1 row(s)hive> select * from htest; OKTime taken: 1.967 secondshive> 

4、在hbase中验证数据:

[hadoop@oversea-stable opt]$ hbase shell SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hbase-1.4.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]HBase ShellUse "help" to get list of supported commands.Use "exit" to quit this interactive shell.Version 1.4.5, rca99a9466415dc4cfc095df33efb45cb82fe5480, Wed Jun 13 15:13:00 EDT 2018hbase(main):001:0> listTABLE                                                                                                                                                     htest                                                                                                                                                     1 row(s) in 0.2970 seconds=> ["htest"]hbase(main):002:0> scan "htest"ROW                                     COLUMN+CELL                                                                                                       0 row(s) in 0.1410 secondshbase(main):003:0> 

二、导入外部数据
(1) 数据文件如下:
[hadoop@bus-stable ~]$ cat score.csv
hive,85
hbase,90
hadoop,92
flume,89
kafka,95
spark,80
storm,70
[hadoop@bus-stable ~]$ hadoop fs -put score.csv /data/score.csv
[hadoop@bus-stable ~]$ hadoop fs -ls /data/
Found 2 items
-rw-r--r-- 3 hadoop supergroup 88822 2018-06-15 10:32 /data/notepad.txt
-rw-r--r-- 3 hadoop supergroup 70 2018-06-26 15:59 /data/score.csv
[hadoop@bus-stable ~]$

(2) 创建外部表
利用hdfs上的现有数据,创建hive外部表
hive> create external table if not exists course.testcourse(cname string,score int) row format delimited fields terminated by ',' stored as textfile location '/data';
OK
Time taken: 0.282 seconds
hive> show databases;
OK
course
default
inspiry
Time taken: 0.013 seconds, Fetched: 3 row(s)
hive> use course;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testcourse
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive> select * from testcourse ;
OK
hive 85
hbase 90
hadoop 92
flume 89
kafka 95
spark 80
storm 70
Time taken: 2.272 seconds, Fetched: 7 row(s)
hive>

三、利用HQL语句创建hbase 表
使用HQL语句创建一个指向HBase的Hive表,语法如下:

CREATE TABLE tbl_name(key int, value string)  //Hive中的表名tbl_nameSTORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'   //指定存储处理器WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")  //声明列族,列名TBLPROPERTIES ("hbase.table.name" = "tbl_name", "hbase.mapred.output.outputtable" = "iteblog");  //hbase.table.name 声明HBase表名, 为可选属性默认与Hive的表名相同, hbase.mapred.output.outputtable 指定插入数据时写入的表, 如果以后需要往该表插入数据就需要指定该值

(1) 创建语句如下

hive> create table course.hbase_testcourse(cname string,score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score")TBLPROPERTIES("hbase.table.name" = "hbase_testcourse","hbase.mapred.output.outputtable" = "hbase_testcourse"); OKTime taken: 3.745 secondshive> show databases;OKcoursedefaultinspiryTime taken: 0.019 seconds, Fetched: 3 row(s)hive> use course;OKTime taken: 0.02 secondshive> show tables;OKhbase_testcoursetestcourseTime taken: 0.025 seconds, Fetched: 2 row(s)hive> select * from hbase_testcourse;OKTime taken: 1.883 secondshive> 

(2) 创建完内部表,可以通过Hive支持的insert overwrite 方式将一个表的数据导入 HBase

hive> insert overwrite table course.hbase_testcourse select cname,score from course.testcourse;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = hadoop_20180626170540_c7eecb8d-2925-4ad2-be7f-237d9815d1cbTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1529932626564_0002, Tracking URL = http://oversea-stable:8088/proxy/application_1529932626564_0002/Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1529932626564_0002Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02018-06-26 17:06:02,793 Stage-3 map = 0%,  reduce = 0%2018-06-26 17:06:14,126 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.12 secMapReduce Total cumulative CPU time: 6 seconds 120 msecEnded Job = job_1529932626564_0002MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 6.12 sec   HDFS Read: 4224 HDFS Write: 0 SUCCESSTotal MapReduce CPU Time Spent: 6 seconds 120 msecOKTime taken: 41.489 secondshive> hive> select * from hbase_testcourse;OKflume   89hadoop  92hbase   90hive    85kafka   95spark   80storm   70Time taken: 0.201 seconds, Fetched: 7 row(s)hive> 

(3) 验证hbase

hbase(main):011:0> listTABLE                                                                                                                                                     hbase_testcourse                                                                                                                                          htest                                                                                                                                                     2 row(s) in 0.0110 seconds=> ["hbase_testcourse", "htest"]hbase(main):012:0> scan "hbase_testcourse"ROW                                     COLUMN+CELL                                                                                                        flume                                  column=cf:score, timestamp=1530003973026, value=89                                                                 hadoop                                 column=cf:score, timestamp=1530003973026, value=92                                                                 hbase                                  column=cf:score, timestamp=1530003973026, value=90                                                                 hive                                   column=cf:score, timestamp=1530003973026, value=85                                                                 kafka                                  column=cf:score, timestamp=1530003973026, value=95                                                                 spark                                  column=cf:score, timestamp=1530003973026, value=80                                                                 storm                                  column=cf:score, timestamp=1530003973026, value=70                                                                7 row(s) in 0.0760 secondshbase(main):013:0> 

四、使用Hive映射HBase中已经存在的表
(1) 在hbase中创建HBase表,进入HBase Shell客户端执行建表命令

hbase(main):036:0> create 'hbase_test',{ NAME => 'cf'}0 row(s) in 2.2830 seconds=> Hbase::Table - hbase_test

(2) 插入数据

hbase(main):037:0> put 'hbase_test','hadoop','cf:score', '95'0 row(s) in 0.1110 secondshbase(main):038:0> put 'hbase_test','storm','cf:score', '96'0 row(s) in 0.0120 secondshbase(main):039:0> put 'hbase_test','spark','cf:score', '97'0 row(s) in 0.0110 seconds

(3) 查看数据

hbase(main):041:0> scan "hbase_test"ROW                                     COLUMN+CELL                                                                                                        hadoop                                 column=cf:score, timestamp=1530004351399, value=95                                                                 spark                                  column=cf:score, timestamp=1530004365368, value=97                                                                 storm                                  column=cf:score, timestamp=1530004359169, value=96                                                                3 row(s) in 0.0220 secondshbase(main):042:0> 

(4) 进入Hive Shell 客户端,创建外部表course.hbase_test,建表命令如下所示

hive>  create external table course.hbase_test(cname string,score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES("hbase.table.name" = "hbase_test", "hbase.mapred.output.outputtable" = "hbase_test");OKTime taken: 0.221 secondshive> show tables;OKhbase_testhbase_testcoursetestcourseTime taken: 0.024 seconds, Fetched: 3 row(s)备注:创建外部表和创建内部表的命令基本一致,唯一的区别就是:创建内部表使用create table,创建外部表使用create external table。Hive 查看数据hive> select * from hbase_test;OKhadoop  95spark   97storm   96Time taken: 0.22 seconds, Fetched: 3 row(s)hive> 

该Hive表一个外部表,所以删除该表并不会删除HBase表中的数据,有几点需要注意的是:
a)、建表或映射表的时候如果没有指定:key则第一个列默认就是行键
b)、HBase对应的Hive表中没有时间戳概念,默认返回的就是最新版本的值
c)、由于HBase中没有数据类型信息,所以在存储数据的时候都转化为String类型

五、使用java连接hive操作hbase
pom.xml

    4.0.0    cn.itcast.hbase    hbase    0.0.1-SNAPSHOT                                org.apache.hadoop            hadoop-client            2.6.4                            org.apache.hadoop            hadoop-common            2.6.4                                    junit            junit            4.12                                    org.apache.hbase            hbase-client            1.4.0                                    org.apache.hbase            hbase-server            1.4.0                                    org.apache.hive            hive-jdbc            1.2.1                                    org.apache.hive            hive-metastore            1.2.1                                    org.apache.hive            hive-exec            1.2.1            

Hive_Hbase.java

package cn.itcast.bigdata.hbase;import java.sql.Connection;import java.sql.DriverManager;import java.sql.ResultSet;import java.sql.SQLException;import java.sql.Statement;public class Hive_Hbase {    public static void main(String[] args) {        try {            Class.forName("org.apache.hive.jdbc.HiveDriver");            Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/shizhan02","hadoop","");            Statement statement = connection.createStatement();            String sql = "SELECT * FROM hive_hbase_table_kv";            ResultSet res = statement.executeQuery(sql);            while (res.next()) {                System.out.println(res.getString(2));            }        } catch (ClassNotFoundException | SQLException e) {            // TODO Auto-generated catch block            e.printStackTrace();        }    }}
0