千家信息网

Spark SQL: Accessing the Hive Warehouse with Beeline

Published: 2024-10-06  Author: 千家信息网 editor

1. Add hive-site.xml

Add a hive-site.xml configuration file under $SPARK_HOME/conf so that Spark can access the Hive metastore.

vim hive-site.xml

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>

Note: Hive itself does not need to be deployed on this node; all that matters is that you can connect to the Hive metastore.
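Before starting any services, it can be worth a quick check that the file actually landed where Spark will look for it. A minimal sketch; the fallback path below is the install path used elsewhere in this post, adjust to your own layout:

```shell
# Check that hive-site.xml is in place before starting any services.
# If $SPARK_HOME is unset, fall back to the install path used in this post.
SPARK_CONF="${SPARK_HOME:-/home/hadoop/app/spark}/conf"
if [ -f "$SPARK_CONF/hive-site.xml" ]; then
    echo "hive-site.xml found in $SPARK_CONF"
else
    echo "hive-site.xml missing in $SPARK_CONF" >&2
fi
```

With the file in place, `$SPARK_HOME/bin/spark-sql -e "show databases;"` should list your Hive databases rather than only `default`.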

2. Start the Thrift server

[hadoop@hadoop003 spark]$ ./sbin/start-thriftserver.sh --jars ~/softwares/mysql-connector-java-5.1.47.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/hadoop/app/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop003.out

Check the log to confirm that the Thrift server started successfully:

[hadoop@hadoop003 spark]$ tail -50f /home/hadoop/app/spark/logs/spark-hadoop-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-hadoop003.out
19/05/21 09:39:14 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:15 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/05/21 09:39:15 INFO metastore.ObjectStore: Initialized ObjectStore
19/05/21 09:39:15 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/05/21 09:39:15 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
19/05/21 09:39:15 INFO metastore.HiveMetaStore: Added admin role in metastore
19/05/21 09:39:15 INFO metastore.HiveMetaStore: Added public role in metastore
19/05/21 09:39:15 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
19/05/21 09:39:15 INFO metastore.HiveMetaStore: 0: get_all_databases
19/05/21 09:39:15 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr  cmd=get_all_databases
19/05/21 09:39:15 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
19/05/21 09:39:15 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr  cmd=get_functions: db=default pat=*
19/05/21 09:39:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
19/05/21 09:39:16 INFO session.SessionState: Created local directory: /tmp/73df82dd-1fd3-4dd5-97f1-680d53bd44bc_resources
19/05/21 09:39:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc
19/05/21 09:39:16 INFO session.SessionState: Created local directory: /tmp/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc
19/05/21 09:39:16 INFO session.SessionState: Created HDFS directory: /tmp/hive/hadoop/73df82dd-1fd3-4dd5-97f1-680d53bd44bc/_tmp_space.db
19/05/21 09:39:16 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is file:/home/hadoop/app/spark-2.4.2-bin-hadoop-2.6.0-cdh6.7.0/spark-warehouse
19/05/21 09:39:16 INFO session.SessionManager: Operation log root directory is created: /tmp/hadoop/operation_logs
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread pool size: 100
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread wait queue size: 100
19/05/21 09:39:16 INFO session.SessionManager: HiveServer2: Background operation thread keepalive time: 10 seconds
19/05/21 09:39:16 INFO service.AbstractService: Service:OperationManager is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:SessionManager is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service: CLIService is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service: HiveServer2 is inited.
19/05/21 09:39:16 INFO service.AbstractService: Service:OperationManager is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:SessionManager is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:CLIService is started.
19/05/21 09:39:16 INFO metastore.ObjectStore: ObjectStore, initialize called
19/05/21 09:39:16 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
19/05/21 09:39:16 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/05/21 09:39:16 INFO metastore.ObjectStore: Initialized ObjectStore
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: get_databases: default
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr  cmd=get_databases: default
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: Shutting down the object store...
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr  cmd=Shutting down the object store...
19/05/21 09:39:16 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.
19/05/21 09:39:16 INFO HiveMetaStore.audit: ugi=hadoop  ip=unknown-ip-addr  cmd=Metastore shutdown complete.
19/05/21 09:39:16 INFO service.AbstractService: Service:ThriftBinaryCLIService is started.
19/05/21 09:39:16 INFO service.AbstractService: Service:HiveServer2 is started.
19/05/21 09:39:16 INFO thriftserver.HiveThriftServer2: HiveThriftServer2 started
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a7c72af{/sqlserver,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e24bab6{/sqlserver/json,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@391e2a{/sqlserver/session,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@641cea11{/sqlserver/session/json,null,AVAILABLE,@Spark}
19/05/21 09:39:16 INFO thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads  # this last line indicates a successful startup
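As the last log line shows, the server listens on port 10000 by default. start-thriftserver.sh accepts the usual spark-submit options plus --hiveconf overrides, so the port and the master can be changed at start time. A sketch under those assumptions; 10001 is just an example value, not anything this cluster requires:

```shell
# Start the Thrift server on a non-default port with an explicit master.
# hive.server2.thrift.port is the standard HiveServer2 port property.
cd $SPARK_HOME
./sbin/start-thriftserver.sh \
    --master local[2] \
    --jars ~/softwares/mysql-connector-java-5.1.47.jar \
    --hiveconf hive.server2.thrift.port=10001
```

Stop it later with ./sbin/stop-thriftserver.sh from the same directory.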

3. Start beeline

[hadoop@hadoop003 spark]$ ./bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop
Connecting to jdbc:hive2://localhost:10000
19/05/21 09:46:19 INFO jdbc.Utils: Supplied authorities: localhost:10000
19/05/21 09:46:19 INFO jdbc.Utils: Resolved authority: localhost:10000
19/05/21 09:46:19 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.4.2)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000> select * from student.student limit 5;
+---------+-----------+-----------------+--------------------------------------------+--+
| stu_id  | stu_name  |  stu_phone_num  |                 stu_email                  |
+---------+-----------+-----------------+--------------------------------------------+--+
| 1       | Burke     | 1-300-746-8446  | ullamcorper.velit.in@ametnullaDonec.co.uk  |
| 2       | Kamal     | 1-668-571-5046  | pede.Suspendisse@interdumenim.edu          |
| 3       | Olga      | 1-956-311-1686  | Aenean.eget.metus@dictumcursusNunc.edu     |
| 4       | Belle     | 1-246-894-6340  | vitae.aliquet.nec@neque.co.uk              |
| 5       | Trevor    | 1-300-527-4967  | dapibus.id@acturpisegestas.net             |
+---------+-----------+-----------------+--------------------------------------------+--+
5 rows selected (3.275 seconds)
0: jdbc:hive2://localhost:10000>

The connection succeeded.
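Beyond the interactive prompt, beeline can also run a single statement and exit via its -e option (or a script file via -f), which is what you would use from a shell script or cron job. A sketch, assuming the Thrift server from step 2 is up on port 10000:

```shell
# Run one statement non-interactively against the Spark Thrift server.
# student.student is the sample table queried interactively above.
$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n hadoop \
    -e "select count(*) from student.student;"
```

Because the Thrift server is a long-running shared SparkContext, every beeline session (interactive or -e) reuses the same cached state rather than launching a new application.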

4. Notes

1. It is best to start beeline from the spark/bin directory.
If the machine where you start Spark's beeline also has Hive deployed, and Hive's bin directory happens to come before Spark's in your PATH, you will most likely end up launching Hive's beeline instead.
For example:

[hadoop@hadoop003 spark]$ beeline
ls: cannot access /home/hadoop/app/spark/lib/spark-assembly-*.jar: No such file or directory
which: no hbase in (/home/hadoop/app/hive/bin:/home/hadoop/app/spark/bin:/home/hadoop/app/hadoop-2.6.0-cdh6.7.0//bin:/home/hadoop/app/hadoop-2.6.0-cdh6.7.0//sbin:/home/hadoop/app/zookeeper/bin:/usr/java/jdk1.8.0_131/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hadoop/bin)
Beeline version 1.1.0-cdh6.7.0 by Apache Hive  # this is Hive's beeline, not Spark's
beeline>

Now check the environment variables:

[hadoop@hadoop003 spark]$ cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH

#####JAVA_HOME#####
export JAVA_HOME=/usr/java/jdk1.8.0_131
####ZOOKEEPER_HOME####
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
#####HADOOP_HOME######
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh6.7.0/
export SPARK_HOME=/home/hadoop/app/spark
#####HIVE_HOME#####
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$HIVE_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$PATH

Sure enough: $HIVE_HOME/bin comes before $SPARK_HOME/bin in PATH, so a bare `beeline` resolves to Hive's beeline.
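The shell searches PATH left to right and runs the first match it finds, which is exactly why the export above (with $HIVE_HOME/bin first) shadows Spark's beeline. A small self-contained demonstration with two stand-in `beeline` scripts; the directories are temporary fakes, not the real installs:

```shell
# Two fake `beeline` executables standing in for Hive's and Spark's copies.
demo=$(mktemp -d)
mkdir -p "$demo/hive/bin" "$demo/spark/bin"
printf '#!/bin/sh\necho hive-beeline\n'  > "$demo/hive/bin/beeline"
printf '#!/bin/sh\necho spark-beeline\n' > "$demo/spark/bin/beeline"
chmod +x "$demo/hive/bin/beeline" "$demo/spark/bin/beeline"

# Hive's bin first, as in the .bash_profile above: Hive's beeline wins.
env PATH="$demo/hive/bin:$demo/spark/bin:$PATH" beeline    # prints: hive-beeline

# Spark's bin first: Spark's beeline wins.
env PATH="$demo/spark/bin:$demo/hive/bin:$PATH" beeline    # prints: spark-beeline
```

Putting $SPARK_HOME/bin before $HIVE_HOME/bin in the export line, or simply invoking ./bin/beeline from the Spark directory as in step 3, avoids the mix-up.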
