导航：首页 > 互联网科技 >

hive的语法命令介绍

发表于：2025-02-06 作者：千家信息网编辑

千家信息网最后更新 2025年02月06日，1.hive的基本语法：create databases mydb #创建数据库show databases #查看所有的库use mydb

千家信息网最后更新 2025年02月06日hive的语法命令介绍

1.hive的基本语法：

create databases mydb #创建数据库show databases            #查看所有的库use mydb                      #切换数据库create table t_user(id int ,name string,age int)  #创建表create table t_user(id int ,name string,age int) row format delimited fields terminated by '分隔符'  #指定分隔符的建表语句insert into table t_user values(值1,值1,值1)     #插入数据select * from t_table       #查询语句load data inpath 'HDFS path' into table t_name  #在hdfs中导入数据load data local inpath 'linux path' into table t_name #导入Linux数据到hive

2.hive的DDL操作：

（1）对hive库的操作：

建库

create database if not exists myhive   #如果不存在则创建该数据库create database if not exists myhive2 localtion 'hdfs path'  #指定该库的位置

查看库：

show databases;                            #查看hive中所有的数据库desc databases dbname ;              #显示数据库的详细信息select current_database();             #查看正在使用的数据库show create database db_name ;  #查看建库语句

删除库：

drop databases db_name restrict;drop database if exists dbname;#注意：默认情况下，hive不允许删除包含表的库，有两种办法：1. 手动删除所有的表，然后在删除库2. 使用cascade 关键字：drop database myhive cascade ;

（2）对hive表的操作：

建表：
语法分析：

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)][CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [ROW FORMAT row_format][STORED AS file_format][LOCATION hdfs_path][EXTERNAL] TABLE                # 表示创建的是内部表还是外部表[IF NOT EXISTS] table_name  # 防止报错[(col_name data_type [COMMENT col_comment], ...)]  #表的字段[COMMENT table_comment]   #表的描述信息[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  #指定分区表[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]   #指定分桶，排序规则，以及分桶个数[ROW FORMAT row_format]   #指定分隔符fields terminated by ''  #指定列分割符lines terminated by ''  #指定行分隔符[STORED AS file_format]   #指定数据存储格式[LOCATION hdfs_path]      #指定数据存储目录 （在创建外部表时使用）

建表举例：

#内部表create table if not exists student(id int ,name string) row format delimited fields terminated by ','      #外部表create external table if not exists student (id int ,name string) row format delimited fields terminated by ',' location '/hive/data/';#分区表create table  if not exists student (id int ,name string) partitioned by (age int conmment 'partitioned comment') row format delimited fields terminated by ',' ;   #分区字段的字段名称，不能是表中的任意一个字段#创建分桶表create table if not exists  student (id int ,name string,age int ) clustered by (age) sort by (age desc) into 10 buckets row format delimited fields terminated by ',' ;   #分桶字段一定要是表中的属性字段 #like 方式create table student like t_student ;   #复制一个表结构，分区表和分桶表也同样可以复制（分区表只能复制在创建表的时候的信息，之后添加的信息不能复制）#CTAScreate table student as select * from t_student #创建表并复制

修改表属性

alter table old_name rename to new_name ;   #修改表名alter table t_name set tb_properties (property_name=roperty_val)  #修改表的属性alter table t_name set serdeproperties('field.delim'='-');   #修改列的分隔符alter table t_name add cloumns(f_name type) #增加一个字段alter table t_name drop   #hive本身不支持alter table t_name replace columns(id int ,name string )  #替换所有的列alter table t_name change old_field_name new_field_name type [first|after field] #修改字段的名称、类型以及位置#接下来是对分区表操作：alter table t_name add partition(分区字段='value')  #添加一个分区alter table t_name add partition(分区字段='value') partition (分区字段='value')  #添加多个分区alter table t_name drop partition(分区字段='value')  #删除分区alter table t_name partition(分区字段='value') set location 'hdfs path' #修改分区路径alter table t_name partition(分区字段='value') enable no_drop ; #防止分区被删除alter table t_name partition(分区字段='value') enable offline #防止分区被查询

删除表：

drop tab;e if exists t_name ;   #删除表注意：1. 内部表删除：元数据和数据都删2. 内部表删除：元数据和数据都删3. 分区表（内部表）：所有的分区都会被删除，元数据和数据都删4. 分桶表的删除和普通表的删除没有任何区别truncate table t_name ;  #清空表的内容

对表的常见操作：

show tables ;  #查看库下的所有表show partitions t_name;  # 查看表的分区show partitions 表名 partition(分区字段='value')  #查看某个分区desc t_name ;   #查看表的详细信息desc extended 表名  #查看表的详细信息   desc formatted 表名  #查看表的详细信息

3.hive的DML操作：

（1）数据的装载：

load data local inpath 'linux path' into table t_name ;   #本地导入local data inpath 'hdfs path' into table t_name    #从hdfs中导入#注意：如果是内部表的话，在hdfs导入，那么原本的数据会被移动到相应的表的目录下load data local inpath 'linux path ' overwrite into table 表名；  #覆盖导入

（2）数据的插入：

注意：

insert into //表示追加操作insert overwrite   //表示覆盖插入操作

insert into table t_name(fields1,fields2,fields3) values(value1,value2,value3)  #插入一条数据insert into table t_name select * from tt_name;  #利用查询，将结果导入表中#分区表的多重插入insert into talbe student_ptn partition(department='SC') select id ,name,age ,sex from student where dept=' SC'; insert into talbe student_ptn partition(department='AC') select id ,name,age ,sex from student where dept=' AC';insert into talbe student_ptn partition(department='ES') select id ,name,age ,sex from student where dept=' ES';上面的方法是使用单个sql去查询表，但是这里每执行一个sql就需要对student表中的所有数据进行扫描，效率太低！转换：from student insert into table student_ptn partition(department='SC') select id ,name,age ,sex where dept=' SC'; insert into talbe student_ptn partition(department='AC') select id ,name,age ,sex  where dept=' AC';insert into talbe student_ptn partition(department='ES') select id ,name,age ,sex where dept=' ES';这种方式进行数据的处理，只需要扫描表一次，整个MR程序就是一个输入多个输出，如果指定的分区不存在，在执行这条语句时会自动创建。# 分桶表的数据插入，这里分桶表只能使用insert进行数据插入insert into table stu_bck select * from 表名  #和普通的插入一样**分桶的原则：分桶字段的hashcode值%分桶个数=  相同的值分在一组

动态分区插入和静态分区插入：
静态分区插入：要进行数据插入的数据的定义是手动指定的（分区在插入之前指定）
动态分区插入：用来解决静态分区插入的缺点。按照某个分区字段的值进行判断，每遇到一个不同的值，当前的程序自行进行判断来创建对应的分区
举例：

#静态分区插入：load data local inpath "路径" into table 表名 partition(dpt='')insert into talbe student_ptn partition(department='SC') select id ,name,age ,sex where dept=' SC';#动态分区插入：insert into table t_name partition(字段名) select * from tt_name #这里查询的表的最后一个字段需要是分区字段。#多分区动态插入：insert into table stu_ptn01 partition(sex,department) select id,name,age,sex,department from student_manager;  #只要查询字段的最后几个字段是分区字段即可，顺序不能颠倒

注意：如果想使用动态分区插入需要在hive中开启几个参数：

set hive.exec.dynamic.partiton=true;  #打开动态分区开关set hive.exec.dynamic.partition.mode=nonstrict ;  #关闭动态分区插入的不合法约束。

静态分区插入和动态分区插入的区别：
- 静态分区插入数据后，需要指定分区的名字，而动态分区不需要
- 静态分区中可能会存在某一个分区没有数据，分区的目录是一个空目录，动态分区的时候根据实际的数据生成分区，每一个分区至少有一条数据
- 3）动态分区的时候，每一个分区都会对应配置文件中设置的reducetask的个数，
set reducetask=3

（3）数据的导出：

#单重导出insert overwrite local directory 'linux path' select * from t_name;#多重导出from t_name insert overwrite local directory 'linux path' select * where ...insert overwrite local directory 'linux path' select * where...

4.hive的DQL操作：

查询语句的书写顺序：select fields ... from [join] where group by having order by limit
查询语句的执行顺序：from ----join ----group by ---having ---select ---ordey by -----limit

（1）hive中的join

特点：
- Hive中连接，只支持等值连接不支持不等值连接
- Hive中and连接，不支持or
- Hive支持多表关联，但是hive中进行关联的时候尽量避免笛卡尔积
- Hive支持in 和 exists 但是效率特别低
举例：

#内连接select a.id aid,a.name name,b.id bid,b.score score from test_a a inner join test_b b on a.id=b.id;  (交集)#左外链接：以join左侧的表为基础表  左侧的表的所有数据都会显示  右侧可以关联上的就会补全  关联不上 null补充select a.id aid,a.name name,b.id bid,b.score score from test_a a left join test_b b on a.id=b.id;#右外链接：以join右侧的表为基础select a.id aid,a.name name,b.id bid,b.score score from test_a a right join test_b b on a.id=b.id;#全外链接：取两个表的并集select a.id aid,a.name name,b.id bid,b.score score from test_a a full join test_b b on a.id=b.id;#半连接，相当于内连接  取左半表的数据，左表中在右表中出现关联上的数据select * from test_a a left semi join test_b b on a.id=b.id;

关于 left semi join 的特点：
left semi join 是对hive中的exists/in的一个更高级额的操作。
- left semi join 的限制是， JOIN 子句中右边的表只能在 ON 子句中设置过滤条件，在 WHERE 子句、SELECT 子句或其他地方过滤都不行。
- left semi join 是只传递表的 join key 给 map 阶段，因此left semi join 中最后 select 的结果只许出现左表。
- 因为 left semi join 是 in(keySet) 的关系，遇到右表重复记录，左表会跳过，而 join 则会一直遍历。这就导致右表有重复值得情况下 left semi join 只产生一条，join 会产生多条，也会导致 left semi join 的性能更高。

（2）hive中的排序

order by
特点：局排序
例：select * from 表名 order by 字段 desc; (降序排序)

sort by
特点：sort by 是一个局部排序，在每一个reduce中进行排序，当reduceTask个数为1个时，这时与全局排序相同
原理：sort by 在进行分每一个reduceTask中的数据时，时随机选择的字段进行分配
例：select * from 表名sort by 字段；

distribute by
特点：按照指定字段分桶，在每个桶中进行排序。
例1：select * from 表名 distribute by 字段（字段.hash%分桶个数）
例2：select * from 表名 distribute by 分桶字段 sort by 排序字段 #按照指定字段分桶，在每一个桶中进行排序

cluster by
特点：既分桶又排序
例：select * from 表名 cluster by 分桶排序字段
注意：当分桶字段和排序字段相同时：distribute by+ sort by= cluster by，否则distribute by+ sort by功能更强大一些！

（3）union和union all区别

union和union all：都是将查询结果进行拼接，（连接的两个表的结构必须相同）

select * from xxx  union selecet * from xxxselect * from xxx  union all selecet * from xxx

union：表示去重连接
union all ：表示不去重连接

很赞哦！

字段数据排序动态查询信息静态分区表数据库特点语句支持个数分隔符关联相同子句时候属性目录数据库的安全要保护哪些东西数据库安全各自的含义是什么生产安全数据库录入数据库的安全性及管理数据库安全策略包含哪些海淀数据库安全审计系统建立农村房屋安全信息数据库易用的数据库客户端支持安全管理连接数据库失败ssl安全错误数据库的锁怎样保障安全 tbc莫格莱尼服务器排队无法连接unity服务器数字媒体计算机网络技术办公室网络安全保障制度北京it软件开发师月薪一般多少黄河网络安全科技馆地址电话网络安全工程师演示谷歌网络安全月宣传月 bim软件开发前景文明重启开服务器收益服务器数据中心板块股票玉际国际网络技术有限公司泸州直销企业直销软件开发最好用的操作数据库的工具网络安全常见的协议激战2转服务器网络安全钥匙是什么怎么查看软件开发解释数据包是什么 svn 服务器管理工具华中科技大学网络安全专家数据库单例模式服务器访问量要多大合适不动产登记数据库标准怎么创建数据库链路济南公安网络安全宣传守卫网络安全建设网络强国弈城围棋台湾服务器互联网科技网站网络安全手抄报文字知识内容 kz服务器

千家信息网