千家信息网

Hive Basic SQL Syntax (DML)

Posted: 2025-01-31  Author: 千家信息网 editor
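DML Operations (Data Manipulation Language)

Reference: the official DML documentation

  • UPDATE and DELETE are rarely used in Hive (they only work on transactional, i.e. ACID, tables), so this article does not cover them. It focuses on LOAD and INSERT.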
1. LOAD (Loading Data)

LOAD loads files into a table (Loading files into tables).

  • The official documentation lists the following syntax:

```
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
```
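  • 1. Hive performs no transformation when loading data. A load is purely a copy or move operation: the data files are placed into the location that corresponds to the Hive table. (From Hive 3.0 on, some loads, such as loading into a partitioned table without a PARTITION clause, are rewritten internally as INSERT ... SELECT.)

  • 2. The load target can be a table or a partition. If the table is partitioned, you must identify the partition by giving a value for every partition column.

  • 3. filepath can refer to a file or to a directory; in either case it is treated as a set of files.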
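The partition rule in note 2 can be illustrated with a small path-building sketch. It is a toy model, assuming the usual layout where a table lives under the warehouse directory (`/user/hive/warehouse` is Hive's default and is used here only as an example) and each partition column becomes a nested `col=value` subdirectory. The helper `load_target_dir` and the `logs` table are hypothetical; the sketch ignores the escaping Hive applies to special characters in partition values and follows the pre-3.0 rule rather than Hive 3.0's INSERT ... SELECT rewrite.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def load_target_dir(table_location: str, partition_cols: List[str],
                    spec: Optional[Dict[str, str]] = None) -> str:
    """Return the directory a LOAD would write into (toy model, pre-Hive-3.0 rule)."""
    spec = spec or {}
    if not partition_cols:
        if spec:
            raise ValueError("table is not partitioned; PARTITION clause not allowed")
        return str(PurePosixPath(table_location))
    # A partitioned table needs a value for EVERY partition column.
    missing = [c for c in partition_cols if c not in spec]
    if missing:
        raise ValueError(f"missing values for partition columns: {missing}")
    unknown = [c for c in spec if c not in partition_cols]
    if unknown:
        raise ValueError(f"unknown partition columns: {unknown}")
    path = PurePosixPath(table_location)
    for col in partition_cols:  # nesting follows the table's partition column order
        path /= f"{col}={spec[col]}"
    return str(path)


if __name__ == "__main__":
    print(load_target_dir("/user/hive/warehouse/emp", []))
    print(load_target_dir("/user/hive/warehouse/logs", ["dt", "region"],
                          {"region": "cn", "dt": "2018-06-24"}))
```

Note that the order of keys in the PARTITION spec does not matter; the directory nesting follows the order in which the table declares its partition columns.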

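LOCAL: the input file is on the local file system (Linux). Without LOCAL, Hive looks for the file on HDFS. A local file is copied into the table, while an HDFS file is moved (it no longer exists at its original path afterwards).
OVERWRITE: if the table (or target partition) already contains data, that data is deleted before the new files are loaded. Without this keyword, the new files are appended to the table.
PARTITION: if the table is partitioned, the data is loaded into the specified partition.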
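Because LOAD copies files without converting anything (note 1), column types are only applied when the data is read. The sketch below models that schema-on-read step for one line of the `emp` table created next. It is an approximation of Hive's default text SerDe, not its code, and rests on these assumptions: `\N` is read as NULL, a numeric field that fails to parse (including an empty one) becomes NULL, missing trailing fields become NULL, and extra fields are ignored. The name `parse_line` and the sample lines are illustrative; the actual contents of `emp.txt` are not shown in this article.

```python
from typing import List, Optional, Tuple, Type

# Column names and Python stand-ins for the emp table's types (int, string, double).
EMP_SCHEMA: List[Tuple[str, Type]] = [
    ("empno", int), ("ename", str), ("job", str), ("mgr", int),
    ("hiredate", str), ("salary", float), ("comm", float), ("deptno", int),
]
NULL_MARKER = "\\N"  # Hive's default null marker in text files


def parse_line(line: str, schema: List[Tuple[str, Type]] = EMP_SCHEMA,
               sep: str = "\t") -> List[Optional[object]]:
    """Apply the schema to one raw line at read time (schema on read)."""
    fields = line.rstrip("\n").split(sep)
    # Missing trailing fields read as NULL; extra fields are dropped by zip().
    fields += [NULL_MARKER] * (len(schema) - len(fields))
    row: List[Optional[object]] = []
    for (_name, typ), raw in zip(schema, fields):
        if raw == NULL_MARKER:
            row.append(None)
        elif typ is str:
            row.append(raw)  # strings are kept as-is, even when empty
        else:
            try:
                row.append(typ(raw))
            except ValueError:
                row.append(None)  # unparseable number -> NULL, no error raised
    return row


if __name__ == "__main__":
    print(parse_line("7839\tKING\tPRESIDENT\t\t1981-11-17\t5000\t\t10"))
    # [7839, 'KING', 'PRESIDENT', None, '1981-11-17', 5000.0, None, 10]
```

This is why a malformed file usually does not make LOAD fail: the problem only surfaces as NULLs when the table is queried.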

```shell
# Create an employee table
hive> create table emp
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t' ;
OK
Time taken: 0.651 seconds
# Load emp.txt from the local file system into the table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=886, rawDataSize=0]
OK
Time taken: 1.848 seconds
# View the table data
hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
# Load again without the OVERWRITE keyword
hive>  LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt'  INTO TABLE emp;
# Query again
hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
Time taken: 0.137 seconds, Fetched: 28 row(s)
# Load again with OVERWRITE
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
# The data has been overwritten
hive> select * from emp;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
Time taken: 0.164 seconds, Fetched: 14 row(s)
```
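The row counts in this transcript (14, then 28, then 14) follow from treating a table as a directory of files. The toy model below sketches LOAD's file-level behavior as described above: OVERWRITE clears the table's files first, INTO adds another file next to the existing ones, and a non-LOCAL load moves the source file instead of copying it. `ToyTable` is a hypothetical class and plain dicts stand in for HDFS and the local disk. Renaming a same-named file with a `_copy_N` suffix mirrors what Hive typically does, but treat that detail as an assumption.

```python
from typing import Dict, List


class ToyTable:
    """A Hive table modeled as a directory of data files (toy model, not Hive code)."""

    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}  # file name -> lines in that file

    def load(self, src_fs: Dict[str, List[str]], path: str,
             local: bool, overwrite: bool) -> None:
        lines = src_fs[path]
        if overwrite:
            self.files.clear()  # OVERWRITE: remove the table's existing files first
        name = path.rsplit("/", 1)[-1]
        base, dot, ext = name.partition(".")
        n = 0
        while name in self.files:  # INTO: never clobber an existing file
            n += 1
            name = f"{base}_copy_{n}{dot}{ext}"
        self.files[name] = list(lines)  # no parsing or conversion at load time
        if not local:
            del src_fs[path]  # non-LOCAL: the HDFS file is moved, not copied

    def select_all(self) -> List[str]:
        return [line for f in sorted(self.files) for line in self.files[f]]


if __name__ == "__main__":
    local_disk = {"/home/hadoop/emp.txt": ["row"] * 14}  # stand-in for a 14-line file
    emp = ToyTable()
    emp.load(local_disk, "/home/hadoop/emp.txt", local=True, overwrite=True)
    print(len(emp.select_all()))  # 14
    emp.load(local_disk, "/home/hadoop/emp.txt", local=True, overwrite=False)
    print(len(emp.select_all()))  # 28
    emp.load(local_disk, "/home/hadoop/emp.txt", local=True, overwrite=True)
    print(len(emp.select_all()))  # 14
```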
2. INSERT: loading query results into a table (Inserting data into Hive Tables from queries)
  • The official documentation lists the following syntax:
```
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
```

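The official documentation lists a pile of syntax that looks intimidating at first, but once it is sorted out there is not much to it. Let's break it down: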

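  • 1. Standard syntax: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement; is simply an insert of the rows returned by a query.
  • 2. The PARTITION keyword inserts into a specific partition.
  • 3. OVERWRITE controls whether existing data is replaced.
  • 4. An INSERT ... SELECT runs as a MapReduce job here (hive.execution.engine=mr; with Tez or Spark it runs as a job on that engine instead).

  • 5. multiple inserts: one FROM clause feeds several INSERT clauses, so the source is scanned once and written to multiple tables or partitions. This is not a multi-row insert.

  • 6. dynamic partition inserts: the partition values are taken from the query's output (the trailing columns of the SELECT, in PARTITION order) instead of being fixed in the statement. This requires hive.exec.dynamic.partition=true and, if no partition column is given a static value, hive.exec.dynamic.partition.mode=nonstrict.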
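For point 6, the sketch below models how a dynamic partition insert routes each selected row to a `col=value` partition directory: static values come from the PARTITION clause, and dynamic values are read from the trailing columns of the SELECT. `dynamic_partition_insert` is a hypothetical helper, not Hive code. Its two checks mirror Hive rules (static partition columns must come before dynamic ones, and `strict` mode requires at least one static value), and it returns the rows per partition instead of writing files.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def dynamic_partition_insert(rows: Sequence[Sequence[object]],
                             partition_cols: List[str],
                             static: Optional[Dict[str, str]] = None,
                             mode: str = "strict") -> Dict[str, List[tuple]]:
    """Route SELECT output rows to partition directories (toy model).

    Static values come from the PARTITION clause; the remaining (dynamic)
    partition values are the LAST columns of each row, in PARTITION order.
    Returns partition path -> data rows with the partition columns removed.
    """
    static = static or {}
    n_static = len(static)
    # Static partition columns must precede the dynamic ones.
    if list(static) != partition_cols[:n_static]:
        raise ValueError("static partition columns must come first, in order")
    if mode == "strict" and n_static == 0:
        raise ValueError("strict mode needs at least one static partition value")
    n_dynamic = len(partition_cols) - n_static
    out: Dict[str, List[tuple]] = defaultdict(list)
    for row in rows:
        split = len(row) - n_dynamic
        data, dyn_values = tuple(row[:split]), row[split:]
        values = list(static.values()) + [str(v) for v in dyn_values]
        path = "/".join(f"{c}={v}" for c, v in zip(partition_cols, values))
        out[path].append(data)
    return dict(out)


if __name__ == "__main__":
    # Models: SELECT empno, ename, deptno FROM emp, partitioned by deptno
    rows = [(7369, "SMITH", 20), (7499, "ALLEN", 30), (7521, "WARD", 30), (7566, "JONES", 20)]
    for path, data in dynamic_partition_insert(rows, ["deptno"], mode="nonstrict").items():
        print(path, data)
```

With these four rows from `emp`, SMITH and JONES end up under `deptno=20`, and ALLEN and WARD under `deptno=30`.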

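Note: there are two insert forms here, with and without the OVERWRITE keyword. INSERT OVERWRITE replaces the table's existing data, while INSERT INTO appends to it.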

```shell
# insert overwrite
hive> insert overwrite table emp2 select * from emp;
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
Launching Job 1 out of 3
...
OK
Time taken: 19.554 seconds
hive> select * from emp2;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
Time taken: 0.143 seconds, Fetched: 14 row(s)
# insert into (append)
hive> insert into table emp2 select * from emp;
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
...
OK
Time taken: 18.539 seconds
hive> select * from emp2;
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
Time taken: 0.132 seconds, Fetched: 28 row(s)
```
  • Inserting values into tables (manually inserts one or more rows; like INSERT ... SELECT it runs a MapReduce job, and it is rarely used)
  • Official syntax:
    INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
```shell
hive> create table stu(
    > id int,
    > name string
    > )
    >  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.253 seconds
hive> insert into table stu values (1,"zhangsan");
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
...
OK
Time taken: 16.589 seconds
hive> select * from stu;
OK
1       zhangsan
Time taken: 0.123 seconds, Fetched: 1 row(s)
```
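Since every INSERT ... VALUES statement launches its own job, putting several rows into one statement (the `[, values_row ...]` part of the syntax) costs one job instead of one per row. The helper below is a hypothetical sketch that builds such a statement from Python tuples. It only handles ints and strings, the second row (`lisi`) is made-up sample data, and its quoting (backslash-escaping `\` and `'` inside single-quoted literals) is an assumption about Hive string literals that is worth verifying on your Hive version.

```python
from typing import Sequence, Union

Value = Union[int, str]


def sql_literal(v: Value) -> str:
    """Render an int or str as a HiveQL literal (assumed escaping rules)."""
    if isinstance(v, bool) or not isinstance(v, (int, str)):
        raise TypeError(f"unsupported value type: {type(v).__name__}")
    if isinstance(v, int):
        return str(v)
    # Escape backslashes first, then single quotes, inside a single-quoted literal.
    return "'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'"


def insert_values(table: str, rows: Sequence[Sequence[Value]]) -> str:
    """Build ONE INSERT INTO ... VALUES statement for all rows (one job, not one per row)."""
    if not rows:
        raise ValueError("need at least one row")
    tuples = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO TABLE {table} VALUES {tuples}"


if __name__ == "__main__":
    print(insert_values("stu", [(1, "zhangsan"), (2, "lisi")]))
    # INSERT INTO TABLE stu VALUES (1, 'zhangsan'), (2, 'lisi')
```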
3. Exporting Data (Writing data into the filesystem from queries)

Query results can be written into the file system with the following statements:

  • The official documentation lists the following syntax:

```
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
```

The multiple-inserts form writes the results of a single scan of from_statement into several directories in one statement.

**LOCAL**: with LOCAL, the query results are written to the local file system; without it, they are written to HDFS. Because the statement is INSERT OVERWRITE, any existing contents of the target directory are replaced.
**STORED AS**: specifies the storage format of the exported files.

```shell
hive> insert overwrite local directory '/home/hadoop/stu' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from stu;
Query ID = hadoop_20180624153131_97271014-abcf-4e70-b318-7de85d27c97f
Total jobs = 1
...
OK
Time taken: 15.09 seconds
# Result
[hadoop@hadoop000 stu]$ pwd
/home/hadoop/stu
[hadoop@hadoop000 stu]$ cat 000000_0
1       zhangsan
# Export to two directories in one statement (multiple inserts)
hive> from emp
    > INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select empno, ename
    > INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select ename;
Query ID = hadoop_20180624153131_97271014-abcf-4e70-b318-7de85d27c97f
Total jobs = 1
...
OK
Time taken: 16.261 seconds
# Result
[hadoop@hadoop000 tmp]$ cd hivetmp1
[hadoop@hadoop000 hivetmp1]$ ll
total 4
-rw-r--r-- 1 hadoop hadoop 154 Jun 24 15:39 000000_0
[hadoop@hadoop000 hivetmp1]$ cat 000000_0
7369    SMITH
7499    ALLEN
7521    WARD
7566    JONES
7654    MARTIN
7698    BLAKE
7782    CLARK
7788    SCOTT
7839    KING
7844    TURNER
7876    ADAMS
7900    JAMES
7902    FORD
7934    MILLER
[hadoop@hadoop000 hivetmp1]$ cd ..
[hadoop@hadoop000 tmp]$ cd hivetmp2
[hadoop@hadoop000 hivetmp2]$ cat 000000_0
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
```
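The multi-insert export above reads `emp` once and writes two directories (note `Total jobs = 1`). The sketch below models that: a single pass over the source rows, with each row projected into every target's output. It is a toy model with hypothetical names (`to_text`, `multi_insert_export`). It assumes Hive's default `\N` null marker and a single output file per directory; Hive may write several files (such as `000000_0`, `000001_0`) when more tasks run. Applied to the 14 `(empno, ename)` pairs from the transcript, it yields 154 bytes for `hivetmp1`, matching the size shown by `ll`, which is consistent with a tab between the fields and a newline after each row.

```python
from typing import Dict, Iterable, List, Optional, Sequence

Row = Sequence[Optional[object]]


def to_text(row: Row, sep: str = "\t", null: str = "\\N") -> str:
    """Serialize one row as a line of DELIMITED text output (toy model)."""
    return sep.join(null if v is None else str(v) for v in row) + "\n"


def multi_insert_export(source: Iterable[Row],
                        targets: Dict[str, List[int]]) -> Dict[str, str]:
    """Scan `source` once and fill every target directory's output file.

    `targets` maps a directory name to the column positions its SELECT keeps.
    """
    outputs: Dict[str, List[str]] = {d: [] for d in targets}
    for row in source:  # single pass over the source table
        for directory, cols in targets.items():
            outputs[directory].append(to_text([row[i] for i in cols]))
    return {d: "".join(lines) for d, lines in outputs.items()}


if __name__ == "__main__":
    # (empno, ename) columns of emp, as shown in the transcript
    emp = [(7369, "SMITH"), (7499, "ALLEN"), (7521, "WARD"), (7566, "JONES"),
           (7654, "MARTIN"), (7698, "BLAKE"), (7782, "CLARK"), (7788, "SCOTT"),
           (7839, "KING"), (7844, "TURNER"), (7876, "ADAMS"), (7900, "JAMES"),
           (7902, "FORD"), (7934, "MILLER")]
    # FROM emp INSERT ... 'hivetmp1' SELECT empno, ename INSERT ... 'hivetmp2' SELECT ename
    files = multi_insert_export(emp, {"hivetmp1": [0, 1], "hivetmp2": [1]})
    print(len(files["hivetmp1"].encode()))  # 154
    print(files["hivetmp2"], end="")
```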

Reference: https://blog.csdn.net/yu0_zhang0/article/details/79007784
