HIVE如何统计各个区域下最热门的TOP3的商品
小编给大家分享一下HIVE如何统计各个区域下最热门的TOP3的商品,希望大家阅读完这篇文章之后都有所收获,下面让我们一起去探讨吧!
环境:
hadoop-2.6.0-cdh6.7.0
hive-1.1.0-cdh6.7.0
sqoop-1.4.6-cdh6.7.0
MySQL5.6.39
需求:HIVE统计各个区域下最热门的TOP3的商品,将统计结果导出到MySQL中
MySQL中有:city_info 城市信息表和product_info 商品信息表
HIVE中有: user_click 用户行为日志,按date分区
一、MySQL数据库建库建表及初始化数据
1.1 MySQL创建 ruozedata数据库:
CREATE DATABASE ruozedata;
1.2 创建city_info表
DROP TABLE if exists city_info;
CREATE TABLE `city_info` (
`city_id` int(11) DEFAULT NULL,
`city_name` varchar(255) DEFAULT NULL,
`area` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
初始化city_info表数据
insert into `city_info`(`city_id`,`city_name`,`area`) values (1,'BEIJING','NC'),(2,'SHANGHAI','EC'),(3,'NANJING','EC'),(4,'GUANGZHOU','SC'),(5,'SANYA','SC'),(6,'WUHAN','CC'),(7,'CHANGSHA','CC'),(8,'XIAN','NW'),(9,'CHENGDU','SW'),(10,'HAERBIN','NE');
1.3 创建product_info表
DROP TABLE if exists product_info;
CREATE TABLE `product_info` (
`product_id` int(11) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`extend_info` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
初始化product_info表数据
insert into product_info(product_id,product_name,extend_info) values (1,'product1','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (2,'product2','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (3,'product3','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (4,'product4','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (5,'product5','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (6,'product6','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (7,'product7','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (8,'product8','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (9,'product9','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (10,'product10','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (11,'product11','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (12,'product12','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (13,'product13','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (14,'product14','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (15,'product15','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (16,'product16','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (17,'product17','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (18,'product18','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (19,'product19','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (20,'product20','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (21,'product21','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (22,'product22','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (23,'product23','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (24,'product24','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (25,'product25','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (26,'product26','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (27,'product27','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (28,'product28','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (29,'product29','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (30,'product30','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (31,'product31','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (32,'product32','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (33,'product33','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (34,'product34','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (35,'product35','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (36,'product36','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (37,'product37','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (38,'product38','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (39,'product39','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (40,'product40','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (41,'product41','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (42,'product42','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (43,'product43','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (44,'product44','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (45,'product45','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (46,'product46','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (47,'product47','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (48,'product48','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (49,'product49','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (50,'product50','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (51,'product51','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (52,'product52','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (53,'product53','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (54,'product54','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (55,'product55','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (56,'product56','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (57,'product57','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (58,'product58','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (59,'product59','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (60,'product60','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (61,'product61','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (62,'product62','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (63,'product63','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (64,'product64','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (65,'product65','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (66,'product66','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (67,'product67','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (68,'product68','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (69,'product69','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (70,'product70','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (71,'product71','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (72,'product72','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (73,'product73','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (74,'product74','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (75,'product75','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (76,'product76','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (77,'product77','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (78,'product78','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (79,'product79','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (80,'product80','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (81,'product81','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (82,'product82','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (83,'product83','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (84,'product84','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (85,'product85','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (86,'product86','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (87,'product87','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (88,'product88','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (89,'product89','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (90,'product90','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (91,'product91','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (92,'product92','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (93,'product93','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (94,'product94','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (95,'product95','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (96,'product96','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (97,'product97','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (98,'product98','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (99,'product99','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (100,'product100','{"product_status":1}');
1.4 创建MySQL数据库产品统计表product_stat
drop table if exists `product_stat`;
CREATE TABLE `product_stat`(
`product_id` int(11) DEFAULT NULL
,`product_name` varchar(255) DEFAULT NULL
,`area` varchar(255) DEFAULT NULL
,`click_count` int(11)
,`rank` int
,`days` varchar(20)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;
二、使用sqoop将MySQL的city_info表和product_info表的数据导入到hive(包括创建表)
2.1 操作之前先看看hive里面的表
hive> show databases;
OK
default
hive2
hive2_ruozedata
ruozedata
sqoophive
Time taken: 0.06 seconds, Fetched: 5 row(s)
hive> use ruozedata;
OK
Time taken: 0.069 seconds
hive> show tables;
OK
emp_hive
gw_test
hive_wc
rating_json
ruozedata_emp
ruozedata_emp_managed
ruozedata_emp_partition
ruozedata_person
Time taken: 0.061 seconds, Fetched: 8 row(s)
hive>
2.2 导入city_info表(从MySQL导入数据到hive的ruozedata.city_info表):
sqoop import \
--connect jdbc:mysql://localhost:33066/ruozedata \
--username root \
--password root \
--table city_info -m 1 \
--mapreduce-job-name FromMySQLToHive \
--delete-target-dir \
--create-hive-table \
--hive-table ruozedata.city_info \
--hive-import --hive-overwrite;
2.3 导入product_info表(从MySQL导入数据到hive的ruozedata.product_info表):
sqoop import \
--connect jdbc:mysql://localhost:33066/ruozedata \
--username root \
--password root \
--table product_info -m 1 \
--mapreduce-job-name FromMySQLToHive \
--delete-target-dir \
--create-hive-table \
--hive-table ruozedata.product_info \
--hive-import --hive-overwrite;
2.4 操作完之后再看看hive里面的表:
hive> show tables;
OK
city_info
emp_hive
gw_test
hive_wc
product_info
rating_json
ruozedata_emp
ruozedata_emp_managed
ruozedata_emp_partition
ruozedata_person
Time taken: 0.061 seconds, Fetched: 10 row(s)
hive> select * from city_info;
OK
1 BEIJING NC
2 SHANGHAI EC
3 NANJING EC
4 GUANGZHOU SC
5 SANYA SC
6 WUHAN CC
7 CHANGSHA CC
8 XIAN NW
9 CHENGDU SW
10 HAERBIN NE
Time taken: 0.554 seconds, Fetched: 10 row(s)
hive> select * from product_info;
OK
1 product1 {"product_status":1}
2 product2 {"product_status":1}
3 product3 {"product_status":1}
4 product4 {"product_status":1}
5 product5 {"product_status":1}
6 product6 {"product_status":1}
7 product7 {"product_status":1}
8 product8 {"product_status":1}
9 product9 {"product_status":0}
10 product10 {"product_status":1}
11 product11 {"product_status":0}
12 product12 {"product_status":0}
13 product13 {"product_status":0}
14 product14 {"product_status":0}
15 product15 {"product_status":1}
16 product16 {"product_status":0}
17 product17 {"product_status":1}
18 product18 {"product_status":0}
19 product19 {"product_status":1}
20 product20 {"product_status":1}
21 product21 {"product_status":0}
22 product22 {"product_status":0}
23 product23 {"product_status":0}
24 product24 {"product_status":0}
25 product25 {"product_status":1}
26 product26 {"product_status":1}
27 product27 {"product_status":0}
28 product28 {"product_status":1}
29 product29 {"product_status":0}
30 product30 {"product_status":0}
31 product31 {"product_status":0}
32 product32 {"product_status":0}
33 product33 {"product_status":1}
34 product34 {"product_status":1}
35 product35 {"product_status":0}
36 product36 {"product_status":0}
37 product37 {"product_status":1}
38 product38 {"product_status":0}
39 product39 {"product_status":0}
40 product40 {"product_status":1}
41 product41 {"product_status":1}
42 product42 {"product_status":1}
43 product43 {"product_status":1}
44 product44 {"product_status":0}
45 product45 {"product_status":1}
46 product46 {"product_status":1}
47 product47 {"product_status":0}
48 product48 {"product_status":1}
49 product49 {"product_status":1}
50 product50 {"product_status":1}
51 product51 {"product_status":1}
52 product52 {"product_status":0}
53 product53 {"product_status":0}
54 product54 {"product_status":1}
55 product55 {"product_status":0}
56 product56 {"product_status":0}
57 product57 {"product_status":1}
58 product58 {"product_status":1}
59 product59 {"product_status":1}
60 product60 {"product_status":1}
61 product61 {"product_status":0}
62 product62 {"product_status":1}
63 product63 {"product_status":1}
64 product64 {"product_status":0}
65 product65 {"product_status":0}
66 product66 {"product_status":1}
67 product67 {"product_status":1}
68 product68 {"product_status":0}
69 product69 {"product_status":1}
70 product70 {"product_status":0}
71 product71 {"product_status":0}
72 product72 {"product_status":0}
73 product73 {"product_status":1}
74 product74 {"product_status":0}
75 product75 {"product_status":1}
76 product76 {"product_status":0}
77 product77 {"product_status":0}
78 product78 {"product_status":1}
79 product79 {"product_status":0}
80 product80 {"product_status":0}
81 product81 {"product_status":0}
82 product82 {"product_status":1}
83 product83 {"product_status":1}
84 product84 {"product_status":1}
85 product85 {"product_status":0}
86 product86 {"product_status":1}
87 product87 {"product_status":1}
88 product88 {"product_status":1}
89 product89 {"product_status":1}
90 product90 {"product_status":1}
91 product91 {"product_status":1}
92 product92 {"product_status":0}
93 product93 {"product_status":0}
94 product94 {"product_status":1}
95 product95 {"product_status":0}
96 product96 {"product_status":0}
97 product97 {"product_status":1}
98 product98 {"product_status":1}
99 product99 {"product_status":0}
100 product100 {"product_status":1}
Time taken: 0.141 seconds, Fetched: 100 row(s)
hive>
sqoop成功的将MySQL的city_info表和product_info表的数据导入到hive(包括创建表)
三、在hive里面对表和数据进行操作,最后在进行统计。
3.1 创建用户行为日志表user_click分区表
create table user_click
(
user_id int
,session_id string
,action_time string
,city_id int
,product_id int
)
partitioned by (date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
;
3.2 上传user_click.txt数据文件,并导入数据到表。
上传user_click.txt数据文件到/home/hadoop/data 目录。
[hadoop@hadoop002 data]$ ll
total 62912
-rw-rw-r--. 1 hadoop hadoop 652 Jun 4 16:34 emp.txt
-rw-r--r--. 1 hadoop hadoop 84 Jun 7 09:53 hive_row_number.txt
-rw-rw-r--. 1 hadoop hadoop 34 Jun 11 15:17 hive_wc.txt
-rw-r--r--. 1 hadoop hadoop 63602280 Jun 7 09:54 rating.json
-rw-rw-r--. 1 hadoop hadoop 67 Jun 6 18:30 student.txt
-rwxrwxrwx. 1 hadoop hadoop 725264 Jun 9 21:28 user_click.txt
[hadoop@hadoop002 data]$ pwd
/home/hadoop/data
导入数据到user_click表:
LOAD DATA LOCAL INPATH '/home/hadoop/data/user_click.txt' OVERWRITE INTO TABLE user_click PARTITION(date='2018-06-20');
3.3 在hive创建产品统计分区表product_stat,用于写入统计结果
create table product_stat
(
product_id int
,product_name string
,area string
,click_count int
,rank int
,days string
)
partitioned by (date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
;
3.4 统计各个区域下最热门的TOP3的商品,并写入到统计表product_stat
insert overwrite table product_stat partition(date='2018-06-20')
select
t.product_id
,t.product_name
,t.area
,t.click_count
,t.rank
,'2018-06-20' as days
from
(
select
ci.area as area
,uc.product_id as product_id
,pd.product_name as product_name
,count(uc.product_id) as click_count
,(row_number() over(partition by ci.area order by count(uc.product_id) desc)) as rank
from city_info ci
left join user_click uc on uc.city_id = ci.city_id and uc.date='2018-06-20'
left join product_info pd on pd.product_id = uc.product_id
group by ci.area,uc.product_id,pd.product_name
)t where t.rank<=3
;
查询统计结果
hive> select * from product_stat where date='2018-06-20';
OK
7 product7 CC 39 1 2018-06-20 2018-06-20
26 product26 CC 39 2 2018-06-20 2018-06-20
70 product70 CC 38 3 2018-06-20 2018-06-20
4 product4 EC 40 1 2018-06-20 2018-06-20
96 product96 EC 32 2 2018-06-20 2018-06-20
99 product99 EC 31 3 2018-06-20 2018-06-20
9 product9 NC 16 1 2018-06-20 2018-06-20
40 product40 NC 16 2 2018-06-20 2018-06-20
94 product94 NC 13 3 2018-06-20 2018-06-20
NULL NULL NE 0 1 2018-06-20 2018-06-20
67 product67 NW 20 1 2018-06-20 2018-06-20
56 product56 NW 20 2 2018-06-20 2018-06-20
48 product48 NW 19 3 2018-06-20 2018-06-20
38 product38 SC 35 1 2018-06-20 2018-06-20
88 product88 SC 34 2 2018-06-20 2018-06-20
33 product33 SC 34 3 2018-06-20 2018-06-20
16 product16 SW 20 1 2018-06-20 2018-06-20
95 product95 SW 19 2 2018-06-20 2018-06-20
60 product60 SW 19 3 2018-06-20 2018-06-20
Time taken: 0.345 seconds, Fetched: 19 row(s)
hive>
四、使用sqoop将hive的统计表product_stat数据导出到MySQL的统计表product_stat中:
./sqoop export \
--connect jdbc:mysql://localhost:33066/ruozedata \
--username root \
--password root \
--table product_stat \
--export-dir /ruozedata_03/product_stat/date=2018-06-20 \
--input-fields-terminated-by '\t' \
--input-null-string '' --input-null-non-string 0 \
--columns "product_id,product_name,area,click_count,rank,days" \
--update-key product_id,area --update-mode allowinsert \
;
去MySQL里面查看统计数据是否有成功导出:
mysql> select * from product_stat where days='2018-06-20' order by area,rank;
+------------+--------------+------+-------------+------+------------+
| product_id | product_name | area | click_count | rank | days |
+------------+--------------+------+-------------+------+------------+
| 7 | product7 | CC | 39 | 1 | 2018-06-20 |
| 26 | product26 | CC | 39 | 2 | 2018-06-20 |
| 70 | product70 | CC | 38 | 3 | 2018-06-20 |
| 4 | product4 | EC | 40 | 1 | 2018-06-20 |
| 96 | product96 | EC | 32 | 2 | 2018-06-20 |
| 99 | product99 | EC | 31 | 3 | 2018-06-20 |
| 9 | product9 | NC | 16 | 1 | 2018-06-20 |
| 40 | product40 | NC | 16 | 2 | 2018-06-20 |
| 94 | product94 | NC | 13 | 3 | 2018-06-20 |
| 67 | product67 | NW | 20 | 1 | 2018-06-20 |
| 56 | product56 | NW | 20 | 2 | 2018-06-20 |
| 48 | product48 | NW | 19 | 3 | 2018-06-20 |
| 38 | product38 | SC | 35 | 1 | 2018-06-20 |
| 88 | product88 | SC | 34 | 2 | 2018-06-20 |
| 33 | product33 | SC | 34 | 3 | 2018-06-20 |
| 16 | product16 | SW | 20 | 1 | 2018-06-20 |
| 95 | product95 | SW | 19 | 2 | 2018-06-20 |
| 60 | product60 | SW | 19 | 3 | 2018-06-20 |
+------------+--------------+------+-------------+------+------------+
18 rows in set (0.00 sec)
看完了这篇文章,相信你对"HIVE如何统计各个区域下最热门的TOP3的商品"有了一定的了解,如果想了解更多相关知识,欢迎关注行业资讯频道,感谢各位的阅读!