千家信息网

Sqoop1.x的安装配置方法

发表于:2025-01-23 作者:千家信息网编辑
千家信息网最后更新 2025年01月23日,这篇文章主要介绍"Sqoop1.x的安装配置方法",在日常操作中,相信很多人在Sqoop1.x的安装配置方法问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答"Sqoop1.
千家信息网最后更新 2025年01月23日Sqoop1.x的安装配置方法

这篇文章主要介绍"Sqoop1.x的安装配置方法",在日常操作中,相信很多人在Sqoop1.x的安装配置方法问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答"Sqoop1.x的安装配置方法"的疑惑有所帮助!接下来,请跟着小编一起来学习吧!

一、安装hadoop

hadoop:

sqoop2.x:

http://my.oschina.net/u/204498/blog/518941

二、安装sqoop1.x

1.选择对应的版本

[hadoop@hftclclw0001 ~]$ pwd/home/hadoop[hadoop@hftclclw0001 ~]$ wget  [hadoop@hftclclw0001 ~]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz[hadoop@hftclclw0001 ~]$ cd sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf[hadoop@hftclclw0001 conf]$ ls -altotal 44drwx------ 2 hadoop root 4096 Nov 25 04:32 .drwx------ 9 hadoop root 4096 Nov 25 04:20 ..-rw------- 1 hadoop root  818 Apr 27  2015 .gitignore-rw------- 1 hadoop root 3895 Apr 27  2015 oraoop-site-template.xml-rw------- 1 hadoop root 1404 Apr 27  2015 sqoop-env-template.cmd-rwx------ 1 hadoop root 1345 Apr 27  2015 sqoop-env-template.sh-rw------- 1 hadoop root 5531 Apr 27  2015 sqoop-site-template.xml-rw------- 1 hadoop root 5531 Apr 27  2015 sqoop-site.xml[hadoop@hftclclw0001 conf]$ cp  sqoop-env-template.sh  sqoop-env.sh[hadoop@hftclclw0001 conf]$ vim sqoop-env.shexport HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.1#Set path to where hadoop-*-core.jar is availableexport HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce#set the path to where bin/hbase is available           => 可以不用,当使用到HBASE时再配置#export HBASE_HOME=/home/hadoop/hbase-1.0.1.1#Set the path to where bin/hive is available            => 可以不使用,当使用的HIVE时再配置#export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin#Set the path for where zookeper config dir is#export ZOOCFGDIR=export JAVA_HOME=/usr/java/latest                       => 要安装JDK,之前安装的JRE,使用时会有问题

2.添加对应的jdbc 驱动,我使用的是mysql

[hadoop@hftclclw0001 lib]$ pwd/home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib[hadoop@hftclclw0001 lib]$ ls -al | grep mysql-rw------- 1 hadoop root  848401 Nov  3 06:41 mysql-connector-java-5.1.25-bin.jar

三、Sqoop 1.x 语法

1.安装mysql(配置相应的repo)

[root@hftclclw0001 opt] yum install mysql-server mysql mysql-client

2.启动并测试,并给root用户添加密码

[root@hftclclw0001 opt] service mysqld start[root@hftclclw0001 opt]# netstat -apn|grep 3306tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      24540/mysqld [root@hftclclw0001 opt]# mysql -u root -pEnter password: mysql>

3.准备测试数据

我参考的是Apache Sqoop Cookbook 使用的mysql的

https://github.com/jarcec/Apache-Sqoop-Cookbook

使用github上面的mysql文件,创建sqoop用户,创建sqoop数据库,并新增对应的tables。并给sqoop用户赋予相应的权限mysql> show databases;+--------------------+| Database           |+--------------------+| information_schema || mysql              || performance_schema || sqoop              |+--------------------+4 rows in set (0.00 sec)mysql> use sqoopmysql> show tables;+-----------------+| Tables_in_sqoop |+-----------------+| cities          || countries       || normcities      || staging_cities  || visits          |+-----------------+5 rows in set (0.00 sec)

chapter2 importing data

sqoop list:[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop-list-tables --connect jdbc:mysql://{ip}:{por}/sqoop \> --username sqoop \> --password sqoop......citiescountriesnormcitiesstaging_citiesvisits                        => 这些tables就是之前mysql中的新建的
sqoop import:全表导入(transferring an entire table)[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop import \> --connect jdbc:mysql://{ip}:{port}/sqoop \> --username sqoop \> --password sqoop \> --table cities...                            => 会调用MR,读取mysql,并写入文件中(默认理解是当前用户下,table名称对应的木)总共三条记录,生成了三个文件[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -ls /user/hadoop/cities-rw-r--r--   3 hadoop supergroup          0 2015-11-25 05:29 /user/hadoop/cities/_SUCCESS-rw-r--r--   3 hadoop supergroup         16 2015-11-25 05:29 /user/hadoop/cities/part-m-00000-rw-r--r--   3 hadoop supergroup         22 2015-11-25 05:29 /user/hadoop/cities/part-m-00001-rw-r--r--   3 hadoop supergroup         16 2015-11-25 05:29 /user/hadoop/cities/part-m-00002[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -cat cities/part-m-000001,USA,Palo Alto
sqoop import:指定路径(specifying a target directory)--target-dir   指定路径不能存在(针对单表使用的)[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop import \> --connect jdbc:mysql://{ip}:{port}/sqoop \> --username sqoop \> --password sqoop \> --table cities \> --target-dir /tmp/cities[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -ls /tmp/cities-rw-r--r--   3 hadoop supergroup          0 2015-11-25 05:29 /user/hadoop/cities/_SUCCESS-rw-r--r--   3 hadoop supergroup         16 2015-11-25 05:29 /user/hadoop/cities/part-m-00000-rw-r--r--   3 hadoop supergroup         22 2015-11-25 05:29 /user/hadoop/cities/part-m-00001-rw-r--r--   3 hadoop supergroup         16 2015-11-25 05:29 /user/hadoop/cities/part-m-00002[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -cat cities/part-m-000001,USA,Palo Alto当多表导入是,可以使用--warehouse-dir   会再指定目录下,再生成以table表名称的目录
sqoop import:带where条件的sql,即子集 (importing only a subset of data)mysql> select * from sqoop.cities;+----+----------------+-----------+| id | country        | city      |+----+----------------+-----------+|  1 | USA            | Palo Alto ||  2 | Czech Republic | Brno      ||  3 | USA            | Sunnyvale |+----+----------------+-----------+3 rows in set (0.00 sec)mysql> select * from sqoop.cities where country = 'USA';+----+---------+-----------+| id | country | city      |+----+---------+-----------+|  1 | USA     | Palo Alto ||  3 | USA     | Sunnyvale |+----+---------+-----------+2 rows in set (0.00 sec)[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -rmr /tmp/cities[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop import \> --connect jdbc:mysql://{ip}:{port}/sqoop \> --username sqoop \> --password sqoop \> --table cities \> --where "country = 'USA'" \> --target-dir /tmp/cities
sqoop import:(protecting your password)[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop import \> --connect jdbc:mysql://{ip}:{port}/sqoop \> --username sqoop \> --table cities \> --where "country = 'USA'" \> --target-dir /tmp/cities \> -P                                => 命令行输入>--password-file my-sqoop-password  => 指定密码文件sqoop import:(Using a File Format Other Than CSV)默认生成的是CSV文件,字段间使用tab间隔--as-sequencefile--as-avrodatafile        sqoop import:(Compressing imported data)--compress--compression-codec org.apache.hadoop.io.compress.BZip2Codec            =>指定压缩算法
sqoop import:(speeding up transfers)默认mr的inputformat是通过jdbc的形式读取数据,效率低,可以使用数据库提供的一些工具,如mysql的 mysqldump等--direct

chapter3 Incremental Import

mysql> select * from sqoop.visits;+----+----------+---------------------+| id | city     | last_update_date    |+----+----------+---------------------+|  1 | Freemont | 1983-05-22 01:01:01 ||  2 | Jicin    | 1987-02-02 02:02:02 |+----+----------+---------------------+2 rows in set (0.00 sec)importing only new data表中有个id的主键(int类型的) 我们导入>1的数据--check-column   => 检查那个字段--last-value     => 检查的字段,上次的值是多少,这次会 +1 开始导入[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop import \>--connect jdbc:mysql://{ip}:{port}/sqoop \>--username sqoop \>--password sqoop \>--table visits \>--target-dir /tmp/visits \>--incremental append \        => incremental 模式是append 即追加>--check-column id \           => append模式下, 需要一个递增的主键>--last-value 1                => 会从 id>1开始导入注意这边在执行的时候是输出以下日志,提示下次增量import是last-value 2  (即本次导入的最后一条记录)并提示你最好使用 sqoop job --create 来处理类似的定时增量导入15/11/25 06:05:28 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:15/11/25 06:05:28 INFO tool.ImportTool:  --incremental append15/11/25 06:05:28 INFO tool.ImportTool:   --check-column id15/11/25 06:05:28 INFO tool.ImportTool:   --last-value 215/11/25 06:05:28 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -ls /tmp/visits-rw-r--r--   3 hadoop supergroup         30 2015-11-25 06:05 /tmp/visits/part-m-00000[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ hadoop dfs -cat /tmp/visits/part-m-000002,Jicin,1987-02-02 02:02:02.0incrementally importing mutable data

Sqoop Job:

http://shiyanjun.cn/archives/621.html

我们使用Sqoop1.x是,在rdbms和hadoop/hive进行数据同步时,如果是用了--incremental append模式,我们要记录--last-value.如果每次执行同步脚步时候,都需要从日志中解析出一个--last-value的值,然后重新设置脚步的参数,才能正确的保证同步正确。

[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop job \>--create visits-sync-job \                            => 创建job: job-id(visits-sync-job)>-- \>import \>--connect jdbc:mysql://10.224.243.124:3306/sqoop \>--username sqoop \>--password sqopp \>--table visits \>--incremental append \>--check-column id \>--last-value 1[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop job --list15/11/25 06:40:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6Available jobs:  visits-sync-job  [hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop job --show visits-sync-job15/11/25 06:40:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6Enter password:Job: visits-sync-jobTool: import......incremental.last.value = 1...[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop job --exec visits-sync-jobEnter password:执行job后,我们在show job[hadoop@hftclclw0001 sqoop-1.4.6.bin__hadoop-2.0.4-alpha]$ ./bin/sqoop job --show visits-sync-job15/11/25 06:44:52 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6Enter password: incremental.last.value = 2            => last_value已经被记录下,下次再执行的时候就会读取该值,再执行

chapter 4 Free-Form Query Import

sqoop import:(importing data from two tables)mysql> select * from sqoop.cities;+----+----------------+-----------+| id | country        | city      |+----+----------------+-----------+|  1 | USA            | Palo Alto ||  2 | Czech Republic | Brno      ||  3 | USA            | Sunnyvale |+----+----------------+-----------+3 rows in set (0.00 sec)mysql> select * from sqoop.countries;+------------+----------------+| country_id | country        |+------------+----------------+|          1 | USA            ||          2 | Czech Republic |+------------+----------------+2 rows in set (0.00 sec)

到此,关于"Sqoop1.x的安装配置方法"的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注网站,小编会继续努力为大家带来更多实用的文章!

0