sqoop相关整理记录
生产背景:
在从mysql导入到hive中,遇到如下问题:
从hive导出到mysql中,遇到如下问题:
sqoop 缺点:
1 基于命令行的操作方式,易出错,且不安全。
2 数据传输和数据格式是紧耦合的,这使得connector无法支持所有的数据格式
3 用户名和密码暴漏出来
4 sqoop安装需要root权限
Sqoop优点:
1 高效可控的利用资源,任务并行度,超时时间。
2 数据类型映射与转化,可自动进行,用户也可自定义 .
3 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 。
Sqoop原理:
Sqoop的inport原理:
Sqoop的export原理:
验证sqoop的各种报错:
1 mysql字段太短
2 hive的空字段转换
3 分隔符错误
4 mysql的网络不在集群网络中
5 mysql停止服务
6 mysql utf8编码只是3个字节,可能是因为某些unicode字符转成utf8之后变成了4个字节,需要mysql支持utf8mb4
7 Sqoop调式信息
8 修改生成的Java类,重新打包。
Sqoop命令行说明
生产背景:
在从mysql导入到hive中,遇到如下问题:
1) 源mysql和集群机器不在同一个网段中,导致执行导入命令,网络连接失败。
2) 某些字符导入到hive中,出现报错终止。
2.1 sqoop使用的JDBC-connector 版本太低(更换版本)。
从hive导出到mysql中,遇到如下问题:
1)某些字符插入mysql,出现报错终止。
1.1 可能mysql本身编码的限制,某些字符不支持,比如uft8和utf8mb4
1.2 sqoop使用的JDBC-connector 版本太低(更换版本)。
sqoop 缺点:
1 基于命令行的操作方式,易出错,且不安全。
2 数据传输和数据格式是紧耦合的,这使得connector无法支持所有的数据格式
3 用户名和密码暴漏出来
4 sqoop安装需要root权限
Sqoop优点:
1 高效可控的利用资源,任务并行度,超时时间。
2 数据类型映射与转化,可自动进行,用户也可自定义 .
3 支持多种主流数据库,MySQL,Oracle,SQL Server,DB2等等 。
Sqoop原理:
Sqoop的inport原理:
Sqoop在import时,需要制定split-by参数。Sqoop根据不同的split-by参数值来进行切分,然后将切分出来的区域分配到不同map中。每个map中再处理数据库中获取的一行一行的值,写入到HDFS中。同时split-by根据不同的参数类型有不同的切分方法,如比较简单的int型,Sqoop会取最大和最小split-by字段值,然后根据传入的num-mappers来确定划分几个区域。 比如select max(split_by),min(split-by) from得到的max(split-by)和min(split-by)分别为1000和1,而num-mappers为2的话,则会分成两个区域(1,500)和(501-100),同时也会分成2个sql给2个map去进行导入操作,分别为select XXX from table where split-by>=1 and split-by<500和select XXX from table where split-by>=501 and split-by<=1000。最后每个map各自获取各自SQL中的数据进行导入工作。
Sqoop的export原理:根据mysql表名称,生成一个以表名称命名的Java类,该类继承了sqoopRecord的,是一个只有Map的MR,且自定义了输出字段。
sqoop export --connect jdbc:mysql://$url:3306/$3?characterEncoding=utf8 --username $username --password $password --table $1 --export-dir $2 --input-fields-terminated-by '|' --null-non-string '0' --null-string '0';
验证sqoop的各种报错:
mysql表
create table dm_trlog (
plat varchar(20),
user_id varchar(20),
click_time varchar(20),
click_url varchar(200)
)
hive 表
CREATE TABLE TRLOG
(
PLATFORM string,
USER_ID int,
CLICK_TIME string,
CLICK_URL string
)
row format delimited
fields terminated by '\t';
sqoop list-databases -connect jdbc:mysql://192.168.119.129:3306/ -username li72 -password 123
1 mysql字段太短
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
Warning: $HADOOP_HOME is deprecated.
14/11/06 01:42:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 01:42:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 01:42:32 INFO tool.CodeGenTool: Beginning code generation
14/11/06 01:42:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:42:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:42:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/bigdata/hadoop
Note: /tmp/sqoop-root/compile/d5e37c20a9231b3253c97fc27d16d8a9/dm_trlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/06 01:42:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/d5e37c20a9231b3253c97fc27d16d8a9/dm_trlog.jar
14/11/06 01:42:38 INFO mapreduce.ExportJobBase: Beginning export of dm_trlog
14/11/06 01:42:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:42:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:42:43 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/06 01:42:43 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/06 01:42:44 INFO mapred.JobClient: Running job: job_201411060114_0001
14/11/06 01:42:45 INFO mapred.JobClient: map 0% reduce 0%
14/11/06 01:43:15 INFO mapred.JobClient: Task Id : attempt_201411060114_0001_m_000000_0, Status : FAILED
java.io.IOException: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'click_time' at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:192)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'click_time' at row 1
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4118)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4052)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2794)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
mysql字段太短了
drop table dm_trlog;
create table dm_trlog (
plat varchar(20),
user_id varchar(20),
click_time varchar(200),
click_url varchar(200)
)
2 hive的空字段转换
由于Hive的NULL用\N来表示,字段用\001来分割,换行用\n来换行,导出分隔符一定要和hive表保持一致,如果为空可以指定转换为0,有些mysql数字字段不能插入\N
加上两个参数:--input-null-string '\\N' --input-null-non-string '\\N',多加一个'\',是为转义
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
14/10/23 04:53:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/915d24128af59c9e517580e0f07411d4/dm_pc_play_kpi.jar
14/10/23 04:53:47 INFO mapreduce.ExportJobBase: Beginning export of dm_pc_play_kpi
14/10/23 04:53:48 INFO input.FileInputFormat: Total input paths to process : 1
14/10/23 04:53:48 INFO input.FileInputFormat: Total input paths to process : 1
14/10/23 04:53:48 WARN snappy.LoadSnappy: Snappy native library is available
14/10/23 04:53:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/10/23 04:53:48 INFO snappy.LoadSnappy: Snappy native library loaded
14/10/23 04:53:48 INFO mapred.JobClient: Running job: job_201408301703_84117
14/10/23 04:53:49 INFO mapred.JobClient: map 0% reduce 0%
14/10/23 04:55:45 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NumberFormatException: For input string: "N"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
at java.lang.Float.valueOf(Float.java:417)
at dm_pc_play_kpi.__loadFromFields(dm_pc_play_kpi.java:335)
at dm_pc_play_kpi.parse(dm_pc_play_kpi.java:282)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
14/10/23 04:55:53 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_1, Status : FAILED
14/10/23 04:55:58 INFO mapred.JobClient: Task Id : attempt_201408301703_84117_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NumberFormatException: For input string: "N"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
at java.lang.Float.valueOf(Float.java:417)
at dm_pc_play_kpi.__loadFromFields(dm_pc_play_kpi.java:335)
at dm_pc_play_kpi.parse(dm_pc_play_kpi.java:282)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
3 分隔符错误
Hive中的分隔符是'\t' 但是导出写成'|'
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '|' --null-non-string '0' --null-string '0';
为了测试,特意把分隔符改成 "|"
14/11/06 01:50:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 01:50:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 01:50:19 INFO tool.CodeGenTool: Beginning code generation
14/11/06 01:50:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:50:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_trlog` AS t LIMIT 1
14/11/06 01:50:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/bigdata/hadoop
Note: /tmp/sqoop-root/compile/e474b3f8292f91dd4134b302ae35df19/dm_trlog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/06 01:50:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e474b3f8292f91dd4134b302ae35df19/dm_trlog.jar
14/11/06 01:50:25 INFO mapreduce.ExportJobBase: Beginning export of dm_trlog
14/11/06 01:50:45 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:50:45 INFO input.FileInputFormat: Total input paths to process : 1
14/11/06 01:50:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/06 01:50:45 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/06 01:50:51 INFO mapred.JobClient: Running job: job_201411060114_0003
14/11/06 01:50:52 INFO mapred.JobClient: map 0% reduce 0%
14/11/06 01:51:20 INFO mapred.JobClient: Task Id : attempt_201411060114_0003_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at dm_trlog.__loadFromFields(dm_trlog.java:252)
at dm_trlog.parse(dm_trlog.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
14/11/06 01:51:21 INFO mapred.JobClient: Task Id : attempt_201411060114_0003_m_000001_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at dm_trlog.__loadFromFields(dm_trlog.java:252)
at dm_trlog.parse(dm_trlog.java:201)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
4 mysql的网络不在集群网络中
sqoop export --connect jdbc:mysql://192.168.119.1:3306/student?characterEncoding=utf8 --username li72 --password 123 --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
Warning: $HADOOP_HOME is deprecated.
14/11/06 02:04:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 02:04:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 02:04:30 INFO tool.CodeGenTool: Beginning code generation
14/11/06 02:07:40 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.
at com.mysql.jdbc.JDBC4Connection.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:605)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:628)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:235)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:219)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:283)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1255)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1072)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.
at java.net.Socket.
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.
... 32 more
5 mysql停止服务
14/11/06 04:55:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
14/11/06 04:55:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory: com.cloudera.sqoop.manager.DefaultManagerFactory
14/11/06 04:55:25 DEBUG manager.DefaultManagerFactory: Trying with scheme: jdbc:mysql:
14/11/06 04:55:25 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/06 04:55:25 DEBUG sqoop.ConnFactory: Instantiated ConnManager org.apache.sqoop.manager.MySQLManager@8a0d5d
14/11/06 04:55:25 INFO tool.CodeGenTool: Beginning code generation
14/11/06 04:55:26 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/11/06 04:55:27 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.
at com.mysql.jdbc.JDBC4Connection.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:605)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:628)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:235)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:219)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:283)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1255)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1072)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.
at java.net.Socket.
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.
... 32 more
14/11/06 04:55:27 DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
14/11/06 04:55:27 ERROR manager.CatalogQueryManager: Failed to list columns from query: SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = (SELECT SCHEMA()) AND TABLE_NAME = 'dm_trlog'
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2461)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2498)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2283)
at com.mysql.jdbc.ConnectionImpl.
at com.mysql.jdbc.JDBC4Connection.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:404)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:317)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:745)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.CatalogQueryManager.getColumnNames(CatalogQueryManager.java:147)
at org.apache.sqoop.orm.ClassWriter.getColumnNames(ClassWriter.java:1222)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1074)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.
at java.net.Socket.
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:259)
at com.mysql.jdbc.MysqlIO.
... 28 more
14/11/06 04:55:27 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
6 mysql utf8编码只是3个字节,可能是因为某些unicode字符转成utf8之后变成了4个字节,需要mysql支持utf8mb4
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/11/09 07:00:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/09 07:00:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/11/09 07:00:33 INFO tool.CodeGenTool: Beginning code generation
14/11/09 07:00:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_go_snger` AS t LIMIT 1
14/11/09 07:00:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dm_go_snger` AS t LIMIT 1
14/11/09 07:00:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop/hadoop-1.1.2
Note: /tmp/sqoop-hadoop/compile/211b9679d4ac771d3ff710cbbd1c7277/dm_go_snger.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/09 07:00:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/211b9679d4ac771d3ff710cbbd1c7277/dm_go_snger.jar
14/11/09 07:00:35 INFO mapreduce.ExportJobBase: Beginning export of dm_go_snger
14/11/09 07:00:36 INFO input.FileInputFormat: Total input paths to process : 1
14/11/09 07:00:36 INFO input.FileInputFormat: Total input paths to process : 1
14/11/09 07:00:36 WARN snappy.LoadSnappy: Snappy native library is available
14/11/09 07:00:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/11/09 07:00:36 INFO snappy.LoadSnappy: Snappy native library loaded
14/11/09 07:00:37 INFO mapred.JobClient: Running job: job_201408301703_121362
14/11/09 07:00:38 INFO mapred.JobClient: map 0% reduce 0%
14/11/09 07:00:52 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
14/11/09 07:00:59 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
14/11/09 07:01:07 INFO mapred.JobClient: Task Id : attempt_201408301703_121362_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:220)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.sql.SQLException: Incorrect string value: 'xF3x90x8Cx92xEFxBF...' for column 'singer' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1072)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3563)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3495)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2113)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2693)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2102)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1364)
at org.apache.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)
14/11/09 07:01:20 INFO mapred.JobClient: Job complete: job_201408301703_121362
14/11/09 07:01:20 INFO mapred.JobClient: Counters: 8
14/11/09 07:01:20 INFO mapred.JobClient: Job Counters
14/11/09 07:01:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31500
14/11/09 07:01:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/11/09 07:01:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/11/09 07:01:20 INFO mapred.JobClient: Rack-local map tasks=3
14/11/09 07:01:20 INFO mapred.JobClient: Launched map tasks=4
14/11/09 07:01:20 INFO mapred.JobClient: Data-local map tasks=1
14/11/09 07:01:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/11/09 07:01:20 INFO mapred.JobClient: Failed map tasks=1
14/11/09 07:01:20 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 44.4052 seconds (0 bytes/sec)
14/11/09 07:01:20 INFO mapreduce.ExportJobBase: Exported 0 records.
14/11/09 07:01:20 ERROR tool.ExportTool: Error during export: Export job failed!
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/hadoop/hadoop/hive-0.10.0.20140629/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201411090701_1423933997.txt
7 Sqoop调式信息
增加关键字--verbose
sqoop export --connect jdbc:mysql://192.168.119.129:3306/student?characterEncoding=utf8 --username li72 --password 123 --verbose --table dm_trlog --export-dir /home/bigdata/hive/data/db1.db/trlog --input-fields-terminated-by '\t' --null-non-string '0' --null-string '0';
8 修改生成的Java类,重新打包。
每次通过sqoop导入MySql的时,都会生成一个以MySql表命名的.java文件,然后打成JAR包,给sqoop提交给hadoop 的MR来解析Hive表中的数据。那可以根据报的错误,找到对应的行,改写该文件,编译,重新打包,sqoop可以通过 -jar-file ,--class-name 组合让我们指定运行自己的jar包中的某个class。来解析该hive表中的每行数据。脚本如下:一个完整的例子如下:
./bin/sqoop export --connect "jdbc:mysql://192.168.119.129:3306/student?useUnicode=true&characterEncoding=utf-8"
--username li72 --password 123 --table dm_trlog
--export-dir /hive/warehouse/trlog --input-fields-terminated-by '\t'
--input-null-string '\\N' --input-null-non-string '\\N'
--class-name com.li72.trlog
--jar-file /tmp/sqoopTempjar/trlog.jar
上面--jar-file 参数指定jar包的路径。--class-name 指定jar包中的class。
这样就可以解决所有解析异常了。
Sqoop命令行说明
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256 | Common arguments: --connect < jdbc-uri > Specify JDBC connect string --connection-manager < class-name > Specify connection manager class name --connection-param-file < properties-file > Specify connection parameters file --driver < class-name > Manually specify JDBC driver class to use --hadoop-home < hdir > Override $HADOOP_MAPRED_HOME_ARG --hadoop-mapred-home < dir > Override $HADOOP_MAPRED_HOME_ARG --help Print usage instructions -P Read password from console --password < password > Set authentication password --password-file < password-file > Set authentication password file path --relaxed-isolation Use read-uncommitted isolation for imports --skip-dist-cache Skip copying jars to distributed cache --username < username > Set authentication username --verbose Print more information while working Import control arguments: --append Imports data in append mode --as-avrodatafile Imports data to Avro data files --as-sequencefile Imports data to SequenceFile s --as-textfile Imports data as plain text (default) --boundary-query < statement > Set boundary query for retrieving max and min value of the primary key --columns < col ,col,col...> Columns to import from table --compression-codec < codec > Compression codec to use for import --delete-target-dir Imports data in delete mode --direct Use direct import fast path --direct-split-size < n > Split the input stream every 'n' bytes when importing in direct mode -e,--query < statement > Import results of SQL 'statement' --fetch-size < n > Set number 'n' of rows to fetch from the database when more rows are needed --inline-lob-limit < n > Set the maximum size for an inline LOB -m,--num-mappers < n > Use 'n' map tasks to import in parallel --mapreduce-job-name < name > Set name for generated mapreduce job --split-by < column-name > Column of the table used to split work units --table < table-name > Table to read --target-dir < dir > HDFS plain table destination --validate Validate the copy using the configured validator --validation-failurehandler < validation-failurehandler > Fully qualified class name for ValidationFa ilureHandler --validation-threshold < validation-threshold > Fully qualified class name for ValidationTh reshold --validator < validator > Fully qualified class name for the Validator --warehouse-dir < dir > HDFS parent for table destination --where < where clause> WHERE clause to use during import -z,--compress Enable compression Incremental import arguments: --check-column < column > Source column to check for incremental change --incremental < import-type > Define an incremental import of type 'append' or 'lastmodified' --last-value < value > Last imported value in the incremental check column Output line formatting arguments: --enclosed-by < char > Sets a required field enclosing character --escaped-by < char > Sets the escape character --fields-terminated-by < char > Sets the field separator character --lines-terminated-by < char > Sets the end-of-line character --mysql-delimiters Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' --optionally-enclosed-by < char > Sets a field enclosing character Input parsing arguments: --input-enclosed-by < char > Sets a required field encloser --input-escaped-by < char > Sets the input escape character --input-fields-terminated-by < char > Sets the input field separator --input-lines-terminated-by < char > Sets the input end-of-line char --input-optionally-enclosed-by < char > Sets a field enclosing character Hive arguments: --create-hive-table Fail if the target hive table exists --hive-database < database-name > Sets the database name to use when importing to hive --hive-delims-replacement < arg > Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string --hive-drop-import-delims Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields --hive-home < dir > Override $HIVE_HOME --hive-import Import tables into Hive (Uses Hive's default delimiters if none are set.) --hive-overwrite Overwrite existing data in the Hive table --hive-partition-key < partition-key > Sets the partition key to use when importing to hive --hive-partition-value < partition-value > Sets the partition value to use when importing to hive --hive-table < table-name > Sets the table name to use when importing to hive --map-column-hive < arg > Override mapping for specific column to hive types. HBase arguments: --column-family < family > Sets the target column family for the import --hbase-bulkload Enables HBase bulk loading --hbase-create-table If specified, create missing HBase tables --hbase-row-key < col > Specifies which input column to use as the row key --hbase-table < table > Import to < table > in HBase HCatalog arguments: --hcatalog-database < arg > HCatalog database name --hcatalog-home < hdir > Override $HCAT_HOME --hcatalog-table < arg > HCatalog table name --hive-home < dir > Override $HIVE_HOME --hive-partition-key < partition-key > Sets the partition key to use when importing to hive --hive-partition-value < partition-value > Sets the partition value to use when importing to hive --map-column-hive < arg > Override mapping for specific column to hive types. HCatalog import specific options: --create-hcatalog-table Create HCatalog before import --hcatalog-storage-stanza < arg > HCatalog storage stanza for table creation Accumulo arguments: --accumulo-batch-size < size > Batch size in bytes --accumulo-column-family < family > Sets the target column family for the import --accumulo-create-table If specified, create missing Accumulo tables --accumulo-instance < instance > Accumulo instance name. --accumulo-max-latency < latency > Max write latency in milliseconds --accumulo-password < password > Accumulo password. --accumulo-row-key < col > Specifies which input column to use as the row key --accumulo-table < table > Import to < table > in Accumulo --accumulo-user < user > Accumulo user name. --accumulo-visibility < vis > Visibility token to be applied to all rows imported --accumulo-zookeepers < zookeepers > Comma-separated list of zookeepers (host:port) Code generation arguments: --bindir < dir > Output directory for compiled objects --class-name < name > Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. --input-null-non-string < null-str > Input null non-string representation --input-null-string < null-str > Input null string representation --jar-file < file > Disable code generation; use specified jar --map-column-java < arg > Override mapping for specific columns to java types --null-non-string < null-str > Null non-string representation --null-string < null-str > Null string representation --outdir < dir > Output directory for generated code --package-name < name > Put auto-generated classes in this package |