
Flume-1.6.0 Learning Notes (5): Sink to HDFS


Lu Chunli's work notes. Who says programmers can't have a literary flair?



Flume reads data from a specified directory, uses memory as the channel, and then writes the data to HDFS.

Spooling Directory Source (http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source)

Memory Channel (http://flume.apache.org/FlumeUserGuide.html#memory-channel)

HDFS Sink (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink)


Flume configuration file

# vim agent-hdfs.conf
# write data to hdfs
agent.sources = sd-source
agent.channels = mem-channel
agent.sinks = hdfs-sink

# define source
agent.sources.sd-source.type = spooldir
agent.sources.sd-source.spoolDir = /opt/flumeSpool
agent.sources.sd-source.fileHeader = true

# define channel
agent.channels.mem-channel.type = memory

# define sink
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://nnode:8020/flume/webdata

# assemble
agent.sources.sd-source.channels = mem-channel
agent.sinks.hdfs-sink.channel = mem-channel

Note: the /opt/flumeSpool directory must be created in advance; otherwise Flume cannot detect it and reports an error.
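For example, on the agent host:

# create the spooling directory before starting the agent
mkdir -p /opt/flumeSpool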


Start the Agent

[hadoop@nnode flume1.6.0]$ bin/flume-ng agent --conf conf --name agent --conf-file conf/agent-hdfs.conf -Dflume.root.logger=INFO,console


Copy data into the /opt/flumeSpool directory

cp /usr/local/hadoop2.6.0/logs/* /opt/flumeSpool


Flume detects the new data in the directory and automatically writes it to HDFS.


View the flume directory on HDFS

[hadoop@nnode flume1.6.0]$ hdfs dfs -ls -R /flume/
drwxr-xr-x   - hadoop hadoop          0 2015-11-21 16:55 /flume/webdata
-rw-r--r--   2 hadoop hadoop       2568 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836223
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836224
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836225
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836226
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836227
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836228
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836229
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836230
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836231
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836232
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836233
-rw-r--r--   2 hadoop hadoop       2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836234
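The listing shows many small files because the HDFS sink rolls files aggressively by default (hdfs.rollInterval = 30 seconds, hdfs.rollSize = 1024 bytes, hdfs.rollCount = 10 events, per the Flume user guide). A sketch of looser roll thresholds; the values below are illustrative, not from the original setup:

# roll every 5 minutes instead of every 30 seconds (0 would disable time-based rolling)
agent.sinks.hdfs-sink.hdfs.rollInterval = 300
# roll at roughly 128 MB instead of 1024 bytes (0 would disable size-based rolling)
agent.sinks.hdfs-sink.hdfs.rollSize = 134217728
# disable rolling by event count (default is 10 events)
agent.sinks.hdfs-sink.hdfs.rollCount = 0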


View the files
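Although the default SequenceFile format cannot be cat'ed directly (see the note below), hdfs dfs -text can decode it; a minimal example using a file name from the listing above:

hdfs dfs -text /flume/webdata/FlumeData.1448095836223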


Note:

When Flume writes data to HDFS, the default file format (hdfs.fileType) is SequenceFile, which cannot be viewed directly. To store the data as plain text, set hdfs.fileType to DataStream.
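A sketch of the corresponding sink settings; only the DataStream setting comes from the note above, while hdfs.writeFormat = Text is the usual companion setting for plain-text output:

# write plain text files instead of SequenceFiles
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
# serialize event bodies as text rather than Writable objects
agent.sinks.hdfs-sink.hdfs.writeFormat = Text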


View the flumeSpool directory

[root@nnode flumeSpool]# ll
total 3028
-rw-r--r-- 1 root root  227893 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.log.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.out.1.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.out.2.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.out.COMPLETED
-rw-r--r-- 1 root root 1993109 Nov 21 16:50 hadoop-hadoop-namenode-nnode.log.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-namenode-nnode.out.1.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-namenode-nnode.out.2.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-namenode-nnode.out.COMPLETED
-rw-r--r-- 1 root root  169932 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.log.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.out.1.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.out.2.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.out.COMPLETED

Note: by default, Flume does not delete files after processing them; instead, it marks each processed file by appending the .COMPLETED suffix. If the files do not need to be kept after processing, a deletion policy can be set on the source:

deletePolicy (default: never). When to delete completed files: never or immediate.
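For example, to have the source delete each spooled file as soon as it has been fully ingested:

# delete completed files instead of renaming them to *.COMPLETED
agent.sources.sd-source.deletePolicy = immediate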

