How to Analyze Hadoop Logs with Pig
Published: 2025-01-30, by the 千家信息网 editors

This article explains how to use Pig to analyze Hadoop access logs. The walkthrough is short and straightforward; follow the steps below to study it yourself.
Goal
Count the number of hits from each IP address, producing output such as:
123.24.56.57 13
24.53.23.123 7
34.56.78.120 20
and so on.
File to analyze
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_widthauto.css?AZH HTTP/1.1" 200 1024 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_forum_forumdisplay.css?AZH HTTP/1.1" 200 11486 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
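Each entry is in Apache combined log format, and the client IP is simply the first space-delimited field of the line. As an illustration only (not part of the original workflow), a one-line Python sketch extracts it the same way the Pig script's PigStorage(' ') loader does:

```python
# The client IP is the first space-delimited field of each log entry.
line = ('220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] '
        '"GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0"')
ip = line.split(' ', 1)[0]
print(ip)  # 220.181.108.151
```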
Environment configuration
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
export ANT_HOME=/home/wukong/usr/apache-ant-1.9.4
export HADOOP_HOME=/home/wukong/usr/hadoop-1.2.1
export PIG_HOME=/home/wukong/usr/pig-0.13.0
export PIG_CLASSPATH=$HADOOP_HOME/conf
PATH=$PATH:$HOME/bin:$ANT_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin:$PIG_HOME/bin:$PIG_CLASSPATH
export PATH
Pig script
A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
B = GROUP A BY ip;
C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;
STORE C INTO '/user/wukong/w08/access_log.out.txt';
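Conceptually this is a word count over the first field: load each line's first token as ip, group identical IPs, and emit one (ip, count) pair per group. The same logic in plain Python, using a few hypothetical stand-in lines rather than the real HDFS file:

```python
from collections import Counter

# Hypothetical stand-ins for lines of access_log.txt.
log_lines = [
    '1.2.3.4 - - [31/Jan/2012:00:00:01 +0800] "GET / HTTP/1.1" 200 100',
    '1.2.3.4 - - [31/Jan/2012:00:00:02 +0800] "GET /a HTTP/1.1" 200 100',
    '5.6.7.8 - - [31/Jan/2012:00:00:03 +0800] "GET /b HTTP/1.1" 200 100',
]

# GROUP BY ip + COUNT, like the B and C relations in the Pig script.
counts = Counter(line.split(' ', 1)[0] for line in log_lines)
for ip, n in sorted(counts.items()):
    print(ip, n)
```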
Execution
[wukong@bd11 ~]$ pig -x mapreduce
Warning: $HADOOP_HOME is deprecated.
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-28 01:10:51,242 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:29:34
2014-08-28 01:10:51,242 [main] INFO org.apache.pig.Main - Logging error messages to: /home/wukong/pig_1409159451241.log
2014-08-28 01:10:51,319 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/wukong/.pigbootup not found
2014-08-28 01:10:51,698 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://bd11:9000
2014-08-28 01:10:52,343 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: bd11:9001
grunt> ls
hdfs://bd11:9000/user/wukong/test
hdfs://bd11:9000/user/wukong/w05
hdfs://bd11:9000/user/wukong/w06
hdfs://bd11:9000/user/wukong/w07
grunt> mkdir w08
grunt> copyFromLocal ./access_log.txt ./w08/
grunt> ls
hdfs://bd11:9000/user/wukong/test
hdfs://bd11:9000/user/wukong/w05
hdfs://bd11:9000/user/wukong/w06
hdfs://bd11:9000/user/wukong/w07
hdfs://bd11:9000/user/wukong/w08
grunt> cd w08
grunt> ls
hdfs://bd11:9000/user/wukong/w08/access_log.txt 7118627
grunt> A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
grunt> B = GROUP A BY ip;
grunt> C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;
grunt> STORE C INTO '/user/wukong/w08/out';
Execution log
2014-08-28 01:13:56,741 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2014-08-28 01:13:56,875 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-08-28 01:13:57,091 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-08-28 01:13:57,121 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-08-28 01:13:57,178 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-08-28 01:13:57,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-08-28 01:13:57,432 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-08-28 01:13:57,471 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-08-28 01:13:57,479 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-08-28 01:13:57,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=7118627
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-08-28 01:13:57,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4751117514743080762.jar
2014-08-28 01:14:01,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4751117514743080762.jar created
2014-08-28 01:14:01,077 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-08-28 01:14:01,095 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-08-28 01:14:01,095 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-08-28 01:14:01,129 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-08-28 01:14:01,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-08-28 01:14:01,805 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-08-28 01:14:02,067 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-08-28 01:14:02,067 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-08-28 01:14:02,109 [JobControl] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2014-08-28 01:14:02,109 [JobControl] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2014-08-28 01:14:02,114 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408280106_0001
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],C[3,4],B[2,4] C: C[3,4],B[2,4] R: C[3,4]
2014-08-28 01:14:04,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://bd11:50030/jobdetails.jsp?jobid=job_201408280106_0001
2014-08-28 01:14:18,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-08-28 01:14:18,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:30,058 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:39,202 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-08-28 01:14:39,210 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.1 0.13.0 wukong 2014-08-28 01:13:57 2014-08-28 01:14:39 GROUP_BY

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201408280106_0001 1 1 6 6 6 6 11 11 11 11 A,B,C GROUP_BY,COMBINER /user/wukong/w08/access_log.out.txt,

Input(s):
Successfully read 28134 records (7118993 bytes) from: "/user/wukong/w08/access_log.txt"

Output(s):
Successfully stored 476 records (8051 bytes) in: "/user/wukong/w08/out"

Counters:
Total records written : 476
Total bytes written : 8051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201408280106_0001

2014-08-28 01:14:39,227 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
Viewing the results
[wukong@bd11 ~]$ hadoop fs -cat ./w08/out/part-r-00000
Warning: $HADOOP_HOME is deprecated.
127.0.0.1 2
1.59.65.67 2
112.4.2.19 9
112.4.2.51 80
60.2.99.33 42
(... omitted ...)
221.194.180.166 4576
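The part-r-00000 output is not ordered by hit count, so finding the busiest clients means sorting the (ip, count) pairs yourself. A small Python sketch using a few of the values shown above:

```python
# (ip, count) pairs as read from the part-r-00000 output above.
pairs = [("127.0.0.1", 2), ("112.4.2.51", 80),
         ("60.2.99.33", 42), ("221.194.180.166", 4576)]

# Sort highest hit counts first.
busiest = sorted(pairs, key=lambda p: p[1], reverse=True)
print(busiest[0])  # ('221.194.180.166', 4576)
```

Pig can also do this sort on the cluster side with an ORDER BY on the count column before the STORE step.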
Thanks for reading. That covers how to analyze Hadoop logs with Pig; after working through this article you should have a clearer picture of the process, though the specifics still need to be verified in your own environment.