Qianjia Information Network (千家信息网)

How to Use Pig to Analyze Hadoop Logs

Published: 2025-01-30  Author: Qianjia Information Network editor
Last updated: January 30, 2025

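This article walks step by step through using Pig to analyze a log stored in Hadoop: a web server access log is copied into HDFS, and a short Pig Latin script counts the requests made by each client IP address.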

Goal

Count the number of hits (log entries) from each IP address, producing output such as:

123.24.56.57 13
24.53.23.123 7
34.56.78.120 20
and so on.

Log file to analyze

220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_widthauto.css?AZH HTTP/1.1" 200 1024 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_forum_forumdisplay.css?AZH HTTP/1.1" 200 11486 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
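These lines follow the Apache "combined" log format: host, ident, user, [timestamp], "request", status, bytes, "referer", "user agent". The client IP is simply the first space-separated field, which is why a plain split on spaces is enough for this task; the quoted fields contain spaces of their own, so a split would not recover them. As a local sanity check (a sketch, not part of the original workflow), the Python snippet below parses one line with a regular expression. The names `COMBINED_RE`, `LogEntry` and `parse_line` are hypothetical, and the pattern assumes no escaped quotes inside the quoted fields.

```python
import re
from typing import NamedTuple, Optional

# Apache "combined" format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


class LogEntry(NamedTuple):
    ip: str
    time: str
    request: str
    status: int
    size: int  # a "-" (no response body) is mapped to 0
    referer: str
    agent: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one combined-format line; return None if it does not match."""
    m = COMBINED_RE.match(line.strip())
    if m is None:
        return None
    size = m["size"]
    return LogEntry(
        ip=m["ip"],
        time=m["time"],
        request=m["request"],
        status=int(m["status"]),
        size=0 if size == "-" else int(size),
        referer=m["referer"],
        agent=m["agent"],
    )


# The second sample line from the log above
sample = ('208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] '
          '"GET /robots.txt HTTP/1.1" 200 582 "-" '
          '"Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"')
entry = parse_line(sample)
print(entry.ip, entry.status, entry.size)
# Splitting on single spaces yields the same IP as field 0
print(sample.split(" ")[0])
```

Only the IP is needed for counting, so the Pig script can stay with the cheaper space split rather than a full regex parse.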

Environment configuration

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
export ANT_HOME=/home/wukong/usr/apache-ant-1.9.4
export HADOOP_HOME=/home/wukong/usr/hadoop-1.2.1
export PIG_HOME=/home/wukong/usr/pig-0.13.0
export PIG_CLASSPATH=$HADOOP_HOME/conf

PATH=$PATH:$HOME/bin:$ANT_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin:$PIG_HOME/bin:$PIG_CLASSPATH

export PATH

Pig script

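-- Split each line on single spaces and keep only the first field, the client IP
A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
-- Collect all records that share the same IP into one group
B = GROUP A BY ip;
-- For each IP, emit the IP and the number of records in its group
C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;
-- Write the result to HDFS (same output directory as in the session below)
STORE C INTO '/user/wukong/w08/out';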
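To make the dataflow concrete, here is a small in-memory Python model of the LOAD, GROUP and FOREACH statements. It is a sketch under stated assumptions, and the helper names `load`, `group_by_ip` and `foreach_generate` are hypothetical: it reproduces only the first-field split that PigStorage(' ') performs here, and it models an empty first field as null. Pig's documentation states that COUNT ignores null values (COUNT_STAR counts them), and the sketch mirrors that.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[Optional[str]]  # a one-field tuple (ip,)


def load(lines: Iterable[str]) -> List[Row]:
    """A = LOAD ... USING PigStorage(' ') AS (ip);  keep only field 0."""
    rows = []
    for line in lines:
        first = line.rstrip("\n").split(" ")[0]
        # Assumption: an empty first field is modelled as null (None)
        rows.append((first if first else None,))
    return rows


def group_by_ip(a: List[Row]) -> Dict[Optional[str], List[Row]]:
    """B = GROUP A BY ip;  one (group, bag of A tuples) pair per key."""
    groups: Dict[Optional[str], List[Row]] = defaultdict(list)
    for row in a:
        groups[row[0]].append(row)
    return dict(groups)


def foreach_generate(b: Dict[Optional[str], List[Row]]) -> List[Tuple[Optional[str], int]]:
    """C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;"""
    # COUNT skips tuples whose field is null (COUNT_STAR would not)
    return [(group, sum(1 for (ip,) in bag if ip is not None))
            for group, bag in b.items()]


# IPs of the six sample log lines; stand-in lines keep only the leading fields
sample_ips = ["220.181.108.151", "208.115.113.82", "220.181.94.221",
              "112.97.24.243", "112.97.24.243", "112.97.24.243"]
lines = [ip + " - -" for ip in sample_ips]

result = dict(foreach_generate(group_by_ip(load(lines))))
for ip, countip in sorted(result.items()):
    print(ip, countip)
```

On the six sample lines this reports 3 hits for 112.97.24.243 and 1 for each of the other three addresses, which is the shape of output the real job writes to /user/wukong/w08/out.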

Execution

[wukong@bd11 ~]$ pig -x mapreduce
Warning: $HADOOP_HOME is deprecated.
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-28 01:10:51,242 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:29:34
2014-08-28 01:10:51,242 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/wukong/pig_1409159451241.log
2014-08-28 01:10:51,319 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/wukong/.pigbootup not found
2014-08-28 01:10:51,698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://bd11:9000
2014-08-28 01:10:52,343 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: bd11:9001
grunt> ls
hdfs://bd11:9000/user/wukong/test
hdfs://bd11:9000/user/wukong/w05
hdfs://bd11:9000/user/wukong/w06
hdfs://bd11:9000/user/wukong/w07
grunt> mkdir w08
grunt> copyFromLocal ./access_log.txt ./w08/
grunt> ls
hdfs://bd11:9000/user/wukong/test
hdfs://bd11:9000/user/wukong/w05
hdfs://bd11:9000/user/wukong/w06
hdfs://bd11:9000/user/wukong/w07
hdfs://bd11:9000/user/wukong/w08
grunt> cd w08
grunt> ls
hdfs://bd11:9000/user/wukong/w08/access_log.txt    7118627
grunt> A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
grunt> B = GROUP A BY ip;
grunt> C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip;
grunt> STORE C INTO '/user/wukong/w08/out';

Execution log

2014-08-28 01:13:56,741 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2014-08-28 01:13:56,875 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-08-28 01:13:57,091 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-08-28 01:13:57,121 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-08-28 01:13:57,178 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-08-28 01:13:57,179 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-08-28 01:13:57,432 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-08-28 01:13:57,471 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-08-28 01:13:57,479 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-08-28 01:13:57,480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=7118627
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4751117514743080762.jar
2014-08-28 01:14:01,054 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4751117514743080762.jar created
2014-08-28 01:14:01,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-08-28 01:14:01,095 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-08-28 01:14:01,095 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-08-28 01:14:01,129 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-08-28 01:14:01,304 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-08-28 01:14:01,805 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-08-28 01:14:02,067 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-08-28 01:14:02,067 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-08-28 01:14:02,109 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2014-08-28 01:14:02,109 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2014-08-28 01:14:02,114 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408280106_0001
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],C[3,4],B[2,4] C: C[3,4],B[2,4] R: C[3,4]
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://bd11:50030/jobdetails.jsp?jobid=job_201408280106_0001
2014-08-28 01:14:18,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-08-28 01:14:18,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:30,058 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:39,202 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-08-28 01:14:39,210 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.1   0.13.0  wukong  2014-08-28 01:13:57     2014-08-28 01:14:39     GROUP_BY

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime       Alias    Feature Outputs
job_201408280106_0001   1       1       6       6       6       6       11     11       11      11      A,B,C   GROUP_BY,COMBINER       /user/wukong/w08/access_log.out.txt,

Input(s):
Successfully read 28134 records (7118993 bytes) from: "/user/wukong/w08/access_log.txt"

Output(s):
Successfully stored 476 records (8051 bytes) in: "/user/wukong/w08/out"

Counters:
Total records written : 476
Total bytes written : 8051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201408280106_0001

2014-08-28 01:14:39,227 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
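Two planning decisions in this log are worth unpacking. First, "BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=7118627" is followed by "Setting Parallelism to 1". Pig's documentation describes the default estimate as MIN(pig.exec.reducers.max, total input bytes / pig.exec.reducers.bytes.per.reducer); the sketch below additionally assumes the division is rounded up and floored at 1, which reproduces the single reducer seen here. Second, "Choosing to move algebraic foreach to combiner" (Feature: GROUP_BY,COMBINER) means COUNT runs partly on the map side: each map task emits one partial count per IP, and the reducer sums them. The function names are hypothetical, and the two-split example is illustrative only; the real job ran a single map task.

```python
import math
from collections import Counter
from typing import Iterable, List


def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int = 1_000_000_000,
                      max_reducers: int = 999) -> int:
    """Input-size based reducer estimate (sketch; rounding up is an assumption)."""
    n = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, n))


def map_and_combine(split_lines: Iterable[str]) -> Counter:
    """One map task plus combiner: a partial count per IP within its split."""
    return Counter(line.split(" ", 1)[0] for line in split_lines if line)


def reduce_counts(partials: List[Counter]) -> Counter:
    """Reducer: COUNT is algebraic, so the final count is the sum of partials."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # update() adds counts rather than replacing them
    return total


print(estimate_reducers(7118627))  # totalInputFileSize from the log above

# Hypothetical split of the six sample lines across two map tasks
split1 = ["220.181.108.151 - -", "208.115.113.82 - -",
          "220.181.94.221 - -", "112.97.24.243 - -"]
split2 = ["112.97.24.243 - -", "112.97.24.243 - -"]
partials = [map_and_combine(split1), map_and_combine(split2)]
print(sum(len(p) for p in partials))  # records shuffled after combining
print(reduce_counts(partials)["112.97.24.243"])
```

Six input lines shrink to five shuffled records in this toy case; on a real log with many repeat visitors the saving is much larger, which is why Pig moves algebraic functions such as COUNT into the combiner.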

Viewing the results

[wukong@bd11 ~]$ hadoop fs -cat ./w08/out/part-r-00000
Warning: $HADOOP_HOME is deprecated.
127.0.0.1       2
1.59.65.67      2
112.4.2.19      9
112.4.2.51      80
60.2.99.33      42
(remaining lines omitted) ......
221.194.180.166 4576
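The result file is ordered by the grouping key, not by hit count. Within Pig the usual approach is to add an ORDER ... BY countip DESC followed by LIMIT; as a local alternative, the part file can be copied out with `hadoop fs -get` and ranked in Python. The sketch below assumes the fields are tab-separated (the default PigStorage delimiter when STORE is given no USING clause); `top_ips` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def top_ips(output_lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Parse 'ip<TAB>count' lines and return the n IPs with the most hits."""
    rows = []
    for line in output_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ip, count = line.split("\t")
        rows.append((ip, int(count)))
    # Highest count first; ties broken by IP string so the order is stable
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:n]


# The lines visible in the output above (the omitted middle is not included)
shown = ["127.0.0.1\t2", "1.59.65.67\t2", "112.4.2.19\t9",
         "112.4.2.51\t80", "60.2.99.33\t42", "221.194.180.166\t4576"]
for ip, hits in top_ips(shown, n=3):
    print(ip, hits)
```

With the full file, `top_ips(open("part-r-00000"))` would rank all 476 IPs the job reported; since the helper accepts any iterable of lines, the same function works on a file object or a list.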

Thanks for reading. That covers how to use Pig to analyze Hadoop logs; try the steps on your own cluster and data to confirm how they behave in practice.
