
A Worked Example of Website Log Analysis with Hadoop


This article walks through a worked example of analyzing website logs with Hadoop. The explanation is kept simple and concrete; follow the steps below and work through the example with me.

I. Project Requirements


  • The "logs" in this processing pipeline are web logs only. There is no precise definition; they may include, but are not limited to, the access logs produced by front-end web servers such as Apache, lighttpd, Nginx, and Tomcat, as well as logs written by the web applications themselves.


II. Requirements Analysis: KPI Design

PV (PageView): page view statistics
IP: unique-IP visit statistics per page
Time: hourly PV statistics
Source: statistics on visitors' referring domains
Browser: statistics on visitors' browsers

This article focuses on the Browser statistic. To make the key design concrete, the sketch below shows the map output key each of these KPIs would use.
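A minimal sketch, not from the original article: the KpiKeyDesign class name and the hour-bucket slice are illustrative assumptions, and the accessors come from the Kpi entity class defined in Part III. For every KPI the map value is simply 1, and the reduce step sums the values per key.

public class KpiKeyDesign {
    // PV: one count per requested page
    public static String pvKey(Kpi kpi)      { return kpi.getRequest(); }
    // IP: one count per client address
    public static String ipKey(Kpi kpi)      { return kpi.getRemote_addr(); }
    // Time: bucket by hour, assuming time_local holds "18/Sep/2013:06:49:57"
    public static String hourKey(Kpi kpi)    { return kpi.getTime_local().substring(0, 14); }
    // Source: referring page; in practice the host would be extracted from this URL
    public static String sourceKey(Kpi kpi)  { return kpi.getHttp_referer(); }
    // Browser: raw user agent string (classified by a parser in the mapper)
    public static String browserKey(Kpi kpi) { return kpi.getHttp_user_agent(); }
}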

III. Analysis Process

1. A sample nginx log record

222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "GET /images/my.jpg HTTP/1.1" 200 19939
"http://www.angularjs.cn/A00n"
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"

2. Breaking down the record above

remote_addr: the client IP address, 222.68.172.190
remote_user: the client user name ("-" means none was recorded), -
time_local: the access time and time zone, [18/Sep/2013:06:49:57 +0000]
request: the requested URL and HTTP protocol, "GET /images/my.jpg HTTP/1.1"
status: the request status (200 means success), 200
body_bytes_sent: the size of the response body sent to the client, 19939
http_referer: the page the request was linked from, "http://www.angularjs.cn/A00n"
http_user_agent: information about the client browser, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
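As a side note, not from the original article: because time_local, the request, and the user agent all contain spaces, a regular expression can pull the fields out more precisely than the space-split used below. This is a hypothetical sketch (the LogLineParseSketch class name and the pattern are my assumptions, written against a "combined"-style nginx record like the sample above):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParseSketch {
    // One capture group per field described above.
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"");

    public static void main(String[] args) {
        String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "
                + "\"GET /images/my.jpg HTTP/1.1\" 200 19939 "
                + "\"http://www.angularjs.cn/A00n\" "
                + "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
                + "Chrome/29.0.1547.66 Safari/537.36\"";
        Matcher m = LOG_PATTERN.matcher(line);
        if (m.find()) {
            System.out.println("remote_addr     = " + m.group(1));
            System.out.println("time_local      = " + m.group(4));
            System.out.println("method          = " + m.group(5));
            System.out.println("request         = " + m.group(6));
            System.out.println("status          = " + m.group(8));
            System.out.println("body_bytes_sent = " + m.group(9));
            System.out.println("http_referer    = " + m.group(10));
            System.out.println("http_user_agent = " + m.group(11));
        }
    }
}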

3. Parsing the record above in Java (splitting on spaces)

String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] \"GET /images/my.jpg HTTP/1.1\" 200 19939 \"http://www.angularjs.cn/A00n\" \"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36\"";
String[] elementList = line.split(" ");
for (int i = 0; i < elementList.length; i++) {
    System.out.println(i + " : " + elementList[i]);
}

Test output:

0 : 222.68.172.190
1 : -
2 : -
3 : [18/Sep/2013:06:49:57
4 : +0000]
5 : "GET
6 : /images/my.jpg
7 : HTTP/1.1"
8 : 200
9 : 19939
10 : "http://www.angularjs.cn/A00n"
11 : "Mozilla/5.0
12 : (Windows
13 : NT
14 : 6.1)
15 : AppleWebKit/537.36
16 : (KHTML,
17 : like
18 : Gecko)
19 : Chrome/29.0.1547.66
20 : Safari/537.36"

4. The Kpi entity class:

public class Kpi {
    private String remote_addr;      // client IP address
    private String remote_user;      // client user name; "-" means none was recorded
    private String time_local;       // access time and time zone
    private String request;          // requested URL and HTTP protocol
    private String status;           // request status; 200 means success
    private String body_bytes_sent;  // size of the response body sent to the client
    private String http_referer;     // page the request was linked from
    private String http_user_agent;  // client browser information
    private String method;           // request method: GET, POST, ...
    private String http_version;     // HTTP version

    public String getMethod() { return method; }
    public void setMethod(String method) { this.method = method; }
    public String getHttp_version() { return http_version; }
    public void setHttp_version(String http_version) { this.http_version = http_version; }
    public String getRemote_addr() { return remote_addr; }
    public void setRemote_addr(String remote_addr) { this.remote_addr = remote_addr; }
    public String getRemote_user() { return remote_user; }
    public void setRemote_user(String remote_user) { this.remote_user = remote_user; }
    public String getTime_local() { return time_local; }
    public void setTime_local(String time_local) { this.time_local = time_local; }
    public String getRequest() { return request; }
    public void setRequest(String request) { this.request = request; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
    public String getBody_bytes_sent() { return body_bytes_sent; }
    public void setBody_bytes_sent(String body_bytes_sent) { this.body_bytes_sent = body_bytes_sent; }
    public String getHttp_referer() { return http_referer; }
    public void setHttp_referer(String http_referer) { this.http_referer = http_referer; }
    public String getHttp_user_agent() { return http_user_agent; }
    public void setHttp_user_agent(String http_user_agent) { this.http_user_agent = http_user_agent; }

    @Override
    public String toString() {
        return "Kpi [remote_addr=" + remote_addr + ", remote_user=" + remote_user
                + ", time_local=" + time_local + ", request=" + request
                + ", status=" + status + ", body_bytes_sent=" + body_bytes_sent
                + ", http_referer=" + http_referer + ", http_user_agent=" + http_user_agent
                + ", method=" + method + ", http_version=" + http_version + "]";
    }
}

5. The KPI utility class

package org.aaa.kpi;

public class KpiUtil {
    /**
     * Turn one log line into a Kpi object.
     * @param line one record from the log
     * @author tianbx
     */
    public static Kpi transformLineKpi(String line) {
        String[] elementList = line.split(" ");
        Kpi kpi = new Kpi();
        kpi.setRemote_addr(elementList[0]);
        kpi.setRemote_user(elementList[1]);
        kpi.setTime_local(elementList[3].substring(1));   // strip the leading '['
        kpi.setMethod(elementList[5].substring(1));       // strip the leading '"'
        kpi.setRequest(elementList[6]);
        kpi.setHttp_version(elementList[7]);
        kpi.setStatus(elementList[8]);
        kpi.setBody_bytes_sent(elementList[9]);
        kpi.setHttp_referer(elementList[10]);
        // The user agent contains spaces, so join all remaining tokens back together.
        StringBuilder ua = new StringBuilder(elementList[11]);
        for (int i = 12; i < elementList.length; i++) {
            ua.append(" ").append(elementList[i]);
        }
        kpi.setHttp_user_agent(ua.toString());
        return kpi;
    }
}
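A quick check of the utility, not part of the original article (the KpiUtilDemo class name is hypothetical; it relies only on the Kpi and KpiUtil classes defined above):

public class KpiUtilDemo {
    public static void main(String[] args) {
        String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "
                + "\"GET /images/my.jpg HTTP/1.1\" 200 19939 "
                + "\"http://www.angularjs.cn/A00n\" "
                + "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
                + "Chrome/29.0.1547.66 Safari/537.36\"";
        Kpi kpi = KpiUtil.transformLineKpi(line);
        System.out.println(kpi);   // prints the toString() defined in the Kpi class
    }
}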

6. Algorithm model: the parallel algorithm

Browser: statistics on visitors' browsers
- Map: {key: $http_user_agent, value: 1}
- Reduce: {key: $http_user_agent, value: sum}
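To make the model concrete, here is a minimal single-process simulation of that key/value design; it is an illustration only (the BrowserCountSketch class name and the sample keys are assumptions), not Hadoop code:

import java.util.HashMap;
import java.util.Map;

public class BrowserCountSketch {
    public static void main(String[] args) {
        // Pretend these are the map outputs: one key per log record, value 1.
        String[] mapOutputKeys = {"Chrome", "IE", "Chrome", "Firefox", "Chrome"};
        // The framework groups by key; the reduce phase sums the 1s per key.
        Map<String, Integer> reduced = new HashMap<>();
        for (String ua : mapOutputKeys) {
            reduced.merge(ua, 1, Integer::sum);
        }
        System.out.println(reduced); // e.g. {IE=1, Chrome=3, Firefox=2} (map order may vary)
    }
}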
7. The MapReduce analysis code


import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

import org.hmahout.kpi.entity.Kpi;
import org.hmahout.kpi.util.KpiUtil;

import cz.mallat.uasparser.UASparser;
import cz.mallat.uasparser.UserAgentInfo;

public class KpiBrowserSimpleV {

    public static class KpiBrowserSimpleMapper extends MapReduceBase
            implements Mapper<Object, Text, Text, IntWritable> {

        UASparser parser = null;

        @Override
        public void map(Object key, Text value,
                OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            // Parse the raw log line into a Kpi object.
            Kpi kpi = KpiUtil.transformLineKpi(value.toString());
            if (kpi != null && kpi.getHttp_user_agent() != null) {
                if (parser == null) {
                    parser = new UASparser();
                }
                // Classify the user agent string and emit (browser, 1).
                UserAgentInfo info = parser.parseBrowserOnly(kpi.getHttp_user_agent());
                if ("unknown".equals(info.getUaName())) {
                    out.collect(new Text(info.getUaName()), new IntWritable(1));
                } else {
                    out.collect(new Text(info.getUaFamily()), new IntWritable(1));
                }
            }
        }
    }

    public static class KpiBrowserSimpleReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> value,
                OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            // Sum the 1s emitted by the mapper for this browser.
            IntWritable sum = new IntWritable(0);
            while (value.hasNext()) {
                sum.set(sum.get() + value.next().get());
            }
            out.collect(key, sum);
        }
    }

    public static void main(String[] args) throws IOException {
        String input = "hdfs://127.0.0.1:9000/user/tianbx/log_kpi/input";
        String output = "hdfs://127.0.0.1:9000/user/tianbx/log_kpi/browerSimpleV";

        JobConf conf = new JobConf(KpiBrowserSimpleV.class);
        conf.setJobName("KpiBrowserSimpleV");
        String url = "classpath:";
        conf.addResource(url + "/hadoop/core-site.xml");
        conf.addResource(url + "/hadoop/hdfs-site.xml");
        conf.addResource(url + "/hadoop/mapred-site.xml");

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(KpiBrowserSimpleMapper.class);
        conf.setCombinerClass(KpiBrowserSimpleReducer.class);
        conf.setReducerClass(KpiBrowserSimpleReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}
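The mapper above leans on the uasparser library for browser classification. A minimal standalone check of how it handles the sample user agent might look like the sketch below; the UaParseDemo class name is hypothetical, while UASparser, parseBrowserOnly, getUaName, and getUaFamily are exactly the calls used in the job code above, and the printed result is not guaranteed:

import cz.mallat.uasparser.UASparser;
import cz.mallat.uasparser.UserAgentInfo;

public class UaParseDemo {
    public static void main(String[] args) throws Exception {
        UASparser parser = new UASparser();   // same construction as in the mapper
        UserAgentInfo info = parser.parseBrowserOnly(
                "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
                + "Chrome/29.0.1547.66 Safari/537.36");
        // The job emits uaName when it is "unknown", otherwise uaFamily.
        System.out.println(info.getUaFamily() + " / " + info.getUaName());
    }
}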


8. Contents of the output file log_kpi/browerSimpleV

AOL Explorer 1
Android Webkit 123
Chrome 4867
CoolNovo 23
Firefox 1700
Google App Engine 5
IE 1521
Jakarta Commons-HttpClient 3
Maxthon 27
Mobile Safari 273
Mozilla 130
Openwave Mobile Browser 2
Opera 2
Pale Moon 1
Python-urllib 4
Safari 246
Sogou Explorer 157
unknown 4685

9. Plotting the results with R


library(ggplot2)   # qplot() comes from ggplot2

data <- read.table(file="borwer.txt", header=FALSE, sep=",")

names(data) <- c("borwer", "num")

qplot(borwer, num, data=data, geom="bar")


Thank you for reading. That concludes this worked example of website log analysis with Hadoop. After going through it you should have a clearer picture of the approach, but the details still need to be verified in practice.
