Example MapReduce code for Hadoop

Published: 2024-12-04 Author: 千家信息网 editor

Last updated by 千家信息网 on 2024-12-04.

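This article presents example MapReduce code for Hadoop: a word-count job made up of a mapper, a reducer, and a runner class that submits the job, followed by the console output of one run on a cluster.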

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: <byte offset of the line, line text>; output: <word, 1>
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // Get the content of one line of the input file
        String line = value.toString();
        // Split the line into an array of words
        String[] words = StringUtils.split(line, " ");
        // Emit a <word, 1> pair for every word
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // Example input: key = "hello", values = {1, 1, 1, 1, 1, ...}
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        // Running total for this word
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }

        // Emit the <word, count> pair
        context.write(key, new LongWritable(count));
    }
}
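The data flow implemented by the mapper and reducer can be traced without a cluster. The following is a minimal sketch in plain Java, not the Hadoop API: the class `WordCountSketch` and its methods `map`, `shuffle`, `reduce` and `run` are hypothetical names introduced here for illustration, and `StringTokenizer` stands in for `StringUtils.split`, which likewise treats a run of spaces as a single separator.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone class: it imitates the job's data flow and uses no Hadoop API.
class WordCountSketch {

    // Map step: one <word, 1> pair per token, as in WordCountMapper.map().
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line, " ");
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1L));
        }
        return pairs;
    }

    // Shuffle step (done by the framework in a real job): collect the values
    // of each key into one list, with the keys kept in sorted order.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values of one key, as in WordCountReducer.reduce().
    static long reduce(Iterable<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    // Runs the three steps over all input lines and returns <word, count>.
    static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

For the two sample lines, `main` prints `{hadoop=1, hello=2, world=1}`. In the real job the grouping step is performed by the framework between the map and reduce tasks, so only the first and third methods have counterparts in the classes above.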

package cn.itheima.bigdata.hadoop.mr.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Describes a job (which mapper class and reducer class to use, where the
 * input files are, where the results go, and so on) and then submits that
 * job to the Hadoop cluster.
 * @author duanhaitao@itcast.cn
 */
// Fully qualified name: cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
public class WordCountRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job wcjob = Job.getInstance(conf);

        // Ship the jar that contains this class with the job. From here on,
        // settings must go through wcjob, because Job.getInstance(conf)
        // works on its own copy of conf.
        wcjob.setJarByClass(WordCountRunner.class);

        // Mapper class used by wcjob
        wcjob.setMapperClass(WordCountMapper.class);
        // Reducer class used by wcjob
        wcjob.setReducerClass(WordCountReducer.class);

        // Key/value types emitted by the mapper
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);

        // Key/value types emitted by the reducer
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);

        // Path that holds the raw input data
        FileInputFormat.setInputPaths(wcjob, "hdfs://192.168.88.155:9000/wc/srcdata");

        // Path that receives the results
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://192.168.88.155:9000/wc/output"));

        // Submit the job, print progress, and wait for it to finish
        boolean res = wcjob.waitForCompletion(true);

        System.exit(res ? 0 : 1);
    }
}

Package the classes into mr.jar, place it on the Hadoop server, and run it:

[root@hadoop02 ~]# hadoop jar /root/Desktop/mr.jar cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
Java HotSpot(TM) Client VM warning: You have loaded library /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
15/12/05 06:07:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/05 06:07:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop02/192.168.88.155:8032
15/12/05 06:07:08 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/05 06:07:09 INFO input.FileInputFormat: Total input paths to process : 1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: number of splits:1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449322432664_0001
15/12/05 06:07:10 INFO impl.YarnClientImpl: Submitted application application_1449322432664_0001
15/12/05 06:07:10 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1449322432664_0001/
15/12/05 06:07:10 INFO mapreduce.Job: Running job: job_1449322432664_0001
15/12/05 06:07:22 INFO mapreduce.Job: Job job_1449322432664_0001 running in uber mode : false
15/12/05 06:07:22 INFO mapreduce.Job: map 0% reduce 0%
15/12/05 06:07:32 INFO mapreduce.Job: map 100% reduce 0%
15/12/05 06:07:39 INFO mapreduce.Job: map 100% reduce 100%
15/12/05 06:07:40 INFO mapreduce.Job: Job job_1449322432664_0001 completed successfully
15/12/05 06:07:41 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=635
        FILE: Number of bytes written=212441
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=338
        HDFS: Number of bytes written=223
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7463
        Total time spent by all reduces in occupied slots (ms)=4688
        Total time spent by all map tasks (ms)=7463
        Total time spent by all reduce tasks (ms)=4688
        Total vcore-seconds taken by all map tasks=7463
        Total vcore-seconds taken by all reduce tasks=4688
        Total megabyte-seconds taken by all map tasks=7642112
        Total megabyte-seconds taken by all reduce tasks=4800512
    Map-Reduce Framework
        Map input records=10
        Map output records=41
        Map output bytes=547
        Map output materialized bytes=635
        Input split bytes=114
        Combine input records=0
        Combine output records=0
        Reduce input groups=30
        Reduce shuffle bytes=635
        Reduce input records=41
        Reduce output records=30
        Spilled Records=82
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=211
        CPU time spent (ms)=1350
        Physical memory (bytes) snapshot=221917184
        Virtual memory (bytes) snapshot=722092032
        Total committed heap usage (bytes)=137039872
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=224
    File Output Format Counters
        Bytes Written=223
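
Several of the counters above can be cross-checked against the word-count logic. The following is a minimal sketch in plain Java; `CounterCheck`, `parse`, `consistent` and `mapMemoryFactor` are hypothetical names introduced here, not Hadoop utilities, and the counter lines are copied from the report above. Reading the quotient of the megabyte-seconds and time counters as the memory of the map container in MB is an assumption, as is reading Spilled Records as one spill per record on each side of the shuffle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: checks relations between job counters.
class CounterCheck {

    // Counter lines copied from the job report above.
    static final String REPORT = String.join("\n",
            "Total time spent by all map tasks (ms)=7463",
            "Total megabyte-seconds taken by all map tasks=7642112",
            "Map input records=10",
            "Map output records=41",
            "Combine input records=0",
            "Reduce input groups=30",
            "Reduce input records=41",
            "Reduce output records=30",
            "Spilled Records=82");

    // Parses "name=value" lines into a map; lines without '=' are skipped.
    static Map<String, Long> parse(String report) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : report.split("\n")) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = line.substring(0, eq).trim();
            counters.put(name, Long.parseLong(line.substring(eq + 1).trim()));
        }
        return counters;
    }

    // True if the counters agree with what the word-count classes should produce.
    static boolean consistent(Map<String, Long> c) {
        long mapOut = c.get("Map output records");
        // No combiner ran, so every pair written by the mapper reaches the reducer.
        boolean noCombiner = c.get("Combine input records") == 0
                && mapOut == c.get("Reduce input records");
        // The reducer writes exactly one <word, count> pair per distinct word.
        boolean onePairPerWord =
                c.get("Reduce input groups").longValue() == c.get("Reduce output records").longValue();
        // In this run the spill count is twice the number of map output records.
        boolean spilledTwice = c.get("Spilled Records") == 2 * mapOut;
        return noCombiner && onePairPerWord && spilledTwice;
    }

    // Quotient of the map tasks' megabyte-seconds counter and their time in ms.
    static long mapMemoryFactor(Map<String, Long> c) {
        return c.get("Total megabyte-seconds taken by all map tasks")
                / c.get("Total time spent by all map tasks (ms)");
    }

    public static void main(String[] args) {
        Map<String, Long> counters = parse(REPORT);
        System.out.println("consistent: " + consistent(counters));
        System.out.println("map memory factor: " + mapMemoryFactor(counters));
    }
}
```

With the figures from this run the checks pass and the quotient is 1024: the 10 input lines produced 41 words, of which 30 were distinct.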