
How to Set Up a Hadoop Distributed Environment on CentOS


This article explains how to set up a Hadoop distributed environment on CentOS. The walkthrough is simple and easy to follow, so work through it step by step and try it out for yourself.

A few things you should know before setting up a Hadoop environment:

1. Hadoop runs on Linux, so you need a Linux operating system installed.

2. You need a cluster of machines on which to run Hadoop, e.g. several Linux hosts on the same LAN that can reach one another.

3. For the cluster nodes to access each other, you need to set up passwordless SSH login.

4. Hadoop runs on the JVM, so you need to install a Java JDK and configure JAVA_HOME.

5. Hadoop's components are configured through XML files. After downloading Hadoop from the official site and extracting it, edit the corresponding configuration files under the etc/hadoop directory of the installation.

A craftsman must sharpen his tools before he can do good work, so here are the software and tools used to build this environment:

1. VirtualBox -- since several Linux machines have to be simulated with limited hardware, a few virtual machines are created in VirtualBox.

2. CentOS -- download the CentOS 7 ISO image, load it into VirtualBox, and install it.

3. SecureCRT -- software for SSH remote access to the Linux machines.

4. WinSCP -- for transferring files between Windows and Linux.

5. JDK for Linux -- download from the Oracle website; just extract and configure it.

6. Hadoop 2.7.x -- available from the Apache website (2.7.3 is used later in this guide).

Alright, the walkthrough below is split into three parts: Linux environment preparation, Hadoop cluster installation, and testing the Hadoop environment.

Linux environment preparation

Configure the IP addresses

To allow communication between the host and the VMs and among the VMs themselves, set the CentOS VMs' network adapter in VirtualBox to host-only mode and assign static IPs manually. Note that the VMs' gateway must be the IP address of the host-only network adapter on the host machine. After configuring the IP, restart the network service so the change takes effect. Three Linux machines are set up here (192.168.56.101 through 192.168.56.103).
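
A minimal sketch of a static-IP setup on hadoop01, assuming the interface is named enp0s3 and the host-only network uses 192.168.56.1 as its gateway (check your interface name with ip addr):

cat /etc/sysconfig/network-scripts/ifcfg-enp0s3
TYPE=Ethernet
BOOTPROTO=static            # static address instead of DHCP
DEVICE=enp0s3
ONBOOT=yes                  # bring the interface up at boot
IPADDR=192.168.56.101       # use 102 / 103 on the other two machines
NETMASK=255.255.255.0
GATEWAY=192.168.56.1

systemctl restart network   # restart the network service so the new address takes effect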

Configure the hostnames

For 192.168.56.101, set the hostname to hadoop01, and list the IPs and hostnames of all cluster nodes in the hosts file. Repeat the same steps on the other two hosts.

[root@hadoop01 ~]# cat /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=hadoop01

[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1      localhost localhost.localdomain localhost4 localhost4.localdomain4
::1            localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 hadoop01
192.168.56.102 hadoop02
192.168.56.103 hadoop03
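
On CentOS 7 the hostname can also be set with hostnamectl, which writes it to /etc/hostname and applies it immediately; a small sketch:

hostnamectl set-hostname hadoop01   # use hadoop02 / hadoop03 on the other nodes
hostname                            # verify the new hostname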

Permanently disable the firewall

service iptables stop only stops the firewall until the next reboot, so a command that disables it permanently is needed; and since this is CentOS 7, the commands to stop and permanently disable the firewall are the following:

systemctl stop firewalld.service      # stop firewalld
systemctl disable firewalld.service   # prevent firewalld from starting at boot

Disable the SELinux security system

Change it to disabled, then reboot the machine for the change to take effect.

[root@hadoop02 ~]# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#   enforcing  - SELinux security policy is enforced.
#   permissive - SELinux prints warnings instead of enforcing.
#   disabled   - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#   targeted - Targeted processes are protected,
#   minimum  - Modification of targeted policy. Only selected processes are protected.
#   mls      - Multi Level Security protection.
SELINUXTYPE=targeted
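
As a convenience, the same change can be made non-interactively; this sketch assumes the file still contains the default SELINUX=enforcing line:

sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux
setenforce 0   # optionally switch to permissive mode right away, before the reboot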

Passwordless SSH login within the cluster

First, generate an SSH key pair:

ssh-keygen -t rsa
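
If you would rather skip the interactive prompts, the key pair can be generated in one command (a sketch; the empty passphrase is only appropriate for a test cluster):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # RSA key pair with no passphrase, in the default location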

Copy the SSH public key to all three machines:

ssh-copy-id 192.168.56.101 
ssh-copy-id 192.168.56.102
ssh-copy-id 192.168.56.103

With that in place, if the hadoop01 machine wants to log in to hadoop02, it can simply run:

ssh hadoop02

Configure the JDK

Create three directories under /home:

tools -- holds uploaded installation packages

softwares -- holds installed software

data -- holds data

Upload the Linux JDK you downloaded to /home/tools on hadoop01 via WinSCP.

Extract the JDK into softwares:

tar -zxf jdk-7u76-linux-x64.tar.gz -C /home/softwares

The JDK home directory is now /home/softwares/jdk.x.x.x. Add that directory to /etc/profile by setting JAVA_HOME and appending its bin directory to PATH:

export JAVA_HOME=/home/softwares/jdk1.8.0_111
export PATH=$PATH:$JAVA_HOME/bin

Save the changes and run source /etc/profile to make them take effect.

Check whether the Java JDK was installed successfully:

java -version

The files set up on the current node can then be copied to the other nodes:

scp -r /home/* root@192.168.56.10x:/home

Hadoop cluster installation

The cluster is planned as follows:

Node 101 acts as the HDFS NameNode and the other nodes as DataNodes; node 102 acts as the YARN ResourceManager and the others as NodeManagers; node 103 acts as the SecondaryNameNode. The JobHistoryServer and the WebAppProxyServer are started on nodes 101 and 102 respectively.

Download hadoop-2.7.3

and place it in the /home/softwares directory. Since Hadoop needs the JDK to run, first set JAVA_HOME in etc/hadoop/hadoop-env.sh under the Hadoop installation directory.

(PS: the JDK version I used feels a bit too new.)
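
A minimal sketch of that edit, assuming the Hadoop installation lives in /home/softwares/hadoop-2.7.3 and the JDK path configured earlier:

# /home/softwares/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/softwares/jdk1.8.0_111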

Next, edit the XML files for the corresponding Hadoop components in turn.

Edit core-site.xml:

Specify the NameNode address

Set Hadoop's temporary (cache) directory

Enable Hadoop's trash mechanism

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://101:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/softwares/hadoop-2.7.3/data/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>

Edit hdfs-site.xml:

Set the replication factor

Disable permission checking

Set the NameNode HTTP address

Set the SecondaryNameNode HTTP address

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>101:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>103:50090</value>
    </property>
</configuration>

Rename mapred-site.xml.template to mapred-site.xml, for example:
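
A sketch of the rename, assuming the same installation directory as above:

cd /home/softwares/hadoop-2.7.3/etc/hadoop
mv mapred-site.xml.template mapred-site.xml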

Set the MapReduce framework to yarn, so jobs are scheduled through YARN

Specify the JobHistory server address

Specify the JobHistory web UI address

Enable uber mode -- an optimization for small MapReduce jobs

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>101:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>101:19888</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>

Edit yarn-site.xml:

Set the auxiliary service to mapreduce_shuffle

Designate node 102 as the ResourceManager

Set the web application proxy address on node 102

Enable YARN log aggregation

Set how long aggregated YARN logs are kept

Set the NodeManager memory: 8 GB

Set the NodeManager CPUs: 8 cores

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>102</value>
    </property>
    <property>
        <name>yarn.web-proxy.address</name>
        <value>102:8888</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
    </property>
</configuration>

Configure the slaves file

It lists the compute nodes, i.e. the nodes that run the DataNode and NodeManager daemons (etc/hadoop/slaves):

192.168.56.101
192.168.56.102
192.168.56.103

First format HDFS on the NameNode, i.e. on node 101:

Change to the Hadoop home directory: cd /home/softwares/hadoop-2.7.3

Run the hadoop script in the bin directory: bin/hadoop namenode -format

The format only counts as successful if the output reports that the file system has been successfully formatted.

Once all of the above configuration is done, copy it to the other machines, as sketched below.
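
A hedged sketch of pushing the configured installation to the other two nodes (paths follow the layout assumed above):

scp -r /home/softwares/hadoop-2.7.3 root@192.168.56.102:/home/softwares/
scp -r /home/softwares/hadoop-2.7.3 root@192.168.56.103:/home/softwares/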

Testing the Hadoop environment

Go to the Hadoop home directory and run the corresponding scripts.

The jps command (Java Virtual Machine Process Status tool) lists the running Java processes.

Start HDFS on the NameNode machine, node 101:

[root@hadoop01 hadoop-2.7.3]# sbin/start-dfs.sh
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 16:49:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop01.out
102: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop02.out
103: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop03.out
101: starting datanode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /home/softwares/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-hadoop03.out

Running jps on node 101 now shows that the NameNode and DataNode have started:

[root@hadoop01 hadoop-2.7.3]# jps
7826 Jps
7270 DataNode
7052 NameNode

Running jps on nodes 102 and 103 shows that their DataNodes have started (node 103 also runs the SecondaryNameNode):

[root@hadoop02 bin]# jps
4260 DataNode
4488 Jps

[root@hadoop03 ~]# jps
6436 SecondaryNameNode
6750 Jps
6191 DataNode

Start YARN

Run on node 102:

[root@hadoop02 hadoop-2.7.3]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-resourcemanager-hadoop02.out
101: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop01.out
103: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop03.out
102: starting nodemanager, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop02.out

Check each node with jps:

[root@hadoop02 hadoop-2.7.3]# jps
4641 ResourceManager
4260 DataNode
4765 NodeManager
5165 Jps

[root@hadoop01 hadoop-2.7.3]# jps
7270 DataNode
8375 Jps
7976 NodeManager
7052 NameNode

[root@hadoop03 ~]# jps
6915 NodeManager
6436 SecondaryNameNode
7287 Jps
6191 DataNode

Start the JobHistoryServer and the WebAppProxyServer daemons on their respective nodes:

[root@hadoop01 hadoop-2.7.3]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/softwares/hadoop-2.7.3/logs/mapred-root-historyserver-hadoop01.out
[root@hadoop01 hadoop-2.7.3]# jps
8624 Jps
7270 DataNode
7976 NodeManager
8553 JobHistoryServer
7052 NameNode

[root@hadoop02 hadoop-2.7.3]# sbin/yarn-daemon.sh start proxyserver
starting proxyserver, logging to /home/softwares/hadoop-2.7.3/logs/yarn-root-proxyserver-hadoop02.out
[root@hadoop02 hadoop-2.7.3]# jps
4641 ResourceManager
4260 DataNode
5367 WebAppProxyServer
5402 Jps
4765 NodeManager

On node hadoop01, i.e. node 101, check the cluster status through a browser; the relevant web UIs are listed below.
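
The addresses below follow from the configuration above (dfs.namenode.http-address and dfs.namenode.secondary.http-address); using the full IPs is an assumption so the pages are reachable from the host machine:

http://192.168.56.101:50070   # HDFS NameNode web UI
http://192.168.56.103:50090   # SecondaryNameNode web UI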

Upload a file to HDFS

[root@hadoop01 hadoop-2.7.3]# bin/hdfs dfs -put /etc/profile /profile
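
A quick way to confirm the upload is to list the HDFS root:

bin/hdfs dfs -ls /   # /profile should appear in the listing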

Run the wordcount example program

[root@hadoop01 hadoop-2.7.3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /profile /fll_out
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 17:17:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/07 17:17:12 INFO client.RMProxy: Connecting to ResourceManager at /102:8032
16/11/07 17:17:18 INFO input.FileInputFormat: Total input paths to process : 1
16/11/07 17:17:19 INFO mapreduce.JobSubmitter: number of splits:1
16/11/07 17:17:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478509135878_0001
16/11/07 17:17:20 INFO impl.YarnClientImpl: Submitted application application_1478509135878_0001
16/11/07 17:17:20 INFO mapreduce.Job: The url to track the job: http://102:8888/proxy/application_1478509135878_0001/
16/11/07 17:17:20 INFO mapreduce.Job: Running job: job_1478509135878_0001
16/11/07 17:18:34 INFO mapreduce.Job: Job job_1478509135878_0001 running in uber mode : true
16/11/07 17:18:35 INFO mapreduce.Job:  map 0% reduce 0%
16/11/07 17:18:43 INFO mapreduce.Job:  map 100% reduce 0%
16/11/07 17:18:50 INFO mapreduce.Job:  map 100% reduce 100%
16/11/07 17:18:55 INFO mapreduce.Job: Job job_1478509135878_0001 completed successfully
16/11/07 17:18:59 INFO mapreduce.Job: Counters: 52
    File System Counters
        FILE: Number of bytes read=4264
        FILE: Number of bytes written=6412
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=3940
        HDFS: Number of bytes written=261673
        HDFS: Number of read operations=35
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8246
        Total time spent by all reduces in occupied slots (ms)=7538
        TOTAL_LAUNCHED_UBERTASKS=2
        NUM_UBER_SUBMAPS=1
        NUM_UBER_SUBREDUCES=1
        Total time spent by all map tasks (ms)=8246
        Total time spent by all reduce tasks (ms)=7538
        Total vcore-milliseconds taken by all map tasks=8246
        Total vcore-milliseconds taken by all reduce tasks=7538
        Total megabyte-milliseconds taken by all map tasks=8443904
        Total megabyte-milliseconds taken by all reduce tasks=7718912
    Map-Reduce Framework
        Map input records=78
        Map output records=256
        Map output bytes=2605
        Map output materialized bytes=2116
        Input split bytes=99
        Combine input records=256
        Combine output records=156
        Reduce input groups=156
        Reduce shuffle bytes=2116
        Reduce input records=156
        Reduce output records=156
        Spilled Records=312
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=870
        CPU time spent (ms)=1970
        Physical memory (bytes) snapshot=243326976
        Virtual memory (bytes) snapshot=2666557440
        Total committed heap usage (bytes)=256876544
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1829
    File Output Format Counters
        Bytes Written=1487

View the job's running status through YARN in a browser; see the address sketched below.
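
Assuming the default ResourceManager webapp port 8088 on node 102 (the tracking URL printed by the job goes through the web proxy on port 8888 instead):

http://192.168.56.102:8088   # YARN ResourceManager web UI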

View the final word-frequency results.

View the HDFS file system in a browser.

[root@hadoop01 hadoop-2.7.3]# bin/hdfs dfs -cat /fll_out/part-r-00000
Java HotSpot(TM) Client VM warning: You have loaded library /home/softwares/hadoop-2.7.3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/07 17:29:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
!=   1 "$-"  1 "$2"  1 "$euid" 2 "$histcontrol" 1 "$i"  3 "${-#*i}"    1 "0"   1 ":${path}:"   1 "`id  2 "after" 1 "ignorespace"  1 #    13 $uid  1 &&   1 ()   1 *)   1 *:"$1":*)    1 -f   1 -gn`"  1 -gt   1 -r   1 -ru`  1 -u`   1 -un`"  2 -x   1 -z   1     2 /etc/bashrc   1 /etc/profile  1 /etc/profiled/ 1 /etc/profiled/*sh   1 /usr/bin/id   1 /usr/local/sbin 2 /usr/sbin    2 /usr/share/doc/setup-*/uidgid  1 002   1 022   1 199   1 200   1 2>/dev/null`  1 ;    3 ;;   1 =    4 >/dev/null   1 by   1 current 1 euid=`id    1 functions    1 histcontrol   1 histcontrol=ignoreboth 1 histcontrol=ignoredups 1 histsize    1 histsize=1000  1 hostname    1 hostname=`/usr/bin/hostname   1 it's  2 java_home=/home/softwares/jdk0_111 1 logname 1 logname=$user  1 mail  1 mail="/var/spool/mail/$user"  1 not   1 path  1 path=$1:$path  1 path=$path:$1  1 path=$path:$java_home/bin    1 path  1 system 1 this  1 uid=`id 1 user  1 user="`id    1 you   1 [    9 ]    3 ];   6 a    2 after  2 aliases 1 and   2 are   1 as   1 better 1 case  1 change 1 changes 1 check  1 could  1 create 1 custom 1 customsh    1 default,    1 do   1 doing 1 done  1 else  5 environment   1 environment,  1 esac  1 export 5 fi   8 file  2 for   5 future 1 get   1 go   1 good  1 i    2 idea  1 if   8 in   6 is   1 it   1 know  1 ksh   1 login  2 make  1 manipulation  1 merging 1 much  1 need  1 pathmunge    6 prevent 1 programs,    1 reservation   1 reserved    1 script 1 set  1 sets  1 setup  1 shell  2 startup 1 system 1 the   1 then  8 this  2 threshold    1 to   5 uid/gids    1 uidgid 1 umask  3 unless 1 unset  2 updates    1 validity    1 want  1 we   1 what  1 wide  1 will  1 workaround   1 you   2 your  1 {    1 }    1

Thanks for reading. That covers how to set up a Hadoop distributed environment on CentOS; hopefully it has given you a clearer picture of the process, and the details are best confirmed by trying it out in practice.
