千家信息网

【3】Building a highly available (HA) hadoop-2.3 cluster (deploying and configuring hadoop--cdh5.1.0)

Published: 2025-01-23  Author: 千家信息网 editor

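【1】Building an HA hadoop-2.3 cluster (planning + environment preparation)

【2】Building an HA hadoop-2.3 cluster (installing zookeeper)

【3】Building an HA hadoop-2.3 cluster (deploying and configuring hadoop--cdh5.1.0)

【4】Building an HA hadoop-2.3 cluster (deploying and configuring HBase)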





Installing and deploying hadoop

(1) Install hadoop

  • master1, master2, slave1, slave2, slave3

#cd /opt/
#tar xf hadoop-2.3.0-cdh5.1.0.tar.gz
#ln -s hadoop-2.3.0-cdh5.1.0 hadoop

(2) Add the hadoop environment variables

  • master1, master2, slave1, slave2, slave3

#cat >> /etc/profile <
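The body of the heredoc above did not survive in this copy of the article. As a sketch only, assuming the /opt/hadoop symlink created in step (1), the lines appended to /etc/profile typically look like the following; write_hadoop_profile is a hypothetical helper and the exact variable list is an assumption, not the author's original.

```shell
#!/bin/sh
# Hypothetical helper: append typical Hadoop variables to a profile file.
# Assumption: hadoop is reachable through the /opt/hadoop symlink from step (1);
# the original article's exact variable list is not preserved.
write_hadoop_profile() {
  target="$1"   # on the real nodes this would be /etc/profile
  # The quoted 'EOF' keeps $HADOOP_HOME and $PATH literal, so they expand at login.
  cat >> "$target" <<'EOF'
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
}
```

On each node this would be called as `write_hadoop_profile /etc/profile`, followed by `source /etc/profile` in the current shell.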

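(3) Configure hadoop

Main configuration files

(in hadoop-2.3.0-cdh5.1.0/etc/hadoop/)

| File | Format | Purpose |
|---|---|---|
| hadoop-env.sh | bash script | environment variables needed by hadoop |
| core-site.xml | xml | hadoop core settings |
| hdfs-site.xml | xml | settings for the HDFS daemons, including namenode and datanode |
| slaves | plain text | list of datanode (and nodemanager) hosts, one per line |
| mapred-env.sh | bash script | environment variables needed by mapreduce |
| mapred-site.xml | xml | settings for the mapreduce daemons |
| yarn-env.sh | bash script | environment variables needed by yarn |
| yarn-site.xml | xml | yarn settings |

The configuration in steps 1-8 below is the same on every machine, so you can configure one machine first and then copy the files to the others (the likely exception is yarn.resourcemanager.ha.id in step 7, which should differ between master1 and master2).

  • master1, master2, slave1, slave2, slave3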
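Copying the finished configuration can be scripted from the first machine. A minimal sketch, assuming passwordless root ssh between all nodes (the sshfence setting in hdfs-site.xml relies on /root/.ssh/id_rsa anyway) and the same /opt/hadoop layout everywhere; sync_hadoop_conf and its DRY_RUN switch are hypothetical, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: copy the config directory to the other nodes.
# Assumptions: passwordless root ssh, identical /opt/hadoop layout on all hosts.
# Set DRY_RUN=1 to print the scp commands instead of running them.
sync_hadoop_conf() {
  local conf_dir parent host
  conf_dir="${HADOOP_CONF_DIR:-/opt/hadoop/etc/hadoop}"
  parent="$(dirname "$conf_dir")"
  for host in "$@"; do
    # scp -r copies the directory itself into its parent on the remote host,
    # overwriting the existing etc/hadoop files there.
    if [ -n "${DRY_RUN:-}" ]; then
      echo "scp -r $conf_dir root@${host}:${parent}/"
    else
      scp -r "$conf_dir" "root@${host}:${parent}/" || return 1
    fi
  done
}
```

Run on master1 after editing the files: `sync_hadoop_conf master2 slave1 slave2 slave3`. Copying the whole directory keeps all eight files in step, at the cost of overwriting any per-host edits on the targets.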

1: Configure hadoop-env.sh

cat >> hadoop-env.sh  <
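The heredoc body for hadoop-env.sh is missing as well. Its usual essential entry is JAVA_HOME, because daemons launched over ssh (for example the NodeManagers started by start-yarn.sh) generally do not read /etc/profile. The sketch below, with the hypothetical helper java_home_from, only shows one way to derive that value, assuming a JDK is installed and java is on the PATH; the actual JDK path from the original article is not known.

```shell
#!/bin/sh
# Hypothetical helper: derive a JAVA_HOME value from the path of a java binary.
# readlink -f follows chains such as /usr/bin/java -> /etc/alternatives/java -> .../bin/java.
java_home_from() {
  local java_bin home
  java_bin="$(readlink -f "$1")" || return 1
  home="${java_bin%/bin/java}"   # strip the trailing /bin/java
  home="${home%/jre}"            # JDK 7/8 layouts keep java under jre/bin
  printf '%s\n' "$home"
}
```

For example: `echo "export JAVA_HOME=$(java_home_from "$(command -v java)")" >> hadoop-env.sh`.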

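2: Configure core-site.xml

#mkdir -p /data/hadoop/tmp
#vim core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
  </property>
</configuration>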
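core-site.xml and the other *-site.xml files share the same <configuration>/<property> layout, so they can also be generated from a script instead of being edited in vim. A minimal sketch with a hypothetical write_site_xml helper; it assumes names and values contain no &, < or > characters, since it does no XML escaping.

```shell
#!/bin/sh
# Hypothetical helper: write a Hadoop *-site.xml from name=value arguments.
# Assumption: names/values contain no '&', '<' or '>' (no XML escaping is done).
write_site_xml() {
  local out="$1"
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    local pair
    for pair in "$@"; do
      # Split on the first '=' only, so values may themselves contain '='.
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } > "$out"
}

# Example for this cluster's core-site.xml:
# write_site_xml /opt/hadoop/etc/hadoop/core-site.xml \
#   fs.defaultFS=hdfs://mycluster \
#   hadoop.tmp.dir=/data/hadoop/tmp \
#   ha.zookeeper.quorum=master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181
```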


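3: Configure hdfs-site.xml

#mkdir -p /data/hadoop/dfs/{namenode,datanode}
#mkdir -p /data/hadoop/ha/journal
#vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hadoop/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hadoop/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>namenode1,namenode2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.namenode1</name>
    <value>master1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.namenode2</name>
    <value>master2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.namenode1</name>
    <value>master1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.namenode2</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.mycluster.namenode1</name>
    <value>master1:53310</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.mycluster.namenode2</name>
    <value>master2:53310</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master1:8485;master2:8485;slave1:8485;slave2:8485;slave3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/ha/journal</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
</configuration>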
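The value of dfs.namenode.shared.edits.dir has the form qjournal://host1:port;host2:port;.../journalId. With N journalnodes the quorum survives at most (N-1)/2 failures, which is why an odd number of at least three is recommended. A hypothetical helper that builds the URI from a host list and rejects even counts, as a sketch:

```shell
#!/bin/sh
# Hypothetical helper: build the value of dfs.namenode.shared.edits.dir.
# Usage: qjournal_uri JOURNAL_ID PORT HOST...
qjournal_uri() {
  local journal_id="$1" port="$2"
  shift 2
  # An even count adds no extra failure tolerance, so require an odd number >= 3.
  if [ "$#" -lt 3 ] || [ $(( $# % 2 )) -eq 0 ]; then
    echo "need an odd number (>= 3) of journalnodes, got $#" >&2
    return 1
  fi
  local hosts="" host
  for host in "$@"; do
    hosts="${hosts:+$hosts;}${host}:${port}"   # ';' separates journalnodes
  done
  printf 'qjournal://%s/%s\n' "$hosts" "$journal_id"
}
```

With the five hosts used here, `qjournal_uri mycluster 8485 master1 master2 slave1 slave2 slave3` reproduces the value in hdfs-site.xml; passing only three hosts gives the three-journalnode variant mentioned in the startup section.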


4: Configure mapred-env.sh

cat >> mapred-env.sh  <

5: Configure mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


6: Configure yarn-env.sh

cat >> yarn-env.sh  <


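7: Configure yarn-site.xml

#mkdir -p /data/hadoop/yarn/local
#mkdir -p /data/hadoop/logs
#chown -R hadoop /data/hadoop
#vim yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-state-store.address</name>
    <value>localhost:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>localhost:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarncluster</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>master1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>master1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>master1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>master1:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>master2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>master2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>master2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>master2:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>master2:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>master2:8035</value>
  </property>
  <property>
    <description>Address where the localizer IPC is.</description>
    <name>yarn.nodemanager.localizer.address</name>
    <value>0.0.0.0:8040</value>
  </property>
  <property>
    <description>NM Webapp address.</description>
    <name>yarn.nodemanager.webapp.address</name>
    <value>0.0.0.0:8042</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/hadoop/logs</value>
  </property>
  <property>
    <name>mapreduce.shuffle.port</name>
    <value>8050</value>
  </property>
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
  </property>
</configuration>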
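Because yarn.resourcemanager.ha.id is set to rm1 and this file is copied to every node, the ResourceManager on master2 would also identify itself as rm1. The Apache ResourceManager HA documentation describes this property as optional, but requires every RM to have its own id when it is set, so on master2 it would normally be changed to rm2 (or removed, letting each RM find its id by matching its own address). A hedged sketch for patching the copied file on master2; set_rm_ha_id is a hypothetical helper, and the sed range assumes each property spans several lines with </property> on a line of its own.

```shell
#!/bin/sh
# Hypothetical helper: set yarn.resourcemanager.ha.id in a yarn-site.xml.
# The sed range runs from the matching <name> line to the next </property>,
# so only the <value> inside that property is rewritten.
# Writing back with cat (instead of sed -i or mv) keeps the file's owner and mode.
set_rm_ha_id() {
  local file="$1" id="$2" tmp
  tmp="$(mktemp)" || return 1
  sed "/<name>yarn\.resourcemanager\.ha\.id<\/name>/,/<\/property>/ s#<value>[^<]*</value>#<value>${id}</value>#" \
    "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# On master2: set_rm_ha_id /opt/hadoop/etc/hadoop/yarn-site.xml rm2
```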


8: Configure slaves

cat >> slaves <
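The heredoc body for slaves is missing too. Since step (6) of the startup section starts datanodes on slave1, slave2 and slave3, the file presumably lists those three hosts. The stock slaves file normally contains a single localhost line, so appending with >> would keep it; the hypothetical helper below overwrites instead. This is a sketch under those assumptions:

```shell
#!/bin/sh
# Hypothetical helper: write the slaves file, one hostname per line.
# Assumption: datanodes and nodemanagers run on slave1..slave3 only.
write_slaves() {
  local out="$1"
  shift
  # Overwrite (>) rather than append, so the default "localhost" entry is dropped.
  printf '%s\n' "$@" > "$out"
}

# Example: write_slaves /opt/hadoop/etc/hadoop/slaves slave1 slave2 slave3
```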

Configuration is complete.



Starting the cluster

(1) Format the HA state in ZooKeeper (ZooKeeper from part 【2】 must already be running)

  • master1

#/opt/hadoop/bin/hdfs zkfc -formatZK


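(2) Start the journalnodes

  • master1, master2, slave1, slave2, slave3 (any odd number of machines in the cluster can act as journalnodes, and three is also fine; dfs.namenode.shared.edits.dir in hdfs-site.xml must then list exactly those hosts)

#/opt/hadoop/sbin/hadoop-daemon.sh start journalnode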
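This step and several later ones run the same hadoop-daemon.sh command on several hosts. Instead of logging in to each machine, a loop over ssh can be used; a minimal sketch, assuming passwordless root ssh (already required by the sshfence setting) and the /opt/hadoop path on every node. run_on_hosts is a hypothetical helper, and DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: run one command on several hosts over ssh.
# Usage: run_on_hosts "COMMAND" HOST...
run_on_hosts() {
  local cmd="$1" host
  shift
  for host in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "ssh root@${host} ${cmd}"
    else
      ssh "root@${host}" "$cmd" || echo "failed on ${host}" >&2
    fi
  done
}

# Example: start all five journalnodes from one machine
# run_on_hosts "/opt/hadoop/sbin/hadoop-daemon.sh start journalnode" master1 master2 slave1 slave2 slave3
```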

(3) Format and start the namenode on master1

  • master1

Format the namenode directory:

#/opt/hadoop/bin/hadoop namenode -format mycluster

Start the namenode:

#/opt/hadoop/sbin/hadoop-daemon.sh start namenode


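(4) On master2, copy the formatted metadata from master1 and start the namenode

  • master2

Pull the formatted directory over from master1 (this requires the namenode started on master1 in step (3) to be running):

#/opt/hadoop/bin/hdfs namenode -bootstrapStandby

Start the namenode:

#/opt/hadoop/sbin/hadoop-daemon.sh start namenode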


(5) Start zkfc on the master nodes

  • master1, master2

#/opt/hadoop/sbin/hadoop-daemon.sh start zkfc

(6) Start the datanodes on the slave nodes

  • slave1, slave2, slave3

#/opt/hadoop/sbin/hadoop-daemon.sh start datanode


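(7) Start yarn from the master node

  • master1

#/opt/hadoop/sbin/start-yarn.sh

start-yarn.sh starts the ResourceManager only on the host it is run on, plus a NodeManager on every host listed in slaves. For ResourceManager HA, the standby ResourceManager on master2 is normally started separately on that host with /opt/hadoop/sbin/yarn-daemon.sh start resourcemanager.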

(8) Start the historyserver on the master node

  • master1

#/opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver


The cluster is now running. Run jps on each server: each of the two masters runs one namenode, which gives namenode high availability with automatic failover.
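To make the jps check less manual, a hedged sketch: a hypothetical check_jps function reads jps output on stdin and reports which expected daemon class names are missing. The expected list in the example is an assumption for master1 with the layout in this article (QuorumPeerMain from part 【2】, plus NameNode, DFSZKFailoverController, JournalNode, ResourceManager and JobHistoryServer); adjust it per host. Which namenode is currently active can then be checked with `hdfs haadmin -getServiceState namenode1` (and namenode2).

```shell
#!/bin/sh
# Hypothetical helper: compare jps output (read from stdin) with expected daemons.
# Prints one "missing: NAME" line per absent daemon; returns 1 if any is missing.
check_jps() {
  local running name missing=0
  running="$(awk '{print $2}')"   # jps prints "<pid> <MainClass>" per line
  for name in "$@"; do
    if ! printf '%s\n' "$running" | grep -qxF "$name"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Example on master1:
# jps | check_jps QuorumPeerMain NameNode DFSZKFailoverController JournalNode ResourceManager JobHistoryServer
```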





