
How to Run spark-shell on Hadoop YARN


This article demonstrates how to run spark-shell on Hadoop YARN. The walkthrough is short and clearly organized: it covers installing Scala and Spark, running spark-shell locally, running it on YARN, and finally running it on a Spark Standalone cluster.

1. Spark deployment architecture

    ![Spark deployment architecture](https://cache.yisu.com/upload/information/20210522/355/683134.png)

2. Download and install Scala

    a. Official archive: http://www.scala-lang.org/files/archive/

    b. Pick a version, copy its link, and download it with `wget`:

    ```bash
    wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
    ```

    c. Extract the archive and move it into place:

    ```bash
    tar xvf scala-2.11.6.tgz
    sudo mv scala-2.11.6 /usr/local/scala    # move Scala under /usr/local
    ```

    d. Set the environment variables:

    ```bash
    sudo gedit ~/.bashrc
    # add these two lines:
    export SCALA_HOME=/usr/local/scala
    export PATH=$PATH:$SCALA_HOME/bin

    source ~/.bashrc    # apply the changes
    ```

    e. Start Scala:

    ```bash
    hduser@master:~$ scala
    ```

3. Install Spark

    a. Official download page: http://spark.apache.org/downloads.html

    b. Choose release 1.4, "Pre-built for Hadoop 2.6 and later", and copy the download link.

    c. Download it with `wget`:

    ```bash
    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
    ```

    d. Extract the archive and move it to `/usr/local/spark/`.

    e. Edit the environment variables:

    ```bash
    sudo gedit ~/.bashrc
    # add these two lines:
    export SPARK_HOME=/usr/local/spark
    export PATH=$PATH:$SPARK_HOME/bin
    ```

    f. Apply the changes:

    ```bash
    source ~/.bashrc
    ```

4. Start the spark-shell interactive prompt:

    ```bash
    hduser@master:~$ spark-shell
    ```

5. Start Hadoop (e.g. with `start-all.sh`, as in the command reference below).

6. Run spark-shell locally

    a. Start the shell with four local threads:

    ```bash
    spark-shell --master local[4]
    ```

    b. Read a local file and count its lines:

    ```scala
    val textFile = sc.textFile("file:/usr/local/spark/README.md")
    textFile.count
    ```

7. Run spark-shell on Hadoop YARN (a worked example to run inside this shell closes the article):

    ```bash
    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    ```

    Piece by piece:

    ```bash
    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar    # path to the Spark assembly jar
    HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop                           # Hadoop configuration directory
    MASTER=yarn-client                                                     # run mode: yarn-client
    /usr/local/spark/bin/spark-shell                                       # full path to spark-shell
    ```

8. Build the Spark Standalone cluster environment

    a. Copy the template file before configuring it:

    ```bash
    cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
    ```

    b. Configure spark-env.sh:

    ```bash
    sudo gedit /usr/local/spark/conf/spark-env.sh
    # add the following settings:
    export SPARK_MASTER_IP=master        # hostname/IP of the master
    export SPARK_WORKER_CORES=1          # CPU cores per worker
    export SPARK_WORKER_MEMORY=600m      # memory per worker
    export SPARK_WORKER_INSTANCES=1      # worker instances per node
    ```

    Keep an eye on memory: with Hadoop and Spark running across several virtual machines, 8 GB of host RAM is not enough. The stack is very memory-hungry, and virtualization adds considerable overhead.

    c. ssh into data1 and data2 and create the Spark directory on each:

    ```bash
    sudo mkdir /usr/local/spark
    sudo chown hduser:hduser /usr/local/spark
    # run both commands on data1 and on data2
    ```

    d. Copy Spark from master to data1 and data2:

    ```bash
    sudo scp -r /usr/local/spark hduser@data1:/usr/local
    sudo scp -r /usr/local/spark hduser@data2:/usr/local
    ```

    e. List the workers in the slaves file:

    ```bash
    sudo gedit /usr/local/spark/conf/slaves
    # one worker host per line:
    data1
    data2
    ```

9. Run spark-shell on Spark Standalone

    a. Start the Spark Standalone cluster:

    ```bash
    /usr/local/spark/sbin/start-all.sh
    ```

    b. Run the shell against it (a quick sanity check follows this list):

    ```bash
    spark-shell --master spark://master:7077
    ```

    c. Check the Spark Standalone web UI: http://master:8080/

    d. Stop the cluster:

    ```bash
    /usr/local/spark/sbin/stop-all.sh
    ```
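Once the shell is connected to the standalone master, it is worth confirming that the workers actually execute jobs. The following is a minimal sanity-check sketch, not part of the original walkthrough; it assumes only the master/worker setup from steps 8 and 9 and uses the `sc` SparkContext that spark-shell creates automatically.

```scala
// Run inside spark-shell after: spark-shell --master spark://master:7077
println(sc.master)                        // should print spark://master:7077

val rdd = sc.parallelize(1 to 1000, 4)    // distribute 1..1000 over 4 partitions
println(rdd.sum())                        // 500500.0 if the workers ran the job
```

The job should also appear under "Completed Applications" in the web UI at http://master:8080/ once it finishes.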
10. Command reference (the shell history for this walkthrough; the leading numbers are history indices):

    ```bash
    152  scala
    153  jps
    154  wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
    155  ping www.baidu.com
    156  ssh data3
    157  ssh data2
    158  ssh data1
    159  jps
    160  start-all.sh
    161  jps
    162  spark-shell
    163  spark-shell --master local[4]
    164  SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    165  ssh data2
    166  ssh data1
    167  cd /usr/local/hadoop/etc/hadoop/
    168  ll
    169  sudo gedit masters
    170  sudo gedit slaves
    171  sudo gedit /etc/hosts
    172  sudo gedit hdfs-site.xml
    173  sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
    174  mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
    175  sudo chown -R hduser:hduser /usr/local/hadoop
    176  hadoop namenode -format
    177  start-all.sh
    178  jps
    179  spark-shell
    180  SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    181  ssh data1
    182  ssh data2
    183  ssh data1
    184  start-all.sh
    185  jps
    186  cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
    187  sudo gedit /usr/local/spark/conf/spark-env.sh
    188  sudo scp -r /usr/local/spark hduser@data1:/usr/local
    189  sudo scp -r /usr/local/spark hduser@data2:/usr/local
    190  sudo gedit /usr/local/spark/conf/slaves
    191  /usr/local/spark/sbin/start-all.sh
    192  spark-shell --master spark://master:7077
    193  /usr/local/spark/sbin/stop-all.sh
    194  jps
    195  stop-all.sh
    196  history
    ```
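To close the loop on step 7, here is a small word count to run inside the YARN-mode shell. This is a minimal sketch rather than part of the original walkthrough: it assumes the README has first been uploaded to HDFS, and the `/user/hduser` path below is a hypothetical example.

```scala
// Run inside the YARN-mode spark-shell from step 7.
// Assumes the file was first uploaded to HDFS, e.g.:
//   hadoop fs -put /usr/local/spark/README.md /user/hduser/README.md
// With HADOOP_CONF_DIR set, a bare path resolves against HDFS (fs.defaultFS).
val textFile = sc.textFile("/user/hduser/README.md")

// word count: split lines into words, pair each word with 1, sum per word
val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

counts.take(10).foreach(println)    // print the first ten (word, count) pairs
```

Because the shell runs in yarn-client mode, the application also shows up in the YARN ResourceManager web UI (typically on port 8088) while it runs.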

