Downloading and compiling spark-2.4.2.tgz

Published: 2025-01-23 | Last updated: 2025-01-23 | Author: 千家信息网 editor

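Does 51CTO really not have a table-of-contents feature? So painful.

========
If you have any questions, feel free to add me on QQ to discuss ^-^
1176738641

========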

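Preparation

Create the directories

# Create five directories under the user's home directory
app        # installed applications
software   # application archives
data       # test data
lib        # jar files
source     # source code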
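The directory layout above can be created in one go. This is only a minimal sketch: BASE_DIR is a name made up for this example, and it defaults to a throwaway temp directory so the demo is harmless; set BASE_DIR="$HOME" to create the real layout.

```shell
# BASE_DIR is an assumption for this sketch (not from the article).
# Run with BASE_DIR="$HOME" to create the directories in the home directory.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

for dir in app software data lib source; do
    # -p: no error if the directory already exists
    mkdir -p "$BASE_DIR/$dir"
done

# List what was created
ls -d "$BASE_DIR/app" "$BASE_DIR/software" "$BASE_DIR/data" "$BASE_DIR/lib" "$BASE_DIR/source"
```

Because of mkdir -p, running it a second time changes nothing.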

Download the required software and versions

  • apache-maven-3.6.1-bin.tar.gz
  • hadoop-2.6.0-cdh6.14.0.tar.gz
  • jdk-8u131-linux-x64.tar.gz
  • scala-2.11.8.tgz

Install jdk8

Uninstall the existing JDK

rpm -qa | grep java
# If the installed version is lower than 1.7, uninstall that JDK
rpm -e package1 package2

Extract the JDK into ~/app

tar -zxf jdk-8u131-linux-x64.tar.gz -C ~/app/

Check that jdk8 was installed successfully

~/app/jdk1.8.0_131/bin/java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

The version information prints normally, which means the installation succeeded.

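Configure the environment variables

Remember: >> appends, > overwrites! Whatever you do, don't type >

echo "####JAVA_HOME####" >> ~/.bash_profile
echo "export JAVA_HOME=/home/max/app/jdk1.8.0_131" >> ~/.bash_profile
# Single quotes, so that $JAVA_HOME and $PATH are expanded when the profile is sourced, not when echo runs
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bash_profile
# Refresh the environment variables
source ~/.bash_profile

From now on, java -version works from any directory.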
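Two things can easily go wrong when appending to the profile: running the same echo twice duplicates the line, and double quotes expand $JAVA_HOME at write time instead of at login. Below is a rough sketch of one way to guard against both. append_once and PROFILE are names invented for this example, and the demo writes to a temp file rather than the real ~/.bash_profile.

```shell
# Demo target; point PROFILE at ~/.bash_profile to use this for real.
PROFILE="$(mktemp)"

# append_once is an illustrative helper (not part of any tool):
# it appends a line only if that exact line is not already in the file.
append_once() {
    grep -qxF -- "$1" "$PROFILE" || printf '%s\n' "$1" >> "$PROFILE"
}

append_once 'export JAVA_HOME=/home/max/app/jdk1.8.0_131'
# Single quotes keep $JAVA_HOME and $PATH literal in the file,
# so they are expanded later, when the profile is sourced.
append_once 'export PATH=$JAVA_HOME/bin:$PATH'
# Calling it again with the same line leaves the file unchanged.
append_once 'export PATH=$JAVA_HOME/bin:$PATH'

cat "$PROFILE"
```

The same helper would work for the MAVEN_HOME and SCALA_HOME lines.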

Install maven

Extract into ~/app/

tar -zxvf  apache-maven-3.6.1-bin.tar.gz -C ~/app

Check that maven was installed successfully

~/app/apache-maven-3.6.1/bin/mvn -v
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-04T15:00:29-04:00)
Maven home: /home/max/app/apache-maven-3.6.1
Java version: 1.8.0_131, vendor: Oracle Corporation, runtime: /home/max/app/jdk1.8.0_131/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"

The version information is displayed, which means the installation succeeded.

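Add the environment variables

Remember: >> appends, > overwrites! Whatever you do, don't type >

echo "####MAVEN_HOME####" >> ~/.bash_profile
echo "export MAVEN_HOME=/home/max/app/apache-maven-3.6.1/" >> ~/.bash_profile
# Single quotes, so that $MAVEN_HOME and $PATH are expanded when the profile is sourced, not when echo runs
echo 'export PATH=$MAVEN_HOME/bin:$PATH' >> ~/.bash_profile
# Refresh the environment variables
source ~/.bash_profile

From now on, mvn -v works from any directory.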

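Configure the local repository directory && the remote repository address

# Create the local repository directory
mkdir ~/maven_repo
# Edit the settings.xml file
vim $MAVEN_HOME/conf/settings.xml

Watch the tags! Don't clash with tags that already exist in the file.

<localRepository>/home/max/maven_repo</localRepository>
<mirrors>
  <mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>*,!cloudera</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  </mirror>
</mirrors>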

Install Scala

Extract into ~/app/

tar -zxf scala-2.11.8.tgz -C ~/app/

Check that scala was installed successfully

~/app/scala-2.11.8/bin/scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
scala>

Installation succeeded.

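Add the environment variables

echo "####SCALA_HOME####" >> ~/.bash_profile
echo "export SCALA_HOME=/home/max/app/scala-2.11.8" >> ~/.bash_profile
# Single quotes, so that $SCALA_HOME and $PATH are expanded when the profile is sourced, not when echo runs
echo 'export PATH=$SCALA_HOME/bin:$PATH' >> ~/.bash_profile
# Refresh the environment variables
source ~/.bash_profile

From now on, scala works from any directory.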

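Install Git

This assumes you are a CentOS user.

sudo yum install git
# Installs automatically; you need to press y a few times along the way
# The output looks like this
Installed:
  git.x86_64 0:1.7.1-9.el6_9

Dependency Installed:
  perl-Error.noarch 1:0.17015-4.el6                  perl-Git.noarch 0:1.7.1-9.el6_9

Dependency Updated:
  openssl.x86_64 0:1.0.1e-57.el6

Complete!
# Installation succeeded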
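Before moving on to the build, it may be worth confirming in a single pass that all four tools resolve on PATH. Here is a small sketch; check_tool is a helper name invented for this example, and it relies only on the shell built-in command -v.

```shell
# check_tool is an illustrative helper: it prints where a command
# resolves on PATH, or reports it as missing and returns non-zero.
check_tool() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        printf '%-6s -> %s\n' "$1" "$resolved"
    else
        printf '%-6s -> NOT FOUND on PATH\n' "$1"
        return 1
    fi
}

missing=0
for tool in java mvn scala git; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

If anything is reported missing, re-check the matching export lines in ~/.bash_profile and run source ~/.bash_profile again.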

The preparation work is finally done!!!!!
====================the dog-tired divider=========================
In fact, the long road of compiling has only just begun.

Download & compile the Spark source

Time for the big guns! ===> refer to the official site

Download & extract the Spark 2.4.2 source

cd ~/source
wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
# Sometimes this is painfully slow
[max@hadoop000 source]$ ll
total 15788
-rw-rw-r--. 1 max max 16165557 Apr 28 12:27 spark-2.4.2.tgz
[max@hadoop000 source]$ tar -zxf spark-2.4.2.tgz
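Since the download can be slow and may get interrupted, a quick integrity check before extracting can save some confusion. The sketch below is only an illustration: verify_tgz is a made-up helper name, and the demo runs on a tiny throwaway archive rather than the real spark-2.4.2.tgz.

```shell
# verify_tgz is an illustrative helper: it succeeds only if the gzip
# stream is intact and tar can list the archive contents.
verify_tgz() {
    gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# Build a small archive to demonstrate on
WORK="$(mktemp -d)"
echo "hello" > "$WORK/file.txt"
tar -czf "$WORK/good.tgz" -C "$WORK" file.txt
# Simulate an interrupted download by keeping only the first 20 bytes
head -c 20 "$WORK/good.tgz" > "$WORK/truncated.tgz"

for f in "$WORK/good.tgz" "$WORK/truncated.tgz"; do
    if verify_tgz "$f"; then
        echo "OK      $f"
    else
        echo "BROKEN  $f"
    fi
done
```

For the real file, that would be verify_tgz ~/source/spark-2.4.2.tgz before running tar -zxf.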

About Maven

We don't run the mvn command directly; we use the make-distribution.sh script instead, but it needs a small change first.

# In the spark-2.4.2 directory
vim ./dev/make-distribution.sh
# Comment out these lines. This is a best practice: hard-coding the version numbers cuts down the compile time
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | fgrep --count "hive";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#    # because we use "set -o pipefail"
#    echo -n)
#
# Add the following parameters; note that the version numbers must match the production environment you want
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=hadoop-2.6.0-cdh6.14.0
SPARK_HIVE=1
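After editing the script by hand, it is easy to leave one of the four variables commented out. Here is a hedged sketch of a sanity check: SCRIPT points at a stand-in file created just for the demo (in practice it would be ./dev/make-distribution.sh), and the check only looks for uncommented, hard-coded assignments.

```shell
# Stand-in for dev/make-distribution.sh so the check can run anywhere;
# the content below is a shortened imitation, not the real script.
SCRIPT="$(mktemp)"
cat > "$SCRIPT" <<'EOF'
#VERSION=commented-out-original
VERSION=2.4.2
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=hadoop-2.6.0-cdh6.14.0
SPARK_HIVE=1
EOF

missing=""
for var in VERSION SCALA_VERSION SPARK_HADOOP_VERSION SPARK_HIVE; do
    # Match an uncommented assignment whose value does not start with "$",
    # i.e. a hard-coded value rather than a $(mvn ...) lookup
    if ! grep -q "^${var}=[^\$]" "$SCRIPT"; then
        missing="$missing $var"
    fi
done

if [ -z "$missing" ]; then
    echo "all four variables are hard-coded"
else
    echo "not hard-coded:$missing"
fi
```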

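Edit pom.xml

Maven's default repository only carries the Apache releases of the Hadoop dependencies. Since our Hadoop version is hadoop-2.6.0-cdh6.14.0, we need to add the CDH repository to the pom file.

# In the spark-2.4.2 directory
vim pom.xml
<repository>
  <id>central</id>
  <url>http://maven.aliyun.com/nexus/content/groups/public//</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
    <updatePolicy>always</updatePolicy>
    <checksumPolicy>fail</checksumPolicy>
  </snapshots>
</repository>
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>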

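Start compiling

./dev/make-distribution.sh \
--name hadoop-2.6.0-cdh6.14.0 \
--tgz \
-Phadoop-2.6 \
-Dhadoop.version=2.6.0-cdh6.14.0 \
-Phive -Phive-thriftserver \
-Pyarn \
-Pkubernetes

The first build takes about 1h (I was using the Aliyun mirror).
Building again takes only about 10min.

Note: if something fails, you really must learn to read the error log!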
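One way to make the error log easier to read is to save the build output to a file and jump straight to the first error. The following is only a sketch: the log lines are made up for the demo (they merely imitate Maven's [LEVEL] prefix), and LOG is a name chosen for this example. For a real build, the output could be captured with something like ./dev/make-distribution.sh <options> 2>&1 | tee build.log.

```shell
LOG="$(mktemp)"

# Made-up sample lines standing in for real build output
printf '%s\n' \
    '[INFO] sample info line (made up for this demo)' \
    '[ERROR] sample error line (made up for this demo)' \
    '[INFO] another sample info line' \
    '[ERROR] a later sample error line' > "$LOG"

# -n prefixes each match with its line number, -F treats the pattern
# as a fixed string; head keeps only the first match
first_error=$(grep -nF '[ERROR]' "$LOG" | head -n 1)
echo "first error at line ${first_error%%:*}"
echo "$first_error"
```

The first error is usually the most useful one, since later ones are often just consequences of it.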

Compilation finished

# Last part of the log from a successful build
+ mkdir /home/max/source/spark-2.4.2/dist/conf
+ cp /home/max/source/spark-2.4.2/conf/docker.properties.template /home/max/source/spark-2.4.2/conf/fairscheduler.xml.template /home/max/source/spark-2.4.2/conf/log4j.properties.template /home/max/source/spark-2.4.2/conf/metrics.properties.template /home/max/source/spark-2.4.2/conf/slaves.template /home/max/source/spark-2.4.2/conf/spark-defaults.conf.template /home/max/source/spark-2.4.2/conf/spark-env.sh.template /home/max/source/spark-2.4.2/dist/conf
+ cp /home/max/source/spark-2.4.2/README.md /home/max/source/spark-2.4.2/dist
+ cp -r /home/max/source/spark-2.4.2/bin /home/max/source/spark-2.4.2/dist
+ cp -r /home/max/source/spark-2.4.2/python /home/max/source/spark-2.4.2/dist
+ '[' false == true ']'
+ cp -r /home/max/source/spark-2.4.2/sbin /home/max/source/spark-2.4.2/dist
+ '[' -d /home/max/source/spark-2.4.2/R/lib/SparkR ']'
+ '[' true == true ']'
+ TARDIR_NAME=spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ TARDIR=/home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ rm -rf /home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ cp -r /home/max/source/spark-2.4.2/dist /home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ tar czf spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0.tgz -C /home/max/source/spark-2.4.2 spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0
+ rm -rf /home/max/source/spark-2.4.2/spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0

From this you can see that the compiled package ends up in the Spark source directory.

Extract

[max@hadoop000 spark-2.4.2]$ tar -zxf spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0.tgz -C ~/app/
[max@hadoop000 spark-2.4.2]$ cd ~/app/
[max@hadoop000 app]$ ll
total 16
drwxrwxr-x.  6 max max 4096 Apr 28 17:02 apache-maven-3.6.1
drwxr-xr-x.  8 max max 4096 Mar 15  2017 jdk1.8.0_131
drwxrwxr-x.  6 max max 4096 Mar  4  2016 scala-2.11.8
drwxrwxr-x. 11 max max 4096 Apr 28 21:20 spark-2.4.2-bin-hadoop-2.6.0-cdh6.14.0

Done!
