Setting Up a Spark Cluster (Standalone Mode)

Setting up Spark in Standalone mode requires installing Spark on every node in the cluster. The cluster roles are assigned as follows:

Node             Role
centoshadoop1    Master
centoshadoop2    Worker
centoshadoop3    Worker
centoshadoop4    Worker
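This guide assumes all four hostnames resolve from every node, typically via /etc/hosts. A sketch of the expected entries (centoshadoop1's IP is confirmed by the logs later; the Worker IPs are inferred from the scp commands below and may differ in your environment):

192.168.227.140 centoshadoop1
192.168.227.141 centoshadoop2
192.168.227.142 centoshadoop3
192.168.227.143 centoshadoop4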

1. Download the Scala installation package

Download it from:
https://www.scala-lang.org/download/2.12.7.html

 


Run the following commands to install Scala. First create the installation directory:

mkdir -p /home/hadoop/scala

Extract the scala-2.12.7.tgz package into it:

tar -zxvf ~/tools/scala-2.12.7.tgz -C /home/hadoop/scala/

Configure the Scala environment variables:

vi ~/.bash_profile

# scala

export SCALA_HOME=/home/hadoop/scala/scala-2.12.7

export PATH=$PATH:$SCALA_HOME/bin

 

Reload the profile to apply the changes:

source ~/.bash_profile

From any directory, run scala -version; it should print:

Scala code runner version 2.12.7 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.

From any directory, run scala to enter the interactive REPL:

 

scala> val str:String="yanghong"

str: String = yanghong

 

 

2. Download Spark

Download it from:

http://spark.apache.org/downloads.html

Under "Choose a Spark release", select your version, here 2.4.5.

Under "Choose a package type", select "Pre-built for Apache Hadoop 2.7 and later".

Click the .tgz file link next to "Download Spark" to download it.

 


Run the following commands to install Spark. First create the installation directory:

mkdir -p /home/hadoop/spark

Extract spark-2.4.5-bin-hadoop2.7.tgz into it:

tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /home/hadoop/spark/

Edit the slaves configuration file

The slaves file must list the hostname of every Worker node to be started, one hostname per line. Change into the configuration directory, then create the working copies from the shipped templates:

cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/conf

cp slaves.template slaves

cp spark-env.sh.template spark-env.sh

 

vi slaves

centoshadoop2
centoshadoop3
centoshadoop4

This configures the three nodes as the cluster's Worker (slave) nodes.
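Note that sbin/start-all.sh connects to each host listed in slaves over SSH, so passwordless SSH from the Master to every Worker is assumed. A minimal setup sketch, run as the hadoop user on centoshadoop1:

ssh-keygen -t rsa                  # accept the defaults; skip if a key already exists
ssh-copy-id hadoop@centoshadoop2
ssh-copy-id hadoop@centoshadoop3
ssh-copy-id hadoop@centoshadoop4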

 

Edit the spark-env.sh configuration file, adding the following:

export JAVA_HOME=/usr/local/java/jdk1.8.0_192

export SPARK_MASTER_IP=centoshadoop1

export SPARK_MASTER_PORT=7077

Explanation of these properties:

JAVA_HOME: the path to the JDK. If every node in the cluster already configures JAVA_HOME in /etc/profile, this line can be omitted and the Spark cluster will read it automatically at startup; to guard against errors, setting it here explicitly is recommended.

SPARK_MASTER_IP: the hostname or IP address of the cluster's Master node, here centoshadoop1.

SPARK_MASTER_PORT: the port the Master listens on. Defaults to 7077.
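spark-env.sh also accepts optional settings that cap the resources each Worker offers to the cluster. A sketch with illustrative values (not part of the original setup; tune them to your hardware, or omit them to use Spark's defaults):

export SPARK_WORKER_CORES=2      # CPU cores each Worker makes available
export SPARK_WORKER_MEMORY=2g    # memory each Worker makes available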

 

Copy the Spark installation files to the other nodes:

scp -r ~/spark/ [email protected]:~
scp -r ~/spark/ [email protected]:~
scp -r ~/spark/ [email protected]:~

scp -r ~/scala/ [email protected]:~
scp -r ~/scala/ [email protected]:~
scp -r ~/scala/ [email protected]:~

 

Copy the shell profile containing the environment variables as well:

scp -r ~/.bash_profile [email protected]:~
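The profile needs to reach every Worker, not just the first. A sketch for the remaining nodes (assuming the same hadoop user and home directory layout on each):

scp ~/.bash_profile [email protected]:~
scp ~/.bash_profile [email protected]:~

Then log in to each Worker and run source ~/.bash_profile so the new SCALA_HOME and PATH take effect.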

Start the Spark cluster

On the centoshadoop1 node, change into the Spark installation directory and run the following command to start the cluster:

cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7

sbin/start-all.sh

The startup output looks like:

starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-centoshadoop1.out
centoshadoop3: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop3.out
centoshadoop2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop2.out
centoshadoop1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop1.out
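As a quick health check (an extra step, not in the original walkthrough), jps, which ships with the JDK, lists the running Java daemons on each node: centoshadoop1 should show a Master process, and every node listed in slaves should show a Worker process.

jps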

 

Monitor the logs on each node:

cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/

The Master node's log:

tail -f spark-hadoop-org.apache.spark.deploy.master.Master-1-centoshadoop1.out

20/03/25 12:51:17 INFO SecurityManager: Changing modify acls to: hadoop

20/03/25 12:51:17 INFO SecurityManager: Changing view acls groups to:

20/03/25 12:51:17 INFO SecurityManager: Changing modify acls groups to:

20/03/25 12:51:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()

20/03/25 12:51:17 INFO Utils: Successfully started service 'sparkMaster' on port 7077.

20/03/25 12:51:17 INFO Master: Starting Spark master at spark://centoshadoop1:7077

20/03/25 12:51:17 INFO Master: Running Spark version 2.4.5

20/03/25 12:51:18 INFO Utils: Successfully started service 'MasterUI' on port 8080.

20/03/25 12:51:18 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://centoshadoop1:8080

20/03/25 12:51:18 INFO Master: I have been elected leader! New state: ALIVE

The Worker nodes' logs:

tail -f spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop2.out

If a Worker cannot reach the Master, its log ends with an error such as:

Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: centoshadoop1/192.168.227.140:7077

This means the Master's ports are blocked by the firewall. Run the following commands on node 1 (centoshadoop1) to open them:

firewall-cmd --zone=public --add-port=8080/tcp --permanent

firewall-cmd --zone=public --add-port=7077/tcp --permanent

firewall-cmd --reload
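To verify the ports are now open (an optional check, not part of the original steps):

firewall-cmd --zone=public --list-ports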

 

Open http://192.168.227.140:8080/ to view the Spark web UI.


To avoid problems later, always start the Spark cluster from the node specified by the SPARK_MASTER_IP property in spark-env.sh.
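As a final smoke test (a suggested extra step, not in the original walkthrough), you can attach a Spark shell to the cluster from the Master node; the application should then appear under "Running Applications" in the web UI:

cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7
bin/spark-shell --master spark://centoshadoop1:7077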