Building a Hadoop Big Data Ecosystem Test Environment: Deploying a Hadoop 3.1.4 Cluster on CentOS 7.8

1. Prepare three test machines with networking and passwordless SSH login configured.
   Spec: 4 GB RAM, 2 cores, 500 GB disk, running CentOS Linux release 7.8.2003 (Core). (Virtual machines are fine if that's easier.)
   The IPs and hostnames are:
        192.168.236.128  Master.Hadoop
        192.168.236.129  Slave1.Hadoop
        192.168.236.130  Slave2.Hadoop
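
As a sketch of how the name resolution and passwordless login can be set up (the key type and paths below are one common choice, not something this walkthrough prescribes), on each machine:

# Map hostnames to IPs on every node
cat >> /etc/hosts << 'EOF'
192.168.236.128  Master.Hadoop
192.168.236.129  Slave1.Hadoop
192.168.236.130  Slave2.Hadoop
EOF

# Generate a key pair and push the public key to all three nodes
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id root@Master.Hadoop
ssh-copy-id root@Slave1.Hadoop
ssh-copy-id root@Slave2.Hadoop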

Let's quickly confirm the machines can reach each other by hostname:

[root@Master.Hadoop sbin]# ping -c 3 Slave1.Hadoop
PING Slave1.Hadoop (192.168.236.129) 56(84) bytes of data.
64 bytes from Slave1.Hadoop (192.168.236.129): icmp_seq=1 ttl=64 time=0.183 ms
64 bytes from Slave1.Hadoop (192.168.236.129): icmp_seq=2 ttl=64 time=0.750 ms
64 bytes from Slave1.Hadoop (192.168.236.129): icmp_seq=3 ttl=64 time=0.372 ms

--- Slave1.Hadoop ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.183/0.435/0.750/0.235 ms


[root@Master.Hadoop sbin]# ping -c 3 Slave2.Hadoop
PING Slave2.Hadoop (192.168.236.130) 56(84) bytes of data.
64 bytes from Slave2.Hadoop (192.168.236.130): icmp_seq=1 ttl=64 time=0.271 ms
64 bytes from Slave2.Hadoop (192.168.236.130): icmp_seq=2 ttl=64 time=0.272 ms
64 bytes from Slave2.Hadoop (192.168.236.130): icmp_seq=3 ttl=64 time=0.287 ms

--- Slave2.Hadoop ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.271/0.276/0.287/0.020 ms


[root@Slave1.Hadoop hadoop]# ping -c 3 Master.Hadoop
PING Master.Hadoop (192.168.236.128) 56(84) bytes of data.
64 bytes from Master.Hadoop (192.168.236.128): icmp_seq=1 ttl=64 time=0.205 ms
64 bytes from Master.Hadoop (192.168.236.128): icmp_seq=2 ttl=64 time=0.660 ms
64 bytes from Master.Hadoop (192.168.236.128): icmp_seq=3 ttl=64 time=0.610 ms

--- Master.Hadoop ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.205/0.491/0.660/0.205 ms



[root@Slave2.Hadoop hadoop]# ping -c 3 Master.Hadoop
PING Master.Hadoop (192.168.236.128) 56(84) bytes of data.
64 bytes from Master.Hadoop (192.168.236.128): icmp_seq=1 ttl=64 time=0.218 ms
64 bytes from Master.Hadoop (192.168.236.128): icmp_seq=2 ttl=64 time=0.261 ms
64 bytes from Master.Hadoop (192.168.236.128): icmp_seq=3 ttl=64 time=0.547 ms

--- Master.Hadoop ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.218/0.342/0.547/0.146 ms

 

OK, let's get started.

Download and install the JDK, then configure the environment variables:

export JAVA_HOME=/opt/package/jdk/jdk1.8.0_191
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
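
A minimal sketch of the install itself, assuming the tarball is named jdk-8u191-linux-x64.tar.gz and sits in /opt/package/jdk (adjust to your actual download):

cd /opt/package/jdk
tar -zxvf jdk-8u191-linux-x64.tar.gz

# After appending the three export lines above to /etc/profile:
source /etc/profile
java -version    # should report java version "1.8.0_191"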

Download the Hadoop package from the official site and upload it to the installation directory on each machine.
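
For example (the Apache archive URL is one possible source; any mirror works):

# On the master: download once, then push the tarball to both slaves
mkdir -p /opt/package/hadoop && cd /opt/package/hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.4/hadoop-3.1.4.tar.gz
scp hadoop-3.1.4.tar.gz root@Slave1.Hadoop:/opt/package/hadoop/
scp hadoop-3.1.4.tar.gz root@Slave2.Hadoop:/opt/package/hadoop/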

Unpack Hadoop and configure the environment variables:
export HADOOP_HOME=/opt/package/hadoop/hadoop-3.1.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
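
Concretely, on each machine, something like:

cd /opt/package/hadoop
tar -zxvf hadoop-3.1.4.tar.gz

# After appending the two export lines above to /etc/profile:
source /etc/profile
hadoop version    # should report Hadoop 3.1.4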

Configure the cluster parameters:

##### Edit core-site.xml on the master

<property>
    <name>fs.defaultFS</name>
    <!-- The slaves must be able to resolve this hostname, or they will not be able to talk to Master.Hadoop -->
    <value>hdfs://Master.Hadoop:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <!-- Base directory for temporary files -->
    <value>/opt/package/hadoop/data/tmp</value>
</property>
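
Since hadoop.tmp.dir points at a non-default path, it does no harm to create the directory up front on every node:

mkdir -p /opt/package/hadoop/data/tmp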


##### Edit hadoop-env.sh

export JAVA_HOME=/opt/package/jdk/jdk1.8.0_191


##### Edit hdfs-site.xml


<property>
    <name>dfs.namenode.http-address</name>
    <!-- Replace with your own master hostname or IP -->
    <value>Master.Hadoop:50070</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/name</value>
</property>
<property>
    <name>dfs.replication</name>
    <!-- Replication factor -->
    <value>1</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/data</value>
</property>
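
You may want to create the NameNode and DataNode directories up front as well (the format step can create them itself, but pre-creating makes permission problems easier to spot):

# On the master
mkdir -p /hadoop/name
# On every host that runs a DataNode
mkdir -p /hadoop/data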

##### Edit mapred-site.xml


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
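
One caveat worth flagging: on Hadoop 3.x, MapReduce jobs submitted to YARN can fail with "Could not find or load main class ...MRAppMaster" if the MR processes do not know where Hadoop is installed. A commonly used fix (an assumption here, not a step this walkthrough ran) is to also add the following to mapred-site.xml:

<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/package/hadoop/hadoop-3.1.4</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/package/hadoop/hadoop-3.1.4</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/package/hadoop/hadoop-3.1.4</value>
</property>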


##### Edit workers

The workers file lists every host that should run the worker daemons (DataNode and NodeManager); listing Master.Hadoop here means the master doubles as a worker:

Master.Hadoop
Slave1.Hadoop
Slave2.Hadoop


##### Edit yarn-site.xml

<configuration>

    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master.Hadoop</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
</configuration>
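
With all the files edited, the same configuration has to land on both slaves; one way to do that (assuming identical install paths everywhere) is simply:

scp -r /opt/package/hadoop/hadoop-3.1.4/etc/hadoop/* root@Slave1.Hadoop:/opt/package/hadoop/hadoop-3.1.4/etc/hadoop/
scp -r /opt/package/hadoop/hadoop-3.1.4/etc/hadoop/* root@Slave2.Hadoop:/opt/package/hadoop/hadoop-3.1.4/etc/hadoop/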

# Format and start the cluster

[root@Master.Hadoop bin]# ./hdfs namenode -format

2020-09-03 22:59:53,117 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/192.168.236.128
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.1.4

STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r 1e877761e8dadd71effef30e592368f7fe66a61b; compiled by 'gabota' on 2020-07-21T08:05Z
STARTUP_MSG:   java = 1.8.0_191
************************************************************/
2020-09-03 23:06:37,422 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2020-09-03 23:06:37,644 INFO namenode.NameNode: createNameNode [-format]
2020-09-03 23:06:38,756 INFO common.Util: Assuming 'file' scheme for path /hadoop/name in configuration.
2020-09-03 23:06:38,756 INFO common.Util: Assuming 'file' scheme for path /hadoop/name in configuration.
Formatting using clusterid: CID-a5465849-8331-41fe-9130-f9ed2e9f4071
2020-09-03 23:06:38,867 INFO namenode.FSEditLog: Edit logging is async:true
2020-09-03 23:06:38,906 INFO namenode.FSNamesystem: KeyProvider: null
2020-09-03 23:06:38,907 INFO namenode.FSNamesystem: fsLock is fair: true
2020-09-03 23:06:38,907 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2020-09-03 23:06:38,912 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
2020-09-03 23:06:38,912 INFO namenode.FSNamesystem: supergroup          = supergroup
2020-09-03 23:06:38,912 INFO namenode.FSNamesystem: isPermissionEnabled = true
2020-09-03 23:06:38,912 INFO namenode.FSNamesystem: HA Enabled: false
2020-09-03 23:06:38,975 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2020-09-03 23:06:38,988 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2020-09-03 23:06:38,988 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2020-09-03 23:06:38,999 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2020-09-03 23:06:38,999 INFO blockmanagement.BlockManager: The block deletion will start around 2020 Sep 03 23:06:38
2020-09-03 23:06:39,002 INFO util.GSet: Computing capacity for map BlocksMap
2020-09-03 23:06:39,002 INFO util.GSet: VM type       = 64-bit
2020-09-03 23:06:39,003 INFO util.GSet: 2.0% max memory 425.4 MB = 8.5 MB
2020-09-03 23:06:39,003 INFO util.GSet: capacity      = 2^20 = 1048576 entries
2020-09-03 23:06:39,012 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2020-09-03 23:06:39,018 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: defaultReplication         = 1
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: maxReplication             = 512
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: minReplication             = 1
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2020-09-03 23:06:39,019 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2020-09-03 23:06:39,162 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
2020-09-03 23:06:39,206 INFO util.GSet: Computing capacity for map INodeMap
2020-09-03 23:06:39,206 INFO util.GSet: VM type       = 64-bit
2020-09-03 23:06:39,206 INFO util.GSet: 1.0% max memory 425.4 MB = 4.3 MB
2020-09-03 23:06:39,206 INFO util.GSet: capacity      = 2^19 = 524288 entries
2020-09-03 23:06:39,206 INFO namenode.FSDirectory: ACLs enabled? false
2020-09-03 23:06:39,206 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2020-09-03 23:06:39,206 INFO namenode.FSDirectory: XAttrs enabled? true
2020-09-03 23:06:39,206 INFO namenode.NameNode: Caching file names occurring more than 10 times
2020-09-03 23:06:39,209 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2020-09-03 23:06:39,211 INFO snapshot.SnapshotManager: SkipList is disabled
2020-09-03 23:06:39,213 INFO util.GSet: Computing capacity for map cachedBlocks
2020-09-03 23:06:39,213 INFO util.GSet: VM type       = 64-bit
2020-09-03 23:06:39,213 INFO util.GSet: 0.25% max memory 425.4 MB = 1.1 MB
2020-09-03 23:06:39,213 INFO util.GSet: capacity      = 2^17 = 131072 entries
2020-09-03 23:06:39,231 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2020-09-03 23:06:39,231 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2020-09-03 23:06:39,231 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2020-09-03 23:06:39,236 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2020-09-03 23:06:39,237 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2020-09-03 23:06:39,238 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2020-09-03 23:06:39,238 INFO util.GSet: VM type       = 64-bit
2020-09-03 23:06:39,245 INFO util.GSet: 0.029999999329447746% max memory 425.4 MB = 130.7 KB
2020-09-03 23:06:39,245 INFO util.GSet: capacity      = 2^14 = 16384 entries
Re-format filesystem in Storage Directory root= /hadoop/name; location= null ? (Y or N) y
2020-09-03 23:06:43,982 INFO namenode.FSImage: Allocated new BlockPoolId: BP-201352944-192.168.236.128-1599188803956
2020-09-03 23:06:43,982 INFO common.Storage: Will remove files: [/hadoop/name/current/VERSION, /hadoop/name/current/seen_txid, /hadoop/name/current/fsimage_0000000000000000000.md5, /hadoop/name/current/fsimage_0000000000000000000]
2020-09-03 23:06:43,997 INFO common.Storage: Storage directory /hadoop/name has been successfully formatted.
2020-09-03 23:06:44,040 INFO namenode.FSImageFormatProtobuf: Saving image file /hadoop/name/current/fsimage.ckpt_0000000000000000000 using no compression
2020-09-03 23:06:44,154 INFO namenode.FSImageFormatProtobuf: Image file /hadoop/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2020-09-03 23:06:44,167 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2020-09-03 23:06:44,172 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2020-09-03 23:06:44,172 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/192.168.236.128
************************************************************/

[root@Master.Hadoop sbin]# ./start-all.sh 
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [Master.Hadoop]
Last login: Thu Sep  3 22:59:16 EDT 2020 on pts/0
Starting datanodes
Last login: Thu Sep  3 23:08:17 EDT 2020 on pts/0
Starting secondary namenodes [master.hadoop]
Last login: Thu Sep  3 23:08:20 EDT 2020 on pts/0
Starting resourcemanager
Last login: Thu Sep  3 23:08:28 EDT 2020 on pts/0
Starting nodemanagers
Last login: Thu Sep  3 23:08:37 EDT 2020 on pts/0
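
A note on that warning: Hadoop 3 refuses to start daemons as root unless the owning user of each daemon is declared explicitly, and the deprecation message shows this setup used the old HADOOP_SECURE_DN_USER variable name. A typical block (a sketch; the exact values used here aren't shown in the transcript) added to etc/hadoop/hadoop-env.sh looks like:

# Declare which user owns each daemon when starting the cluster as root
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root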

Check what's running on each node:

[root@Master.Hadoop sbin]# jps
10036 NodeManager
9302 NameNode
10599 Jps
9643 SecondaryNameNode
9902 ResourceManager


[root@Slave1.Hadoop hadoop]# jps
11938 DataNode
12264 Jps

[root@Slave2.Hadoop hadoop]# jps
5051 DataNode
5372 Jps

No problems so far: you can see that the Master.Hadoop, Slave1.Hadoop, and Slave2.Hadoop nodes have all started normally.

Let's run a quick check that the cluster is actually usable.

Check the cluster status:


[root@Master.Hadoop bin]# ./hadoop dfsadmin -report
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.

Configured Capacity: 36477861888 (33.97 GB)
Present Capacity: 29833564160 (27.78 GB)
DFS Remaining: 29833547776 (27.78 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0
Erasure Coded Block Groups: 
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.236.129:9866 (Slave1.Hadoop)
Hostname: Slave1.Hadoop
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 3185786880 (2.97 GB)
DFS Remaining: 15053135872 (14.02 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.53%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 04 07:00:41 EDT 2020
Last Block Report: Fri Sep 04 06:52:53 EDT 2020
Num of Blocks: 0


Name: 192.168.236.130:9866 (Slave2.Hadoop)
Hostname: Slave2.Hadoop
Decommission Status : Normal
Configured Capacity: 18238930944 (16.99 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 3458510848 (3.22 GB)
DFS Remaining: 14780411904 (13.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 81.04%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Sep 04 07:00:40 EDT 2020
Last Block Report: Fri Sep 04 01:11:59 EDT 2020
Num of Blocks: 0

No problems there either.

Open http://192.168.236.128:50070 in a browser (50070 is the dfs.namenode.http-address we configured above; a stock Hadoop 3 install defaults to 9870):

(Screenshots: NameNode web UI, showing the cluster overview and datanode status pages.)

You can see that the Master.Hadoop, Slave1.Hadoop, and Slave2.Hadoop nodes are all reported as healthy.

That's all there is to it. From here you can use this Hadoop cluster to build a test environment that mirrors production, import production data, and get on with development.
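
If you want one more end-to-end sanity check before importing real data, a quick sketch (the paths and file names are arbitrary):

# Write a file into HDFS and read it back
echo "hello hadoop" > /tmp/hello.txt
hdfs dfs -mkdir -p /test
hdfs dfs -put /tmp/hello.txt /test/
hdfs dfs -cat /test/hello.txt

# Run the bundled wordcount example on YARN and inspect the result
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.4.jar wordcount /test /test-out
hdfs dfs -cat /test-out/part-r-00000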