Flume Installation and Configuration

Table of Contents

Flume

How Flume Works

Flume Installation

Download and Extract

Configure Environment Variables

Configure the flume-env.sh File

Version Verification

Flume Deployment Examples

Avro

Spool

Writing to MongoDB

Configuring Flume on Windows

Configure Flume Environment Variables


Flume

CentOS: https://blog.****.net/qq_39160721/article/details/80255194

Linux: https://blog.****.net/u011254180/article/details/80000763

Official site: http://flume.apache.org/

Documentation: http://flume.apache.org/FlumeUserGuide.html

Honeypot system: Linux

The project uses Flume 1.8.

Download walkthrough: https://blog.****.net/qq_41910230/article/details/80920873

https://yq.aliyun.com/ask/236859

Detailed channel configuration: http://www.cnblogs.com/gongxijun/p/5661037.html

Reference for direct download on Linux (LinuxIDC): https://www.linuxidc.com/Linux/2016-12/138722.htm

Installation packages and jar files used for this setup: https://download.****.net/download/lagoon_lala/10949262

How Flume Works

Flume's data flow is carried end to end by events (Event). An event is Flume's basic unit of data: it carries the log data (as a byte array) along with header information. Events are generated by a Source outside the Agent; when the Source captures an event it applies its specific formatting and then pushes the event into one or more Channels. A Channel can be thought of as a buffer that holds events until a Sink has finished processing them. The Sink is responsible for persisting the log or pushing the event on to another Source. Some core Flume concepts:

1. Event: a data unit with an optional message header; it can be a log record, an Avro object, etc.

2. Agent: an independent Flume process in a JVM, containing the Source, Channel, and Sink components.

3. Client: runs in an independent thread; produces data and sends it to the Agent.

4. Source: consumes the events delivered to it, collects data from the Client, and passes it on to the Channel.

5. Channel: temporary storage that relays events, holding the events handed over by the Source; in effect it connects the Source and the Sink, somewhat like a message queue.

6. Sink: collects data from the Channel; runs in an independent thread.

Flume's smallest unit of independent operation is the Agent; one Agent is one JVM. A single Agent is built from three major components, Source, Sink, and Channel, as shown in the figure below:

(Figure: a single Agent composed of Source, Channel, and Sink)

Notably, Flume provides a large number of built-in Source, Channel, and Sink types, and different types can be combined freely. How they are combined is driven by the user's configuration file, which makes Flume very flexible. For example, a Channel can stage events in memory or persist them to local disk, and a Sink can write logs to HDFS, HBase, ES, or even another Source. Flume also lets users build multi-hop flows, meaning multiple Agents can work together.
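For instance, a minimal sketch of a two-Agent chain (the agent names, host, and port below are illustrative, not from this setup): the upstream agent's Avro sink points at the downstream agent's Avro source:

# upstream agent: forward events over Avro (hypothetical host/port)
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = agent2-host
agent1.sinks.k1.port = 60000

# downstream agent: receive events from agent1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 60000

The Spool example later in this document wires up exactly this kind of Avro sink-to-source hop on a single machine.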

Flume Installation

Download and Extract

Download command: wget

Only the Flume binary package (bin) needs to be downloaded.

Flume official download: http://mirrors.hust.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz

Reference article's command (CDH version):

$ wget http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.1.tar.gz

Actual session:

$ wget http://mirrors.hust.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz

--2018-12-24 15:36:52-- 

apache-flume-1.8.0-bi 100%[=======================>]  55.97M  5.21MB/s    in 12s    

2018-12-24 15:37:04 (4.83 MB/s) - ‘apache-flume-1.8.0-bin.tar.gz’ saved [58688757/58688757]

$ ls

apache-flume-1.8.0-bin.tar.gz

$ tar -xvf flume-ng-1.6.0-cdh5.7.1.tar.gz

Actual session:

$ tar -xvf apache-flume-1.8.0-bin.tar.gz

$ ls

apache-flume-1.8.0-bin  apache-flume-1.8.0-bin.tar.gz

$ rm flume-ng-1.6.0-cdh5.7.1.tar.gz

$ mv apache-flume-1.6.0-cdh5.7.1-bin flume-1.6.0-cdh5.7.1

(The delete and rename steps were not actually performed here.)

Configure Environment Variables

$ cd /home/Hadoop

$ vim .bash_profile (the file was not found; the system may be using .profile, but creating .bash_profile also works)

export FLUME_HOME=/home/hadoop/app/cdh/flume-1.6.0-cdh5.7.1

export PATH=$PATH:$FLUME_HOME/bin

Actual steps:

$ cd ~/hadoop (this only takes effect under the home directory; otherwise version verification fails)

$ cd ~

$ vim .bash_profile

export FLUME_HOME=~/software/apache-flume-1.8.0-bin

export PATH=$PATH:$FLUME_HOME/bin

$ source .bash_profile

Output:

-bash: export: `/home/user/software/apache-flume-1.8.0-bin': not a valid identifier

After deleting the space after the equals sign in the FLUME_HOME line, the error no longer appears.

Version verification still failed, so copy .bash_profile to the home directory:

~/hadoop$ cp .bash_profile ~/

~$ source .bash_profile
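For reference, a minimal working ~/.bash_profile sketch with the paths used above; note there must be no spaces around = in bash assignments, which was the cause of the error:

# ~/.bash_profile (no spaces around '=')
export FLUME_HOME=~/software/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin

Apply it with source ~/.bash_profile.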

 

Configure the flume-env.sh File

Edit flume-env.sh under conf and set JAVA_HOME in it:

$ cd app/cdh/flume-1.6.0-cdh5.7.1/conf/

$ cp flume-env.sh.template flume-env.sh

$ vim flume-env.sh

export JAVA_HOME=/home/hadoop/app/jdk1.7.0_79

export HADOOP_HOME=/home/hadoop/app/cdh/hadoop-2.6.0-cdh5.7.1

Actual steps (JDK location: /home/user/jdk1.8.0_171; Hadoop location: /home/user/hadoop):

~/software/apache-flume-1.8.0-bin/conf$ cp flume-env.sh.template flume-env.sh

$ vim flume-env.sh

export JAVA_HOME=/home/user/jdk1.8.0_171

export HADOOP_HOME=/home/user/hadoop

Original text in the file:

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced during Flume startup.

# Enviroment variables can be set here.

# export JAVA_HOME=/usr/lib/jvm/java-8-oracle

 

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX

# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

 

# Let Flume write raw event data and configuration information to its log files for debugging purposes. Enabling these flags is not recommended in production,

# as it may result in logging sensitive user information or encryption secrets.

# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "

 

# Note that the Flume conf directory is always included in the classpath.

#FLUME_CLASSPATH=""


 

 

Version Verification

$ flume-ng version

Output:

-bash: flume: command not found

Version verification failed; copy .bash_profile to the home directory.

Steps:

~/hadoop$ cp .bash_profile ~/

~$ source .bash_profile

Version verification succeeded. Output:

Flume 1.8.0

Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git

Revision: 99f591994468633fc6f8701c5fc53e0214b6da4f

Compiled by denes on Fri Sep 15 14:58:00 CEST 2017

From source with checksum fbb44c8c8fb63a49be0a59e27316833d

Flume Deployment Examples

Avro

Flume can listen on a port via Avro and capture the transmitted data. A concrete example follows (note: the configuration below actually uses the netcat source; the Avro source and sink appear in the Spool example):

// Create a Flume configuration file

$ cd app/cdh/flume-1.6.0-cdh5.7.1

$ mkdir example

$ cp conf/flume-conf.properties.template example/netcat.conf

Actual steps:

$ cd ~/software/apache-flume-1.8.0-bin

$ mkdir example

$ cp conf/flume-conf.properties.template example/netcat.conf

Check the result:

~/software/apache-flume-1.8.0-bin/example$ ls

netcat.conf

// Configure netcat.conf to capture, in real time, data typed in another terminal

$ vim example/netcat.conf

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel that buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Actual steps:

$ vim netcat.conf

The original template file reads:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = seq

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = logger

# Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source) can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100

Edit this file to the netcat configuration shown above.

 

// Run the Flume Agent, listening on port 44444 on the local machine

$ flume-ng agent -c conf -f example/netcat.conf -n a1 -Dflume.root.logger=INFO,console

Actual command (absolute path; adjust to your setup):

$ flume-ng agent -c conf -f ~/software/apache-flume-1.8.0-bin/example/netcat.conf -n a1 -Dflume.root.logger=INFO,console


// Open another terminal, connect to localhost port 44444 via telnet, and enter test data

$ telnet localhost 44444

Output: telnet: command not found

The same error persisted after installing telnet; next, try writing to the VM's port from Windows:

telnet 10.2.68.104 44444

Failed: only ports 22/23 could be reached; connecting to port 44444 failed.

Switch to nc to connect to localhost port 44444 and enter test data:

nc -v localhost 44444
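A sketch of what a successful session looks like (the typed text is made up; the netcat source replies OK for each line it accepts):

$ nc -v localhost 44444
hello flume
OK

On the agent console, the logger sink should then print something like Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65 hello flume }.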


// Check the data Flume has collected

(Screenshots: the logger sink prints the received events to the agent's console.)

Spool

Spool monitors a configured directory for new files and reads the data out of them. Two caveats: files copied into the spool directory must not be opened or edited afterwards, and the spool directory must not contain subdirectories. A concrete example follows.
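Because files must not be edited once they are in the spool directory, a common pattern (a sketch; the file name is illustrative) is to write the file elsewhere and then mv it in, since a rename within the same filesystem is atomic:

$ echo "test log line" > /tmp/part_001.txt
$ mv /tmp/part_001.txt ~/avro_data/part_001.txt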

// Create two Flume configuration files

$ cd app/cdh/flume-1.6.0-cdh5.7.1

$ cp conf/flume-conf.properties.template example/spool1.conf

$ cp conf/flume-conf.properties.template example/spool2.conf

Actual steps:

$ cd ~/software/apache-flume-1.8.0-bin

$ cp conf/flume-conf.properties.template example/spool1.conf

$ cp conf/flume-conf.properties.template example/spool2.conf

 

// Configure spool1.conf to monitor files in the avro_data directory and send their contents to local port 60000.

// (The monitored directory must be changed to match your setup.)

Actual steps:

$ vim example/spool1.conf

 

# Name the components
local1.sources = r1
local1.sinks = k1
local1.channels = c1

# Source (use an absolute path here; Flume does not expand ~)
local1.sources.r1.type = spooldir
local1.sources.r1.spoolDir = /home/user/avro_data

# Sink
local1.sinks.k1.type = avro
local1.sinks.k1.hostname = localhost
local1.sinks.k1.port = 60000

# Channel
local1.channels.c1.type = memory

# Bind the source and sink to the channel
local1.sources.r1.channels = c1
local1.sinks.k1.channel = c1

// Configure spool2.conf to receive data from local port 60000 and write it to HDFS

Create an HDFS test directory first (see below), then:

$ vim example/spool2.conf

 

# Name the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 60000

# Sink (note: the roll setting is namespaced under hdfs.)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/wcbdd/flumeData
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream

# Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

// Open two terminals and run the following commands to start the two Flume Agents

$ flume-ng agent -c conf -f example/spool2.conf -n a1

$ flume-ng agent -c conf -f example/spool1.conf -n local1

Actual steps:

$ cd ~/software/apache-flume-1.8.0-bin

$ flume-ng agent -c conf -f example/spool2.conf -n a1

$ flume-ng agent -c conf -f example/spool1.conf -n local1

 

// Check the contents of the monitored avro_data directory on the local filesystem (the files did not exist yet; create the folders both locally and on HDFS)

$ cd avro_data

$ cat avro_data.txt

Output:

-bash: cd: avro_data/: No such file or directory

cat: avro_data.txt: No such file or directory

Steps:

~$ mkdir avro_data

~/avro_data$ touch avro_data.txt

 

Create the HDFS folder

Original path: hdfs://localhost:9000/user/wcbdd/flumeData

 

Update the monitored and write paths in the spool config files accordingly.

Look up the command for creating a folder in HDFS; a sketch follows.
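A minimal sketch, assuming Hadoop is running and using the path from spool2.conf above:

$ hdfs dfs -mkdir -p /user/wcbdd/flumeData
$ hdfs dfs -ls /user/wcbdd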

Detailed sink configuration:

https://blog.****.net/xiaolong_4_2/article/details/81945204

Writing to MongoDB

http://www.cnblogs.com/cswuyg/p/4498804.html

https://java-my-life.iteye.com/blog/2238085

https://blog.****.net/tinico/article/details/41079825?utm_source=blogkpcl14

Flume + MongoDB streaming log collection:

https://wenku.baidu.com/view/66f1e436ba68a98271fe910ef12d2af90242a81b.html

Download the MongoDB plugin source, mongosink (to be packaged into a jar), and the MongoDB Java driver.

mongosink download address: https://github.com/leonlee/flume-ng-mongodb-sink

Clone the repository

Install latest Maven and build source by 'mvn package'

Generate classpath by 'mvn dependency:build-classpath'

Append classpath in $FLUME_HOME/conf/flume-env.sh

Add the sink definition according to Configuration

That is: compile with mvn, download the dependencies, append the classpath to flume-env.sh, configure the sink definition according to the project's Configuration docs, and package the jar. A sketch of these steps is shown below.
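Those steps as shell commands (a sketch, assuming git and Maven are installed):

$ git clone https://github.com/leonlee/flume-ng-mongodb-sink.git
$ cd flume-ng-mongodb-sink
$ mvn package
$ mvn dependency:build-classpath
# append the classpath printed by the last command to FLUME_CLASSPATH in $FLUME_HOME/conf/flume-env.sh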

Packaging mongosink into a jar

The project reports an error: The method configure(Context) of type MongoSink must override a superclass method

https://blog.****.net/kwuwei/article/details/38365839

On inspection, the compiler level was already 1.8; the cause was that the Build path had not been updated.

Packaging method:

https://blog.****.net/ssbb1995/article/details/78983915

cd to the directory containing pom.xml, then run mvn clean package

Error:

[ERROR] Unknown lifecycle phase "?clean?package". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-site, site, post-site, site-deploy, pre-clean, clean, post-clean. -> [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.

[ERROR]

[ERROR] For more information about the errors and possible solutions, please read the following articles:

[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/LifecyclePhaseNotFoundException
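The ?clean?package in the message suggests the pasted command contained non-ASCII (full-width or non-breaking) spaces; retyping the command by hand should avoid it. This diagnosis is inferred from the quoted error, not confirmed:

$ mvn clean package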

First, a jar was built with Eclipse's built-in mvn install.

Then tried a Maven build: https://blog.****.net/qq_28553681/article/details/80988190

With clean package entered under Goals, the build succeeded, showing:

[INFO] Building jar: E:\studyMaterial\work\eclipse\flume-ng-mongodb-sink\target\flume-ng-mongodb-sink-1.0.0.jar

Flume configuration reference: https://www.cnblogs.com/ywjy/p/5255161.html (which cites https://blog.****.net/tinico/article/details/41079825)

Start MongoDB

D:\Program Files\MongoDB\Server\3.4\bin>mongod.exe --port 65521 --dbpath "D:\MongoDB\DBData"

mongod --dbpath="D:\MongoDB\DBData"
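The referenced mongo-agent.properties is not reproduced in the sources above. The following is a minimal sketch based on the flume-ng-mongodb-sink README: the sink class name and keys are taken from that project's documentation, while the ports, db, and collection names are illustrative:

agent.sources = r1
agent.channels = c1
agent.sinks = mongoSink

# Avro source receiving events (port is illustrative)
agent.sources.r1.type = avro
agent.sources.r1.bind = 0.0.0.0
agent.sources.r1.port = 44444
agent.sources.r1.channels = c1

agent.channels.c1.type = memory

# MongoDB sink provided by flume-ng-mongodb-sink
agent.sinks.mongoSink.type = org.riderzen.flume.sink.MongoSink
agent.sinks.mongoSink.host = localhost
agent.sinks.mongoSink.port = 27017
agent.sinks.mongoSink.model = single
agent.sinks.mongoSink.db = logs
agent.sinks.mongoSink.collection = events
agent.sinks.mongoSink.channel = c1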

Start Flume

# cd F:\temp\apache-flume-1.6.0-bin\bin
flume-ng.cmd agent --conf ..\conf -f ..\conf\mongo-agent.properties -n agent

Error:

2019-02-03 19:47:47,926 (New I/O worker #1) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.

org.apache.avro.AvroRuntimeException: Excessively large list allocation request detected: 1863125608 items! Connection closed.

        at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)

        at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)

        at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)

        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)

        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)

        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)

        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)

        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)

        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)

        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)

        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)

        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)

        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)

        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)

        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:748)

Changing the MongoDB port in the Flume configuration did not help.

Consulted https://blog.****.net/ma0903/article/details/48209681?utm_source=blogxgwz1:

The protocol the Flume side receives with may not match the protocol the client sends with.

For example: Flume is receiving Avro while the client is sending raw TCP.
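To send test data in a protocol the Avro source actually understands, Flume's built-in avro-client mode can be used instead of a raw TCP client (host, port, and file path are illustrative; on Windows, invoke it through flume-ng.cmd):

$ flume-ng avro-client -H localhost -p 44444 -F /path/to/test.log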

Configuring Flume on Windows

1. Download apache-flume-1.8.0-bin.tar.gz from the Apache Flume download page (http://flume.apache.org/download.html)

http://www.apache.org/dist/flume/1.8.0/

Opening index.html in the docs folder of the extracted archive lets you browse the documentation locally.

2. Extract to a directory, e.g. D:\software\apache-flume-1.8.0-bin

3. Create a FLUME_HOME variable set to the Flume install directory, D:\software\apache-flume-1.8.0-bin

4. Edit the system Path variable and append %FLUME_HOME%\conf and %FLUME_HOME%\bin

5. Copy and rename the three template files under flume\conf, removing the .template suffix

Press Win+R, type cmd to open a command window, and run:

flume-ng version: if this runs normally, the environment is OK.

With 1.9, used directly without configuring the environment variables, adding an example.conf file and running the following produced an error:

flume-ng agent --conf ../conf --conf-file ../conf/example.conf --name a1 -property flume.root.logger=INFO,console

 

D:\Program Files (x86)\flume\win-apache-flume-1.9.0-bin\apache-flume-1.9.0-bin\bin>powershell.exe -NoProfile -InputFormat none -ExecutionPolicy unrestricted -File D:\Program Files (x86)\flume\win-apache-flume-1.9.0-bin\apache-flume-1.9.0-bin\bin\flume-ng.ps1 agent --conf ../conf --conf-file ../conf/example.conf --name a1 -property flume.root.logger=INFO,console

Processing -File 'D:\Program' failed because the file does not have a '.ps1' extension. Specify a valid Windows PowerShell script file name, then try again.

Windows PowerShell

Cause: the launcher invokes a .bat file, which cannot handle spaces in the path; change the installation path (https://blog.****.net/yanhuatangtang/article/details/80404097).

After re-extracting to a path without spaces: flume-ng version runs in the bin directory, but not from the default directory (even though the environment variables had been configured).

Checking the version from bin succeeds but shows the following warnings (note: if problems appear later, try coming back to fix these):

WARN: Config directory not set. Defaulting to D:\Programs\flume\apache-flume-1.8.0-bin\conf

Sourcing environment configuration script D:\Programs\flume\apache-flume-1.8.0-bin\conf\flume-env.ps1

WARN: Did not find D:\Programs\flume\apache-flume-1.8.0-bin\conf\flume-env.ps1

WARN: HADOOP_PREFIX or HADOOP_HOME not found

WARN: HADOOP_PREFIX not set. Unable to include Hadoop's classpath & java.library.path

WARN: HBASE_HOME not found

WARN: HIVE_HOME not found

Tested with this guide: https://blog.****.net/ycf921244819/article/details/80341502

The earlier configuration all worked, but running telnet in a second window showed:

Connecting to localhost... Could not open a connection to the host. On port 50000: connect failed
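Before telnetting, it is worth checking whether the agent is actually listening on the port (standard Windows command; the port number matches the failed attempt above):

netstat -an | findstr 50000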

CentOS requires connecting to the network manually from the top-right corner after every boot.

192.168.43.156

Configure Flume Environment Variables

Change JAVA_HOME in flume-env.sh under flume/conf.

java -verbose shows the JDK location

(locally: C:\Program Files\Java\jdk1.8.0_131)
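For example (a sketch; the output format is as on JDK 8 and the path is illustrative), filtering the verbose class-loading output reveals the JDK in use:

java -verbose -version | findstr rt.jar

This prints lines such as [Loaded java.lang.Object from C:\Program Files\Java\jdk1.8.0_131\jre\lib\rt.jar].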