打造通用流处理平台

打造通用流处理平台

整合日志输出到Flume

streaming.conf

agent1.sources=avro-sources
agent1.channels=logger-channel
agent1.sinks=log-sink

#define source
agent1.sources.avro-sources.type=avro
agent1.sources.avro-sources.bind=0.0.0.0
agent1.sources.avro-sources.port=41414

#define channel
agent1.channels.logger-channel.type=memory

#define sink
agent1.sinks.log-sink.type=logger

agent1.sources.avro-sources.channels=logger-channel
agent1.sinks.log-sink.channels=logger-channel

启动:
flume-ng agent
–conf $FLUME_HOME/conf
–conf-file $FLUME_HOME/conf/streaming.conf
–name agent1
-Dflume.root.logger=INFO,console

整合Flume到Kafka

启动zookeeper

./kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic streaming_topic

streaming2.conf
agent1.sources=avro-sources
agent1.channels=logger-channel
agent1.sinks=log-sink

#define source
agent1.sources.avro-sources.type=avro
agent1.sources.avro-sources.bind=0.0.0.0
agent1.sources.avro-sources.port=41414

#define channel
agent1.channels.logger-channel.type=memory

#define sink
agent1.sinks.kafka-sink.type=org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.topic = streaming_topic
agent1.sinks.kafka-sink.brokerList = hadoop000:9092
agent1.sinks.kafka-sink.requiredAcks = 1
agent1.sinks.kafka-sink.kafka.flumeBatchSize = 20

agent1.sources.avro-sources.channels=logger-channel
agent1.sinks.kafka-sink.channels=logger-channel

agent1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink

flume-ng agent
–conf $FLUME_HOME/conf
–conf-file $FLUME_HOME/conf/streaming2.conf
–name agent1
-Dflume.root.logger=INFO,console

./kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic streaming_topic

我们现在实在本地进行测试,在IDEA中运行LoggerGenerator,然后使用Flume、Kafka以及Spark Streaming进行处理操作。

在生产上肯定不是这么干的:
1)打包jar,执行LoggerGenerator类
2)Flume、Kafka和我们的测试是一样的
3)Spark Streaming的代码也是需要打成jar包,然后使用spark-submit的方式进行提交到环境上执行

可以根据你们的实际情况选择运行模式:local/yarn/standalone/mesos

再生产上,整个流处理的流程都是一样的,区别在于业务逻辑的复杂性