你好,游客 登录 注册 搜索

Flume官方文档翻译——Flume 1.7.0 User Guide (unreleased version)

[日期:2016-12-08] 来源:Linux社区  作者:林子系 [字体: ]

Flume 1.7.0 User Guide





Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

The use of Apache Flume is not only restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data including but not limited to network traffic data, social-media-generated data, email messages and pretty much any data source possible.

Apache Flume is a top level project at the Apache Software Foundation.

There are currently two release code lines available, versions 0.9.x and 1.x.

Documentation for the 0.9.x track is available at the Flume 0.9.x User Guide.

This documentation applies to the 1.4.x track.

New and existing users are encouraged to use the 1.x releases so as to leverage the performance improvements and configuration flexibilities available in the latest architecture.

Apache Flume是一个分布式、高可靠和高可用的收集、集合和将大量来自不同来源的日志数据移动到一个中央数据仓库。

Apache Flume不仅局限于数据的聚集。因为数据是可定制的,所以Flume可以用于运输大量时间数据包括不限于网络传输数据,社交媒体产生的数据,电子邮件信息和几乎任何数据源。

Apache Flume是Apache软件基金会的顶级项目。



System Requirements

    1. Java Runtime Environment - Java 1.7 or later(Java运行环境-Java1.7或者以后的版本)
    2. Memory - Sufficient memory for configurations used by sources, channels or sinks(内存——足够的内存来配置souuces,channels和sinks)
    3. Disk Space - Sufficient disk space for configurations used by channels or sinks(磁盘空间-足够的磁盘空间来配置channels或者sinks)
    4. Directory Permissions - Read/Write permissions for directories used by agent(目录权限-代理所使用的目录读/写权限)


Data flow model(数据流动模型)

A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes. A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).

一个Flume event被定义为拥有一个字节的有效负载的一个数据流单元和一个可选的字符串属性配置。Flume agent是一个JVM进程来控制组件完成事件流从一个外部来源传输到下一个目的地。


A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow that send events from an Avro sink. A similar flow can be defined using a Thrift Flume Source to receive events from a Thrift Sink or a Flume Thrift Rpc Client or Thrift clients written in any language generated from the Flume thrift protocol.When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it’s consumed by a Flume sink. The file channel is one example – it is backed by the local filesystem. The sink removes the event from the channel and puts it into an external repository like HDFS (via Flume HDFS sink) or forwards it to the Flume source of the next Flume agent (next hop) in the flow. The source and sink within the given agent run asynchronously with the events staged in the channel.

Flume source消费外部来源像web server传输给他的事件。外部来源发送以目标Flume source定义好的格式的event给Flume。例如,Avro Flume source用于接收Avro客户端或者流中的其他Flume中Avro sink发来的Avro events。一个相似的流可以用Thrift Flume Source 来接收来自Flume sink或者FluemThrift Rpc客户端或者一个用任何语言写的遵守Flume Thrift 协议的Thrift客户端的事件。当一个Flume Source接收一个事件时,它将事件存储在一个或者多个Cannel中。Channel是一个被动仓库用来保存事件直到它被Flume Sink消费掉。File channel就是个例子-它背靠着本地的文件系统。Sink将事件从Channel中移除并且将事件放到一个外部的仓库像HDFS(通过Flume HDFS sink)或者向前传输到流中另一个Flume Agent。Agent中Source和Sink异步地执行Channel中events。


The events are staged in a channel on each agent. The events are then delivered to the next agent or terminal repository (like HDFS) in the flow. The events are removed from a channel only after they are stored in the channel of next agent or in the terminal repository. This is a how the single-hop message delivery semantics in Flume provide end-to-end reliability of the flow.

Flume uses a transactional approach to guarantee the reliable delivery of the events. The sources and sinks encapsulate in a transaction the storage/retrieval, respectively, of the events placed in or provided by a transaction provided by the channel. This ensures that the set of events are reliably passed from point to point in the flow. In the case of a multi-hop flow, the sink from the previous hop and the source from the next hop both have their transactions running to ensure that the data is safely stored in the channel of the next hop.




Setting up an agent(设置Agent)

Flume agent configuration is stored in a local configuration file. This is a text file that follows the Java properties file format. Configurations for one or more agents can be specified in the same configuration file. The configuration file includes properties of each source, sink and channel in an agent and how they are wired together to form data flows.

Flume agent配置存储在一个本地配置文件中。这是一个跟Java 属性文件格式一样的文本文件。一个或者多个agent可以指定同一个配置文件来进行配置。配置文件包括每个source的属性,agent中的sink和channel以及它们是如何连接构成数据流。

Wiring the pieces together(碎片集合)

The agent needs to know what individual components to load and how they are connected in order to constitute the flow. This is done by listing the names of each of the sources, sinks and channels in the agent, and then specifying the connecting channel for each sink and source. For example, an agent flows events from an Avro source called avroWeb to HDFS sink hdfs-cluster1 via a file channel called file-channel. The configuration file will contain names of these components and file-channel as a shared channel for both avroWeb source and hdfs-cluster1 sink.

agent需要知道每个组件加载什么和它们是怎样连接构成流。这通过列出agent中每个source、sink和channel和指定每个sink和source连接的channel。例如,一个agent流事件从一个称为avroWeb的Avro sources通过一个称为file-channel的文件channel流向一个称为hdfs-cluster1的HDFS sink。配置文档将包含这些组件的名字和avroWeb source和hdfs-cluster1 sink中间共享的file-channel。

Starting an agent(开始一个agent)

An agent is started using a shell script called flume-ng which is located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line:


$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

Now the agent will start running source and sinks configured in the given properties file.


A simple example(一个简单的例子)

Here, we give an example configuration file, describing a single-node Flume deployment. This configuration lets a user generate events and subsequently logs them to the console.


# example.conf: A single-node Flume configuration


# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1


# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444


# Describe the sink

a1.sinks.k1.type = logger


# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100


# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a given Flume process is launched a flag is passed telling it which named agent to manifest.

Given this configuration file, we can start Flume as follows:


$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Note that in a full deployment we would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would include a shell script flume-env.sh and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.

需要说明的是在一个完整的部署中我们应该通常会包含多一个选项:--conf=<conf-dir>.<conf-dir>目录包含一个shell脚本 flume-env.sh和一个潜在的log4j属性文档。在这个例子中,我们通过一个Java选项来强制Flume打印信息到控制台和没有自定义一个环境脚本。

From a separate terminal, we can then telnet port 44444 and send Flume an event:

通过一个独立的终端,我们可以telnet 端口4444和发送一个事件:

$ telnet localhost 44444


Connected to localhost.localdomain (

Escape character is '^]'.

Hello world! <ENTER>


The original Flume terminal will output the event in a log message.


12/06/19 15:32:19 INFO source.NetcatSource: Source starting

12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/]

12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }

Congratulations - you’ve successfully configured and deployed a Flume agent! Subsequent sections cover agent configuration in much more detail.

恭喜-你已经成功配置和部署了一个Flume agent!接下来的部分会覆盖agent配置的更多细节。


本文评论   查看全部评论 (0)
表情: 表情 姓名: 字数


  • 尊重网上道德,遵守中华人民共和国的各项有关法律法规
  • 承担一切因您的行为而直接或间接导致的民事或刑事法律责任
  • 本站管理人员有权保留或删除其管辖留言中的任意内容
  • 本站有权在网站内转载或引用您的评论
  • 参与本评论即表明您已经阅读并接受上述条款