Apache Flume

computer/Big Data 2016. 4. 4. 16:02

Apache Flume - 1.6.0 Guide

System Requirements

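1. Java Runtime Environment - Java 1.6 or later (1.7 recommended)

2. Memory - enough memory for the configured sources, channels, and sinks

3. Disk space - enough disk space for the configured channels and sinks

4. Directory permissions - read/write permissions on the directories the agent uses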

Architecture

Data Flow Model

- Flume event: a unit of data flow, consisting of a byte payload and an optional set of string attributes (headers).
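To make the shape of an event concrete, here is a minimal Python sketch. It is only an illustrative model, not Flume's Java API; the class name `Event` and its field names are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Illustrative stand-in for a Flume event (not the real Java interface)."""
    body: bytes                                             # raw byte payload
    headers: Dict[str, str] = field(default_factory=dict)   # optional string attributes


# An event with no headers, similar to what a plain text line would produce.
event = Event(body=b"Hello world!")
print(event)
```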

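Setup

1. Setting up an agent

A Flume agent's configuration is stored in a local configuration file. This is a text file that follows the Java properties file format. The configuration of one or more agents can be specified in a single configuration file. The file includes the properties of each source, sink, and channel in an agent, and describes how they are wired together to form data flows.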

2. Configuring individual components

Each component in a flow (source, sink, or channel) has a name, a type, and a set of properties.

Examples:

Avro source: the hostname (or IP address) and the port number on which to receive data

Memory channel: the maximum queue size

HDFS sink: the file system URI, the path under which files are created, and the frequency of file rotation
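The three examples above can be sketched as configuration lines. The Python snippet below only renders text in the `agent.kind.name.property = value` layout; the helper `render_component`, the agent and component names (`a1`, `r1`, `c1`, `k1`), and all values (bind address, port, HDFS path, roll interval) are assumptions for illustration. It also omits the lines that declare the component names for the agent.

```python
def render_component(agent, kind, name, props):
    """Render one component's properties as Flume-style config lines."""
    prefix = f"{agent}.{kind}.{name}"
    return [f"{prefix}.{key} = {value}" for key, value in props.items()]


lines = []
# Avro source: address to bind to and port to listen on (hypothetical values).
lines += render_component("a1", "sources", "r1",
                          {"type": "avro", "bind": "0.0.0.0", "port": 4141, "channels": "c1"})
# Memory channel: maximum number of events held in the queue.
lines += render_component("a1", "channels", "c1",
                          {"type": "memory", "capacity": 1000})
# HDFS sink: target path (including file system URI) and rotation interval in seconds.
lines += render_component("a1", "sinks", "k1",
                          {"type": "hdfs", "hdfs.path": "hdfs://namenode/flume/events",
                           "hdfs.rollInterval": 30, "channel": "c1"})

print("\n".join(lines))
```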

3. Starting an agent

An agent is started with the flume-ng shell script, which is located in Flume's bin directory. Specify the agent name, the config directory, and the config file on the command line:

$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

The agent then starts running the sources and sinks configured in the given properties file.

4. A simple example

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

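This configuration defines a single agent named a1.

The source of a1 listens for data on port 44444.

The channel buffers event data in memory.

The sink logs event data to the console.

The configuration file names the various components, then describes each one's type and configuration parameters.

A given configuration file may define several named agents. When a Flume process is launched, a flag is passed telling it which named agent to run.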
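The idea that one file can hold several named agents, with one selected by name at launch, can be sketched in Python. This is a simplified reader written for illustration, not Flume's own parser: it ignores parts of the Java properties format such as line continuations and `:` separators, and the second agent `a2` in the sample text is hypothetical.

```python
from collections import defaultdict


def group_by_agent(config_text):
    """Group 'agent.rest.of.key = value' lines by their leading agent name."""
    agents = defaultdict(dict)
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        agent, _, prop = key.strip().partition(".")
        agents[agent][prop] = value.strip()
    return dict(agents)


SAMPLE = """
# two agents in one file; a2 is a made-up second agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a2.sources = r2
"""

agents = group_by_agent(SAMPLE)
selected = agents["a1"]  # plays the role of the agent-name flag
print(sorted(agents), selected["sources.r1.type"])
```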

Given this configuration file, we can start Flume as follows:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

From a separate terminal, send an event to Flume on port 44444:

$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
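The same exchange can be scripted instead of typed into telnet. The following is a minimal Python sketch, assuming the agent from example.conf is listening on localhost:44444 and replies with one line per event as in the transcript above; the function name `send_line` and the timeout value are choices made for this example.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Send one newline-terminated line and return the server's one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        reply = b""
        # Read until the reply line is complete or the peer closes the connection.
        while not reply.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            reply += chunk
    return reply.decode("utf-8").strip()


# Usage, with the example agent running:
#   send_line("localhost", 44444, "Hello world!")
```

Unlike telnet, this client terminates the line with a bare newline rather than a carriage return plus newline, so the logged event body may differ by that one trailing byte.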

The original Flume terminal then prints the event as a log message:

12/06/19 15:32:19 INFO source.NetcatSource: Source starting
12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }
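The hex values in the LoggerSink line are the raw bytes of the event body. A short Python sketch can decode them; the function name `decode_logged_body` and the regular expression are assumptions written against the log line shown above, not a general parser for Flume logs.

```python
import re


def decode_logged_body(log_line):
    """Extract the space-separated hex bytes after 'body:' and return them as bytes."""
    match = re.search(r"body:((?: [0-9A-Fa-f]{2})+)", log_line)
    if match is None:
        raise ValueError("no hex body found in log line")
    return bytes.fromhex(match.group(1).strip())


LINE = ("12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} "
        "body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }")

body = decode_logged_body(LINE)
print(body)
```

The final byte 0D is a carriage return, which telnet sends as part of its line ending; the log prints it as a dot after "Hello world!".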
