CDH3 Installation


Test Environment

OS: CentOS 5.8 - 64bit
Java: JDK 1.6.0_32
User and group: user: hadoop / group: hadoop

The CDH3 environment is set up in the following order:
1. Check the Java version
2. Download and install the Cloudera CDH3 package
3. Download and install Hadoop
4. Download and install the required subprojects
5. Done!



Strictly speaking, MapReduce and HDFS alone are enough to call something a Hadoop environment. But using Hadoop subprojects such as HBase, Hive, Pig, and ZooKeeper alongside them is far more convenient and productive.

Cloudera offers a download service for an integrated distribution that bundles Hadoop and its subprojects, packaged for compatibility and easy installation.

Cloudera
Cloudera is the company where Doug Cutting, one of Hadoop's creators, currently works; it maintains a distribution that makes building a Hadoop environment straightforward.

 


Cloudera distribution installation document URL
Most of this walkthrough follows the document below.





Environment Setup


Java Version Check


[root@localhost hadoop]# javac -version

javac 1.6.0_32

[root@localhost hadoop]# java -version

java version "1.6.0_32"

Java(TM) SE Runtime Environment (build 1.6.0_32-b05)

Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)

(After installation, double-check the path and version; a minimal PATH setup sketch follows below.)
Version 1.6.0_14 or later is recommended.

Starting with version 1.6.0_14, the JVM uses 'compressed ordinary object pointers' (compressed oops) to eliminate the memory overhead of the larger 64-bit pointers.

< See Hadoop: The Definitive Guide, 1st revised edition, p. 352 >
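
If the reported path or version is wrong, point the shell at the intended JDK. A minimal sketch, assuming the JDK was installed under /usr/java/jdk1.6.0_32 (the path is an assumption; adjust it to your machine):

# append to ~/.bashrc, then run: source ~/.bashrc
export JAVA_HOME=/usr/java/jdk1.6.0_32   # assumed install location
export PATH=$JAVA_HOME/bin:$PATH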

Step 1: Download the CDH3 Repository or Package.


This step installs the package that registers Cloudera's own repository, so that CDH packages can be downloaded and installed from it.
In a cluster environment, perform this on every host.



To pull packages from the hosted repository, pick one of the three methods below. This document proceeds with the first.

1. Download and install the CDH3 Package <-- we'll use this one..

2. Add the CDH3 repository or
3. Build a Yum Repository


Download the rpm from the Cloudera website before proceeding.

1. Download and install the CDH3 package
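
The repository rpm can be fetched with wget, for example. A sketch, assuming the archive URL Cloudera used at the time (it may have moved since):

$ wget http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm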

$ sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm

[sudo] password for hadoop: pw

Loaded plugins: fastestmirror, security

Setting up Local Package Process

Examining cdh3-repository-1.0-1.noarch.rpm: cdh3-repository-1.0-1.noarch

Marking cdh3-repository-1.0-1.noarch.rpm to be installed

Loading mirror speeds from cached hostfile

 * base: centos.mirror.cdnetworks.com

 * extras: centos.mirror.cdnetworks.com

 * updates: centos.mirror.cdnetworks.com

Resolving Dependencies

--> Running transaction check

---> Package cdh3-repository.noarch 0:1.0-1 set to be updated

--> Finished Dependency Resolution


Dependencies Resolved


====================================================================================================================

 Package                     Arch               Version             Repository                                 Size

====================================================================================================================

Installing:

 cdh3-repository             noarch             1.0-1               /cdh3-repository-1.0-1.noarch              13 k


Transaction Summary

====================================================================================================================

Install       1 Package(s)

Upgrade       0 Package(s)


Total size: 13 k

Is this ok [y/N]: y

Downloading Packages:

Running rpm_check_debug

Running Transaction Test

Finished Transaction Test

Transaction Test Succeeded

Running Transaction

  Installing     : cdh3-repository                                                                              1/1


Installed:

  cdh3-repository.noarch 0:1.0-1


Complete!
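
To confirm the repository was registered, list the files the rpm installed; a cloudera-cdh3 repo file should now exist under /etc/yum.repos.d/ (the exact file name is an assumption):

$ rpm -ql cdh3-repository
$ ls /etc/yum.repos.d/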




sudo error
If the step above stops with 'hadoop is not in the sudoers file.  This incident will be reported', the symptom and fix are as follows.


sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm


We trust you have received the usual lecture from the local System

Administrator. It usually boils down to these three things:


    #1) Respect the privacy of others.

    #2) Think before you type.

    #3) With great power comes great responsibility.


[sudo] password for hadoop: [pw]

hadoop is not in the sudoers file.  This incident will be reported. // execution stops here due to the error


The message 'hadoop is not in the sudoers file' appears because the hadoop user is not listed in the sudoers file.

The fix is shown below. The detailed cause and remedy can be found here.

Open the /etc/sudoers file with visudo and add the line 'hadoop ALL=(ALL) ALL'.
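
A minimal sketch of the edit (run as root; visudo validates the syntax before saving):

su -                    # switch to root, since sudo is not usable yet
visudo                  # safely opens /etc/sudoers in vi
# add this line below the existing "root ALL=(ALL) ALL" entry:
hadoop  ALL=(ALL)       ALL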



Step 2: Install CDH3 on all hosts

Searching the Cloudera Hadoop yum repository

$ yum search hadoop

Loaded plugins: fastestmirror, security

base                                                                                              3591/3591

cloudera-cdh3                                                                                |  951 B     00:00

cloudera-cdh3/primary                                                                    |  22 kB     00:00

cloudera-cdh3                                                                                74/74

============================================ Matched: hadoop ==================================================

cdh3-repository.noarch : Cloudera's Distribution including Apache Hadoop

flume.noarch : Flume is a reliable, scalable, and manageable distributed log collection application for collecting

             : data such as logs and delivering it to data stores such as Hadoop's HDFS.

flume-master.noarch : The flume master daemon is the central administration and data path control point for flume

                    : nodes.

flume-ng.noarch : Flume is a reliable, scalable, and manageable distributed log collection application for

                : collecting data such as logs and delivering it to data stores such as Hadoop's HDFS.

flume-ng-agent.noarch : The flume agent daemon is a core element of flume's data path and is responsible for

                      : generating, processing, and delivering data.

flume-node.noarch : The flume node daemon is a core element of flume's data path and is responsible for generating,

                  : processing, and delivering data.

hadoop-0.20.noarch : Hadoop is a software platform for processing vast amounts of data

hadoop-0.20-conf-pseudo.noarch : Hadoop installation in pseudo-distributed mode

hadoop-0.20-datanode.noarch : Hadoop Data Node

hadoop-0.20-debuginfo.i386 : Debug information for package hadoop-0.20

hadoop-0.20-debuginfo.x86_64 : Debug information for package hadoop-0.20

hadoop-0.20-doc.noarch : Hadoop Documentation

hadoop-0.20-fuse.i386 : Mountable HDFS

hadoop-0.20-fuse.x86_64 : Mountable HDFS

hadoop-0.20-jobtracker.noarch : Hadoop Job Tracker

hadoop-0.20-libhdfs.i386 : Hadoop Filesystem Library

hadoop-0.20-libhdfs.x86_64 : Hadoop Filesystem Library

hadoop-0.20-namenode.noarch : The Hadoop namenode manages the block locations of HDFS files

hadoop-0.20-native.i386 : Native libraries for Hadoop Compression

hadoop-0.20-native.x86_64 : Native libraries for Hadoop Compression

hadoop-0.20-pipes.i386 : Hadoop Pipes Library

hadoop-0.20-pipes.x86_64 : Hadoop Pipes Library

hadoop-0.20-sbin.i386 : Binaries for secured Hadoop clusters

hadoop-0.20-sbin.x86_64 : Binaries for secured Hadoop clusters

hadoop-0.20-secondarynamenode.noarch : Hadoop Secondary namenode

hadoop-0.20-source.noarch : Source code for Hadoop

hadoop-0.20-tasktracker.noarch : Hadoop Task Tracker

hadoop-hbase.noarch : HBase is the Hadoop database. Use it when you need random, realtime read/write access to your

                    : Big Data. This project's goal is the hosting of very large tables -- billions of rows X

                    : millions of columns -- atop clusters of commodity hardware.

hadoop-hbase-doc.noarch : Hbase Documentation

hadoop-hbase-master.noarch : The Hadoop HBase master Server.

hadoop-hbase-regionserver.noarch : The Hadoop HBase RegionServer server.

hadoop-hbase-rest.noarch : The Apache HBase REST gateway

hadoop-hbase-thrift.noarch : The Hadoop HBase Thrift Interface

hadoop-hive.noarch : Hive is a data warehouse infrastructure built on top of Hadoop

hadoop-hive-metastore.noarch : Shared metadata repository for Hive.

hadoop-hive-server.noarch : Provides a Hive Thrift service.

hadoop-pig.noarch : Pig is a platform for analyzing large data sets

hadoop-zookeeper.noarch : A high-performance coordination service for distributed applications.

hadoop-zookeeper-server.noarch : The Hadoop Zookeeper server

hue.noarch : The hue metapackage

hue-common.i386 : A browser-based desktop interface for Hadoop

hue-common.x86_64 : A browser-based desktop interface for Hadoop

hue-filebrowser.noarch : A UI for the Hadoop Distributed File System (HDFS)

hue-jobbrowser.noarch : A UI for viewing Hadoop map-reduce jobs

hue-jobsub.noarch : A UI for designing and submitting map-reduce jobs to Hadoop

hue-plugins.noarch : Hadoop plugins for Hue

hue-shell.i386 : A shell for console based Hadoop applications

hue-shell.x86_64 : A shell for console based Hadoop applications

mahout.noarch : A set of Java libraries for scalable machine learning.

oozie.noarch : Oozie is a system that runs workflows of Hadoop jobs.

sqoop.noarch : Sqoop allows easy imports and exports of data sets between databases and the Hadoop Distributed File

             : System (HDFS).



Installing Hadoop core (run on every host)


$ sudo yum install hadoop-0.20 hadoop-0.20-native

[sudo] password for hadoop: pw

Loaded plugins: fastestmirror, security

Loading mirror speeds from cached hostfile

 * base: centos.mirror.cdnetworks.com

 * extras: centos.mirror.cdnetworks.com

 * updates: centos.mirror.cdnetworks.com

Setting up Install Process

Resolving Dependencies

--> Running transaction check

---> Package hadoop-0.20.noarch 0:0.20.2+923.256-1 set to be updated

---> Package hadoop-0.20-native.i386 0:0.20.2+923.256-1 set to be updated

---> Package hadoop-0.20-native.x86_64 0:0.20.2+923.256-1 set to be updated

--> Finished Dependency Resolution


Dependencies Resolved


====================================================================================================================

 Package                         Arch                Version                       Repository                  Size

====================================================================================================================

Installing:

 hadoop-0.20                     noarch              0.20.2+923.256-1              cloudera-cdh3               30 M

 hadoop-0.20-native              i386                0.20.2+923.256-1              cloudera-cdh3               59 k

 hadoop-0.20-native              x86_64              0.20.2+923.256-1              cloudera-cdh3               63 k


Transaction Summary

====================================================================================================================

Install       3 Package(s)

Upgrade       0 Package(s)


Total size: 30 M

Total download size: 30 M

Is this ok [y/N]: y

Downloading Packages:

hadoop-0.20-0.20.2+923.256-1.noarch.rpm                                                      |  30 MB     00:23

Running rpm_check_debug

Running Transaction Test

Finished Transaction Test

Transaction Test Succeeded

Running Transaction

  Installing     : hadoop-0.20                                                                                  1/3

  Installing     : hadoop-0.20-native                                                                           2/3

  Installing     : hadoop-0.20-native                                                                           3/3


Installed:

  hadoop-0.20.noarch 0:0.20.2+923.256-1                     hadoop-0.20-native.i386 0:0.20.2+923.256-1

  hadoop-0.20-native.x86_64 0:0.20.2+923.256-1


Complete!
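
Once the install finishes, a quick sanity check confirms the package and binary are in place:

$ rpm -q hadoop-0.20
$ hadoop version    # should report a 0.20.2-cdh3 build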



Daemon installation

This step installs the daemons that match each host's role.

$ sudo yum install hadoop-0.20-<daemon type>
or
$ sudo yum install hadoop-0.20-*


Replace <daemon type> with one of the following (a sample per-role layout follows the list):

namenode
datanode
secondarynamenode
jobtracker
tasktracker
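
For example, a common layout might look like this (a sketch only; map the daemons to your own hosts):

# on the master host
$ sudo yum install hadoop-0.20-namenode hadoop-0.20-jobtracker
# on every worker host
$ sudo yum install hadoop-0.20-datanode hadoop-0.20-tasktracker
# on a separate host (or the master, on small clusters)
$ sudo yum install hadoop-0.20-secondarynamenode

Each package also registers an init script (e.g. /etc/init.d/hadoop-0.20-namenode), so the daemons can be started with the service command once configuration is complete.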



Subproject installation
For each subproject, refer to its own install guide.

Component list

To install the CDH3 components, follow the instructions in the corresponding sections of the Cloudera documentation.
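
The subprojects ship from the same repository, so they can be installed the same way. For example, using the package names shown in the yum search output above:

$ sudo yum install hadoop-hive hadoop-pig hadoop-hbase hadoop-zookeeper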

 
I'm not much of a writer, so organizing this was pretty hard ;;
