Test environment
OS: CentOS 5.8, 64-bit
Java: JDK 1.6.0_32
User and group: user: hadoop / group: hadoop
The CDH3 environment is set up in the following order:
1. Check the Java version
2. Download and install the Cloudera CDH3 repository package
3. Download and install the Hadoop packages
4. Download and install the subprojects you need
5. Done!
Strictly speaking, MapReduce and HDFS are all you need to call something a Hadoop environment. In practice, though, subprojects such as HBase, Hive, Pig, and ZooKeeper make the environment far more convenient and productive.
Cloudera offers an integrated distribution that bundles Hadoop and its subprojects, with the compatibility testing and packaging already done, so everything is easy to download and install.
Cloudera
Cloudera is the company where Doug Cutting, one of Hadoop's creators, currently works; it maintains a distribution that makes a Hadoop environment easy to set up.
Cloudera distribution installation document URL
Most of the steps below follow that document.
Environment setup
Checking the Java version
[root@localhost hadoop]# javac -version
javac 1.6.0_32
[root@localhost hadoop]# java -version
java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)
Starting with 1.6.0_14, the JVM offers 'compressed ordinary object pointers' (compressed oops) to reduce the memory overhead of the larger pointers on 64-bit VMs.
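For reference, on a 64-bit HotSpot JVM of this vintage the feature can be enabled explicitly with the standard flag (it became the default only in later 6u releases):
# enable compressed oops explicitly on a 64-bit HotSpot JVM
$ java -XX:+UseCompressedOops -version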
Step 1: Download the CDH3 Repository or Package.
This step installs the package needed to download and install software from the repository Cloudera itself operates.
If this is a cluster environment, run it on every host.
There are three ways to hook up to the repository; pick one of them. This document proceeds with the first.
1. Download and install the CDH3 Package <-- we use this one
Download the rpm from the Cloudera homepage, then proceed.
1. Downloading and installing the CDH3 package
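If the rpm is not already on the host, fetch it first. A sketch, assuming the historical CDH3 archive URL (verify it against Cloudera's documentation):
$ wget http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm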
$ sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
[sudo] password for hadoop: pw
Loaded plugins: fastestmirror, security
Setting up Local Package Process
Examining cdh3-repository-1.0-1.noarch.rpm: cdh3-repository-1.0-1.noarch
Marking cdh3-repository-1.0-1.noarch.rpm to be installed
Loading mirror speeds from cached hostfile
* base: centos.mirror.cdnetworks.com
* extras: centos.mirror.cdnetworks.com
* updates: centos.mirror.cdnetworks.com
Resolving Dependencies
--> Running transaction check
---> Package cdh3-repository.noarch 0:1.0-1 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
====================================================================================================================
Package Arch Version Repository Size
====================================================================================================================
Installing:
cdh3-repository noarch 1.0-1 /cdh3-repository-1.0-1.noarch 13 k
Transaction Summary
====================================================================================================================
Install 1 Package(s)
Upgrade 0 Package(s)
Total size: 13 k
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : cdh3-repository 1/1
Installed:
cdh3-repository.noarch 0:1.0-1
Complete!
sudo error
If the step above fails with "hadoop is not in the sudoers file. This incident will be reported", the symptom and the fix are as follows.
$ sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
[sudo] password for hadoop: [pw]
hadoop is not in the sudoers file. This incident will be reported. // the error stops us here
The message 'hadoop is not in the sudoers file' appears because the hadoop user is not listed in the sudoers file.
The fix is as follows.
Open /etc/sudoers with visudo and add the line 'hadoop ALL=(ALL) ALL'.
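Since the hadoop user cannot sudo yet, do this as root. A minimal sketch:
$ su -              # become root first
# visudo            # edits /etc/sudoers and syntax-checks it on save

## in the editor, add the hadoop entry below the existing root entry:
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL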
Step 2: Install CDH3 on all hosts
Querying the Cloudera Hadoop yum repository
$ yum search hadoop
Loaded plugins: fastestmirror, security
base 3591/3591
cloudera-cdh3 | 951 B 00:00
cloudera-cdh3/primary | 22 kB 00:00
cloudera-cdh3 74/74
============================================ Matched: hadoop ==================================================
cdh3-repository.noarch : Cloudera's Distribution including Apache Hadoop
flume.noarch : Flume is a reliable, scalable, and manageable distributed log collection application for collecting
: data such as logs and delivering it to data stores such as Hadoop's HDFS.
flume-master.noarch : The flume master daemon is the central administration and data path control point for flume
: nodes.
flume-ng.noarch : Flume is a reliable, scalable, and manageable distributed log collection application for
: collecting data such as logs and delivering it to data stores such as Hadoop's HDFS.
flume-ng-agent.noarch : The flume agent daemon is a core element of flume's data path and is responsible for
: generating, processing, and delivering data.
flume-node.noarch : The flume node daemon is a core element of flume's data path and is responsible for generating,
: processing, and delivering data.
hadoop-0.20.noarch : Hadoop is a software platform for processing vast amounts of data
hadoop-0.20-conf-pseudo.noarch : Hadoop installation in pseudo-distributed mode
hadoop-0.20-datanode.noarch : Hadoop Data Node
hadoop-0.20-debuginfo.i386 : Debug information for package hadoop-0.20
hadoop-0.20-debuginfo.x86_64 : Debug information for package hadoop-0.20
hadoop-0.20-doc.noarch : Hadoop Documentation
hadoop-0.20-fuse.i386 : Mountable HDFS
hadoop-0.20-fuse.x86_64 : Mountable HDFS
hadoop-0.20-jobtracker.noarch : Hadoop Job Tracker
hadoop-0.20-libhdfs.i386 : Hadoop Filesystem Library
hadoop-0.20-libhdfs.x86_64 : Hadoop Filesystem Library
hadoop-0.20-namenode.noarch : The Hadoop namenode manages the block locations of HDFS files
hadoop-0.20-native.i386 : Native libraries for Hadoop Compression
hadoop-0.20-native.x86_64 : Native libraries for Hadoop Compression
hadoop-0.20-pipes.i386 : Hadoop Pipes Library
hadoop-0.20-pipes.x86_64 : Hadoop Pipes Library
hadoop-0.20-sbin.i386 : Binaries for secured Hadoop clusters
hadoop-0.20-sbin.x86_64 : Binaries for secured Hadoop clusters
hadoop-0.20-secondarynamenode.noarch : Hadoop Secondary namenode
hadoop-0.20-source.noarch : Source code for Hadoop
hadoop-0.20-tasktracker.noarch : Hadoop Task Tracker
hadoop-hbase.noarch : HBase is the Hadoop database. Use it when you need random, realtime read/write access to your
: Big Data. This project's goal is the hosting of very large tables -- billions of rows X
: millions of columns -- atop clusters of commodity hardware.
hadoop-hbase-doc.noarch : Hbase Documentation
hadoop-hbase-master.noarch : The Hadoop HBase master Server.
hadoop-hbase-regionserver.noarch : The Hadoop HBase RegionServer server.
hadoop-hbase-rest.noarch : The Apache HBase REST gateway
hadoop-hbase-thrift.noarch : The Hadoop HBase Thrift Interface
hadoop-hive.noarch : Hive is a data warehouse infrastructure built on top of Hadoop
hadoop-hive-metastore.noarch : Shared metadata repository for Hive.
hadoop-hive-server.noarch : Provides a Hive Thrift service.
hadoop-pig.noarch : Pig is a platform for analyzing large data sets
hadoop-zookeeper.noarch : A high-performance coordination service for distributed applications.
hadoop-zookeeper-server.noarch : The Hadoop Zookeeper server
hue.noarch : The hue metapackage
hue-common.i386 : A browser-based desktop interface for Hadoop
hue-common.x86_64 : A browser-based desktop interface for Hadoop
hue-filebrowser.noarch : A UI for the Hadoop Distributed File System (HDFS)
hue-jobbrowser.noarch : A UI for viewing Hadoop map-reduce jobs
hue-jobsub.noarch : A UI for designing and submitting map-reduce jobs to Hadoop
hue-plugins.noarch : Hadoop plugins for Hue
hue-shell.i386 : A shell for console based Hadoop applications
hue-shell.x86_64 : A shell for console based Hadoop applications
mahout.noarch : A set of Java libraries for scalable machine learning.
oozie.noarch : Oozie is a system that runs workflows of Hadoop jobs.
sqoop.noarch : Sqoop allows easy imports and exports of data sets between databases and the Hadoop Distributed File
: System (HDFS).
Installing Hadoop core (run on all hosts)
$ sudo yum install hadoop-0.20 hadoop-0.20-native
[sudo] password for hadoop: pw
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
* base: centos.mirror.cdnetworks.com
* extras: centos.mirror.cdnetworks.com
* updates: centos.mirror.cdnetworks.com
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package hadoop-0.20.noarch 0:0.20.2+923.256-1 set to be updated
---> Package hadoop-0.20-native.i386 0:0.20.2+923.256-1 set to be updated
---> Package hadoop-0.20-native.x86_64 0:0.20.2+923.256-1 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
====================================================================================================================
Package Arch Version Repository Size
====================================================================================================================
Installing:
hadoop-0.20 noarch 0.20.2+923.256-1 cloudera-cdh3 30 M
hadoop-0.20-native i386 0.20.2+923.256-1 cloudera-cdh3 59 k
hadoop-0.20-native x86_64 0.20.2+923.256-1 cloudera-cdh3 63 k
Transaction Summary
====================================================================================================================
Install 3 Package(s)
Upgrade 0 Package(s)
Total size: 30 M
Total download size: 30 M
Is this ok [y/N]: y
Downloading Packages:
hadoop-0.20-0.20.2+923.256-1.noarch.rpm | 30 MB 00:23
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : hadoop-0.20 1/3
Installing : hadoop-0.20-native 2/3
Installing : hadoop-0.20-native 3/3
Installed:
hadoop-0.20.noarch 0:0.20.2+923.256-1 hadoop-0.20-native.i386 0:0.20.2+923.256-1
hadoop-0.20-native.x86_64 0:0.20.2+923.256-1
Complete!
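As a quick sanity check (not part of the original run; exact output varies with the CDH3 build), the hadoop wrapper script should now be on the PATH:
$ hadoop version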
Installing the daemons
In this step you choose which daemons to install according to each host's role.
$ sudo yum install hadoop-0.20-<daemon type>
or
$ sudo yum install hadoop-0.20-*
<daemon type> is replaced by one of the daemon packages from the yum search above: namenode, secondarynamenode, datanode, jobtracker, or tasktracker. An example split is shown below.
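For example (the master/worker split below is an assumption; adjust it to your own cluster layout):
# on the master host: NameNode and JobTracker
$ sudo yum install hadoop-0.20-namenode hadoop-0.20-jobtracker

# on each worker host: DataNode and TaskTracker
$ sudo yum install hadoop-0.20-datanode hadoop-0.20-tasktracker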
Installing the subprojects
Each subproject has its own installation guide; refer to it as needed. A minimal yum example follows the component list below.
Component list
To install the CDH3 components, follow the instructions in the following sections:
- Flume. See Flume 0.9.x Installation or Flume 1.x Installation.
- Sqoop. See Sqoop Installation.
- Hue. For more information, see Hue Installation.
- Pig. See Pig Installation.
- Oozie. For more information, see Oozie Installation.
- Hive. See Hive Installation.
- HBase. For more information, see HBase Installation.
- ZooKeeper. For more information, see ZooKeeper Installation.
- Whirr. See Whirr Installation.
- Snappy. For more information, see Snappy Installation.
- Mahout. See Mahout Installation.
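The subproject packages that appeared in the yum search above install the same way as the core packages; configuration is covered in each guide. A sketch, shown here for Hive, Pig, and ZooKeeper:
$ sudo yum install hadoop-hive hadoop-pig hadoop-zookeeper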
I'm not much of a writer, so putting this together was quite a struggle ;;