Pseudo-distributed mode is built on a single PC.
Because all of the Hadoop daemons run locally, it makes it easy to test a small-scale cluster.
OS configuration
SSH
Hadoop installation and configuration
Test
OS configuration
$ hostname
dist
/etc/hosts
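A minimal sketch of the /etc/hosts entry, assuming the machine's address is 192.168.116.140 (the one that appears in the namenode log below) together with the dist hostname set above:

127.0.0.1        localhost
192.168.116.140  dist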
Java
$ javac -version
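If the JDK is installed correctly this prints the compiler version; with the JDK configured in hadoop-env.sh below (/usr/lib/jdk1.6.0_32) it would be:

javac 1.6.0_32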
SSH
Generate the public/private key pair
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
// Create authorized_keys with the ssh-copy-id command
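For example, assuming the hadoop account used throughout this post:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost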
// Test
$ ssh localhost date
Mon Jun 4 19:04:02 KST 2012
Hadoop installation and configuration
conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/work/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/work/data</value>
  </property>
</configuration>
conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
conf/masters
localhost
conf/slaves
localhost
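In this single-machine setup both files just contain localhost: conf/masters names the host that runs the secondary namenode, and conf/slaves names the hosts that run a datanode and tasktracker.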
conf/hadoop-env.sh
line 9 : JAVA_HOME = Hadoop looks for JAVA_HOME here even if it is already set as an environment variable. (required)
line 10 : HADOOP_HOME_WARN_SUPPRESS = suppresses the warning message printed when HADOOP_HOME is set. (optional)
line 17 : HADOOP_HEAPSIZE = sets the heap size the daemons use. The default is 1 GB, reduced here to 500 MB. (optional)
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jdk1.6.0_32
export HADOOP_HOME_WARN_SUPPRESS="TRUE"

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000
export HADOOP_HEAPSIZE=500

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
Format Hadoop namenode
$ hadoop namenode -format
12/06/04 17:00:09 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = test/192.168.116.140
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
Re-format filesystem in /home/hadoop/work/name ? (Y or N) Y
12/06/04 17:00:15 INFO util.GSet: VM type = 64-bit
12/06/04 17:00:15 INFO util.GSet: 2% max memory = 9.6675 MB
12/06/04 17:00:15 INFO util.GSet: capacity = 2^20 = 1048576 entries
12/06/04 17:00:15 INFO util.GSet: recommended=1048576, actual=1048576
12/06/04 17:00:15 INFO namenode.FSNamesystem: fsOwner=hadoop
12/06/04 17:00:15 INFO namenode.FSNamesystem: supergroup=supergroup
12/06/04 17:00:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/06/04 17:00:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=10
12/06/04 17:00:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/06/04 17:00:15 INFO namenode.NameNode: Caching file names occurring more than 10 times
12/06/04 17:00:16 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/06/04 17:00:16 INFO common.Storage: Storage directory /home/hadoop/work/name has been successfully formatted.
12/06/04 17:00:16 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at test/192.168.116.140
************************************************************/
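Formatting initializes the directory configured as dfs.name.dir above (/home/hadoop/work/name); the dfs.data.dir directory is created by the datanode when the daemons are first started.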
Start All Hadoop
$ start-all.sh
starting namenode, logging to /var/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-namenode-dist.out
localhost: starting datanode, logging to /var/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-datanode-dist.out
localhost: starting secondarynamenode, logging to /var/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-secondarynamenode-dist.out
starting jobtracker, logging to /var/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-jobtracker-dist.out
localhost: starting tasktracker, logging to /var/hadoop-1.0.3/libexec/../logs/hadoop-hadoop-tasktracker-dist.out
[hadoop@dist hadoop-1.0.3]$ netstat -a | grep 500
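The daemons can also be checked in a browser: in Hadoop 1.x the NameNode web UI listens on http://localhost:50070 and the JobTracker UI on http://localhost:50030, which is why grepping netstat for ports in the 500xx range is a quick liveness check.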
Test
Write Test
$ hadoop jar /var/hadoop-1.0.3/hadoop-test-1.0.3.jar TestDFSIO -write -nrFiles 2 -fileSize 30
TestDFSIO.0.0.4
12/06/04 17:29:42 INFO fs.TestDFSIO: nrFiles = 2
12/06/04 17:29:42 INFO fs.TestDFSIO: fileSize (MB) = 30
12/06/04 17:29:42 INFO fs.TestDFSIO: bufferSize = 1000000
12/06/04 17:29:42 INFO fs.TestDFSIO: creating control file: 30 mega bytes, 2 files
12/06/04 17:29:43 INFO fs.TestDFSIO: created control files for: 2 files
12/06/04 17:29:43 INFO mapred.FileInputFormat: Total input paths to process : 2
12/06/04 17:29:44 INFO mapred.JobClient: Running job: job_201206041700_0008
12/06/04 17:29:45 INFO mapred.JobClient:  map 0% reduce 0%
12/06/04 17:30:08 INFO mapred.JobClient:  map 100% reduce 0%
12/06/04 17:30:26 INFO mapred.JobClient:  map 100% reduce 100%
12/06/04 17:30:31 INFO mapred.JobClient: Job complete: job_201206041700_0008
12/06/04 17:30:31 INFO mapred.JobClient: Counters: 30
12/06/04 17:30:31 INFO mapred.JobClient:   Job Counters
12/06/04 17:30:31 INFO mapred.JobClient:     Launched reduce tasks=1
12/06/04 17:30:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=32981
12/06/04 17:30:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/04 17:30:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/06/04 17:30:31 INFO mapred.JobClient:     Launched map tasks=2
12/06/04 17:30:31 INFO mapred.JobClient:     Data-local map tasks=2
12/06/04 17:30:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=15194
12/06/04 17:30:31 INFO mapred.JobClient:   File Input Format Counters
12/06/04 17:30:31 INFO mapred.JobClient:     Bytes Read=224
12/06/04 17:30:31 INFO mapred.JobClient:   File Output Format Counters
12/06/04 17:30:31 INFO mapred.JobClient:     Bytes Written=73
12/06/04 17:30:31 INFO mapred.JobClient:   FileSystemCounters
12/06/04 17:30:31 INFO mapred.JobClient:     FILE_BYTES_READ=173
12/06/04 17:30:31 INFO mapred.JobClient:     HDFS_BYTES_READ=472
12/06/04 17:30:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64992
12/06/04 17:30:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=62914633
12/06/04 17:30:31 INFO mapred.JobClient:   Map-Reduce Framework
12/06/04 17:30:31 INFO mapred.JobClient:     Map output materialized bytes=179
12/06/04 17:30:31 INFO mapred.JobClient:     Map input records=2
12/06/04 17:30:31 INFO mapred.JobClient:     Reduce shuffle bytes=179
12/06/04 17:30:31 INFO mapred.JobClient:     Spilled Records=20
12/06/04 17:30:31 INFO mapred.JobClient:     Map output bytes=147
12/06/04 17:30:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=482017280
12/06/04 17:30:31 INFO mapred.JobClient:     CPU time spent (ms)=5460
12/06/04 17:30:31 INFO mapred.JobClient:     Map input bytes=52
12/06/04 17:30:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=248
12/06/04 17:30:31 INFO mapred.JobClient:     Combine input records=0
12/06/04 17:30:31 INFO mapred.JobClient:     Reduce input records=10
12/06/04 17:30:31 INFO mapred.JobClient:     Reduce input groups=5
12/06/04 17:30:31 INFO mapred.JobClient:     Combine output records=0
12/06/04 17:30:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=486363136
12/06/04 17:30:31 INFO mapred.JobClient:     Reduce output records=5
12/06/04 17:30:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3150966784
12/06/04 17:30:31 INFO mapred.JobClient:     Map output records=10
12/06/04 17:30:31 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
12/06/04 17:30:31 INFO fs.TestDFSIO:            Date & time: Mon Jun 04 17:30:31 KST 2012
12/06/04 17:30:31 INFO fs.TestDFSIO:        Number of files: 2
12/06/04 17:30:31 INFO fs.TestDFSIO: Total MBytes processed: 60
12/06/04 17:30:31 INFO fs.TestDFSIO:      Throughput mb/sec: 12.430080795525171
12/06/04 17:30:31 INFO fs.TestDFSIO: Average IO rate mb/sec: 12.430662155151367
12/06/04 17:30:31 INFO fs.TestDFSIO:  IO rate std deviation: 0.08504913397967223
12/06/04 17:30:31 INFO fs.TestDFSIO:     Test exec time sec: 48.118
12/06/04 17:30:31 INFO fs.TestDFSIO:
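With dfs.replication set to 1, the two 30 MB test files land on the single local datanode, which is why HDFS_BYTES_WRITTEN comes out to 62914633 bytes: 2 × 30 MB = 62914560 bytes of data plus the 73-byte result file (the Bytes Written=73 counter above).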
Read Test
$ hadoop jar /var/hadoop-1.0.3/hadoop-test-1.0.3.jar TestDFSIO -read -nrfiles 2 -fileSize 30
TestDFSIO.0.0.4
12/06/04 17:35:20 INFO fs.TestDFSIO: nrFiles = 1
12/06/04 17:35:20 INFO fs.TestDFSIO: fileSize (MB) = 30
12/06/04 17:35:20 INFO fs.TestDFSIO: bufferSize = 1000000
12/06/04 17:35:21 INFO fs.TestDFSIO: creating control file: 30 mega bytes, 1 files
12/06/04 17:35:21 INFO fs.TestDFSIO: created control files for: 1 files
12/06/04 17:35:22 INFO mapred.FileInputFormat: Total input paths to process : 1
12/06/04 17:35:22 INFO mapred.JobClient: Running job: job_201206041700_0010
12/06/04 17:35:23 INFO mapred.JobClient:  map 0% reduce 0%
12/06/04 17:35:41 INFO mapred.JobClient:  map 100% reduce 0%
12/06/04 17:35:56 INFO mapred.JobClient:  map 100% reduce 100%
12/06/04 17:36:01 INFO mapred.JobClient: Job complete: job_201206041700_0010
12/06/04 17:36:01 INFO mapred.JobClient: Counters: 30
12/06/04 17:36:01 INFO mapred.JobClient:   Job Counters
12/06/04 17:36:01 INFO mapred.JobClient:     Launched reduce tasks=1
12/06/04 17:36:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16235
12/06/04 17:36:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/04 17:36:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/06/04 17:36:01 INFO mapred.JobClient:     Launched map tasks=1
12/06/04 17:36:01 INFO mapred.JobClient:     Data-local map tasks=1
12/06/04 17:36:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14648
12/06/04 17:36:01 INFO mapred.JobClient:   File Input Format Counters
12/06/04 17:36:01 INFO mapred.JobClient:     Bytes Read=112
12/06/04 17:36:01 INFO mapred.JobClient:   File Output Format Counters
12/06/04 17:36:01 INFO mapred.JobClient:     Bytes Written=73
12/06/04 17:36:01 INFO mapred.JobClient:   FileSystemCounters
12/06/04 17:36:01 INFO mapred.JobClient:     FILE_BYTES_READ=89
12/06/04 17:36:01 INFO mapred.JobClient:     HDFS_BYTES_READ=31457516
12/06/04 17:36:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=43257
12/06/04 17:36:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=73
12/06/04 17:36:01 INFO mapred.JobClient:   Map-Reduce Framework
12/06/04 17:36:01 INFO mapred.JobClient:     Map output materialized bytes=89
12/06/04 17:36:01 INFO mapred.JobClient:     Map input records=1
12/06/04 17:36:01 INFO mapred.JobClient:     Reduce shuffle bytes=89
12/06/04 17:36:01 INFO mapred.JobClient:     Spilled Records=10
12/06/04 17:36:01 INFO mapred.JobClient:     Map output bytes=73
12/06/04 17:36:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=303366144
12/06/04 17:36:01 INFO mapred.JobClient:     CPU time spent (ms)=2550
12/06/04 17:36:01 INFO mapred.JobClient:     Map input bytes=26
12/06/04 17:36:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=124
12/06/04 17:36:01 INFO mapred.JobClient:     Combine input records=0
12/06/04 17:36:01 INFO mapred.JobClient:     Reduce input records=5
12/06/04 17:36:01 INFO mapred.JobClient:     Reduce input groups=5
12/06/04 17:36:01 INFO mapred.JobClient:     Combine output records=0
12/06/04 17:36:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=290181120
12/06/04 17:36:01 INFO mapred.JobClient:     Reduce output records=5
12/06/04 17:36:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2100297728
12/06/04 17:36:01 INFO mapred.JobClient:     Map output records=5
12/06/04 17:36:01 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
12/06/04 17:36:01 INFO fs.TestDFSIO:            Date & time: Mon Jun 04 17:36:01 KST 2012
12/06/04 17:36:01 INFO fs.TestDFSIO:        Number of files: 1
12/06/04 17:36:01 INFO fs.TestDFSIO: Total MBytes processed: 30
12/06/04 17:36:01 INFO fs.TestDFSIO:      Throughput mb/sec: 54.054054054054056
12/06/04 17:36:01 INFO fs.TestDFSIO: Average IO rate mb/sec: 54.054054260253906
12/06/04 17:36:01 INFO fs.TestDFSIO:  IO rate std deviation: 0.006192093872671104
12/06/04 17:36:01 INFO fs.TestDFSIO:     Test exec time sec: 39.889
12/06/04 17:36:01 INFO fs.TestDFSIO:
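Note that this run reports nrFiles = 1 even though the command asked for 2 files; the flag is case-sensitive (-nrFiles), so the lowercase -nrfiles was most likely ignored and TestDFSIO fell back to its default of one file. The read throughput (54 MB/s) is therefore measured against a single 30 MB file.

Once benchmarking is done, TestDFSIO's data in HDFS can be removed with its -clean option:

$ hadoop jar /var/hadoop-1.0.3/hadoop-test-1.0.3.jar TestDFSIO -clean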
Troubleshooting Hadoop
Issue: Connection refused
Solution
Check the SSH configuration.
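In this setup that means passwordless login to localhost has to work before running start-all.sh; if the check below prompts for a password or is refused, redo the ssh-keygen / ssh-copy-id steps above.

$ ssh localhost date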
This works fine... so why won't Cluster Mode work? ㅠㅠ