4. Packaging and Running with a Makefile
This version of the Eclipse plugin cannot launch jobs directly via Run On Hadoop. A workaround is to write your own Makefile that performs each step separately, so the Hadoop program can be built and run from the command line.
For the example we used above, write the following Makefile:
JarFile="WordCount-V0.1.jar"
MainFunc="org.shirdrn.hadoop.WordCount"
LocalOutDir="/tmp/output"

all: help

jar:
	jar -cvf ${JarFile} -C bin/ .

run:
	hadoop jar ${JarFile} ${MainFunc} input output

clean:
	hadoop fs -rmr output

output:
	rm -rf ${LocalOutDir}
	hadoop fs -get output ${LocalOutDir}
	cat ${LocalOutDir}/part-r-00000

help:
	@echo "Usage:"
	@echo " make jar    - Build Jar File."
	@echo " make clean  - Clean up Output directory on HDFS."
	@echo " make run    - Run your MapReduce code on Hadoop."
	@echo " make output - Download and show output file"
	@echo " make help   - Show Makefile options."
	@echo " "
	@echo "Example:"
	@echo " make jar; make run; make output; make clean"
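Two caveats worth noting. First, every recipe line in a Makefile must be indented with a real tab character, not spaces, or make will abort with a "missing separator" error. Second, targets such as jar, run, clean, and output never produce files with those names, so declaring them phony keeps make from skipping them whenever a same-named file or directory happens to exist in the working tree (the output target is especially at risk). A minimal addition to the top of the Makefile, assuming GNU make:

```makefile
# Declare targets that do not produce files of the same name,
# so make always runs their recipes even if such a file exists.
.PHONY: all jar run clean output help
```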
(1) Package the jar file

[www.linuxidc.com @localhost hadoop]$ make jar
jar -cvf "WordCount-V0.1.jar" -C bin/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/shirdrn/(in = 0) (out= 0)(stored 0%)
adding: org/shirdrn/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/shirdrn/hadoop/IntSumReducer.class(in = 2320) (out= 901)(deflated 61%)
adding: org/shirdrn/hadoop/WordCount.class(in = 2022) (out= 1066)(deflated 47%)
adding: org/shirdrn/hadoop/TokenizerMapper.class(in = 2232) (out= 887)(deflated 60%)
(2) Run the program

[www.linuxidc.com @localhost hadoop]$ make run
hadoop jar "WordCount-V0.1.jar" "org.shirdrn.hadoop.WordCount" input output
10/10/08 08:46:54 INFO input.FileInputFormat: Total input paths to process : 13
10/10/08 08:46:55 INFO mapred.JobClient: Running job: job_201010080822_0001
10/10/08 08:46:56 INFO mapred.JobClient: map 0% reduce 0%
10/10/08 08:47:40 INFO mapred.JobClient: map 15% reduce 0%
10/10/08 08:47:59 INFO mapred.JobClient: map 30% reduce 0%
10/10/08 08:48:18 INFO mapred.JobClient: map 46% reduce 10%
10/10/08 08:48:24 INFO mapred.JobClient: map 61% reduce 15%
10/10/08 08:48:30 INFO mapred.JobClient: map 76% reduce 15%
10/10/08 08:48:33 INFO mapred.JobClient: map 76% reduce 20%
10/10/08 08:48:36 INFO mapred.JobClient: map 92% reduce 20%
10/10/08 08:48:44 INFO mapred.JobClient: map 100% reduce 25%
10/10/08 08:48:47 INFO mapred.JobClient: map 100% reduce 30%
10/10/08 08:48:55 INFO mapred.JobClient: map 100% reduce 100%
10/10/08 08:48:58 INFO mapred.JobClient: Job complete: job_201010080822_0001
10/10/08 08:48:58 INFO mapred.JobClient: Counters: 17
10/10/08 08:48:58 INFO mapred.JobClient: Job Counters
10/10/08 08:48:58 INFO mapred.JobClient: Launched reduce tasks=1
10/10/08 08:48:58 INFO mapred.JobClient: Launched map tasks=13
10/10/08 08:48:58 INFO mapred.JobClient: Data-local map tasks=13
10/10/08 08:48:58 INFO mapred.JobClient: FileSystemCounters
10/10/08 08:48:58 INFO mapred.JobClient: FILE_BYTES_READ=17108
10/10/08 08:48:58 INFO mapred.JobClient: HDFS_BYTES_READ=20836
10/10/08 08:48:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=34704
10/10/08 08:48:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=11807
10/10/08 08:48:58 INFO mapred.JobClient: Map-Reduce Framework
10/10/08 08:48:58 INFO mapred.JobClient: Reduce input groups=0
10/10/08 08:48:58 INFO mapred.JobClient: Combine output records=832
10/10/08 08:48:58 INFO mapred.JobClient: Map input records=624
10/10/08 08:48:58 INFO mapred.JobClient: Reduce shuffle bytes=17180
10/10/08 08:48:58 INFO mapred.JobClient: Reduce output records=0
10/10/08 08:48:58 INFO mapred.JobClient: Spilled Records=1664
10/10/08 08:48:58 INFO mapred.JobClient: Map output bytes=27728
10/10/08 08:48:58 INFO mapred.JobClient: Combine input records=2010
10/10/08 08:48:58 INFO mapred.JobClient: Map output records=2010
10/10/08 08:48:58 INFO mapred.JobClient: Reduce input records=832
(3) View the results

[www.linuxidc.com @localhost hadoop]$ make output
version="1.0"> 1
version="1.0"?> 8
via 2
virtual 3
want 1
when 1
where 2
where, 1
whether 1
which 8
who 1
will 8
with 5
worker 1
would 5
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 1
Only part of the output is shown above.
For more Hadoop-related information, see the Hadoop topic page: http://www.linuxidc.com/topicnews.aspx?tid=13