Testing in Practice
1. Start Hadoop from the shell
(1) Format HDFS
[shirdrn@localhost hadoop-0.20.0]$ bin/hadoop namenode -format
Output of the format command:
10/10/08 08:21:28 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
************************************************************/
10/10/08 08:21:28 INFO namenode.FSNamesystem: fsOwner=shirdrn,shirdrn
10/10/08 08:21:28 INFO namenode.FSNamesystem: supergroup=supergroup
10/10/08 08:21:28 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/10/08 08:21:28 INFO common.Storage: Image file of size 97 saved in 0 seconds.
10/10/08 08:21:28 INFO common.Storage: Storage directory /tmp/hadoop/hadoop-shirdrn/dfs/name has been successfully formatted.
10/10/08 08:21:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
(2) Start the Hadoop daemons
[shirdrn@localhost hadoop-0.20.0]$ bin/start-all.sh
Output:
starting namenode, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-namenode-localhost.out
localhost: starting datanode, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-datanode-localhost.out
localhost: starting secondarynamenode, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-secondarynamenode-localhost.out
starting jobtracker, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-jobtracker-localhost.out
localhost: starting tasktracker, logging to /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0/logs/hadoop-shirdrn-tasktracker-localhost.out
(3) Verify that all processes have started
[shirdrn@localhost hadoop-0.20.0]$ jps
8100 DataNode
8398 TaskTracker
8230 SecondaryNameNode
7994 NameNode
8301 JobTracker
8459 Jps
The jps output shows the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker all running, so Hadoop started successfully.
2. Prepare the test data
Upload the test data by copying the conf/ directory into HDFS:
[shirdrn@localhost hadoop-0.20.0]$ bin/hadoop fs -put conf/ input
If no error is reported, the upload succeeded.
Verify it with the following command:
[shirdrn@localhost hadoop-0.20.0]$ bin/hadoop fs -ls /user/shirdrn/input
Found 13 items
-rw-r--r-- 1 shirdrn supergroup 6275 2010-10-08 08:24 /user/shirdrn/input/capacity-scheduler.xml
-rw-r--r-- 1 shirdrn supergroup 535 2010-10-08 08:24 /user/shirdrn/input/configuration.xsl
-rw-r--r-- 1 shirdrn supergroup 388 2010-10-08 08:24 /user/shirdrn/input/core-site.xml
-rw-r--r-- 1 shirdrn supergroup 2396 2010-10-08 08:24 /user/shirdrn/input/hadoop-env.sh
-rw-r--r-- 1 shirdrn supergroup 1245 2010-10-08 08:24 /user/shirdrn/input/hadoop-metrics.properties
-rw-r--r-- 1 shirdrn supergroup 4190 2010-10-08 08:24 /user/shirdrn/input/hadoop-policy.xml
-rw-r--r-- 1 shirdrn supergroup 259 2010-10-08 08:24 /user/shirdrn/input/hdfs-site.xml
-rw-r--r-- 1 shirdrn supergroup 2815 2010-10-08 08:24 /user/shirdrn/input/log4j.properties
-rw-r--r-- 1 shirdrn supergroup 275 2010-10-08 08:24 /user/shirdrn/input/mapred-site.xml
-rw-r--r-- 1 shirdrn supergroup 10 2010-10-08 08:24 /user/shirdrn/input/masters
-rw-r--r-- 1 shirdrn supergroup 10 2010-10-08 08:24 /user/shirdrn/input/slaves
-rw-r--r-- 1 shirdrn supergroup 1243 2010-10-08 08:24 /user/shirdrn/input/ssl-client.xml.example
-rw-r--r-- 1 shirdrn supergroup 1195 2010-10-08 08:24 /user/shirdrn/input/ssl-server.xml.example
3. Develop in Eclipse
(1) Start Eclipse 3.5.2 with the workspace set to /home/shirdrn/eclipse/eclipse-3.5.2/workspace.
Then choose Open Perspective and switch to the Map/Reduce perspective. A DFS Locations node appears in the Project Explorer on the left side of the Eclipse IDE. Later, once we create a Map/Reduce project, DFS Locations will show the relevant resource directories on HDFS.
(2) Create and configure a Map/Reduce project
Create a Map/Reduce project named hadoop. In the project wizard, open the "Configure Hadoop install directory..." link and set it to the $HADOOP_HOME directory specified earlier, i.e. /home/shirdrn/eclipse/eclipse-3.5.2/hadoop/hadoop-0.20.0.
Click Next and then Finish. Expanding the new project in the Project Explorer shows, besides the src source folder, the many Hadoop-related jar files on the build path.
In the hadoop project, create the source files of the WordCount example that ships with the Hadoop distribution, split here into separate classes, in the package org.shirdrn.hadoop:
The Mapper class, TokenizerMapper.java:
package org.shirdrn.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
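The mapper splits each input line on whitespace and emits a (word, 1) pair per token. The tokenization step can be exercised with plain JDK classes, independent of Hadoop (a standalone sketch, not part of the job; the class name TokenizeDemo is mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {

    // Split a line the same way TokenizerMapper does: on whitespace.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<String>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Each token would be emitted as (token, 1) by the mapper.
        System.out.println(tokens("hello hadoop hello world"));
        // prints [hello, hadoop, hello, world]
    }
}
```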
The Reducer class, IntSumReducer.java:
package org.shirdrn.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
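For each key, the reducer simply sums the counts it receives; because it is also registered as the combiner, the same code runs over map-side partial outputs first. The core summation can be sketched without Hadoop types, using plain ints in place of IntWritable (SumDemo is an illustrative name, not part of the job):

```java
import java.util.Arrays;
import java.util.List;

public class SumDemo {

    // Mirrors IntSumReducer: add up all counts seen for one key.
    static int sum(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three (word, 1) pairs for the same word reduce to a count of 3.
        List<Integer> counts = Arrays.asList(1, 1, 1);
        System.out.println(sum(counts)); // prints 3
    }
}
```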
The MapReduce driver class, WordCount.java:
package org.shirdrn.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
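With the three classes in place, the job can also be submitted from the command line against the input directory uploaded earlier. The commands below assume the project has been exported from Eclipse as a jar named wordcount.jar (that jar name is an assumption; use whatever name you export):

```shell
# Submit the job; 'input' and 'output' are HDFS paths relative to /user/shirdrn.
# 'output' must not already exist, or the job will refuse to start.
bin/hadoop jar wordcount.jar org.shirdrn.hadoop.WordCount input output

# List the result files and print the word counts.
bin/hadoop fs -ls output
bin/hadoop fs -cat output/part-r-00000
```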