
Hadoop 2.4.1 First-Look Deployment + Complete Configuration Files

[Date: 2014-09-07] Source: CSDN  Author: Gandalf_lee

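Introduction

The stable release of Hadoop has already reached 2.4.1. The community moves remarkably fast! When will 3.0 be released?

Today I did some research and tried out a distributed deployment of 2.4.1, including NN HA. An NN HA deployment of 2.2.0 was already in place, so the existing ZK and ZKFC were reused. Along the way, following the official documentation at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html , I reviewed and filled in the key configuration properties and grouped related properties together, so the files are easier to read and modify later and can serve as templates.

Below is a record of deploying 2.4.1 based on the official documentation and past experience.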

--------------------------------------------------------------------------------

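Notes
1. This article records only the configuration files, not the rest of the deployment process, which is the same as for 2.2.0. See:

http://www.linuxidc.com/Linux/2014-09/106289.htm

http://www.linuxidc.com/Linux/2014-09/106292.htm

2. All paths, IPs, and hostnames in the configuration must be adjusted to match your actual environment.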

--------------------------------------------------------------------------------


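1. Test environment:
A 4-node cluster with 3 ZooKeeper nodes. The hosts file and the role assignment of each node are as follows:
hosts:
192.168.66.91 master
192.168.66.92 slave1
192.168.66.93 slave2
192.168.66.94 slave3


Role assignment (V = the role runs on that node):
         Active NN  Standby NN  DN  JournalNode  Zookeeper  FailoverController
master   V                          V            V          V
slave1              V           V   V            V          V
slave2                          V   V            V
slave3                          V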
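
The role table above can be cross-checked mechanically. The sketch below is a hypothetical helper, not part of Hadoop. It encodes the assignment as a Python dict and checks a few rules of thumb for this layout: exactly two NameNodes, a FailoverController (ZKFC) on each NameNode host, and an odd number (at least 3) of JournalNodes and ZooKeeper servers, because both need a live majority, so 3 members tolerate one failure.

```python
from __future__ import annotations

# Hypothetical encoding of the role table above (host -> roles on that host).
ROLES = {
    "master": {"ActiveNN", "JournalNode", "ZooKeeper", "ZKFC"},
    "slave1": {"StandbyNN", "DataNode", "JournalNode", "ZooKeeper", "ZKFC"},
    "slave2": {"DataNode", "JournalNode", "ZooKeeper"},
    "slave3": {"DataNode"},
}


def hosts_with(role: str, roles: dict[str, set[str]]) -> list[str]:
    """Return the sorted hostnames that carry the given role."""
    return sorted(h for h, r in roles.items() if role in r)


def tolerated_failures(quorum_size: int) -> int:
    """A majority quorum of n members keeps working with (n - 1) // 2 members down."""
    return (quorum_size - 1) // 2


def check_roles(roles: dict[str, set[str]]) -> list[str]:
    """Return human-readable problems; an empty list means no rule was violated."""
    problems = []
    namenodes = hosts_with("ActiveNN", roles) + hosts_with("StandbyNN", roles)
    if len(namenodes) != 2:
        problems.append(f"expected 2 NameNodes, found {namenodes}")
    for nn in namenodes:
        if "ZKFC" not in roles[nn]:
            problems.append(f"{nn} runs a NameNode but no ZKFC")
    for quorum_role in ("JournalNode", "ZooKeeper"):
        n = len(hosts_with(quorum_role, roles))
        if n < 3 or n % 2 == 0:
            problems.append(f"{quorum_role}: {n} members, an odd count >= 3 is recommended")
    return problems


if __name__ == "__main__":
    print(check_roles(ROLES))  # []
    print(tolerated_failures(len(hosts_with("ZooKeeper", ROLES))))  # 1
```

With 3 ZooKeeper servers and 3 JournalNodes, this cluster survives the loss of any one of each.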

--------------------------------------------------------------------------------

2. hadoop-env.sh: only the following three settings need to be changed
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_07


# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by the user that will run the hadoop daemons.  Otherwise there is the potential for a symlink attack.
export HADOOP_PID_DIR=/home/yarn/Hadoop/hadoop-2.4.1/hadoop_pid_dir
export HADOOP_SECURE_DN_PID_DIR=/home/yarn/Hadoop/hadoop-2.4.1/hadoop_pid_dir
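
The NOTE in hadoop-env.sh says the PID directory should be writable only by the user that runs the daemons. A minimal pre-flight check for that condition is sketched below. It is a hypothetical helper (not a Hadoop tool), POSIX-only, and it rejects a path that is a symlink, is not owned by the current user, or is group- or world-writable.

```python
import os
import stat


def pid_dir_is_safe(path: str) -> bool:
    """True if `path` is a real directory owned by the current user and not
    writable by group or others, as the hadoop-env.sh NOTE asks for."""
    st = os.lstat(path)  # lstat: inspect a symlink itself instead of following it
    if not stat.S_ISDIR(st.st_mode):
        return False  # symlinks and regular files are rejected
    if st.st_uid != os.getuid():
        return False
    return not (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # Path taken from HADOOP_PID_DIR above; adjust it to your environment.
    d = "/home/yarn/Hadoop/hadoop-2.4.1/hadoop_pid_dir"
    ok = os.path.isdir(d) and pid_dir_is_safe(d)
    print(d, "safe" if ok else "missing or unsafe")
```

Run it as the daemon user (yarn in this article) so that the ownership check compares against the right account.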

--------------------------------------------------------------------------------

3. core-site.xml (complete file)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you
    may not use this file except in compliance with the License. You may obtain
    a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless
    required by applicable law or agreed to in writing, software distributed
    under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
    OR CONDITIONS OF ANY KIND, either express or implied. See the License for
    the specific language governing permissions and limitations under the License.
    See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://myhadoop</value>
        <description>NameNode URI, in the form hdfs://host:port/. If NN HA is
            enabled, set this to the logical name of the cluster (the nameservice);
            for details see my article http://www.linuxidc.com/Linux/2014-09/106292.htm
        </description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/yarn/Hadoop/hadoop-2.4.1/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <description>Size of read/write buffer used in SequenceFiles.
        </description>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
        <description>Note: once ZK is configured, ZooKeeper must be started before the NameNode is formatted or started; otherwise a connection error is reported.
        </description>
    </property>
</configuration> 
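
Hadoop treats each of these files as a flat list of name/value pairs. A minimal sketch using Python's standard `xml.etree.ElementTree` loads a *-site.xml into a dict and splits `ha.zookeeper.quorum` into (host, port) pairs. This is a stand-in for illustration, not Hadoop's own Configuration class, and the whitespace stripping is this sketch's choice; the embedded sample is a trimmed copy of the file above.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def load_site_xml(data: bytes | str) -> dict[str, str]:
    """Parse Hadoop-style <configuration><property> XML into {name: value}.

    XML comments (such as the license header) are dropped by the parser.
    This sketch strips surrounding whitespace from names and values.
    """
    root = ET.fromstring(data)
    props: dict[str, str] = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def parse_zk_quorum(quorum: str) -> list[tuple[str, int]]:
    """Split 'host1:2181,host2:2181' into [('host1', 2181), ('host2', 2181)]."""
    pairs = []
    for entry in quorum.split(","):
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


# Trimmed copy of the core-site.xml above, used as sample input.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://myhadoop</value></property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    conf = load_site_xml(SAMPLE)
    print(conf["fs.defaultFS"])  # hdfs://myhadoop
    print(parse_zk_quorum(conf["ha.zookeeper.quorum"]))
    # [('master', 2181), ('slave1', 2181), ('slave2', 2181)]
```

The parsed quorum list is a convenient input for checking, before formatting the NameNode, that each ZooKeeper server is up.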


--------------------------------------------------------------------------------

4. hdfs-site.xml (complete file)

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you
    may not use this file except in compliance with the License. You may obtain
    a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless
    required by applicable law or agreed to in writing, software distributed
    under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
    OR CONDITIONS OF ANY KIND, either express or implied. See the License for
    the specific language governing permissions and limitations under the License.
    See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- NN HA related configuration **BEGIN** -->
    <property>
        <name>dfs.nameservices</name>
        <value>myhadoop</value>
        <description>
            Comma-separated list of nameservices.
            Must match the logical name used in fs.defaultFS in core-site.xml.
        </description>
    </property>
    <property>
        <name>dfs.ha.namenodes.myhadoop</name>
        <value>nn1,nn2</value>
        <description>
            The prefix for a given nameservice, contains a comma-separated
            list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
        </description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.myhadoop.nn1</name>
        <value>master:8020</value>
        <description>
            RPC address for namenode1 of myhadoop
        </description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.myhadoop.nn2</name>
        <value>slave1:8020</value>
        <description>
            RPC address for namenode2 of myhadoop
        </description>
    </property>
    <property>
        <name>dfs.namenode.http-address.myhadoop.nn1</name>
        <value>master:50070</value>
        <description>
            The address and the base port where the dfs namenode1 web ui will listen
            on.
        </description>
    </property>
    <property>
        <name>dfs.namenode.http-address.myhadoop.nn2</name>
        <value>slave1:50070</value>
        <description>
            The address and the base port where the dfs namenode2 web ui will listen
            on.
        </description>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.myhadoop.nn1</name>
        <value>master:53310</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.myhadoop.nn2</name>
        <value>slave1:53310</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>
            Whether automatic failover is enabled. See the HDFS High
            Availability documentation for details on automatic HA
            configuration.
        </description>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.myhadoop</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        <description>Configure the name of the Java class which will be used
            by the DFS Client to determine which NameNode is the current Active,
            and therefore which NameNode is currently serving client requests.
            This class is the client-side access proxy, and it is what makes HA transparent to clients.
        </description>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
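        <description>Method used to fence the previously active NameNode during a failover; sshfence logs in to that node over SSH and kills the NameNode process.</description>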
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/yarn/.ssh/id_rsa</value>
        <description>Path of the SSH private key used by sshfence.</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>1000</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/yarn/Hadoop/hadoop-2.4.1/hdfs_dir/journal/</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master:8485;slave1:8485;slave2:8485/hadoop-journal</value>
        <description>A directory on shared storage between the multiple
            namenodes in an HA cluster. This directory will be written by the
            active and read by the standby in order to keep the namespaces
            synchronized. This directory does not need to be listed in
            dfs.namenode.edits.dir. It should be left empty in a non-HA cluster.
        </description>
    </property>
    <!-- NN HA related configuration **END** -->
    <!-- NameNode related configuration **BEGIN** -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/yarn/Hadoop/hadoop-2.4.1/hdfs_dir/name</value>
        <description>Path on the local filesystem where the NameNode stores
            the namespace and transaction logs persistently. If this is a
            comma-delimited list of directories then the name table is replicated
            in all of the directories, for redundancy.</description>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>1048576</value>
        <description>
        HDFS block size in bytes. It is set to 1048576 (1 MB) here, which is the
        default minimum block size; large file systems typically use 134217728 (128 MB).
        </description>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>10</value>
        <description>More NameNode server threads to handle RPCs from large
            number of DataNodes.</description>
    </property>
    <!-- <property> <name>dfs.namenode.hosts</name> <value>master</value> <description>If
        necessary, use this to control the list of allowable datanodes.</description>
        </property> <property> <name>dfs.namenode.hosts.exclude</name> <value>slave1,slave2,slave3</value>
        <description>If necessary, use this to control the list of exclude datanodes.</description>
        </property> -->
    <!-- NameNode related configuration **END** -->
    <!-- DataNode related configuration **BEGIN** -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/yarn/Hadoop/hadoop-2.4.1/hdfs_dir/data</value>
        <description>Comma-separated list of paths on the local filesystem of
            a DataNode where it should store its blocks. If this is a
            comma-delimited list of directories, then data will be stored in all
            named directories, typically on different devices.</description>
    </property>
    <!-- DataNode related configuration **END** -->
</configuration> 
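
In an HA setup the per-NameNode keys end in `<nameservice>.<namenode-id>`, and a key whose suffix is not one of the IDs listed in `dfs.ha.namenodes.<nameservice>` is simply never looked up, so a typo there fails silently. The following hypothetical checker (not a Hadoop tool) assumes the core-site.xml and hdfs-site.xml properties have already been merged into one Python dict, and reports such mismatches along with a few other basic HA consistency problems.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Per-NameNode key prefixes used in the hdfs-site.xml above.
PER_NN_PREFIXES = (
    "dfs.namenode.rpc-address",
    "dfs.namenode.http-address",
    "dfs.namenode.servicerpc-address",
)


def check_ha(conf: dict[str, str]) -> list[str]:
    """Return problems found in a merged core-site/hdfs-site dict (empty if none)."""
    problems = []
    ns = conf.get("dfs.nameservices", "").strip()
    fs = urlsplit(conf.get("fs.defaultFS", ""))
    if fs.scheme != "hdfs" or fs.hostname != ns.lower():
        problems.append(f"fs.defaultFS should be hdfs://{ns}")
    ids = [i.strip() for i in conf.get(f"dfs.ha.namenodes.{ns}", "").split(",") if i.strip()]
    if len(ids) != 2:
        problems.append(f"expected 2 NameNode IDs for {ns}, got {ids}")
    # Any per-NameNode key whose suffix is not a declared ID is silently unused.
    for key in conf:
        for prefix in PER_NN_PREFIXES:
            head = f"{prefix}.{ns}."
            if key.startswith(head) and key[len(head):] not in ids:
                problems.append(f"{key}: '{key[len(head):]}' is not a declared NameNode ID")
    for nn_id in ids:
        if f"dfs.namenode.rpc-address.{ns}.{nn_id}" not in conf:
            problems.append(f"missing dfs.namenode.rpc-address.{ns}.{nn_id}")
    return problems


if __name__ == "__main__":
    conf = {
        "fs.defaultFS": "hdfs://myhadoop",
        "dfs.nameservices": "myhadoop",
        "dfs.ha.namenodes.myhadoop": "nn1,nn2",
        "dfs.namenode.rpc-address.myhadoop.nn1": "master:8020",
        "dfs.namenode.rpc-address.myhadoop.nn2": "slave1:8020",
        "dfs.namenode.servicerpc-address.myhadoop.n1": "master:53310",  # 'n1' typo
    }
    for problem in check_ha(conf):
        print(problem)
    # dfs.namenode.servicerpc-address.myhadoop.n1: 'n1' is not a declared NameNode ID
```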


--------------------------------------------------------------------------------
5. yarn-site.xml

<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you
    may not use this file except in compliance with the License. You may obtain
    a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless
    required by applicable law or agreed to in writing, software distributed
    under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
    OR CONDITIONS OF ANY KIND, either express or implied. See the License for
    the specific language governing permissions and limitations under the License.
    See accompanying LICENSE file. -->
<configuration>
    <!-- ResourceManager and NodeManager related configuration ***BEGIN*** -->
    <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
        <description>Enable ACLs? Defaults to false.</description>
    </property>
    <property>
        <name>yarn.admin.acl</name>
        <value>*</value>
        <description>
        ACL to set admins on the cluster. ACLs are of the form: comma-separated users, a space, then comma-separated groups.
        Defaults to the special value *, which means anyone. The special value of just a space means no one has access.
        </description>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
        <description>Configuration to enable or disable log aggregation</description>
    </property>
    <!-- ResourceManager and NodeManager related configuration ***END*** -->
   
    <!-- ResourceManager related configuration ***BEGIN*** -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM.</description>
    </property>
   
    <property>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
        <description>The https address of the RM web application.</description>
    </property>
   
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
        <description>ResourceManager host:port for clients to submit jobs.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
        <description>ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
        <description>ResourceManager host:port for NodeManagers.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
        <description>ResourceManager host:port for administrative commands.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
        <description>ResourceManager web-ui host:port.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
        <description>
        ResourceManager Scheduler class.
        CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
        </description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
        <description>
        Minimum limit of memory to allocate to each container request at the Resource Manager.   
        In MBs
        </description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>
        Maximum limit of memory to allocate to each container request at the Resource Manager.   
        In MBs.
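        Note: in this configuration yarn.scheduler.maximum-allocation-mb (2048) is larger than yarn.nodemanager.resource.memory-mb (1024), so a container request above 1024 MB cannot be placed on any NodeManager.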
        </description>
    </property>
   
    <!--
    <property>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value></value>
        <description>
        List of permitted NodeManagers.   
        If necessary, use this to control the list of allowable NodeManagers.
        </description>
    </property>
    <property>
        <name>yarn.resourcemanager.nodes.exclude-path</name>
        <value></value>
        <description>
        List of exclude NodeManagers.   
        If necessary, use this to control the list of exclude NodeManagers.
        </description>
    </property>
    -->
    <!-- ResourceManager related configuration ***END*** -->
   
    <!-- NodeManager related configuration ***BEGIN*** -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
        <description>
        Resource i.e. available physical memory, in MB, for given NodeManager.   
        Defines total available resources on the NodeManager to be made available to running containers.
        </description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
        <description>
        Ratio between virtual memory to physical memory when setting memory limits for containers.
        Container allocations are expressed in terms of physical memory,
        and virtual memory usage is allowed to exceed this allocation by this ratio.
        </description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/yarn/Hadoop/hadoop-2.4.1/yarn_dir/local</value>
        <description>
        Comma-separated list of paths on the local filesystem where intermediate data is written.
        Multiple paths help spread disk i/o.
        </description>
    </property>   
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/home/yarn/Hadoop/hadoop-2.4.1/yarn_dir/log</value>
        <description>
        Comma-separated list of paths on the local filesystem where logs are written.   
        Multiple paths help spread disk i/o.
        </description>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
        <description>
        Default time (in seconds) to retain log files on the NodeManager.
        ***Only applicable if log-aggregation is disabled.
        </description>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/yarn/log-aggregation</value>
        <description>
        HDFS directory where the application logs are moved on application completion.
        Need to set appropriate permissions.
        ***Only applicable if log-aggregation is enabled.
        </description>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
        <description>
        Suffix appended to the remote log dir.
        Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}.
        ***Only applicable if log-aggregation is enabled.
        </description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for Map Reduce applications.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
    <!-- NodeManager related configuration ***END*** -->
   
    <!-- History Server related configuration ***BEGIN*** -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
        <description>
        How long to keep aggregation logs before deleting them.
        -1 disables.
        Be careful, set this too small and you will spam the name node.
        </description>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
        <description>
        Time between checks for aggregated log retention.
        If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time.
        Be careful, set this too small and you will spam the name node.
        </description>
    </property>
    <!-- History Server related configuration ***END*** -->
   
    <property>
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
        <description>fairscheduler config file path</description>
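        <!-- This property is not listed in the official ClusterSetup documentation, but it does work; it is described in the Fair Scheduler documentation. -->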
    </property>
</configuration> 
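
The memory properties above interact. A NodeManager offers `yarn.nodemanager.resource.memory-mb` to containers, a container's virtual memory may exceed its physical allocation by `yarn.nodemanager.vmem-pmem-ratio`, and a request larger than a single NodeManager's capacity cannot be placed. A small arithmetic sketch with the values from this file follows. The helper names are hypothetical, every NodeManager is assumed to have the same memory setting, and the container count is an upper bound by memory alone, since CPU settings and the scheduler's rounding of requests can lower it.

```python
def vmem_limit_mb(pmem_mb: int, ratio: float) -> float:
    """Virtual-memory cap of a container allocated `pmem_mb` of physical memory."""
    return pmem_mb * ratio


def max_containers_by_memory(nm_memory_mb: int, container_mb: int) -> int:
    """Upper bound on containers of `container_mb` per NodeManager, memory only."""
    return nm_memory_mb // container_mb


def request_can_never_fit(max_alloc_mb: int, nm_memory_mb: int) -> bool:
    """True if the scheduler's maximum allocation exceeds a NodeManager's
    capacity, i.e. some requests it accepts cannot be placed on such a node."""
    return max_alloc_mb > nm_memory_mb


if __name__ == "__main__":
    # Values from the yarn-site.xml above.
    print(vmem_limit_mb(1024, 2.1))             # 2150.4
    print(max_containers_by_memory(1024, 512))  # 2
    print(request_can_never_fit(2048, 1024))    # True
```

With these values, a container holding all 1024 MB a node offers may use roughly 2150 MB of virtual memory before the NodeManager kills it, provided virtual-memory checking is enabled (the default).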

Continued on the next page: http://www.linuxidc.com/Linux/2014-09/106291p2.htm
