Hadoop 2.7.0 Multi Node Cluster Setup on Ubuntu 15.04



Namenode  > hadoopmnmaster > 192.168.56.11

Datanodes > hadoopmnslave1 > 192.168.56.12
            hadoopmnslave2 > 192.168.56.13
            hadoopmnslave3 > 192.168.56.14

Clone Hadoop Single node cluster as hadoopmaster

Hadoopmaster Node

          $ sudo nano /etc/hosts

                      192.168.56.11   hadoopmnmaster
                      192.168.56.12   hadoopmnslave1
                      192.168.56.13   hadoopmnslave2
                      192.168.56.14   hadoopmnslave3
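The hosts file edit above can be sanity-checked with a short script (a hypothetical helper, not part of the original tutorial) that parses /etc/hosts-style text and reports any cluster hostname that is missing or mapped to the wrong IP:

```python
# Hostnames and IPs taken from this tutorial's cluster layout.
EXPECTED = {
    "hadoopmnmaster": "192.168.56.11",
    "hadoopmnslave1": "192.168.56.12",
    "hadoopmnslave2": "192.168.56.13",
    "hadoopmnslave3": "192.168.56.14",
}

def parse_hosts(text):
    """Return {hostname: ip} from /etc/hosts-style lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()              # IP first, then aliases
        for name in names:
            mapping[name] = ip
    return mapping

def missing_or_wrong(text):
    """Return the expected entries that the given hosts text gets wrong."""
    found = parse_hosts(text)
    return {h: ip for h, ip in EXPECTED.items() if found.get(h) != ip}
```

On any node you could run `missing_or_wrong(open("/etc/hosts").read())`; an empty dict means all four entries are present and correct.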

          $ sudo nano /etc/hostname

                      hadoopmnmaster

          $ cd /usr/local/hadoop/etc/hadoop

          $ sudo nano core-site.xml

                       replace localhost with hadoopmnmaster
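After the edit, the property should look roughly like this (assuming the single-node setup used the usual hdfs://localhost:9000 value; if your core-site.xml uses the older fs.default.name key or a different port, keep those and change only the hostname):

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopmnmaster:9000</value>
</property>
```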

          $ sudo nano hdfs-site.xml

                       replace the value 1 with 3 in the dfs.replication property (the replication factor, here set to the number of datanodes)
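The edited property should then read (dfs.replication is the standard HDFS key for the replication factor):

```xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
```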

          $ sudo nano yarn-site.xml

                       add the following configuration

                       <configuration>
                              <property>
                                  <name>yarn.resourcemanager.resource-tracker.address</name>
                                  <value>hadoopmnmaster:8025</value>
                              </property>
                              <property>
                                  <name>yarn.resourcemanager.scheduler.address</name>
                                  <value>hadoopmnmaster:8030</value>
                              </property>
                              <property>
                                  <name>yarn.resourcemanager.address</name>
                                  <value>hadoopmnmaster:8050</value>
                              </property>
                       </configuration>

          $ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

                       remove the dfs.namenode.name.dir property section (this node will be cloned to create the slaves, so only the dfs.datanode.data.dir property should remain for now)
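What remains in hdfs-site.xml at this point should be roughly the following (the path is assumed to match the datanode directory created in the next step; the file: prefix is the usual convention for local HDFS storage paths):

```xml
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
```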

          $ sudo rm -rf /usr/local/hadoop/hadoop_data

          $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode

          $ sudo chown -R chaal:chaal /usr/local/hadoop

Reboot hadoopmaster node

Clone Hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3

Hadoopslave Nodes (the following configuration should be done on each slave node)

          $ sudo nano /etc/hostname

                      hadoopmnslave<number>

reboot all nodes

Hadoopmaster Node

          $ sudo nano /usr/local/hadoop/etc/hadoop/masters

                       hadoopmnmaster

          $ sudo nano /usr/local/hadoop/etc/hadoop/slaves

                       remove localhost and add 

                       hadoopmnslave1
                       hadoopmnslave2
                       hadoopmnslave3

          $ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

                       replace the dfs.datanode.data.dir property section with dfs.namenode.name.dir

                       (this node runs the namenode, so both the property name and the directory path change)
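After the change the property should look roughly like this (the path is assumed to match the namenode directory created in the next step):

```xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
```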

          $ sudo rm -rf /usr/local/hadoop/hadoop_data

          $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode

          $ sudo chown -R chaal:chaal /usr/local/hadoop

          $ hdfs namenode -format

                       (the older "hadoop namenode -format" form still works but is deprecated in Hadoop 2.x)

          $ start-dfs.sh && start-yarn.sh

                       (start-all.sh still works but is deprecated)

          $ jps (run on the master and on all 3 datanodes)

                       the master should list NameNode, SecondaryNameNode, and ResourceManager;
                       each datanode should list DataNode and NodeManager


http://hadoopmnmaster:8088/   (YARN ResourceManager web UI)

http://hadoopmnmaster:50070/  (HDFS NameNode web UI)

6 comments:

  1. Sascha Kruszka (TinyDragon), May 24, 2015 at 3:15 PM

    If you experience trouble with the datanodes and you see something like "in_use.lock acquired by nodename" in the datanode log file, you can try this.

    On Master:

    stop-all.sh

    On Slaves:

    sudo rm -Rf /usr/local/hadoop/hadoop_store/hdfs/datanode
    sudo mkdir -p /usr/local/hadoop/hadoop_store/hdfs/datanode
    sudo chown -R chaal:chaal /usr/local/hadoop

    On Master:

    start-all.sh

    Further information: the slave nodes may have old namenode information. With these steps you are clearing the datanode directory; each slave node will re-initialize it on the next start.

    BR
  2. Is YARN required? Why should I use it here? And please tell me how to set this up if I don't want to use YARN.

  3. Don't mention it in the XML; check the documentation for the exact steps.

  4. OK, but tell me what the use of YARN is.

  5. http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html

 
