Namenode  > hadoopmnmaster > 192.168.56.11
Datanodes > hadoopmnslave1 > 192.168.56.12
            hadoopmnslave2 > 192.168.56.13
            hadoopmnslave3 > 192.168.56.14
Clone the Hadoop single-node cluster as hadoopmaster
Hadoopmaster Node
          $ sudo nano /etc/hosts
                      192.168.56.11   hadoopmnmaster
                      192.168.56.12   hadoopmnslave1
                      192.168.56.13   hadoopmnslave2
                      192.168.56.14   hadoopmnslave3
          $ sudo nano /etc/hostname
                      hadoopmnmaster
          $ cd /usr/local/hadoop/etc/hadoop
          $ sudo nano core-site.xml
                       replace localhost with hadoopmnmaster
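After the edit, the filesystem entry in core-site.xml should look roughly like this (a sketch; port 9000 is the value commonly used in single-node setups, so adjust if your clone uses a different port or the older fs.default.name property name):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopmnmaster:9000</value>
  </property>
</configuration>
```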
          $ sudo nano hdfs-site.xml
                       replace the value 1 with 3 (the number of datanodes)
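The resulting replication property in hdfs-site.xml would look like this (a sketch; 3 matches the three datanodes listed above):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```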
          $ sudo nano yarn-site.xml
                       add the following configuration
                       <configuration>
                       <property>
                                  <name>yarn.resourcemanager.resource-tracker.address</name>
                                  <value>hadoopmnmaster:8025</value>
                       </property>
                       <property>
                                  <name>yarn.resourcemanager.scheduler.address</name>
                                  <value>hadoopmnmaster:8030</value>
                       </property>
                       <property>
                                  <name>yarn.resourcemanager.address</name>
                                  <value>hadoopmnmaster:8050</value>
                       </property>
                       </configuration>
          $ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
                       remove the dfs.namenode.name.dir property section (this node will be cloned as the datanode template)
          $ sudo rm -rf /usr/local/hadoop/hadoop_data
          $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
          $ sudo chown -R chaal:chaal /usr/local/hadoop
Reboot hadoopmaster node
Clone Hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3
Hadoopslave Node (this configuration should be done on each slave node)
          $ sudo nano /etc/hostname
                      hadoopmnslave<number> (e.g. hadoopmnslave1 on the first slave)
reboot all nodes
Hadoopmaster Node
          $ sudo nano /usr/local/hadoop/etc/hadoop/masters
                       hadoopmnmaster
          $ sudo nano /usr/local/hadoop/etc/hadoop/slaves
                       remove localhost and add 
                       hadoopmnslave1
                       hadoopmnslave2
                       hadoopmnslave3
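As a sketch, the slaves file content can also be generated with a one-line loop (writing to a local file here rather than directly into /usr/local/hadoop/etc/hadoop, which needs sudo):

```shell
# Generate the three slave hostnames into ./slaves in the current
# directory; copy the file into /usr/local/hadoop/etc/hadoop yourself.
for i in 1 2 3; do
  echo "hadoopmnslave$i"
done > slaves
cat slaves
```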
          $ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
                       replace the dfs.datanode.data.dir property section
                       with dfs.namenode.name.dir (the master runs the namenode, not a datanode)
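The master's hdfs-site.xml would then carry roughly this property (a sketch; the path matches the namenode directory created in the next step):

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
```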
          $ sudo rm -rf /usr/local/hadoop/hadoop_data
          $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
          $ sudo chown -R chaal:chaal /usr/local/hadoop
          $ hdfs namenode -format (the older "hadoop namenode -format" still works but prints a deprecation warning)
          $ start-all.sh
          $ jps (check in all 3 datanodes)
http://hadoopmnmaster:8088/
http://hadoopmnmaster:50070/

If you experience trouble with the datanodes and you see something like "in_use.lock acquired by nodename" in the datanode log file, you can try this.
On Master:
stop-all.sh
On Slaves:
sudo rm -Rf /usr/local/hadoop/hadoop_store/hdfs/datanode
sudo mkdir -p /usr/local/hadoop/hadoop_store/hdfs/datanode
sudo chown -R chaal:chaal /usr/local/hadoop
On Master:
start-all.sh
Further Information:
The slave nodes may have old NameNode information. With these steps you are clearing the datanode directory; the slave node will reinitialize the datanode directory on the next start.
BR
Yes, that's what I do all the time :)

Is YARN required? Why should I use it here? And please tell me how to set this up if I don't want to use YARN?

Don't mention it in the XML.

Check the documentation for clear steps.

OK, but tell me what the use of YARN is.

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html