Installing Hadoop on Zynq


The how-to guide for compiling the Hadoop source code needs to be completed before this one.


Prerequisites to Hadoop Installation

1. You have installed the Ubuntu 16.04 Desktop version in your Virtual Machine 12.

2. You have installed Java (JDK 1.8) on your Ubuntu system, with JAVA_HOME=/usr/local/java/jdk1.8.0_91.

3. Check your hostname:

       $ hostname   --should output ubuntu

Linux Configuration Before Hadoop Installation

This document explains the procedure to set up a single-node Hadoop cluster on Ubuntu 16.04. You are expected to know basic UNIX commands and vi editor commands. If you are not familiar with UNIX and vi, it is recommended that you brush up on your UNIX basics before proceeding. You only need to perform the steps (execute the commands) shown in the indented command blocks. We will set up the single-node Hadoop cluster using a dedicated Hadoop user, 'hduser'.

1. Login as root

       $ sudo su

       $ whoami   --should give root
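While you are at it, you can double-check the prerequisites listed above. This is a minimal optional sketch, not part of the original steps; it assumes the JDK path used throughout this guide.

       $ /usr/local/java/jdk1.8.0_91/bin/java -version   --should report version 1.8.0_91
       $ echo $JAVA_HOME                                 --should print /usr/local/java/jdk1.8.0_91 if already exported
       $ hostname                                        --should match the hostname used in this guide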

2. Add a dedicated Hadoop system user called hduser. We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.).

3. Create a Group called hadoop

       $ sudo addgroup hadoop

4. Create a user hduser

       $ sudo adduser hduser
             It will ask you to enter a password twice, followed by some details; just press Enter and answer Yes.
             We have given the password hadoop.

5. Add hduser to hadoop group

       $ sudo adduser hduser hadoop

A single command for steps 4 & 5:

       $ sudo adduser --ingroup hadoop hduser

6. Add 'hduser' to the sudoers list so that hduser can do admin tasks.

       $ sudo visudo

Add a line under "#Allow members of group sudo to execute any command" in the following format (right-click and paste):

       hduser ALL=(ALL) ALL

Press Ctrl+X, then Y, then Enter twice. This adds the user hduser and the group hadoop to your local machine.
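To confirm the sudoers entry took effect, you can optionally list hduser's sudo privileges; this sanity check is not part of the original guide.

       $ sudo -l -U hduser   --should list (ALL) ALL among the allowed commands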

7. Log out of your system and log in again as hduser.

8. Change your resolution to 1440x900 (if needed).

9. Open a terminal (Ctrl+Alt+T) and change the font if needed. Maximize the terminal. From the menu bar go to Terminal -> Preferences -> Profiles -> Edit -> check Custom font -> click on Font -> increase it to 16 -> press Select -> Close -> Close.


10. Configure SSH. Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section. I assume that you have SSH up and running on your machine and have configured it to allow SSH public-key authentication; if not, there are several guides available. First, we have to install the SSH server and generate an SSH key for the hduser user.

       # Install the ssh server on your computer
       hduser@ubuntu:~$ sudo apt-get install openssh-server

Enter the password (hadoop) and Y to continue.

If this does not work, install openssh-server using the Ubuntu Software Center by searching for openssh-server.

11. Generate an SSH key for communication.

       hduser@ubuntu:~$ ssh-keygen

Just press Enter for whatever is asked:

       Generating public/private rsa key pair.
       Enter file in which to save the key (/home/hduser/.ssh/id_rsa):

       Created directory '/home/hduser/.ssh'.
       Your identification has been saved in /home/hduser/.ssh/id_rsa.
       Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
       The key fingerprint is:
       9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@localhost
       The key's randomart image is:
       [...snip...]
       hduser@ubuntu:~$

The final step is to test the SSH setup by connecting to your local machine with the hduser user. The step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file. If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).
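As an illustration only, a host-specific entry in $HOME/.ssh/config might look like the sketch below. The port number 2222 is a made-up example for a non-standard SSH port and is not needed for the default setup in this guide.

       # ~/.ssh/config (hypothetical example)
       Host localhost
           Port 2222
           IdentityFile ~/.ssh/id_rsa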

12. Copy the public key to the authorized_keys file and edit its permissions.

Copy the public key to the authorized_keys file so that ssh does not ask for a password every time:

       hduser@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Change the permissions of the authorized_keys file so that only hduser can access it:

       hduser@ubuntu:~$ chmod 700 ~/.ssh/authorized_keys


13. Start SSH. If ssh is not running, start it with the command below:

       hduser@ubuntu:~$ sudo /etc/init.d/ssh restart

Enter your password (hadoop).

14. Test your SSH connectivity.

       hduser@ubuntu:~$ ssh localhost

Type 'yes' when asked. You should be able to connect without a password. If you are asked for a password here, something went wrong; please check your steps.

15. Disable IPv6. Hadoop and IPv6 do not agree on the meaning of the 0.0.0.0 address, so it is advisable to disable IPv6 by adding the following lines at the end of /etc/sysctl.conf.

       hduser@ubuntu:~$ sudo vim /etc/sysctl.conf

Enter your password (hadoop).

       # disable ipv6
       net.ipv6.conf.all.disable_ipv6 = 1
       net.ipv6.conf.default.disable_ipv6 = 1
       net.ipv6.conf.lo.disable_ipv6 = 1

16. Check whether IPv6 is disabled. After a system reboot the output of

       hduser@ubuntu:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

should be 1, meaning that IPv6 is actually disabled. Without a reboot it will still show 0.
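If you prefer not to reboot right away, reloading the sysctl settings should also apply the change immediately. This is a small optional step, not in the original guide.

       hduser@ubuntu:~$ sudo sysctl -p
       hduser@ubuntu:~$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6   --should now print 1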

Hadoop Installation

1. Download Hadoop. For this tutorial, I am using hadoop-2.7.3.tar.gz, but it should work with any other version. Download hadoop-2.7.3.tar.gz and save it to the hduser Desktop.

2. Move the archive to /usr/local/.

Use the terminal (Ctrl+Alt+T):

       $ sudo mv ~/Desktop/hadoop-2.7.3.tar.gz /usr/local/

Enter the password (hadoop), then:

       $ cd /usr/local

  sudo tar -xvf hadoop-2.7.3.tar.gz
  sudo rm hadoop-2.7.3.tar.gz
  sudo ln -s hadoop-2.7.3 hadoop
  sudo chown -R hduser:hadoop hadoop-2.7.3
  sudo chmod 777 hadoop-2.7.3
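A quick optional sanity check of the layout, assuming the commands above succeeded, is to confirm that the hadoop symlink points at the extracted directory and that the configuration folder is in place:

       $ ls -l /usr/local | grep hadoop    --hadoop should be a symlink to hadoop-2.7.3, which is owned by hduser:hadoop
       $ ls /usr/local/hadoop/etc/hadoop   --should list hadoop-env.sh, core-site.xml, yarn-site.xml, etc.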

3. Edit hadoop-env.sh and configure Java. Add the following to /usr/local/hadoop/etc/hadoop/hadoop-env.sh, removing the existing line export JAVA_HOME=${JAVA_HOME}.

       $ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

       export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
       export HADOOP_HOME_WARN_SUPPRESS="TRUE"
       export JAVA_HOME=/usr/local/java/jdk1.8.0_91

The first export is to disable IPv6 (prefer IPv4).

Please note: in Hadoop 2.6 the location is /usr/local/hadoop/conf/hadoop-env.sh, but in 2.7 there is no conf folder.

4. Update $HOME/.bashrc. Add the following lines to the end of the $HOME/.bashrc file of user hduser. If you use a shell other than bash, you should of course update its appropriate configuration file instead of .bashrc.

       $ vim ~/.bashrc

       # Type :$ to go to the last line and then press I to switch to Insert mode
       
        # Set Hadoop-related environment variables 
        export HADOOP_HOME=/usr/local/hadoop 
        export HADOOP_PREFIX=/usr/local/hadoop
        export HADOOP_MAPRED_HOME=${HADOOP_HOME}
        export HADOOP_COMMON_HOME=${HADOOP_HOME}
        export HADOOP_HDFS_HOME=${HADOOP_HOME}
        export HADOOP_YARN_HOME=${HADOOP_HOME}
        export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

	# Native Path
	export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
	export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

        # Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on) 
        export JAVA_HOME=/usr/local/java/jdk1.8.0_91  
        # Some convenient aliases and functions for running Hadoop-related commands 

        unalias fs &> /dev/null 
        alias fs="hadoop fs" 
        unalias hls &> /dev/null 
        alias hls="fs -ls"  
        
     


        # If you have LZO compression enabled in your Hadoop cluster and  
        # compress job outputs with LZOP (not covered in this tutorial): 
        # Conveniently inspect an LZOP compressed file from the command 
        # line; run via: 
        # 
        # $ lzohead /hdfs/path/to/lzop/compressed/file.lzo 
        #
        # Requires installed 'lzop' command. 
        # lzohead () { hadoop fs -cat $1 | lzop -dc | head -1000 | less; }  
        # Add Hadoop bin/ directory to PATH 
        export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin 
        

You need to close the terminal and open a new one for the .bashrc changes to take effect. The shortcut to open a terminal is Ctrl+Alt+T.
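Alternatively, as an optional shortcut not in the original steps, you can reload .bashrc in the current terminal and then verify that the Hadoop binaries are on the PATH:

       $ source ~/.bashrc
       $ hadoop version     --should print Hadoop 2.7.3
       $ echo $HADOOP_HOME  --should print /usr/local/hadoop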


5. Update yarn-site.xml.

       $ sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml


Add the following snippets between the <configuration> ... </configuration> tags

        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>


6. Update the core-site.xml file.

       $ sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following snippets between the <configuration> ... </configuration> tags:

       
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/app/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>

        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
            <description>The name of the default file system. A URI whose scheme
            and authority determine the FileSystem implementation. The uri's
            scheme determines the config property (fs.SCHEME.impl) naming the
            FileSystem implementation class. The uri's authority is used to
            determine the host, port, etc. for a filesystem.
            </description>
        </property>
        


Note: in Hadoop 2.6 the location is /usr/local/hadoop/etc/hadoop/yarn-site.xml.

7. Create the temp folder referenced above and give it the appropriate permissions.


       sudo mkdir -p /app/hadoop/tmp
       sudo chown hduser:hadoop -R /app/hadoop/tmp
       sudo chmod 750 /app/hadoop/tmp


8. Create mapred-site.xml file from mapred-site.xml.template

       $ sudo cp  /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml


Add the following to /usr/local/hadoop/etc/hadoop/mapred-site.xml between the <configuration> ... </configuration> tags:

       $ sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>localhost:10020</value>
            <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
        </property>

9. Create a temporary directory which will be used as the base location for DFS. Now we create the directories and set the required ownerships and permissions:


       sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
       sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
       sudo chown hduser:hadoop -R /usr/local/hadoop_tmp/
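Before moving on, it may be worth confirming the ownership; this optional check guards against the IOException mentioned below.

       ls -ld /usr/local/hadoop_tmp/hdfs/namenode /usr/local/hadoop_tmp/hdfs/datanode   --both should show hduser hadoop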

If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the namenode in the next section.

10. Update the hdfs-site.xml file.

       $ sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following to /usr/local/hadoop/etc/hadoop/hdfs-site.xml between the <configuration> ... </configuration> tags:

         <property>
           <name>dfs.replication</name>
           <value>1</value>
         </property>
         <property>
           <name>dfs.namenode.name.dir</name>
           <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
         </property>
         <property>
           <name>dfs.datanode.data.dir</name>
           <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
         </property>

11. Format your namenode. Open a new terminal, otherwise the hadoop command will not be found (the .bashrc changes apply only to new shells).

Format the HDFS filesystem with the command below:

       $ hadoop namenode -format  

If the format is not working, double-check your entries in the .bashrc file. The .bashrc updates come into force only after you have opened a new terminal.
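Note that in Hadoop 2.x the hadoop namenode command prints a deprecation warning; the equivalent current form, if you prefer it, is:

       $ hdfs namenode -format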

12. Start your single-node cluster. Congratulations, your Hadoop single-node cluster is ready to use. Test your cluster by running the following commands.

       $ start-dfs.sh       --starts NN,SNN,DN  --Type Yes if anything asked for
       $ start-yarn.sh   --starts NodeManager,ResourceManager
       $ start-dfs.sh && start-yarn.sh  --In a single line


Type yes if asked for
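When you later want to shut the cluster down, the matching stop scripts (standard Hadoop sbin scripts, on the PATH via the .bashrc change above) are:

       $ stop-yarn.sh   --stops NodeManager, ResourceManager
       $ stop-dfs.sh    --stops NN, SNN, DN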

13. Start your history server. Some components, like Pig, depend heavily on the history server.

       $ mr-jobhistory-daemon.sh start historyserver
       $ mr-jobhistory-daemon.sh stop historyserver   --if you want to stop it

14. Check whether all the necessary Hadoop daemons are running.

       $ jps

       4912 NameNode
       5361 ResourceManager
       5780 Jps
       5209 SecondaryNameNode
       5485 NodeManager
       5251 DataNode
       3979 JobHistoryServer

If you see that any of the daemons is not running, you can inspect the log files to figure out the problem. The log files are located at /usr/local/hadoop/logs. For example, if you do not see the DataNode running, look into /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log; it should help you debug the problem.
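For example, to look at the most recent entries of the DataNode log (a quick sketch using the log path above):

       $ tail -n 50 /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log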

15. Check whether the home folder has been created in HDFS.

       $ hadoop fs -ls
       16/06/23 13:47:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
       ls: `.': No such file or directory

If you get the above error, it means your Hadoop home directory was not created successfully. Type the command below:

       $ hadoop fs -mkdir -p /user/hduser   (deprecated)
       $ hdfs dfs -mkdir -p /user/hduser    (use this)

Now you should not get an error with the command below. The first time you will not get any output, as the HDFS home folder is empty.

       $ hdfs dfs -ls
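To go one step further, you can run one of the bundled MapReduce examples as a smoke test. This is an optional sketch and assumes the examples jar was built and installed in the usual share/hadoop/mapreduce location of your Hadoop 2.7.3 tree:

       $ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5

If the job finishes and prints an estimate of Pi, then YARN, HDFS and the JobHistoryServer are working together.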

16. Check whether Hadoop is accessible through the browser by hitting the URLs below.

       NameNode                     http://localhost:50070
       ResourceManager              http://localhost:8088
       MapReduce JobHistory Server  http://localhost:19888

19888 is the HTTP port of the JobHistoryServer, whereas 10020 is the service port which we configured in step 8. That is all for this tutorial; you may continue with the next article in the series, "Setup Multi Node Hadoop Cluster on Ubuntu".

Common Errors:

1. Error in the datanode log.

       $ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log
       java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c

Solution: stop your cluster, issue the command below, and then start your cluster again.

       sudo rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*
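Putting the whole recovery together, a sketch of the sequence described above:

       $ stop-yarn.sh && stop-dfs.sh
       $ sudo rm -rf /usr/local/hadoop_tmp/hdfs/datanode/*
       $ start-dfs.sh && start-yarn.sh
       $ jps   --DataNode should now be listed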



This guide was contributed by Suraz Ghimire.