Compile Hadoop Source Code

From Lofaro Lab Wiki

This was done on an Ubuntu 14.04 laptop. The same procedure did not work when attempted in a virtual machine.

Download the Hadoop 2.7.3 source release from http://hadoop.apache.org/releases.html
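If you prefer the command line, the download and unpack can be scripted. This is a minimal sketch that assumes the release is still published on the Apache archive under archive.apache.org/dist/hadoop/common/ and that the tarball unpacks into a hadoop-2.7.3-src directory, the path used later on this page. Both function names are hypothetical.

```shell
# Hypothetical helper: print the assumed Apache archive URL for a Hadoop source release.
hadoop_src_url() {
  local version="$1"
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s-src.tar.gz\n' \
    "$version" "$version"
}

# Hypothetical helper: download the source tarball to /tmp and unpack it into $HOME,
# which should create $HOME/hadoop-<version>-src.
fetch_hadoop_src() {
  local version="$1"
  local tarball="/tmp/hadoop-$version-src.tar.gz"
  wget -O "$tarball" "$(hadoop_src_url "$version")" &&
    tar -xzf "$tarball" -C "$HOME"
}
```

After pasting the functions into a shell, run `fetch_hadoop_src 2.7.3`.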

Read the included BUILDING file, which lists the build dependencies.

Begin by removing OpenJDK and installing Oracle Java 7 from the webupd8team PPA:

 $ sudo apt-get purge openjdk*
 $ sudo apt-get install software-properties-common
 $ sudo add-apt-repository ppa:webupd8team/java
 $ sudo apt-get update
 $ sudo apt-get install oracle-java7-installer
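To confirm the installer worked, you can check which Java the shell now picks up. The sketch below parses the first line of `java -version` output; it assumes the usual `... version "1.7.0_80"` format and a minimum of Java 7, matching the oracle-java7-installer package above. The function name is hypothetical.

```shell
# Hypothetical helper: extract the Java feature version from a `java -version` line.
# Old-style strings such as "1.7.0_80" give 7; new-style strings such as "11.0.2" give 11.
java_feature_version() {
  local ver
  ver="$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')"
  ver="${ver#1.}"                # drop the legacy "1." prefix if present
  printf '%s\n' "${ver%%[._]*}"  # keep only the leading number
}

if command -v java >/dev/null 2>&1; then
  found="$(java_feature_version "$(java -version 2>&1 | head -n 1)")"
  if [ -n "$found" ] && [ "$found" -ge 7 ]; then
    echo "Java $found found"
  else
    echo "Java 7 or newer not detected (got: '$found')" >&2
  fi
fi
```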

After this, install Apache Maven and Protocol Buffers. Download the Maven 3.5.0 binary archive from https://maven.apache.org/download.cgi

Installing Maven (from the Maven README file)

1) Unpack the archive where you would like to store the binaries, e.g.:

   Unix-based operating systems (Linux, Solaris and Mac OS X)
     tar zxvf apache-maven-3.x.y.tar.gz
   Windows
     unzip apache-maven-3.x.y.zip

2) A directory called "apache-maven-3.x.y" will be created.

3) Add the bin directory to your PATH, e.g.:

   Unix-based operating systems (Linux, Solaris and Mac OS X)
     export PATH=/usr/local/apache-maven-3.x.y/bin:$PATH
   Windows
     set PATH="c:\program files\apache-maven-3.x.y\bin";%PATH%

4) Make sure JAVA_HOME is set to the location of your JDK.

5) Run "mvn --version" to verify that it is correctly installed.
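Steps 3 and 4 are usually made permanent by adding lines to ~/.bashrc. The sketch below assumes Maven was unpacked to /usr/local/apache-maven-3.5.0 (adjust to your location) and derives JAVA_HOME by resolving the javac symlink. Both helper names are hypothetical.

```shell
# Assumed install location; change this to wherever the archive was unpacked.
MAVEN_HOME_DIR=/usr/local/apache-maven-3.5.0

# Hypothetical helper: prepend a directory to PATH unless it is already listed,
# so re-sourcing ~/.bashrc does not keep growing PATH.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Hypothetical helper: turn a resolved .../bin/javac path into the JDK home directory.
jdk_home_from_javac() {
  printf '%s\n' "${1%/bin/javac}"
}

path_prepend "$MAVEN_HOME_DIR/bin"
if command -v javac >/dev/null 2>&1; then
  # readlink -f follows /usr/bin/javac -> /etc/alternatives/javac -> the real JDK binary
  JAVA_HOME="$(jdk_home_from_javac "$(readlink -f "$(command -v javac)")")"
  export JAVA_HOME
fi
```

After sourcing these lines, `mvn --version` (step 5) should report both the Maven version and the Java home it is using.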

Next, download the Google Protocol Buffers 2.5.0 source code from https://github.com/google/protobuf/releases?after=v2.6.1 and follow the instructions in its INSTALL file. Hadoop 2.7.3 requires exactly version 2.5.0; the build fails with other protoc versions.
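The comments below summarize the usual autotools build sequence; the `sudo ldconfig` call, which refreshes the shared-library cache on Linux, is an addition that is often needed after installing to /usr/local. Because Hadoop checks the protoc version, the sketch also verifies the installed binary. The function name is hypothetical.

```shell
# Typical build, run from inside the unpacked protobuf-2.5.0 directory:
#   ./configure
#   make
#   make check
#   sudo make install
#   sudo ldconfig    # refresh the shared-library cache so protoc can load its libraries

# Hypothetical helper: succeed only if `protoc --version` output is exactly 2.5.0.
protoc_version_ok() {
  [ "$1" = "libprotoc 2.5.0" ]
}

if command -v protoc >/dev/null 2>&1; then
  if protoc_version_ok "$(protoc --version)"; then
    echo "protoc 2.5.0 found at $(command -v protoc)"
  else
    echo "Found $(protoc --version), but Hadoop 2.7.3 expects libprotoc 2.5.0" >&2
  fi
else
  echo "protoc is not on PATH" >&2
fi
```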

Install other dependencies:

 $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
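Before starting a long Maven build, it can help to confirm that the tools from these packages are on PATH. This sketch checks command names only (gcc, g++ and make come from build-essential); it does not check for the zlib or OpenSSL development headers. The helper name is hypothetical.

```shell
# Hypothetical helper: print each named command that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools gcc g++ make autoconf automake cmake pkg-config)"
if [ -n "$missing" ]; then
  echo "Missing build tools:" >&2
  printf '%s\n' "$missing" >&2
fi
```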


Finally, use Maven to build Hadoop from the top-level source directory (~/hadoop-2.7.3-src):

 $ mvn clean install -DskipTests
 $ mvn package -Pdist -Pdoc -Psrc -Dtar -DskipTests
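The two Maven commands are run one after the other from the top of the source tree. A small wrapper can run them in order and stop at the first failure. This is a sketch: the function name is hypothetical, and the default path assumes the source was unpacked into your home directory.

```shell
# Hypothetical wrapper: run the two build steps from the given source directory.
# The subshell keeps the caller's working directory unchanged; && stops at the first failure.
build_hadoop() {
  local src="${1:-$HOME/hadoop-2.7.3-src}"
  (
    cd "$src" &&
      mvn clean install -DskipTests &&
      mvn package -Pdist -Pdoc -Psrc -Dtar -DskipTests
  )
}
```

Call it as `build_hadoop` or `build_hadoop /path/to/hadoop-2.7.3-src`; a non-zero exit status means one of the steps failed.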

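The compiled distribution is in ~/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3, and the -Dtar option also packages it as hadoop-2.7.3.tar.gz in the same target directory.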
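One way to check the result is to run the freshly built `hadoop version` command, which prints the version and build information. The sketch assumes JAVA_HOME is set (the launcher scripts need it) and uses a hypothetical helper to compute the distribution path.

```shell
# Hypothetical helper: path of the unpacked distribution for a source tree and version.
hadoop_dist_dir() {
  printf '%s/hadoop-dist/target/hadoop-%s\n' "$1" "$2"
}

dist="$(hadoop_dist_dir "$HOME/hadoop-2.7.3-src" 2.7.3)"
if [ -x "$dist/bin/hadoop" ]; then
  "$dist/bin/hadoop" version
else
  echo "No built distribution at $dist" >&2
fi
```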


This compilation must be done before installing Hadoop. In the sources section of the GitHub repository, the archives used here are stored as apache-maven-3.3.9-bin.tar.gz and protobuf-2.5.0.tar.gz (note that this Maven copy is 3.3.9, not the 3.5.0 mentioned above).