Install Hadoop on Ubuntu

Software versions:
Ubuntu: 12.04
Hadoop: 1.1.2

Prerequisites

Sun Java 6
The instructions on Michael Noll’s webpage did not work for me, so I followed this post instead, and it worked:

$ sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu hardy main multiverse"
$ sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu hardy-updates main multiverse"
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
$ sudo add-apt-repository "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main"
$ sudo apt-get update 
$ sudo apt-get install sun-java5-jdk sun-java6-jdk oracle-java7-installer

The JDK will be placed in /usr/lib/jvm/java-6-sun.
Check the JDK installation:

user@ubuntu:~$ java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
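
As the output above shows, oracle-java7-installer makes Java 7 the system default. If you prefer Sun Java 6 (which matches the JAVA_HOME used below), you can select it with Ubuntu's update-java-alternatives tool (java-6-sun is the alternatives name registered by sun-java6-jdk):

$ sudo update-java-alternatives -s java-6-sun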

Add a dedicated Hadoop system user:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Configure passwordless SSH
See the instructions here.
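
In short, a minimal sketch as hduser (assuming the OpenSSH server is installed): generate an RSA key with an empty passphrase, authorize it, and connect once so localhost's host key gets accepted:

$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost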

Disabling IPv6
Edit “/etc/sysctl.conf” and add the following lines:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
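
Reload the settings and check the result (a value of 1 means IPv6 is disabled):

$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6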

Hadoop

Installation
To install Hadoop you don’t need to compile anything: just extract the tar.gz archive and move it into the right place.

$ cd /usr/local
$ sudo tar xzf hadoop-1.1.2.tar.gz
$ sudo mv hadoop-1.1.2 hadoop
$ sudo chown -R hduser:hadoop hadoop
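
If you don’t have the tarball yet, the 1.1.2 release is available from the Apache archive (this is the long-term archive URL; active mirrors may differ):

$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/hadoop-1.1.2.tar.gz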

Update $HOME/.bashrc
Add the following lines:

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
 
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
 
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
 
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
 
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

You can also use $HADOOP_PREFIX in place of $HADOOP_HOME; Hadoop 1.x prints a “$HADOOP_HOME is deprecated” warning at startup.
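
The new variables only take effect in new shells; to apply them to the current session:

$ source $HOME/.bashrc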

Configuration

There are several files you should modify: conf/hadoop-env.sh, conf/core-site.xml, conf/mapred-site.xml, and conf/hdfs-site.xml.

hadoop-env.sh
Change

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

To

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

For each conf/*-site.xml file, add the following snippets between the <configuration> ... </configuration> tags.

In file conf/core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
 
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
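
The hadoop.tmp.dir directory must exist and be writable by hduser before HDFS is formatted, so create it up front:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo chmod 750 /app/hadoop/tmp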

In file conf/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

In file conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
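
Before the first use, format the HDFS filesystem via the NameNode (run as hduser), then start all daemons; jps should afterwards list NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:

$ /usr/local/hadoop/bin/hadoop namenode -format
$ /usr/local/hadoop/bin/start-all.sh
$ jps

To shut everything down again, run /usr/local/hadoop/bin/stop-all.sh.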

Now you can play with your single node hadoop cluster!
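
For a quick smoke test, try one of the bundled example jobs, e.g. the pi estimator (the jar name assumes the 1.1.2 release):

$ cd /usr/local/hadoop
$ bin/hadoop jar hadoop-examples-1.1.2.jar pi 2 5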

References:
Michael Noll’s tutorial: Running Hadoop on Ubuntu Linux (Single-Node Cluster)
Michael Noll’s tutorial: Running Hadoop on Ubuntu Linux (Multi-Node Cluster)
