I was actually opposed to the MyLearning one because the first thing it recommended was Oracle Java 7 instead of OpenJDK 7, but I ran into some issues with OpenJDK 7 when trying this out, so I went with Oracle.
Install Java
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install oracle-java7-installer
Create Hadoop user
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Here hduser is the Hadoop user you want to create.
Configuring SSH
su - hduser
ssh-keygen -t rsa -P ""
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
To make sure the SSH installation went well, open a new terminal and try to create an SSH session as hduser with the following command:
ssh localhost
If localhost does not connect, reinstall SSH (you may need to add hduser to sudoers first, as in the step below):
sudo apt-get install openssh-server
Edit Sudoers
sudo visudo
Add the following line at the end to put hduser into sudoers:
hduser ALL=(ALL:ALL) ALL
To save press CTRL+X, type Y and press ENTER
Disable IPv6
sudo gedit /etc/sysctl.conf
or
sudo nano /etc/sysctl.conf
Copy the following lines at the end of the file:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
If you get an error telling you that you don't have permission, run the previous command as root (in case sudo is not enough; for me it was).
Now reboot.
You can also run sudo sysctl -p to apply the change without rebooting, but I'd rather reboot.
After rebooting, check to make sure IPv6 is off:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
it should say 1. If it says 0, you missed something.
Installing Hadoop
There are several ways of doing this. The one the guide suggests is to download the release from the Apache Hadoop site and decompress the file in your hduser home folder, then rename the extracted folder to hadoop.
The other way is to use a PPA that was tested for 12.04:
sudo add-apt-repository ppa:hadoop-ubuntu/stable
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install hadoop
NOTE: The PPA may work for some and not for others. I downloaded from the official site because I did not know about the PPA.
Update $HOME/.bashrc
You will need to update the .bashrc for hduser (and for every user that needs to administer Hadoop). To open the .bashrc file, you will need to open it as root:
sudo gedit /home/hduser/.bashrc
or
sudo nano /home/hduser/.bashrc
Then add the following configuration at the end of the .bashrc file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Now, if you have OpenJDK7, it would look something like this:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
The thing to watch out for here is the folder where the amd64 version of Java resides. If the above does not work, look inside that folder, or set the Java version to be used with:
sudo update-alternatives --config java
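If you are still unsure which directory to use, one approach (a sketch; the example path is an assumption and varies by distro) is to resolve the java binary and strip the trailing path components:

```shell
# Derive JAVA_HOME from the resolved path of the java binary.
# On a real system you would start from: JAVA_BIN=$(readlink -f "$(command -v java)")
# The path below is an example for the OpenJDK 7 amd64 package.
JAVA_BIN=/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
# Strip the /jre/bin/java suffix to get the JDK root.
JAVA_HOME="${JAVA_BIN%/jre/bin/java}"
echo "$JAVA_HOME"   # /usr/lib/jvm/java-7-openjdk-amd64
```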
Now for some helpful aliases:
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
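After editing, you can sanity-check that the variables took effect in a new shell (or after running source ~/.bashrc). A minimal sketch, re-creating the two exports from above:

```shell
# Re-create the exports from .bashrc and verify PATH picked up the bin directory.
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
echo "HADOOP_HOME is $HADOOP_HOME"
# The leading/trailing colons make the match exact on a PATH entry.
if echo ":$PATH:" | grep -q ":$HADOOP_HOME/bin:"; then
  echo "PATH contains hadoop bin"
fi
```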
Configuring Hadoop
The following are the files we will use to run and configure Hadoop. Some of the scripts you will be using are (more information on this site):
start-dfs.sh - Starts the Hadoop DFS daemons, the namenode and datanodes. Use this before start-mapred.sh.
stop-dfs.sh - Stops the Hadoop DFS daemons.
start-mapred.sh - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
stop-mapred.sh - Stops the Hadoop Map/Reduce daemons.
start-all.sh - Starts all Hadoop daemons: the namenode, datanodes, the jobtracker and tasktrackers. Deprecated; use start-dfs.sh then start-mapred.sh.
stop-all.sh - Stops all Hadoop daemons. Deprecated; use stop-mapred.sh then stop-dfs.sh.
But before we start using them, we need to modify several files in the /conf folder.
hadoop-env.sh
Look for the file hadoop-env.sh; we only need to update the JAVA_HOME variable in this file:
sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh
or
sudo nano /home/hduser/hadoop/conf/hadoop-env.sh
or in the latest versions it will be in
sudo nano /etc/hadoop/conf.empty/hadoop-env.sh
or
sudo nano /etc/hadoop/hadoop-env.sh
Then change the following line:
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
To
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Note: if you get Error: JAVA_HOME is not set while starting the services, you forgot to uncomment the previous line (just remove the #).
core-site.xml
Now we need to create a temp directory for the Hadoop framework. If you need this environment for testing or quick prototyping (e.g. developing simple Hadoop programs for your own tests), I suggest creating this folder under the /home/hduser/ directory. Otherwise, you should create it in a shared place (like /usr/local), but you may face security issues there (exceptions such as java.io.IOException). To avoid those, I created the tmp folder under hduser's space.
To create this folder, type the following command:
sudo mkdir /home/hduser/tmp
Please note that if you want to add another admin user (e.g. hduser2 in the hadoop group), you should grant that user read and write permissions on this folder using the following commands:
sudo chown hduser:hadoop /home/hduser/tmp
sudo chmod 755 /home/hduser/tmp
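To double-check the result afterwards, you can inspect the mode with stat. A sketch using a scratch directory instead of /home/hduser/tmp (since the real commands above need sudo):

```shell
# Create a directory the same way and confirm its mode is 755.
dir="$(mktemp -d)/tmp"       # stand-in for /home/hduser/tmp
mkdir -p "$dir"
chmod 755 "$dir"
stat -c '%a' "$dir"          # prints: 755
```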
Now, we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry. We can open core-site.xml using a text editor:
sudo gedit /home/hduser/hadoop/conf/core-site.xml
or
nano /home/hduser/hadoop/conf/core-site.xml
Then add the following configurations between the <configuration> </configuration> XML elements:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
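For reference, a complete core-site.xml with the snippet above would look like this; mapred-site.xml and hdfs-site.xml below get the same <configuration> wrapper (descriptions shortened here):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.</description>
  </property>
</configuration>
```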
Now edit mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
Now edit hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
Formatting NameNode
Now you can start working on the node. First, format the NameNode:
~/hadoop/bin/hadoop namenode -format
or
/home/hduser/hadoop/bin/hadoop namenode -format
Formatting the NameNode initializes your HDFS. Do not do this while the system is running; it is usually done only once, when you first install.
Starting Hadoop Cluster
You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script.
cd ~/hadoop/bin/
./start-all.sh
If you have a different version from the one shown in the guides (which you will most likely have if you installed from the PPA or a newer release), then try it this way:
cd ~/hadoop/bin/
./start-dfs.sh
./start-mapred.sh
This will start a Namenode, Datanode, Jobtracker and a Tasktracker on your machine.
Checking if Hadoop is running
There is a nice tool called jps. You can use it to ensure that all the services are up. In your hadoop bin folder type:
jps
It should show you all Hadoop related processes.
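For illustration, output along these lines is what you want to see (the PIDs below are made up; the daemon names are the point). A sketch that greps a sample of jps output for the four daemons:

```shell
# Example jps output (PIDs are invented); the loop checks it for the
# daemons started by start-dfs.sh and start-mapred.sh.
jps_output="1788 NameNode
1938 DataNode
2085 SecondaryNameNode
2149 JobTracker
2287 TaskTracker
2349 Jps"
for daemon in NameNode DataNode JobTracker TaskTracker; do
  # -w matches whole words, so SecondaryNameNode does not count as NameNode.
  echo "$jps_output" | grep -qw "$daemon" && echo "$daemon is up"
done
```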
NOTE: Since I did this around 6 months ago, let me know if any part is not working.
Up to now you have a running Hadoop. There are many more things you can do, which can be found in the link provided or in the official Juju Charm for Hadoop.