Install Hadoop stack on OSX
SSH Setup and Key Generation
SSH setup is required to perform cluster operations such as starting, stopping, and running the distributed daemon shell scripts. To authenticate, a public/private key pair must be generated for the Hadoop user and the public key shared with the machines Hadoop logs into (just localhost for this single-machine setup).
The following commands generate an SSH key pair, append the public key from id_rsa.pub to authorized_keys, and give the owner read and write permissions on the authorized_keys file.
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
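If passwordless SSH is set up correctly, the following should log you in without a password prompt (on OS X you may first need to enable Remote Login under System Preferences > Sharing):
ssh localhost
exit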
Install Homebrew
Paste the following command into the terminal:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Update all Homebrew recipes:
brew update
Install Hadoop
brew install hadoop
Hadoop will be installed in the following directory (x.x.x is the Hadoop version):
/usr/local/Cellar/hadoop/x.x.x
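To see exactly which version was installed (and therefore what to substitute for x.x.x below), either of the following should work once Homebrew has linked the formula:
brew list --versions hadoop
hadoop version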
Configuring Hadoop in Pseudo-Distributed Mode
Edit core-site.xml
The core-site.xml file contains settings such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing data, and the size of read/write buffers.
The file is located at /usr/local/Cellar/hadoop/x.x.x/libexec/etc/hadoop/core-site.xml. Open core-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
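Hadoop can usually create hadoop.tmp.dir on demand, but creating it up front avoids permission surprises; a quick sketch matching the value configured above:
mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp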
Edit hdfs-site.xml
The hdfs-site.xml file contains settings such as the replication factor and the namenode and datanode paths on your local file system, that is, where you want to store the Hadoop data.
The file is located at /usr/local/Cellar/hadoop/x.x.x/libexec/etc/hadoop/hdfs-site.xml. Open hdfs-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///usr/local/Cellar/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///usr/local/Cellar/hadoop/hdfs/datanode</value>
  </property>
</configuration>
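The namenode and datanode directories referenced above do not exist yet, so create them before formatting HDFS (the paths match the values configured above):
mkdir -p /usr/local/Cellar/hadoop/hdfs/namenode
mkdir -p /usr/local/Cellar/hadoop/hdfs/datanode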
Edit yarn-site.xml
This file is used to configure YARN for Hadoop. It is located at /usr/local/Cellar/hadoop/x.x.x/libexec/etc/hadoop/yarn-site.xml. Open it and add the following properties between the <configuration> and </configuration> tags.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Edit mapred-site.xml
This file is used to specify which MapReduce framework is in use. By default, Hadoop ships only a template, located at /usr/local/Cellar/hadoop/x.x.x/libexec/etc/hadoop/mapred-site.xml.template, so first copy it to mapred-site.xml with the following commands.
cd /usr/local/Cellar/hadoop/x.x.x/libexec/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
Once copied, open mapred-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Verifying Hadoop Installation
The following steps verify the Hadoop installation. The hdfs command lives in /usr/local/Cellar/hadoop/x.x.x/bin, while the start/stop scripts live in /usr/local/Cellar/hadoop/x.x.x/sbin; either cd into those directories or use the full paths when running the commands below.
Step 1: Name Node Setup
Format the namenode using the “hdfs namenode -format” command as follows. This initializes the HDFS metadata directory and only needs to be done once.
hdfs namenode -format
Step 2: Verifying Hadoop dfs
The following command starts DFS. Executing it brings up your Hadoop file system daemons (namenode, datanode, and secondary namenode).
start-dfs.sh
Step 3: Verifying Yarn Script
The following command runs the YARN start script. Executing it brings up the YARN daemons (resourcemanager and nodemanager).
start-yarn.sh
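To confirm the daemons actually came up, jps (shipped with the JDK) lists the running Java processes; after both scripts it should show something like the following.
jps
# expected, among others: NameNode, DataNode, SecondaryNameNode (from start-dfs.sh)
# plus ResourceManager, NodeManager (from start-yarn.sh)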
Step 4: Accessing Hadoop in a Browser
The default port for the NameNode web interface is 50070. Use the following URL to view the Hadoop services in a browser.
http://localhost:50070/
Step 5: Verifying All Applications of the Cluster
The default port for the ResourceManager web interface, which shows all applications of the cluster, is 8088. Use the following URL to visit this service.
http://localhost:8088/
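As a final smoke test, a minimal sketch that creates a home directory for the current user in HDFS, copies a file in, and lists it back (the paths are illustrative):
hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -put /etc/hosts /user/$(whoami)/
hdfs dfs -ls /user/$(whoami)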
Alias
To simplify life, edit your ~/.profile using vim or your favorite editor and add the following two aliases (adjust 2.6.0 to match your installed version):
alias hstart="/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.6.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/stop-dfs.sh"
and execute
source ~/.profile
From then on, you can start Hadoop just by typing
hstart
and stop it with
hstop
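If you would rather not hardcode the version in the aliases, a hedged variant resolves the currently linked keg with brew --prefix (assuming Homebrew's default layout, where /usr/local/opt/hadoop points at the installed version):
alias hstart="$(brew --prefix hadoop)/sbin/start-dfs.sh;$(brew --prefix hadoop)/sbin/start-yarn.sh"
alias hstop="$(brew --prefix hadoop)/sbin/stop-yarn.sh;$(brew --prefix hadoop)/sbin/stop-dfs.sh"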