Configure $HADOOP_HOME/etc/hadoop

cd $HADOOP_HOME/etc/hadoop

Set up the slaves file by placing in it the hostnames of the slave nodes; in this case it is the same hostname, because the slave runs on the same machine.

$ vi slaves

localhost
master.hadoop.lan
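The same edit can be done non-interactively. A minimal sketch, writing to a scratch path so it is safe to run anywhere; on a real setup the target is $HADOOP_HOME/etc/hadoop/slaves:

```shell
# Sketch: write the slaves file non-interactively. /tmp/slaves-demo is a
# stand-in path; the real target is $HADOOP_HOME/etc/hadoop/slaves.
printf 'localhost\nmaster.hadoop.lan\n' > /tmp/slaves-demo
cat /tmp/slaves-demo
```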

The first file to edit is core-site.xml. This file contains information about the port number used by the Hadoop instance, the memory allocated for the file system, the memory limit for the data store, and the size of the read/write buffers.

$ vi etc/hadoop/core-site.xml

Add the following properties between the <configuration> ... </configuration> tags. Use localhost or your machine's FQDN, such as master.hadoop.lan, for the Hadoop instance.

<property>
<name>fs.defaultFS</name>
<value>hdfs://master.hadoop.lan:9000/</value>
</property>
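As a quick sanity check, you can grep the file for the value you just added. The sketch below recreates the same fragment in a scratch location first, so it runs even outside a Hadoop install; on the real system, grep $HADOOP_HOME/etc/hadoop/core-site.xml instead:

```shell
# Sketch: recreate the core-site.xml fragment in a scratch file and
# confirm fs.defaultFS points at the expected NameNode URI.
mkdir -p /tmp/hadoop-conf-demo
cat > /tmp/hadoop-conf-demo/core-site.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.hadoop.lan:9000/</value>
</property>
</configuration>
EOF
grep -q 'hdfs://master.hadoop.lan:9000/' /tmp/hadoop-conf-demo/core-site.xml && echo OK
```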

Next, open and edit the hdfs-site.xml file. This file contains the data replication factor and the namenode and datanode paths on the local file system.

$ vi etc/hadoop/hdfs-site.xml

Here, add the following properties between the <configuration> ... </configuration> tags. In this guide we’ll use the /mnt/common/hdfs/ directory to store our Hadoop file system.

Replace the dfs.data.dir and dfs.name.dir values accordingly. (In Hadoop 2.x these are deprecated aliases for dfs.datanode.data.dir and dfs.namenode.name.dir; both forms still work.)

<property>
<name>dfs.data.dir</name>
<value>file:///mnt/common/hdfs/datanode</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///mnt/common/hdfs/namenode</value>
</property>
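A quick way to check the fragment before copying it in is to confirm both storage paths appear. The sketch below writes the same fragment to a scratch file so it runs anywhere; on the real system, grep $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

```shell
# Sketch: recreate the hdfs-site.xml fragment in a scratch file and count
# the lines that reference our /mnt/common/hdfs storage paths.
cat > /tmp/hdfs-site-demo.xml <<'EOF'
<configuration>
<property>
<name>dfs.data.dir</name>
<value>file:///mnt/common/hdfs/datanode</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///mnt/common/hdfs/namenode</value>
</property>
</configuration>
EOF
grep -c 'file:///mnt/common/hdfs' /tmp/hdfs-site-demo.xml   # → 2
```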

Because we’ve specified /mnt/common/hdfs/ as our Hadoop file system storage, we need to create those two directories (datanode and namenode) from the root account and grant ownership to the hadoop account, or whatever user installed Hadoop, by executing the commands below.

su - root
mkdir -p /mnt/common/hdfs/namenode
mkdir -p /mnt/common/hdfs/datanode

In my case, the user that installed Hadoop is bigdata2:

chown -R bigdata2:bigdata2 /mnt/common/hdfs/
ls -al /mnt/common/hdfs/ #Verify permissions
exit

Exit the root account to switch back to the bigdata2 user.

Please replace bigdata2 with your own user name; do NOT copy the bigdata2 user name from this slide.
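The directory step can be sketched end to end. Here a /tmp prefix stands in for /mnt/common so the sketch runs without root, and the chown line is commented out because bigdata2 is just the example user from this guide:

```shell
# Sketch of the HDFS storage layout, using a scratch prefix; swap PREFIX
# for /mnt/common on the real cluster and run the chown as root.
PREFIX=/tmp/hdfs-demo
mkdir -p "$PREFIX/hdfs/namenode" "$PREFIX/hdfs/datanode"
# chown -R bigdata2:bigdata2 "$PREFIX/hdfs"   # as root; use your own user
ls -al "$PREFIX/hdfs"
```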

Next, create the mapred-site.xml file to specify that we are using the YARN MapReduce framework.

$ vi etc/hadoop/mapred-site.xml

Add the following excerpt to mapred-site.xml file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
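Note that Hadoop 2.x distributions ship this file only as mapred-site.xml.template, so it must be copied before editing. A sketch of that step, run here against a scratch directory so it works outside a Hadoop install; on the real system the paths are under $HADOOP_HOME/etc/hadoop:

```shell
# Sketch: Hadoop 2.x ships mapred-site.xml.template, not mapred-site.xml;
# copy the template before editing. Scratch paths stand in for
# $HADOOP_HOME/etc/hadoop.
mkdir -p /tmp/hadoop-etc-demo
printf '<configuration>\n</configuration>\n' > /tmp/hadoop-etc-demo/mapred-site.xml.template
cp /tmp/hadoop-etc-demo/mapred-site.xml.template /tmp/hadoop-etc-demo/mapred-site.xml
ls /tmp/hadoop-etc-demo
```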

Now, edit the yarn-site.xml file and place the statements below between the <configuration> ... </configuration> tags:

$ vi etc/hadoop/yarn-site.xml

Add the following excerpt to yarn-site.xml file:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

You should see something like below:

(base) bigdata2@bigdata2:~/hadoop/hadoop-2.7.7/etc/hadoop$ cat yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Set the JAVA_HOME variable for the Hadoop environment by editing the line below in the hadoop-env.sh file.

$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Edit the following line to point to your Java system path.

export JAVA_HOME=/home/bigdata2/java/jdk1.8.0_202
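If you are unsure of your JDK path, one common way to discover a candidate on Linux is to resolve the java binary on your PATH (this assumes java is installed and on PATH; the exact path shown in this guide is just the author's install location):

```shell
# Resolve the real JDK directory from the java binary on PATH, then strip
# the trailing /bin/java; the result is a candidate JAVA_HOME. Does
# nothing if java is not installed.
if JAVA_BIN=$(command -v java); then
  JAVA_BIN=$(readlink -f "$JAVA_BIN")
  echo "JAVA_HOME candidate: ${JAVA_BIN%/bin/java}"
fi
```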
