HDPCA Exam Objective – Define and deploy a rack topology script

Note: This post is part of the HDPCA exam objective series

What is Rack Awareness?

Rack awareness plays an important role in making sure there is no single point of failure across the Hadoop infrastructure and that resource contention is spread across the cluster. Rack awareness is a concept in which the NameNode is made aware of the physical layout of the servers in the cluster, so that it can make intelligent decisions about block placement. For example, with the default replication factor of 3, HDFS places one replica on the local node, one on a node in a different rack, and one on another node in that same remote rack, so that the failure of a single rack cannot take all replicas of a block offline.

Note: The steps given in the Hortonworks documentation seem outdated. With newer HDP versions we do not actually configure a shell script ourselves; a default Python script is already present. This script, /etc/hadoop/conf/topology_script.py, together with the topology mappings file /etc/hadoop/conf/topology_mappings.data, is used to modify the rack awareness settings in an HDP cluster.
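
To see what the NameNode does with this script, you can invoke it by hand: Hadoop passes one or more IP addresses (or hostnames) as arguments, and the script prints the matching rack path(s) to stdout. An illustrative run against the default mappings (the IP address here is just this post's example cluster):

[root@nn1 ~]# python /etc/hadoop/conf/topology_script.py 192.168.1.3
/default-rack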

1. Change Rack Topology using the Command Line

In the exam, unless explicitly specified, do not use the command-line method; changes to the rack topology can be configured very easily using the Ambari web UI. That said, follow the steps below to change the rack topology from the command line.

For the purpose of this post we will place each of the datanodes on a separate rack, as shown below.

dn1.localdomain    /rack01
dn2.localdomain    /rack02
dn3.localdomain    /rack03

1. First, verify that the topology Python script and the topology data file are present in the /etc/hadoop/conf directory on the namenode.

[root@nn1 ~]# ls -lrt /etc/hadoop/conf/topology*
-rw-r--r--. 1 hdfs hadoop  187 Jul 15 12:39 /etc/hadoop/conf/topology_mappings.data
-rwxr-xr-x. 1 root root   2358 Jul 15 12:39 /etc/hadoop/conf/topology_script.py

2. Also, ensure that the following property, which defines the location of the topology script, is present in the configuration file /etc/hadoop/conf/core-site.xml.
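
On recent HDP releases this is the net.topology.script.file.name property. The snippet below shows how it typically appears; verify the exact value on your own cluster:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology_script.py</value>
</property>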

3. View the current topology configuration in the file /etc/hadoop/conf/topology_mappings.data.

# cat /etc/hadoop/conf/topology_mappings.data
[network_topology]
dn3.localdomain=/default-rack
192.168.1.5=/default-rack
dn1.localdomain=/default-rack
192.168.1.3=/default-rack
dn2.localdomain=/default-rack
192.168.1.4=/default-rack

4. Modify the topology_mappings.data file to place each datanode on a different rack. The topology file needs to be modified only on the NameNode and the ResourceManager hosts; in our case these are nn1.localdomain and dn2.localdomain.

# cat /etc/hadoop/conf/topology_mappings.data
[network_topology]
dn3.localdomain=/rack03
192.168.1.5=/rack03
dn1.localdomain=/rack01
192.168.1.3=/rack01
dn2.localdomain=/rack02
192.168.1.4=/rack02
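
Before restarting any services, you can sanity-check the new mappings by running the topology script by hand (assuming the stock HDP script, which resolves its arguments against topology_mappings.data):

[root@nn1 ~]# python /etc/hadoop/conf/topology_script.py 192.168.1.3
/rack01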

5. Now restart all the services in the cluster using the Ambari UI. This can be done from the Hosts tab: select all the hosts and restart all of their components.

This will take some time, and you may need to manually restart a few components if they do not start on their own.

6. After all the components are restarted, verify the rack topology under the Hosts tab in Ambari. Below are the before and after rack locations of the datanodes.

Before: all three datanodes on /default-rack

After: dn1.localdomain on /rack01, dn2.localdomain on /rack02, dn3.localdomain on /rack03
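
You can also confirm the same thing from the command line with the standard HDFS admin command (output abbreviated; addresses are from this post's example cluster):

[root@nn1 ~]# sudo -u hdfs hdfs dfsadmin -printTopology
Rack: /rack01
   192.168.1.3:50010 (dn1.localdomain)
Rack: /rack02
   192.168.1.4:50010 (dn2.localdomain)
Rack: /rack03
   192.168.1.5:50010 (dn3.localdomain)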

2. Change Rack Topology using Ambari

Changing the rack topology using Ambari is a piece of cake and should be used at all times unless specified otherwise.

1. Go to the Hosts tab, select the datanode, and use the Actions drop-down with the “Set Rack” option to define the new rack location for the datanode. You can also select multiple hosts and change their rack location in one go if all of them are in the same rack.

The rack location must be set in the format /[location]. For example, we will set the rack location of datanode dn1.localdomain to /rack01.

Similarly, change the rack location of all the datanodes you want.
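
If you prefer to script this instead of clicking through the UI, Ambari's REST API exposes the same setting through the host's rack_info field. A minimal sketch; the Ambari server address, cluster name, and credentials below are assumptions for this example:

curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
     -d '{"Hosts": {"rack_info": "/rack01"}}' \
     http://ambari.localdomain:8080/api/v1/clusters/mycluster/hosts/dn1.localdomain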

2. After setting the rack location for the desired datanodes, we have to restart all the components in HDP. To do this, select all the hosts and, from the Actions drop-down, restart all components.

3. After all the components are restarted, verify the rack topology under the Hosts tab in Ambari. Below are the before and after rack locations of the datanodes.

Before: all three datanodes on /default-rack

After: dn1.localdomain on /rack01, dn2.localdomain on /rack02, dn3.localdomain on /rack03
