CCA131 – Configure NameNode HA

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series

HDFS High Availability Overview

A single NameNode is a single point of failure in a Hadoop cluster: HDFS becomes unavailable whenever the NameNode crashes unexpectedly or is taken down for planned maintenance. A NameNode high availability (HA) setup removes this single point of failure.

HDFS High Availability uses a pair of NameNodes:

– One Active and one Standby
– Clients only contact the Active NameNode
– DataNodes heartbeat into both NameNodes
– The Active NameNode writes its metadata to a quorum of JournalNodes
– The Standby NameNode reads from the JournalNodes to remain in sync with the Active NameNode

NameNode HA architecture in Hadoop CCA131

The Active NameNode writes edits to the JournalNodes:

– This is done by the Quorum Journal Manager (QJM), which is built into the NameNode.
– The QJM waits for a success acknowledgment from a majority of the JournalNodes. Because only a majority has to commit, a single crashed or lagging JournalNode does not affect NameNode latency.
– The QJM uses a simple algorithm to keep the edit log reliable even if a JournalNode fails while edits are being written.
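The majority-commit rule is easy to make concrete with a little arithmetic. The sketch below is illustrative only: the function names are made up for this post and are not part of Hadoop. It shows how many acknowledgments a write needs and how many JournalNode failures an ensemble of a given size can tolerate.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of acknowledgments that forms a majority."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """JournalNodes that can be down while a majority is still reachable."""
    return journal_nodes - majority(journal_nodes)


# Print ensemble size, required acknowledgments and tolerated failures.
for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Three JournalNodes tolerate one failure, and so do four; it takes five to tolerate two. That is why odd-sized ensembles are the usual choice.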

Note: There is no Secondary NameNode when implementing HDFS High Availability. The Standby NameNode periodically performs the checkpointing.

Failover

At any given point in time, only one NameNode can be active. The other NameNode(s) act as standby NameNode(s). The standby NameNode maintains a copy of the active NameNode’s state so that it can take over when the active NameNode goes down. There are two types of failover:
1. Manual (detected and initiated by a user)
2. Automatic (detected and initiated by HDFS itself)

Automatic failover is coordinated through Apache ZooKeeper, an open-source coordination service that is also used by HBase and is one of the components of a CDH cluster. A daemon called the “ZooKeeper Failover Controller (ZKFC)” runs on each NameNode machine. ZooKeeper needs a quorum of its own nodes to make the failover decision, so typical installations run 3 or 5 ZooKeeper nodes. The ZooKeeper daemon has low resource usage and can be installed alongside existing master daemons.

Enabling NameNode HA using Cloudera Manager

Enabling NameNode HA using Cloudera Manager is pretty easy. Follow the steps given below:

1. Go to Cloudera Manager > HDFS and select the “Enable High Availability” option under the “Actions” drop-down.

Enable High Availability for NameNode using Cloudera Manager

This will start the NameNode High Availability wizard which will guide you through all the steps.

2. Enabling High Availability creates a new nameservice. Accept the default name nameservice1 or provide another name in the Nameservice Name field (we use geeklab here).

define nameservice for NameNode High Availability CCA131

3. On the next screen, we need to assign roles for the standby NameNode and the JournalNodes.

Assign Roles NameNode HA CCA131 exam

We will select node04 as our standby NameNode and node01, node02, node03 as our JournalNodes.

select standby namenode for NameNode HA CCA131

select 3 JournalNodes for NameNode HA configuration using Cloudera Manager CCA 131

4. Define the JournalNode Edits directory on the next screen. This is a directory on the local filesystem of each JournalNode where the NameNode edits are written.

define JournalNodes Edits directory CCA 131

A few extra options are selected by default. They clear any existing data in the standby NameNode’s directories and in the JournalNode edits directories. We will keep the default settings here.

extra option NameNode HA configuration CCA131
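For reference, the choices made in steps 2 to 4 correspond to a handful of standard HDFS properties that end up in hdfs-site.xml. The sketch below assembles them for our example cluster. Several values are assumptions for illustration: the NameNode IDs `nn1` and `nn2` are placeholders (Cloudera Manager generates its own IDs), `/dfs/jn` is an example edits directory, and the ports are the usual defaults (8485 for JournalNodes, 8020 for NameNode RPC), which may differ on your cluster.

```python
def qjournal_uri(journal_hosts, nameservice, port=8485):
    """Build the value of dfs.namenode.shared.edits.dir for a QJM setup."""
    hosts = ";".join(f"{host}:{port}" for host in journal_hosts)
    return f"qjournal://{hosts}/{nameservice}"


def ha_properties(nameservice, namenodes, journal_hosts, edits_dir):
    """Return the core HA properties as a dict.

    namenodes maps a (placeholder) NameNode ID to its host,
    for example {"nn1": "master", "nn2": "node04"}.
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": qjournal_uri(journal_hosts, nameservice),
        "dfs.journalnode.edits.dir": edits_dir,
        "dfs.ha.automatic-failover.enabled": "true",
    }
    # One RPC address per NameNode; 8020 is an assumed default port.
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props


props = ha_properties(
    "geeklab",
    {"nn1": "master", "nn2": "node04"},
    ["node01", "node02", "node03"],
    "/dfs/jn",  # assumed example path
)
for key, value in props.items():
    print(f"{key} = {value}")
```

With three JournalNodes the shared edits URI comes out as `qjournal://node01:8485;node02:8485;node03:8485/geeklab`. On a Cloudera Manager cluster you would not edit these by hand; the wizard manages them for you.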

5. Cloudera Manager will start configuring NameNode High Availability on the next screen. If the current NameNode already holds HDFS data, the step that formats the current NameNode fails with the error shown below. This failure is expected and can be ignored, as we have data on our setup.

NameNode HA configuration error while formatting Current NameNode CCA131

6. That’s it. The NameNode high availability configuration is complete and you should see a congratulatory message.

NameNode HA configuration completion message CCA 131

To verify High Availability, go to Cloudera Manager > HDFS > Instances. Here we can see 3 JournalNodes, 2 ZooKeeper Failover Controllers, an Active NameNode and a Standby NameNode.

verify NameNode High Availability Configuration CCA 131
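The same check can be made from the command line with `hdfs haadmin -getServiceState <NameNode ID>`, which prints `active` or `standby`. The sketch below wraps that command. It assumes the `hdfs` client is on the PATH of a host with the cluster's client configuration deployed, and the IDs in the example are placeholders: the real ones are listed under `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml.

```python
import subprocess


def get_service_state(namenode_id, run=subprocess.run):
    """Return the HA state reported by hdfs haadmin for one NameNode."""
    result = run(
        ["hdfs", "haadmin", "-getServiceState", namenode_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def ha_is_healthy(states):
    """True when exactly one NameNode is active and all others are standby."""
    values = list(states.values())
    return values.count("active") == 1 and values.count("standby") == len(values) - 1


# Example with placeholder IDs (requires a working hdfs client):
# states = {nn: get_service_state(nn) for nn in ("nn1", "nn2")}
# print(states, ha_is_healthy(states))
```

The `run` parameter exists only so the wrapper can be exercised without a cluster; in normal use the default `subprocess.run` is what you want.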

Perform HA testing on NameNode

We can run a manual HA test on the NameNode from Cloudera Manager to verify the NameNode HA we just configured. We will manually stop the Active NameNode and see if the Standby NameNode takes up the role of Active NameNode. Follow the steps outlined below to perform the HA testing:

1. Go to Cloudera Manager > HDFS > Instances. Select the active NameNode from the list of instances and select “stop” from the “Actions for selected” drop-down.

NameNode HA testing from Cloudera Manager CCA131

2. Cloudera Manager will go ahead and stop the Active NameNode as shown in the screenshot below.

stopping the active NameNode - HA testing CCA131

3. You should now see a warning in the Health Tests section about having only one active NameNode, as we do not have a working standby NameNode at the moment.

verify NameNode Health after HA testing of NameNode CCA 131

4. Go to the Instances page of the HDFS service again to verify that the Active NameNode role has switched from the master host to the node04 host.

verify the Active NameNode after HA testing CCA 131 exam objectives

If you now start the NameNode role on the master host again, it will not take back the Active NameNode role. Instead, it will keep acting as the standby NameNode.
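If you want the master host to become active again, you have to trigger a manual failover yourself. One way to do that outside Cloudera Manager is `hdfs haadmin -failover <from ID> <to ID>`, which performs a coordinated failover even when automatic failover is enabled. The sketch below only builds and runs that command. The NameNode IDs are placeholders, and it assumes the `hdfs` client is available and that you run it as a user allowed to administer HDFS.

```python
import subprocess


def failover_command(from_id, to_id):
    """Build the haadmin call that moves the active role from from_id to to_id."""
    if from_id == to_id:
        raise ValueError("source and target NameNode IDs must differ")
    return ["hdfs", "haadmin", "-failover", from_id, to_id]


def fail_back(from_id, to_id, run=subprocess.run):
    """Run the failover; check=True raises CalledProcessError on failure."""
    return run(failover_command(from_id, to_id), check=True)


# Example with placeholder IDs (nn2 currently active on node04, nn1 on master):
# fail_back("nn2", "nn1")
```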

Disable NameNode High Availability

In case you want to disable the NameNode HA configuration, follow the steps outlined below.

1. Go to Cloudera Manager > HDFS and select “Disable High Availability” from the “Actions” drop-down.

Disable High Availability of NameNode using Cloudera Manager CCA 131

2. On the next screen, we need to select which of the 2 available NameNodes to keep as the NameNode. Also, instead of a Standby NameNode, we have to choose a host for the Secondary NameNode.

select NameNode Host and Secondary NameNode Host CCA 131

Select secondary NameNode CCA 131

3. On the next screen, review the changes suggested by Cloudera Manager and continue.

disable NameNode HA - review changes CCA 131

4. In the final step, Cloudera Manager disables the NameNode HA. The screenshot below shows part of the successful final step:

disable NameNode HA final steps CCA 131