The Geek Diary

HDPCA Exam Objective – Configure the Capacity Scheduler

by admin

Note: This post is part of the HDPCA exam objective series.

YARN Schedulers

The Hadoop YARN scheduler is responsible for assigning resources to the applications submitted by users. There are 3 types of schedulers in YARN.

  1. First in First out (FIFO) (Hadoop 1.x)
  2. Fair scheduler
  3. Capacity scheduler

First in First out (FIFO)

YARN includes a First in First out (FIFO) scheduler, which runs jobs from a single queue in the order they arrive. Because one long-running job can hold up everything queued behind it, FIFO scheduling is rarely the best option for large multi-user Hadoop deployments.

Fair scheduler

The Fair scheduler aims to give all running jobs an equal share of resources over time. As resources become available, they are assigned to newly submitted jobs until all submitted and running jobs hold roughly the same amount of resources.

Capacity scheduler

The Capacity scheduler allows a large cluster to be shared across multiple organizational entities while guaranteeing each entity a minimum capacity and ensuring that no single user or job holds all the resources. To achieve this, the Capacity scheduler defines queues and queue hierarchies, with each queue having a guaranteed capacity. A queue can also use excess resources (if any) that other queues are not currently using.
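To make the idea of guaranteed capacity and borrowing concrete, here is a small arithmetic sketch. The cluster size and the queue names "prod" and "dev" are made up for illustration. The percentages stand in for the per-queue capacity and maximum-capacity settings (yarn.scheduler.capacity.root.<queue>.capacity and .maximum-capacity) that the Capacity scheduler reads from capacity-scheduler.xml.

```shell
#!/bin/sh
# Hypothetical numbers: a 100 GB cluster shared by two example queues.
cluster_gb=100

# queue_share TOTAL PERCENT: resources that PERCENT of the cluster amounts to.
# Illustrative helper only; it is not part of Hadoop.
queue_share() {
  echo $(( $1 * $2 / 100 ))
}

# Example queue "prod": capacity=70
# Example queue "dev":  capacity=30, maximum-capacity=50
echo "prod guaranteed: $(queue_share "$cluster_gb" 70) GB"
echo "dev guaranteed: $(queue_share "$cluster_gb" 30) GB"
# dev may grow past its guarantee, up to its maximum, while prod has idle resources
echo "dev ceiling: $(queue_share "$cluster_gb" 50) GB"
```

In this sketch the guaranteed capacities of sibling queues add up to 100 percent, while the maximum capacity caps how far a queue can borrow beyond its guarantee.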

Note: For the HDPCA exam, we only need to concentrate on configuring the Capacity scheduler, so the other two schedulers are not covered here. The FIFO scheduler is rarely used in production environments anyway. The HDPCA exam also does not expect us to configure actual queues; only enabling the Capacity scheduler is expected.

Enabling the Capacity Scheduler (Command Line)

1. To enable the Capacity scheduler, make sure the following property is set in the YARN configuration file /etc/hadoop/conf/yarn-site.xml on the ResourceManager host:

# vi /etc/hadoop/conf/yarn-site.xml

Property: yarn.resourcemanager.scheduler.class
Value: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler

Figure: Setting the capacity scheduler property in yarn-site.xml.
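Inside yarn-site.xml, the property and value above are written as a <property> element. The sketch below shows that shape by writing the fragment to a scratch file (a stand-in for /etc/hadoop/conf/yarn-site.xml, so nothing on a live cluster is touched) and reading the value back. The helper name get_prop is an assumption made for illustration, and its pattern assumes that <name> and <value> sit on adjacent lines, as they usually do in Hadoop config files.

```shell
#!/bin/sh
# get_prop FILE NAME: print the <value> on the line after <name>NAME</name>.
# Illustrative helper (not a Hadoop tool); assumes name and value are adjacent lines.
get_prop() {
  grep -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Scratch file standing in for /etc/hadoop/conf/yarn-site.xml
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
EOF

# Print the scheduler class that the file configures
get_prop "$conf" yarn.resourcemanager.scheduler.class
```

On a cluster node, the same helper could be pointed at the real file instead of the scratch copy.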

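2. Switch to the user “yarn” and run the command below, which reloads the queue configuration. Note that a change to the scheduler class itself only takes effect after the ResourceManager is restarted.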

$ yarn rmadmin -refreshQueues
18/07/22 09:04:22 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
18/07/22 09:04:23 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]

Enabling the Capacity Scheduler (With Ambari)

1. To enable the Capacity scheduler using Ambari, go to Services > YARN > Configs and search for the property yarn.resourcemanager.scheduler.class in the filter box. As shown below, the Fair scheduler is currently set as the scheduler.

Figure: Verifying the current scheduler type in Ambari.

2. Change the scheduler property to the value org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler and click Save to save the config.

Figure: Changing the scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler in Ambari.

3. Provide an appropriate description while saving the config.

Figure: Adding a description while saving the Capacity Scheduler config in Ambari.

4. Restart the YARN service for the changes to take effect.

Figure: Restarting the YARN service after changing the scheduler type to Capacity in Ambari.

Verify

You can verify the scheduler after restarting the YARN service. Search for the property “yarn.resourcemanager.scheduler.class” in the filter box. As shown below, the scheduler type is now Capacity Scheduler.

Figure: Verifying the Capacity Scheduler type in Ambari.

You can also verify the scheduler type in the yarn configuration file /etc/hadoop/conf/yarn-site.xml.

# cat /etc/hadoop/conf/yarn-site.xml
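
The config file shows what is configured, not necessarily what the running ResourceManager has loaded. As a complementary check, the ResourceManager REST API reports the active scheduler under /ws/v1/cluster/scheduler. The sketch below is a minimal example under stated assumptions: scheduler_type is a made-up helper name, the sample JSON is a hand-written abbreviation of the response shape rather than captured output, and RM_HOST is a placeholder for your active ResourceManager host.

```shell
#!/bin/sh
# scheduler_type: read the scheduler JSON on stdin and print the top-level
# scheduler type. Illustrative helper, not a Hadoop tool.
scheduler_type() {
  grep -o '"type" *: *"[A-Za-z]*Scheduler"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Abbreviated, hand-written sample of the response shape (not captured output)
sample='{"scheduler":{"schedulerInfo":{"type":"capacityScheduler","queueName":"root"}}}'
printf '%s\n' "$sample" | scheduler_type

# Against a live cluster (RM_HOST is a placeholder; 8088 is the default RM web port):
# curl -s "http://$RM_HOST:8088/ws/v1/cluster/scheduler" | scheduler_type
```

A result of capacityScheduler indicates that the Capacity scheduler is the one actually running.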
© 2023 · The Geek Diary

  • Archives
  • Contact Us
  • Copyright