CCA 131 – Set up a local CDH repository

Note: This post is part of the CCA Administrator Exam (CCA131) objectives series.

CDH can be installed using one of the following two methods:

  1. Installation using the operating system’s package manager (e.g. yum)
  2. Installation using Cloudera Manager
Note: Make sure you have followed the post “Perform OS-level configuration for Hadoop installation” before you configure the local CDH repository. This is necessary because some tasks, such as disabling firewalld and SELinux, must be completed before configuring the repository.

In this post and the posts that follow, we will see how to set up the local CDH repo for Cloudera Manager. Later on, we can install CDH using Cloudera Manager. Our test lab consists of a master node and three additional nodes (node01, node02 and node03).

Before we start installing anything, we first need to set up the local CDH repository on the “master” node. If the CDH cluster has internet access, you do not need to perform these steps. In most production environments, however, the cluster will not have direct internet access. All the nodes in our cluster run CentOS 7. Make sure the /etc/hosts file on every node contains the entries below, so that the FQDN of each node in the cluster is resolvable.

# cat /etc/hosts
192.168.1.10    master.localdomain
192.168.1.11    node01.localdomain
192.168.1.12    node02.localdomain
192.168.1.13    node03.localdomain
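
To confirm that a hosts file really contains an entry for every node, a small check along the following lines may help. This is only a sketch: check_hosts is a hypothetical helper written for this post, not part of CentOS or Cloudera tooling, and it only looks at the file you pass in (it does not query DNS).

```shell
# Sketch: confirm that every cluster FQDN has an entry in a hosts file.
# check_hosts is a hypothetical helper; arguments: hosts file, then FQDNs.
check_hosts() {
    hosts_file=$1
    shift
    missing=0
    for fqdn in "$@"; do
        # Skip comment lines and compare each name field exactly.
        if awk -v h="$fqdn" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
            echo "ok       $fqdn"
        else
            echo "MISSING  $fqdn"
            missing=1
        fi
    done
    return "$missing"
}

# Usage on a cluster node:
# check_hosts /etc/hosts master.localdomain node01.localdomain \
#     node02.localdomain node03.localdomain
```

The function returns a non-zero status if any name is missing, so it can be used as a guard before continuing with the remaining steps.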

In order to host a local CDH repository we need to follow the below steps on the master node:

  1. Configure the web server to host the CDH repository
  2. Download the Cloudera Manager (CM) repo tarball
  3. Create a yum repo file

1. Configure the web server to host the CDH repository

We need to have a web server up and running in order to host the CDH repository.

1. Install and set up the Apache web server.

# yum install httpd

2. You can either store the RPMs in the default document root for httpd, i.e. /var/www/html, or create a soft link between the document root and the repository directory. In this post, we use the document root “/var/www/html” itself as the repository location.

3. Start the httpd service and also enable it to start automatically upon boot.

# systemctl start httpd
# systemctl enable httpd

4. We also need to disable the default welcome page, which is shown every time you open the master server URL. To do so, rename its configuration file and then restart the web server service:

# mv /etc/httpd/conf.d/welcome.conf /etc/httpd/conf.d/welcome.backup
# systemctl restart httpd

2. Download the Cloudera Manager (CM) repo tarball

1. The next step is to download the Cloudera Manager repo as a tarball. Use the command below to download version 5.9.3, the latest at the time of writing. Note that this is the tarball for CentOS 7.

# wget http://archive.cloudera.com/cm5/repo-as-tarball/5.9.3/cm5.9.3-centos7.tar.gz

2. Extract the CM repo tarball into the Apache document root, i.e. /var/www/html.

# tar zxvf cm5.9.3-centos7.tar.gz -C /var/www/html
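
Since the repository URL used later expects a cm/ directory directly under the document root, it can be worth listing the archive before unpacking it. The sketch below assumes exactly that layout; extract_repo is a hypothetical helper name, and the only tar options used are the standard t (list), x (extract), z (gzip), f (file) and -C (target directory).

```shell
# Sketch: list the archive first, then unpack it into the document root.
# extract_repo is a hypothetical helper; arguments: tarball, document root.
extract_repo() {
    tarball=$1
    docroot=$2
    # Refuse to unpack unless the archive contains paths under cm/.
    if ! tar tzf "$tarball" | grep -E '^(\./)?cm/' >/dev/null; then
        echo "unexpected layout in $tarball" >&2
        return 1
    fi
    tar xzf "$tarball" -C "$docroot"
}

# Usage on the master node:
# extract_repo cm5.9.3-centos7.tar.gz /var/www/html
```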

3. Open a browser and verify that you can view the RPMs under the URL http://master.localdomain/cm/5.9.3/RPMS/x86_64.

If the RPMs are listed, the web server works as expected and we can proceed with configuring the yum repository.
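
On a server without a browser, the same check can be made from the command line. The following is a minimal sketch that assumes curl is installed; repo_url and check_repo are hypothetical helper names, and the URL layout is the one used in this post.

```shell
# Sketch: check the repository URL from the shell instead of a browser.
# repo_url and check_repo are hypothetical helpers for this post's layout.
repo_url() {
    # $1 = web server host, $2 = CM version
    printf 'http://%s/cm/%s/RPMS/x86_64/\n' "$1" "$2"
}

check_repo() {
    # curl -f turns HTTP errors (4xx/5xx) into a non-zero exit status.
    curl -fsS -o /dev/null "$(repo_url "$1" "$2")"
}

# Usage:
# check_repo master.localdomain 5.9.3 && echo "repository is reachable"
```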

3. Create a yum repo file

1. The next step is to create a yum repo file for the CM repository. Create a new file /etc/yum.repos.d/cloudera-manager.repo and add the lines below.

# cat /etc/yum.repos.d/cloudera-manager.repo
[cloudera_manager]
name = Cloudera Manager
baseurl = http://master.localdomain/cm/5.9.3
gpgcheck = 0

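Here,
baseurl is the location where we have hosted the contents of the CM repo tarball, and gpgcheck = 0 disables GPG signature checking for packages from this repository.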
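
Because the same file is needed on every node, it may be convenient to generate it rather than type it by hand. The sketch below writes exactly the four lines shown above; write_cm_repo is a hypothetical helper, and the host and version arguments are whatever your own setup uses.

```shell
# Sketch: generate the repo file from a host name and a CM version.
# write_cm_repo is a hypothetical helper; arguments: output file, host, version.
write_cm_repo() {
    cat > "$1" <<EOF
[cloudera_manager]
name = Cloudera Manager
baseurl = http://$2/cm/$3
gpgcheck = 0
EOF
}

# Usage (as root):
# write_cm_repo /etc/yum.repos.d/cloudera-manager.repo master.localdomain 5.9.3
```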

2. Run “yum clean all” to clear the cached data, then “yum makecache” to download the metadata for all currently enabled yum repositories and make it usable:

# yum clean all
# yum makecache

3. You can verify the Cloudera Manager repo using the “yum repolist” command.

# yum repolist

4. Repeat steps 1 through 3 of this section on all the cluster nodes (node01, node02, node03) and verify that the new repository appears in the “yum repolist” output.
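
Repeating these steps by hand on each node works fine for a small lab, but a loop can save some typing. The sketch below is only one possible approach and rests on an assumption the post does not cover: that root can SSH from the master to every node. The helper names run and push_repo are hypothetical, and DRY_RUN defaults to 1 so the commands are only printed until you opt in.

```shell
# Sketch: copy the repo file to each node and refresh its yum metadata.
# Assumption: root SSH access from the master to every node.
# DRY_RUN defaults to 1, so commands are only printed until DRY_RUN=0 is set.
REPO_FILE=/etc/yum.repos.d/cloudera-manager.repo

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

push_repo() {
    for node in "$@"; do
        run scp "$REPO_FILE" "root@$node:$REPO_FILE"
        run ssh "root@$node" "yum clean all && yum makecache && yum repolist"
    done
}

# Preview the commands:
# push_repo node01.localdomain node02.localdomain node03.localdomain
# Actually run them:
# DRY_RUN=0 push_repo node01.localdomain node02.localdomain node03.localdomain
```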

We are now ready to begin the actual installation of Cloudera Manager. We will see how to install Cloudera Manager in another post.
