The Geek Diary
HDPCA Exam Objective – Install and configure Knox

by admin

Note: This post is part of the HDPCA exam objective series.

Knox Basics

Apache Knox Gateway addresses the problem of providing secure access to a Hadoop cluster from corporate networks. It acts as a single point of authentication and access for Apache Hadoop services in a cluster. Knox runs as a cluster of servers in the DMZ, isolating the Hadoop cluster from the rest of the corporate network. Its key feature is perimeter security for the Hadoop REST APIs: it limits the network endpoints required to access a cluster, hides the internal cluster topology from end users, and performs authentication and token verification at the cluster perimeter.

knox architecture HDPCA

Knox Gateway provides security for multiple Hadoop clusters, with these advantages:

  • Simplifies access: Extends Hadoop’s REST/HTTP services while encapsulating Kerberos within the cluster.
  • Enhances security: Exposes Hadoop’s REST/HTTP services without revealing network details, providing SSL out of the box.
  • Centralized control: Enforces REST API security centrally, routing requests to multiple Hadoop clusters.
  • Enterprise integration: Supports LDAP, Active Directory, SSO, SAML and other authentication systems.

In short, Knox works as a proxy: it hides the actual structure of the Hadoop cluster and provides a single point of access to all of its services. For example, compare the two ways of accessing WebHDFS to query the directory /user — directly against the NameNode, and through the Knox gateway.

Without using the Knox Gateway

# curl "http://nn1.localdomain:50070/webhdfs/v1/user?user.name=hdfs&op=GETFILESTATUS"

Using the Knox Gateway

# curl -k -u guest:guest-password "https://dn1.localdomain:8443/gateway/default/webhdfs/v1/user?op=LISTSTATUS"

Here, dn1.localdomain is the Knox gateway host.

With the Knox gateway in front, external users never see that WebHDFS is served on port 50070 internally. This way all of the internal service ports can be hidden from the outside world: you can block them at the firewall and keep only the Knox gateway port (8443) open to the secured environment.

Knox proxies REST API calls only, for the following Hadoop services:

  • WebHDFS
  • WebHCat
  • Oozie
  • HBase
  • Hive
  • Yarn

As part of the exam, we will install and configure the Knox service and verify that we can access WebHDFS through the Knox gateway.

Install and Configure Knox

To add the Knox service using Ambari, go to the Services page and select “Add Service” from the “Actions” drop-down in the left sidebar.

install and configure knox HDPCA exam objective

1. Add Service Wizard

Ambari will open an “Add Service Wizard” to add the Knox service. Select the Knox service from the list of services and click Next.

add service wizard select knox service - ambari

2. Knox Gateway

We need to choose the Knox gateway host in this step. We will keep the default selection of the dn1 host as the Knox gateway.

set knox gateway host - configure knox in ambari

3. Customize Services

We need to provide a “Knox Master Secret” of our choice on this page. You can modify any other configuration settings here if you want; I will keep the defaults and proceed.

customize knox service in ambari HDPCA

4. Dependent Configurations

Ambari will suggest recommended configuration changes if any dependent service configuration requires them. In my case, it prompts for a recommended change in the Ranger configuration. I will accept the recommended value and proceed.

dependent configuration recommendations - knox configuration in ambari

5. Review

You can review the configuration at this stage. No changes can be made after this point.

review knox configuration in ambari HDPCA exam

6. Install, Start and Test

Ambari will install the Knox service components on the required hosts, then start and test them as well.

install, start and test knox service using ambari HDPCA exam objective

If everything works as expected, you should see a successful installation of the Knox service.

successful installation and configuration of knox service using ambari

7. Summary

You can review the summary page of the installation process here.

summary page knox installation and configuration using ambari

8. Restart of Services

As the Ambari dashboard suggests, we need to restart the Ranger service components. We will restart all affected service components from Ambari.

restart of service after installing knox in ambari

Verify the status of service

You can verify the status of the service on the “Services” page in the Ambari web UI. As shown below, the Knox service page shows the “Knox Gateway” started without any alerts.

verify knox service status in ambari - HDPCA exam

Default Knox Gateway port and path

The default gateway port for the Knox service is 8443 and the default gateway path is “gateway”. These properties are defined in the configuration file /etc/knox/conf/gateway-site.xml on the Knox gateway host.

default gateway port and default gateway path
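If you want to read these two values from the command line, a minimal sketch is shown below. The property names (gateway.port, gateway.path) are the standard Knox ones, but the sample file here is constructed with a heredoc purely for illustration — on a real gateway host you would point get_prop at /etc/knox/conf/gateway-site.xml instead:

```shell
# Sample gateway-site.xml snippet, created only for illustration.
cat > /tmp/gateway-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>gateway.port</name>
    <value>8443</value>
  </property>
  <property>
    <name>gateway.path</name>
    <value>gateway</value>
  </property>
</configuration>
EOF

# Print the <value> that follows a given property <name>.
get_prop() {
  grep -A1 "<name>$1</name>" /tmp/gateway-site-sample.xml \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

echo "gateway port: $(get_prop gateway.port)"   # gateway port: 8443
echo "gateway path: $(get_prop gateway.path)"   # gateway path: gateway
```

For anything more complex than a quick lookup, a proper XML tool such as xmllint is a safer choice than grep/sed.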

Verify Knox functionality

In production environments you would typically authenticate against an external LDAP server. For certification and testing purposes, a small Demo LDAP comes integrated with the Knox service.

1. Before the Demo LDAP is started, the Knox configuration directory (/etc/knox/conf) is missing a few configuration files, such as gateway-site.xml. Here is the content of the directory before starting the Demo LDAP:

$ ls -lrt /etc/knox/conf/
total 28
-rw-r--r--. 1 knox knox 2447 Jul 22 07:23 gateway-log4j.properties
drwxr-xr-x. 2 knox knox   94 Jul 22 07:23 topologies
-rw-r--r--. 1 knox knox 1814 Jul 22 07:23 ldap-log4j.properties
-rw-r--r--. 1 knox knox 2765 Jul 22 07:23 users.ldif

2. Let’s start the Knox Demo LDAP first. Go to Services > Knox and, from the “Service Actions” drop-down, start the Demo LDAP.

start demo LDAP under knox service HDPCA exam

3. If you check the configuration directory now, you will find that all the configuration files are present:

# ls -lrt /etc/knox/conf/
total 28
-rw-r--r--. 1 knox knox 1436 May 11 13:21 shell-log4j.properties
-rw-r--r--. 1 knox knox 1485 May 11 13:21 knoxcli-log4j.properties
-rw-r--r--. 1 knox knox   91 May 11 13:21 README
-rw-r--r--. 1 knox knox  948 Jul 22 07:23 gateway-site.xml
-rw-r--r--. 1 knox knox 2447 Jul 22 07:23 gateway-log4j.properties
drwxr-xr-x. 2 knox knox   94 Jul 22 07:23 topologies
-rw-r--r--. 1 knox knox 1814 Jul 22 07:23 ldap-log4j.properties
-rw-r--r--. 1 knox knox 2765 Jul 22 07:23 users.ldif

4. We can use a user from the Demo LDAP set up under Knox. To find the usernames in the Demo LDAP, either view the file /etc/knox/conf/users.ldif or, from Ambari, go to Services > Knox > Config

knox config in ambari HDPCA

and view the property “Advanced users-ldif” in the configuration properties.

view Advanced users-ldif property in knox to get sample usernames and passwords
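You can also pull the account names straight out of users.ldif with grep. A minimal sketch — the heredoc below fakes two entries in the same LDIF shape as the demo file (the real file lives at /etc/knox/conf/users.ldif on the gateway host, and the sam user here is invented for illustration):

```shell
# Entries shaped like the Knox demo users.ldif, for illustration only.
cat > /tmp/users-sample.ldif <<'EOF'
dn: uid=guest,ou=people,dc=hadoop,dc=apache,dc=org
objectclass: person
uid: guest
userPassword: guest-password

dn: uid=sam,ou=people,dc=hadoop,dc=apache,dc=org
objectclass: person
uid: sam
userPassword: sam-password
EOF

# List the user ids defined in the LDIF.
grep '^uid:' /tmp/users-sample.ldif | awk '{print $2}'
# → guest
#   sam
```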

5. We will use the user guest with the password guest-password to validate Knox functionality. Let’s list the contents of the directory /user using curl and WebHDFS:

Invoke the LISTSTATUS operation on WebHDFS via the gateway

The syntax of the command used here is:

# curl -i -k -u [ldap user]:[ldap user's password] https://[knox gateway host]:[gateway port]/[gateway path]/[service]?[operation]

So here we have:
ldap user – guest
ldap user’s password – guest-password
knox gateway host – dn1.localdomain
gateway port – 8443
gateway path – gateway/default
service – WebHDFS (webhdfs/v1/user)
operation to perform – LISTSTATUS
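Putting those pieces into shell variables makes the request easier to adapt to your own cluster. A sketch using the values from this demo setup (the final curl is left commented out so the snippet runs without a live gateway):

```shell
# Components of the Knox WebHDFS request (values from the demo setup above).
LDAP_USER="guest"
LDAP_PASS="guest-password"
KNOX_HOST="dn1.localdomain"
GATEWAY_PORT="8443"
GATEWAY_PATH="gateway/default"
SERVICE="webhdfs/v1/user"
OPERATION="op=LISTSTATUS"

# Assemble the gateway URL from its parts.
URL="https://${KNOX_HOST}:${GATEWAY_PORT}/${GATEWAY_PATH}/${SERVICE}?${OPERATION}"
echo "$URL"
# → https://dn1.localdomain:8443/gateway/default/webhdfs/v1/user?op=LISTSTATUS

# Against a live gateway you would then run:
# curl -i -k -u "${LDAP_USER}:${LDAP_PASS}" "$URL"
```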

# curl -i -k -u guest:guest-password https://dn1.localdomain:8443/gateway/default/webhdfs/v1/user?op=LISTSTATUS
HTTP/1.1 200 OK
Date: Sun, 22 Jul 2018 09:43:50 GMT
Set-Cookie: JSESSIONID=hy8dpil2pvtksctrqzysggbs;Path=/gateway/default;Secure;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: rememberMe=deleteMe; Path=/gateway/default; Max-Age=0; Expires=Sat, 21-Jul-2018 09:43:50 GMT
Cache-Control: no-cache
Expires: Sun, 22 Jul 2018 09:43:50 GMT
Date: Sun, 22 Jul 2018 09:43:50 GMT
Pragma: no-cache
Expires: Sun, 22 Jul 2018 09:43:50 GMT
Date: Sun, 22 Jul 2018 09:43:50 GMT
Pragma: no-cache
Content-Type: application/json; charset=UTF-8
X-FRAME-OPTIONS: SAMEORIGIN
Server: Jetty(6.1.26.hwx)
Content-Length: 957

{"FileStatuses":{"FileStatus":[{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16388,"group":"hdfs","length":0,"modificationTime":1531849951840,"owner":"ambari-qa","pathSuffix":"ambari-qa","permission":"770","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16472,"group":"hdfs","length":0,"modificationTime":1532151400210,"owner":"hdfs","pathSuffix":"hdfs","permission":"700","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16468,"group":"hdfs","length":0,"modificationTime":1532178199524,"owner":"hdfs","pathSuffix":"test","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":0,"fileId":16471,"group":"hdfs","length":0,"modificationTime":1532150109945,"owner":"hdfs","pathSuffix":"user01","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}}

Here we have listed the directories under /user of the cluster using curl and the Knox gateway. The response body is in JSON format; you can use any online JSON formatter to make it readable.

JSON formatted output from webHDFS - knox functionality verification
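If you would rather stay on the command line, piping the response through Python’s built-in JSON formatter (or jq, if installed) does the same job. A sketch using a trimmed-down response of the same shape:

```shell
# Pretty-print a trimmed WebHDFS-style response with python3's json.tool
# (part of the standard library, so no extra packages are needed).
echo '{"FileStatuses":{"FileStatus":[{"owner":"hdfs","pathSuffix":"test","type":"DIRECTORY"}]}}' \
  | python3 -m json.tool
```

In practice you would simply append `| python3 -m json.tool` to the curl command above (dropping the -i flag, so the HTTP headers do not confuse the JSON parser).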

The output we got is equivalent to executing the following command on the dn1 cluster node.

$ hdfs dfs -ls /user
Found 4 items
drwxrwx---   - ambari-qa hdfs          0 2018-07-17 23:22 /user/ambari-qa
drwx------   - hdfs      hdfs          0 2018-07-21 11:06 /user/hdfs
drwxr-xr-x   - hdfs      hdfs          0 2018-07-21 18:33 /user/test
drwxr-xr-x   - hdfs      hdfs          0 2018-07-21 10:45 /user/user01
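To compare the two outputs without a JSON formatter, you can pull just the pathSuffix fields out of the response. A sketch against a trimmed response string (same shape as the full output above):

```shell
# A trimmed WebHDFS LISTSTATUS response carrying only the directory names.
RESPONSE='{"FileStatuses":{"FileStatus":[{"pathSuffix":"ambari-qa","type":"DIRECTORY"},{"pathSuffix":"hdfs","type":"DIRECTORY"},{"pathSuffix":"test","type":"DIRECTORY"},{"pathSuffix":"user01","type":"DIRECTORY"}]}}'

# Extract each pathSuffix value - these match the entries from 'hdfs dfs -ls /user'.
echo "$RESPONSE" | grep -o '"pathSuffix":"[^"]*"' | cut -d'"' -f4
# → ambari-qa
#   hdfs
#   test
#   user01
```

This grep/cut approach is only a quick check; for real scripting against WebHDFS responses a JSON-aware tool is the safer choice.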

Filed Under: Hadoop, HDPCA, Hortonworks HDP
