HDPCA Exam Objective – View an application’s log file (Troubleshoot a failed job)

Note: This post is part of the HDPCA exam objective series

It is an integral part of Hadoop administration to troubleshoot running or failed jobs. In order to troubleshoot a running or failed job, we must view the application’s log file. This post focuses on the HDPCA exam objective “View an application’s log file”. We will run a sample MapReduce program and view its status using the command line and the ResourceManager UI.

Running an example job

The HDP installation comes with a few example MapReduce jobs that we can run to test YARN functionality. You can check the available examples by running the command below:

[hdfs@nn1 ~]$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

As you can see, there are several example jobs that can be run, and each job is listed with a description of what it does.

Word Count

1. Let us run the famous “word count” job and see if it runs properly. First, let’s copy a file with some sample data from the local filesystem to HDFS.

[hdfs@nn1 ~]$ cat /home/hdfs/test_file
This is a test file.
[hdfs@nn1 ~]$ hdfs dfs -put /home/hdfs/test_file /user/test/test_file
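Note: if the /user/test directory does not already exist in HDFS, the -put command may fail; you can create it first (a small extra step, assuming the same paths as in this example):

[hdfs@nn1 ~]$ hdfs dfs -mkdir -p /user/test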

2. We can now run the “word count” example on the test_file we just uploaded to HDFS. The syntax to run the job is:

$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount [input directory] [output directory]

Here,
input directory – This will contain the input file(s)
output directory – This will contain the output file(s).

Note: Do not create the “output directory” beforehand; it gets created automatically when the job is run, and the job will fail with an error if the directory already exists.
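If you are re-running the job and the output directory is left over from a previous run, remove it first (path taken from this example):

$ hdfs dfs -rm -r /user/test/output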

Let’s run the job now:

$ hadoop jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /user/test /user/test/output

You can verify the output in the output directory:

$ hdfs dfs -ls /user/test/output
Found 2 items
-rw-r--r--   3 hdfs hdfs          0 2018-07-28 10:11 /user/test/output/_SUCCESS
-rw-r--r--   3 hdfs hdfs         31 2018-07-28 10:11 /user/test/output/part-r-00000
$ hdfs dfs -cat /user/test/output/part-r-00000
This 1
a 1
file. 1
is 1
test 1

Viewing an application’s log file

There are two ways to view the log file of an application:
1. Using the command line
2. Using the ResourceManager UI

Using the command line

You can view the currently running jobs with the “yarn application” command. To list all the available options, run the “yarn application” command without any arguments. The command needs to be executed as the “yarn” user.

[yarn@nn1 ~]$ yarn application

You can list the running jobs using the command below:

$ yarn application -list
18/07/28 11:19:06 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:19:08 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id     Application-Name     Application-Type       User      Queue              State        Final-State        Progress                        Tracking-URL
application_1532705907874_0207      QuasiMonteCarlo            MAPREDUCE  ambari-qa    default            RUNNING          UNDEFINED              5%        http://nn2.localdomain:38307

Here, each job has a unique application ID. You can also filter jobs by their state (RUNNING, ACCEPTED, NEW, FINISHED, etc.) using the -appStates option.

$ yarn application -list -appStates FINISHED
18/07/28 11:22:16 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:22:17 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
Total number of applications (application-types: [] and states: [FINISHED]):13
                Application-Id     Application-Name     Application-Type       User      Queue              State        Final-State        Progress                        Tracking-URL
application_1531843386282_0002           word count            MAPREDUCE  ambari-qa    default           FINISHED          SUCCEEDED            100% http://dn2.localdomain:19888/jobhistory/job/job_1531843386282_0002

You can also kill a running job using the application ID:

$ yarn application -kill application_1532705907874_0212
18/07/28 11:24:25 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:24:26 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
Killing application application_1532705907874_0212
18/07/28 11:24:27 INFO impl.YarnClientImpl: Killed application application_1532705907874_0212
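
You can also check the state and final status of a single application with the -status option (the application ID below is the finished word count job from the earlier listing):

$ yarn application -status application_1531843386282_0002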

Now, to view an application’s logs, use the command below:

$ yarn logs -applicationId application_1531843386282_0001
18/07/28 11:30:05 INFO client.RMProxy: Connecting to ResourceManager at dn3.localdomain/192.168.1.5:8050
18/07/28 11:30:06 INFO client.AHSProxy: Connecting to Application History server at dn3.localdomain/192.168.1.5:10200
18/07/28 11:30:09 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
18/07/28 11:30:09 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_1531843386282_0001_01_000001 on nn1.localdomain_45454
LogAggregationType: AGGREGATED
==========================================================================
LogType:launch_container.sh
LogLastModifiedTime:Tue Jul 17 21:34:48 +0530 2018
LogLength:4381
LogContents:
#!/bin/bash

set -o pipefail -e
export PRELAUNCH_OUT="/hadoop/yarn/log/application_1531843386282_0001/container_1531843386282_0001_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/hadoop/yarn/log/application_1531843386282_0001/container_1531843386282_0001_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
....
....

This log file can be very long. You can go through it to troubleshoot any failures during the application’s execution.
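
Since the aggregated output can get lengthy, it is often convenient to redirect it to a local file, or to fetch the logs of a single container with the -containerId option (the container ID below is the one shown in the output above):

$ yarn logs -applicationId application_1531843386282_0001 > /tmp/application_1531843386282_0001.log
$ yarn logs -applicationId application_1531843386282_0001 -containerId container_1531843386282_0001_01_000001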

Viewing logs using the ResourceManager UI

Another way to view the application logs is to use the ResourceManager UI. Point your web browser to the address below to access the ResourceManager web UI:

http://[resource manager hostname or IP]:8088

Alternatively, you can navigate to the ResourceManager UI from Ambari.

In the ResourceManager UI, you can view the applications filtered by their state (RUNNING, ACCEPTED, NEW, etc.). In my case the job had finished, so I looked under the FINISHED jobs to find my application. You can click the application ID to find more details on the job.

From this page, you can also view the logs of the application.
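
As a side note, the application information shown in the UI is also exposed through the ResourceManager REST API, which can be handy for scripting (a sketch, assuming the ResourceManager host and default port from the earlier output):

$ curl 'http://dn3.localdomain:8088/ws/v1/cluster/apps?states=FINISHED'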


Viewing log files in the Hadoop ecosystem

It is also important to know the log file locations of each of the Hadoop ecosystem components, such as HDFS, the ResourceManager, and the NodeManager. There are three types of log files for each component/daemon in the Hadoop ecosystem:

  • .log
  • .out
  • .log.[date]

.log extension log files

The log files with the .log extension show the log messages for the running daemons. If any errors are encountered while a daemon is running, the stack trace of the error is logged in these files. The example below shows how you might view the .log file for the NodeManager.
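
For example, to follow the NodeManager daemon log in real time (a sketch; the path assumes the HDP default YARN log directory /var/log/hadoop-yarn/yarn and the nn1 host from this tutorial, so adjust for your cluster):

$ tail -f /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-nn1.localdomain.log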

.out extension log files

The log files with the .out extension are created and written to during daemon start-up. It is very rare that these files get populated, but they can be helpful when trying to determine why the ResourceManager, NodeManager, or JobHistory Server daemons are not starting up. The example below shows how you might check the .out file for the NodeManager daemon.
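
For example (same assumed log directory and file naming as above):

$ cat /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-nn1.localdomain.out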

.log.[date]

The log files with the .log.[date] extension are created when the log files are rotated. These files are useful when an issue has occurred multiple times; comparing the older log files with the most recent one can help uncover patterns of occurrence.
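
For example, to count how often an error message appears across the current and rotated NodeManager logs (same assumed log directory and naming as above; "ERROR" is just an illustrative pattern):

$ grep -c "ERROR" /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-nn1.localdomain.log*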

Finding the location of log files from Ambari

The location of these log files can easily be found from Ambari. For example, to find the YARN log file locations, go to Services > YARN > Configs and search for the property “YARN_LOGS_DIR” in the search box.

References:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_using-apache-hadoop/content/log_files.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_using-apache-hadoop/content/running_mapreduce_examples_on_yarn.html
