Using Loops (while, for) in awk scripts

The awk programming language shares many programming concepts with shell scripting. Conditionals, such as the if statement, and loops, such as the following, can also be used in awk programs:

  • The while loop
  • The do while loop
  • The for loop

The if Statement

The if statement can have two branches: an if branch and an else branch. If the condition is true, the if branch is executed; if the condition is false, the else branch is executed.

if (condition) {
    statements
} else {
    statements
}
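The form above can be tried without the tutorial's data.file. The following is a minimal sketch under stated assumptions: the shell function name classify, the threshold of 5, and the two-column sample input are all hypothetical.

```shell
# classify (hypothetical helper): labels each record by its second field.
# Assumes whitespace-separated input whose second field is numeric.
classify() {
    awk '{
        if ($2 >= 5) {
            print $1, "high"   # if branch: condition is true
        } else {
            print $1, "low"    # else branch: condition is false
        }
    }'
}

# Hypothetical sample input; prints "north low" then "central high".
printf 'north 4.5\ncentral 5.7\n' | classify
```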

You can nest if statements. When nested if statements include one or more else branches, it can be hard to tell which if a given else belongs to. The rule is simple: each else belongs to the closest if that does not yet have its own else. For example:

if (condition) {
    if (condition) {
        statements
    } else {
        statements
    }
}

In the previous example, the else goes with the second (inner) if statement. Consistent indentation makes the pairing easier to see, but awk itself ignores indentation; only braces change which if an else belongs to.
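When braces are omitted, the closest-if rule alone decides the pairing. In this hypothetical sketch (the function name dangling and the sample numbers are assumptions), the else is indented as if it belonged to the outer if, yet awk attaches it to the inner one.

```shell
# dangling (hypothetical helper): the else is indented under the OUTER if,
# but awk pairs it with the closest if that has no else (the inner one).
dangling() {
    awk '{
        if ($1 > 0)
            if ($1 > 10)
                print $1, "big"
        else
            print $1, "small positive"
    }'
}

# Prints "5 small positive" and "20 big". For -3 the outer test fails,
# and because the outer if has no else, nothing is printed.
printf '5\n20\n-3\n' | dangling
```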

Conditional Printing With the awk Language

In its simplest form, the if statement tests a condition and, if that condition is true, the statements are executed. To compare two numbers in the condition, use one of the following relational operators:

  • == Equal to
  • != Not equal to
  • > Greater than
  • >= Greater than or equal to
  • < Less than
  • <= Less than or equal to

For example:

$ awk '{num = $5 / $6; if (num > 6) print $1, num}' data.file 
southeast 7.14286 
central 6.06383
$ awk '{if ($6 > .92) print $2, $6, $4}' data.file | sort 
CT .94 Watson 
NE .94 Nichols 
NW .98 Craig 
SO .95 Chin 
WE .97 Kelly
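Both examples above use only >. As a self-contained sketch of the <= operator (the function name at_most_three and the sample input are hypothetical; data.file is not reproduced here):

```shell
# at_most_three (hypothetical helper): prints records whose second field is <= 3.
# Input fields that look like numbers are compared numerically.
at_most_three() {
    awk '{ if ($2 <= 3) print $1, $2 }'
}

# Prints "a 2" and "b 3"; the record "c 4" fails the test.
printf 'a 2\nb 3\nc 4\n' | at_most_three
```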

String Comparisons and Relational and Logical Operators

To compare two strings in a conditional, use one of the relational operators shown below:

  • == Equal to
  • != Not equal to
  • ~ Contains a /RE/
  • !~ Does not contain a /RE/

The ~ and !~ operators require some explanation. The left operand is the text being searched, which can be the entire record ($0) or a specific field of the record (such as $1). The right operand is the regular expression, written between slashes (/RE/), that is being searched for. If the text contains a match for the regular expression, the condition is true. You can use any of the regular expression metacharacters described earlier in this regular expression.

The next examples use regular expression matches as conditions. In the first, the test is an if statement inside the action; in the second, the test is written as a pattern. Each record is tested to see if the condition is true for that record. If it is true, the ACTION (printing, in this case) is taken on that record. In the second example, no ACTION is listed, so the default action is taken: printing the entire input record.

$ awk '{if ($2 ~ /E/) print $2, $1}' data.file 
WE western 
SE southeast 
EA eastern 
NE northeast
$ awk '$2 !~ /E/' data.file 
northwest       NW      Joel Craig      3.0 .98 3       4 
southwest       SW      Chris Foster    2.7 .8  2       18 
southern        SO      May Chin        5.1 .95 4       15 
north           NO      Val Shultz      4.5 .89 5       9 
central         CT      Sheri Watson    5.7 .94 5       13
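Because ~ tests whether the text contains a match, /E/ matches an E anywhere in the field. Anchoring the expression with ^ narrows the test to fields that begin with E. A minimal sketch with hypothetical function names and made-up sample lines:

```shell
# contains_e and starts_with_e are hypothetical helpers for comparison.
contains_e() {
    awk '$2 ~ /E/ { print $2, $1 }'     # E anywhere in field 2
}
starts_with_e() {
    awk '$2 ~ /^E/ { print $2, $1 }'    # field 2 must begin with E
}

sample='western WE
eastern EA
northeast NE'
printf '%s\n' "$sample" | contains_e      # WE western, EA eastern, NE northeast
printf '%s\n' "$sample" | starts_with_e   # EA eastern
```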

Logical Operators

Use the binary logical operators && and || to join two conditional expressions. The conditional expressions can be numeric comparisons, string comparisons, or a mixture of both. The logical AND is true only if both of the joined expressions are true; the logical OR is true if at least one of them is true. The unary NOT operator (the exclamation point) inverts the logical value of an expression.

  • && Logical AND
  • || Logical OR
  • ! Logical NOT

For example:

$ awk '{if ( $2 ~ /E/ && $6 > .92) print $2, $6, $4}' data.file 
WE .97 Kelly 
NE .94 Nichols
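The example above uses only &&. Here is a hypothetical sketch of || and ! on made-up three-column input (region, code, score); the function names are assumptions. Note that !($2 ~ /E/) is equivalent to $2 !~ /E/.

```shell
# either_test (hypothetical): code starts with N OR score is greater than 5.
either_test() {
    awk '{ if ($2 ~ /^N/ || $3 > 5) print $1 }'
}
# negated_test (hypothetical): code does NOT contain E.
negated_test() {
    awk '!($2 ~ /E/) { print $1 }'
}

sample='northeast NE 4.5
central CT 5.7
southwest SW 2.7'
printf '%s\n' "$sample" | either_test    # northeast, central
printf '%s\n' "$sample" | negated_test   # central, southwest
```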

The while Loop in the awk Language

When a loop appears in the main body of an awk script or command, the loop is started once for each record that is processed. For each record, the loop repeats until its condition becomes false.

The while loop is the simplest loop available. It tests the same types of conditions that the if statement uses. In loop constructs, the single statement, or the multiple statements enclosed in curly braces, is known as the body of the loop. If the condition is true, the body of the loop is executed and the condition is tested again; once the condition is false, awk continues with the statement that follows the loop. The syntax for a while loop is:

while (condition)
    statement

while (condition) {
    statements
}

The following example uses a while loop to print out each field of every record on a line by itself. It prints a blank line after each record is reported.

{
    i = 1
    while (i <= NF) {
        print $i
        i++
    }
    print ""
}
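Because the loop restarts for every record, any counter it uses must be reset at the start of each record, as i = 1 is above. As a further hypothetical sketch (the function name sum_fields is an assumption, and all fields are assumed to be numeric), the same while pattern can total the fields of each record:

```shell
# sum_fields (hypothetical helper): totals the fields of each record.
sum_fields() {
    awk '{
        total = 0              # reset for every record
        i = 1
        while (i <= NF) {
            total += $i
            i++
        }
        print "record", NR, "sum", total
    }'
}

# Prints "record 1 sum 6" and "record 2 sum 30".
printf '1 2 3\n10 20\n' | sum_fields
```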

The do while Loop

A do while loop follows the same philosophy of a while loop except that the condition is tested after the body of the loop is executed. The syntax is:

do  
    { statements }
while (condition)

The primary difference between a while loop and a do while loop is that the do while loop executes at least once. For example, suppose that the condition is false for a while loop and for a do while loop. In the while loop, the condition is tested first and because it is false, the body of the loop is not executed. In the do while loop, the body of the loop executes first, and then the condition is tested. Because the condition is false, the body of the loop is no longer executed, but it did execute once.

The following example uses a do while loop to print out each field of every record on a line by itself. It prints a blank line after each record is reported.

{
    i = 1
    do {
        print $i
        i++
    } while (i <= NF)
    print ""
}
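The at-least-once behavior is easy to see on an empty input line, where NF is 0. This hypothetical sketch (the function name count_both is an assumption) counts how many times each loop body runs with the same condition:

```shell
# count_both (hypothetical helper): runs a while loop and a do while loop
# with the same condition and reports how many times each body executed.
count_both() {
    awk '{
        w = 0; i = 1
        while (i <= NF) { w++; i++ }
        d = 0; i = 1
        do { d++; i++ } while (i <= NF)
        print "while:", w, "do-while:", d
    }'
}

# The second input line is empty: the while body runs 0 times,
# the do while body runs once.
printf 'a b\n\n' | count_both
```

This is also why a do while version of the field-printing example prints an extra empty line for a blank input record: the body runs once and prints the empty $1.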

The for Loop in the awk Language

The for loop has two varieties, one borrowed from the C programming language and the other to process awk arrays.

for (setup_statement; condition; step_statement)
    { statements }

for (indexvar in arrayname)
    { statements }

The following example shows the first type of for loop. The loop executes one time for every field in a record. When the value of a exceeds NF (the number of fields in the current record), the loop terminates, and the next record is read from the file to begin the process again.

$ cat for.awk 
{ for (a = 1; a <= NF; a++) print $a }
$ awk -f for.awk data.file 
northwest 
NW 
Joel 
Craig 
3.0 
.98 
3 
4 
western 
WE 
Sharon 
Kelly 
5.3 
... (output truncated)
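The step statement does not have to increment. As a hypothetical sketch (the function name reverse_fields is an assumption), a for loop that counts down from NF prints each record's fields in reverse order:

```shell
# reverse_fields (hypothetical helper): rebuilds each record with its fields reversed.
reverse_fields() {
    awk '{
        line = ""
        for (a = NF; a >= 1; a--) {
            line = line $a             # string concatenation
            if (a > 1) line = line " "
        }
        print line
    }'
}

# Prints "three two one".
printf 'one two three\n' | reverse_fields
```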

Using Loops With Arrays

The second type of for loop is used specifically with awk arrays, and it executes once for each array element.

for (index_var in array_name) { statements }

The index_var variable contains each array subscript in turn. Because it is not always possible to know how many indices an awk array has, this form of the for loop supplies a variable (the first word in the parentheses) to hold each index. The second word in the parentheses is always in, and the third word is the name of the array to be processed. The loop executes once for each index of the array (each element of the array), and each time through, the variable is set to the next index. The order in which the indices are visited is not specified, so a program should not rely on it.

Nonnumeric Array Indices

Arrays in awk differ from those in some programming languages in that they do not have to be declared before they are used. Further, the index (the value in the square brackets) does not have to be numeric. awk arrays are associative: the index can be any string, so it can be anything that relates to the data being stored.

$ cat bigcats 
lions 
tigers 
leopards 
tigers 
leopards 
leopards 
tigers 
leopards
$ awk '{a[$1]++} 
> END {for (c in a) print c, a[c]}' bigcats 
tigers 3 
lions 1 
leopards 4

In the preceding example, the file being processed contains a list of big cats. The awk statement reads each record and builds an array called a. The index of each element is the first field of the record (the name of the cat). A newly created element has an empty value, which is treated as 0 in a numeric context, and the ++ operator increments it. Thus, after the first record is processed, the element:

a["lions"]

has the value 1.

This continues through the entire file. When a cat name that has already been seen is read again, its array element already exists, so its current value is incremented by 1. At the end of the file, one array element exists for each type of cat, and the value of each element is the number of times that cat appeared in the file.

The for loop uses the variable c to hold each of the indices in turn (it is not known in advance how many there will be) and processes the a array. For each element, the value of c (a cat name) and the value of the element (the number of times the cat name was encountered) are printed.
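Since the visiting order of for (c in a) can differ between runs and awk implementations, one common way to get a stable report is to pipe the output through sort. A sketch with a hypothetical function name and a shortened list of cats:

```shell
# count_cats (hypothetical helper): counts each distinct first field,
# then sorts the report because the for (c in a) order is unspecified.
count_cats() {
    awk '{ a[$1]++ } END { for (c in a) print c, a[c] }' | sort
}

# Prints "leopards 1", "lions 1", "tigers 2".
printf 'lions\ntigers\nleopards\ntigers\n' | count_cats
```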

The break and continue Statements

Use the break statement to leave a loop immediately. The statements in the loop body that follow the break statement are not executed, and execution resumes with the first statement after the body of the loop.

while (condition) {
    statements 
    break  
    statements 
} 
more_awk_statements
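A typical use of break is a search that can stop as soon as it finds a match. This hypothetical sketch (the function name first_over and the threshold 10 are assumptions) reports the first field in each record that is greater than 10:

```shell
# first_over (hypothetical helper): finds the first field greater than 10.
first_over() {
    awk '{
        found = 0
        i = 1
        while (i <= NF) {
            if ($i > 10) {
                found = i
                break          # leave the loop; i++ below is skipped
            }
            i++
        }
        if (found) print "record", NR, "field", found
        else print "record", NR, "none"
    }'
}

# Prints "record 1 field 2" and "record 2 none".
printf '3 12 40\n1 2 3\n' | first_over
```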

The continue Statement

The continue statement stops the current iteration of the loop body, returning to the loop control expression.

while (condition) {
    statements  
    continue  
    statements 
}

The continue statement stops the execution of the body of statements in a loop, and execution is sent to the control portion of the loop. In a while loop or a do while loop, the control portion is the condition expression. The condition is tested again, and if it is true, then the body of the loop is entered again, and the process continues.

In a for loop (not the kind used with arrays), execution moves to the step portion of the loop (the third expression in the parentheses). The step statement is executed and the condition is tested again. If the condition is true, the loop body is executed again. In a for loop that is used with arrays (for (indexvar in arrayname)), execution returns to the top of the loop body with the index variable set to the next array index. The element that was being processed stays in the array; the loop simply moves on from it.

The effect of the continue statement for each of the types of looping statements is:

while and do while                                    # The condition is tested again 
for (setup_statement; condition; step_statement)      # The step is taken, and then the condition is tested 
for (indexvar in arrayname)                           # The next array element is processed
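In this hypothetical sketch (the function name sum_numeric and the numeric pattern are assumptions), continue skips fields that do not look like unsigned decimal numbers. Because this is a C-style for loop, the step i++ still runs before the condition is tested again.

```shell
# sum_numeric (hypothetical helper): totals only fields made of digits,
# optionally with a decimal part; other fields are skipped with continue.
sum_numeric() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            if ($i !~ /^[0-9]+(\.[0-9]+)?$/)
                continue       # jump to i++, then re-test i <= NF
            total += $i
        }
        print $1, total
    }'
}

# Prints "north 9" and "central 7".
printf 'north 4 5\ncentral x 7\n' | sum_numeric
```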

The next and exit Statements

Use the next and exit statements for program flow control. The next statement stops processing the current record, reads the next record from the input, and starts over at the beginning of the awk program. The exit statement terminates the awk program. An optional exit status can be given, and it is passed to awk’s parent process.

You can stop processing the current record at any point. The next record is then read in, and the awk program begins again with this new record. The statement that tells awk to get the next record is next. Typically, a test is performed and, if the test is true (or false, depending on what is appropriate), the current record is discarded and the next record is read.

The exit statement is used in a similar manner in that a test is usually made. If the test is true (or false, depending on what is appropriate) then the program terminates, which is accomplished with the exit statement.

When exit is called outside an END action, awk stops reading input, runs any END actions, and then quits; an exit inside an END action ends the program immediately. The only argument allowed for exit is an optional exit status. If an argument is used with exit, it becomes the exit status that awk returns to its parent process.
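A hypothetical sketch of both statements (the function name first_two_rows, the # comment convention, and the exit status 3 are assumptions): next skips comment and blank lines, and exit stops after two data lines and returns status 3 to the shell.

```shell
# first_two_rows (hypothetical helper).
first_two_rows() {
    awk '
        /^#/    { next }          # discard comment lines
        NF == 0 { next }          # discard blank lines
        { print; count++ }
        count == 2 { exit 3 }     # stop reading input; status 3 goes to the shell
    '
}

# Prints "alpha" and "beta"; "gamma" is never processed.
# The || branch then reports the exit status (3).
printf '# header\nalpha\n\nbeta\ngamma\n' | first_two_rows || echo "awk exit status: $?"
```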
