Patterns, Actions, and Variables

[pattern] { action }
pattern [{ action }]

Pattern Elements

Patterns control the execution of rules. A rule is executed when its pattern matches the current input record.

/regular expression/

a regular expression

expression

a single expression

begpat, endpat

a pair of patterns separated by a comma, specifying a range of records

BEGIN
END

special patterns to supply startup or cleanup actions

BEGINFILE
ENDFILE

special patterns to supply startup or cleanup actions on a per-file basis

empty

the empty pattern matches every input record

Expressions as Patterns

awk '$1 == "li" { print $2 }' mail-list

Regular Expressions as Patterns

awk '$1 ~ /li/ { print $2 }' mail-list
awk '/edu/ && /li/' mail-list
awk '/edu/ || /li/' mail-list
awk '! /li/' mail-list
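To make the effect of the boolean operators concrete, here is a small sketch that runs against inline records instead of mail-list. The sample data and the helper name sample_records are assumptions made for illustration only.

```shell
# Hypothetical sample data: a name followed by an e-mail address.
sample_records() {
    printf '%s\n' \
        'Amelia amelia@example.edu' \
        'Bill   bill@example.com' \
        'Julie  julie@example.edu'
}

# Both regexps must match the record: prints the Amelia and Julie lines.
sample_records | awk '/edu/ && /li/'

# Neither "edu" nor "li" occurs in the record: prints the Bill line only.
sample_records | awk '! (/edu/ || /li/)'
```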

Specifying Record Ranges with Patterns

A range pattern is used to match ranges of consecutive input records.

begpat,endpat

cat myfile
no      first   100     65
on      user1   1       12
on      user2   4       345
off     user3   12      73
no      last    2       123

awk '$1 == "on", $1 == "off" { printf "%s %-3s %-3s\n", $2, $3, $4 }' myfile
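As a worked sketch of the range above, the same records can be fed on standard input. The helper name on_off_range is hypothetical, and plain print is used instead of printf to keep the output easy to read.

```shell
# Print fields 2-4 for every record from the first "on" up to and
# including the next "off".
on_off_range() {
    awk '$1 == "on", $1 == "off" { print $2, $3, $4 }'
}

on_off_range <<'EOF'
no      first   100     65
on      user1   1       12
on      user2   4       345
off     user3   12      73
no      last    2       123
EOF
# user1 1 12
# user2 4 345
# user3 12 73
```

The record that matches endpat is still part of the range, which is why user3 is printed and last is not.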

The BEGIN and END Special Patterns

The BEGIN and END patterns supply startup and cleanup actions.
BEGIN and END rules must have actions.

Startup and cleanup actions

awk 'BEGIN { print "Analysis of \"li\"" }
   /li/ { ++n }
   END { print "\"li\" appears in", n, "records." }' mail-list

Input/output from BEGIN and END rules

Danger

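  • Be careful when referencing $0: in a BEGIN rule no input has been read yet, so $0 and the fields are empty; in an END rule gawk keeps $0 and NF as they were for the last record read.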

  • The next and nextfile statements are not allowed.
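A minimal sketch of what the record state looks like in the two special rules, assuming an awk that keeps NF in END as POSIX specifies. The helper name show_nf is hypothetical.

```shell
# Report NF before any input is read and again after the last record.
show_nf() {
    awk 'BEGIN { print "BEGIN: NF=" NF }
         END   { print "END: NF=" NF }'
}

printf 'a b c\nd e\n' | show_nf
# BEGIN: NF=0
# END: NF=2
```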

The BEGINFILE and ENDFILE Special Patterns

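In a BEGINFILE rule, FILENAME is set to the name of the current file and FNR is set to zero. ERRNO is the empty string if the file was opened successfully; otherwise it describes the error. The next statement is not allowed in either rule, and nextfile is allowed only in BEGINFILE. Both patterns are gawk extensions.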

The Empty Pattern

An empty pattern matches every input record.

awk '{ print $1 }' mail-list

Using Shell Variables in Programs

  • Variable substitution via quoting:

printf "Enter search pattern: "; read pattern
Enter search pattern: ri
awk '$1 ~ '"/$pattern/"'{ nmatches++ }
END { print nmatches, "found."}' mail-list
1 found.

  • Variable assignment with -v: assign the shell variable’s value to an awk variable.

printf "Enter search pattern: "; read pattern
Enter search pattern: li
awk -v pat="$pattern" '$1 ~ pat { nmatches++ }
END { print nmatches, "found."}' mail-list
2 found.
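One reason to prefer -v is that the pattern never becomes part of the program text, so a value containing a slash does not break the regexp constant. A sketch, with a hypothetical helper count_matches and made-up input:

```shell
# Hypothetical helper: count input records whose first field matches
# the regexp given as the shell argument $1.
count_matches() {
    awk -v pat="$1" '$1 ~ pat { n++ } END { print n + 0, "found." }'
}

printf 'Amelia\nBill\nJulie\n'   | count_matches 'li'    # 2 found.
printf 'usr/bin\nusr/lib\netc\n' | count_matches 'r/b'   # 1 found.
```

Note that awk processes escape sequences in a -v value, so a pattern containing backslashes would need them doubled. The n + 0 forces a numeric 0 when nothing matches.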

Actions

An action consists of one or more awk statements.
The action can be omitted if a pattern is given; the matching record is then printed, as if the action were { print }.
awk '/li/' mail-list

Types of statements:

  • Expressions

  • Control statements

  • Compound statements

  • Input statements

  • Output statements

  • Deletion statements

Control Statements in Actions

The if-else Statement

if (condition) then-body [else else-body]

awk '{ if ( $2 ~ /99/ ) print }' mail-list
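The example above has no else part. A sketch with both branches, using a hypothetical helper classify on made-up numeric input:

```shell
# Label each number read from standard input as even or odd.
# The semicolon before else is needed because the then-body is not
# enclosed in braces and sits on the same line.
classify() {
    awk '{ if ($1 % 2 == 0) print $1, "is even"; else print $1, "is odd" }'
}

printf '4\n7\n' | classify
# 4 is even
# 7 is odd
```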

The while Statement

while (condition)
body
awk '{ i = 1 ; while ( i <= 3 ) { print $i ; i++ } }' inventory-shipped

The do-while Statement

do
body
while (condition)
awk '{ i = 1 ; do { print $0 ; i++ } while ( i <= 5 ) }' inventory-shipped
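The practical difference between while and do-while is that a do body always runs at least once. A small sketch (the helper name loop_once is hypothetical) with a condition that is false from the start:

```shell
# i starts at 10, so "i < 5" is never true: the while body is skipped,
# but the do body still runs one time.
loop_once() {
    awk 'BEGIN { i = 10
                 while (i < 5) { print "while:", i; i++ }
                 do { print "do:", i; i++ } while (i < 5) }'
}

loop_once
# do: 10
```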

The for Statement

for (initialization; condition; increment)
body
awk '{ for ( i = 1 ; i <= 3 ; i++ ) print $i }' inventory-shipped
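The example above stops at field 3. A common variation loops up to NF so that every field is visited; a sketch with a hypothetical helper sum_fields and made-up input:

```shell
# Add up all fields of each record, however many there are.
sum_fields() {
    awk '{ total = 0
           for (i = 1; i <= NF; i++)
               total += $i
           print total }'
}

printf '1 2 3\n10 20\n' | sum_fields
# 6
# 30
```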

The switch Statement

switch (expression) {
case value or regular expression:
case-body
default:
default-body
}
awk '{ switch ($1) {
case "Bill":
    print $1, "was here"
    break
case "Julie":
    print $1, "was here"
    break
default:
    break
} }' mail-list

The break Statement

The break statement jumps out of the innermost for, while, or do loop. It is also used to break out of a switch statement.
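A sketch of break in a for loop, assuming numeric fields; the helper name first_negative is hypothetical. The loop variable keeps its value after the break, which tells the code that follows whether the loop ended early.

```shell
# Report the position of the first negative field in each record.
first_negative() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i < 0)
                   break
           if (i <= NF) print "field", i
           else         print "none" }'
}

printf '3 5 -2 -7\n1 2\n' | first_negative
# field 3
# none
```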

The continue Statement

The continue statement is used only inside for, while, and do loops; it skips the rest of the loop body, causing the next cycle around the loop to begin immediately.
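A sketch of continue, again assuming numeric fields; the helper name sum_positive is hypothetical. In a for loop, continue still runs the increment part before the condition is tested again.

```shell
# Sum only the non-negative fields of each record.
sum_positive() {
    awk '{ total = 0
           for (i = 1; i <= NF; i++) {
               if ($i < 0)
                   continue    # skip the addition, go on to i++
               total += $i
           }
           print total }'
}

printf '3 -5 2\n-1 -1\n' | sum_positive
# 5
# 0
```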

The next Statement

The next statement forces awk to immediately stop processing the current record and go on to the next record.

awk '{ if ( $1 !~ /Bill|Julie/ ) next ; else print }' mail-list

The nextfile Statement

The nextfile statement instructs awk to stop processing the current data file and continue with the next one.

awk '{ if ( $1 !~ /Bill|Julie/ ) print ; else nextfile }' mail-list

The exit Statement

exit [return code]

awk '{ if ( $1 == "Bill" ) { print "Bill scares me" > "/dev/stderr" ; exit 1 } else print }' mail-list
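When exit is executed in an ordinary rule, input processing stops but any END rule still runs, and the exit status is passed back to the shell. A sketch with a hypothetical helper stop_at_bad and made-up input:

```shell
# Stop at the first "bad" record with status 3; END is still executed.
stop_at_bad() {
    awk '$1 == "bad" { exit 3 }
         { print }
         END { print "END ran" }'
}

printf 'ok\nbad\nok\n' | stop_at_bad || echo "status=$?"
# ok
# END ran
# status=3
```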

Predefined Variables

Built-in Variables That Control awk

Variables marked with # are specific to gawk.

BINMODE #

specifies use of binary mode for all I/O

CONVFMT

controls the conversion of numbers to strings (default “%.6g”)

FIELDWIDTHS #

space-separated list of column widths, telling gawk to split the input into fixed-width fields

FPAT #

regexp describing the contents of a field; gawk creates the fields from the text that matches it

FS

input field separator

IGNORECASE #

if nonzero or non-null, string comparisons and regexp matching are case-independent

LINT #

if true, gawk behaves as if --lint were given and warns about dubious or nonportable constructs

OFMT

controls the conversion of numbers to strings for output with print (default “%.6g”)

OFS

output field separator

ORS

output record separator

PREC #

working precision of arbitrary-precision floating-point numbers

ROUNDMODE #

rounding mode to use for arbitrary-precision arithmetic on numbers

RS

input record separator

SUBSEP

subscript separator; joins the parts of a multidimensional array index (default “\034”)

TEXTDOMAIN #

text domain used for internationalization (default “messages”)
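A short sketch of how some of the portable variables above behave, using only FS, OFS, OFMT, and CONVFMT. The helper names rejoin_fields and show_number_formats and the sample values are assumptions for illustration.

```shell
# FS splits the input on colons; assigning to a field ($1 = $1) makes
# awk rebuild $0 with OFS between the fields.
rejoin_fields() {
    awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }'
}

# OFMT applies when print outputs a number directly; CONVFMT applies
# when a number is converted to a string, here by concatenation.
show_number_formats() {
    awk 'BEGIN { OFMT = "%.2f"; CONVFMT = "%.3f"
                 x = 3.14159
                 print x
                 print (x "") }'
}

printf 'a:b:c\n' | rejoin_fields
# a-b-c
show_number_formats
# 3.14
# 3.142
```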

Built-in Variables That Convey Information

ARGC

number of command-line arguments

ARGV

command-line arguments stored in an array

ARGIND #

index in ARGV of the current file

ENVIRON

associative array containing the values of the environment

ERRNO #

string describing the system error after a failed getline or close()

FILENAME

name of the current input file

FNR

current record number in the current file

NF

number of fields in the current input record

FUNCTAB #

array of all functions in the program

NR

number of input records awk has processed

PROCINFO #

array of information about the running awk program

RLENGTH

length of the substring matched by match()

RSTART

start index in characters of the substring matched by match()

RT #

input text that matched the text denoted by RS

SYMTAB #

array of all defined global variables and arrays in the program

awk -v foo=4 'BEGIN { SYMTAB["foo"] = "toto" ; print foo, ENVIRON["HOME"] }'
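The portable informational variables are easiest to understand side by side. The sketch below creates two throwaway files (the names one.txt and two.txt and the helpers show_counters and show_match are hypothetical) and shows that FNR restarts for each file while NR keeps counting, then shows what match() leaves in RSTART and RLENGTH.

```shell
# Print the per-record counters for every file given as an argument.
show_counters() {
    awk '{ print FILENAME, "FNR=" FNR, "NR=" NR, "NF=" NF }' "$@"
}

# match() sets RSTART and RLENGTH for the leftmost, longest match.
show_match() {
    awk 'BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }'
}

tmpdir=$(mktemp -d)
printf 'a b\nc\n' > "$tmpdir/one.txt"
printf 'd e f\n'  > "$tmpdir/two.txt"
(cd "$tmpdir" && show_counters one.txt two.txt)
# one.txt FNR=1 NR=1 NF=2
# one.txt FNR=2 NR=2 NF=1
# two.txt FNR=1 NR=3 NF=3
rm -r "$tmpdir"

show_match
# 2 2
```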

Using ARGC and ARGV

awk 'BEGIN { for ( i = 0 ; i < ARGC ; i++ )
printf "\tARGV[%d] = %s\n", i, ARGV[i] }' toto tata
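ARGV holds only the arguments that follow the program text; awk's own options and the program itself are not included. A sketch with a hypothetical helper list_args, starting at index 1 to skip the command name in ARGV[0]:

```shell
# List the command-line arguments as awk sees them. The -v option is
# consumed by awk and does not appear in ARGV. The program has only a
# BEGIN rule, so the named files are never opened.
list_args() {
    awk -v x=1 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' "$@"
}

list_args toto tata
# 1 toto
# 2 tata
```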