Reading Input Files

Record number

NR The total number of input records seen so far.

w | awk 'NR > 2 { print }'
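
A small self-contained sketch of the same idea, using made-up sample lines instead of the output of w: NR serves both as a pattern (skip the header record) and as a printed value.

```shell
# Skip the first record (a header line) and number the remaining ones with NR.
# The three input lines are invented sample data.
out=$(printf 'HEADER\nrow1\nrow2\n' | awk 'NR > 1 { print NR ": " $0 }')
echo "$out"
```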

Record splitting with standard awk

RS The record separator, by default a newline.

awk 'BEGIN { RS = ":" } { print $0 }' /etc/passwd
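
One consequence worth seeing on a tiny input (the sample string is an assumption for illustration): once RS is ":", the newline is no longer a separator, so it stays inside the last record.

```shell
# With RS = ":", the trailing newline of the input is part of the last record.
out=$(printf 'a:b:c\n' | awk 'BEGIN { RS = ":" } { printf "[%s]", $0 }')
echo "$out"
```

This is likely why the /etc/passwd command above shows line breaks in the middle of its output.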

Record splitting with gawk

RT The input text that terminated the current record (gawk extension).
If RS is a single character, RT contains that same character.
If RS is a regular expression, RT contains the actual input text that matched it.
echo record 1 AAAA record 2 BBBB record 3 |
awk 'BEGIN { RS = "\n|( *[[:upper:]]+ *)" }
{ print "Record =", $0,"and RT = [" RT "]" }'
Record = record 1 and RT = [ AAAA ]
Record = record 2 and RT = [ BBBB ]
Record = record 3 and RT = [
]
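
A shorter sketch of the same behavior, assuming awk is gawk (RT is a gawk extension and stays empty in other awks). The input string is made up; the last record has no terminator, so its RT is empty.

```shell
# Assumes awk is gawk. Records are split on runs of digits and RT holds
# the digits that actually terminated each record.
out=$(printf 'a1b22c' | awk 'BEGIN { RS = "[0-9]+" } { printf "%s<%s>", $0, RT }')
echo "$out"
```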

Fields

NF The number of fields in the current input record.
The input record is automatically separated into chunks called fields.

$0 The whole input record.

$1 The first field.

$NF The last field.

awk '/li/ { print $1, $NF }' mail-list
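
A sketch that does not depend on the mail-list file: since NF is a number, it can be used in a field expression, so $(NF-1) is the next-to-last field. The input line is invented sample data.

```shell
# NF is the field count, $NF the last field, $(NF-1) the one before it.
out=$(echo 'alpha beta gamma delta' | awk '{ print NF, $1, $(NF-1), $NF }')
echo "$out"
```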

Contents of a field

Change the content of a field.

awk '{ nboxes = $3 ; $3 = $3 - 10 ; print nboxes, $3 }' inventory-shipped

Create a new field.

awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
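
What happens to $0 and NF after such assignments can be sketched without the inventory-shipped file. The input lines and the "-" value of OFS are assumptions chosen to make the effect visible.

```shell
# Assigning to a field rebuilds $0, joining the fields with OFS.
out1=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $2 = "X"; print }')
# Assigning past NF creates the missing fields as empty strings and raises NF.
out2=$(echo 'a b' | awk 'BEGIN { OFS = "-" } { $4 = "d"; print $0 " NF=" NF }')
echo "$out1"
echo "$out2"
```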

How fields are separated

FS The input field separator, a space by default.

  • Single character

awk 'BEGIN { FS = ":" ; OFS = ":" } $3 > 999 { print $1, $6, $7 }' /etc/passwd

  • Regexp

echo ' a b c d e  ' | awk 'BEGIN { FS = "[ \t\n]+" } {print $2}'

Each character a separate field

echo ab cd | awk 'BEGIN { FS = "" }
{ for (i = 1; i <= NF; i++) print "Field", i, "is", $i }'

FS from the command line

awk -F: '$5 == ""' /etc/passwd
# same as
awk 'BEGIN { FS = ":" } $5 == "" { print }' /etc/passwd
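
The -F option takes a regexp as well as a single character. A minimal sketch on a made-up input string:

```shell
# -F with a regexp: split on runs of digits.
out=$(echo 'a1b22c' | awk -F'[0-9]+' '{ print $1, $2, $3 }')
echo "$out"
```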

Field-splitting summary

  • FS == " "
    Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored.
    This is the default.
  • FS == any other single character
    Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences.
    The character can even be a regexp metacharacter; it does not need to be escaped.
  • FS == regexp
    Fields are separated by occurrences of characters that match regexp.
    Leading and trailing matches of regexp delimit empty fields.
  • FS == ""
    Each individual character in the record becomes a separate field.
    (This is a common extension; it is not specified by the POSIX standard.)
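  • FIELDWIDTHS == list of columns
    Fields are split by character position, using fixed column widths. (gawk extension.)
  • FPAT == regexp
    Fields are the text that matches regexp; the text between matches acts as the separator. (gawk extension.)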
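
The first three rules of the summary can be checked by counting fields. This is a sketch on invented inputs, restricted to the cases that work in standard awk.

```shell
# Default FS (" "): runs of whitespace separate; leading/trailing are ignored.
n_default=$(echo '  a   b  ' | awk '{ print NF }')
# Single-character FS: every occurrence separates, so empty fields appear.
n_comma=$(echo ',a,,b,' | awk -F, '{ print NF }')
# Regexp FS: a leading match delimits an empty first field.
n_regexp=$(echo ' a b' | awk -F'[ ]' '{ print NF }')
echo "$n_default $n_comma $n_regexp"
```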

Record-splitting summary

  • RS == "\n"
    Records are separated by the newline character (\n).
    In effect, every line in the datafile is a separate record, including blank lines.
    This is the default.
  • RS == any single character
    Records are separated by each occurrence of the character. Multiple successive occurrences delimit empty records.
  • RS == ""
    Records are separated by runs of blank lines. When FS is a single character, then the newline character always serves as a field separator, in addition to whatever value FS may have. Leading and trailing newlines in a file are ignored.
  • RS == regexp
    Records are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty records.
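
The RS == "" case (paragraph mode) can be sketched on a made-up input with extra blank lines at both ends: two records come out, and the newline inside the first one also separates fields.

```shell
# RS = "": records are paragraphs separated by blank lines; leading and
# trailing newlines are ignored, and a newline also acts as a field separator.
out=$(printf '\n\na b\nc\n\n\nd\n\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }')
echo "$out"
```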

Multiple-line records

cat addresses
Jane Doe
123 Main Street
Anywhere, SE 12345-6789

John Smith
456 Tree-lined Avenue
Smallville, MW 98765-4321

awk 'BEGIN { RS = "" ; FS = "\n" } {
print "Name is:", $1
print "Address is:", $2
print "City and state are:", $3
print "#######"
}' addresses

Explicit input with getline

awk 'BEGIN {
# close() must be given exactly the same string that opened the pipe
cmd = "date \"+%F %T\""
cmd | getline current_time
close(cmd)
print "Report printed on " current_time
}'
Report printed on 2020-01-23 00:01:31
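
When a command prints several lines, getline is usually called in a loop and its return value tested (1 for a record, 0 at end of output, -1 on error). A minimal sketch; the command string is a hypothetical stand-in for a real one.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"      # hypothetical command, for illustration
    # Read every line the command prints; stop at end of output or on error.
    while ((cmd | getline line) > 0)
        n++
    close(cmd)                      # same string that opened the pipe
    print n " lines, last was " line
}')
echo "$out"
```

Testing for > 0 rather than for a nonzero value avoids an endless loop when getline returns -1.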

Getline summary

Variant                  Effect                          awk/gawk
getline                  Sets $0, NF, FNR, NR, and RT    awk
getline var              Sets var, FNR, NR, and RT       awk
getline < file           Sets $0, NF, and RT             awk
getline var < file       Sets var and RT                 awk
command | getline        Sets $0, NF, NR, and RT         awk
command | getline var    Sets var, NR, and RT            awk
command |& getline       Sets $0, NF, NR, and RT         gawk
command |& getline var   Sets var, NR, and RT            gawk
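
The "getline var < file" row can be checked with a short sketch: reading a side file into a variable leaves $0, NF, and NR of the main input alone. The temporary file and its contents are assumptions for illustration.

```shell
tmp=$(mktemp)
printf 'x\ny\n' > "$tmp"
# getline var < file sets only var; the main record and NR are untouched.
out=$(echo 'main record' | awk -v f="$tmp" '{
    while ((getline line < f) > 0)
        n++
    close(f)
    print NR, NF, n, $0
}')
rm -f "$tmp"
echo "$out"
```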

Input with a timeout

PROCINFO The elements of this array provide access to information about the running AWK program.

PROCINFO["input_name", "READ_TIMEOUT"] = timeout in milliseconds

awk 'BEGIN {
PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000
while ((getline < "/dev/stdin") > 0)
print $0
}'

awk 'BEGIN { PROCINFO["-", "READ_TIMEOUT"] = 5000 }
{ print "You entered: " $0 }'