Reading Input Files
Record splitting with standard awk
RS The record separator, by default a newline.
awk 'BEGIN { RS = ":" } { print $0 }' /etc/passwd
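A minimal sketch of the same idea on inline data, so it does not depend on the contents of /etc/passwd. The sample string and the `records` variable are made up for illustration. `printf` is used without a trailing newline so that the last record does not carry an embedded newline.

```shell
# Split one line into records on ":" and number them with NR.
records=$(printf 'alice:x:1000' | awk 'BEGIN { RS = ":" } { print NR ": " $0 }')
printf '%s\n' "$records"
# 1: alice
# 2: x
# 3: 1000
```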
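Record splitting with gawk
RT The input text that matched RS for the current record. gawk sets it each time a record is read.
If RS is a single character, RT contains the same single character.
If RS is a regular expression, RT contains the actual input text that matched the regular expression.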
echo record 1 AAAA record 2 BBBB record 3 |
gawk 'BEGIN { RS = "\n|( *[[:upper:]]+ *)" }
{ print "Record =", $0, "and RT = [" RT "]" }'
Record = record 1 and RT = [ AAAA ]
Record = record 2 and RT = [ BBBB ]
Record = record 3 and RT = [
]
Fields
NF The number of fields in the current input record.
The input record is automatically separated into chunks called fields.
Field | Meaning |
---|---|
$0 | The whole record (all fields). |
$1 | First field. |
$NF | Last field. |
awk '/li/ { print $1, $NF }' mail-list
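The mail-list file is not included here, so the following sketch uses made-up inline data to show how NF, $1, and $NF behave when lines have different numbers of fields. The `fields` variable is only there to capture the output.

```shell
# Print the field count, the first field, and the last field of each line.
fields=$(printf 'alice 30 paris\nbob 25\n' | awk '{ print NF, $1, $NF }')
printf '%s\n' "$fields"
# 3 alice paris
# 2 bob 25
```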
Contents of a field
Change the contents of a field.
awk '{ nboxes = $3 ; $3 = $3 - 10 ; print nboxes, $3 }' inventory-shipped
Create a new field.
awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
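Assigning to a field makes awk rebuild $0, joining the fields with OFS. A small sketch on made-up input (the variable names `changed` and `extended` are illustrative only) makes the effect visible, including what happens when a field beyond NF is created.

```shell
# Changing $2 rebuilds $0 using OFS as the separator.
changed=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $2 = "X"; print $0 }')
printf '%s\n' "$changed"
# a-X-c

# Assigning $4 on a two-field record raises NF to 4; $3 is created empty.
extended=$(echo 'a b' | awk '{ $4 = "d"; print NF ": " $0 }')
printf '%s\n' "$extended"
# 4: a b  d
```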
How fields are separated
FS The input field separator, a space by default.
Single character
awk 'BEGIN { FS = ":" ; OFS = ":" } $3 > 999 { print $1, $6, $7 }' /etc/passwd
Regexp
echo ' a b c d e ' | awk 'BEGIN { FS = "[ \t\n]+" } {print $2}'
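The regexp example prints a rather than b because a leading match of the separator delimits an empty first field. The following sketch contrasts the default FS with a regexp FS on the same made-up input; the variable names are illustrative.

```shell
# Default FS: leading whitespace is ignored, so $1 is "a" and NF is 3.
default_split=$(echo ' a b c' | awk '{ print NF ": " $1 }')
printf '%s\n' "$default_split"
# 3: a

# Regexp FS: the leading space delimits an empty $1, so "a" is $2 and NF is 4.
regexp_split=$(echo ' a b c' | awk 'BEGIN { FS = "[ ]+" } { print NF ": " $2 }')
printf '%s\n' "$regexp_split"
# 4: a
```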
Each character a separate field
echo ab cd | awk 'BEGIN { FS = "" }
{ for (i = 1; i <= NF; i++) print "Field", i, "is", $i }'
FS from the command line
awk -F: '$5 == ""' /etc/passwd
# same as
awk 'BEGIN { FS = ":" } $5 == "" { print }' /etc/passwd
Field-splitting summary
- FS == " ": Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
- FS == any other single character: Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences. The character can even be a regexp metacharacter; it does not need to be escaped.
- FS == regexp: Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.
- FS == "": Each individual character in the record becomes a separate field. (This is a common extension; it is not specified by the POSIX standard.)
- FIELDWIDTHS == list of columns: Fields are split based on character position (gawk only).
- FPAT == regexp: Fields are the text that matches regexp, rather than the text between separators (gawk only).
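Two of the single-character rules above can be checked quickly on made-up input: successive and trailing separators delimit empty fields, and a regexp metacharacter such as | works as a literal separator. The variable names are illustrative.

```shell
# "a,,b," has four fields: "a", "", "b", and a trailing "".
comma_count=$(echo 'a,,b,' | awk -F, '{ print NF }')
printf '%s\n' "$comma_count"
# 4

# A single "|" is taken literally as the separator, not as regexp alternation.
pipe_field=$(echo 'a|b|c' | awk -F'|' '{ print $2 }')
printf '%s\n' "$pipe_field"
# b
```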
Record-splitting summary
- RS == "\n": Records are separated by the newline character (\n). In effect, every line in the datafile is a separate record, including blank lines. This is the default.
- RS == any single character: Records are separated by each occurrence of the character. Multiple successive occurrences delimit empty records.
- RS == "": Records are separated by runs of blank lines. When FS is a single character, then the newline character always serves as a field separator, in addition to whatever value FS may have. Leading and trailing newlines in a file are ignored.
- RS == regexp: Records are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty records. (This is a gawk extension; it is not specified by the POSIX standard.)
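The RS == "" case is the least obvious of these, so here is a small sketch on made-up input with leading, repeated, and trailing blank lines. It keeps the default FS, under which a newline inside a record counts as whitespace. The `paragraphs` variable is illustrative.

```shell
# Leading and trailing blank lines are ignored; runs of blank lines
# separate records. The first record spans two lines and has three fields.
paragraphs=$(printf '\n\na b\nc\n\n\nd\n\n' |
    awk 'BEGIN { RS = "" } { print NR ": NF=" NF }')
printf '%s\n' "$paragraphs"
# 1: NF=3
# 2: NF=1
```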
Multiple-line records
cat addresses
Jane Doe
123 Main Street
Anywhere, SE 12345-6789

John Smith
456 Tree-lined Avenue
Smallville, MW 98765-4321
awk 'BEGIN { RS = "" ; FS = "\n" } {
print "Name is:", $1
print "Address is:", $2
print "City and state are:", $3
print "#######"
}' addresses
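A related sketch, assuming the goal is to flatten each multi-line record onto one line: with RS = "" and FS = "\n", assigning $1 = $1 forces awk to rebuild the record with OFS between the lines. The sample data and the " | " separator are made up for illustration.

```shell
# Join the lines of each blank-line-separated record with " | ".
joined=$(printf 'Jane Doe\n123 Main Street\n\nJohn Smith\n456 Tree-lined Avenue\n' |
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = " | " } { $1 = $1; print }')
printf '%s\n' "$joined"
# Jane Doe | 123 Main Street
# John Smith | 456 Tree-lined Avenue
```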
Explicit input with getline
awk 'BEGIN {
    cmd = "date \"+%F %T\""
    cmd | getline current_time
    # close() must be given the exact command string that was opened
    close(cmd)
    print "Report printed on " current_time
}'
Report printed on 2020-01-23 00:01:31
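Since the date output changes on every run, a deterministic variant may be easier to experiment with. This sketch uses `echo hello` as a stand-in command (an arbitrary choice) and checks the return value of getline, which is greater than zero when a record was read.

```shell
# Read one line from a command, keep the command string in a variable so
# that the same string can be passed to close().
greeting=$(awk 'BEGIN {
    cmd = "echo hello"
    if ((cmd | getline line) > 0)
        print "got: " line
    close(cmd)
}')
printf '%s\n' "$greeting"
# got: hello
```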
Getline summary
Variant | Effect | awk/gawk |
---|---|---|
getline | Sets $0, NF, FNR, NR, and RT | awk |
getline var | Sets var, FNR, NR, and RT | awk |
getline < file | Sets $0, NF, and RT | awk |
getline var < file | Sets var and RT | awk |
command \| getline | Sets $0, NF, and RT | awk |
command \| getline var | Sets var and RT | awk |
command \|& getline | Sets $0, NF, and RT | gawk |
command \|& getline var | Sets var and RT | gawk |
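To illustrate one row of the table, the sketch below reads a side file with `getline var < file` while processing a record from standard input. The temporary file, its contents, and the variable names are assumptions made for the example. The main record and NR are left untouched, as the table indicates.

```shell
# Create a throwaway two-line file to read with getline.
tmp=$(mktemp)
printf 'x\ny\n' > "$tmp"

# For the single main record, count the lines of the side file.
# $0 and NR still describe the main input afterwards.
side_read=$(printf 'main\n' | awk -v f="$tmp" '{
    while ((getline line < f) > 0)
        n++
    close(f)
    print $0, NR, n
}')
rm -f "$tmp"
printf '%s\n' "$side_read"
# main 1 2
```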
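Input with a timeout
PROCINFO The elements of this array provide access to information about the running AWK program (gawk only).
PROCINFO["input_name", "READ_TIMEOUT"] = timeout in milliseconds
When set, gawk times out and returns failure if no data is available to read from input_name within the timeout period.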
gawk 'BEGIN { PROCINFO["/dev/stdin", "READ_TIMEOUT"] = 5000
    while ((getline < "/dev/stdin") > 0)
        print $0 }'
gawk 'BEGIN { PROCINFO["-", "READ_TIMEOUT"] = 5000 }
{ print "You entered: " $0 }'