Regular Expressions

Use regular expressions to match records or fields in patterns and in string functions.

/regexp/

awk '/li/ { print $2 }' mail-list

exp ~ /regexp/

awk '$1 ~ /^J/' inventory-shipped

exp !~ /regexp/

awk '$1 !~ /^J/' inventory-shipped
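
As a rough sketch of the two matching operators side by side, the snippet below runs both against a small inline sample. The data and the variable names (data, starts_with_j, not_j) are made up for illustration and are not the contents of inventory-shipped.

```shell
# Hypothetical sample data: a month name and a quantity per line.
data='Jan 13
Feb 15
Jun 31
Mar 24'

# Keep lines whose first field starts with "J".
starts_with_j=$(printf '%s\n' "$data" | awk '$1 ~ /^J/')

# Keep lines whose first field does not start with "J".
not_j=$(printf '%s\n' "$data" | awk '$1 !~ /^J/')

printf 'match:\n%s\nno match:\n%s\n' "$starts_with_j" "$not_j"
```

With this sample, the first command keeps the Jan and Jun lines and the second keeps the Feb and Mar lines.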

sub(/regexp/, replacement)

echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
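
The sub() call above replaces only the first match, and that match is the leftmost, longest one. A minimal sketch contrasting it with gsub(), which replaces every match and returns how many it replaced, might look like this (the input string and variable names are illustrative):

```shell
# sub() replaces only the first (leftmost, longest) match in $0.
first_only=$(echo 'aaaabcdaa' | awk '{ sub(/a+/, "<A>"); print }')

# gsub() replaces every non-overlapping match and returns the count.
all_matches=$(echo 'aaaabcdaa' | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }')

echo "$first_only"     # <A>bcdaa
echo "$all_matches"    # 2 <A>bcd<A>
```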

Dynamic regexps

ls -l | awk 'BEGIN { digits_regexp = "[[:digit:]]+" } $5 ~ digits_regexp { print }'
ls -l | awk 'BEGIN { alpha_regexp = "[[:alpha:]]+" } $4 ~ alpha_regexp { print }'
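
Because a dynamic regexp is just a string, it can also come from outside the program or be built by concatenation. The sketch below assumes a pattern passed in with -v; the names re, prefix, pattern, and the sample data are hypothetical. Note that -v processes escape sequences in the value, so patterns containing backslashes need extra care; the ones here avoid them.

```shell
# Hypothetical sample data: a month name and a quantity per line.
data='Jan 13
Feb 15
Jun 31
Mar 24'

# The regexp is supplied from the shell as an ordinary string.
pattern='a[nr]$'
from_shell=$(printf '%s\n' "$data" | awk -v re="$pattern" '$1 ~ re')

# The regexp is assembled inside awk by string concatenation.
built=$(printf '%s\n' "$data" | awk -v prefix="J" 'BEGIN { re = "^" prefix } $1 ~ re')

printf '%s\n---\n%s\n' "$from_shell" "$built"
```

Here the first command keeps the Jan and Mar lines, and the second keeps the Jan and Jun lines.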

POSIX character classes

Class       Description

[:alnum:]   Alphanumeric characters
[:alpha:]   Alphabetic characters
[:blank:]   Space and TAB characters
[:cntrl:]   Control characters
[:digit:]   Numeric characters
[:graph:]   Characters that are both printable and visible [1]
[:lower:]   Lowercase alphabetic characters
[:print:]   Printable characters [2]
[:punct:]   Punctuation characters [3]
[:space:]   Space characters [4]
[:upper:]   Uppercase alphabetic characters
[:xdigit:]  Characters that are hexadecimal digits
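
A character class is only valid inside a bracket expression, so it is written [[:digit:]] rather than [:digit:], and it can be combined with other characters as in [[:alnum:]_]. The following is a small sketch, assuming an awk that supports POSIX character classes (some older implementations do not); the sample words and the name classified are illustrative.

```shell
# Classify each input line by the character classes it consists of.
classified=$(printf '%s\n' 'abc' '123' 'a1_b' 'x y' | awk '
    /^[[:digit:]]+$/  { print $0 ": digits only"; next }
    /^[[:alpha:]]+$/  { print $0 ": letters only"; next }
    /^[[:alnum:]_]+$/ { print $0 ": letters, digits, or underscore"; next }
                      { print $0 ": other" }
')

echo "$classified"
```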

Ignore case

IGNORECASE = 0 | 1   (gawk extension; any nonzero value makes regexp matching case-insensitive)

echo -e "ab\ncd" | awk 'BEGIN { IGNORECASE = 1 } /A/ { print }'
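
IGNORECASE is specific to gawk, so other awk implementations treat it as an ordinary variable and the match stays case-sensitive. One portable alternative, sketched here with illustrative input, is to lowercase the text with tolower() before matching against a lowercase regexp:

```shell
# Case-insensitive match without IGNORECASE: normalize the record first.
portable=$(printf 'ab\ncd\nAB\n' | awk 'tolower($0) ~ /a/')

echo "$portable"
```

This keeps the lines ab and AB. The original record is printed unchanged because tolower() returns a copy.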

Footnotes

[1] A space is printable but not visible, whereas an ‘a’ is both.
[2] Characters that are not control characters.
[3] Characters that are not letters, digits, control characters, or space characters.
[4] Such as space, TAB, and formfeed, to name a few.