Arrays in awk

The Basics of Arrays

Introduction to Arrays

Arrays in awk are associative: each array is a collection of pairs, an index and its corresponding array element value. Indices are always stored as strings, so numeric indices are converted to strings automatically, and the pairs are kept in no particular order.
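
Because every index is stored as a string, a numeric index and the equivalent string index name the same element. A minimal sketch of this (the array name a, the counter n, and the shell variable result are illustrative only):

```shell
result=$(awk 'BEGIN {
    a[1] = "one"            # numeric index 1 is stored as the string "1"
    a["1"] = "uno"          # same element, so this overwrites "one"
    a["01"] = "zero-one"    # a different string, hence a different element
    n = 0
    for (k in a)
        n++                 # count the elements actually stored
    print a[1], n
}')
echo "$result"
```

This should print "uno 2": the first two assignments hit one element, and "01" creates a second.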

Referring to an Array Element

array[index-expression]

To determine whether an element exists:

indx in array

awk 'BEGIN {
    tab["dog"] = "chien"
    tab["cat"] = "chat"
    tab["one"] = "un"
    # Test for existence inside BEGIN so that awk does not wait for input.
    if ("one" in tab)
        print tab["one"]
}'
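
The in test only looks an index up, whereas merely referring to an element that does not exist creates it with a null value. A small sketch of the difference (the index "x" and the variables before, after, dummy, and result are made up for illustration):

```shell
result=$(awk 'BEGIN {
    before = ("x" in tab) ? "yes" : "no"   # the in test does not create tab["x"]
    dummy = tab["x"]                       # a plain reference does create it
    after = ("x" in tab) ? "yes" : "no"
    print "before: " before
    print "after: " after
}')
echo "$result"
```

This is why existence checks are better written with in than by comparing array[index] with the null string.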

Assigning Array Elements

array[index-expression] = value

Basic Array Example

cat tab_ex
5 I am the Five man
2 Who are you? The new number two!
4 . . . And four on the floor
1 Who is number one?
3 I three you.

awk '{
    if ($1 > max)
        max = $1
    arr[$1] = $0
}
END {
    for (x = 1; x <= max; x++)
        if (x in arr)
            print arr[x]
}' tab_ex
1 Who is number one?
2 Who are you? The new number two!
3 I three you.
4 . . . And four on the floor
5 I am the Five man

Scanning All Elements of an Array

for (var in array)
body
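
A longer example counts journal entries per program name, assumed here to be field 5 of the default journalctl output, and prints the counts sorted numerically:

sudo journalctl | awk '
NF >= 5 {
    name = $5
    sub(/(\[[^]]*\])?:?$/, "", name)    # strip a trailing "[pid]" and ":"
    counter[name]++                     # an unset element starts at zero
    if (max < length(name))
        max = length(name)              # widest name, for column alignment
}
END {
    command = "sort -nk 2"
    for (i in counter)
        printf "%-" max "s %d\n", i, counter[i] | command
    close(command)
}'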

awk '{
    for (i = 1; i <= NF; i++)
        used[$i] = 1
}
END {
    for (x in used) {
        if (length(x) > 4) {
            ++num_long_words
            print x
        }
    }
    print num_long_words, "words longer than 4 characters"
}' tab_ex

Using Predefined Array Scanning Orders with gawk

In gawk, assigning one of the following strings to PROCINFO["sorted_in"] controls the order in which for (index in array) visits the elements:

"@unsorted"       Arbitrary order, which is the default awk behavior.
"@ind_str_asc"    Ascending order of indices, compared as strings.
"@ind_num_asc"    Ascending order of indices, with the indices forced to be treated as numbers.
"@val_type_asc"   Ascending order of element values, by the type assigned to the element.
"@val_str_asc"    Ascending order of element values. Scalar values are compared as strings.
"@val_num_asc"    Ascending order of element values. Scalar values are compared as numbers.
"@ind_str_desc"   Like "@ind_str_asc", but ordered from high to low.
"@ind_num_desc"   Like "@ind_num_asc", but ordered from high to low.
"@val_type_desc"  Like "@val_type_asc", but ordered from high to low.
"@val_str_desc"   Like "@val_str_asc", but ordered from high to low.
"@val_num_desc"   Like "@val_num_asc", but ordered from high to low.

awk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }
{ a[$1] = $1 }
END { for (i in a) print i }' mail-list

awk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_desc" }
{ a[$1] = $1 }
END { for (i in a) print i }' mail-list

Using Numbers to Subscript Arrays

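Array subscripts are always strings. A numeric subscript with an integer value is converted as an integer; any other numeric value is converted using the predefined variable CONVFMT, so changing CONVFMT can change which element a given number refers to.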

awk 'BEGIN {
    xyz = 12.153
    data[xyz] = 1
    if (xyz in data)
        printf "%s is in data\n", xyz
    else
        printf "%s is not in data\n", xyz
}'
12.153 is in data

awk 'BEGIN {
    xyz = 12.153
    data[xyz] = 1
    CONVFMT = "%2.2f"
    if (xyz in data)
        printf "%s is in data\n", xyz
    else
        printf "%s is not in data\n", xyz
}'
12.15 is not in data
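
Only non-integer values go through CONVFMT; a number with an integer value is always converted as an integer. A small sketch to check this (the variables n, x, r1, r2, r3, and result are illustrative and not part of the examples above):

```shell
result=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    n = 12
    x = 12.153
    data[n] = 1     # integer value: the subscript is "12", CONVFMT is not used
    data[x] = 1     # non-integer value: the subscript becomes "12.15"
    r1 = ("12" in data) ? "yes" : "no"
    r2 = ("12.15" in data) ? "yes" : "no"
    r3 = ("12.153" in data) ? "yes" : "no"
    print r1, r2, r3
}')
echo "$result"
```

This should print "yes yes no": the integer subscript is stored as "12", while 12.153 is stored under the rounded string "12.15".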

Using Uninitialized Variables as Subscripts

echo 'line 1
line 2
line 3' | awk '{ l[lines] = $0; ++lines }
END {
    for (i = lines - 1; i >= 0; i--)
        print l[i]
}'
line 3
line 2

The first input line is missing from the output. When the first record is read, lines is still uninitialized, so as a subscript it has the null string as its value and "line 1" is stored in l[""] rather than in l[0]. Using lines++ as the subscript forces a numeric value and fixes the problem:

echo 'line 1
line 2
line 3' | awk '{ l[lines++] = $0 }
END {
    for (i = lines - 1; i >= 0; i--)
        print l[i]
}'
line 3
line 2
line 1

The delete Statement

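delete array[index-expression]

removes a single element. Once it is deleted, the element no longer exists, and an in test for that index is false.

delete array

removes every element of the array in one statement.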
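
A short sketch of both forms of delete (the array freq and the variables has_b, n, and result are hypothetical names chosen for the example):

```shell
result=$(awk 'BEGIN {
    freq["a"] = 1; freq["b"] = 2; freq["c"] = 3
    delete freq["b"]                      # remove one element
    has_b = ("b" in freq) ? "yes" : "no"
    delete freq                           # remove every element
    n = 0
    for (k in freq)
        n++                               # nothing is left to visit
    print has_b, n
}')
echo "$result"
```

This should print "no 0": the deleted element fails the in test, and the array is empty after the second delete.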

Multidimensional Arrays

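awk supports multidimensional arrays by concatenating the indices into a single string. Each index is converted to a string, and the strings are joined with the value of the built-in variable SUBSEP as the separator (by default the character "\034"). A reference such as array[i, j] therefore uses the single subscript i SUBSEP j.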
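
The concatenation can be observed directly by building the combined subscript by hand. A minimal sketch (grid, key, same, pair, and result are illustrative names):

```shell
result=$(awk 'BEGIN {
    grid[1, 2] = "x"                         # stored under the subscript 1 SUBSEP 2
    key = 1 SUBSEP 2                         # build the same string by hand
    same = (key in grid) ? "yes" : "no"      # the hand-built key finds the element
    pair = ((1, 2) in grid) ? "yes" : "no"   # multidimensional form of the in test
    print same, pair, length(SUBSEP)
}')
echo "$result"
```

With the default SUBSEP this should print "yes yes 1", the last field being the length of the separator.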

Scanning Multidimensional Arrays

for (combined in array) {
    split(combined, separate, SUBSEP)
}
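
As a worked sketch of this loop, the following sums each row of a small table by splitting the combined subscript back into row and column. The input data and the names cell, rowsum, col, and result are assumptions made for the example:

```shell
result=$(printf '1 2 3\n4 5 6\n' | awk '{
    for (col = 1; col <= NF; col++)
        cell[NR, col] = $col                 # index each field by (row, column)
}
END {
    for (combined in cell) {
        split(combined, separate, SUBSEP)    # separate[1] = row, separate[2] = column
        rowsum[separate[1]] += cell[combined]
    }
    print rowsum[1], rowsum[2]
}')
echo "$result"
```

The traversal order of for (combined in cell) is arbitrary, so the sketch accumulates sums rather than printing inside the loop; it should print "6 15".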

Arrays of Arrays

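gawk, unlike POSIX awk, supports true arrays of arrays, where an element can itself be an array. The gawk function isarray() tests whether an element is a subarray:

awk 'BEGIN {
    a[1][2] = "a b c d"
    for (i in a)
        if (isarray(a[i]))
            for (j in a[i])
                print a[i][j]
        else
            print a[i]
}'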
a b c d