awk(1)
awk [opt] program [input]
  -F <sepstr>   field separator string (can be regex)
  program       awk program
  input         file or stdin if no file given
Input processing
Input is processed in two stages:
- Splitting input into a sequence of records. By default input is split at the newline character, but this can be changed via the builtin RS variable.
- Splitting a record into fields. By default fields are strings without whitespace, but this can be changed via the builtin variable FS or the command line option -F.
Fields are accessed as follows:
- $0 whole record
- $1 field one
- $2 field two
- ...
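The two splitting stages can be tried out with a short sketch like the one below. The sample input and the shell variable names (out_default, out_custom) are made up for illustration; any POSIX awk should behave the same way.

```shell
# Default splitting: records are lines, fields are whitespace-separated words.
out_default=$(printf 'a b c\nd e f\n' | awk '{ print $2 }')
echo "$out_default"   # b, then e

# Custom splitting: records end at ";" (RS), fields are separated by "," (-F).
# The input has no trailing newline so the last field stays clean.
out_custom=$(printf 'a,b;c,d' | awk -F',' 'BEGIN { RS = ";" } { print $1 "-" $2 }')
echo "$out_custom"    # a-b, then c-d
```

RS is set in a BEGIN action here so that it is already in effect when the first record is read.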
Program
An awk program is composed of pairs of the form:
pattern { action }
The program is run against each record in the input stream. If a pattern matches a record, the corresponding action is executed and can access the fields.
 INPUT
   |
   v
 record ----> ∀ pattern matched
   |                  |
   v                  v
 fields ----> run associated action
Any valid awk expression can be a pattern.
An example is the regex pattern /abc/ { print $1 }, which prints the first field if the record matches the regex /abc/. This form is actually shorthand for $0 ~ /abc/ { print $1 }; see the regex comparison operator below.
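As a small sketch of the pattern forms described above, the following runs the shorthand regex pattern, its explicit $0 ~ form, and an arbitrary expression pattern against the same input. The input and the shell variable names are illustrative only.

```shell
input='abc 1
xyz 2
abc 3'

# Regex pattern shorthand: matched against the whole record.
short=$(echo "$input" | awk '/abc/ { print $2 }')

# Explicit form of the same pattern.
long=$(echo "$input" | awk '$0 ~ /abc/ { print $2 }')

# Any expression can be a pattern, here a numeric comparison on field two.
by_expr=$(echo "$input" | awk '$2 > 1 { print $1 }')

echo "$short"     # 1, then 3
echo "$long"      # 1, then 3
echo "$by_expr"   # xyz, then abc
```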
Special pattern
awk provides two special patterns, BEGIN and END, which can be used multiple times. Actions with those patterns are executed exactly once.
- BEGIN actions are run before processing the first record
- END actions are run after processing the last record
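A minimal sketch of BEGIN and END in use, with made-up input. It uses BEGIN twice to show that repeated special patterns are allowed and run in program order, and counts records in a variable n that exists only for this example.

```shell
out=$(printf '3\n4\n5\n' | awk '
  BEGIN { print "first BEGIN" }
  BEGIN { print "second BEGIN" }
        { n++ }                    # runs once per record
  END   { print n " records" }')
echo "$out"
# first BEGIN
# second BEGIN
# 3 records
```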
Special variables
- RS record separator: first char is the record separator, by default <newline>
- FS field separator: regex to split records into fields, by default <space>
- NR number record: number of current record
- NF number fields: number of fields in the current record
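The following sketch prints NR and NF for each record, plus the last field via $NF. The two-line input is invented for the example.

```shell
out=$(printf 'a b\nc d e\n' | awk '{ print NR ": " NF " fields, last=" $NF }')
echo "$out"
# 1: 2 fields, last=b
# 2: 3 fields, last=e
```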
Special statements & functions
- printf "fmt", args...
  Print format string, args are comma separated.
  - %s string
  - %d decimal
  - %x hex
  - %f float
  Width can be specified as %Ns, which reserves N chars for a string. For floats one can use %N.Mf, where N is the total width including the . and the M fractional digits.
- sprintf("fmt", expr, ...)
  Format the expressions according to the format string. Similar to printf, but this is a function and the return value can be assigned to a variable.
- strftime("fmt")
  Return the current time stamp formatted by fmt. This function is not part of POSIX awk, but is provided by gawk.
  - %Y full year (eg 2020)
  - %m month (01-12)
  - %d day (01-31)
  - %F alias for %Y-%m-%d
  - %H hour (00-23)
  - %M minute (00-59)
  - %S second (00-59)
  - %T alias for %H:%M:%S
- S ~ R, S !~ R
  The regex comparison operator, where the former returns true if the string S matches the regex R, and the latter is the negated form. The regex can be either a constant or dynamic regex.
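The sketch below exercises printf, sprintf and the regex comparison operator in turn. The shell variable names, the sample values and the zero-padded %03d conversion are additions for illustration (the latter is standard printf behaviour, not something listed above).

```shell
# printf: %5s reserves 5 chars, %6.2f is 6 chars wide in total with 2 fractional digits.
fmt_out=$(awk 'BEGIN { printf "[%5s] [%d] [%x] [%6.2f]\n", "ab", 7, 255, 3.14159 }')
echo "$fmt_out"   # [   ab] [7] [ff] [  3.14]

# sprintf: same formatting, but the result is returned and can be stored.
spf_out=$(awk 'BEGIN { s = sprintf("%s-%03d", "id", 7); print s, length(s) }')
echo "$spf_out"   # id-007 6

# ~ and !~ with a constant regex (/.../) and a dynamic regex (string variable re).
re_out=$(printf 'foo1\nbar2\n' | awk -v re='^foo' '
  $1 ~ /[0-9]$/ { printf "%s ends in a digit\n", $1 }
  $1 ~ re       { printf "%s matches %s\n", $1, re }
  $1 !~ re      { printf "%s does not match %s\n", $1, re }')
echo "$re_out"
```

A dynamic regex is just a string that is interpreted as a regex at match time, which makes it possible to pass the expression in from the shell with -v.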
Examples
Filter records
awk 'NR%2 == 0 { print $0 }' <file>
The pattern NR%2 == 0 matches every second record and the action { print $0 } prints the whole record.
Negative patterns
awk '!/^#/ { print $1 }' <file>
Matches records not starting with # and prints their first field.
Range patterns
echo -e "a\nFOO\nb\nc\nBAR\nd" | \
awk '/FOO/,/BAR/ { print }'
/FOO/,/BAR/ defines a range pattern of the form begin_pattern, end_pattern. When begin_pattern is matched the range is turned on, and when end_pattern is matched the range is turned off. This matches every record in the range, inclusive.
An exclusive range must be handled explicitly, for example as follows.
echo -e "a\nFOO\nb\nc\nBAR\nd" | \
awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'
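An alternative way to get an exclusive range, sketched here with a plain flag variable instead of a range pattern. The name inside is arbitrary, and this is one possible approach rather than the canonical one. The rule order matters: the flag is cleared before the print rule and set after it, so neither boundary record is printed.

```shell
out=$(printf 'a\nFOO\nb\nc\nBAR\nd\n' | awk '
  /BAR/  { inside = 0 }   # leave the range before the print rule sees BAR
  inside { print }        # print only records strictly between the markers
  /FOO/  { inside = 1 }   # enter the range after the print rule has seen FOO
')
echo "$out"   # b, then c
```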
Access last fields in records
echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
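Access the last fields with arithmetic on NF, the number-of-fields variable. The output here is fe, because print concatenates expressions that are not separated by a comma.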
Split on multiple tokens
echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
Use regex as field separator.
Capture in variables
# /proc/<pid>/status
# Name: cat
# ...
# VmRSS: 516 kB
# ...
for f in /proc/*/status; do
cat $f | awk '
/^VmRSS/ { rss = $2/1024 }
/^Name/ { name = $2 }
END { printf "%16s %6d MB\n", name, rss }';
done | sort -k2 -n
We capture values from VmRSS and Name into variables and print them at the END once processing all records is done.
Capture in array
echo 'a 10
b 2
b 4
a 1' | awk '{
vals[$1] += $2
cnts[$1] += 1
}
END {
for (v in vals)
printf "%s %d\n", v, vals[v] / cnts [v]
}'
Capture keys and values from different columns and sum up the values. At the END we compute the average of each key.
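Two details of the example above are worth a small sketch: the iteration order of for (v in vals) is unspecified in awk, and %d truncates a fractional average. The variant below is only an illustration of one way to handle both: it pipes the result through sort for a stable order and uses %.1f to keep one fractional digit.

```shell
out=$(printf 'a 10\nb 2\nb 4\na 1\n' | awk '
  { vals[$1] += $2; cnts[$1]++ }
  END {
    for (k in vals)
      printf "%s %.1f\n", k, vals[k] / cnts[k]
  }' | sort)
echo "$out"
# a 5.5
# b 3.0
```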
Run shell command and capture output
cat /proc/1/status | awk '
/^Pid/ {
"ps --no-header -o user " $2 | getline user;
print user
}'
We build a ps command line, capture the first line of the command's output in the user variable, and then print it.
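When a command is built and run once per record, it is generally a good idea to check the return value of getline and to close the command afterwards, so that the pipe is released and the command can be run again. A minimal sketch, with echo and tr standing in for a real command and with cmd and upper as made-up variable names:

```shell
out=$(printf 'x\ny\n' | awk '{
  cmd = "echo " $1 " | tr a-z A-Z"   # command line built from the record
  if ((cmd | getline upper) > 0)     # getline returns 1 when a line was read
    print $1, upper
  close(cmd)                         # release the pipe
}')
echo "$out"
# x X
# y Y
```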