CIS 218 – Advanced UNIX
(g)awk
CIS 218 Advanced UNIX 1
Overview
• awk is a programming language
• awk uses syntax based on grep and sed for
handling numbers and text
• awk provides field-level addressability,
and can address text within a field (word)
using substring functions
• awk works field by field
awk command syntax
• There are two ways to execute an awk
program/script:
– awk [-F field-separator] 'program' target-file
– awk [-F field-separator] -f program.file target
• From our discussion of sed, and
Refrigerator Rule No. 5, I would hope you
are firmly committed to the second form!
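The two forms can be compared side by side. This is a minimal sketch; the data file users.txt and the script file first_field.awk are hypothetical names created in a scratch directory just for the illustration.

```sh
# Hypothetical sample data and file names, created in a scratch directory.
tmp=$(mktemp -d)
printf 'root:0\nalice:1001\n' > "$tmp/users.txt"

# Form 1: the program is written inline, inside single quotes.
inline=$(awk -F : '{ print $1 }' "$tmp/users.txt")

# Form 2: the same program lives in its own file and is named with -f.
printf '{ print $1 }\n' > "$tmp/first_field.awk"
from_file=$(awk -F : -f "$tmp/first_field.awk" "$tmp/users.txt")

echo "$inline"
echo "$from_file"
rm -rf "$tmp"
```

Both forms run the same program, so both print the first field of each record.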
awk Variables
• There are a number of awk variables that
are very useful
– FS (The field separator, defaults to white space)
– OFS (Output field separator, can be critical)
– NR (Number of records, a sequential counter)
– NF (Number of fields in the current record)
– FILENAME (Name of the current target file)
awk Variables (cont.)
– $0 (The entire line as read from the target file)
– $n (Where n is the nth field in the record. This
is how we get field level addressability in awk)
• nawk, gawk, etc. give us more variables; the
two most significant are:
– ARGC (the count of the command line
arguments)
– ARGV (an array of the command line
arguments)
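A short sketch of these variables in use, assuming made-up colon-separated input. The name some-file is only a placeholder argument; it is never opened because the second program has only a BEGIN action.

```sh
# FS is set with -F, OFS in BEGIN; NR, NF and $n are filled in per record.
fields=$(printf 'root:x:0\nalice:x:1001\n' |
  awk -F : 'BEGIN { OFS = "-" } { print NR, NF, $1, $3 }')
echo "$fields"

# ARGV[0] is the command name, so one argument gives ARGC == 2.
args=$(awk 'BEGIN { print ARGC, ARGV[1] }' some-file)
echo "$args"
```

The commas in print are what cause OFS to appear between the values.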
Parts of a program
• All programs are composed of one or more
of the following three constructs:
– sequence (a series of instructions, one
following the next, executed sequentially)
– selection (the ability of the code to decide
which instructions to execute, conditional
execution)
– iteration (adding looping so that selected code
will be repeated over and over)
awk Program Format
• Awk programs are composed of
pattern {action} pairs (actions must be
enclosed in French braces {} )
– a pattern without a corresponding action takes
the default action, print $0
– an action without a corresponding pattern is
applied to every line
– each input line is submitted to every
pattern/action pair
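The three cases can be sketched in one small program, using made-up two-field input. Each record is offered to every pair in turn, so a single line can produce output from more than one of them.

```sh
out=$(printf 'apple 3\nberry 12\n' | awk '
  $2 > 10                                      # pattern only: default action, print $0
  { print "saw", $1 }                          # action only: runs for every line
  $1 == "apple" { print "apple count:", $2 }   # full pattern {action} pair
')
echo "$out"
```

For this input the program prints "saw apple", "apple count: 3", "berry 12" and "saw berry", in that order.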
awk Program Format (cont.)
• Placement of the open French brace is critical
– pattern {
      action 1
      action 2
  }
  both actions are executed for lines
  matching the pattern
– pattern
  {
      action 1
      action 2
  }
  lines matching the pattern are printed, and
  both actions are performed on every line!
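The effect of brace placement is easy to see by running the same action both ways. This sketch uses two made-up input lines, only one of which matches the pattern.

```sh
data='keep 1
drop 2'

# Brace on the pattern line: one pair, action limited to matching lines.
same_line=$(printf '%s\n' "$data" | awk '/keep/ {
  print "hit: " $1
}')

# Brace on the next line: the pattern stands alone (default print),
# and the action becomes a separate pair that runs for every line.
next_line=$(printf '%s\n' "$data" | awk '/keep/
{
  print "hit: " $1
}')

echo "$same_line"
echo "---"
echo "$next_line"
```

The first version prints only "hit: keep". The second prints "keep 1", "hit: keep" and "hit: drop".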
Patterns
• In an awk program, the pattern is the
selection tool that decides what actions are
applied to which lines.
• Patterns can be:
– relational expressions
– regular expressions
– magic patterns
Relational Expression patterns
Symbol  Meaning                     Symbol  Meaning
<       Less than
<=      Less than or equal to
==      Equal to                    ~       contains RE
>       Greater than                !~      doesn't contain RE
>=      Greater than or equal to    &&      logical and
!=      not equal to                ||      logical or
Regular Expression patterns
• Must be enclosed in slashes /RE/
• Anchors apply to the entire line if they are
used as the only pattern
• Remember, you can use regular expressions
in relational patterns with ~ and !~ to apply
them to fields
• Both true regular expressions and fixed
patterns can be used as REs in awk
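A minimal sketch of relational and regular expression patterns side by side. The three-field records (name, number, role) are made up for the illustration.

```sh
data='alice 42 admin
bob 7 guest
carol 19 admin'

# Relational pattern combined with a logical and.
admins=$(printf '%s\n' "$data" | awk '$2 >= 19 && $3 == "admin" { print $1 }')

# RE as the only pattern: the ^ anchor applies to the whole line.
a_lines=$(printf '%s\n' "$data" | awk '/^a/ { print $1 }')

# RE applied to a single field with !~ : the anchors now apply to $3.
others=$(printf '%s\n' "$data" | awk '$3 !~ /^admin$/ { print $1 }')

echo "$admins"
echo "$a_lines"
echo "$others"
```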
Pre/Post Processing
• There are two magic patterns in awk:
– BEGIN {the action associated is performed before the
target file is opened}
– END {the action associated is performed after the target
file is successfully closed}
• Both are coded in UPPER CASE
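A typical use is to set things up in BEGIN, accumulate in the main action, and report in END. This is a sketch with made-up numeric input; the variable name total is arbitrary.

```sh
out=$(printf '10\n20\n30\n' | awk '
  BEGIN { total = 0; print "start" }               # runs before any input is read
        { total += $1 }                            # runs once per input line
  END   { print "lines:", NR, "total:", total }    # runs after the last line
')
echo "$out"
```

This prints "start" followed by "lines: 3 total: 60".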
# comments
• Like most scripting languages # indicates a
comment
• awk scripts should be well documented
• Comments should explain what you are
doing and why.
print
• The print command is the simple output
tool for awk, basically an "echo"
• You can direct print to send its data to a file
with the > operator
• Generally print is used for simple output or
debugging output
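A small sketch of both uses: one print redirected to a file and one left on standard output as a debugging trace. The file name names.txt and the variable name names are hypothetical.

```sh
tmp=$(mktemp -d)

debug=$(printf 'alice 3\nbob 5\n' |
  awk -v names="$tmp/names.txt" '{
    print $1 > names           # first field goes to the file
    print "debug:", NR, $0     # simple trace on standard output
  }')

saved=$(cat "$tmp/names.txt")
echo "$debug"
echo "$saved"
rm -rf "$tmp"
```

Within one awk run, > empties the file when it is first opened and later prints to the same name are added after it, so the file ends up holding both names.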
printf
• Similar in concept to the “C” language command.
The format of a printf command is:
printf ("formatting string", variables)
• The formatting characters correspond to the
variables one for one in both lists.
• Each formatting character is prefixed by %
printf (cont.)
• The formatting specifiers contain the
following characters:
– - indicates that the data should be left justified
– n indicates the minimum width of the field
– .n indicates the maximum width of a string field
(for floating point, the digits after the decimal point)
"%-5s"
indicates a string field, left justified, with a
minimum width of 5 bytes
printf formatting characters
Format  Meaning                  Format  Meaning
%c      single ASCII character   %G      shortest of %E or %f
%d      decimal integer          %i      decimal integer
%e      scientific notation      %o      octal number
%E      SCIENTIFIC NOTATION      %s      string
%f      floating point           %x      hexadecimal (lc)
%g      shortest of %f or %e     %X      HEXADECIMAL
printf spacing characters
• There are two characters available to
change the spacing of your text:
– \n inserts a newline character. You must use
this if you want your output to occur on
successive lines.
– \t inserts a tab character
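The pieces of a printf call can be sketched together: justification, width, precision, a few conversion characters, and the \n and \t spacing characters. The input record is made up.

```sh
# One specifier per variable: left-justified string, integer, floating point.
row=$(printf 'widget 4 2.5\n' |
  awk '{ printf("%-8s|%3d|%6.2f\n", $1, $2, $3) }')
echo "$row"

# .3 truncates the string; %x and %o change the number base; \t inserts tabs.
misc=$(awk 'BEGIN { printf("%.3s\t%x\t%o\n", "widget", 255, 8) }')
echo "$misc"
```

The first call prints "widget  |  4|  2.50". The second prints wid, ff and 10 separated by tabs.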
getline
• getline is used to read from the keyboard
• It can also capture the results of a command
but this form is seldom used
• Read from the keyboard using
getline variable

A > B      A is greater than B
A >= B     A is greater than or equal to B
A != B     A is not equal to B
A ~ /RE/   A contains the regular expression RE
if
• A sample if
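One possible sample, assuming made-up scores and a hypothetical passing mark of 60:

```sh
out=$(printf '45\n82\n' | awk '{
  if ($1 >= 60) {
    print $1, "pass"    # condition true
  } else {
    print $1, "fail"    # condition false
  }
}')
echo "$out"
```

This prints "45 fail" and then "82 pass".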
exit
• The input file is closed
• Control is transferred to the action
associated with the END magic pattern if
there is one
• Generally used as a bailout in case of
catastrophic errors
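A sketch of exit as a bailout, assuming the input should contain only whole numbers. The exit status of 1 is an arbitrary choice for the illustration.

```sh
status=0
out=$(printf '1\n2\nbad\n4\n' | awk '
  $1 !~ /^[0-9]+$/ { print "bad record at line " NR; exit 1 }
  { sum += $1 }
  END { print "sum before exit:", sum }    # still runs after exit
') || status=$?

echo "$out"
echo "exit status: $status"
```

The fourth line is never read: awk stops at the bad record, runs the END action, and returns status 1 to the shell.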
for loop
• This is a counted loop
• executes until the counter reaches the target
value
• Increment (count up) or decrement (count
down)
• also works with the elements of an array
• multiple verbs must be enclosed in { }
for loop example
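A minimal sketch of the three uses: counting up, counting down, and walking the elements of an array. The input words and the array name seen are made up.

```sh
# Count up over the fields of each record.
up=$(printf 'red green blue\n' | awk '{
  for (i = 1; i <= NF; i++) {
    print i, $i
  }
}')

# Count down; a single verb needs no braces.
down=$(awk 'BEGIN { for (i = 3; i >= 1; i--) printf("%d", i); printf("\n") }')

# Walk an array; the order of the elements is not specified.
distinct=$(printf 'a b a c\n' | awk '
  { for (i = 1; i <= NF; i++) seen[$i]++ }
  END { for (word in seen) n++; print n }
')

echo "$up"
echo "$down"
echo "$distinct"
```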
while loop
• The while loop is an example of conditional
execution
• The loop cycles as long as the condition
specified is true
• A while loop always checks its condition
before each pass, so the body may never execute
• multiple verbs must be enclosed in { }
while loop example
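One possible example: halving a number until it reaches 1, where the number of passes is not known in advance. The starting value of 100 is arbitrary.

```sh
steps=$(awk 'BEGIN {
  n = 100
  steps = 0
  while (n > 1) {        # checked before every pass
    n = int(n / 2)
    steps++
  }
  print steps
}')

# If the condition is false at the start, the body never runs.
skipped=$(awk 'BEGIN { while (0) print "never"; print "done" }')

echo "$steps"
echo "$skipped"
```

The first loop takes 6 passes (100, 50, 25, 12, 6, 3, 1). The second prints only "done".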
do/while
• Even though it has a while in it, this is an
example of until logic.
• Until logic is shunned by conscientious
coders.
• 'nuff said
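For reference, a short sketch of the difference: the do/while body runs once before the condition is ever tested, while the equivalent while loop does not run at all. The starting value of 10 is chosen so the condition is false from the beginning.

```sh
do_out=$(awk 'BEGIN {
  i = 10
  do {
    print "body ran with i =", i
    i++
  } while (i < 5)          # tested only after the first pass
}')

while_out=$(awk 'BEGIN { i = 10; while (i < 5) { print i; i++ } }')

echo "do/while: $do_out"
echo "while:    $while_out"
```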
break
• Used to exit from a loop
• Control is passed to the line following the
end of the loop
• Causes an exit from the loop but NOT the
awk script. If you want to bail out of the
whole script, use the exit command.
break example
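A sketch that stops scanning a record at the first field over 10. The threshold and input values are made up, and every record is assumed to contain at least one such field.

```sh
out=$(printf '3 8 15 4 22\n1 50\n' | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i > 10) break             # leave the loop, not the script
  }
  print "first value over 10:", $i # control resumes here
}')
echo "$out"
```

The second record is still processed, which shows that break ended only the loop.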
continue
• Causes awk to skip the rest of the body of
the loop for the current value
• In a for loop the counter is incremented, and
the next cycle of the loop is started
• In a while loop, the next iteration of the
loop starts
continue example
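A sketch that adds up only the non-negative fields of a record, using made-up values:

```sh
out=$(printf '4 -1 7 -3 2\n' | awk '{
  sum = 0
  for (i = 1; i <= NF; i++) {
    if ($i < 0) continue    # skip the rest of the body; i++ still happens
    sum += $i
  }
  print sum
}')
echo "$out"
```

The negative fields are skipped, so the sum is 4 + 7 + 2 = 13.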
next
• Stops processing the current record and starts
the script over at the first pattern/action pair
• takes the next record from standard input
or the target file
• Like exit, this command affects the whole
script
next example
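A common use is to skip lines that should not reach the later pattern/action pairs. This sketch assumes made-up input where comment lines start with # and blank lines carry no data.

```sh
out=$(printf '# header\nalice 3\n\nbob 5\n' | awk '
  /^#/ || NF == 0 { next }    # skip comments and blank lines
  { print NR ": " $1 }        # only reached by the remaining lines
')
echo "$out"
```

NR keeps counting the skipped lines, so the output is "2: alice" and "4: bob".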