									                awk, A Unix Power Tool




ITSW 1407/KRF




awk
- A programming language for handling common data manipulation
  tasks with only a few lines of code
- awk is a pattern-action language
- The language looks a little like C but automatically handles
  input, field splitting, initialization, and memory management
  - Built-in string and number data types
  - No variable type declarations
- awk is a great prototyping language
  - Start with a few lines and keep adding until it does what
    you want

History
- Originally designed and implemented in 1977 by Al Aho,
  Peter Weinberger, and Brian Kernighan
  - In part as an experiment to see how grep and sed could be
    generalized to deal with numbers as well as text
  - Originally intended for very short programs
  - But people started using it and the programs kept getting
    bigger and bigger!
- In 1985, new awk, or nawk, was written to add enhancements
  to facilitate larger program development
  - The major new feature is user-defined functions


- Other enhancements in nawk include:
  - Dynamic regular expressions
    - Text substitution and pattern-matching functions
  - Additional built-in functions and variables
  - New operators and statements
  - Input from more than one file
  - Access to command-line arguments
- nawk also improved error messages, which makes debugging
  considerably easier under nawk than under awk
- On many systems, nawk has replaced awk
  - On ours, both exist

Tutorial
- Program structure
- Running an awk program
- Error messages
- Output from awk
- Record selection
- BEGIN and END
- Number crunching
- Handling text
- Built-in functions
- Control flow
- Arrays

Structure of an awk Program
- An awk program consists of:
  - An optional BEGIN segment
    - For processing to execute prior to reading input
  - pattern-action pairs
    - Processing for input data
    - For each pattern matched, the corresponding action is taken
  - An optional END segment
    - Processing after end of input data

    BEGIN   {action}
    pattern {action}
    pattern {action}
       .
       .
       .
    pattern {action}
    END     {action}

Pattern-Action Structure
- Every program statement has to have a pattern, an action,
  or both
- The default pattern matches all lines
- The default action prints the current record
- Patterns are simply listed; actions are enclosed in { }
- awk scans a sequence of input lines, or records, one by one,
  searching for lines that match the pattern
  - The meaning of "match" depends on the pattern
  - /Beth/ matches if the string "Beth" is in the record
  - $3 > 0 matches if the condition is true
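
The defaults described above can be tried directly from a shell. This is a minimal sketch: the contents of `emp_data` (name, hourly rate, hours worked) are assumed sample records, not data taken from these slides.

```shell
# Assumed sample data: name, hourly rate, hours worked
emp_data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Pattern only: the default action prints each matching record
pattern_only=$(printf '%s\n' "$emp_data" | awk '/Beth/')

# Action only: the default pattern matches every record
action_only=$(printf '%s\n' "$emp_data" | awk '{ print $1 }')

# Pattern and action together: names of employees with hours > 0
both=$(printf '%s\n' "$emp_data" | awk '$3 > 0 { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```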


Running an awk Program
- There are several ways to run an awk program
  - awk 'program' input_file(s)
    - The program and input files are provided as command-line
      arguments
  - awk 'program'
    - The program is a command-line argument; input is taken
      from standard input (yes, awk is a filter!)
  - awk -f program_file_name input_files
    - The program is read from a file
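
The three invocation styles can be compared side by side. In this sketch the temporary directory, the file name `prog.awk`, and the two sample records are illustrative assumptions.

```shell
# Work in a throwaway directory; the file names are illustrative
tmpdir=$(mktemp -d)
printf 'Beth 4.00 0\nKathy 4.00 10\n' > "$tmpdir/emp.data"
printf '$3 > 0 { print $1 }\n' > "$tmpdir/prog.awk"

# 1. Program and input file given as command-line arguments
way1=$(awk '$3 > 0 { print $1 }' "$tmpdir/emp.data")

# 2. Program on the command line, input from standard input
way2=$(awk '$3 > 0 { print $1 }' < "$tmpdir/emp.data")

# 3. Program read from a file with -f
way3=$(awk -f "$tmpdir/prog.awk" "$tmpdir/emp.data")

echo "$way1 $way2 $way3"
rm -r "$tmpdir"
```

All three runs should select the same record, since only the way the program and input reach awk differs.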




Errors
- If you make an error, awk will provide a diagnostic error
  message
    awk '$3 == 0 [ print $1 }' emp.data
    awk: syntax error near line 1
    awk: bailing out near line 1
- Or, if you are using nawk
    nawk '$3 == 0 [ print $1 }' emp.data
    nawk: syntax error at source line 1
     context is
            $3 == 0 >>> [ <<<
            1 extra }
            1 extra [
    nawk: bailing out at source line 1
            1 extra }
            1 extra [

Some of the Built-In Variables
- NF: number of fields in the current record
- NR: number of records read so far
- $0: the entire line
- $n: field n
- $NF: last field of the current record
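
A small sketch of these variables in use; the two input records are made up for illustration, and the second one deliberately has an extra field so that NF and $NF change between lines.

```shell
# Print record number, field count, first field, and last field
vars=$(printf 'Beth 4.00 0\nKathy 4.00 10 overtime\n' |
    awk '{ print NR, NF, $1, $NF }')
printf '%s\n' "$vars"
# prints:
# 1 3 Beth 0
# 2 4 Kathy overtime
```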




Simple Output From awk
- Printing every line
  - If an action has no pattern, the action is performed for
    all input lines
    - { print } will print all input lines on stdout
    - { print $0 } will do the same thing

- Printing certain fields
  - Multiple items can be printed on the same output line
    with a single print statement
  - { print $1, $3 }
  - Expressions separated by a comma are, by default,
    separated by a single space when output


- NF, the Number of Fields
  - Any valid expression can be used after a $ to indicate a
    particular field
  - One built-in variable is NF, the number of fields
  - { print NF, $1, $NF } will print the number of fields, the
    first field, and the last field in the current record
- Computing and printing
  - You can also do computations on the field values and
    include the results in your output
  - { print $1, $2 * $3 }




- Printing line numbers
  - The built-in variable NR can be used to print line numbers
  - { print NR, $0 } will print each line prefixed with its
    line number
- Putting text in the output
  - You can also add other text to the output besides what
    is in the current record
  - { print "total pay for", $1, "is", $2 * $3 }
  - Note that the inserted text needs to be surrounded by
    double quotes



Fancier Output
- Lining up fields
  - Like C, awk has a printf function for producing formatted
    output
  - printf has the form
    - printf( format, val1, val2, val3, … )

      { printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
  - When using printf, formatting is under your control, so
    no automatic spaces or newlines are provided by awk.
    You have to insert them yourself.
      { printf("%-8s %6.2f\n", $1, $2 * $3) }
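
To see what the second format string does, the sketch below runs it on two assumed sample records. `%-8s` left-justifies the name in 8 columns and `%6.2f` right-justifies the pay in 6 columns with two decimals, so the pay column lines up.

```shell
report=$(printf 'Kathy 4.00 10\nMark 5.00 20\n' |
    awk '{ printf("%-8s %6.2f\n", $1, $2 * $3) }')
printf '%s\n' "$report"
# prints:
# Kathy     40.00
# Mark     100.00
```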




awk as a Filter
- Since awk is a filter, you can also use pipes with other
  filters to massage its output even further
- Suppose you want to print the data for each employee along
  with their pay and have it sorted in order of increasing pay

    awk '{ printf("%6.2f %s\n", $2 * $3, $0) }' emp.data | sort
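
The plain `sort` above compares lines as text, which gives the right order here because `%6.2f` pads every pay value to the same width. The variant below is a sketch that swaps in `sort -n` (numeric sort), an alternative that is not on the slide and that should keep working if a value ever outgrows six columns. The three records are assumed sample data.

```shell
sorted=$(printf 'Mark 5.00 20\nKathy 4.00 10\nMary 5.50 22\n' |
    awk '{ printf("%6.2f %s\n", $2 * $3, $0) }' |
    sort -n)          # numeric sort on the leading pay column
printf '%s\n' "$sorted"
```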




Selection
- awk patterns are good for selecting specific lines from the
  input for further processing
- Selection by comparison
  - $2 >= 5 { print }
- Selection by computation
  - $2 * $3 > 50 { printf("%6.2f for %s\n", $2 * $3, $1) }
- Selection by text content
  - $1 == "Susie"
  - /Susie/

- Combinations of patterns
  - $2 >= 4 || $3 >= 20
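
A short sketch of two of these selections, run against assumed sample records (name, hourly rate, hours worked). Note that a line satisfying both halves of an `||` pattern is still printed only once.

```shell
# Assumed sample data: name, hourly rate, hours worked
emp_data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18'

# Selection by text content: first field is exactly "Susie"
by_text=$(printf '%s\n' "$emp_data" | awk '$1 == "Susie"')

# Combination: rate of at least 4 OR at least 20 hours worked
combined=$(printf '%s\n' "$emp_data" |
    awk '$2 >= 4 || $3 >= 20 { print $1 }')

printf '%s\n' "$by_text" "$combined"
```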

Data Validation
- Validating data is a common operation
- awk is excellent at data validation
  - NF != 3   { print $0, "number of fields not equal to 3" }
  - $2 < 3.35 { print $0, "rate is below minimum wage" }
  - $2 > 10   { print $0, "rate exceeds $10 per hour" }
  - $3 < 0    { print $0, "negative hours worked" }
  - $3 > 60   { print $0, "too many hours worked" }




BEGIN and END
- The special pattern BEGIN matches before the first input
  line is read; END matches after the last input line has
  been read
- This allows for initial and wrap-up processing
    BEGIN { print "NAME RATE HOURS"; print "" }
          { print }
    END   { print "total number of employees is", NR }
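
Run on two assumed sample records, the program above produces a heading, a blank line, the data, and a summary. A sketch:

```shell
report=$(printf 'Beth 4.00 0\nDan 3.75 0\n' | awk '
BEGIN { print "NAME RATE HOURS"; print "" }   # runs before any input
      { print }                               # runs for every record
END   { print "total number of employees is", NR }')
printf '%s\n' "$report"
# prints:
# NAME RATE HOURS
#
# Beth 4.00 0
# Dan 3.75 0
# total number of employees is 2
```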




Computing with awk
- Counting is easy to do with awk
    $3 > 15 { emp = emp + 1 }
    END     { print emp, "employees worked more than 15 hrs" }
- Computing sums and averages is also simple
        { pay = pay + $2 * $3 }
    END { print NR, "employees"
          print "total pay is", pay
          print "average pay is", pay/NR
        }
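
The averaging program divides by NR, which is zero when the input is empty. The sketch below adds an `if (NR > 0)` guard, which is an addition of this example and not part of the slide, and runs the program on two assumed sample records.

```shell
summary=$(printf 'Kathy 4.00 10\nMark 5.00 20\n' | awk '
    { pay = pay + $2 * $3 }
END { print NR, "employees"
      print "total pay is", pay
      if (NR > 0)                # avoid dividing by zero on empty input
          print "average pay is", pay / NR
    }')
printf '%s\n' "$summary"
# prints:
# 2 employees
# total pay is 140
# average pay is 70
```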




Handling Text
- One major advantage of awk is its ability to handle strings
  as easily as many languages handle numbers
- awk variables can hold strings of characters as well as
  numbers, and awk conveniently translates back and forth as
  needed
- This program finds the employee who is paid the most per
  hour
    $2 > maxrate { maxrate = $2; maxemp = $1 }
    END { print "highest hourly rate:", maxrate, "for", maxemp }



- String concatenation
  - New strings can be created by combining old ones
        { names = names $1 " " }
    END { print names }
- Printing the last input line
  - Although NR retains its value after the last input line
    has been read, $0 is not guaranteed to in every awk
    (older implementations discard it), so save it in a
    variable
        { last = $0 }
    END { print last }
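
Both idioms can be combined in one run. In this sketch the three input records are assumed sample data; note that the concatenated string ends with a trailing space, because a space is appended after every name.

```shell
result=$(printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\n' | awk '
    { names = names $1 " "    # append each name plus a space
      last = $0               # remember the most recent record
    }
END { print names; print last }')
printf '%s\n' "$result"
```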




Built-in Functions
- awk contains a number of built-in functions; length is one
  of them
- Counting lines, words, and characters using length
  (a poor man's wc)
        { nc = nc + length($0) + 1
          nw = nw + NF
        }
    END { print NR, "lines,", nw, "words,", nc, "characters" }
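
As a sanity check, the counts can be compared with those from `wc` itself. This sketch uses a made-up three-line ASCII text; the `+ 1` accounts for the newline that awk strips from each record. For non-ASCII text the two may differ, since `wc` reports bytes by default.

```shell
text='the quick brown fox
jumps over
the lazy dog'

counts=$(printf '%s\n' "$text" | awk '
    { nc = nc + length($0) + 1    # + 1 for the stripped newline
      nw = nw + NF
    }
END { print NR, nw, nc }')

# wc pads its output differently across systems, so normalize it
wc_counts=$(printf '%s\n' "$text" | wc | awk '{ print $1, $2, $3 }')

echo "awk: $counts   wc: $wc_counts"
```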




Control Flow Statements
- awk provides several control flow statements for making
  decisions and writing loops
- If-Else
    $2 > 6 { n = n + 1; pay = pay + $2 * $3 }
    END    { if (n > 0)
                 print n, "employees, total pay is", pay,
                          "average pay is", pay/n
             else
                 print "no employees are paid more than $6/hour"
           }


Loop Control
- While
    # interest1 - compute compound interest
    #   input: amount rate years
    #   output: compound value at end of each year
    { i = 1
      while (i <= $3) {
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
          i = i + 1
      }
    }



- For
    # interest2 - compute compound interest
    #   input: amount rate years
    #   output: compound value at end of each year
    { for (i = 1; i <= $3; i = i + 1)
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
    }
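
A sample run of the for-loop version, as a sketch. The input line `1000 .06 5` (an amount of 1000 at 6% for 5 years) is an assumed example; each output line is the value at the end of one year, indented by a tab.

```shell
growth=$(echo '1000 .06 5' | awk '
{ for (i = 1; i <= $3; i = i + 1)
      printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}')
printf '%s\n' "$growth"
# prints (each line preceded by a tab):
# 1060.00
# 1123.60
# 1191.02
# 1262.48
# 1338.23
```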




Arrays
- awk provides arrays for storing groups of related data
  values
    # reverse - print input in reverse order by line
        { line[NR] = $0 }     # remember each line
    END { i = NR              # print lines in reverse order
          while (i > 0) {
              print line[i]
              i = i - 1
          }
        }
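
The same reversal can be written with a for loop. The sketch below is a variant of the slide's program, not a replacement for it, run on three made-up input lines.

```shell
reversed=$(printf 'one\ntwo\nthree\n' | awk '
    { line[NR] = $0 }                  # remember each line by number
END { for (i = NR; i > 0; i = i - 1)   # walk the array backwards
          print line[i]
    }')
printf '%s\n' "$reversed"
# prints:
# three
# two
# one
```

Because every line is held in the array until END, memory use grows with the size of the input.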



Useful "One(or so)-liners"
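- # print the total number of input lines
  END { print NR }
- # print the tenth input line
  NR == 10
- # print the last field of every input line
  { print $NF }
- # print the last field of the last input line
      { field = $NF }
  END { print field }
- # print every input line with more than four fields
  NF > 4
- # print every input line whose last field is greater than 4
  $NF > 4
- # print the total number of fields in all input lines
      { nf = nf + NF }
  END { print nf }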


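- # print the number of lines that contain Beth
  /Beth/ { nlines = nlines + 1 }
  END    { print nlines }
- # print the largest first field and the line that contains it
  $1 > max { max = $1; maxline = $0 }
  END      { print max, maxline }
- # print every line that has at least one field
  NF > 0
- # print every line longer than 80 characters
  length($0) > 80
- # print the number of fields in each line, then the line
  { print NF, $0 }
- # print the first two fields in opposite order
  { print $2, $1 }
- # exchange the first two fields, then print the line
  { temp = $1; $1 = $2; $2 = temp; print }
- # print every line with the second field erased
  { $2 = ""; print }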

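- # print the fields of every line in reverse order
  { for (i = NF; i > 0; i = i - 1) printf("%s ", $i)
    printf("\n")
  }
- # print the sum of the fields of every line
  { sum = 0
    for (i = 1; i <= NF; i = i + 1) sum = sum + $i
    print sum
  }
- # add up all fields in all lines and print the sum
      { for (i = 1; i <= NF; i = i + 1) sum = sum + $i }
  END { print sum }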
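
The last two programs differ only in where `sum` is reset and printed. A sketch on two made-up lines of numbers shows the per-line sums next to the grand total.

```shell
numbers='1 2 3
4 5'

# Sum of the fields of every line: sum is reset for each record
per_line=$(printf '%s\n' "$numbers" | awk '
{ sum = 0
  for (i = 1; i <= NF; i = i + 1) sum = sum + $i
  print sum
}')

# Sum of all fields in all lines: sum accumulates until END
total=$(printf '%s\n' "$numbers" | awk '
    { for (i = 1; i <= NF; i = i + 1) sum = sum + $i }
END { print sum }')

printf '%s\n' "$per_line" "$total"
```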

