sed awk perl short course

					grep, sed & awk
                 Overview
   Using Unix to do useful work: simple
    commands strung together.
   Mainly exercise based – there are a lot
    of exercises.
   All this is on:

    holmes.cancres.nottingham.ac.uk/UNIX
                 Unix Pipelines
   $ cat somefile
       displays the file on standard output

   $ cat somefile | grep 'fred'
       finds the lines of the file containing 'fred'

   Take a file – process it – dump the results
       To another program (use a pipe |)
       To a file (use a redirect >)
            Unix Pipelines
               Arg1   125.6     124.9
               His2   128.3     130.1
               Leu3   115.2     116.2

   cat myfile | awk '{print $1,$2-$3}'

                  Arg1   0.7
                  His2   -1.8
                  Leu3   -1.0
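The pipeline above can be tried directly; the commands below recreate the three-column table from the slide (note awk trims trailing zeros from its numeric output, so -1.0 prints as -1):

```shell
# Recreate the three-column table shown above
printf 'Arg1 125.6 124.9\nHis2 128.3 130.1\nLeu3 115.2 116.2\n' > myfile

# Print column 1 and the difference of columns 2 and 3
cat myfile | awk '{print $1, $2 - $3}'
```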
                  Why?
   You could do this in Excel, so why bother?
   Because these commands are:
   More systematic: define the actions,
    then carry out the analysis
   Scalable: they can be applied to many
    files (scripts)
   Non-destructive: the original data is kept
                  AIMS
   Prove you don't have to do repetitive
    tasks by hand!
   Script-based analysis is more
    scientific than ad hoc playing.
Tools for analysis

grep, sed & awk
                  Introduction

   What are they for?
       Extracting or modifying data in (usually) text
        files
       Do in seconds tedious jobs that would take
        hours with a conventional text editor or Excel.
   How do they work?
       1. Read in line from a file
       2. Operate on the line
       3. Output any results
       4. Goto 1.
                 grep

The grep command searches a file for
  a pattern. The following command
$ grep 'cat' grep1.dat
searches the file grep1.dat in the
  current directory for the text string
  cat and outputs all lines containing
  this string.
The string can be a quite complicated
  pattern – a regular expression
      Regular expressions

To match both dog and Dog:
$ grep '[Dd]og' grep1.dat
But:
$ grep '[Cc]at' grep1.dat
Will also pick up Cattle. Use:
$ grep '[Cc]at$' grep1.dat
'$' matches end of line, ('^' matches start)
Regular expressions can be very
  complex!
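A quick way to see the anchor at work; the contents of grep1.dat here are invented for illustration:

```shell
# Sample data -- the 'Cattle' line is the false positive to avoid
printf 'my pet cat\nCattle graze here\nthe Cat\n' > grep1.dat

grep '[Cc]at' grep1.dat    # matches all three lines, including Cattle
grep '[Cc]at$' grep1.dat   # only the two lines that END in cat or Cat
```

The $ anchor only helps when the word falls at the end of the line; for a match anywhere, grep's -w option (whole-word matching, not covered above) is an alternative.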
        Options for grep
$ grep -v '[Dd]og' grep1.dat
  Prints all lines not matching the
   pattern
$ grep -n '[Dd]og' grep1.dat
  Prefixes each matching line with its
   line number
$ grep -l 'Horse' grep*.dat
  Searches for Horse in grep1.dat and
   grep2.dat. Lists only the names of the
   files containing the pattern
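The three options side by side, using small data files invented for the demonstration:

```shell
printf 'Dog barks\ncat sleeps\ndog runs\n' > grep1.dat
printf 'Horse gallops\n' > grep2.dat

grep -v '[Dd]og' grep1.dat   # the one line NOT matching: cat sleeps
grep -n '[Dd]og' grep1.dat   # matches with line numbers: 1:Dog barks, 3:dog runs
grep -l 'Horse' grep*.dat    # just the file names that contain the pattern
```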
                 sed & awk

   A couple of crude, yet powerful Unix
    utilities
   sed
       Stream Editor; performs regular
        expression matches and edits
        accordingly
   awk
       Aho, Weinberger & Kernighan; file
        element extraction and formatting
            sed Concepts

   Stream editor; replace, change, etc
   Uses ‘grep’ like pattern matching
   Line by line (Record by record)
   Default output to screen
   Never alters original data set
      More sed Concepts
 Implicitly global – applies to every line:
$ sed 's/Mary/Fred/' sed1.dat
 or you specify the 'address' you want
  changed:
$ sed '1s/Mary/Fred/' sed1.dat
 Multiple addresses (ranges) are comma
  separated
    1,7function affects lines 1 through 7
 If an address is a regular expression,
  enclose it in slashes; /reg-exp/function
$ sed '/lamb/s/little/big/' sed1.dat
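All three addressing styles, run against an invented sed1.dat (the contents are illustrative):

```shell
printf 'Mary had a little lamb\nMary had some more\n' > sed1.dat

sed 's/Mary/Fred/' sed1.dat         # no address: every line is edited
sed '1s/Mary/Fred/' sed1.dat        # line address: only line 1 is edited
sed '/lamb/s/little/big/' sed1.dat  # regexp address: only lines matching /lamb/
```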
    Common sed Commands

   Delete
     d
         1d
         /reg-exp/d
   Substitution
     s
         s/reg-exp/new-text/
         s/reg-exp/new-text/g
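Delete and substitute in action; note the difference the trailing /g makes when a pattern occurs more than once on a line (the demo file is invented):

```shell
printf 'one one one\ntwo\nthree\n' > demo.dat

sed '2d' demo.dat            # delete line 2
sed '/three/d' demo.dat      # delete lines matching /three/
sed 's/one/ONE/' demo.dat    # replace only the FIRST match on each line
sed 's/one/ONE/g' demo.dat   # /g: replace EVERY match on each line
```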
   Putting sed commands in
              files

You don't have to give sed instructions on
  the command line:
$ sed -f mary.sed sed1.dat
where mary.sed contains:
       2d
       s/Mary/Fred/
If you add the line:
       #!/bin/sed -f
to the top and 'chmod u+x mary.sed', then:
$ ./mary.sed sed1.dat
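Putting this together (file contents invented; also note the shebang path /bin/sed varies between systems, so check `which sed` before relying on it):

```shell
# An invented sed1.dat
printf 'Mary had a little lamb\nIts fleece was white\nMary went to town\n' > sed1.dat

# The sed commands go in a file instead of on the command line
printf '2d\ns/Mary/Fred/\n' > mary.sed

sed -f mary.sed sed1.dat   # line 2 deleted, Mary -> Fred on the rest
```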
                 sed Summary
   Sed allows you to automate editing
    tasks
       Useful in shell scripts
   sed is global by nature
       But with proper ‘address matching’ you can
        edit just what you want, how you want
   Your original data set is left
    untouched
       Experiment; see what works
       Tweak the code until you get what you want
       Redirect the output to another file
                    awk Concepts
   Reads file line by line (record by
    record)
        Program is executed upon each record
   Splits each record into fields
   Each individual field is $1, $2, $3, $4….
   E.g. in awk1.dat:
•   -rw-rw----   1   charlie   user   22631   Oct   4   15:30   awk.htm
•       $1      $2      $3      $4      $5     $6   $7    $8       $9

   A full record is referred to as $0
   Definitions of fields can be altered
   Default output is the screen; redirect to
    file or pipeline
     Simple awk Examples

$ awk '{print $9,$3,$8}' awk1.dat
    Prints the ninth, third and eighth
     elements of each record of awk1.dat
     (Apply action to all lines)
$ awk '/Fred/ {print $9,$3,$8}'
  awk1.dat
$ awk '$1>6 && $4=="FRED" {print $1}'
  awk1.dat
    Restricts the action to lines that match

     the pattern
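A runnable version of the pattern-restricted examples; the data file here is invented (column 1 numeric, column 4 a name), since awk1.dat is not shown in full:

```shell
printf '7 x y FRED\n3 x y FRED\n9 x y JOE\n' > awk_demo.dat

# Action runs only where column 1 > 6 AND column 4 is exactly FRED
awk '$1 > 6 && $4 == "FRED" {print $1}' awk_demo.dat

# Action runs only on records matching the regular expression
awk '/JOE/ {print $0}' awk_demo.dat
```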
        More awk Concepts

   Maintains a number of internal
    system variables
   NF: number of fields in current
    record
    $ awk 'NF>3 {print NF,$0}' awk1.dat
    NR: number of the current record
    $ awk 'NR==3 {print $0}' awk1.dat
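NF and NR on an invented file whose records have different numbers of fields:

```shell
printf 'a b\nc d e f\ng h i\n' > awk1_demo.dat

awk 'NF > 3 {print NF, $0}' awk1_demo.dat   # only the 4-field record
awk 'NR == 3 {print $0}' awk1_demo.dat      # only the third record
```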
    Basic awk Usage & Syntax
   Same ideas as for sed, 'standard output' is to the screen;

   Apply awk commands to an existing text file;
    $ awk '{commands}' file.txt

   Apply awk to a pipeline;
    $ cat file.txt | awk '{commands}' | grep 'stuff' >
      output.txt

   Use an awk file with the -f option
    $ awk -f file.awk file.txt
   Add the 'hash bang' wrapper to the awk file,
    then 'chmod u+x filter.awk':
      #!/bin/awk -f
    $ ./filter.awk file.txt
  A Simple awk Script

#!/bin/awk -f
#awk1.awk my first awk script
{
print $9,$3,$8,$2+$4
}
                  Comments #

   Like any programming language,
    comment lines are important
       she-bang          #!
           Special comment line; typical of UNIX shell
            scripts
              #!/bin/ksh #!/bin/csh #!/bin/sh #!/usr/bin/perl
       comment           #
           Use them to document your code
              # filter.awk
              # filters the text file by…….
      Special patterns
Awk has two special patterns:
BEGIN: its action runs before the
first record is read
END: its action runs after the
last record:
#!/bin/awk -f
#awk2.awk
BEGIN { OFS="\t" }
          {
          print $9,$3,$8
          }
END       { print "END" }
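The same BEGIN/END structure run from the command line on an invented two-record file; OFS set in BEGIN makes the output tab-separated, and END prints once after the last record:

```shell
printf 'a b c\nd e f\n' > data.txt

awk 'BEGIN { OFS = "\t" }
           { print $1, $3 }
     END   { print "END" }' data.txt
```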
        Other awk Features

   Awk scripts can look a lot like C
    programs:
   Conditionals
      if(condition){action}
        if(NF != 10){
          print NR "." $0}
      for(set_counter; test_counter; increment_counter)
        {action}
        for(i=0; i<=12; i++)
         {print i}
      while(condition){action}
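Both conditionals, runnable as one-liners (the rec.dat contents are invented):

```shell
# if: flag records that do not have exactly 10 fields
printf 'a b\n1 2 3 4 5 6 7 8 9 10\n' > rec.dat
awk '{ if (NF != 10) print NR ". " $0 }' rec.dat   # flags only the 2-field line

# for: a counted loop needs no input at all if run from BEGIN
awk 'BEGIN { for (i = 0; i <= 12; i++) print i }'  # 0 through 12, 13 lines
```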
    Mathematical Functions &
         Calculations

   int()
   log()
   trig functions
   Calculations
      print $1,$2,($3 * 10),$4
      print int($3 / 10)
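A few of these, runnable directly (note int() truncates toward zero rather than rounding):

```shell
awk 'BEGIN { print int(3.9) }'    # truncated, not rounded: 3
awk 'BEGIN { print log(1) }'      # natural logarithm: 0

printf 'a 1 30 2\n' | awk '{ print $1, $2, ($3 * 10), $4 }'
printf 'a 1 37 2\n' | awk '{ print int($3 / 10) }'
```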
      Variables: Equals vs Is
             Equal To

   Assigning a variable
     Use = to assign a variable; use ==
      to test its value
     variable = expression
       counter = 0
     variable == value
       if(counter == 7){action…}
     if(counter = 7){action}
     DOES NOT test: it assigns 7 and, since
      7 is true, always runs the action
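The difference is easy to demonstrate: == only fires when the value really is 7, while = assigns 7, evaluates to 7 (true), and so always fires:

```shell
# Correct: compare with ==
awk 'BEGIN { counter = 7; if (counter == 7) print "seven" }'

# Bug: = assigns, the if is always true, and counter is clobbered
awk 'BEGIN { counter = 0; if (counter = 7) print "counter is now " counter }'
```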
      Arrays in awk

BEGIN {counter = 0}
{
counter = (counter + 1)
array1[counter] = $1
array2[counter] = $2
array3[counter] = $3
}
               Print vs. Printf
   Typically a print statement is all you
    need
       especially if you use OFS="\t", or
        similar
       it also has a 'built-in' new-line
   printf is just like in C

        END{
        for (i = 1; i <= counter; i++)
        printf("%4s %4d %3s\n", array1[i], array2[i], array3[i])
        }
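The array-building block from the previous slide and the printf loop combine into one complete script; the input file here is invented three-column data:

```shell
printf 'Arg 125 ppm\nHis 128 ppm\nLeu 115 ppm\n' > shifts.dat

awk 'BEGIN { counter = 0 }
{
    counter = counter + 1
    array1[counter] = $1
    array2[counter] = $2
    array3[counter] = $3
}
END {
    for (i = 1; i <= counter; i++)
        printf("%4s %4d %3s\n", array1[i], array2[i], array3[i])
}' shifts.dat
```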
              awk Summary
   Fast and powerful: awk will do a
    lot of work in a few lines of code or
    straight from the command line
   Versatile enough to handle fairly
    large and robust tasks
       Conditionals
       Arrays
       Print, Printf
       Etc...