Introduction to Unix – 7
Tobias Gradl (LSS – System Simulation) 2007-03-22

sed – A Stream Editor
What sed does
Read lines of text from files or stdin
Apply modifications to the lines (delete, substitute text, etc.)
Print the lines to stdout
What it’s good for
Filter the (often long) output of scientific programs for relevant information
Re-format files (remove/add linebreaks, etc.)
Add HTML tags to text files
... and much more.

Application Example
Change linetypes of figures in your Master’s thesis
[Figure: two versions of a plot of convergence rate vs. V cycle number, with curves for HHG and ParExPDE]
Works with sed because Postscript images are essentially text files

Calling sed
Input sources
The commands given with -e are applied to the file file.txt:
sed -e '/./!d' file.txt
sed can be used as a filter, too:
cat file.txt | sed -e '/./!d'
Commands can be read from a file:
sed -f commands.sed file.txt
Important options
-n Don’t print any of the input lines to stdout, unless told to by the commands
-r Use extended regular expressions
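The three ways of calling sed listed under "Input sources" can be tried side by side. The following is only a sketch: the scratch directory and the sample contents of file.txt are assumptions made up for the demonstration. The command '/./!d' deletes every line that does not contain at least one character, i.e. the empty lines.

```shell
# Hypothetical scratch directory and sample file, used only for this demo.
workdir=$(mktemp -d)
printf 'first\n\nsecond\n\nthird\n' > "$workdir/file.txt"

# 1. Commands given with -e, input read from a file.
from_file=$(sed -e '/./!d' "$workdir/file.txt")

# 2. sed as a filter: the input arrives on stdin.
from_pipe=$(cat "$workdir/file.txt" | sed -e '/./!d')

# 3. The same command stored in a script file and loaded with -f.
printf '%s\n' '/./!d' > "$workdir/commands.sed"
from_script=$(sed -f "$workdir/commands.sed" "$workdir/file.txt")

# 4. With -n and an empty program nothing is printed at all.
silent=$(sed -n -e '' "$workdir/file.txt")

printf '%s\n' "$from_file"
rm -r "$workdir"
```

All three invocations should produce the same three non-empty lines; only the way the program and the input reach sed differs.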

How sed works
[Diagram: lines from the Input (sample text: "Hello, World!", "Cheers!", "Toby") pass through the Pattern Space, where the Commands (example: delete empty lines) are applied, and on to the Output]

Commands
Simple commands
p Print contents of pattern space (for starters: the current line)
d Delete contents of pattern space and start over with next line from input
Example
$ sed -e '' hello.txt
Hello
World
$ sed -e 'd' hello.txt
$ sed -n -e 'p' hello.txt
Hello
World
$
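One combination the example does not show is p without -n. As a small sketch (hello.txt is recreated here in a temporary directory with the two lines from the example), it illustrates that the automatic print at the end of each cycle and the p command are independent of each other:

```shell
# hello.txt is recreated with the two lines shown in the example.
workdir=$(mktemp -d)
printf 'Hello\nWorld\n' > "$workdir/hello.txt"

# p prints the pattern space, and sed prints it again at the end of
# the cycle, so every line appears twice.
doubled=$(sed -e 'p' "$workdir/hello.txt")

# With -n the automatic print is suppressed; only p produces output.
once=$(sed -n -e 'p' "$workdir/hello.txt")

# d empties the pattern space before the automatic print happens.
nothing=$(sed -e 'd' "$workdir/hello.txt")

printf '%s\n' "$doubled"
rm -r "$workdir"
```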

Commands
Command sequences
Several commands can be combined into a sed program
on the command line: with a semicolon: $ sed -e 'd; p'
in a file: every command on a new line.
Caveats
Subsequent commands use the lines modified by their preceding commands. So, the p in 'd; p' would print nothing.
After some commands (e. g. d), the script is started from the beginning. So, the p in 'd; p' is actually never called.

Addresses
Purpose
For every command, you can specify the lines of the input it should be applied to.
Syntax
n command      command is only applied to line n.
n,m command    command is applied to all lines from line n to line m.
n and m can, instead of numbers, also be regular expressions of the form /regex/.
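The address forms and the 'd; p' caveat can be checked on a tiny input. This is a sketch only; the five input lines are invented for the purpose.

```shell
# Hypothetical five-line input, generated on the fly.
input=$(printf 'one\n# begin\ntwo\n# end\nthree\n')

# n command: apply p to line 3 only.
line3=$(printf '%s\n' "$input" | sed -n -e '3p')

# n,m command: apply d to lines 2 through 4.
outside=$(printf '%s\n' "$input" | sed -e '2,4d')

# /regex/,/regex/ command: print from the first line matching "begin"
# up to the next line matching "end".
block=$(printf '%s\n' "$input" | sed -n -e '/begin/,/end/p')

# Command sequence: d ends the cycle, so the p after it never runs.
never=$(printf '%s\n' "$input" | sed -e 'd; p')

printf 'line 3: %s\n' "$line3"
printf '%s\n' "$block"
```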

Addresses
Example 1
$ sed -n -e '10,$ p'
Prints all lines, starting at line 10. $ is a placeholder for the last line.
Example 2
$ sed -e '/^#/ d'
Deletes all lines starting with a #.

The substitute command: s///
Syntax
s/regex/replacement/flags
If text matches regex, it is replaced by replacement. The flags specify the exact behaviour.
Flags
g Replace all occurrences of regex in the current line. Without this flag, only the first occurrence is replaced.
p Print the pattern space if a substitution has been applied. Necessary when sed -n is used.
... and others (refer to documentation).
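The effect of the g and p flags is easiest to see on a one-line input. A minimal sketch, with made-up sample strings:

```shell
line='a-b-c-d'

# Without a flag only the first occurrence is replaced.
first=$(printf '%s\n' "$line" | sed -e 's/-/+/')

# With g every occurrence in the line is replaced.
all=$(printf '%s\n' "$line" | sed -e 's/-/+/g')

# With -n and the p flag only lines on which a substitution
# actually took place are printed.
changed=$(printf 'x-y\nplain\n' | sed -n -e 's/-/+/p')

printf '%s\n%s\n%s\n' "$first" "$all" "$changed"
```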

The substitute command: s///
Example 1
Display all comment lines, but without the starting #.
$ sed -n -e 's/^#//p'
Example 2
An automatic translator?
$ sed -e 's/thou/you/g' ~simastue/full.txt
But what happens to the line “In what particular thought to work I know not”? ⇒ We need regular expressions.

Further reading
man sed and info sed: not suited for beginners.
http://legolas.mdh.se/~dat95abs/sed_tutorial.txt: very good tutorial.
lots of other tutorials on the web.

Regular Expressions
Regular expressions are...
a means for describing strings in a flexible way. For example, “Hello *world” describes a string containing the words “Hello” and “world”, separated by an arbitrary amount of spaces.
relatively easy to evaluate for computers (→ finite state machines, Chomsky 3 languages)
Application examples
Specifying valid input parameters of programs.
E. g. mode string for chmod (simplified):
[ugoa][-+=][rwx](,[ugoa][-+=][rwx])*
(The - is placed first inside the brackets so that it is taken literally; written as [+-=] it would denote the range from + to =.)
Extracting certain parts of a program’s output.

Regular Expressions
Remarks
There are different versions of regular expressions: “basic” and “extended”. They differ mostly in how they interpret (quoted) parentheses and other special characters (( vs. \(, etc.). We cover the “extended” version.
Different programs may interpret regular expressions differently → manpage, trial and error.
The following introduction does not claim to be exhaustive. For the many little details consult man 1 grep and man 7 regex, for example.
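The difference between the “basic” and the “extended” version mentioned in the remarks can be made concrete with sed itself. The sketch below assumes GNU sed, where -r switches to the extended syntax; the sample strings are invented.

```shell
# Grouping and counting: basic syntax needs backslashes ...
basic_group=$(printf 'ababc\n' | sed -e 's/\(ab\)\{2\}/X/')
# ... extended syntax (-r) uses the bare characters.
ext_group=$(printf 'ababc\n' | sed -r -e 's/(ab){2}/X/')

# Literal parentheses: bare in basic syntax ...
basic_literal=$(printf 'f(x)\n' | sed -e 's/(x)/[x]/')
# ... quoted with a backslash in extended syntax.
ext_literal=$(printf 'f(x)\n' | sed -r -e 's/\(x\)/[x]/')

printf '%s %s %s %s\n' "$basic_group" "$ext_group" "$basic_literal" "$ext_literal"
```

In other words, the backslash flips the meaning of ( ) { } between "special" and "literal", and the two versions disagree on which one is the default.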

Regex Basics
Syntax
1 single characters are regexes
2 regexes can be concatenated
3 regexes can be joined with the | operator. Then any of the joined regexes is matched
4 Precedence is normally in the above order, but can be overridden with parentheses
Example
1 a
2 Hallo
3 Hallo|Hello
4 H(a|e)llo

Repetition Operators
Syntax
? match preceding item at most once
* match preceding item zero or more times
+ match preceding item one or more times
{n} match preceding item n times
{n,} match preceding item at least n times
{n,m} match preceding item between n and m times
Example
AB+A matches ABBA, but also ABA, ABBBA, etc.
((A|D)C){2} matches ACDC, but also ACAC, DCAC and DCDC
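The examples for alternation and repetition can be verified with grep -E. In this sketch, matches is a hypothetical helper defined only for the demonstration; it uses -x so that the whole string has to match, not just a part of it.

```shell
# matches PATTERN STRING: succeeds if the whole STRING matches PATTERN.
# (Hypothetical helper; -x anchors the match to the entire line.)
matches() {
    printf '%s\n' "$2" | grep -E -x -q -e "$1"
}

for word in ABA ABBA ABBBA; do
    matches 'AB+A' "$word" && echo "AB+A matches $word"
done
matches 'AB+A' 'AA' || echo "AB+A does not match AA"

for word in ACDC ACAC DCAC DCDC; do
    matches '((A|D)C){2}' "$word" && echo "((A|D)C){2} matches $word"
done

matches 'H(a|e)llo' 'Hello' && echo "H(a|e)llo matches Hello"
matches 'Hallo|Hello' 'Hullo' || echo "Hallo|Hello does not match Hullo"
```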

Special Characters and Quoting
Regular expressions assign a special meaning to many characters (e. g., () {} + *). If you want to use these characters in their literal meaning, you have to quote them with a backslash.
(But not in every situation. Total freakouts caused by these exceptions have been observed.)

Placeholders (predefined)
Syntax
. matches any character
^ $ match beginning and end of a line
\b matches a word boundary
etc.
Example
Count empty and non-empty lines:
tobias@faui00o:~$ egrep '^$' ~simastue/full.txt | wc -l
2502
tobias@faui00o:~$ egrep '^.+$' ~simastue/full.txt | wc -l
5381
tobias@faui00o:~$ wc -l ~simastue/full.txt
7883 /home/inf10/simastue/full.txt
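The counting example can be repeated on any file, and \b is one possible answer to the “thou”/“thought” problem from the substitute examples. The following sketch uses a small invented sample file instead of full.txt and assumes GNU grep and GNU sed (\b is an extension there).

```shell
# Hypothetical sample file: two text lines, two empty lines.
workdir=$(mktemp -d)
printf 'What art thou?\n\nIn what particular thought to work I know not\n\n' \
    > "$workdir/sample.txt"

# ^$ matches empty lines, ^.+$ matches lines with at least one character.
empty=$(grep -E -c '^$' "$workdir/sample.txt")
nonempty=$(grep -E -c '^.+$' "$workdir/sample.txt")
total=$(wc -l < "$workdir/sample.txt")
echo "$empty empty + $nonempty non-empty = $total lines"

# Without word boundaries, "thou" is also replaced inside "thought".
line='In what particular thought to work I know not'
naive=$(printf '%s\n' "$line" | sed -e 's/thou/you/g')

# With \b on both sides only the stand-alone word is replaced.
careful=$(printf '%s\n' "$line" | sed -r -e 's/\bthou\b/you/g')
pronoun=$(printf 'What art thou?\n' | sed -r -e 's/\bthou\b/you/g')

printf '%s\n%s\n%s\n' "$naive" "$careful" "$pronoun"
rm -r "$workdir"
```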

Placeholders (defining your own): [...]
Syntax
[list] matches any character in list
[^list] matches any character not in list
[c1-c2] matches any character between c1 and c2
Example
[a-zA-Z0-9] matches all alphanumeric characters. Stay tuned for an easier version of this.

Predefined Character Classes
Many programs offer predefined character classes for your convenience. Among the most popular are
Syntax
[:blank:] all space and tab characters
[:lower:] all lower case characters
Note: the brackets are confusing. For using character classes in lists, you still need the surrounding []: [[:blank:]]
Example
[[:alnum:]] is equal to [a-zA-Z0-9] in English.
But beware: character classes are not the same for all languages!
$ cat ~tobias/diacritics.txt | LC_ALL=en_EN grep '[[:alnum:]]'
$ cat ~tobias/diacritics.txt | LC_ALL=de_DE grep '[[:alnum:]]'
ÄÖÜ äöü ß
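A few typical uses of the predefined classes inside bracket lists, as a sketch. LC_ALL=C is set explicitly so that the classes cover plain ASCII only and the result does not depend on the language settings of the machine; the input strings are invented.

```shell
# Collapse every run of spaces and tabs into a single space.
squeezed=$(printf 'a \t  b\n' | LC_ALL=C sed -r -e 's/[[:blank:]]+/ /g')

# A negated list: remove everything that is not alphanumeric.
stripped=$(printf 'sed-4.8 (GNU)\n' | LC_ALL=C sed -e 's/[^[:alnum:]]//g')

# Remove all lower case characters.
capitals=$(printf 'Hello World\n' | LC_ALL=C sed -e 's/[[:lower:]]//g')

printf '%s\n%s\n%s\n' "$squeezed" "$stripped" "$capitals"
```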

Back-references
Syntax
\n
The back-reference \n contains the n-th previously matched sub-expression.
Example
$ echo "Column 1 : Column 2" | sed -re 's/([^:]+)( : )([^:]+)/\3\2\1/'
Column 2 : Column 1

Advanced (Counter-?)Examples
Floating point numbers
[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
(found on http://www.regular-expressions.info)
E-mail addresses
According to the RFC822 standard, and found on http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[

E-mail addresses (2)
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[

E-mail addresses (3)
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t

E-mail addresses (4)
])+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)
... and now go find the error!
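In contrast to the e-mail expression, the floating point expression from the Advanced (Counter-?)Examples slide is short enough to test by hand. The sketch below assumes GNU grep; is_float is a hypothetical helper and the candidate strings are made up. Note that the expression deliberately rejects forms such as "1." that some programs do print.

```shell
float_re='[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?'

# is_float STRING: succeeds if the whole STRING matches float_re.
# (Hypothetical helper; -x anchors the match to the entire line.)
is_float() {
    printf '%s\n' "$1" | grep -E -x -q -e "$float_re"
}

for candidate in 3.14 -0.5 .25 1e-9 42 1. abc 1e; do
    if is_float "$candidate"; then
        echo "$candidate: accepted"
    else
        echo "$candidate: rejected"
    fi
done

# -o prints only the matching parts of a line, one per output line.
extracted=$(printf 'residual = 1.25e-03 after 12 cycles\n' \
    | grep -E -o -e "$float_re")
printf '%s\n' "$extracted"
```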