Unix Tutorial - Data manipulation programs under Unix.
Your central computer account on gul2 uses the Linux operating system,
one of many variants of Unix. Linux provides many basic but very useful
data manipulation programs. We will look at the following programs:
cat - concatenate files
sort - sort lines of text files
uniq - remove duplicate lines from a sorted file
diff - find differences between two files
echo - display a line of text
sed - a Stream EDitor
tr - translate or delete characters
grep - search for lines in a file matching a pattern
head - output the first part of files
tail - output the last part of files
wc - print the number of bytes, words, and lines in files
cut - remove sections from each line of files
Additionally, we can use "output redirection" (the '>' symbol) to
redirect the output of a program to a file instead of the screen,
"append" (the '>>' symbol) to add output to the end of an existing
file, and "piping" (the '|' symbol) to send the output of one program
into another program. A more detailed description is given on the
course website under “Lectures – Unix Tutorial”.
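As a small illustration of the three operators, the sketch below works in a scratch directory. The file name demo.txt and its contents are made up for this example and are not part of the course data.

```shell
# Work in a scratch directory so nothing in your file space is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# '>' creates demo.txt (or overwrites it) with the output of echo.
echo cat > demo.txt

# '>>' appends each new line to the end of the existing file.
echo ant >> demo.txt
echo bee >> demo.txt

# '|' sends the output of cat straight into sort; no intermediate file is needed.
sorted=$(cat demo.txt | sort)
echo "$sorted"

# The same idea with wc -l, which counts the lines it receives.
count=$(cat demo.txt | wc -l)
echo "lines: $((count))"
```

Note that running the '>' line a second time would replace the whole file, whereas '>>' never discards what is already there.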
Copy the three data files to your file space.
On gul2 you can do this with the command get-ep208-unix.
Starting with the files file1.dat, file2.dat and file3.dat, we can
perform the following operations (you can view the contents of a file
with the command “cat filename”):
Join the contents into a single file:
$ cat file1.dat file2.dat file3.dat > file4.dat
The output is sent to the file file4.dat; view it with 'cat file4.dat'.
Sort into alphabetical order:
$ sort file4.dat > file5.dat
The output is sent to file5.dat; view it with 'cat file5.dat'.
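One point worth knowing before sorting numbers: by default sort compares lines as text, so 10 is placed before 9. The sketch below uses a made-up file numbers.txt (not part of the course data) to contrast the default order with the numeric order given by the -n option.

```shell
# Scratch directory and a small hypothetical data file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '10\n9\n100\n2\n' > numbers.txt

# Default: lines are compared character by character, as text.
# LC_ALL=C pins the collation order so the result does not depend on locale settings.
as_text=$(LC_ALL=C sort numbers.txt)
echo "$as_text"

# -n compares the lines by numeric value instead.
as_numbers=$(sort -n numbers.txt)
echo "$as_numbers"
```

The text order is 10, 100, 2, 9, while the numeric order is 2, 9, 10, 100.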
Remove duplicate entries (uniq only compares adjacent lines, so the
file must already be sorted):
$ uniq file5.dat > file6.dat
We can combine the last two operations with:
$ sort file4.dat | uniq > file6.dat
(note that the intermediate file file5.dat is not required)
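The reason for sorting first can be seen in a short sketch. The file pets.txt is a made-up example, not one of the course files; sort -u is shown only as a common shorthand for the same pipeline.

```shell
# Scratch directory and a small hypothetical file with non-adjacent duplicates.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'dog\ncat\ndog\ncat\n' > pets.txt

# uniq only drops a line when it equals the line directly before it,
# so on unsorted input all four lines survive.
unsorted_count=$(uniq pets.txt | wc -l)
echo "without sorting: $((unsorted_count)) lines"

# Sorting first brings equal lines together, so uniq keeps one of each.
sorted_unique=$(sort pets.txt | uniq)
echo "$sorted_unique"

# sort -u gives the same result in a single command.
shorthand=$(sort -u pets.txt)
```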
Look at the difference between the sorted file and the file with
duplicates removed:
$ diff file5.dat file6.dat
Append the word 'dog' to the file:
$ echo dog >> file6.dat
Again sort into alphabetical order:
$ sort file6.dat > file7.dat
Replace the word 'house' with the word 'home':
$ sed "s/house/home/g" file7.dat > file8.dat
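Two details of the substitute command are easy to miss: the pattern matches anywhere in a line, including inside longer words, and the trailing g decides whether every match or only the first match on each line is replaced. A minimal sketch on a made-up line of text:

```shell
# Hypothetical input line; it is not taken from the course data files.
line='house boat, greenhouse, house'

# With g: every occurrence on the line is replaced, even inside "greenhouse".
all=$(echo "$line" | sed "s/house/home/g")
echo "$all"

# Without g: only the first occurrence on each line is replaced.
first=$(echo "$line" | sed "s/house/home/")
echo "$first"
```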
Translate all lower-case letters (a to z) into their upper-case
equivalents:
$ cat file8.dat | tr a-z A-Z > file9.dat
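Unlike most of the other programs here, tr does not accept a file name; it only reads its standard input, which is why the file is piped into it. A short sketch with a made-up file words.txt shows the equivalent input redirection ('<') and the delete option (-d) mentioned in the program list:

```shell
# Scratch directory and a small hypothetical file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'home\ndog\n' > words.txt

# Feed the file to tr with '<' instead of 'cat file |'; the result is the same.
# Quoting the ranges stops the shell from treating them as file name patterns.
upper=$(tr 'a-z' 'A-Z' < words.txt)
echo "$upper"

# -d deletes the listed characters instead of translating them.
no_vowels=$(tr -d 'aeiou' < words.txt)
echo "$no_vowels"
```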
Search for the word 'home' in the file:
$ grep home file9.dat
Search for the word 'HOME' in the file:
$ grep HOME file9.dat
Search for the word 'home' ignoring case:
$ grep -i home file9.dat
And again, this time giving the line number of each occurrence:
$ grep -i -n home file9.dat
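The behaviour of these options can be checked on a small made-up file (homes.txt is hypothetical, not one of the course files). The sketch also pipes grep into wc to count matching lines, and shows that grep matches the pattern anywhere in a line, so 'homework' is found as well.

```shell
# Scratch directory and a small hypothetical file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'HOME\nhome\nhomework\ndog\n' > homes.txt

# Case-sensitive search: matches "home" and "homework", but not "HOME".
exact=$(grep home homes.txt)
echo "$exact"

# -i ignores case; -n prefixes each matching line with its line number.
numbered=$(grep -i -n home homes.txt)
echo "$numbered"

# Piping the matches into wc -l counts how many lines matched.
matches=$(grep -i home homes.txt | wc -l)
echo "matching lines: $((matches))"
```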
Display the first 8 lines:
$ head -n8 file9.dat
Display the last 5 lines:
$ tail -n5 file9.dat
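head and tail can also be chained with a pipe to pick out a range of lines from the middle of a file. A minimal sketch, using a made-up file ten.txt that simply holds the numbers 1 to 10:

```shell
# Scratch directory and a hypothetical ten-line file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > ten.txt

# First three lines and last two lines.
first3=$(head -n 3 ten.txt)
last2=$(tail -n 2 ten.txt)

# Lines 6 to 8: keep the first 8 lines, then the last 3 of those.
middle=$(head -n 8 ten.txt | tail -n 3)
echo "$middle"
```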
Count the number of lines, words and bytes in the file:
$ wc file9.dat
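When only one of the three counts is wanted, wc accepts the options -l (lines), -w (words) and -c (bytes). The sketch below uses a made-up file text.txt; reading it through '<' makes wc print the number alone, without the file name.

```shell
# Scratch directory and a hypothetical two-line file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\nthree\n' > text.txt

# Each option selects a single count.
lines=$(wc -l < text.txt)
words=$(wc -w < text.txt)
bytes=$(wc -c < text.txt)

# $(( )) strips the padding that some versions of wc put before the number.
echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
```

The byte count includes the newline character at the end of each line.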
Display the first three characters of each line:
$ cut -b1-3 file9.dat
Display the second to fourth characters of each line:
$ cut -b2-4 file9.dat
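Besides fixed byte positions, cut can select whole fields with -d (the delimiter) and -f (the field numbers). The sketch below assumes a made-up record with two fields separated by a single space; this layout is purely illustrative and is not claimed to be the format of any course data file.

```shell
# Byte positions 2 to 4 of a line, as in the command above.
letters=$(echo "ABCDEFG" | cut -b2-4)
echo "$letters"

# Hypothetical record: a name and a number separated by one space.
record='pion 12.5'

# -d' ' sets the delimiter to a space; -f picks the field by number.
name=$(echo "$record" | cut -d' ' -f1)
value=$(echo "$record" | cut -d' ' -f2)
echo "name=$name value=$value"
```

Field selection only works cleanly when exactly one delimiter separates the fields; columns padded with several spaces are usually easier to select by position.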
Lab Exercise - Unix
Download the file lep.dat and perform the following analysis.
1. Use wc to determine how many particles there are in the list.
2. Use cut, sort and uniq to form a unique list of particle species;
how many species of particles are there?
3. Use grep and wc to determine how many particles there are of each
species.
4. Use cut, sort, head, and tail to determine the maximum and minimum
particle momenta. Which particles are they?
5. Using piping (i.e. without wasting time creating intermediate
files), repeat the exercise (except part 3) with the file lep2.dat,
which contains many more particles.