‘Learn Programming with Perl’
Bioinformatics Teaching Laboratory, University of Cambridge

This course takes students from having no knowledge of programming to being able to write useful applications, using the language variously described as “the duct tape of the internet” and “the lingua franca of bioinformatics”. It begins with the fundamental aspects of the language – variables, functions, loops and control flow. It goes on to explore input and output, error handling, and data analysis using pattern matching with regular expressions.

During the course, we will focus on using the core aspects of Perl to perform the basic tasks of almost any program – reading data in, processing it, and writing out results. Students will therefore come away with an application framework that they can easily adapt and extend to suit their own particular needs.

Each course topic is introduced and placed in context, and then complete example code is provided to illustrate the subject under discussion. The code is worked through line by line and then implemented by the students. For more advanced students, related challenges are provided without solutions, although training staff are happy to provide help and guidance.

Throughout the course, the emphasis is on making sure that each student understands and appreciates what is going on. The course handbook therefore provides plenty of room for the students’ own notes, and providing comprehensive answers to class questions is a top priority.

“An excellent practical introduction to PERL” – Lisa Mullan, Scientific Training Officer, EMBL-EBI

The course covers the following topics:

What is Perl and where can I get it?
Here we will examine what we mean by “multi-platform interpreted scripting language”, and look at how we can obtain and install Perl for any particular operating system. Other resources such as CPAN will also be introduced.

What is programming?
We will address the fundamental problem of writing a program – that computers are obedient, but pedantic and stupid – and look at how we can deal with this. We will also look at the peculiarities of programming in a rapidly evolving field such as bioinformatics.

Getting a Perl program to run
There are certain practical aspects involved in getting our code to execute, and we will deal with these at this stage.

Getting Perl to check our code for us
Perl provides us with many helpful tools. Here we look at one which can go through our code and warn us if any problems are detected.

Simple mathematical functions
We begin our exploration of functions using the mathematical operators familiar to us from school – addition, subtraction, multiplication and division. The ideas of operator precedence and nesting will be introduced.

Scalar variables
Now we can convert the program we wrote to handle functions so that it uses scalar variables – boxes to put some data in. The importance of descriptive variable names will be emphasised.

Using strict
Another tool for helping us to write better code. Here we will discuss the idea of variable scope. (Note: Although subroutines are covered later in the course, it is important that students are taught good programming practices and principles from the beginning of their training.)

Loops
Here we take advantage of one of the most useful things a program can do – perform an action repeatedly.

Lists
Now that we can handle repeated actions, we can move on to use more complex multi-item variables, such as a list or array of items. We will look at how we can store items in an array, and how we can access individual elements of it.

Writing to files
So far, all our programs have sent their output directly to our computer’s screen. Here we learn how to use more permanent storage for our results, by opening up a file, writing data to it, and then closing the file once we have finished.
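
The open, write and close steps described under "Writing to files" might look something like the following. This is only a minimal sketch, not the course's own example code: the subroutine name write_results, the file name results.txt and the sample data are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: write each item of a list to a file, one item per line.
sub write_results {
    my ($filename, @items) = @_;

    # Open the file for writing; stop with a message if that fails.
    open(my $out, '>', $filename) or die "Cannot open $filename: $!";

    # Loop over the list and print each item to the file, not the screen.
    foreach my $item (@items) {
        print $out "$item\n";
    }

    # Close the file once we have finished with it.
    close($out) or die "Cannot close $filename: $!";
    return scalar @items;
}

# Made-up example data: an array holding three short DNA fragments.
my @fragments = ('ATGC', 'GGCA', 'TTAG');
my $written   = write_results('results.txt', @fragments);
print "Wrote $written lines to results.txt\n";
```

Using a lexical filehandle ($out) and the three-argument form of open keeps the example compatible with "use strict" and avoids surprises if a file name contains unusual characters.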

Reading from files
We can check that we have correctly written out our data by opening up the file and reading its data in, then displaying the data on screen. We will introduce a simple method of handling any errors that may occur as we do this, and introduce the idea of defensive programming.

Pattern matching
Next we explore one of Perl’s most powerful capabilities – its ability to define and look for particular patterns in data, using regular expressions and the match operator.

User interaction
We can now go on to use pattern matching with a switch statement in order to control the actions that our program performs according to user input.

Subroutines
We introduce the idea of moving frequently called code into subroutines, and look at how we can pass information to these and get information back from them.

Reading in sequence data
Much of the data that bioinformaticians – and others – need to work with comes in files of a particular format. We begin looking at these by examining how we can extract multiline sequence data from a file so that we can work with it.

Looking for motifs in DNA sequence
We want to know if our data contains a particular sequence pattern, so we will add code to our program that finds it and displays its location or locations if it exists.

Looking for protein sequence
We then adapt our program to transcribe the DNA into RNA, and then look for sections of this that would express a specified protein sequence.

Working with multiple sequences in a single file
We need to adopt a slightly different approach when we have details of more than one sequence in a single file. For this we use an associative array, or hash – an array of data that, instead of using a numerical index, allows us to access data elements by name.

Working with sequences in multiple files
We then go on to look briefly at how we can work with all the files in a directory.
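
To give a flavour of how the sequence topics above fit together, here is a minimal sketch that reads several multiline sequences into a hash keyed by name and then reports where a motif occurs. It is not taken from the course materials: the FASTA-style layout (a ">name" header line followed by sequence lines), the subroutine names read_sequences and find_motif, and the sample data are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Read FASTA-style records from a filehandle into a hash keyed by sequence name.
# (FASTA is an assumed format; the course outline does not name one.)
sub read_sequences {
    my ($fh) = @_;
    my %sequence_for;
    my $name;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # Header line: start a new, empty record under this name.
            $name = $1;
            $sequence_for{$name} = '';
        }
        elsif (defined $name) {
            # Sequence line: append it, joining multiline data into one string.
            $sequence_for{$name} .= uc $line;
        }
    }
    return %sequence_for;
}

# Return the 1-based start position of every match of a motif,
# including overlapping matches (found with a zero-width lookahead).
sub find_motif {
    my ($sequence, $motif) = @_;
    my @positions;
    while ($sequence =~ /(?=$motif)/g) {
        push @positions, pos($sequence) + 1;
    }
    return @positions;
}

# Made-up data, read through an in-memory filehandle so the sketch is self-contained.
my $data = ">seq1 example\nATGATG\nCCATG\n>seq2\nGGGCCC\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my %sequence_for = read_sequences($fh);
close($fh);

foreach my $name (sort keys %sequence_for) {
    my @hits = find_motif($sequence_for{$name}, 'ATG');
    print "$name: ", (@hits ? "ATG at @hits" : "no ATG found"), "\n";
}
```

With the sample data shown, seq1 is assembled as ATGATGCCATG and the motif is reported at positions 1, 4 and 9, while seq2 contains no match. Replacing the in-memory filehandle with one opened on a real file would leave the two subroutines unchanged.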

Reading in structure data
Protein data is held in a structured flat file format, and we look at how we can design data structures in our programs that we can use to work with this information. To demonstrate what this now allows us to do, we create a program that searches through the comments section of a Swissprot file to find a keyword that appears under a particular comment topic heading.

Microarray list data
Another type of information that we will learn how to handle describes microarray experiments. We look at how to determine from the file itself how the data it contains is structured, and learn how to deal with 2-dimensional data.

Microarray list data
Finally, we adapt what we have learnt about grid-type data to perform simple analysis of 2D graphical images.

At the end of the course, there will be an opportunity to address any specialist areas of interest that students may have, and to provide pointers to sources of further information.

‘Learning Programming with Perl’ will be taught by Paul Weston, author of ‘Bioinformatics Software Engineering: Delivering Effective Applications’, published by John Wiley and Sons. With 20 years’ experience in application development, and a background that includes sequence assembly pipelining, distributed computing, and online gaming, he still enjoys writing Perl. He is a Senior Computer Programmer in the Mouse Informatics Group at the Wellcome Trust Sanger Institute.