(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 6, 2011

Creating an Appropriate Programming Language for Student Compiler Project

Elinda Kajo Mece
Department of Informatics Engineering
Polytechnic University of Tirana
Tirana, Albania
ekajo@fti.edu.al


Abstract: Finding an appropriate and simple source language for a student compiler project is a challenge, especially when the students are not familiar with high-level programming languages. This paper presents a new programming language intended principally for beginners and for didactic purposes in a compiler design course. SimJ, a reduced form of the Java programming language, is designed for simple and fast programming. More readable code, low complexity, and basic functionality are the primary goals of SimJ. The language includes the most important functions and data structures needed for creating the simple programs generally found in beginners' programming textbooks. The Polyglot compiler framework is used for the implementation of SimJ.

Keywords: compiler design; new programming language; Polyglot framework

I. INTRODUCTION

A compiler course takes a significant place in computer science curricula. This course is always associated with an implementation project. Being a multidimensional course, it requires the students to be familiar with high-level programming languages, among other things. The first contact with these high-level languages is almost always considered confusing because of their complexity. This becomes more obvious in object-oriented languages like Java [8]. Object-orientation [15] makes it hard to learn Java step by step from basic principles, because right from the beginning the learner has to define at least one public class with a method with the signature public static void main(String[] args). So the teacher has two choices here: trying to explain most of the concepts involved (classes, methods, types, arrays, etc.) or just providing the surrounding program text and letting the learner add code to the body of the method main.

SimJ is a simple, Java-based programming language. It is conceived and designed to ease the teaching of basic programming to beginners. We believe that they should easily learn the basic concepts before they are exposed to more complex programming issues. It is much simpler for a new programmer to write println("Hello world") instead of writing a confusing line like System.out.println("Hello world"). This simple example shows the importance of the first contact with programming languages. The role of SimJ is to make this contact less "painful".

Compiler frameworks are widely used as a simple tool for implementing new languages based on existing ones. The complexity begins to increase if the differences between the existing language and the new one become significant [4]. That is why we used Java as a base language for SimJ. For this purpose we have chosen Polyglot [4,5] as a compiler framework for creating compilers for languages similar to Java.

II. THE POLYGLOT FRAMEWORK

Polyglot is an extensible Java compiler toolkit designed for experimentation with new language extensions. The base Polyglot compiler, jlc ("Java language compiler"), is a mostly complete Java front end [1]; that is, it parses [1,2] and performs semantic checking on Java source code. The compiler outputs Java source code. Thus, the base compiler implements the identity translation. Language extensions are implemented on top of the base compiler by extending the concrete and abstract syntax and the type system [4]. After type checking the language extension, the abstract syntax tree (AST) [1,14] is translated into a Java AST and the resulting code is output into a Java source file, which can then be compiled with javac.

Polyglot supports the easy creation of compilers for languages similar to Java. The Polyglot framework is useful for domain-specific languages, exploration of language design, and simplified versions of Java for pedagogical use. As mentioned above, the last of these is the focus of this paper.

A Polyglot extension is a source-to-source compiler that accepts a program written in a language extension and translates it to Java source code [4,5]. It may also invoke a Java compiler such as javac to convert its output to bytecode [13]. A SimJ-oriented view of this process, including the eventual compilation to Java bytecode, is shown in figure 1.

Figure 1. The Polyglot Compiler Framework Architecture
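
As a concrete illustration of the source-to-source idea, the following minimal sketch rewrites SimJ-style print and println statements into their Java equivalents. It is not Polyglot code and not the SimJ implementation: Polyglot rewrites an AST, whereas this sketch uses a line-based regular expression purely for illustration. The class name SimJPrintRewriter and the method rewriteLine are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy, line-based illustration of source-to-source translation.
// Assumption: a real Polyglot extension would rewrite AST nodes instead of text.
class SimJPrintRewriter {
    // Matches a print/println call at the start of a statement,
    // capturing the indentation and the method name.
    private static final Pattern PRINT_CALL =
            Pattern.compile("^(\\s*)(print|println)\\s*\\(");

    // Returns the Java form of a SimJ print statement, or the line unchanged.
    static String rewriteLine(String line) {
        Matcher m = PRINT_CALL.matcher(line);
        if (!m.find()) {
            return line;
        }
        return m.group(1) + "System.out." + m.group(2) + "(" + line.substring(m.end());
    }

    public static void main(String[] args) {
        // prints: System.out.println("Hello world");
        System.out.println(rewriteLine("println(\"Hello world\");"));
    }
}
```

A textual rewrite like this would break on print calls nested inside other expressions or string literals, which is one reason a framework that works on a typed AST is preferable for a real language.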




                                                                     36                                http://sites.google.com/site/ijcsis/
                                                                                                       ISSN 1947-5500
The first step in compilation is parsing input source code to produce an AST. Polyglot includes an extensible parser generator, PPG [6], which allows the implementer to define the syntax of the language extension (SimJ in our case) as a set of changes to the base grammar for Java [7]. The extended AST may contain new kinds of nodes either to represent syntax added to the base language or to record new information in the AST.

The core of the compilation process is a series of compilation passes applied to the abstract syntax tree. Both semantic analysis and translation [1] to Java may comprise several such passes. The pass scheduler selects passes to run over the AST of a single source file, in an order defined by the extension, ensuring that dependencies between source files are not violated. Each compilation pass, if successful, rewrites the AST, producing a new AST that is the input to the next pass. A language extension may modify the base language pass schedule by adding, replacing, reordering, or removing compiler passes. The rewriting process is entirely functional; compilation passes do not destructively modify the AST.

Compilation passes do their work using objects that define important characteristics of the source and target languages. A type system object acts as a factory for objects representing types and related constructs such as method signatures [4,5]. The type system object also provides some type checking functionality. A node factory [4] constructs AST nodes for its extension. In extensions that rely on an intermediate language, multiple type systems and node factories may be used during compilation. After all compilation passes complete, the usual result is a Java AST. A Java compiler such as javac is invoked to compile the Java code to bytecode.

III. SIMJ PROGRAMMING LANGUAGE

SimJ (which stands for Simple Java) is a simplified version of the Java programming language conceived especially for beginners. The language is very simple, easy to learn, and very similar to Java. Previous work has been done in this field (e.g. the J0 programming language [12]), but these languages differ considerably from Java in their syntax [7]. We think that similarity with Java is very important in order to allow the programmer to switch to Java without any syntax problems when he or she feels ready to explore its full potential and advanced features.

Figure 2 shows an example of the same code written in Java and in SimJ. This example shows, as mentioned above, that the code in SimJ is clearly more readable than the one in Java. Generally, programming courses and textbooks for beginners include many programs that require input from the user during their execution. In Java this part is definitely neither simple nor easy to implement at the beginning level. We address this problem by removing the complex part and leaving only the "understandable" one (i.e. readLine()).

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class A {
      public static void main(String[] args) {
        try {
          BufferedReader reader = new BufferedReader(
              new InputStreamReader(System.in));
          System.out.print("Your name:");
          String name = reader.readLine();
          System.out.print("\nHello, " + name + "!");
        } catch (IOException ioException) {
          System.out.println(ioException);
        }
      }
    }

    class A {
      main() {
        print("Your name:");
        String name = readLine();
        print("\nHello, " + name + "!");
      }
    }

Figure 2. Example code written in Java (top) and SimJ (bottom)

The simplified versions of the printing methods are an obvious choice, since they are almost always used in simple programs. It is also important to mention that, compared to Java, the structure of the program is unchanged, thus preserving its object-oriented character. Another important goal of this language is to help the teaching of compiler design [1].

The SimJ language specification [3,10,11], shown in figure 3, is very simple and short, and is equipped with the fundamental and most frequently used parts of a programming language at the beginning level [9,7]. Related work (e.g. MiniJava [1]) shows that simplicity is the primary characteristic of these languages.

As mentioned previously, we think that similarities with Java are important, but such languages should not lose their own identity. In MiniJava, for example, System.out.println(), which is the same as in Java, is defined to do the printing, but the meaning of System.out cannot be found in this language. With SimJ we try to address these problems by creating a simple but well-defined language that, syntactically speaking, is not an exact reduced copy of the mother language but has its own identity.

 Program ::= MainClass ( Class )*
 MainClass ::= "class" Identifier "{" "main" "(" ")" "{" Statement "}" "}"
 Class ::= "class" Identifier "{" (Variable)* (Method)* "}"
 Variable ::= Type Identifier ";"
 Method ::= Type Identifier "(" (Type Identifier ("," Type Identifier)*)? ")" "{" (Variable)* (Statement)* "return" Expression ";" "}"
 Type ::= "boolean"
 | "int"
 | "char"
 | "string"
 | "int" "[" "]"
 | Identifier
 Statement ::= "{" ( Statement )* "}"
 | "if" "(" Expression ")" Statement "else" Statement
 | "while" "(" Expression ")" Statement
 | "for" "(" Expression ";" Expression ";" Expression ")" Statement
 | "switch" "(" Expression ")" "{" ("case" Expression ":" Statement "break" ";")* "default" ":" Statement "}"
 | "print" "(" Expression ")" ";"
 | "println" "(" Expression ")" ";"
 | "readLine" "(" ")" ";"
 | "readInt" "(" ")" ";"
 | Identifier "=" Expression ";"
 | Identifier "[" Expression "]" "=" Expression ";"
 Expression ::= Expression ( "||" | "&&" | "<" | ">" | "!=" | "==" | "+" | "-" | "*" | "/" ) Expression
 | Expression "[" Expression "]"
 | Expression "." Identifier "(" (Expression ("," Expression)*)? ")"
 | <INTEGER>
 | <STRING>
 | <CHARACTER>
 | "true"
 | "false"
 | Identifier
 | "this"
 | "new" "int" "[" Expression "]"
 | "new" Identifier "(" ")"
 | "!" Expression
 | "(" Expression ")"
 Identifier ::= <IDENTIFIER>

Figure 3. SimJ language specification

This is an important point that helps reduce possible ambiguities and makes the language more understandable.

SimJ includes the basic building blocks of a programming language. From this point of view it is quite similar to Java [8,7]. We have implemented the basic primitive data types (figure 3):

     •      boolean – true or false
     •      int – integers
     •      char – characters
     •      string – sequence of characters (for simplicity, string is considered a primitive data type in SimJ)
     •      int[] – array of integers

The most commonly used control flow statements [9,8] are implemented in SimJ (figure 3). Their syntax is the same as in Java, since they have no redundant complexity to be removed:

     •      if else
     •      for
     •      while
     •      switch

The principal operators [9,8] are also present in SimJ. These include: addition, subtraction, multiplication, division, logical and, logical or, logical not, less than, greater than, not equal, and equal.

IV. IMPLEMENTATION

For the implementation of SimJ we have used Polyglot as a framework that improves and simplifies compiler design for languages similar to Java. This process consists of creating a new language extension. Extensions (in our case SimJ) usually have the following subpackages [5]:

     •      ext.simj.ast – AST nodes specific to the SimJ language.
     •      ext.simj.extension – New extension and delegate objects specific to SimJ.
     •      ext.simj.types – Type objects and typing judgments specific to SimJ.
     •      ext.simj.visit – Visitors specific to SimJ.
     •      ext.simj.parse – The parser and lexer for the SimJ language.

In addition, our extension defines the class ext.simj.ExtensionInfo [5], which contains the objects that define how the language is to be parsed and type checked. There is also a class ext.simj.Version [5], which specifies the version number of SimJ. The Version class is used as a check when extracting extension-specific type information from .class files.

The design process of SimJ includes the following tasks [5]:

     •      Syntactic differences between SimJ and Java are defined based on the Java grammar found in polyglot/ext/jl/parse/java12.cup.
     •      Any new AST nodes that SimJ requires are defined based on the existing Java nodes found in polyglot.ast (interfaces) and polyglot.ext.jl.ast (implementations).
     •      Semantic differences between SimJ and Java are defined. The Polyglot base compiler (jlc) implements most of the static semantics of Java as defined in the Java Language Specification [7].
     •      Translation from SimJ to Java is defined. The translation produces a legal Java program that can be compiled by javac.

We implement SimJ by creating a Polyglot extension with the characteristics described above. Implementation follows these steps [5]:

     •      build.xml is modified and a target for SimJ is added. This is done based on the skeleton extension found in polyglot/ext/skel. Running the customization script polyglot/ext/newext copies the skeleton to polyglot/ext/simj, and substitutes our language's name at all the appropriate places in the skeleton.
     •      A new parser is implemented using PPG. This is done by modifying
            polyglot/ext/simj/parse/simj.ppg using the SimJ syntax.
     •      The required new AST nodes are implemented. The node factory polyglot/ext/simj/ast/SimJNodeFactory_c.java is modified in order to produce these nodes.
     •      Semantic checking for SimJ is implemented based on its rules.
     •      The translation from SimJ to Java is implemented based on the translation defined above. This is implemented as a visitor pass that rewrites the AST into an AST representing a legal Java program.

V. CONCLUSIONS

Our motivation for creating SimJ was to provide a simple, understandable, and easy-to-learn programming language similar to Java that improves the learning of basic programming structures and serves as an exemplar source language for student compiler projects. We found that the existing approaches did not fully address the need for a simplified, Java-like structured language that is not merely a reduced copy of Java. Our language is simple but improves on existing solutions by merging their advantages and trying to avoid their weak points.

Having used the Polyglot framework to build the compiler, we conclude that it is an effective and easy way to produce compilers for Java-like languages such as SimJ. It is simple and has a well-defined structure, thus offering the possibility to generate a base skeleton for new language extensions to which we can add the desired specifications.

Our language, SimJ, is a well-structured, simplified version of the Java programming language that is not merely a reduced copy of it. SimJ could be used by beginners who want to learn Java but do not know anything about object-oriented programming. It is also a good choice for learning compiler design because of its well-defined and easy-to-implement structure.

REFERENCES

[1]   Appel, A.W., Palsberg, J. (2002). Modern Compiler Implementation in Java (2nd ed.). Cambridge University Press.
[2]   Metsker, S.J. (2001). Building Parsers with Java. Addison Wesley.
[3]   Slonneger, K., Kurtz, B.L. (1995). Formal Syntax and Semantics of Programming Languages, A Laboratory Based Approach. Addison Wesley.
[4]   Nystrom, N., Clarkson, M.R., Myers, A.C. (2003). Polyglot: An Extensible Compiler Framework for Java. Retrieved January 20, 2007, from http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR2002-1883.
[5]   Cornell University, Department of Computer Science. (2003). How to Use Polyglot. Retrieved January 20, 2007, from http://www.cs.cornell.edu/projects/polyglot/.
[6]   Cornell University, Department of Computer Science. (2003). PPG: A Parser Generator for Extensible Grammars. Retrieved January 20, 2007, from http://www.cs.cornell.edu/projects/polyglot/.
[7]   Gosling, J., Joy, B., Steele, G., Bracha, G. (2005). The Java Language Specification (3rd ed.). Addison Wesley.
[8]   Arnold, K., Gosling, J., Holmes, D. (2005). The Java Programming Language (4th ed.). Addison Wesley Professional.
[9]   Kernighan, B.W., Ritchie, D.M. (1988). The C Programming Language (2nd ed.). Prentice Hall.
[10]  Clinger, W., Rees, J. (2001). Report on the Algorithmic Language Scheme. Retrieved January 24, 2007, from http://www-swiss.ai.mit.edu/~jaffer/r4rs_toc.html.
[11]  Krishnamurthi, Sh. (2006). Programming Languages: Application and Interpretation. Retrieved January 28, 2007, from http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/.
[12]  Cornell University, Department of Computer Science. (2003). J0: A Java Extension for Beginning (and Advanced) Programmers. Retrieved January 20, 2007, from http://www.cs.cornell.edu/Projects/j0/.
[13]  Lindholm, T., Yellin, F. (1999). The Java Virtual Machine Specification (2nd ed.). Addison Wesley.
[14]  Jones, J. (2003). Abstract Syntax Tree Implementation Idioms. Retrieved February 6, 2007, from http://jerry.cs.uiuc.edu/~plop/plop2003/Papers/.
[15]  Ambler, S.J. (2006). Introduction to Object-Orientation and UML. Retrieved February 11, 2007, from http://www.agiledata.org/essays/objectOrientation101.html.
[16]  O'Docherty, M. (2005). Object-Oriented Analysis and Design: Understanding System Development with UML 2.0. John Wiley & Sons.
[17]  Graver, J.O. (1992). The Evolution of an Object-Oriented Compiler Framework. Retrieved January 30, 2007, from http://cs.ubc.ca/rr/proceedings/spe91-95/spe/vol22/issue7/spe767jg.pdf.