
Twente Research and Education on Software Engineering,
Department of Computer Science,
Faculty of Electrical Engineering, Mathematics and Computer Science,
University of Twente

Automatic Derivation of Semantic Properties in .NET

M.D.W. van Oudheusden

Enschede, August 28, 2006

Graduation committee:
prof. dr. ir. M. Aksit
dr. ir. L.M.J. Bergmans
ir. P.E.A. Durr

Abstract

The Compose .NET project offers aspect-oriented programming for Microsoft .NET languages through the composition filters model. Most of the information used by Compose (and by other Aspect-Oriented Programming implementations) is based on structural and syntactical properties. For instance, join point selection is largely based on naming conventions and hierarchical structures. Furthermore, the analysis tasks of Compose would benefit from more semantic data, such as the behavior of the reified message used by the meta filters or the side effects generated by the use of aspects. Fine-grained join points, that is, join points at the instruction level, are currently not possible because the information about the inner workings of a function is not present.
The automatic derivation of semantic properties is an attempt to deduce the behavior of a .NET application with the use of existing tools. Not only Compose can benefit from this, but also other applications such as finding design patterns in the source code, reverse engineering design documentation, generating pre- and postconditions, verifying software contracts, checking behavioral subtyping, or any other type of static analysis.
The Semantic Analyzer is the implementation of a static source code analyzer, which converts instructions into semantic actions and stores these actions inside a metamodel, a high-level representation of the original code. In order to do this, we first have to examine the different kinds of semantic programming constructs and how these semantics are represented in the target language, namely the Microsoft Intermediate Language.
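To give a feel for this instruction-to-action conversion, the sketch below maps a few IL opcodes to illustrative action descriptions. The opcode strings are real CIL mnemonics, but the action names are invented for this sketch and are not the actual metamodel types defined by the Semantic Analyzer.

```java
import java.util.HashMap;
import java.util.Map;

public class InstructionMapping {
    // Maps CIL opcodes to illustrative semantic action descriptions.
    // The action names are hypothetical; the thesis' metamodel defines
    // its own action and operand classes.
    public static final Map<String, String> IL_TO_ACTION = new HashMap<String, String>();
    static {
        IL_TO_ACTION.put("add",      "ArithmeticAction(add)");
        IL_TO_ACTION.put("ldfld",    "LoadAction(instance field)");
        IL_TO_ACTION.put("stloc.0",  "StoreAction(local variable 0)");
        IL_TO_ACTION.put("callvirt", "CallAction(virtual method)");
        IL_TO_ACTION.put("br.s",     "BranchAction(unconditional)");
    }

    public static void main(String[] args) {
        System.out.println(IL_TO_ACTION.get("ldfld")); // prints LoadAction(instance field)
    }
}
```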
Different tools are available to read and parse the Intermediate Language; Microsoft Phoenix was selected as the main IL reader and converter, called a Semantic Extractor. The extractors convert an object-oriented language into the language-independent metamodel.
In order to search in this model, a query mechanism based on native queries was developed. Native queries are a type-safe, compile-time checked, and object-oriented way to express queries directly as Java or C# methods.
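As a rough sketch of the idea (not the API used in the thesis), a native query can be mimicked in plain Java with a hypothetical Query interface whose match method is an ordinary, compiler-checked method body:

```java
import java.util.ArrayList;
import java.util.List;

public class NativeQueryDemo {
    // Hypothetical query interface: the query itself is just a typed method.
    public interface Query<T> {
        boolean match(T candidate);
    }

    // Runs the query against a plain in-memory collection.
    public static <T> List<T> select(List<T> source, Query<T> query) {
        List<T> result = new ArrayList<T>();
        for (T item : source) {
            if (query.match(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> methodNames = new ArrayList<String>();
        methodNames.add("getScore");
        methodNames.add("setScore");
        methodNames.add("moveLeft");

        // The predicate is checked by the compiler, not parsed from a string.
        List<String> getters = select(methodNames, new Query<String>() {
            public boolean match(String name) {
                return name.startsWith("get");
            }
        });
        System.out.println(getters); // prints [getScore]
    }
}
```

Real native query engines (db4o, for example) additionally analyze the bytecode of the match method to optimize execution; the point here is only that the query is expressed in the host language itself.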
A number of tools, such as plugins, were created that use native queries on the model to answer some of these semantic information needs. The automatic extraction of the reified message behavior and of the resource usage are some of the functionalities now added to Compose by using the Semantic Analyzer.
Automatically deriving semantic properties by analyzing the source code and the semantics of the source language certainly cannot solve all the problems. Some intent is present not in the code but in the underlying system, so a complete behavioral description of a function cannot be obtained. However, the Semantic Analyzer offers developers an extensive metamodel with basic semantic actions, control flow information, and operand data to reason about the possible intended behavior of the code.

Acknowledgements

Research and development of the Semantic Analyzer, and the actual writing of this thesis, was an interesting but time-consuming process. Of course, there are some people I would like to thank for helping me during this time.
First, my graduation committee, Lodewijk Bergmans and Pascal Durr, for their insights into AOP in general and Compose in particular. Their remarks, suggestions, and questions helped me a great deal in creating this work.
I also thank the other Compose developers for their work on the whole project and for the tips and tricks I received from them while hunting down bugs and trying to get LaTeX to do what I wanted.

Last, but certainly not least, I would like to thank my parents for making this all happen and
always supporting me in my decisions.

Reading Guide

A short guide is presented here to help you in reading this thesis.
The first three chapters are common chapters written by the Compose developers. Chapter 1 presents a general introduction to Aspect-Oriented Software Development and the evolution of programming languages. The next chapter, chapter 2, provides more information about Compose, which is an implementation of the composition filters approach. If you are unfamiliar with either AOSD or the AOP solution Compose, read the first two chapters. Chapter 3 describes the .NET Framework, the language platform used in the implementation. Chapter 6 presents more details about the language, so for background information on the .NET Framework, read chapter 3 first.
The reasons why this assignment was carried out are discussed in the motivation chapter, chapter 4. To learn more about semantics, take a look at chapter 5. How semantic programming constructs are represented in the .NET Intermediate Language, and how this language can be accessed, is described in chapter 6.
The actual design of the Semantic Analyzer is described in chapter 7, and chapter 8 explains how to use the analyzer and provides some practical examples.
Finally, the evaluation and conclusions are presented in chapter 9, as are related and future work.

Code examples in the C# language are created in version 2.0 of the Microsoft .NET Framework unless stated otherwise. The algorithms use a pseudo-C# language for their representation.
Class diagrams were created with the Class Designer of Visual Studio 2005.
More information about Compose and the Semantic Analyzer can be found at the Compose project website1. The source code for the Semantic Analyzer is available in the CVS archives of SourceForge2.



Abstract                                                                                                                                      i

Acknowledgements                                                                                                                            iii

Reading Guide                                                                                                                                v

List of Figures                                                                                                                              xi

List of Tables                                                                                                                             xiii

List of Listings                                                                                                                            xv

List of Algorithms                                                                                                                         xvii

Nomenclature                                                                                                                               xix

1   Introduction to AOSD                                                                                                                     1
    1.1 Introduction . . . . . . . . . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     1
    1.2 Traditional Approach . . . . . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     3
    1.3 AOP Approach . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     4
         1.3.1 AOP Composition . . . . . . . . . . . . . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     5
         1.3.2 Aspect Weaving . . . . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     5
       Source Code Weaving . . . . . . .          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     5
       Intermediate Language Weaving              .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     6
       Adapting the Virtual Machine . .           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     7
    1.4 AOP Solutions . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     7
         1.4.1 AspectJ Approach . . . . . . . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     8
         1.4.2 Hyperspaces Approach . . . . . . . . . . .          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .     9
         1.4.3 Composition Filters . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    10

2   Compose                                                                                                                                 12
    2.1 Evolution of Composition Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                            12

    2.2   Composition Filters in Compose          . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   13
    2.3   Demonstrating Example . . . . . . . . . . . . . . . . .                              .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   15
          2.3.1 Initial Object-Oriented Design . . . . . . . . .                               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   15
          2.3.2 Completing the Pacman Example . . . . . . .                                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   17
        Implementation of Scoring . . . . . .                                 .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   17
        Implementation of Dynamic Strategy                                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   18
    2.4   Compose Architecture . . . . . . . . . . . . . . . . .                               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   18
          2.4.1 Integrated Development Environment . . . . .                                   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
          2.4.2 Compile Time . . . . . . . . . . . . . . . . . . .                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
          2.4.3 Adaptation . . . . . . . . . . . . . . . . . . . .                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
          2.4.4 Runtime . . . . . . . . . . . . . . . . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   21
    2.5   Platforms . . . . . . . . . . . . . . . . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22
          2.5.1 Java . . . . . . . . . . . . . . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22
          2.5.2 C . . . . . . . . . . . . . . . . . . . . . . . . . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22
          2.5.3 .NET . . . . . . . . . . . . . . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22
    2.6   Features Specific to Compose         . . . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   22

3   Introduction to the .NET Framework                                                                                                                         24
    3.1 Introduction . . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   24
    3.2 Architecture of the .NET Framework             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   25
         3.2.1 Version 2.0 of .NET . . . . . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   26
    3.3 Common Language Runtime . . . .                .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   27
         3.3.1 Java VM vs .NET CLR . . . .             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   28
    3.4 Common Language Infrastructure .               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   28
    3.5 Framework Class Library . . . . . .            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   29
    3.6 Common Intermediate Language . .               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   31

4   Motivation                                                                                                                                                 34
    4.1 Current State of Compose /.NET             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   34
        4.1.1 Selecting Match Points . . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   34
        4.1.2 Program Analysis . . . . . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   36
        4.1.3 Fine Grained Join Points . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   37
    4.2 Providing Semantical Information           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   37
    4.3 General Design Goals . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   37

5   Semantics                                                                                                                                                  39
    5.1 What is Semantics . . . . . . . . . . . . . . . .                  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   39
    5.2 Semantics of Software . . . . . . . . . . . . .                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   40
        5.2.1 Static and Dynamic Program Analysis                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   40
        5.2.2 Software Models . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   41
    5.3 Semantical Statements . . . . . . . . . . . . .                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   41
        5.3.1 Value Assignment . . . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   41
        5.3.2 Comparison of Values . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   42
        5.3.3 Branching Statements . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   42
        5.3.4 Method Calling . . . . . . . . . . . . .                     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   43
        5.3.5 Exception Handling . . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   43
        5.3.6 Instantiation . . . . . . . . . . . . . . .                  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   44
        5.3.7 Type Checking . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   44
          5.3.8 Data Conversion       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   45
    5.4   Program Semantics . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   45
          5.4.1 Slicing . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   46
          5.4.2 Method Example        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   46

6   Analyzing the Intermediate Language                                                                                                                                   49
    6.1 Inside the IL . . . . . . . . . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   49
        6.1.1 Modules . . . . . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   49
        6.1.2 Metadata . . . . . . . . . . . . . . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   51
        6.1.3 Assemblies . . . . . . . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   51
        6.1.4 Classes . . . . . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   52
        6.1.5 Fields . . . . . . . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   53
        6.1.6 Methods . . . . . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   53
        6.1.7 Method Body . . . . . . . . . . . .                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   54
        6.1.8 IL Instructions . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   55
       Flow Control . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   56
       Arithmetical Instructions                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   57
       Loading and Storing . . .                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   58
       Method Calling . . . . .                              .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   59
       Exception Handling . . .                              .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   59
        6.1.9 Normalization . . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   60
        6.1.10 Custom Attributes . . . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   61
    6.2 Access the IL . . . . . . . . . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   62
        6.2.1 How to Read IL . . . . . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   62
        6.2.2 Reflection . . . . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   64
        6.2.3 Mono Cecil . . . . . . . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   66
        6.2.4 PostSharp . . . . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   67
        6.2.5 RAIL . . . . . . . . . . . . . . . . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   68
        6.2.6 Microsoft Phoenix . . . . . . . . .                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   68

7   Design and Implementation                                                                                                                                             73
    7.1 General Design . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   73
        7.1.1 Introduction . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   73
        7.1.2 Design Limitations . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   74
        7.1.3 Control Flow . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   76
        7.1.4 Building Blocks . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   77
      Semantic Extractor . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   77
      Semantic Metamodel                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   77
      Semantic Database . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   78
      Plugins . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   78
        7.1.5 Guidelines . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   78
    7.2 Semantic Model . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   79
        7.2.1 Overall Structure . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   79
        7.2.2 From Instructions to Actions .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   80
        7.2.3 Dealing with Operands . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   82
        7.2.4 Type Information . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   83
        7.2.5 Model Layout . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   83
      SemanticContainer .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   83
        SemanticClass . . . . . . . . . . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    84
        SemanticOperation . . . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    84
        SemanticOperand and Subclasses                               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    89
        SemanticBlock . . . . . . . . . . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    90
        SemanticAction . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    90
        SemanticType . . . . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    92
        SemanticAttributes . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    93
          7.2.6 Flow graphs . . . . . . . . . . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    93
    7.3   Extracting Semantics . . . . . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    99
          7.3.1 Semantic Extractor Class . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .    99
          7.3.2 Mono Cecil Provider . . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   100
          7.3.3 PostSharp Provider . . . . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   101
          7.3.4 RAIL Provider . . . . . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   102
          7.3.5 Microsoft Phoenix Provider . . . . . . . . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   102
    7.4   Querying the Model . . . . . . . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   103
          7.4.1 Semantic Database . . . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   105
          7.4.2 What to Retrieve . . . . . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   105
          7.4.3 Query Options . . . . . . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   106
        Predicate Language . . . . . . . .                           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   107
        Resource Description Framework                               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   107
        Traverse Over Methods . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   107
        Object Query Language . . . . . .                            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   108
        Simple Object Database Access .                              .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   108
        Native Queries . . . . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   109
        Natural Language Queries . . . .                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   110
          7.4.4 Native Queries in Detail . . . . . . . . . . .                        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   110

8   Using the Semantic Analyzer                                                                                                                               115
    8.1 Semantic Database . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   115
    8.2 Applications . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   118
    8.3 Plugins . . . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   119
        8.3.1 ReifiedMessage Extraction .          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   119
        8.3.2 Resource Usage . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   122
        8.3.3 Export Types . . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   123
        8.3.4 Natural Language . . . . .          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   123
    8.4 Integration with Compose . . . .          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   123

9   Conclusion, Related, and Future Work                                                                                                                      126
    9.1 Related Work . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   126
        9.1.1 Microsoft Spec# . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   126
        9.1.2 SOUL . . . . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   127
        9.1.3 SetPoints . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   128
        9.1.4 NDepend . . . . . . . . . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   129
        9.1.5 Formal Semantics . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   129
    9.2 Evaluation and Conclusion . . . .         .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   130
    9.3 Future Work . . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   134
        9.3.1 Extractors . . . . . . . . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   134
        9.3.2 Model . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   134
       9.3.3   Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
       9.3.4   Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
       9.3.5   Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

Bibliography                                                                                        135

A CIL Instruction Set                                                                               142

B Evaluation Stack Types                                                                            149

C Semantic Extractor Configuration File                                                              150

D Semantic Actions                                                                                  151

E Semantic Types                                                                                    153

F SEMTEX Generated File                                                                             154
List of Figures

1.1    Dates and ancestry of several important languages . . . . . . . . . . . . . . . . . .                                                      2

2.1    Components of the composition filters model . . . . . . . . . . . . . . . . . . . . .                                                      14
2.2    UML class diagram of the object-oriented Pacman game . . . . . . . . . . . . . .                                                          16
2.3    Overview of the Compose architecture . . . . . . . . . . . . . . . . . . . . . . . .                                                      20

3.1    Context of the .NET framework . . . . . . . . . . . .                 .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   26
3.2    Relationships in the CTS . . . . . . . . . . . . . . . .              .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   29
3.3    Main components of the CLI and their relationships                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   30
3.4    From source code to machine code . . . . . . . . . .                  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   31

6.1    Structure of a managed executable module          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   50
6.2    Assembly containing multiple files . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   51
6.3    Different kinds of methods . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   54
6.4    Method body structure . . . . . . . . . . . .     .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   55
6.5    Graphical User Interface of ILDASM . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   62
6.6    Lutz Roeder’s Reflector . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   63
6.7    Microsoft FxCop . . . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   64
6.8    Platform of Phoenix . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   69
6.9    Control flow graph in Phoenix . . . . . . .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   70

7.1    Process overview . . . . . . . . . . . . . . . . . . . . . . .                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   75
7.2    Control flow of the console application with a plugin . .                          .   .   .   .   .   .   .   .   .   .   .   .   .   .   76
7.3    Loop represented as blocks . . . . . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   81
7.4    Structure of the metamodel . . . . . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   84
7.5    Semantic Item and direct derived classes . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   85
7.6    SemanticUnit and child classes . . . . . . . . . . . . . . .                      .   .   .   .   .   .   .   .   .   .   .   .   .   .   86
7.7    Visitor pattern in the model . . . . . . . . . . . . . . . . .                    .   .   .   .   .   .   .   .   .   .   .   .   .   .   87
7.8    SemanticContainer and SemanticClass classes . . . .                               .   .   .   .   .   .   .   .   .   .   .   .   .   .   87
7.9    SemanticOperation class . . . . . . . . . . . . . . . . . .                       .   .   .   .   .   .   .   .   .   .   .   .   .   .   88
7.10   SemanticOperand class and derived classes . . . . . . . .                         .   .   .   .   .   .   .   .   .   .   .   .   .   .   89
7.11   SemanticBlock class with collection of SemanticAction                             .   .   .   .   .   .   .   .   .   .   .   .   .   .   90

7.12   The SemanticAction class with supporting types            .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 91
7.13   SemanticType class . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 92
7.14   SemanticAttribute class . . . . . . . . . . . . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 93
7.15   The flow classes . . . . . . . . . . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 94
7.16   Semantic Extractor Classes . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 99
7.17   SemanticDatabaseContainer class . . . . . . . .           .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 105
7.18   ExtendedList class . . . . . . . . . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 113

8.1    Windows Forms Semantic Analyzer application . . . . . . . . . . . . . . . . . . . 118
8.2    Plugin interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Tables

5.1   Comparison operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                                               42

6.1   Arithmetical operations in IL . .    .   .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   58
6.2   Bitwise and shift operations in IL    .   .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   59
6.3   Phoenix unit hierarchy . . . . . .    .   .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   69
6.4   Phoenix instruction forms . . . .     .   .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   71

7.1   Assembly naming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                                                              79

A.1 CIL instruction set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

B.1 Evaluation Stack types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

D.1 Available semantic actions kinds . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

E.1 Available semantic common types . . . . . . . . . . . . . . . . . . . . . . . . . . . 153


List of Listings

1.1   Modeling addition, display, and logging without using aspects . . . . .             .   .   .   .   .    3
      (a) Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .    3
      (b) CalcDisplay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .    3
1.2   Modeling addition, display, and logging with aspects . . . . . . . . . . .          .   .   .   .   .    5
      (a) Addition concern . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      .   .   .   .   .    5
      (b) Tracing concern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .    5
1.3   Example of dynamic crosscutting in AspectJ . . . . . . . . . . . . . . . .          .   .   .   .   .    8
1.4   Example of static crosscutting in AspectJ . . . . . . . . . . . . . . . . . . .     .   .   .   .   .    9
1.5   Creation of a hyperspace . . . . . . . . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   10
1.6   Specification of concern mappings . . . . . . . . . . . . . . . . . . . . . .        .   .   .   .   .   10
1.7   Defining a hypermodule . . . . . . . . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   11
2.1   Abstract concern template . . . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   13
2.2   DynamicScoring concern in Compose . . . . . . . . . . . . . . . . . . .             .   .   .   .   .   18
2.3   Implementation of class Score . . . . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   19
2.4   DynamicStrategy concern in Compose            . . . . . . . . . . . . . . . . . .   .   .   .   .   .   20
3.1   Adding example in IL code . . . . . . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   32
3.2   Adding example in the C# language . . . . . . . . . . . . . . . . . . . . .         .   .   .   .   .   33
3.3   Adding example in the VB.NET language . . . . . . . . . . . . . . . . . .           .   .   .   .   .   33
4.1   Getter and Setter examples in C# .NET . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   35
5.1   Assignment examples in C# . . . . . . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   41
5.2   Comparison examples in C# . . . . . . . . . . . . . . . . . . . . . . . . . .       .   .   .   .   .   42
5.3   Exception handling example in C# . . . . . . . . . . . . . . . . . . . . . .        .   .   .   .   .   43
5.4   Method AssignWhenMoreThenOne in C# .NET . . . . . . . . . . . . . .                 .   .   .   .   .   46
5.5   Method AssignWhenMoreThenOne in VB .NET . . . . . . . . . . . . . .                 .   .   .   .   .   46
5.6   Method AssignWhenMoreThenOne in Borland Delphi . . . . . . . . . .                  .   .   .   .   .   47
5.7   Method AssignWhenMoreThenOne in Common Intermediate Language                        .   .   .   .   .   47
6.1   Syntax of a class definition . . . . . . . . . . . . . . . . . . . . . . . . . . .   .   .   .   .   .   52
6.2   Syntax of a field definition . . . . . . . . . . . . . . . . . . . . . . . . . . .    .   .   .   .   .   53
6.3   Example of a field definition . . . . . . . . . . . . . . . . . . . . . . . . . .     .   .   .   .   .   53
6.4   Syntax of a field definition with default value . . . . . . . . . . . . . . . .       .   .   .   .   .   53
6.5   Example of a field definition with default value . . . . . . . . . . . . . . .        .   .   .   .   .   53

6.6    Syntax of a method definition . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    54
6.7    Control flow examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    56
6.8    Constant loading instructions . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .    57
6.9    Condition check followed by a branching instruction . . . . . . . . . . . .           . . . .    58
6.10   Method call example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .    59
6.11   Exception handling in label form . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    59
6.12   Exception handling in label form example . . . . . . . . . . . . . . . . . . .        . . . .    59
6.13   Exception handling in scope form . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    60
6.14   Stack based expression in the Common Intermediate Language . . . . . .                . . . .    60
6.15   Custom attribute syntax in IL . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .    61
6.16   Custom attribute example in IL . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    61
6.17   Example of a custom attribute in C# . . . . . . . . . . . . . . . . . . . . . .       . . . .    61
6.18   Reflection example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .    65
6.19   Cecil get types example . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .    66
6.20   PostSharp get body instruction . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    67
6.21   Phoenix phase execute example . . . . . . . . . . . . . . . . . . . . . . . . .       . . . .    71
7.1    Calling the SemanticExtractor . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .    77
7.2    For loop in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . . .    80
7.3    Expression in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .    81
7.4    Expression in IL code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   . . . .    81
7.5    Semantical representation of the expression . . . . . . . . . . . . . . . . . .       . . . .    81
7.6    Part of the Cecil Instruction Visitor . . . . . . . . . . . . . . . . . . . . . . .   . . . .   100
7.7    Using the instruction stream in PostSharp . . . . . . . . . . . . . . . . . . .       . . . .   101
7.8    Loading the assembly using Phoenix . . . . . . . . . . . . . . . . . . . . . .        . . . .   102
7.9    Starting a phase for a function using the Phoenix Extractor . . . . . . . . .         . . . .   102
7.10   SODA example in C#.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . .        . . . .   108
7.11   LINQ query example in C#.NET . . . . . . . . . . . . . . . . . . . . . . . .          . . . .   109
7.12   LINQ query example in Java . . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .   109
7.13   LINQ query example in C#.NET 1.1 . . . . . . . . . . . . . . . . . . . . . .          . . . .   109
7.14   LINQ query examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       . . . .   111
       (a) query expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       . . . .   111
       (b) lambda expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . .        . . . .   111
7.15   Query function signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . . .   111
7.16   Predicate matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . . .   111
7.17   Return all distinct operations containing actions assigning a value to a              field
       named “name” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . .   114
8.1    Search for all call actions and sort by operation name . . . . . . . . . . . .        . . . .   115
8.2    Find all operations using a field named value as their destination operand             . . . .   116
8.3    Group jump labels by operation name . . . . . . . . . . . . . . . . . . . . .         . . . .   117
8.4    Retrieve all the assignments where an integer is used . . . . . . . . . . . .         . . . .   117
8.5    Find operations using a ReifiedMessage . . . . . . . . . . . . . . . . . . . .         . . . .   120
8.6    Find the argument using a ReifiedMessage . . . . . . . . . . . . . . . . . .           . . . .   120
8.7    Retrieve all the calls to methods of the ReifiedMessage argument . . . . .             . . . .   120
8.8    Retrieve other methods which should be analyzed after a proceed call . .              . . . .   121
9.1    Selecting getters in SOUL using Prolog . . . . . . . . . . . . . . . . . . . . .      . . . .   127
9.2    Examples of CQL queries . . . . . . . . . . . . . . . . . . . . . . . . . . . .       . . . .   129
C.1    Contents of the app.config file . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . .   150
F.1    Part of the SEMTEX file for the pacman example . . . . . . . . . . . . . . .           . . . .   154
List of Algorithms

1   GenerateSemanticControlFlow . . .      .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 95
2   Connect flow edges . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 96
3   Start DetermineControlDependency       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 96
4   DetermineControlDependency . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 96
5   Determine Flow Paths . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 97
6   Determine Access Levels . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 98
7   Optimization of Semantic Blocks . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 104


List of Acronyms

AOP    Aspect-Oriented Programming

API    Application Programming Interface

AST    Abstract Syntax Tree

BLOB   Binary Large Object

CF     Composition Filters

CIL    Common Intermediate Language

CLI    Common Language Infrastructure

CLR    Common Language Runtime

CLS    Common Language Specification

CQL    Code Query Language

CTS    Common Type System

FCL    Framework Class Library

GUI    Graphical User Interface

IL     Intermediate Language

JIT    Just-in-time

JVM    Java Virtual Machine

MLI    Meta Level Interface

OOP      Object-Oriented Programming

OpCode   Operation Code

OQL      Object Query Language

PDA      Personal Digital Assistant

RDF      Resource Description Framework

SDK      Software Development Kit

SOAP     Simple Object Access Protocol

SODA     Simple Object Database Access

SQL      Structured Query Language

UML      Unified Modeling Language

URI      Uniform Resource Identifier

VM       Virtual Machine

WSDL     Web Services Description Language

XML      eXtensible Markup Language
CHAPTER 1

Introduction to AOSD

The first two chapters were originally written by seven M.Sc. students [39, 24, 80, 11, 72,
37, 10] at the University of Twente and have been rewritten for use in the following
theses [79, 16, 75, 42, 23, 41, 71]. They serve as a general introduction to Aspect-Oriented
Software Development and to Compose* in particular.

1.1   Introduction

The goal of software engineering is to solve a problem by implementing a software system. The
things of interest are called concerns. They exist at every level of the engineering process. A re-
current theme in engineering is that of modularization: separation and localization of concerns.
The goal of modularization is to create maintainable and reusable software. A programming
language is used to implement concerns.
Fifteen years ago the dominant programming language paradigm was procedural program-
ming. This paradigm is characterized by the use of statements that update state variables.
Examples are Algol-like languages such as Pascal, C, and Fortran.
Other programming paradigms are the functional, logic, object-oriented, and aspect-oriented
paradigms. Figure 1.1 summarizes the dates and ancestry of several important languages [83].
Every paradigm uses a different modularization mechanism for separating concerns into modules.

Functional languages try to solve problems without resorting to variables. These languages are
entirely based on functions over lists and trees. Lisp and Miranda are examples of functional
languages.
A logic language is based on a subset of mathematical logic. The computer is programmed to
infer relationships between values, rather than to compute output values from input values.
Prolog is currently the most used logic language [83].


[Figure: timeline from 1960 to 2005 showing the dates and influences of aspect-oriented,
object-oriented, procedural and concurrent, functional, and logic languages, including
Fortran, Algol-60, Algol-68, Cobol, Simula, Pascal, Ada, BASIC, Sina, C++, VB, Sina/st,
Java, Hyper/J, C#, and Compose*.]

                     Figure 1.1: Dates and ancestry of several important languages

A shortcoming of procedural programming is that global variables can potentially be accessed
and updated by any part of the program. This can result in unmanageable programs because no
module that accesses a global variable can be understood independently from other modules
that also access that global variable.
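The problem can be made concrete with a small sketch (hypothetical names, written in Java to match the listings later in this chapter): two routines that communicate only through a global variable cannot be understood in isolation.

```java
// Illustrative only: two unrelated routines coupled through one global variable.
public class GlobalStateDemo {
    static int counter = 0; // shared "global" mutable state

    static void reportProgress()  { counter++; }    // module A updates it
    static void resetStatistics() { counter = 0; }  // module B silently undoes A

    public static void main(String[] args) {
        reportProgress();
        reportProgress();
        resetStatistics();           // A's updates are lost without A knowing
        System.out.println(counter); // prints 0, not the 2 module A expects
    }
}
```

Reasoning about `reportProgress` alone is impossible: its observable effect depends on whether `resetStatistics` runs in between.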

The Object-Oriented Programming (OOP) paradigm improves modularity by encapsulating
data with methods inside objects. The data may only be accessed indirectly, by calling the
associated methods. Although the concept appeared in the seventies, it took twenty years to
become popular [83]. Well-known object-oriented languages include C++, Java, and C#.
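The encapsulation described here can be sketched in a few lines (a hypothetical `Counter` class, not from the thesis): the state is private, and every access goes through a method.

```java
// Sketch of encapsulation: the field is private, so client code can only
// reach the data indirectly, through the methods of the object.
public class Counter {
    private int value = 0; // hidden state

    public void increment() { value++; }      // controlled update
    public int getValue()   { return value; } // controlled read
}
```

A client can call `increment()` but cannot assign `value` directly, so the object alone decides how its state changes.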

The hard part about object-oriented design is decomposing a system into objects. The task
is difficult because many factors come into play: encapsulation, granularity, dependency,
adaptability, reusability, and others. They all influence the decomposition, often in conflict-
ing ways [31].

Existing modularization mechanisms typically support only a small set of decompositions and
usually only a single dominant modularization at a time. This is known as the tyranny of the
dominant decomposition [74]. A specific decomposition limits the ability to implement other
concerns in a modular way. For example, OOP modularizes concerns in classes and only fixed
relations are possible. Implementing a concern in a class might prevent another concern from
being implemented as a class.

Aspect-Oriented Programming (AOP) is a paradigm that solves this problem.

AOP is commonly used in combination with OOP but can be applied to other paradigms as
well. The following sections introduce an example to demonstrate the problems that may arise
with OOP and show how AOP can solve this. Finally, we look at three particular AOP method-
ologies in more detail.

2                                                                       Automatic Derivation of Semantic Properties in .NET

 1   public class Add extends Calculation{
 2
 3     private int result;
 4     private CalcDisplay calcDisplay;
 5     private Tracer trace;
 6
 7     Add() {
 8       result = 0;
 9       calcDisplay = new CalcDisplay();
10       trace = new Tracer();
11     }
12
13     public void execute(int a, int b) {
14       trace.write("void Add.execute(int, int)");
15       result = a + b;
16       calcDisplay.update(result);
17     }
18
19     public int getLastResult() {
20       trace.write("int Add.getLastResult()");
21       return result;
22     }
23   }

                                      (a) Addition

 1   public class CalcDisplay {
 2     private Tracer trace;
 3
 4     public CalcDisplay() {
 5       trace = new Tracer();
 6     }
 7
 8     public void update(int value){
 9       trace.write("void CalcDisplay.update(int)");
10       System.out.println("Printing new value of calculation: "+value);
11     }
12   }

                                      (b) CalcDisplay

                   Listing 1.1: Modeling addition, display, and logging without using aspects

1.2   Traditional Approach

Consider an application containing an object Add and an object CalcDisplay. Add inherits from
the abstract class Calculation and implements its method execute(a, b). It performs the
addition of two integers. CalcDisplay receives an update from Add if a calculation is finished
and prints the result to screen. Suppose all method calls need to be traced. The objects use a
Tracer object to write messages about the program execution to screen. This is implemented
by a method called write. Three concerns can be recognized: addition, display, and tracing.
The implementation might look something like Listing 1.1.

From our example, we recognize two forms of crosscutting: code tangling and code scattering.

The addition and display concerns are implemented in the classes Add and CalcDisplay respectively.
Tracing is implemented in the class Tracer, but also contains code in the other two
classes (lines 5, 10, 14, and 20 in (a) and 2, 5, and 9 in (b)). If a concern is implemented across
several classes it is said to be scattered. In the example of Listing 1.1 the tracing concern is
scattered.

Usually a scattered concern involves code replication. That is, the same code is implemented
a number of times. In our example the classes Add and CalcDisplay contain similar tracing
code.

In class Add the code for the addition and tracing concerns is intermixed. In class
CalcDisplay the code for the display and tracing concerns is intermixed. If more than
one concern is implemented in a single class, the concerns are said to be tangled. In our example the
addition and tracing concerns are tangled, as are the display and tracing concerns.
Crosscutting code has the following consequences:
Code is difficult to change
    Changing a scattered concern requires us to modify the code in several places. Making
    modifications to a tangled concern class requires checking for side-effects with all existing
    crosscutting concerns;
Code is harder to reuse
    To reuse an object in another system, it is necessary to either remove the tracing code or
    reuse the (same) tracer object in the new system;
Code is harder to understand
    Tangled code makes it difficult to see which code belongs to which concern.

1.3   AOP Approach

To solve the problems with crosscutting, several techniques are being researched that attempt
to increase the expressiveness of the OO paradigm. Aspect-Oriented Programming (AOP) introduces
a modular structure, the aspect, to capture the location and behavior of crosscutting
concerns. Examples of aspect-oriented languages are Sina, AspectJ, Hyper/J, and Compose*.
A special syntax is used to specify aspects and the way in which they are combined with regular
objects. The fundamental goals of AOP are twofold [34]: first, to provide a mechanism to
express concerns that crosscut other components; second, to use this description to allow for
the separation of concerns.
Join points are well-defined places in the structure or execution flow of a program where ad-
ditional behavior can be attached. The most common join points are method calls. Pointcuts
describe a set of join points. This allows us to execute behavior at many places in a program by
one expression. Advice is the behavior executed at a join point.
In the example of Listing 1.2 the class Add does not contain any tracing code and only imple-
ments the addition concern. Class CalcDisplay also does not contain tracing code. In our
example the tracing aspect contains all the tracing code. The pointcut tracedCalls specifies
at which locations tracing code is executed.
The crosscutting concern is explicitly captured in aspects instead of being embedded within
the code of other objects. This has several advantages over the previous code.
Aspect code can be changed
    Changing aspect code does not influence other concerns;
Aspect code can be reused
    The coupling of aspects is done by defining pointcuts. In theory, this low coupling allows
    for reuse. In practice reuse is still difficult;
Aspect code is easier to understand
    A concern can be understood independent of other concerns;
Aspect pluggability
    Enabling or disabling concerns becomes possible.


 1   public class Add extends Calculation{
 2     private int result;
 3     private CalcDisplay calcDisplay;
 4
 5     Add() {
 6       result = 0;
 7       calcDisplay = new CalcDisplay();
 8     }
 9
10     public void execute(int a, int b) {
11       result = a + b;
12       calcDisplay.update(result);
13     }
14
15     public int getLastResult() {
16       return result;
17     }
18   }

                                    (a) Addition concern

 1   aspect Tracing {
 2     Tracer trace = new Tracer();
 3
 4     pointcut tracedCalls():
 5       call(* (Calculation+).*(..)) ||
 6       call(* CalcDisplay.*(..));
 7
 8     before(): tracedCalls() {
 9       trace.write(thisJoinPoint.getSignature().toString());
10     }
11   }

                                    (b) Tracing concern

                         Listing 1.2: Modeling addition, display, and logging with aspects

1.3.1   AOP Composition

AOP composition can be either symmetric or asymmetric. In the symmetric approach every
component can be composed with any other component. This approach is followed by e.g.
Hyper/J.

In the asymmetric approach, the base program and aspects are distinguished. The base program
is composed with the aspects. This approach is followed by e.g. AspectJ (covered in more
detail in the next section).

1.3.2   Aspect Weaving

The integration of components and aspects is called aspect weaving. There are three approaches
to aspect weaving. The first and second approach rely on adding behavior in the program,
either by weaving the aspect in the source code, or by weaving directly in the target language.
The target language can be intermediate language (IL) or machine code. Examples of IL are Java
byte code and Common Intermediate Language (CIL). The remainder of this chapter considers
only intermediate language targets. The third approach relies on adapting the virtual machine.
Each method is explained briefly in the following sections.

   Source Code Weaving

         The source code weaver combines the original source with aspect code. It interprets the defined
         aspects and combines them with the original source, generating input for the native compiler.
         For the native compiler there is no difference between source code with and without aspects.

         M.D.W. van Oudheusden                                                                                   5
1.3 AOP Approach
The compiler then generates intermediate language or machine code output (depending on
the type of compiler).

The advantages of using source code weaving are:

High-level source modification
     Since all modifications are done at source code level, there is no need to know the target
     (output) language of the native compiler;
Aspect and original source optimization
     First the aspects are woven into the source code, which is then compiled by the native
     compiler. The produced target language has all the benefits of the native compiler's opti-
     mization passes. However, optimizations specific to exploiting aspect knowledge are not
     performed;
Native compiler portability
     The native compiler can be replaced by any other compiler as long as it has the same
     input language. Replacing the compiler with a newer version or another target language
     can be done with little or no modification to the aspect weaver.

However, the drawbacks of source code weaving are:

Language dependency
     Source code weaving is written explicitly for the syntax of the input language;
Limited expressiveness
     Aspects are limited to the expressive power of the source language. For example, when
     using source code weaving, it is not possible to add multiple inheritance to a single in-
     heritance language.

Intermediate Language Weaving

Weaving aspects through an intermediate language gives more control over the executable
program and solves some of the issues identified in the previous section on source code weaving.
Weaving at this level allows for creating combinations of intermediate language constructs
that can not be expressed at the source code level. Although IL can be hard to understand, IL
weaving has several advantages over source code weaving:

Programming language independence
     All compilers generating the target IL output can be used;
More expressiveness
     It is possible to create IL constructs that are not possible in the original programming
     language;
Source code independence
     Can add aspects to programs and libraries without using the source code (which may not
     be available);
Adding aspects at load time or runtime
     A special class loader or runtime environment can decide on and perform dynamic weaving.
     The aspect weaver adds a runtime environment to the program. How and when aspects
     can be added to the program depends on the implementation of the runtime environment.

However, IL weaving also has drawbacks that do not exist for source code weaving:

Hard to understand
     Specific knowledge about the IL is needed;
More error-prone
     Compiler optimizations may cause unexpected results; for example, the compiler can
     remove or inline code that an attached aspect depends on.

Adapting the Virtual Machine

Adapting the virtual machine (VM) removes the need to weave aspects. This technique has the
same advantages as intermediate language weaving and can also overcome some of the disad-
vantages of IL weaving mentioned in the previous section. Aspects can be added without
recompilation, redeployment, and restart of the application [63, 64].

Modifying the virtual machine also has its disadvantages:

Dependency on adapted virtual machines
     Using an adapted virtual machine requires that every target system be upgraded to that
     virtual machine;
Virtual machine optimization
     Much effort has been spent on optimizing virtual machines. Modifying the virtual
     machine means these optimizations have to be revisited, and reintegrating changes
     introduced by newer versions of the original virtual machine might have substantial impact.

1.4       AOP Solutions

As the concept of AOP has been embraced as a useful extension to classic programming, dif-
ferent AOP solutions have been developed. Each solution has one or more implementations to
demonstrate how the solution is to be used. As described by [26] these differ primarily in:

How aspects are specified
     Each technique uses its own aspect language to describe the concerns;
Composition mechanism
     Each technique provides its own composition mechanisms;
Implementation mechanism
     Whether components are determined statically at compile time or dynamically at run
     time, the support for verification of compositions, and the type of weaving.
Use of decoupling
     Whether the writer of the main code must be aware that aspects are applied to the code;
Supported software processes
     The overall development process, techniques for reuse, whether the performance of
     aspects can be analyzed and monitored, and whether aspects can be debugged.

This section will give a short introduction to AspectJ [46] and Hyperspaces [62], which together
with Composition Filters [8] are three main AOP approaches.


aspect DynamicCrosscuttingExample {
  Log log = new Log();

  pointcut traceMethods():
    execution(* edu.utwente.trese.*.*(..));

  before() : traceMethods() {
    log.write("Entering " + thisJoinPoint.getSignature());
  }

  after() : traceMethods() {
    log.write("Exiting " + thisJoinPoint.getSignature());
  }
}
                         Listing 1.3: Example of dynamic crosscutting in AspectJ

     1.4.1   AspectJ Approach

     AspectJ [46] is an aspect-oriented extension to the Java programming language. It is probably
     the most popular approach to AOP at the moment, and it is finding its way into industrial
     software development. AspectJ was developed by Gregor Kiczales at Xerox's PARC (Palo
     Alto Research Center). To encourage the growth of the AspectJ technology and community,
     PARC transferred AspectJ to an open Eclipse project. The popularity of AspectJ comes partly
     from the various extensions based on it, built by several research groups. There are various
     projects porting AspectJ to other languages, resulting in tools such as AspectR and AspectS.

     One of the main goals in the design of AspectJ is to make it a compatible extension to Java.
     AspectJ tries to be compatible in four ways:
     Upward compatibility
           All legal Java programs must be legal AspectJ programs;
     Platform compatibility
           All legal AspectJ programs must run on standard Java virtual machines;
     Tool compatibility
           It must be possible to extend existing tools to support AspectJ in a natural way; this
           includes IDEs, documentation tools and design tools;
     Programmer compatibility
           Programming with AspectJ must feel like a natural extension of programming with Java.
     AspectJ extends Java with support for two kinds of crosscutting functionality. The first allows
     defining additional behavior to run at certain well-defined points in the execution of the pro-
     gram and is called the dynamic crosscutting mechanism. The other is called the static crosscutting
     mechanism and allows modifying the static structure of classes (methods and relationships be-
     tween classes). The units of crosscutting implementation are called aspects. An example of an
     aspect specified in AspectJ is shown in Listing 1.3.
     The points in the execution of a program where the crosscutting behavior is inserted are called
     join points. A pointcut selects a set of join points. In Listing 1.3, traceMethods is an example of
     a pointcut definition. The pointcut includes all executions of any method that is in a class
     contained by the package edu.utwente.trese.
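The way such a pointcut selects join points by signature can be pictured as simple pattern matching over fully qualified names. The following is an illustrative sketch only; `SignatureMatcher` is a hypothetical class and not AspectJ's actual matcher, which is considerably more sophisticated.

```java
public class SignatureMatcher {

    // Translate a pointcut-style pattern into a regular expression:
    // '*' stands for one identifier segment (no dots), '.' is literal.
    public static boolean matches(String pattern, String signature) {
        String regex = pattern
                .replace(".", "\\.")
                .replace("*", "[^.]+");
        return signature.matches(regex);
    }

    public static void main(String[] args) {
        // A method of a class directly inside the package matches:
        System.out.println(matches("edu.utwente.trese.*.*",
                                   "edu.utwente.trese.Ghost.move"));   // true
        // A class from another package does not:
        System.out.println(matches("edu.utwente.trese.*.*",
                                   "java.util.ArrayList.add"));        // false
    }
}
```

Note that each `*` here matches exactly one name segment, which is why only classes directly inside the package are selected.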


aspect StaticCrosscuttingExample {
  private void Log.trace(String traceMsg) {
    Log.write(" --- MARK --- " + traceMsg);
  }
}
                         Listing 1.4: Example of static crosscutting in AspectJ

    The code that should execute at a given join point is declared in an advice. Advice is a method-
    like code body associated with a certain pointcut. AspectJ supports before, after and around
    advice that specifies where the additional code is to be inserted. In the example both before
    and after advice are declared to run at the join points specified by the traceMethods pointcut.

    Aspects can contain anything permitted in class declarations including definitions of pointcuts,
    advice and static crosscutting. For example, static crosscutting allows a programmer to add
    fields and methods to certain classes as shown in Listing 1.4.

    The shown construct is called inter-type member declaration and adds a method trace to class
    Log. Other forms of inter-type declarations allow developers to declare the parents of classes
    (superclasses and realized interfaces), declare where exceptions need to be thrown, and allow
    a developer to define the precedence among aspects.

    With its variety of possibilities AspectJ can be considered a useful approach for realizing soft-
    ware requirements.

    1.4.2   Hyperspaces Approach

     The Hyperspaces approach was developed by H. Ossher and P. Tarr at the IBM T.J. Watson Research
    Center. The Hyperspaces approach adopts the principle of multi-dimensional separation of
    concerns [62], which involves:

       • Multiple, arbitrary dimensions of concerns;
       • Simultaneous separation along these dimensions;
       • Ability to dynamically handle new concerns and new dimensions of concern as they arise
         throughout the software life cycle;
       • Overlapping and interacting concerns. It is appealing to think of many concerns as inde-
         pendent or orthogonal, but they rarely are in practice.

    We explain the Hyperspaces approach by an example written in the Hyper/J language. Hyper/J
    is an implementation of the Hyperspaces approach for Java. It provides the ability to identify
    concerns, specify modules in terms of those concerns, and synthesize systems and components
    by integrating those modules. Hyper/J uses bytecode weaving on binary Java class files and
    generates new class files to be used for execution. Although the Hyper/J project seems aban-
    doned and there has not been any update in the code or documentation for a while, we still
    mention it because the Hyperspaces approach offers a unique AOP solution.

    As a first step, developers create hyperspaces by specifying a set of Java class files that contain
     the code units that populate the hyperspace. To do this, you create a hyperspace specification,
    as demonstrated in Listing 1.5.


Hyperspace Pacman
  class edu.utwente.trese.pacman.*;
                                  Listing 1.5: Creation of a hyperspace

    Hyper/J will automatically create a hyperspace with one dimension—the class file dimension.
    A dimension of concern is a set of concerns that are disjoint. The initial hyperspace will con-
    tain all units within the specified package. To create a new dimension you can specify concern
    mappings, which describe how existing units in the hyperspace relate to concerns in that di-
    mension, as demonstrated in Listing 1.6.

     The first line indicates that, by default, all of the units contained within the package edu.
     utwente.trese.pacman address the kernel concern of the feature dimension. The other map-
     pings specify that any method named trace or debug addresses the logging or debugging
     concern, respectively. These latter mappings override the first one.

    Hypermodules are based on concerns and consist of two parts. The first part specifies a set of
    hyperslices in terms of the concerns identified in the concern matrix. The second part specifies
    the integration relationships between the hyperslices. A hyperspace can contain several hyper-
    modules realizing different modularizations of the same units. Systems can be composed in
    many ways from these hypermodules.

    Listing 1.7 shows a hypermodule with two concerns, kernel and logging. They are related
    by a mergeByName integration relationship. This means that units in the different concerns
    correspond if they have the same name (ByName) and that these corresponding units are to be
    combined (merge). For example, all members of the corresponding classes are brought together
    into the composed class. The hypermodule results in a hyperslice that contains all the classes
    without the debugging feature; thus no debug methods will be present.
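The effect of mergeByName can be sketched in a few lines: units (classes) from different concerns correspond when their names coincide, and their members are combined into one composed unit. The sketch below is purely illustrative; Hyper/J actually performs this composition by weaving Java bytecode, and `MergeByName` is a hypothetical name.

```java
import java.util.*;

public class MergeByName {

    // Each concern maps a unit (class) name to its members (name -> body).
    public static Map<String, Map<String, String>> merge(
            List<Map<String, Map<String, String>>> concerns) {
        Map<String, Map<String, String>> composed = new HashMap<>();
        for (Map<String, Map<String, String>> concern : concerns) {
            for (Map.Entry<String, Map<String, String>> unit : concern.entrySet()) {
                // Units with the same name correspond; their members are combined.
                composed.computeIfAbsent(unit.getKey(), k -> new HashMap<>())
                        .putAll(unit.getValue());
            }
        }
        return composed;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> kernel =
                Map.of("World", Map.of("move", "/* kernel code */"));
        Map<String, Map<String, String>> logging =
                Map.of("World", Map.of("trace", "/* logging code */"));
        // The composed World unit contains members from both concerns.
        System.out.println(merge(List.of(kernel, logging)).get("World").keySet());
    }
}
```

Leaving the debugging concern out of the input list corresponds to the hypermodule above: the composed classes simply contain no debug methods.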

     The most important feature of the hyperspaces approach is the support for on-demand remod-
     ularisation: the ability to extract hyperslices to encapsulate concerns that were not separated
     in the original code. This makes hyperspaces especially useful for the evolution of existing
     software.

    1.4.3   Composition Filters

     Composition Filters was developed by M. Aksit and L. Bergmans at the TRESE group, which is
     a part of the Department of Computer Science of the University of Twente, The Netherlands.
     The composition filters (CF) model predates aspect-oriented programming. It started out as an
     extension to the object-oriented model and evolved into an aspect-oriented model. The current
     implementation of CF is Compose*, which covers .NET, Java, and C.

package edu.utwente.trese.pacman: Feature.Kernel
operation trace: Feature.Logging
operation debug: Feature.Debugging
                            Listing 1.6: Specification of concern mappings


hypermodule Pacman_Without_Debugging
  hyperslices: Feature.Kernel, Feature.Logging;
  relationships: mergeByName;
end hypermodule;
                                  Listing 1.7: Defining a hypermodule

     One of the key elements of CF is the message: a message is an interaction between objects, for
     instance a method call. In object-oriented programming the message is considered an abstract
     concept. In the implementations of CF it is therefore necessary to reify the message. This reified
     message contains properties, such as where it is sent to and where it came from.
     The concept of CF is that messages that enter and exit an object can be intercepted and manip-
     ulated, modifying the original flow of the message. To do so, a layer called the interface part is
     introduced in the CF model; this layer can have several properties. The interface part can be
     placed on an object whose behavior needs to be altered; this object is referred to as inner.
     There are three key elements in CF: messages, filters, and superimposition. Messages are sent
     from one object to another; if there is an interface part placed on the receiver, the message
     that is sent goes through the input filters. In the filters the message can be manipulated before
     it reaches the inner part; the message can even be sent to another object. How the message
     is handled depends on the filter type. An output filter is similar to an input filter; the only
     difference is that it manipulates messages that originate from the inner part. The latest addition
     to CF is superimposition, which is used to specify which interfaces need to be superimposed
     on which inner objects.

Chapter 2

Compose* is an implementation of the composition filters approach. There are three target
environments: .NET, Java, and C. This chapter is organized as follows: first, the evolution
of Composition Filters and its implementations is described, followed by an explanation of
the Compose* language and a demonstrating example. In the third section, the Compose*
architecture is explained, followed by a description of the features specific to Compose*/.NET.

2.1    Evolution of Composition Filters

Compose* is the result of many years of research and experimentation. The following time
line gives an overview of what has been done in the years before and during the Compose*
project.
1985     The first version of Sina is developed by Mehmet Aksit. This version of Sina contains a
         preliminary version of the composition filters concept called semantic networks. The
         semantic network construction serves as an extension to objects, such as classes, mes-
         sages, or instances. These objects can be configured to form other objects such as
         classes from which instances can be created. The object manager takes care of syn-
         chronization and message processing of an object. The semantic network construction
         can express key concepts like delegation, reflection, and synchronization [47].
1987     Together with Anand Tripathi of the University of Minnesota the Sina language is
         further developed. The semantic network approach is replaced by declarative specifi-
         cations and the interface predicate construct is added.
1991     The interface predicates are replaced by the dispatch filter, and the wait filter manages
         the synchronization functions of the object manager. Message reflection and real-time
         specifications are handled by the meta filter and the real-time filter [7].
1995     The Sina language with Composition Filters is implemented using Smalltalk [47]. The
         implementation supports most of the filter types. In the same year, a preprocessor
         providing C++ with support for Composition Filters is implemented [33].
1999     The composition filters language ComposeJ [85] is developed and implemented. The
         implementation consists of a preprocessor capable of translating composition filter
         specifications into the Java language.
2001     ConcernJ is implemented as part of a M.Sc. thesis [70]. ConcernJ adds the notion of
         superimposition to Composition Filters. This allows for reuse of the filter modules
         and facilitates crosscutting concerns.
2003     The start of the Compose* project; the project is described in further detail in this
         thesis.
2004     The first release of Compose*, based on .NET.
2005     The start of the Java port of Compose*.
2006     Porting Compose* to C is started.

concern {
  filtermodule {
    internals
    externals
    conditions
    inputfilters
    outputfilters
  }

  superimposition {
    selectors
    filtermodules
    annotations
    constraints
  }

  implementation
}
                                  Listing 2.1: Abstract concern template

     2.2    Composition Filters in Compose*

     A Compose* application consists of concerns that can be divided into three parts: filter module
     specification, superimposition, and implementation. A filter module contains the filter logic
     used to filter the messages entering or leaving the superimposed object. A message has
     a target, which is an object reference, and a selector, which is a method name. The superim-
     position part specifies which filter modules, annotations, conditions, and methods need to be
     superimposed on which objects. The implementation part contains the class implementation
     of the concern. How these parts are placed in a concern is shown in Listing 2.1.
     The working of the filter module is shown in Figure 2.1. A filter module can contain input and
     output filters. The difference between these two sets of filters is that the first is used to filter
     incoming messages, while the second is used to filter outgoing messages. A return of a
     method is not considered an outgoing message. A filter has three parts: the filter identifier,
     the filter type, and one or more filter elements. A filter element consists of an optional
     condition part, a matching part, and a substitution part. These parts are shown below:


                   Figure 2.1: Components of the composition filters model

                   A filter has the general form

                       identifier : filter type = { condition part =>
                                                    [ matching part ] substitution part }

                   as in the following example:

                       stalker_filter : Dispatch = { !pacmanIsEvil =>
                                                     [*.getNextMove] stalk_strategy.getNextMove }

The filter identifier is the unique name for a filter in a filter module. A filter matches when
both the condition part and the matching part evaluate to true. The demonstrated filter
matches every message where the selector is getNextMove; the '*' in the target means that
every target matches. When the condition part and the matching part are true, the message
is substituted with the values of the substitution part. How these values are substituted and
how the message continues depends on the filter type. At the moment there are four basic filter
types in Compose*; it is also possible to write custom filter types:

Dispatch
      If the message is accepted, it is dispatched to the specified target of the message; other-
      wise the message continues to the subsequent filter. This filter type can only be used for
      input filters;
Send
      If the message is accepted, it is sent to the specified target of the message; otherwise the
      message continues to the subsequent filter. This filter type can only be used for output
      filters;
Error
      If the filter rejects the message, it raises an exception; otherwise the message continues to
      the next filter in the set;
Meta
      If the message is accepted, the message is sent as a parameter of another meta message to
      an internal or external object; otherwise the message just continues to the next filter. The
      object that receives the meta message can observe and manipulate the message and can
      re-activate the execution of the message.
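The accept-or-continue behavior that these filter types share can be pictured as a chain that each message traverses. The Java below is an illustrative stand-in, not Compose*'s runtime; `FilterChain`, `Message`, and `dispatch` are hypothetical names, and only the dispatch-style accept behavior is modeled.

```java
import java.util.*;
import java.util.function.*;

public class FilterChain {

    // A message has a target and a selector (method name).
    public record Message(String target, String selector) {}

    // A filter either accepts a message, yielding the name of the method the
    // message is dispatched to, or passes it on (empty result).
    public interface Filter extends Function<Message, Optional<String>> {}

    // A dispatch-style filter: accepts when the selector matches.
    public static Filter dispatch(String selector, String dispatchTo) {
        return m -> m.selector().equals(selector)
                ? Optional.of(dispatchTo)
                : Optional.empty();
    }

    // Send the message through the input filters; if no filter accepts it,
    // it reaches the inner object unchanged.
    public static String send(Message m, List<Filter> inputFilters) {
        for (Filter f : inputFilters) {
            Optional<String> result = f.apply(m);
            if (result.isPresent()) return result.get();
        }
        return "inner." + m.selector();
    }

    public static void main(String[] args) {
        List<Filter> filters = List.of(
                dispatch("getNextMove", "StalkerStrategy.getNextMove"));
        System.out.println(send(new Message("*", "getNextMove"), filters));
        System.out.println(send(new Message("*", "eatFood"), filters));
    }
}
```

A matching message is redirected by the first filter that accepts it; any other message falls through the whole chain and reaches the inner object.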

The pacmanIsEvil condition used in the condition part must be declared in the conditions section of
a filter module. The targets that are used in a filter must be declared as internals or externals.
Internals are objects that are unique for each instance of a filter module, while externals are
shared between filter modules.
Filter modules can be superimposed on classes with a filter module binding; this binding
has a selection of objects on one side and a filter module on the other side. The selection is de-
fined with a selector definition. The selector uses predicates, such as isClassWithNameInList,
isNamespaceWithName, and namespaceHasClass, to select objects. It is also possible to bind
conditions, methods, and annotations to classes with the use of superimposition.
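What a predicate such as isClassWithNameInList computes can be sketched in a few lines, assuming classes are represented by their fully qualified names. This is a hypothetical illustration, not Compose*'s selector engine.

```java
import java.util.*;

public class Selectors {

    // Select, from all classes in the program, those whose fully qualified
    // name appears in the given list.
    public static List<String> isClassWithNameInList(List<String> allClasses,
                                                     List<String> names) {
        List<String> selected = new ArrayList<>();
        for (String c : allClasses) {
            if (names.contains(c)) selected.add(c);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> all = List.of("pacman.World", "pacman.Game",
                                   "pacman.Ghost", "pacman.Main");
        // Selects exactly the classes named in the list, as in Listing 2.2.
        System.out.println(isClassWithNameInList(all,
                List.of("pacman.World", "pacman.Game", "pacman.Main")));
    }
}
```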
The last part of the concern is the implementation part. In the implementation part we can
define the object behavior of the concern; for example, in a logging concern we can define the
specific log functions.

2.3       Demonstrating Example

To illustrate the Compose* toolset, this section introduces a Pacman example. The Pacman
game is a classic arcade game in which the user, represented by pacman, moves in a maze to
eat vitamins. Meanwhile, a number of ghosts try to catch and eat pacman. There are, however,
four mega vitamins in the maze that make pacman evil. In its evil state, pacman can eat ghosts.
A simple list of requirements for the Pacman game is briefly discussed here:
      •   A life should be taken from pacman when he is eaten by a ghost;
      •   A game should end when pacman has no more lives;
      •   The score of a game should increase when pacman eats a vitamin or a ghost;
      •   A user should be able to use a keyboard to move pacman around the maze;
      •   Ghosts should know whether pacman is evil or not;
      •   Ghosts should know where pacman is located;
      •   Ghosts should, depending on the state of pacman, hunt or flee from pacman.

2.3.1      Initial Object-Oriented Design

Figure 2.2 shows an initial object-oriented design for the Pacman game. Note that this UML
class diagram does not show the trivial accessors. The classes in this diagram are:
Game
     This class encapsulates the control flow and controls the state of a game;
Ghost
     This class is a representation of a ghost chasing pacman. Its main attribute is a property
     that indicates whether it is scared or not (depending on the evil state of pacman);
GhostView
     This class is responsible for painting ghosts;
Glyph
     This is the superclass of all mobile objects (pacman and ghosts). It contains common
     information like direction and speed;
Keyboard
     This class accepts all keyboard input and makes it available to pacman;


            Figure 2.2: UML class diagram of the object-oriented Pacman game

Main
     This is the entry point of a game;
Pacman
     This is a representation of the user-controlled element in the game. Its main attribute is a
     property that indicates whether pacman is evil or not;
PacmanView
     This class is responsible for painting pacman;
RandomStrategy
     By using this strategy, ghosts move in random directions;
View
     This class is responsible for painting a maze;
World
     This class has all the information about a maze. It knows where the vitamins, mega
     vitamins and, most importantly, the walls are. Every class derived from class Glyph checks
     whether movement in the desired direction is possible.

2.3.2     Completing the Pacman Example

The initial object-oriented design, described in the previous section, does not implement all the
stated system requirements. The missing requirements are:
    • The application does not maintain a score for the user;
    • Ghosts move in random directions instead of chasing or fleeing from pacman.
In the next sections, we describe why and how to implement these requirements in the
Compose* language.

Implementation of Scoring

The first system requirement that we need to add to the existing Pacman game is scoring. This
concern involves a number of events. First, the score should be set to zero when a game starts.
Second, the score should be updated whenever pacman eats a vitamin, mega vitamin or ghost.
And finally, the score itself has to be painted on the maze canvas to relay it back to the user.
These events are scattered over multiple classes: Game (initializing the score), World (updating
the score), and Main (painting the score). Thus scoring is an example of a crosscutting concern.

To implement scoring in the Compose* language, we divide the implementation into two parts.
The first part is a Compose* concern definition stating which filter modules to superimpose.
Listing 2.2 shows an example Compose* concern definition for scoring.
This concern definition is called DynamicScoring (line 1) and contains two parts. The first part
is the declaration of a filter module called dynamicscoring (lines 2–11). This filter module
contains one meta filter called score_filter (line 6). This filter intercepts five relevant calls
and sends the message in a reified form to an instance of class Score. The final part of the
concern definition is the superimposition part (lines 12–18). This part defines that the filter
module dynamicscoring is to be superimposed on the classes World, Game and Main.
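At run time, the effect of the score_filter meta filter can be pictured as a routing table from matched selectors to handlers on the Score instance, mirroring Listing 2.2. The sketch below is a hypothetical stand-in; `ScoreRouter` is not part of Compose*, which delivers the message in reified form rather than routing by name.

```java
import java.util.*;

public class ScoreRouter {

    // Selector -> handler table, mirroring the meta filter in Listing 2.2.
    private static final Map<String, String> TABLE = Map.of(
            "eatFood",       "score.eatFood",
            "eatGhost",      "score.eatGhost",
            "eatVitamin",    "score.eatVitamin",
            "gameInit",      "score.initScore",
            "setForeground", "score.setupLabel");

    // Returns the handler that receives the reified message, or empty when
    // the message does not match the filter and simply continues.
    public static Optional<String> route(String selector) {
        return Optional.ofNullable(TABLE.get(selector));
    }

    public static void main(String[] args) {
        System.out.println(route("eatGhost"));  // Optional[score.eatGhost]
        System.out.println(route("move"));      // Optional.empty
    }
}
```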
The final part of the scoring concern is the so-called implementation part. This part is defined by
a class Score. Listing 2.3 shows an example implementation of class Score. Instances of this


 1   concern DynamicScoring in pacman {
 2     filtermodule dynamicscoring {
 3       externals
 4         score : pacman.Score = pacman.Score.instance();
 5       inputfilters
 6         score_filter : Meta = {[*.eatFood] score.eatFood,
 7                                [*.eatGhost] score.eatGhost,
 8                                [*.eatVitamin] score.eatVitamin,
 9                                [*.gameInit] score.initScore,
10                                [*.setForeground] score.setupLabel}
11     }
12     superimposition {
13       selectors
14         scoring = { C | isClassWithNameInList(C, [’pacman.World’,
15                                    ’pacman.Game’, ’pacman.Main’]) };
16       filtermodules
17         scoring <- dynamicscoring;
18     }
19   }
                              Listing 2.2: DynamicScoring concern in Compose*

     class receive the messages sent by score_filter and subsequently perform the events related
     to the scoring concern. In this way, all scoring events are encapsulated in one class and one
     Compose* concern definition.

     Implementation of Dynamic Strategy

     The last system requirement that we need to implement is the dynamic strategy of ghosts. This
     means that a ghost should, depending on the state of pacman, hunt or flee from pacman. We
     can implement this concern by using the strategy design pattern. However, in this way, we
     need to modify the existing code. This is not the case when we use Compose* dispatch filters.
     Listing 2.4 demonstrates this.

     This concern uses dispatch filters to intercept calls to method RandomStrategy.getNextMove
     and redirect them to either StalkerStrategy.getNextMove or FleeStrategy.getNextMove.
     If pacman is not evil, the intercepted call matches the first filter, which dispatches the inter-
     cepted call to method StalkerStrategy.getNextMove (line 9). Otherwise, the intercepted
     call matches the second filter, which dispatches the intercepted call to method FleeStrategy.
     getNextMove (line 11).

     2.4       Compose* Architecture

     An overview of the Compose* architecture is illustrated in Figure 2.3. The Compose*
     architecture can be divided into four layers [60]: IDE, compile time, adaptation, and runtime.

     18                                              Automatic Derivation of Semantic Properties in .NET

    import Composestar.Runtime.FLIRT.message.*;
    import java.awt.*;

    public class Score
    {
      private int score = -100;
      private static Score theScore = null;
      private Label label = new java.awt.Label("Score: 0");

      private Score() {}

      public static Score instance() {
        if(theScore == null) {
          theScore = new Score();
        }
        return theScore;
      }

      public void initScore(ReifiedMessage rm) {
        this.score = 0;
        label.setText("Score: "+score);
      }

      public void eatGhost(ReifiedMessage rm) {
        score += 25;
        label.setText("Score: "+score);
      }

      public void eatVitamin(ReifiedMessage rm) {
        score += 15;
        label.setText("Score: "+score);
      }

      public void eatFood(ReifiedMessage rm) {
        score += 5;
        label.setText("Score: "+score);
      }

      public void setupLabel(ReifiedMessage rm) {
        rm.proceed();
        label = new Label("Score: 0");
        label.setSize(15*View.BLOCKSIZE+20,15*View.BLOCKSIZE);
        Main main = (Main)Composestar.Runtime.FLIRT.message.MessageInfo
                              .getMessageInfo().getTarget();
        main.add(label,BorderLayout.SOUTH);
      }
    }
                              Listing 2.3: Implementation of class Score

     M.D.W. van Oudheusden                                                                  19

 1   concern DynamicStrategy in pacman {
 2     filtermodule dynamicstrategy {
 3       internals
 4         stalk_strategy : pacman.Strategies.StalkerStrategy;
 5         flee_strategy : pacman.Strategies.FleeStrategy;
 6       conditions
 7         pacmanIsEvil : pacman.Pacman.isEvil();
 8       inputfilters
 9         stalker_filter : Dispatch = {!pacmanIsEvil =>
10                           [*.getNextMove] stalk_strategy.getNextMove};
11         flee_filter : Dispatch = {
12                           [*.getNextMove] flee_strategy.getNextMove}
13     }
14     superimposition {
15       selectors
16         random = { C | isClassWithName(C,
17                           ’pacman.Strategies.RandomStrategy’) };
18       filtermodules
19         random <- dynamicstrategy;
20     }
21   }
                       Listing 2.4: DynamicStrategy concern in Compose*

                        Figure 2.3: Overview of the Compose* architecture

2.4.1   Integrated Development Environment

The Integrated Development Environment (IDE) layer interfaces with the native IDE and creates
a build configuration. The build configuration specifies which source files and settings are
required to build a Compose* application. After the build configuration has been created, the
compile time is started.
A build configuration can be created manually or by using a plug-in. Examples of these plug-ins
are the Visual Studio add-in for Compose*/.NET and the Eclipse plug-ins for Compose*/J and
Compose*/C.

2.4.2   Compile Time

The compile time layer is platform independent and reasons about the correctness of the
composition filter implementation with respect to the program, which allows the target program
to be built by the adaptation layer.
The compile time ‘pre-processes’ the composition filter specifications by parsing the
specification, resolving the references, and checking its consistency. To provide an extensible
architecture that facilitates this process, a blackboard architecture was chosen. This means that
the compile time uses a general knowledge base called the ‘repository’. This knowledge base
contains the structure and metadata of the program, on which the different modules execute
their activities. Examples of modules within analysis and validation are SANE, LOLA, and
FILTH. These three modules are responsible for part of the analysis and validation of the
superimposition and its selectors.

2.4.3   Adaptation

The adaptation layer consists of the program manipulation, the harvester, and the code
generator. These components connect the platform independent compile time to the target
platform. The harvester is responsible for gathering the structure and the annotations within the
source program and adding this information to the knowledge base. The code generator
generates a reduced copy of the knowledge base and the weaving specification. This weaving
specification is then used by the weaver, contained in the program manipulation, to weave the
calls to the runtime into the target program. The end result of the adaptation is the target
program, which interfaces with the runtime.

2.4.4   Runtime

The runtime layer is responsible for executing the concern code at the join points. It is
activated at the join points by function calls that are woven in by the weaver. A reduced copy of
the knowledge base, containing the necessary information for filter evaluation and execution, is
enclosed with the runtime. When a filtered function is called, the filters are evaluated.
Depending on whether the condition part evaluates to true and the matching part matches, the
accept or reject behavior of the filter is executed. The runtime also facilitates the debugging of
the composition filter implementations.

2.5     Platforms

Compose* can, in theory, be applied to any programming language, provided certain
assumptions are met. Currently, Compose* has three platforms.

2.5.1   Java

Compose*/J, the Java platform of Compose*, uses different compiling and weaving tools than
the other platforms. For the use of Compose*/J an Eclipse plug-in is provided.

2.5.2   C

Compose*/C, the C platform of Compose*, differs from its Java and .NET counterparts
because it does not have a runtime interpreter. This implies that the filter implementation of
Compose*/C uses generated composition filter code that is woven directly into the source code.
Because the programming language C does not have the concept of objects, the reasoning
within Compose* is based on sets of functions. Like the Java platform, Compose*/C provides
a plug-in for Eclipse.

2.5.3   .NET

The .NET platform, called Compose*/.NET, is the oldest implementation of Compose*.
Because Compose*/.NET works with CIL code, it is programming language independent, as
long as the programming language can be compiled to CIL code. The .NET platform uses a
Visual Studio add-in for ease of development.

2.6     Features Specific to Compose*

The Composition Filters approach uses a restricted (pattern matching) language to define
filters. This language makes it possible to reason about the semantics of the concern. Compose*
offers three features that use this possibility, all aimed at more control over, and correctness of,
an application under construction. These features are:
Ordering of filter modules
      It is possible to specify how the superimposition of filter modules should be ordered.
      Ordering constraints can be specified in a fixed, conditional, or partial manner. A fixed
      ordering can be calculated exactly, whereas a conditional ordering depends on the result
      of filter execution and is therefore evaluated at runtime. When there are multiple valid
      orderings of filter modules on a join point, partial ordering constraints can be applied to
      reduce this number. These constraints can be declared in the concern definition;
Filter consistency checking
      When superimposition is applied, Compose* is able to detect if the ordering and
      conjunction of filters creates a conflict. For example, imagine a set of filters where the first
      filter only evaluates method m and another filter only evaluates methods a and b. In this
     case the latter filter is only reached with method m; this is consequently rejected and as a
     result the superimposition may never be executed. There are different scenarios that lead
     to these kinds of problems, e.g., conditions that exclude each other;
Reason about semantic problems
      When multiple pieces of advice are added to the same join point, Compose* can reason
      about problems that may occur. An example of such a conflict is the situation where a
      real-time filter is followed by a wait filter. Because the wait filter can wait indefinitely, the
      real-time property imposed by the real-time filter may be violated.
The above mentioned conflict analyzers all work on the assumption that the behavior of every
filter is well-defined. This is not the case for the meta filter: its user-defined, and therefore
unpredictable, behavior poses a problem to the analysis tools.
Furthermore, Compose* is extended with features that enhance its usability. These features
are briefly described below:
Integrated Development Environment support
      The Compose* implementations all have an IDE plug-in: Compose*/.NET for Visual
      Studio, Compose*/J and Compose*/C for Eclipse;
Debugging support
      The debugger shows the flow of messages through the filters. It is possible to place
      breakpoints to view the state of the filters;
Incremental building process
      When a project is built and not all the modules have changed, incremental building saves
      compilation time by rebuilding only the changed parts.
Some language properties of Compose* can also be seen as features:
Language independent concerns
      A Compose* concern can be used for all the Compose* platforms, because the
      composition filters approach is language independent;
Reusable concerns
      The concerns are easy to reuse, through the dynamic filter modules and the selector
      language;
Expressive selector language
      Program elements of an implementation language can be used to select a set of objects to
      superimpose on;
Support for annotations
      Using the selector, annotations can be woven at program elements. At the moment,
      annotations can be used for superimposition.

                                                                            CHAPTER         3

                                               Introduction to the .NET Framework

This chapter gives an introduction to the .NET Framework of Microsoft. First, the architecture
of the .NET Framework is introduced. This section includes terms like the Common Language
Runtime, the .NET Class Library, the Common Language Infrastructure and the Intermediate
Language. These are discussed in more detail in the sections following the architecture.

3.1   Introduction

Microsoft defines .NET as follows [57]: “.NET is the Microsoft Web services strategy to
connect information, people, systems, and devices through software.” There are different .NET
technologies in various Microsoft products providing the capabilities to create solutions using
web services. Web services are small, reusable applications that help computers from many
different operating system platforms work together by exchanging messages. Based on indus-
try standards like XML (Extensible Markup Language), SOAP (Simple Object Access Protocol),
and WSDL (Web Services Description Language) they provide a platform and language inde-
pendent way to communicate.
Microsoft products, such as Windows Server System (providing web services) or Office
System (using web services), are some of the .NET technologies. The technology described in this
chapter is the .NET Framework. Together with Visual Studio, an integrated development envi-
ronment, they provide the developer tools to create programs for .NET.
Many companies are largely dependent on the .NET Framework, but need or want to use AOP.
Currently, there is no direct support for this in the Framework. The Compose*/.NET project
is addressing these needs with its implementation of the Composition Filters approach for the
.NET Framework.
This specific Compose* version for .NET has two main goals. First, it combines the .NET
Framework with AOP through Composition Filters. Second, Compose* offers superimposition

in a language independent manner. The .NET Framework supports multiple languages and is,
as such, suitable for this purpose. Composition Filters are an extension of the object-oriented
mechanism as offered by .NET, hence the implementation is not restricted to any specific object-
oriented language.

3.2   Architecture of the .NET Framework

The .NET Framework is Microsoft’s platform for building, deploying, and running Web
Services and applications. It was designed from scratch and has a consistent API providing
support for component-based programs and Internet programming. This new Application
Programming Interface (API) has become an integral component of Windows. The .NET
Framework was designed to fulfill the following objectives [54]:
      • Allow object code to be stored and executed locally, executed locally but
        Internet-distributed, or executed remotely, and make the developer experience consistent
        across a wide variety of types of applications, such as Windows-based applications and
        Web-based applications;
      • Enhance the ease of operation by minimizing versioning conflicts and providing better
        software deployment support;
      • Execute all code safely, including code created by an unknown or semi-trusted third
        party;
      • Compile applications to machine code before running, thus eliminating the performance
        problems of scripted or interpreted environments;
      • Integrate code based on the .NET Framework with other code, because all
        communication is built on industry standards.
The .NET Framework consists of two main components [54]: the Common Language Run-
time (CLR, simply called the .NET Runtime or Runtime for short) and the .NET Framework
Class Library (FCL). The CLR is the foundation of the .NET Framework, executing the code
and providing the core services such as memory management, thread management and ex-
ception handling. The CLR is described in more detail in Section 3.3. The class library, the
other main component of the .NET Framework, is a comprehensive, object-oriented collection
of reusable types that can be used to develop applications ranging from traditional command-
line or graphical user interface (GUI) applications to applications such as Web Forms and XML
Web services. Section 3.5 describes the class libraries in more detail.
The code run by the runtime is in a format called Common Intermediate Language (CIL), fur-
ther explained in Section 3.6. The Common Language Infrastructure (CLI) is an open specifi-
cation that describes the executable code and runtime environment that form the core of the
Microsoft .NET Framework. Section 3.4 tells more about this specification.
Figure 3.1 shows the relationship of the .NET Framework to other applications and to the
complete system.

                  Figure 3.1: Context of the .NET Framework (Modified) [54]

The two parts, the class library and the runtime, are managed, i.e., they are managed during
execution. The operating system is at the core; managed and unmanaged applications operate
on the hardware. The runtime can use other object libraries and the class library, but those
other libraries can themselves use the same class library.
Besides the Framework, Microsoft also provides a developer tool called Visual Studio. This
is an IDE with functionality across a wide range of areas, allowing developers to build
applications in less development time than with command line compilers.

3.2.1   Version 2.0 of .NET

In November 2005, Microsoft released a successor to the .NET Framework. Major changes are
the support for generics, the addition of nullable types, 64-bit support, improvements in the
garbage collector, new security features, and more network functionality.
Generics make it possible to declare and define classes, structures, interfaces, methods, and
delegates with unspecified or generic type parameters instead of specific types. When the
generic is used, the actual type is specified. This allows for type-safety at compile time. Without
generics, the use of casting or boxing and unboxing decreases performance. By using a generic
type, the risks and costs of these operations are reduced.
Nullable types allow a value type to have a normal value or a null value. This null value can
be useful for indicating that a variable has no defined value because the information is not
currently available.
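As a sketch, the two additions look as follows in C# 2.0 syntax; the class and variable names
below are illustrative, not taken from the thesis or from Compose*:

```csharp
using System;
using System.Collections.Generic;

class Version2Features
{
    static void Main()
    {
        // Generic collection: the element type is a parameter of the
        // List<T> type, so adding an int needs no boxing and reading
        // an element back needs no cast.
        List<int> scores = new List<int>();
        scores.Add(25);
        scores.Add(15);
        // scores.Add("ghost");   // rejected at compile time

        // Nullable value type: an int that can also hold 'no value'.
        int? bonus = null;         // information not yet available
        if (!bonus.HasValue)
            bonus = 5;             // a normal value assigned later

        Console.WriteLine(scores[0] + scores[1] + bonus.Value);   // prints 45
    }
}
```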
Besides changes in the Framework, there are also improvements in the four main Microsoft
.NET programming languages (C#, VB.NET, J#, and C++). The language elements are now
almost equal across all languages. For instance, additions to the Visual Basic language are the
support for unsigned values and new operators, and additions to the C# language include the
ability to define anonymous methods, thus eliminating the need to create a separate method.
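The anonymous method addition can be sketched as follows; the delegate type and the names
are illustrative:

```csharp
using System;

class AnonymousMethodSketch
{
    // A delegate type for a scoring rule (illustrative name).
    delegate int ScoreRule(int points);

    static void Main()
    {
        // Before C# 2.0 the delegate body had to be a separate named
        // method; an anonymous method lets it be written inline.
        ScoreRule doubled = delegate(int points) { return points * 2; };

        Console.WriteLine(doubled(25));   // prints 50
    }
}
```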

A new Visual Studio 2005 edition was released to support the new Framework and to provide
the functionality to create the various types of applications.

3.3      Common Language Runtime

The Common Language Runtime executes code and provides core services. These core services
are memory management, thread execution, code safety verification and compilation. Apart
from providing services, the CLR also enforces code access security and code robustness. Code
access security is enforced by providing varying degrees of trust to components, based on a
number of factors, e.g., the origin of a component. This way, a managed component might
or might not be able to perform sensitive functions, like file-access or registry-access. By im-
plementing a strict type-and-code-verification infrastructure, called the Common Type System
(CTS), the CLR enforces code robustness. Basically, there are two types of code:

    • Managed code has its memory handled and its types validated at execution by the CLR.
      It has to conform to the Common Type System (CTS, Section 3.4). If interoperability
      with components written in other languages is required, managed code has to conform to
      an even stricter set of specifications, the Common Language Specification (CLS). The
      code is run by the CLR and is typically stored in an intermediate language format. This
      platform independent intermediate language is officially known as the Common
      Intermediate Language (CIL, Section 3.6) [82];
    • Unmanaged code is not managed by the CLR. It is stored in the native machine language
      and is not run by the runtime, but directly by the processor.

All language compilers (targeting the CLR) generate managed code (CIL) that conforms to the
CTS.

At runtime, the CLR is responsible for generating platform specific code, which can actually
be executed on the target platform. Compiling from CIL to the native machine language of
the platform is done by the just-in-time (JIT) compiler. This language independent layer
allows the development of CLRs for any platform, creating a true interoperability
infrastructure [82]. The .NET Runtime from Microsoft is actually a specific CLR
implementation for the Windows platform. Microsoft has released the .NET Compact Framework
especially for devices such as personal digital assistants (PDAs) and mobile phones. The .NET
Compact Framework contains a subset of the normal .NET Framework and allows .NET
developers to write mobile applications. Components can be exchanged and web services can
be used, so an easier interoperability between mobile devices and workstations/servers can be
implemented [56].

At the time of writing, the .NET Framework is the only advanced Common Language
Infrastructure (CLI) implementation available. A shared-source1 implementation of the CLI for
research and teaching purposes was made available by Microsoft in 2002 under the name
Rotor [73]. In 2006, Microsoft released an updated version of Rotor for version two of the
.NET platform. Ximian is also working on an open source implementation of the CLI under the name
       1 Only non-commercial purposes are allowed.

Mono1, targeting both Unix/Linux and Windows platforms. Another, somewhat different
approach, called Plataforma.NET2, aims to be a hardware implementation of the CLR, so that
CIL code can be run natively.

3.3.1    Java VM vs .NET CLR

There are many similarities between Java and .NET technology. This is not strange, because
both products serve the same market.
Both Java and .NET are based on a runtime environment and an extensive development frame-
work. These development frameworks provide largely the same functionality for both Java
and .NET. The most obvious difference between them is the lack of language independence in
Java. While Java’s strategy is ‘one language for all platforms’, the .NET philosophy is ‘all
languages on one platform’. However, these philosophies are not as strict as they seem. As
noted in Section 3.5, there is no technical obstacle for other platforms to implement the .NET
Framework, and there are compilers for non-Java languages, like Jython (Python) [45] and
WebADA [1], available for the JVM. Still, the JVM in its current state has difficulties
supporting as vast an array of languages as the CLR does. The multiple language support in
.NET is not optimal either and has been the target of some criticism.
Although the JVM and the CLR provide the same basic features, they differ in some ways.
While both the CLR and the modern JVM use JIT (just-in-time) compilation, only the CLR
can directly access native functions. With the JVM, an indirect mapping is needed to interface
directly with the operating system.
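The CLR exposes this direct access through the platform invoke (P/Invoke) mechanism; a
minimal sketch, which assumes a Windows machine because it imports a function from
kernel32.dll:

```csharp
using System;
using System.Runtime.InteropServices;

class NativeCallSketch
{
    // Declares a native Win32 function; at runtime the CLR marshals
    // the call directly into kernel32.dll, without the extra mapping
    // layer a JVM program would need.
    [DllImport("kernel32.dll")]
    static extern uint GetTickCount();

    static void Main()
    {
        // Milliseconds since the operating system was started.
        Console.WriteLine("Uptime: " + GetTickCount() + " ms");
    }
}
```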

3.4      Common Language Infrastructure

The entire CLI has been documented, standardized and approved [43] by the European associ-
ation for standardizing information and communication systems, Ecma International3 . Benefits
of this CLI for developers and end-users are:
      • Most high level programming languages can easily be mapped onto the Common Type
        System (CTS);
      • The same application will run on different CLI implementations;
      • Cross-programming language integration, if the code strictly conforms to the Common
        Language Specification (CLS);
      • Different CLI implementations can communicate with each other, providing applications
        with easy cross-platform communication means.
This interoperability and portability is, for instance, achieved by using a standardized meta
data and intermediate language (CIL) scheme as the storage and distribution format for appli-
cations. In other words, (almost) any programming language can be mapped to CIL, which in
turn can be mapped to any native machine language.
     3 A European industry association founded in 1961 and dedicated to the standardization of Information and
Communication Technology (ICT) Systems. Their website can be found at http://www.ecma-international.


                                   Figure 3.2: Relationships in the CTS

The Common Language Specification is a subset of the Common Type System, and defines the
basic set of language features that all .NET languages should adhere to. In this way, the CLS
helps to enhance and ensure language interoperability by defining a set of features that are
available in a wide variety of languages. The CLS was designed to include all the language
constructs that are commonly needed by developers (e.g., naming conventions, common prim-
itive types), but no more than most languages are able to support [55]. Figure 3.2 shows the
relationships between the CTS, the CLS, and the types available in C++ and C#. In this way the
standardized CLI provides, in theory1 , a true cross-language and cross-platform development
and runtime environment.
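The compiler can check this conformance on request; a small sketch (the class and member
names are illustrative, not from the thesis):

```csharp
using System;

// Ask the compiler to verify that the public surface of this assembly
// stays within the Common Language Specification.
[assembly: CLSCompliant(true)]

public class Counter
{
    // CLS-compliant: System.Int32 is one of the CLS primitive types.
    public int Count;

    // Not CLS-compliant: unsigned integers are outside the CLS, so a
    // language without them could not consume this member. The C#
    // compiler flags it with warning CS3003.
    public uint RawCount;
}
```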
To attract a large number of developers for the .NET Framework, Microsoft has released CIL
compilers for C++, C#, J#, and VB.NET. In addition, third-party vendors and open-source
projects also released compilers targeting the .NET Framework, such as Delphi.NET, Perl.NET,
IronPython, and Eiffel.NET. These programming languages cover a wide range of different
programming paradigms, such as classic imperative, object-oriented, scripting, and
declarative languages. This wide coverage demonstrates the power of the standardized CLI.
Figure 3.3 shows the relationships between all the main components of the CLI. The top of the
figure shows the different programming languages with compiler support for the CLI. Because
the compiled code is stored and distributed in the Common Intermediate Language format,
the code can run on any CLR. For cross-language usage this code has to comply with the CLS.
Any application can use the class library (the FCL) for common and specialized programming
tasks.

3.5     Framework Class Library

The .NET Framework class library is a comprehensive collection of object-oriented reusable
types for the CLR. This library is the foundation on which all the .NET applications are built.
It is object oriented and provides integration of third-party components with the classes in the
.NET Framework. Developers can use components provided by the .NET Framework, other
     1 Unfortunately, Microsoft did not submit all the framework classes for approval and, at the time of writing,
only the .NET Framework implementation is stable.


Figure 3.3: Main components of the CLI and their relationships. The right hand side of the
figure shows the difference between managed code and unmanaged code.


                          Figure 3.4: From source code to machine code

developers and their own components. A wide range of common programming tasks (e.g.,
string management, data collection, reflection, graphics, database connectivity or file access)
can be accomplished easily by using the class library. Also a great number of specialized de-
velopment tasks are extensively supported, like:

      •   Console applications;
      •   Windows GUI applications (Windows Forms);
      •   Web applications (Web Forms);
      •   XML Web services;
      •   Windows services.

All the types in this framework are CLS compliant and can therefore be used from any pro-
gramming language whose compiler conforms to the Common Language Specification (CLS).

3.6       Common Intermediate Language

The Common Intermediate Language (CIL) has already been mentioned briefly in the sections
before, but this section will describe the IL in more detail. All the languages targeting the .NET
Framework compile to this CIL (see Figure 3.4).

     A .NET compiler generates a managed module which is an executable designed to be run by the
     CLR [65]. There are four main elements inside a managed module:
          • A Windows Portable Executable (PE) file header;
          • A CLR header containing important information about the module, such as the location
            of its CIL and metadata;
          • Metadata describing everything inside the module and its external dependencies;
          • The CIL instructions generated from the source code.
     The Portable Executable file header allows the user to start the executable. This small piece of
     code will initiate the just-in-time compiler which compiles the CIL instructions to native code
     when needed, while using the metadata for extra information about the program. This native
     code is machine dependent while the original IL code is still machine independent. This way
     the same IL code can be JIT-compiled and executed on any supported architecture. The CLR
     cannot use the managed module directly but needs an assembly.
     An assembly is the fundamental unit of security, versioning, and deployment in the .NET
     Framework and is a collection of one or more files grouped together to form a logical unit [65].
     Besides managed modules, it is also possible to include resources like images or text inside an
     assembly. A manifest file is contained in the assembly, describing not only the name, culture,
     and version of the assembly, but also the references to other files in the assembly and the
     required security information.

     The CIL is an object oriented assembly language with around 100 different instructions, called
     OpCodes. It is stack-based, meaning objects are placed on an evaluation stack before the
     execution of an operation, and, when applicable, the result can be found on the stack after the
     operation. For instance, when adding two numbers, first those numbers have to be placed onto
     the stack, second the add operation is called, and finally the result can be retrieved from the
     stack.
    .assembly AddExample {}

    .method static public void main() cil managed
    {
      .entrypoint            // entry point of the application
      .maxstack 2

      ldc.i4 3               // Place a 32-bit (i4) 3 onto the stack
      ldc.i4 7               // Place a 32-bit (i4) 7 onto the stack

      add                    // Add the two and
                             // leave the sum on the stack

      // Call the static System.Console::WriteLine function
      // (the function pops the integer from the stack)
      call void [mscorlib]System.Console::WriteLine(int32)

      ret
    }
                                Listing 3.1: Adding example in IL code

     To illustrate how to create a .NET program in IL code we use the previous example of adding
     two numbers and show the result. In Listing 3.1 a new assembly is created with the name
     AddExample. In this assembly a function main is declared as the starting point (entrypoint)

     32                                      Automatic Derivation of Semantic Properties in .NET
                                                              3.6 Common Intermediate Language
    of this assembly. The maxstack command indicates there can be a maximum of two objects
    on the stack and this is enough for the example method. Next, the values 3 and 7 are placed
onto the stack. The add operation is called and the result stays on the stack. The method
WriteLine from the .NET Framework Class Library is called. This method resides inside the
Console class in the System namespace of the mscorlib assembly. It expects one parameter of type
int32 that will be retrieved from the stack. The call operation will transfer the control flow to this
    method passing along the parameters as objects on the stack. The WriteLine method does not
    return a value. The ret operation returns the control flow from the main method to the calling
    method, in this case the runtime. This will exit the program.
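As an illustration of this stack discipline, the following C# sketch mimics the instructions of Listing 3.1 with an ordinary Stack&lt;int&gt;. This is only a didactic model, not how the CLR actually executes IL:

```csharp
using System;
using System.Collections.Generic;

class StackDemo
{
    static void Main()
    {
        // Simulate the evaluation stack used by the IL in Listing 3.1
        var stack = new Stack<int>();
        stack.Push(3);                        // ldc.i4 3
        stack.Push(7);                        // ldc.i4 7
        int sum = stack.Pop() + stack.Pop();  // add: pop two operands
        stack.Push(sum);                      // ...and leave the sum on the stack
        Console.WriteLine(stack.Pop());       // WriteLine(int32) pops its argument
    }
}
```

Running the sketch prints 10, the same result the compiled IL program produces.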
To be able to run this example, we need to compile the IL code to bytecode, where each OpCode
is represented as one or two bytes. To compile this example, save it as a text file and run the ILASM
compiler with the filename as its parameter. This will produce an executable runnable on all the
    platforms where the .NET Framework is installed.
    This example was written directly in IL code, but we could have used a higher level language
    such as C# or VB.NET. For instance, the same example in C# code is shown in Listing 3.2 and
    the VB.NET version is listed in Listing 3.3. When this code is compiled to IL, it will look like
    the code in Listing 3.1.
1   public static void main()
2   {
3         Console.WriteLine((int) (3 + 7));
4   }
                           Listing 3.2: Adding example in the C# language

1   Public Shared Sub main()
2         Console.WriteLine(CType((3 + 7), Integer))
3   End Sub
                        Listing 3.3: Adding example in the VB.NET language

    M.D.W. van Oudheusden                                                                        33
                                                                                CHAPTER         4


This chapter describes the motivation for designing and implementing a system for the
automatic derivation of semantic properties in .NET languages. The current state of the
Compose /.NET project is explained in the first section. How this system can be extended is
discussed in the second section. The last section mentions the general design goals.

4.1     Current State of Compose /.NET

The Compose /.NET project offers aspect-oriented programming for the Microsoft .NET
Framework through the composition filters model. An introduction to Compose is given
in Chapter 2 and information about the .NET Framework can be found in Chapter 3. Most of
the information discussed below can also be applied to other Aspect Oriented Programming
(AOP) implementations.

4.1.1   Selecting Match Points

With composition filters, the incoming and outgoing messages on an object can be intercepted
through the use of input filters and output filters. A filter has three parts: the filter identifier, the
filter type, and one or more filter elements. A filter element consists of an optional condition
part, a matching part, and a substitution part. When a filter is evaluated, the matching part is
checked against the current message. A filter matches when both the condition and the matching
part evaluate to true; at that point the message is substituted with the values
of the substitution part.
The filters are superimposed on classes with a filter module binding. This binding has a se-
lection of objects on one side and a filter module on the other side. The selection is made
with a selector definition in the form of a selector language, based on the logical programming

     language Prolog. By using elements like isClassWithNameInList, isNamespaceWithName or
     isMethod the developer can specify the points in the code the filter applies to.

     This selection is based on syntactical properties, like naming and coding conventions or struc-
     tural properties, such as coding patterns and hierarchical relations. This approach is used in
     almost all the current AOSD techniques and has the following problems [78, 60, 3, 13]:
        • Coding conventions are not always used or used incorrectly. There are multiple reasons
for this: the complexity and evolution of the application, refactoring of code, or the lack of
          documentation. The pointcut definitions are fragile, changes in the code can easily break
          pointcut semantics, a problem which is hard to detect [48];
   • Method names are sometimes extended to be used as an identifier for join points. To
     provide a correct naming convention, they can become too long, which leads to unwieldy
     names;
   • Using specific naming conventions for the purpose of identifying join points violates the in-
     tention revealing naming rule, in which method names should represent the intention of
     the method code.
The result of these problems is that it is by no means certain that a selected join point is the
one intended to be found. It is also possible that some join points are not selected, while they
should be.
     To illustrate the use of naming conventions, consider the following example. In most pro-
     gramming languages there are Get and Set methods allowing another object to read from and
     write to a private variable or field. Examples of such a getter and setter method are given in
     Listing 4.1.
private string _stringValue;

public string GetStringValue()
{
    return _stringValue;
}

public void SetStringValue(string value)
{
    _stringValue = value;
}
                            Listing 4.1: Getter and Setter examples in C# .NET

     To select all the methods setting a value, as in assigning a value to a variable, we can use a Set*
     pointcut. This will select all the methods beginning with the word Set. However, this will also
     match any methods called, for instance, Setup, Settings or Settle. In addition, methods
     actually assigning a value, but not having a name starting with the word Set are not selected.
     On the other hand, we might find Set methods with an implementation part doing something
     completely different than actually setting a value.
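The imprecision of such a name-based selection can be demonstrated with a small sketch. The code below is not the Compose selector language; it is a simplified stand-in that only checks the method-name prefix, exactly like a Set* pointcut:

```csharp
using System;
using System.Linq;

class PointcutDemo
{
    // Naive "Set*" pointcut: matches any method name starting with "Set"
    static bool MatchesSetPointcut(string methodName) =>
        methodName.StartsWith("Set");

    static void Main()
    {
        var methodNames = new[]
            { "SetStringValue", "Setup", "Settings", "Settle", "AssignValue" };

        // Setup, Settings and Settle are false positives; AssignValue
        // actually assigns a value but is missed by the pointcut.
        foreach (var name in methodNames.Where(MatchesSetPointcut))
            Console.WriteLine(name);
    }
}
```

The sketch selects Setup, Settings, and Settle alongside the intended SetStringValue, while the semantically relevant AssignValue is never matched.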
     In this case, the selection is performed on the syntactical level instead of the semantical level.
     There is no knowledge about the actual implementation and the assumed purpose of the
     method is retrieved by using (parts of) the signature, a unique identifier of the method. Us-
     ing coding and naming conventions to match points in the code does not give the best results.
There are some possible solutions to this problem [35]:

     • Refactor the original code so coding and naming conventions can be used to define as-
       pects more correctly. However, refactoring for the sake of aspects is a bad idea and should
       actually only be performed to increase the quality of the code. Furthermore, the original
       source code should be, to a degree, a black box to the aspect designer and refactoring
       violates the goal of separation of concerns and AOP;
     • Use a list to enumerate all the join points by name. This requires knowledge about the
       source program and can lead to long enumerations in large software systems. This tech-
       nique is also not robust to changes in the original code;
      • Pattern matching, as used in Compose , provides more possibilities. Using wild cards
        and structural conditions (like is in class or has interface), the selector part is more robust.
        There is still a great dependency on naming conventions, as shown in the previous exam-
        ple;
     • By annotating methods with special tags, a developer can provide more information
       about the intended behavior of the method. Naming conventions do not have to be used,
       but the major drawback is the necessity to place annotations in the source code.

The main problem is the use of structure based properties and syntactic conventions [3]. The
selection of join points should be applied based on the behavior of a method and not on the
name of the method.

4.1.2   Program Analysis

Compose has some basic information about the source of the program. Information about
the types, their relations to other types, and properties of these types is collected. This is all
syntactical information, i.e., it describes the structure of the program.

There is almost no information about the behavior of the program. Two methods can use the
same resources without any problems, but might give resource conflicts when an aspect is used.
If a condition is used by a concern, can this condition be changed by other pieces of code? Are
there any side effects while using an aspect on a method [52]?

There is a partial solution to these questions in the form of SECRET (see [24] and [72] for more
information). This module in Compose reasons about possible semantic conflicts due to the
composition of filter modules and it analyzes the resource usage of the filters. One type of filter,
the meta filter, passes a ReifiedMessage object to the selected method as a parameter. The object
that receives the meta message can observe and manipulate the message, then re-activate its
execution. This can lead to unexpected behavior of the ReifiedMessage and as a result to possible
conflicts between filters. A developer can annotate methods, using this ReifiedMessage, with a
textual description of the intended behavior of the message. The SECRET module uses this
information to detect possible conflicts between aspects. A major requirement is the need for
the developer to specify the behavior beforehand inside the code.

If there is semantical information about the code in the Compose repository, besides the cur-
rently available syntactical data, then more extensive code analysis can be performed to detect
conflicts, side effects, and so on.

4.1.3     Fine Grained Join Points

The current Composition Filter model in Compose works at the level of message call events. It
could be interesting to expand this model to a statement and expression level model [77]. This
way it is possible to select points inside the code as matching points for the filters. Possible
applications of this technique are code coverage analysis (a technique to determine whether a set
of test cases satisfies an adequacy criterion) [66] or code optimization [36].
Currently Compose does not support this type of fine grained join points, but operates on
the level of object interfaces. There is however work in progress to support this at a certain
level [16]. An issue here is the need for (semantical) information about the code itself, the
instructions or statements. Compose does not have the necessary information about the mes-
sage body implementation.

4.2      Providing Semantical Information

The three main issues described in the previous section (match point selection, program analy-
sis and fine grained join points) all suffer from the same problem: there is almost no semantical
information available. The behavior of the source code is not known. With more informa-
tion about the meaning of the code it is possible to solve some of the shortcomings mentioned
above.
One of the solutions used by Compose is the use of annotations to describe the semantics of a
function. There are three major problems with this approach:
      • The developer must specify the semantics manually for each function. This is a time-consuming
        process and easily skipped because it is not enforced.
      • The current annotation representation is not powerful. For instance, it lacks the ability to
        provide control flow information of the instructions inside the function.
      • It is possible the annotations are not consistent with the actual implementation due to an
        incorrect description of the semantics or by changes to the code.
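In .NET, such annotations are expressed as custom attributes. The attribute below is hypothetical (the name Semantics and its string argument are invented for illustration and are not part of Compose); it shows both the approach and its weakness: the free-text description is written by hand and is never checked against the method body, so it can silently drift away from the implementation:

```csharp
using System;

// Hypothetical semantic annotation, invented for illustration only.
[AttributeUsage(AttributeTargets.Method)]
class SemanticsAttribute : Attribute
{
    public string Behavior { get; }
    public SemanticsAttribute(string behavior) => Behavior = behavior;
}

class Example
{
    private string _stringValue;

    // Free-text description of the intended behavior; nothing
    // enforces that it matches what the method actually does.
    [Semantics("assigns the parameter to the field _stringValue")]
    public void SetStringValue(string value) => _stringValue = value;

    static void Main()
    {
        // Read the annotation back via reflection, as a tool would.
        var attr = (SemanticsAttribute)Attribute.GetCustomAttribute(
            typeof(Example).GetMethod("SetStringValue"),
            typeof(SemanticsAttribute));
        Console.WriteLine(attr.Behavior);
    }
}
```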
This assignment is called the automatic derivation of semantic properties in .NET and is an attempt
to extract semantical information from a .NET assembly using existing tools, thus providing a
way to detect possible conflicts and giving more information to Compose .
Semantical information can not only be used for Compose but also by other applications
wanting to do source code analysis. For example, finding design patterns in the source code,
reverse engineer design documentation [69], generating pre- and postconditions [53], verifying
software contracts [9], checking behavioral subtyping [30], or any other type of static analysis.

4.3      General Design Goals

As stated in the previous section, the goal is to design and implement a system to automatically
derive semantic properties from .NET assemblies. To get an idea what type of behavior we are
interested in, we first have to look at the different types of semantics that can occur in typical
programming languages (see Chapter 5).
The context of this assignment is the Compose /NET platform, so the sources to analyze are in
the .NET format called Intermediate Language. In Chapter 6 a detailed description is given of
the IL and how the semantical elements, described in Chapter 5, are represented in this format.
Although the assignment is for the .NET platform it would be an advantage if the semanti-
cal representation is language independent so it can also be used with other object-oriented
languages like Java or C++. One requirement however is the need to conserve the type infor-
mation. This will be discussed in the design chapter (Chapter 7).
After a code analysis, the semantical information should be stored in some sort of model. This
model must contain enough information to reason about the behavior of the original program.
Not only what it is supposed to do, but also when it performs certain actions. For this, flow
information, like control flow, is important to save; this will be treated in Chapter 5.
Storing the data in a metamodel is not sufficient. There must be a system to query this model
for the required information. The possible options and the implementation can be found in
Section 7.4. Examples of the use of this search mechanism are mentioned in Chapter 8.

                                                                              CHAPTER         5


Before a model can be created to store semantical information about a program we must first
understand what semantic is and what type of behavior can be found in the sources of a pro-
gram. The first section provides a definition of semantics. The next sections describe how
semantics are represented in source code.

5.1   What is Semantics

Semantics is the study of meaning [15]. It comes from the Greek σημαντικός, semantikos, which
stands for significant meaning, where sema is sign. Semantics is a branch of semiotics, the study
of signs and meaning [20]. The other two branches of semiotics are syntactics (the arrangement
of signs) and pragmatics (the relationship between the speaker and the signs). In discussing
natural and computer languages, the distinction is sometimes made between syntax (for ex-
ample, the order of computer commands) and semantics (which functions are requested in the
commands).
Syntax, coming from the Greek words συν, together, and τάξις, sequence or order, is the study of
the formation of sentences and the relationship of their component parts. Looking at computer
languages, syntax is the grammar of the language statements where semantics is the meaning
of these statements. The statements are the so-called signs used in semiotics.
Keep in mind that there is a distinct difference between syntax and semantics. Syntax is the
relation between signs, and semantics is the meaning of these signs. For computer languages
the syntax is very clearly defined as a grammar the developer has to use. This grammar is, for
instance, used to create an abstract syntax tree (AST), in which each node of the tree represents
a syntactic construct. The compiler uses this AST to create the actual program. Each element
of the grammar has a certain semantic and the composition of those elements in a certain order
forms the semantics of a program.

5.2     Semantics of Software

Understanding a software product is difficult. Usually the behavior is described in the docu-
mentation of the product but is often not complete or up-to-date [67]. There are tools which
aid in program understanding, like debuggers, metric calculators, program visualization and
animation techniques. Basically these tools reverse engineer the software and represent this at
a higher level of abstraction than that of the information which is directly extracted from the
code [68]. They differ in how the data is retrieved, how a higher-level model is created, and how
the information is presented.

5.2.1   Static and Dynamic Program Analysis

The collection of relevant data for an analyzer is performed by either static or dynamic analysis.
It is possible to combine those two techniques to get more precise data [19].

In static program analysis, only the code of the software is considered without actually execut-
ing the program built from this software. Usually the analysis is performed on the source code
of the software, but other representations like object code or byte code can also be used. This
type of analysis is suited to retrieve structural information, like the elements in the source such
as the classes, methods, and so forth.
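For .NET assemblies, part of this structural information can be retrieved with the standard reflection API, as the following minimal sketch shows:

```csharp
using System;
using System.Reflection;

class StructuralAnalysis
{
    static void Main()
    {
        // Statically inspect a type without executing its code:
        // enumerate the public instance methods declared on System.String.
        foreach (MethodInfo m in typeof(string).GetMethods(
                     BindingFlags.Public | BindingFlags.Instance |
                     BindingFlags.DeclaredOnly))
            Console.WriteLine($"{m.ReturnType.Name} {m.Name}");
    }
}
```

Note that reflection only exposes the metadata, i.e. the structural model; the IL instructions inside the method bodies remain invisible through this API.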

Analyzing classes is hard, due to polymorphism and dynamic binding. Information
about the references to objects (pointers) is also difficult to catch with static analysis. Conditional
branches and iterations are only known at runtime, so the exact control flow information, the
sequence of executed commands, is difficult to extract when the software is not executed. Most
of the time there are multiple possible executing paths. Static analysis is limited due to the
amount of data available about the structure of the source and the parse tree of the statements
inside the methods of a class.

Dynamic program analysis uses program execution traces to gather information about the soft-
ware. This means the program is analyzed while it is running, usually by attaching a separate
program to it, called a profiler. The retrieved runtime behavior provides information about the
actual usage of the program. The intended methods of classes are called and the types of the
objects are known. It also provides timing and object instance information, and the real values
of operands, which can not be retrieved using static program analysis.

However, the gathered information is the result of certain user input and can change during
a different program execution. Not all the paths in the control flow could be executed, thus
information can be missing.

Static and dynamic analysis can be combined so all the possible paths are known beforehand
using static analysis and the runtime behavior can be analyzed using dynamic analysis [27].
For instance, static analysis is used to retrieve all possible control flow paths in the software
that can be used to generate different input for the software at runtime. Dynamic analysis is
then used to retrieve the information during runtime using the path information found by the
static analysis.

    5.2.2   Software Models

    We can distinguish two models of software, namely structural and behavioral [32]. In the
    structural model the organization of the software is described, like inheritance and the rela-
    tions between components. The behavioral model describes how the elements in the software
    operate to carry out functions. Usually static analysis is used to get the structural model while
    dynamic analysis collects the behavioral model during an execution run.
    The system designed for this assignment tries to create a software model consisting of both
    structural and behavioral information using static analysis. The reason for using static instead
    of dynamic analysis is explained in the design chapter (see Chapter 7). Extracting a structural
model is easier than extracting a behavioral model. We have to give meaning, semantics, to
    parts of the code without using dynamic analysis. This means we have to look at the statement
    level, in other words, the instructions inside the methods, to extract their meaning. Combining
    the behavior of the individual statements and the control flow could tell us more about the
    meaning of the complete method.

    5.3     Semantical Statements

    To extract the behavior of a program we can start bottom-up, which means we have to look at
    the finest and lowest part of the program, the statements, before working our way to the whole
    program [49]. Statements are the instructions executed inside a method. There are various
types of statements. Some return a value; we call those statements expressions. Statements
    can operate on zero or more operands. An operand can, for instance, be a value, a memory
    address, a function name or a constant literal. A statement operating on only one operand is
    called unary. If the statement works on two operands, it is called binary.
    The next sections show the major generic kinds of statements.

    5.3.1   Value Assignment

    A commonly used statement is the assignment of values. This can be a unary or a binary
    statement. With an assignment statement a value is assigned to a variable, a symbol denoting
    a quantity or symbolic representation. A value can be a single value, but also the result of
    an operation such as the binary operations adding, multiplying, subtracting, dividing, and so
on. An example of a unary assignment is negation. In Listing 5.1 some examples are
    shown of assignments in the C# language.
a = 4;                    // Assign the value 4 to the variable a
b = a + 5;                // Add 5 to the value of a and assign to b
s = "Hello World";        // Place the text in string s
                                 Listing 5.1: Assignment examples in C#

    Semantically, the assignment statement changes the state of variables. If there is an expression,
    like adding, the result of this expression is evaluated and assigned to the variable. A prior
    value, stored in the variable, is replaced by the new value.

     5.3.2   Comparison of Values

     Comparing two values is an operation which is frequently used for controlling the flow of the
program. A simple example is conditional branching like the If...Then...Else construction, but
also loops (do...while, do...until, while...loop constructions) and switch opera-
tions. Basically a condition is checked determining if the loop must be exited or continued. An
     example of some comparison statements can be seen in Listing 5.2.
if (x > 4)
{
  // perform action when x has a value greater than 4
  while (x < 10)
  {
     x = x + 1;
  }
}
else
{
  // else, perform another action
  for (int i = 0; i < 10; i++)
  {
     x = x + i;
  }
}
                                 Listing 5.2: Comparison examples in C#

     A comparison always works on two values and the result is stored in a destination operand.
     This result is either true or false. There are different kinds of comparison as shown in Table 5.1.

                            Description                       Sign    Abbr.
                            A is equal to B                   A=B     EQ
                            A is not equal to B               A≠B     NE
                            A is less than B                  A<B     LT
                            A is less than or equal to B      A≤B     LTE
                            A is greater than B               A>B     GT
                            A is greater than or equal to B   A≥B     GTE

                                     Table 5.1: Comparison operators

     Together with branching, discussed in the next section, the comparison statements are an im-
     portant element for the control flow. Based on the relation of one value to another value certain
     actions are performed.

     5.3.3   Branching Statements

     Branching statements change the flow of control to another point in the code. This point is
identified with a label, like a line number or a legal identifier. We can distinguish two types of
branching:
Conditional A conditional branch occurs only if a certain condition expression evaluates to
    true.
Unconditional With unconditional branching, the control flow is directly transferred to a new
    location without any conditions.
In Listing 5.2, two branches are visible for the first condition (x > 4). If the value of x is greater
than 4, the control flow moves to the statements directly after the if statement. If x is equal
to or less than 4, the control flow is moved to the statements after the else statement.
    Typical unconditional branching commands are the continue or break statements which explic-
    itly jump to a location in the code without checking for a condition. These statements can, of
    course, exist inside another condition and are as such conditional.
    Branching is an important semantic because the flow of the program is controlled with these
    statements. Together with the conditions, branching makes it possible to use iteration state-
    ments, such as while, for, or foreach loops.

    5.3.4   Method Calling

    Because branching only moves the control flow inside a method, we need a special statement
    to indicate the move of the control flow to another method. When this method has finished
    processing its statements, the flow will be returned to the calling method. In most program-
    ming languages it is possible to specify input for the new method in the form of parameters
    and the called method can also return a value.
    Inside the method, a special statement is available to return to the calling method. Usually this
is the return statement and, if the method should return a value, this value can be returned with it.

    5.3.5   Exception Handling

It can be necessary to apply exception handling to certain statements. When an exception is
    thrown by one of those guarded instructions, a special block of code can handle this exception.
try
{
  x = 4 / y;
}
catch (Exception ex)
{
  // handle the exception
}
                             Listing 5.3: Exception handling example in C#

    In Listing 5.3 the combined assignment and division statement are placed inside a guarded
block. If, for instance, y is zero, a division by zero exception will occur. This exception will
    be handled by the statements inside the catch block. If there is no exception handling, the
    exceptions will be thrown upwards to the calling method until eventually the runtime itself.
    We are interested in this information because it provides insight in the capability of the code to
handle exceptions. A division by zero and an uninitialized value of x or y are the only exceptions

which can occur in Listing 5.3. However, all the exceptions are caught using the general
Exception class. By using the information about the possible exceptions, we can choose to catch
more specific types of exceptions instead of all the exceptions.
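Applying this idea to Listing 5.3, the handler can name the one exception that can actually occur in the guarded block:

```csharp
using System;

class DivisionExample
{
    static int SafeDivide(int x, int y)
    {
        try
        {
            return x / y;  // integer division; throws when y is zero
        }
        catch (DivideByZeroException)
        {
            // Only the exception that can really occur here is caught;
            // any other exception would still propagate to the caller.
            return 0;
        }
    }

    static void Main()
    {
        Console.WriteLine(SafeDivide(4, 2));
        Console.WriteLine(SafeDivide(4, 0));
    }
}
```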

5.3.6   Instantiation

The instantiation of a new object or variable can be important to detect. If an object is not
instantiated its internal functionality can not be accessed. Not only an object can be created,
also an array or a value type. A type indicates a set of values that have the same sort of generic
meaning or intended purpose. In object-oriented languages, an object is an individual unit
which is used as the basic building block of programs. An object is created from a class and is
called an instance of that class.
Most object-oriented languages divide types into reference types and value types. Value types
are stored on the stack. The variable contains the data itself. Reference types consist of two
parts. A reference or handle is stored on the stack. The object itself is stored on the heap, also
called the managed heap in .NET languages [44].
Value types tend to be simple, like a character or a number, while reference types are more
complex. Every object is a reference type and must be explicitly created by the developer
using a special new statement. Value types can be accessed directly and do not need to be
created; however, they need to be initialized to a default value, e.g., 0. Usually this is handled
automatically by the runtime.
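The difference between the two kinds of types becomes visible in assignment behavior, as this small C# sketch shows:

```csharp
using System;

class Box { public int Value; }   // reference type (class)

class TypeDemo
{
    static void Main()
    {
        // Value type: the assignment copies the data itself.
        int a = 1;
        int b = a;
        b = 2;
        Console.WriteLine(a);       // 'a' is unaffected

        // Reference type: the assignment copies only the reference;
        // both variables point to the same object on the heap.
        Box p = new Box { Value = 1 };
        Box q = p;
        q.Value = 2;
        Console.WriteLine(p.Value); // same object, so the change is visible
    }
}
```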
The creation of a new object or the (re)initialization of variables is also important to detect. If
an object is not instantiated, it can not be used and could generate errors at runtime. Knowing
when a variable has a certain value, even if this is the default value, can be interpreted as an
assignment with a default value.

5.3.7   Type Checking

Although types are directly related to a certain language, it is still important semantic infor-
mation. Adding a string to an integer is probably syntactically correct, but semantically it is
incorrect. We need to know what type of data we are talking about. A value being a string has
as such a different meaning than when the value has a numeric type.
Type checking is the process of verifying and enforcing the constraints of types. This can occur
at compile time, called static checking, or at runtime, called dynamic checking. Usually both
techniques are used. When a compiler performs static checking of the types in the source code,
it is performing a semantical analysis. Semantical information is added to the parse tree and
used to check for inconsistencies [38].
For the purpose of this assignment, we can distinguish two different kinds of semantical type
information:
Compile time
    At compile time, the types of all the variables must be known. When the analyzer de-
    signed for this assignment is run on the code, it knows the types of all the elements.

Runtime
     During runtime, the type information of a variable can change. This is called type casting
     and we need to know the new type the variable will become.

5.3.8    Data Conversion

As explained in the previous section, we need to store type information. However at runtime
the type of a variable can change because of casting or (un)boxing. Boxing is a mechanism for
converting value types to reference types. The value type is placed inside a box so it can be
used when an object reference is needed [44].
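As an illustration, the IL sketch below shows both directions of the conversion; the instruction names are the actual IL opcodes, while the layout of the local variable slots (an int32 in slot 0, an object in slot 1) is hypothetical:

```
ldloc.0                              // load the value type (int32) from local slot 0
box        [mscorlib]System.Int32    // copy it into a box on the heap, leaving an object reference
stloc.1                              // store the reference in local slot 1
ldloc.1
unbox.any  [mscorlib]System.Int32    // extract the value type from the box again
stloc.0
```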

Data conversions change the type and thus the meaning of a variable. As such, it is interesting
semantical information we need to be able to reason about the contents of variables.

5.4     Program Semantics

In the previous section the lowest level of the code, the statements, was described. Multiple
statements are grouped together inside a method to perform a specific action. Sometimes a
method has a set of input1 parameters to parameterize those actions, and possibly an output
value (called the return value) of some type. In object-oriented languages, a method resides in a
class and provides a service for this particular object. A method is used to perform some
task of the object. For example, a class called Car can provide the services Accelerate and Brake
to control the inner state of the Car object.

Not only the classes and methods tell something about the semantics of a program; the relations
between the different classes also provide added information about the behavior [69]. Components
have to work together to execute the tasks of the complete program. Detecting and
recognizing these interactions in the source code using static analysis is not an easy task.
Because of polymorphism, allowing a single definition to be used with different types of data, it is
difficult to determine which method is actually executed at runtime. Inheritance, the ability to
create new classes using existing classes, introduces the problem that the behavior of a subclass
is not solely defined in the class itself, but spread over multiple classes.

Semantically analyzing the source code of a program using static analysis techniques is thus a
difficult process [81]. Gamma [31] says the following about the relation between run-time and
compile-time structure:

        “An object-oriented program’s run-time structure often bears little resemblance
        to its code structure. The code structure is frozen at compile-time; it consists of
        classes in fixed inheritance relationships. A program’s run-time structure consists
        of rapidly changing networks of communicating objects. In fact, the two structures
        are largely independent.”

    1 Some languages also allow output parameters, where the storage location of the variable specified on the
invocation is used.

M.D.W. van Oudheusden                                                                                   45
    5.4.1     Slicing

    If we cannot give a definite description of the semantics of the whole program, we can at least
    try to describe the behavior of the individual methods. A useful technique is called slicing, in-
    troduced by Weiser [84]. Slicing is used to highlight statements that are relevant to a particular
    computation and are as such semantically related [76]. Again, we can make a distinction be-
    tween static and dynamic slicing. With static slicing no assumptions are made, while dynamic
    slicing depends on specific test data.
    Slicing depends on the control and data flow of the statements inside a method. Control flow is
    the order in which the individual statements are executed. Data flow follows the trail of a data
    item, such as a variable, as it is created or handled by a program [29]. With slicing we can find
    out which statements contain variables that can be affected by another variable. This is called
    a backward static slice as it is computed by a backwards traversal of the statements beginning
    by the variable we are interested in.
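As a small, hypothetical illustration in the style of the C# listings below: the backward static slice with respect to the variable total at the final statement contains only the marked lines, since count never flows into total:

```
int total = 0;                 // in the slice: defines total
int count = 0;                 // not in the slice
foreach (int price in prices)  // in the slice: controls the assignment to total
{
  total += price;              // in the slice: affects total
  count++;                     // not in the slice: count never reaches total
}
Console.WriteLine(total);      // the slicing criterion
```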

    5.4.2     Method Example

    If we want to create a method with a specific purpose, we usually specify this method first
    in some sort of formal requirements specification, a high level view of the functionality. For
    instance, we need a method called AssignWhenMoreThenOne with the following requirement:
    “Method AssignWhenMoreThenOne must assign a value of 1 to the global variable moreThenOne
    if its first parameter is greater than or equal to 2.”
    This single sentence describing the method AssignWhenMoreThenOne can be split into multiple
    semantical elements:
         •   reading the value of the first parameter;
         •   reading of a constant value of 2;
         •   comparison of two values;
         •   the use of the greater than or equal to operator;
         •   branching based on the comparison;
         •   assignment of a value 1 to global variable moreThenOne if the condition holds.
    We can implement this method in various ways and in different programming languages as
    shown in Listing 5.4, Listing 5.5 and Listing 5.6.
1   public int moreThenOne;
3   public void AssignWhenMoreThenOne(int stockAmount)
4   {
5     if (stockAmount >= 2)
6       moreThenOne = 1;
7   }
                        Listing 5.4: Method AssignWhenMoreThenOne in C# .NET

1   Public moreThenOne As Integer
3   Public Sub AssignWhenMoreThenOne(ByVal stockAmount As Integer)
4     If stockAmount >= 2 Then
5       moreThenOne = 1
6     End If

7    End Sub
                        Listing 5.5: Method AssignWhenMoreThenOne in VB .NET

1    procedure Module1.AssignWhenMoreThenOne(stockAmount: Integer);
2    begin
3      if (stockAmount >= 2) then
4         Module1.moreThenOne := 1
5    end;
                     Listing 5.6: Method AssignWhenMoreThenOne in Borland Delphi

     While the previous examples differ in syntax they have the same semantics. The C# and
     VB .NET examples both compile to the Common Intermediate Language as shown in List-
     ing 5.7.
 1   .method public static void AssignWhenMoreThenOne(int32 stockAmount) cil managed
 2   {
 3     // Code Size: 21 byte(s)
 4     .maxstack 2
 5     .locals init (bool flag1)
 7       L_0000:   nop
 8       L_0001:   ldarg.0
 9       L_0002:   ldc.i4.2
10       L_0003:   clt
11       L_0005:   ldc.i4.0
12       L_0006:   ceq
13       L_0008:   stloc.0
14       L_0009:   ldloc.0
15       L_000a:   brfalse.s L_0012
16       L_000c:   ldc.i4.1
17       L_000d:   stsfld int32 ConsoleApplication2.Module1::moreThenOne
18       L_0012:   nop
19       L_0013:   nop
20       L_0014:   ret
21   }
           Listing 5.7: Method AssignWhenMoreThenOne in Common Intermediate Language

     From this IL code it is still possible to determine the semantics we mentioned before. The
     ldarg.0 loads the first parameter on the stack; the ldc.i4.2 OpCode puts the value of 2 on
     the stack. Both values are used by the compare less than (clt) instruction, whose result is
     also put on the stack. A zero value is loaded (ldc.i4.0) and a compare for equal is performed
     (ceq), after which the result is stored (stloc.0) in a variable. Based on this value, which is loaded
     back onto the stack (ldloc.0), a branch action is performed (brfalse.s) to label L_0012 when
     the value is false. If the value of the variable on the stack was true, the constant value of 1 is
     placed on the stack (ldc.i4.1) and stored in the variable moreThenOne (stsfld).
     Although the C#, VB and Delphi code samples were practically the same, the IL code is somewhat
     different. The compiler introduced two comparisons and a new local variable to hold
     the result of one of these comparisons. Also the branching is reversed: the check is now for a
     negative instead of a positive value. Furthermore, IL is a stack based language. Still this
     piece of code behaves as indicated by the definition of method AssignWhenMoreThenOne.
     With the information described in this chapter, we know what types of semantical constructions
     we are interested in. Not only the constructions themselves are important semantical information; also

the control flow and the operands play an important part in the behavior of a function. The
next chapter discusses how these constructions and the related data are represented in the
Intermediate Language and how this information can be extracted from the code.

                                                                               CHAPTER        6

                                               Analyzing the Intermediate Language

In Chapter 3 the .NET Framework is introduced and a brief introduction to the Common Intermediate
Language (IL) is presented. This assignment uses the Compose .NET project and
hence the .NET languages, so we need access to these languages as represented in the Intermediate
Language. This chapter provides more details of the IL and explains how to access it.

6.1     Inside the IL

An intermediate language is a CPU-independent instruction set, which resides between the
source code and the machine code. This has several advantages:
      • The code can be optimized just before it is executed;
      • Allows for a platform independent layer before generating a platform dependent version.
        Optimization can occur per platform;
      • Interoperability of other languages compiling to the same IL. Functionality in the IL can
        be shared;
      • Multiple different kinds of higher level languages can compile to this intermediate lan-
        guage, so a large number of languages can be supported.
Two major intermediate languages are Java bytecode and the .NET IL. In this section we will
only discuss the .NET IL.

6.1.1    Modules

A .NET application consists of one or more managed executables, each of which carries metadata
and (optionally) managed code [51]. Managed executables are called modules and they
basically contain two major components: metadata and IL code. Modules are used by
two components in the CLR (see Section 3.3): a loader and the just-in-time (JIT) compiler.

The loader is responsible for reading the metadata and creating an internal representation and
layout of the classes and their members. A class is only loaded when it is needed. When
loading a class, the loader runs a series of consistency checks on the related metadata.

The JIT compiler compiles the methods encoded in IL into the native code of the underlying
platform. The runtime does not execute the IL code directly; instead, the IL code is compiled in
memory into native code and this native code is executed. A method is only compiled when it is
called. It is possible to precompile modules to native code for faster execution. The original file
must still be present since it contains the metadata.

When the IL code is compiled, it is also optimized by the JIT compiler. This means that the
original IL code is almost not optimized, because the target architecture is only known at run-
time. The JIT compiler performs optimization algorithms like method inlining, constant fold-
ing, dead code elimination, loop unrolling, constant and copy propagation, and so on.

The file format of a managed module is based on the standard Microsoft Windows Portable
Executable and Common Object File Format (PE/COFF). As such, a managed module is
executable by the operating system. Figure 6.1 shows the structure of a managed module. When
the module is invoked by the operating system, the .NET runtime can seize control over the
execution.

                    Figure 6.1: Structure of a managed executable module

6.1.2   Metadata

Metadata is data that describes data. In the context of the common language runtime, metadata
means a system of descriptors of all items that are declared or referenced in a module [51].
In a module there is a collection of tables containing different kinds of metadata. One table,
the TypeDef table, lists all the types defined in the module. Another table lists the methods
implemented by those types, another lists the fields, another lists the properties, and so on [65].
Additional tables hold collections of external type references (types and type members in other
modules that are used by this module), the assemblies containing the external types, and so on.
Metadata also describes the structural layout of all the types in the module. Besides the tables
to store the metadata, there are also heaps in which sequences of items are stored, for instance
lists of strings and binary objects. The runtime loader checks the consistency of, and protects,
the metadata headers and the metadata itself, making sure it cannot be changed to pose
security risks.

6.1.3   Assemblies

The CLR cannot use managed modules directly, but requires an assembly. An assembly is a
deployment unit and contains metadata, (optionally) managed code, and sometimes resources.
The metadata is a system of descriptors of all the structural items of the application. It describes
the classes, their members and attributes, relations, etcetera. Part of the metadata is the man-
ifest. It provides information about the assembly itself like the name, version, culture, public
key and so on. Furthermore it describes the relations to other files, like resources and other
assemblies, and contains security demands. An example of an assembly is shown in Figure 6.2.

                         Figure 6.2: Assembly containing multiple files

An assembly contains one or more namespaces. A namespace is a collection of types that
are semantically related. Apart from some syntax restrictions, developers can define their own

    namespace. The main purpose is to allow (meta) items to be unambiguously identifiable. How-
    ever, namespaces are not metadata items and hence are not stored in a metadata table.

    The .NET Framework uses an object-oriented programming model in which types are an important
    concept. The type of an item, such as a variable, constant, parameter, and so on, defines
    both the data representation and the behavioral features of the item. The Common Language
    Infrastructure standard (see Section 3.4) defines two kinds of types, namely value types and
    reference types, as further explained in Section 5.3.6. The Framework supports only single
    type inheritance and thus creates a hierarchical object model. At the top is the System.Object
    type, from which all the types are derived.

    In .NET the types can be divided into five categories: classes, interfaces, structures, delegates
    and enumerations. Types, fields and methods are the three important elements in managed
    programming [51]. The other elements use metadata to provide additional information about
    these three.

    6.1.4   Classes

    A class defines the operations an object can perform (methods, events, properties) and defines
    values that hold the state of the object (fields). A class also contains specific class metadata
    which can be divided into two concepts: type reference (TypeRef) and type definition (TypeDef).

    TypeDefs describe the types defined in the class, whereas TypeRefs describe references to types
    that are declared somewhere else. In its simplest form we only get the TypeDef metadata table,
    which contains flags about the visibility (like a public or private class) and a class reference
    to another type. There is more information if, for instance, the class implements other classes,
    uses custom attributes, is an enumeration or a value type, et cetera.

    A class contains items like methods, properties, events or fields and these items are characterized
    by signatures. A signature is a binary object containing one or more encoded types and
    resides in the metadata [51]. The first byte of a signature defines the calling convention and, in
    turn, identifies the type of the signature. Possible calling conventions are field, property, local
    variable, instance method or method call identifiers.

    At line 3 in Listing 6.1 the syntax of a class definition is shown. The dotted_name defines
    the namespace this class belongs to. simple_name is the name of the class (the TypeDef ) and
    class_ref is the name of the class it extends from. There are numerous flags to specify
    specific options for the type definition of the class. For instance, if the type is visible outside of
    the assembly or if it is an interface.
1   .namespace <dotted_name>
2   {
3     .class <flags> <simple_name> extends <class_ref>
4     {
6       }
7   }

                                 Listing 6.1: Syntax of a class definition
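Filling in the placeholders, a hypothetical public class Car in the namespace Shop, extending System.Object, could be declared as follows (the flags shown are common IL defaults):

```
.namespace Shop
{
  .class public auto ansi Car extends [mscorlib]System.Object
  {
    // fields, methods and properties go here
  }
}
```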

    6.1.5   Fields

    Fields are, together with local variables inside a method, data locations. Information about
    the fields is stored in a fields metadata table. Additional tables specify data like the layout,
    mapping and constant values of the fields. The syntax for a field is listed in Listing 6.2.
1   .field <flags> <type> <name>
                                 Listing 6.2: Syntax of a field definition

    The owner of a field is the class or value type in the lexical scope of which the field is defined.
    The flags are used to specify extra options, the type defines the type of the field and finally
    the name indicates the name of the field. An example is given in Listing 6.3.
1   .field public string s
2   .field private int32 i
                                Listing 6.3: Example of a field definition

    There are two types of fields:
    Instance fields These are created every time a type instance is created and they belong to this
          type instance.
    Static fields Fields which are shared by all instances of the type and are created when the type
          is loaded.
    The field signature does not contain an option to specify whether the field is static or instance.
    However, the compiler keeps separate tables for the two kinds and can classify the desired
    field. To load or store a field, there are two sets of instructions in the IL: one for static and one
    for instance fields.
    Fields can have default values and these are stored in the constant metadata table. Besides the
    default values for fields, this table can also contain default values for parameters of methods
    and properties. The syntax is listed in Listing 6.4 and an example is given in Listing 6.5. If the
    const_type is a null reference, a value is not mandatory.
1   .field <flags> <type> <name> = <const_type> [( <value> )]
                       Listing 6.4: Syntax of a field definition with default value

1   .field private int32 i = int32(1234)
                      Listing 6.5: Example of a field definition with default value

    A field declared outside a class is called a global field and belongs to the module in which it is
    declared. This module is represented by a special TypeDef record under the name <Module>.
    A global field is by definition static since only one instance of the module exists and no other
    instance can be created.

    6.1.6   Methods

    A method has a couple of related metadata tables like the definition and reference information,
    implementation, security, semantics, and interoperability. The syntax for a method is listed
    in Listing 6.6.

1   .method <flags> <call_conv> <ret_type> <name>(<arg_list>) <impl> {
2      <method_body>
3   }
                               Listing 6.6: Syntax of a method definition

    The flags define the options for the method, such as the accessibility (private, public and so
    forth). The call_conv, ret_type, and arg_list are the method calling convention, the return
    type, and the argument list defining the method signature. The impl specifies additional
    implementation flags of the method, for example whether the method is managed, in CIL or
    native format, or whether the method must be executed in single-threaded mode only, and so on.
    The name of the method is a name or one of the two keywords .ctor or .cctor. The instance
    constructor method (.ctor) is executed when a new instance of the type is created, the class
    constructor (.cctor) is executed after the type is loaded and before any one of the type members
    is accessed. The global .cctor can be used to initialize global fields.
    There are different kinds of methods as depicted in Figure 6.3. Static methods are shared by all
    instances of a type and do not require an instance pointer (referred to as this). They also cannot
    access instance members unless the instance pointer is provided explicitly. An instance method
    is instance-specific and has as its first argument the this instance pointer.

                                 Figure 6.3: Different kinds of methods

    Virtual methods can be overridden in derived classes. Non-virtual methods can also be redefined
    in a derived class, but the new method has nothing to do with the method declared in the base
    class. The base method is hidden, but can still be called when the class name is specified.

    6.1.7   Method Body

    The method body itself holds three parts, namely a header, IL code and an optional structured
    exception handling (SEH) table, see Figure 6.4. Currently, there are two types of headers: a fat
    and a tiny version indicated by the first two bits in the header. A tiny header is created by the

compiler when the method does not use SEH or local variables, has a default stack space of
eight slots, and its size is less than 64 bytes.

                                     Figure 6.4: Method body structure

Local variables are declared and have their scope inside the method. Local variables only have
names if the source code is compiled in debug mode. They are referenced in IL code by their
zero based ordinals1 . Unlike fields and method names, the names of local variables are not part
of the metadata. Their names are stored inside a (separate) debug file, the program database
(PDB file).
If the keyword init has been added to the local’s declaration, the JIT compiler must initialize
the local variables before the execution of the method. This means that for all the value types
the constructor is called and all variables of object reference types are set to null. If the init
flag has not been set, the code is regarded as unverifiable and can only run from a local drive
with verification disabled.
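Listing 5.7 already showed this keyword in use (.locals init (bool flag1)); a hypothetical declaration with several locals might look like this:

```
.locals init (
  int32  result,     // value type, zero-initialized before the method runs
  string message,    // object reference type, set to null
  bool   done        // value type, set to false
)
```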
It should be noted that methods, just like global fields, can reside outside any class scope. These
so-called global methods are static, and the same accessibility flags as for a global field apply.

6.1.8     IL Instructions

Inside a method we find the method body, which contains a header, IL code, and an optional
structured exception handling (SEH) table. IL code consists of IL instructions, which are made
up of an operation code (OpCode) and are sometimes followed by an instruction parameter.
A list of all the available operational codes can be found in Appendix A. There are long and
short parameter instructions. The long form requires a four byte integer, the short form only
one byte. This can reduce the amount of space in the assembly, but the short form can only be
used when the value of the parameter is in the range of 0 to 255 (for unsigned parameters).
       1 Numbers used to denote the position in an ordered sequence.
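For example, the same constant load and branch can be encoded in a long or a short form; the mnemonics are actual IL opcodes, the label name is hypothetical:

```
ldc.i4    100      // long form: the parameter is encoded as a four byte integer
ldc.i4.s  100      // short form: the same parameter encoded in a single byte
br        Target   // long form branch: four byte offset to the label
br.s      Target   // short form branch: one byte offset
```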

    IL is a stack based language, meaning that operands must be pushed onto the stack before an
    operation can use them. An operator grabs the values from the stack, performs the operation
    and (optionally) places the result back onto the stack. More generally, instructions take all
    required arguments from the stack and put the results onto the stack. If, for instance, a local
    variable is required by an instruction, then another instruction has to load this variable onto
    the stack. An exception to this rule are the instructions of the load and store group, which are
    responsible for pushing values onto and popping values off the stack.
    Elements on the evaluation stack are called slots and can be one of the types listed in Ap-
    pendix B. Besides the evaluation stack, a method also contains an argument table and local
    variable table, both having slots with static types.
    We can distinguish different kinds of instructions, which are listed in the sections below.

    Flow Control

    Labels can be placed between the IL instructions to mark the first instruction that follows them.
    Labels are used by the control flow instructions to jump to a predefined part in the code. It
    is much easier and safer1 to use labels than offsets. The instructions dealing with control flow
    inside a method are the following:
         •    Branching;
         •    Switching;
         •    Exception handling;
         •    Returning.
    The branching instruction can be divided into three types of branching. First, we have unconditional
    branching, where the control flow jumps to another part of the code. Second, conditional
    branching, where the control flow is directed to another location based on a true or false value
    on the stack. The third branching type is the comparative version, where two values from the
    stack are compared according to the condition specified by the OpCode. This condition can be
    greater than, not equal, less than or equal, and so forth.
    The switch instruction uses a jump table to determine where to jump to. It takes an unsigned
    integer from the stack and uses that number to jump to the target offset in the jump table
    sequence. A value of zero on the stack instructs the switch to jump to the first target offset in
    the list.
    In Listing 6.7 an example of an unconditional branching instruction and a switch instruction
    are listed.
1   Loop:
3   br       Loop
5   switch(Label1, Label2, ..., LabelN)
6         // Default case
7   Label1:
9   Label2:

         1 Safer in the sense that when creating IL code by hand, it is safer to use labels instead of calculating, possibly
    incorrect, offsets.

11   LabelN:
                                           Listing 6.7: Control flow examples
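The three branching types can be sketched as follows; the opcodes are real IL instructions, while the label and the local variable slots are hypothetical:

```
br.s     Done        // unconditional: always jump to Done
ldloc.0
brtrue.s Done        // conditional: jump if the value on the stack is true (non-zero)
ldloc.0
ldc.i4.5
bge.s    Done        // comparative: jump if the first value is greater than or equal to the second
Done:
ret
```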

     The exception handling instructions are divided into an exiting and an ending instruction.
     A block of code inside an exception handling clause cannot be entered or exited by simply
     branching. There are special state requirements prohibiting this. So there is a leave instruction
     to exit an exception handling block, which clears the stack space before branching. To indicate
     the end of an exception handling block a special endfinally instruction is used. This also
     clears the stack but does not jump.
     A method always ends with one or more ret instructions, which return the control flow to the
     call site. If there is a (single) value on the stack, the ret instruction will retrieve this value and
     push it back on the stack to be used by the calling method.

     Arithmetical Instructions

     Arithmetical instructions are used for numeric data processing, stack manipulation, loading
     and storing of constants, arithmetical operations, bitwise operations, data conversion opera-
     tions, and logical condition check operations.
     Stack manipulation instructions perform an action on the evaluation stack and do not have a
     parameter:
     nop Performs no operation on the stack. Although not a stack manipulation instruction, it is
         included in this list for lack of a better category;
     dup Duplicates the value on the top of the stack;
     pop Removes the value from the top of the stack.
     Constant loading instructions place the parameter with the constant value on the stack. There
     are instructions which directly specify the value to load so a parameter is not needed, as shown
     at line 2 in Listing 6.8.
 1   ldc.i4   16            // Place the constant 16 onto the stack
 2   ldc.i4.7               // Load the value 7 on the stack. Note; there is no parameter
                                    Listing 6.8: Constant loading instructions

     It is possible to load and store values using pointers. The value on top of the stack refers to a
     specific address where the value can be loaded from or stored to. Table 6.1 lists all the possible
     arithmetical operations. The overflow instructions raise an exception when the result does not
     fit the target type.
     Bitwise and shift operations have no parameters. They take one or two values from the stack,
     perform their action and place the result back onto the stack. Table 6.2 provides a summary of
     the bitwise and shift operations.
     The conversion instructions take a value from the top of the stack, convert it to the type specified
     by the instruction, and put the result back onto the stack. There are of course some rules:
     not every type can be converted to another type, and information can get lost when converting

                    OpCode          Description
                    add             Addition
                    sub             Subtraction
                    mul             Multiplication
                    div             Division
                    div.un          Unsigned division
                    rem             Remainder, modulo
                    rem.un          The remainder of unsigned operands
                    neg             Negate thus invert the sign
                    add.ovf         Addition with overflow
                    add.ovf.un      Addition of unsigned operands with overflow
                    sub.ovf         Subtraction with overflow
                    sub.ovf.un      Subtraction of unsigned operands with overflow
                    mul.ovf         Multiplication with overflow
                    mul.ovf.un      Multiplication of unsigned operands with overflow

                                    Table 6.1: Arithmetical operations in IL

    to a narrowing type (for example converting a 32 bit integer to a 16 bit integer). Special over-
    flow conversion opcodes are available when there is a need to throw an exception whenever
    the value must be truncated to fit the target type.
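For example, converting a 32 bit integer that does not fit into a 16 bit integer behaves differently for the plain and the overflow-checked opcodes:

```
ldc.i4   70000   // push a 32 bit integer that does not fit in 16 bits
conv.i2          // plain conversion: the value is silently truncated
ldc.i4   70000
conv.ovf.i2      // overflow-checked conversion: throws an exception at runtime
```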
    Logical condition check instructions are used to compare two values based on a certain opera-
    tor. The result is placed on the stack and is not directly used to branch to another location. A
    separate conditional branching instruction (branch true at line 3) can follow a logical condition
    check instruction (check equal at line 2) as shown in Listing 6.9.
1   Loop:
2   ceq
3   brtrue Loop
                     Listing 6.9: Condition check followed by a branching instruction

    Loading and Storing

    Almost all the instructions in the CIL operate on values on the stack except for the loading
    and storing instructions. This group of instructions is used to load values from local variables,
    fields, and method arguments onto the stack and to store items from the stack into these local
    variables, fields, and arguments.
    The ldarg and starg instructions handle the argument loading and storing while the ldloc
    and stloc are used for local variables. The parameter indicates the ordinal of the argument or
    variable to load or store. Remember: the first (zero-based) argument of an instance method is
    the object pointer.
    The ldfld and stfld instructions load and store field values (or pointers to them) from or onto
    the stack. A field signature does not indicate whether the field is static or instance, so there
    are separate instructions for static fields: for each instance load/store instruction there is a
    static counterpart (ldsfld and stsfld).
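As an illustration, a small C# method with, in comments, the IL a compiler would typically emit for it (the class and member names are made up for the example):

```csharp
class Counter
{
    private int count;

    public void Add(int n)
    {
        // ldarg.0                       load the object pointer (argument 0)
        // ldarg.0
        // ldfld  int32 Counter::count   load the field value onto the stack
        // ldarg.1                       load the first real argument, n
        // add
        // stfld  int32 Counter::count   store the result back into the field
        this.count = this.count + n;
    }
}
```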

                       OpCode        Description
                       and           Bitwise AND (binary)
                       or            Bitwise OR (binary)
                       xor           Bitwise exclusive OR (binary)
                       not           Bitwise inversion (unary)
                       shl           Shift left
                       shr           Shift right
                       shr.un        Shift right, treating the shifted value as unsigned

                                   Table 6.2: Bitwise and shift operations in IL

    Method Calling

    From within a method body it is possible to call other methods (see Listing 6.10). There are a
    number of instructions to use for method calling. These call instructions use a token as param-
    eter which contains either a MethodDef or a MethodRef of the method being called.
    Parameters of the called method should be put onto the stack in the order of their appearance
    in the method signature before the actual call. In case an instance method is being called, the
    instance pointer should be placed on the stack first. If the called method does not return void,
    it places its return value back on the stack when returning.
    Methods can be called directly or indirectly. With an indirect call, not the method name, but a
    pointer to the method is used.
1   ldstr "Enter a number"
2   call void [mscorlib]System.Console::WriteLine(string)
3   call string [mscorlib]System.Console::ReadLine()
                                       Listing 6.10: Method call example

    Exception Handling

    Exception handling is a feature of the managed runtime: the runtime is capable of detecting
    exceptions and finding a corresponding exception handler. The structured exception handling
    (SEH) information is stored in a table after the IL code of the method body. There are two
    forms of SEH declaration: a label form, as shown in Listing 6.11 with an example in
    Listing 6.12, and a scope form, shown in Listing 6.13.
1   .try <label> to <label> <EH_type_specific> handler <label> to <label>
                               Listing 6.11: Exception handling in label form

    The EH_type_specific is either a catch, filter, fault, or finally clause. In the labeled form,
    labels define a guarded block of code and a block of code which handles the exception. With
    the scope form, a .try instruction is placed before the actual instruction block and is followed
    by the exception handling catch block, see Listing 6.13.
 1   BeginTry:
 2      // guarded code
 3      leave KeepGoing
 4   BeginHandler:
 5      // exception handler code
 6      leave KeepGoing
 7   KeepGoing:
 8      // code after the protected region
 9      ret
10   .try BeginTry to BeginHandler catch [mscorlib]System.Exception
11           handler BeginHandler to KeepGoing
                         Listing 6.12: Exception handling in label form example

 1   .try {
 2     // Guarded code
 3     leave KeepGoing
 4   }
 5   catch [mscorlib]System.StackOverflowException {
 6     // The exception handler 1 code
 7     leave KeepGoing
 8   }
 9   catch [mscorlib]System.Exception {
10     // The exception handler 2 code
11     leave KeepGoing
12   }
                             Listing 6.13: Exception handling in scope form

     Exception handling structures can be nested in either the labeled or the scope form. It is
     illegal to branch into or out of a guarded block of code: such a block can only be entered from
     the top (where the guarded block is defined by a TryOffset), and the code inside the handler
     blocks can only be invoked by the exception handling system. Guarded and handler blocks can
     only be exited by using the leave, throw, or rethrow instructions. The evaluation stack is
     cleared before the branch.

     6.1.9   Normalization

     Because the IL is stack based, a simple expression can become quite complex. Take for example
     the expression y := (x + 2) − 1: the constant 2 is added to the value of x, after which the
     constant 1 is subtracted. The result is stored in the variable y. Listing 6.14 shows this
     expression in IL code.
 1   ldloc x
 2   ldc 2
 3   add
 4   ldc 1
 5   sub
 6   stloc y
               Listing 6.14: Stack based expression in the Common Intermediate Language

     Recognizing the expression is difficult. Temporary results are placed on the stack and are re-
     trieved by the next operation. For instance, the result of the add operation is not stored in a
     variable. To convert a stack based program to a semantic representation, it is necessary to
     perform a normalization step. The example in Listing 6.14 can be normalized by introducing a
     temporary variable, so that there is a single assignment for each statement.
     temp = x + 2
     y = temp - 1

     This is only a simple example; expressions with multiple sequential operators are normalized
     into more temporary assignments.
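For instance, an expression with an operator on each side, such as y := (a + b) * (c − d) (the names are chosen for the example), normalizes into two temporary assignments followed by the final one:

```
temp1 = a + b
temp2 = c - d
y = temp1 * temp2
```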

    6.1.10   Custom Attributes

     The .NET Framework can be extended using custom attributes, special metadata items.
     Custom attributes cannot change the metadata tables, because those tables are hard-coded and
     part of the runtime. The information in custom attributes can be used by the program itself,
     but also by a compiler or debugger. A major disadvantage of custom attributes is the space
     they occupy in the assembly and the fact that the IL is not able to access custom attributes
     directly, so a reflection technique must be used, which is a relatively slow mechanism.

     Custom attributes can be attached to any item in the metadata tables except for custom
     attributes themselves. Attaching an attribute to an instance is not possible; only an
     assignment to the type itself is allowed. The value of a custom attribute is a BLOB (binary
     large object), which contains the arguments of the attribute constructor and, optionally, a
     list of fields with values.
1   .custom instance void <class_ref>::.ctor(<arg_list>) [ = ( <hexbytes> ) ]

                                 Listing 6.15: Custom attribute syntax in IL

    Listing 6.15 shows the declaration of a custom attribute in IL code. The class_ref is the name
    of the class implementing the attribute, arg_list are the arguments for the constructor of the
    attribute and hexbytes contains the BLOB representation of the argument values. An example
    is shown in Listing 6.16.
1   .custom instance void MyAttribute::.ctor(bool) = (01 00 01 00 00)

                              Listing 6.16: Custom attribute example in IL

     The position of the custom attribute in the code defines the owner of the attribute. All the
     attributes declared in the scope of an item belong to that item. If there is no scope, the
     attributes declared after an item belong to that item. This is the opposite of the use of custom
     attributes in a higher level programming language like C# or VB .NET, where the custom
     attribute precedes the item it belongs to (see the example in Listing 6.17). There is another
     form of custom attribute declaration in which the owner is explicitly specified; it is used for
     metadata items which are declared implicitly, such as TypeRefs and MemberRefs. This form can
     appear anywhere in the source, since the owner is specified.
1   public class ExampleClass
2   {
3     [MyAttribute(true)]
4     public void ExampleMethod( )
5     {
6       //
7     }
8   }

                            Listing 6.17: Example of a custom attribute in C#

6.2     Access the IL

In the previous section the Intermediate Language used by Microsoft was described. Used as an
extra layer before the instructions are compiled to machine code, it provides an accessible
program representation of the software created in a higher level programming language like C#
or VB.NET. However, the instructions, together with the metadata, are stored in a byte code
format, optimized for speed and efficiency but not for direct readability. This section lists a
number of ways to read the contents of the byte code so we have access to the instructions
and data.

6.2.1   How to Read IL

Basically, to access IL code we have to parse the byte code and convert it to its IL
representation. For instance, the hexadecimal value 58 stands for the IL instruction add. Using
the ECMA-335 specification [25], the standard for the Common Language Infrastructure and
thus for the Intermediate Language, we can read and parse the byte code. However, it is not as
simple as it looks: information is contained in different metadata tables and must be associated
with the correct elements. Instead of writing our own CIL byte code parser, it is more efficient
to use an existing tool. These so-called IL readers parse the byte code and create a higher level
representation of the instructions; how this is represented differs per tool.
The process of analyzing a subject system to create representations of the system at a higher
level of abstraction is called reverse engineering [14]. A tool to reverse engineer a program
is called a decompiler, performing the reverse operation of a compiler. Microsoft even
provides a free disassembler called ILDASM (Intermediate Language Disassembler), which is
part of the Software Development Kit (SDK). It is the counterpart of ILASM, used to convert
plain text IL code to byte code. Using both tools it is even possible to perform a round trip:
decompile an assembly with ILDASM, recompile it with ILASM, and get a correct assembly
again. ILDASM provides a Graphical User Interface (GUI), displayed
in Figure 6.5, to show the contents of the .NET file in a tree structure. With special options it
allows access to the metadata and the CIL code of the selected methods.

                         Figure 6.5: Graphical User Interface of ILDASM

Besides the GUI, it allows output to a text file. By specifying the correct options, ILDASM
generates a .IL file containing the Intermediate Language code. An example of this format is
shown in Listing 3.1. Since this is a much easier representation to read than the byte code
version it can be used as an input for an IL parser. We can even change this representation and
recompile it using the ILASM program, a technique currently used by the Compose/.NET
project [10].
Reflector, a tool created by Lutz Roeder, is commonly used to inspect .NET assemblies.
Figure 6.6 shows a screen shot of this program. This tool can convert the analyzed IL code to
a higher level language like C#, VB.NET or Delphi. With the use of plugins it is possible to
perform analysis on the code or even to completely retrieve the source of an assembly in the
language of your choice.

                              Figure 6.6: Lutz Roeder’s Reflector

With the help of these decompiler tools it is very easy for anybody to reverse engineer
assemblies back into readable source code. Malicious people can crack programs, exploit
security flaws, and steal ideas. If it is imperative to protect your source code, obfuscation
should be applied. Code obfuscation is the generation of code which is still understandable by
the compiler, but very difficult for humans to comprehend. Techniques used in obfuscation
include removing nonessential metadata, control flow obfuscation, string encryption,
reordering of elements, and size reduction. Obfuscation is applied not to the source code but to
the assemblies. Although it is not foolproof protection for your assemblies, it makes it very
difficult to reverse engineer an application.

Besides tools for visualization of the source code, there are tools which perform specific
analysis tasks on the code, for instance the Fugue protocol checker. By specifying custom
attributes inside the source code, Fugue checks whether the code conforms to the declarative
specification. Pre- and postconditions, resource usage, database queries, and so on can be
specified, as well as the different states a program must adhere to. Fugue performs a static
analysis to find possible problems which could occur at runtime [22].
FxCop is another static analysis tool that checks .NET managed code assemblies for
conformance to the Microsoft .NET Framework Design Guidelines. It generates reports on possible
design, localization, performance, and security improvements based on the design guidelines
written by Microsoft. Targets, the managed assemblies, are checked by rules. If a rule fails, a
descriptive message is displayed revealing relevant programming and design issues. Figure 6.7
displays a screenshot of FxCop.

                                  Figure 6.7: Microsoft FxCop

There are a lot of other tools using a representation of the IL byte code to perform an analysis
task. However, the purpose of this assignment is to perform our own analysis. We not only
need an IL reader, but also a program interface to access the IL representation so semantic
analysis can be performed. The next sections list some possible tools which can be used for this
purpose.

6.2.2      Reflection

Reflection is a method to inspect the metadata to get information about an assembly, module,
or type [65, 4]. The .NET Framework Class Library contains special functionality in the Sys-
tem.Reflection namespace to access the metadata without converting the byte code to another
format. This is internally handled by the reflection classes. Reflection is used at runtime by

    a program to analyze itself or other components. The structure of the analyzed component is
    converted to an object representation. There are three main classes:

       • System.Reflection.Assembly, which represents assemblies;
       • System.Reflection.Module, which represents managed modules;
       • System.Type, representing types.

    These classes contain properties and methods to get further information about the assemblies.
    For instance, the Assembly class contains functionality to get all the types in the assembly as
    System.Type objects. This type information is the root of all reflection operations and allows
    access to methods, fields, parameters, and so on. Besides the method in the Assembly class,
    there are other ways to dynamically load types. This is called late binding, binding to a type at
    run time rather than compile time.

    If we have a Type object, we can also use it to create an instance of the type and invoke its
    methods. Listing 6.18 presents an example of using reflection to load an assembly, get the
    class ImageProcessing, and retrieve the method GetImage. This method is then invoked and
    returns an Image object.

1   Assembly assm = Assembly.LoadFrom ("assembly.dll");
2   Type type = assm.GetType ("ImageProcessing");
3   MethodInfo method = type.GetMethod ("GetImage");
4   Object obj = Activator.CreateInstance (type);
5   Image image = (Image) method.Invoke (obj, null);

                                    Listing 6.18: Reflection example

    With reflection it is possible to access the custom attributes, the elements providing extra infor-
    mation about other items. Because it is not possible to directly access the custom attributes in
    the IL code, reflection is the only way to retrieve these items.
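As a sketch, retrieving the MyAttribute instance from the earlier example through reflection could look as follows (a minimal fragment, not taken from the thesis implementation):

```csharp
using System;
using System.Reflection;

MethodInfo method = typeof(ExampleClass).GetMethod("ExampleMethod");

// GetCustomAttributes instantiates the attribute from the metadata BLOB
object[] attributes = method.GetCustomAttributes(typeof(MyAttribute), false);

foreach (MyAttribute attribute in attributes)
{
    // The attribute is now a normal, fully constructed object
    Console.WriteLine(attribute.GetType().FullName);
}
```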

    The reflection library has another component named Emit and it allows a compiler or tool to
    emit metadata and Intermediate Language code and execute this code or save it to a file.
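A minimal sketch of the Emit functionality: a dynamic method that adds two integers is generated and executed at runtime (the method and delegate names are arbitrary):

```csharp
using System;
using System.Reflection.Emit;

delegate int BinaryOp(int a, int b);

// Define a method int Add(int, int), hosted in an existing module
DynamicMethod add = new DynamicMethod("Add", typeof(int),
    new Type[] { typeof(int), typeof(int) }, typeof(object).Module);

// Emit the IL body: load both arguments, add them, return the result
ILGenerator il = add.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldarg_1);
il.Emit(OpCodes.Add);
il.Emit(OpCodes.Ret);

// Compile the IL and invoke it through a delegate
BinaryOp f = (BinaryOp)add.CreateDelegate(typeof(BinaryOp));
Console.WriteLine(f(2, 3)); // prints 5
```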

    Reflection is a powerful way to perform static analysis on assemblies. However, there are two
    major problems. The first one is speed: reflection takes place at runtime and must parse and
    process the assembly before it can build a representation, which is a time consuming process.
    Speed improvements are included in .NET version 2.0. A more serious problem is the lack of
    method body information. Reflection can reveal almost all the information needed except the
    contents, the body, of a method with the actual instructions inside. This is the information we
    are mostly interested in. Again, in version 2.0, the reflection classes have been enhanced with
    a function to get the method body contents. However, this will only give the byte code, so we
    still have to parse and convert this to another representation.
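A sketch of this version 2.0 functionality: GetMethodBody returns the raw IL, which still has to be decoded (ExampleClass and ExampleMethod stand in for any method under analysis):

```csharp
using System;
using System.Reflection;

MethodInfo method = typeof(ExampleClass).GetMethod("ExampleMethod");

// New in .NET 2.0: access to the method body through reflection
MethodBody body = method.GetMethodBody();

// The body is returned as raw IL byte code, not as parsed instructions
byte[] il = body.GetILAsByteArray();
Console.WriteLine("Method body contains {0} bytes of IL", il.Length);

// Local variable metadata is available directly
foreach (LocalVariableInfo local in body.LocalVariables)
{
    Console.WriteLine("Local {0}: {1}", local.LocalIndex, local.LocalType);
}
```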

    Reflection gives us a partially implemented tool to perform code analysis. If the reflection
    possibilities regarding speed and the method body are improved, it will be a good candidate to
    use for this kind of analysis.
    6.2.3      Mono Cecil

    Cecil is a .NET library to inspect and generate .NET assemblies. It provides more or less the
    same functionalities as reflection. You can load an assembly and browse through all the types.
    In addition to reflection, it can read and parse IL instructions and has functionality to change
    the code and save it back to disk.

    Listing 6.19 gives an example of opening an assembly and reading the types inside this assem-
    bly. If we have access to a type object, we can use that object to retrieve the fields, methods,
    constructors, and so on.

1   // Creates an AssemblyDefinition from the "assembly.dll" assembly
2   AssemblyDefinition assembly = AssemblyFactory.GetAssembly ("assembly.dll");
4   // Gets all types which are declared in the Main Module
5   foreach (TypeDefinition type in assembly.MainModule.Types) {
6     // Writes the full name of a type
7     Console.WriteLine (type.FullName);
8   }

                                   Listing 6.19: Cecil get types example

    A method object has a property called Body which returns a MethodBody instance. We can use
    this object to access all the local variables, the exception handling information and the IL in-
    structions. Besides directly accessing the instructions, we can also use the analysis tools of
    Cecil. The flow analysis class creates basic blocks in which instructions are grouped together.
    A control flow instruction, such as a branch, an exception throw, or a return, starts a new
    block, and blocks are connected to each other: if the last instruction in a block is a branching
    instruction to another block, then the current block has a link to that block. This gives us the
    opportunity to trace the control flow of the instructions in the method.

    Inside the blocks are instructions and each instruction has an OpCode, indicating the type of
    instruction, and an operand, the parameter of the instruction (when available). The offset of
    the instruction, indicating the placement of the instruction inside the method, is also stored in
    the instruction. Next and previous properties allow navigation between the instructions, and
    a visitor pattern can be used to visit all the different kinds of instructions.
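A sketch against the Cecil API as it was at the time (names may differ in later versions), printing each instruction's opcode and operand for a given method:

```csharp
using System;
using Mono.Cecil;
using Mono.Cecil.Cil;

// Assumes 'method' is a MethodDefinition obtained as in Listing 6.19
foreach (Instruction instruction in method.Body.Instructions)
{
    // OpCode identifies the kind of instruction,
    // Operand holds its parameter (may be null)
    Console.WriteLine("IL_{0:x4}: {1} {2}",
        instruction.Offset, instruction.OpCode.Name, instruction.Operand);
}
```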

    Not only can Cecil read IL instructions, it can also add or change instructions and save the
    changed assembly. Cecil is used in a number of analysis tools, for instance tools which check
    whether code is type safe, or tools for code optimization.

    Although Cecil has the ability to access the IL instructions, it is limited in its abilities. An
    instruction does not contain specific information about the data it is working on as specified
    by the operand, and there is no direct link to this operand; we have to determine the type of
    the operand ourselves. Support for .NET Framework version 2.0 was not available in the
    version of Cecil examined for this assignment. It is possible that newer versions can handle
    the next version of the Framework without any problems.


     6.2.4           PostSharp

     A program similar to Cecil is PostSharp. This tool reads .NET assemblies, represents them
     as a Code Object Model, lets plug-ins analyze and transform this model, and writes it back
     to binary form. The two main purposes of this application are program analysis and program
     transformation.
     PostSharp is designed for .NET version 2.0 and supports the new language constructs in the
     CIL. Working in combination with the reflection capabilities of .NET, it creates its own
     representation of the instructions inside a method body. Listing 6.20 gives an example of
     reading an assembly and printing all the instructions to the console.
 1   // Get the assembly
 2   System.Reflection.Assembly assembly =
 3      System.Reflection.Assembly.LoadFrom("assembly.dll");
 4   System.Reflection.Module[] modules = assembly.GetModules();
 6   // Get all the modules
 7   foreach (Module mod in modules)
 8   {
 9     // Open a module with PostSharp
10     PostSharp.ModuleReader.ModuleReader mr =
11        new PostSharp.ModuleReader.ModuleReader(mod);
12     PostSharp.CodeModel.ModuleDeclaration md = mr.ReadModule();
14     // Get the types
15     foreach (TypeDeclaration t in md.Types)
16     {
17       // Get all the methods in the type
18       foreach (MethodDeclaration method in t.Methods)
19       {
20         // Get the body of the method
21         MethodBodyDeclaration b = method.Body;
23         // Print the method name
24         Console.WriteLine(method.Name);
26         // Enumerate through all the instructions
27         b.ForEachInstruction(delegate(InstructionReader instructionReader)
28         {
29           Console.WriteLine("Read instruction {0} as {1}",
30              instructionReader.OpCodeNumber, instructionReader.OperandType);
31         });
33         method.ReleaseBody();
34       }
35     }
36   }
                                     Listing 6.20: PostSharp get body instruction

     Just like Cecil, PostSharp has the ability to split the instructions into blocks for a
     representation of the control flow. Instructions are represented in the code model with
     detailed information about the instruction and its operands, although the operands' type
     information must still be resolved.

PostSharp is more mature than Cecil, but it is still under heavy development; at the time of
implementation, PostSharp was not production ready.

6.2.5      RAIL

The Runtime Assembly Instrumentation Library (RAIL) is a project of the University of
Coimbra, Portugal. Like Cecil and PostSharp, it allows .NET assemblies to be manipulated and
instrumented before they are executed, filling the gap between .NET reflection and the emit
functionality. Its primary use is the transformation of assemblies by changing types, methods,
fields, or IL instructions [12].
RAIL creates an object-oriented representation of the assembly which can be manipulated to
make changes in the code. Besides the structured view, RAIL also creates tables to hold the
sequence of objects and object references that represent the application's IL code and all the
exception handling related information.
Applications of RAIL are runtime analysis tools, security verification, MSIL optimization,
Aspect-Oriented Programming, and so on. However, at the time of writing, RAIL was immature
and could not be used for even simple analysis tasks.

6.2.6      Microsoft Phoenix

Phoenix is a framework for building compilers and a wide range of tools for program analysis,
optimization, and testing [50]. Phoenix is a joint project between Microsoft Research and the
Developer Division and is the basis for all future Microsoft compiler technology. It supports a
wide range of hardware architectures and languages.
Building blocks form the core of Phoenix, implemented around a common intermediate rep-
resentation (IR). Those blocks are called phases and are executed one by one in a predefined
order. Phases are used to build, modify or analyze the IR and in most cases, the final phase will
write the IR to a specific output format.
Figure 6.8 shows the components of the Phoenix platform with the IR as the main structure
that the phases can use to interact with the data. Readers are used to process different types of
input, like AST, C intermediate language, Common IL (MSIL), PE files and other binaries. The
input is read into the IR that represent an instruction stream of a function as a series of dataflow
operations and the phases are executed in sequence. Each phase performs some operations on
the IR and hence the IR can be in different levels of abstraction. For instance, during a compi-
lation with Phoenix, the IR is transformed from a high level IR, which is machine independent,
to the final instructions and addresses, a low level IR, which is machine dependent. Finally, the
writers are used to build an executable or library.
The list of phases can be changed to include or replace phases somewhere in the sequence. If
an analysis phase needs access to a high level representation of the code, it should be included
at the start of the sequence.


                                        Figure 6.8: Platform of Phoenix

Besides the representation of the functions in the IR, there is also an API to perform analysis
on the data, like data flow, control flow, graphs (inheritance, call, and interference), exception
handling, Static Single Assignment (an intermediate representation in which every variable is
assigned exactly once), and so on. The IR can also be modified by adding or
changing instructions of a function or by changing functions. This is ideal for instrumentation
and profiling, but also for AOP code weaving.
There are two ways to use Phoenix: either as a compiler back-end or as a standalone tool. As
a compiler back-end it uses the Phoenix framework and the input and output modules with
the custom phases to perform a compilation. As a standalone tool it is possible to directly call
the Phoenix API, implement your own phases, and place those at the right position in the
phase sequence.
Each phase implements an Execute function. A Unit is passed as a parameter to this function
and can contain any of the unit types listed in Table 6.3. It is up to the phase to determine

  Unit                   Description
  FuncUnit               Encapsulates the information required during compilation of a single
                         function or method.
  DataUnit               Represents a collection of related data, such as a set of initialized vari-
                         ables or the result of encoding a FuncUnit.
  ModuleUnit             Represents a collection of functions.
  PEModuleUnit           Represents a portable executable (PE) image, such as an EXE or a DLL.
  AssemblyUnit           Represents a .NET Framework assembly compilation unit.
  ProgramUnit            Represents an executable image compilation unit, an EXE or a DLL.
  GlobalUnit             Represents the outermost compilation unit.

                                       Table 6.3: Phoenix unit hierarchy

if the unit is of the type it is interested in. The FuncUnit is the fundamental compilation unit
in Phoenix and contains all the information necessary to compile a single function. For code


                        Figure 6.9: Control flow graph in Phoenix [59]

analysis at the instruction level, this is the most interesting unit to analyze. It does not only
contain the instructions, but also graphs, exception handling information and type information.

Each phase has access to the intermediate representation (IR), which represent an instruction
stream of a function as a series of data-flow and/or control-flow operations. Instructions in
the IR have an operator, a list of source operands, and list of destination operands. The IR is
also strongly typed, meaning the types of the operands that reference data are stored. If, for
instance, an integer is placed onto the stack, then the source operand of the load instruction
is of the type integer. The type of a destination operand is determined by the operator and by
the types of the source operands.
The control flow in the IR is explicit, instructions are in sequence. Using the flows graphs
functionality it is possible to create basic blocks representing a sequence of instructions with
flow edges to other blocks. Each block starts with a unique label and ends with a branching
instruction or, optionally, an exception raising instruction. Figure 6.9 shows an example of the
control flow graph in Phoenix.

The IR instructions can be divided into two kinds:

   • real instructions, which represent operations with dataflow or control flow, most of
     which map to one or more machine instructions;
   • pseudo instructions, which represent things such as labels and statically allocated data.

Table 6.4 shows the different forms of instructions available in the Phoenix IR. The last three
items in Table 6.4 are pseudo instructions, the rest are real instructions.

        Instruction     Description
        ValueInstr      Any arithmetic or logical operation that produces a value
        CallInstr       Function invocation, either direct or indirect
        CmpInstr        A compare instruction that generates a condition code
        BranchInstr     Control flow for branching, conditional/unconditional and returning
        SwitchInstr     Control flow for switching, a multi-way computed branch
        OutlineInstr    Defines an outline summary instruction in the IR for code moved out of
                        the main instruction stream: e.g. asm blocks
        LabelInstr      User-defined labels and control flow merge points in the code stream.
        PragmaInstr     Arbitrary user supplied directives.
        DataInstr       Statically allocated data.

                                   Table 6.4: Phoenix instruction forms

Each instruction object contains properties such as the OpCode indicating the kind of operation,
a source operand list, and a destination operand list. Based on the type of the instruction,
more properties can be available. The BranchInstr contains properties with links to the
LabelInstr for the true and the false value of a condition. The CallInstr has a property indicating
the name of the function being called.
     In Listing 6.21 part of a phase is listed. The execute function checks if the type of the Unit is
     a FuncUnit. If this is the case, it will build a flow graph and go through all the instructions in
     each block, while printing the OpCode value to the console.
protected override void Execute(Phx.Unit unit)
{
  // Try casting to a FuncUnit
  Phx.FuncUnit func = unit as Phx.FuncUnit;
  if (func == null)
  {
    // Only interested in FuncUnits
    return;
  }

  bool noPrevFlowGraph = func.FlowGraph == null;
  if (noPrevFlowGraph)
  {
    // Build a control flow graph
    func.BuildFlowGraphWithStyle(Phx.Graphs.BlockStyle.SplitEachHandlerEdge);
  }

  Phx.IR.Instr instr;
  Phx.Graphs.BasicBlock basicBlock;
  basicBlock = func.FlowGraph.StartBlock;

  // Loop through all the basic blocks
  while (basicBlock != func.FlowGraph.LastBlock)
  {
    // Begin the per-block instruction traversal.
    instr = basicBlock.FirstInstr;
    for (uint i = 0; i < basicBlock.InstrCount; i++)
    {
      // Write the OpCode to the console
      Console.WriteLine(instr.Opcode);

      // Get the next instruction in this block
      instr = instr.Next;
    }

    // Get the next block
    basicBlock = basicBlock.Next;
  }

  // Clean up the flowgraph
  if (func.FlowGraph != null) func.DeleteFlowGraph();
}
                                 Listing 6.21: Phoenix phase execute example

Phoenix is the most extensive IL reader and writer discussed in this chapter. It supports the
.NET Framework version 2.0, gives detailed information about the operands of an instruction,
and has flow graph capabilities. It is more mature than Cecil and PostSharp, but is still under
development. However, it has the support of Microsoft and it is used by Microsoft to compile
applications like Windows, PowerPoint, certain games, and for building test tools. There is
more documentation than for the other IL readers discussed, but information and samples are
still scarce.

                                                                               CHAPTER      7

                                                            Design and Implementation

Chapter 5 discussed the semantics of programming languages and Chapter 6 described the in-
ner workings of the language our target source code is in. This chapter brings the two together
by showing the design of the Semantic Analyzer.

7.1     General Design

A high level overview of the system is needed before we can elaborate on the more specific
parts. This section presents a general overview of the complete system, describes the limita-
tions we have to take into account, and the flow of the program and its various components. It
also specifies the coding guidelines used for the implementation.

7.1.1    Introduction

Basically, we want to read an assembly, perform an analysis task on it, and produce some
output. To perform the querying of data, we store the semantic representation of the input
assembly in a semantical metamodel first (more details in Section 7.2). We can use this model
to reason about the behavior of the elements in the source. To achieve this, we perform a
number of steps:

   1.   Read a .NET assembly into memory;
   2.   Parse the intermediate language byte code to a readable representation;
   3.   Retrieve the structure of the assembly;
   4.   Build a semantic model based on the structure;
   5.   Convert IL instructions to a semantical representation and store in the model;
   6.   Provide a querying mechanism to search and reason about the program.
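The six steps above can be sketched as a small driver. The type names mirror the components introduced later in this chapter, but the member shapes shown here are assumptions for illustration, not the actual API.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-ins for the real components; their actual shapes live in
// the SemanticModel, SemanticDatabase, and SemanticLibrary assemblies.
public class SemanticItem { public string Name = ""; }

public class SemanticDatabaseContainer
{
    public IList<SemanticItem> Items { get; }
    public SemanticDatabaseContainer(IList<SemanticItem> items) { Items = items; }
}

public interface IPlugin { void Process(SemanticDatabaseContainer database); }

public static class AnalyzerDriver
{
    // Steps 1-5 are hidden behind the extractor (here a delegate);
    // step 6 hands the filled database to each plugin in turn.
    public static int Run(Func<string, IList<SemanticItem>> extract,
                          string assemblyName,
                          IEnumerable<IPlugin> plugins)
    {
        IList<SemanticItem> model = extract(assemblyName);
        var database = new SemanticDatabaseContainer(model);
        int executed = 0;
        foreach (IPlugin plugin in plugins) { plugin.Process(database); executed++; }
        return executed;
    }
}
```

The delegate keeps the sketch independent of any particular extractor; in the real system that role is played by the provider-based SemanticExtractor.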

The reading and parsing of an assembly is handled by an existing tool. There are multiple IL
readers available, as seen in Section 6.2, and since Phoenix is best suited for the job, we use this
tool to access the assemblies.
The source code is converted to a metamodel, a higher level representation of the code, which
provides information to determine the behavior of the program. An interface allows for
searching and working with the data inside this model. Two types of applications are created
to test the semantical extractor. One is a command line utility where the input is one or more
assemblies and the output is determined by the supplied plugins. The other application
is a Windows Forms application, which provides a GUI to browse the metamodel and see all
its properties and graphs.

7.1.2    Design Limitations

Before the semantic analyzer is designed, a number of design requirements have to be taken
into account:
     1. Because of the structure of the Compose compilation process, we do not have access
        to the full .NET assemblies at the start of the compilation (thus not before the actual
        weaving) [39]. The reason for this is the method introduction, which introduces new
        signatures making it impossible for the compiler to compile the source. By using dummy
        files and custom compilation the Compose compiler can create the assemblies.
     2. The analyzer should be language independent. Compose is available for multiple plat-
        forms and although the focus is on the .NET version of Compose , it would be wise to
        consider a model capable of expressing multiple source languages.
To take these points into account there are a number of possible solutions.

Compose Compilation Process Looking at item one, we have to redesign the way the compilation
process works. If we have a .NET assembly at the first stage of the building process, we
can run the semantic analyzer at that point. To be able to do this we must directly compile the
source and the dummy files so we have an assembly. However, the dummy files only contain
signatures and lack method body implementations. The analyzer needs the implementations,
thus we cannot use these assemblies at this point in time.
Another solution is to use the assemblies modified by ILICIT [10]. However, at that point the
selection for the placement of the filters has already been completed. New selection points
based on the semantics of methods are too late to be introduced. We can still use the analyzed
assembly to perform other tasks, like resource usage checking, side effects determination, and
so on.
The third option is to only analyze the source files, as long as they are present as an IL repre-
sentation. They do not contain the added aspect code, but can be used as a source.
The most elegant solution is the possibility to switch off type checking so an assembly can be
created by the standard .NET compilers as the first step in the Compose compilation process.
This is not possible with the default implementation of the .NET compilers (writing our own
compiler is an option to overcome this, but requires too much effort for each .NET language).

Because of this limitation, the first version of the semantic analyzer will primarily be used for
resource extraction and providing extra information to other Compose modules. This is an
action which can be performed after (or separately from) the main Compose compilation process.
The Semantic Analyzer will be placed after the ILICIT module.

Language Independency Regarding issue two, the language independency of the analyzer, we
have to make sure we can store the semantics of any object oriented language. This means we
have to distinguish the language specific parts from the common parts and only store the com-
mon behavioral information. Type information, for instance, is language dependent (the type
system differs in the different OO implementations) and must be stored in a special manner.
Behavior is still very generic and not directly connected to the source. There are differences
in naming between Java and C#, such as toLowerCase in Java and ToLower in C#,
but the operations act the same. Section 7.2 gives more details about the implementation of a
language independent system.
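For example, an extractor could normalize such language specific names onto one common behavioral vocabulary before storing them in the model. The mapping table below is purely illustrative and not part of the actual implementation:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: map language specific method names to one common
// behavioral name, so toLowerCase (Java) and ToLower (C#) compare equal.
public static class BehaviorNames
{
    private static readonly Dictionary<string, string> Common =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "toLowerCase", "to_lower" }, // Java
            { "ToLower",     "to_lower" }, // C#
            { "toUpperCase", "to_upper" }, // Java
            { "ToUpper",     "to_upper" }, // C#
        };

    // Unknown names pass through unchanged.
    public static string Normalize(string methodName) =>
        Common.TryGetValue(methodName, out var common) ? common : methodName;
}
```

With such a table, two extractors for different languages produce the same semantic action for behaviorally identical calls.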
Figure 7.1 provides a general overview of the system. Different Semantic Extractors can be used
for specific source languages. They each use their own reader and parser to access the source
code and convert the code to a semantical representation, which is stored in the Semantical
Model. The Semantic Database is used to access this model and contains the querying mecha-
nism. Plugins perform the specific tasks to get the required behavioral information.

                                 Figure 7.1: Process overview

Since the Semantic Model is language independent it can theoretically handle all types of source
languages as long as they are object-oriented. Besides the Java elements, we could also add
a Borland Delphi extractor to process Object Pascal source files and store the semantics in the
same metamodel.

7.1.3   Control Flow

Because of the design limitations discussed earlier there is a specific control flow the analyzer
will use when we integrate the system into Compose .NET. The analyzer will be implemented
as a stand-alone console tool. The main analyzer is in a separate assembly so in the future it
can also be called from another front-end.

The Semantic Extractor will return a metamodel of the analyzed assembly. This model is passed
by the console application to the plugins, which can query this model using the Semantic
Database API. This way there is no need to save the metamodel to file and read it again.

               Figure 7.2: Control flow of the console application with a plugin

Figure 7.2 shows this flow with the Resource Checker plugin. It is possible to add multiple
plugins to the system by using command switches.

This pipeline flow of the analyzer is not the only way the application can be used. Since all the
functionality is spread over different components, it is also possible to use only an extractor and
work directly on the metamodel, not using the database or plugins.

    7.1.4     Building Blocks

    The requirements ensure a separation of functionality in different components. For instance,
    the extractor using the Phoenix library is in a different assembly so the plugins have no de-
    pendency on the Phoenix subsystem (or any other language dependent system). The next
    paragraphs describe the different components of the system. The whole system is called the
Semantic Analyzer.

Semantic Extractor

This component is responsible for the extraction of the semantics from a source file. This source
file can be a .NET assembly or a Jar file for the Java version. Currently there is only a .NET
Semantic Extractor. The Extractor uses the Microsoft Phoenix library for the actual reading and
parsing of the IL code. The resulting semantics will be stored in the metamodel and returned to
the caller. Another component, like a command line application, calls the extractor and passes
the returned metamodel to the plugins.
    When it is possible to create IL code directly at the start of the Compose compilation process,
    the extractor can, in theory, be placed before the other Compose components to determine
    semantic weaving points.
Besides Phoenix, it is also possible to use other IL readers for the same task. Using the provider
design pattern [40], we can choose which specific extractor must handle the calls issued to the
abstract base class, the SemanticExtractor class. The available providers are listed in a
configuration file and only one provider provides the actual implementation of the abstract class.
Support for the provider design pattern is part of the .NET Framework version 2.0 Class Library.
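The base class could look roughly like the following sketch; the real SemanticExtractor reads its provider list from the configuration file, which is replaced here by an in-memory registry, and all member shapes are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Sketch of the provider pattern described above. Each concrete extractor
// (Phoenix, Cecil, ...) would derive from this abstract base class.
public abstract class ExtractorProvider
{
    public abstract IList<string> Analyse(string assemblyName);
}

public static class Extractors
{
    private static readonly Dictionary<string, ExtractorProvider> Providers =
        new Dictionary<string, ExtractorProvider>();
    private static ExtractorProvider current;

    // In the real system the providers come from the configuration file.
    public static void Register(string name, ExtractorProvider provider) =>
        Providers[name] = provider;

    // Mirrors SemanticExtractor.SetProvider("phoenix") in Listing 7.1.
    public static void SetProvider(string name) => current = Providers[name];

    // All calls are forwarded to the single selected provider.
    public static IList<string> Analyse(string assemblyName) =>
        current.Analyse(assemblyName);
}
```

The static facade keeps callers unaware of which concrete extractor is active, which is exactly what Listing 7.1 relies on.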
// Change the default provider to a new provider
SemanticExtractor.SetProvider("phoenix");

// Call the Semantic Extractor
IList<SemanticItem> semanticItems = SemanticExtractor.Analyse(assemblyName);
                                   Listing 7.1: Calling the SemanticExtractor

Listing 7.1 shows how to change the default provider to another provider and how to start
the actual analysis process. The value phoenix is one of the values defined in the application's
configuration file and can be found in Appendix C.

Semantic Metamodel

    The Semantic Metamodel is an object model, created and filled by the Semantic Extractor and
passed to the plugins as a data source. It is contained in memory, but a representation of the
model can be written to an XML format. Recreating the metamodel from the XML file is currently
not supported.
    This model is language independent. It is a higher level view of the source code with the same
    structure (classes, methods, and so forth) as the original code. The behavior of a method body

is represented in a special way, using actions and blocks. More information about this model is
given in Section 7.2.

Semantic Database

The database allows for storing the model and for searching for specific data in this model. It is
called by the plugins and works directly on the model (it does not need the Phoenix library or
the original source). To do this, it supplies a SemanticDatabaseContainer object. Not only
is the metamodel stored in this database, it also provides a number of ways to access the data
and to search in the database.
Although this is written in a .NET language (namely C# version 2.0), it uses functionality which
can be ported to Java (version 2 required). More information is given in Section 7.4.

Plugins

A plugin is a piece of code which uses the semantic model for a specific purpose. The Semantic
Analyzer is very general, whereas the plugins perform specific, detailed actions using these
general functions.
For instance, resource usage will be collected by the Resource Checker plugin. This plugin pro-
vides information for the SECRET module and needs to get all the methods with a parameter of
type ReifiedMessage. Each method in the resulting method list is examined by determining the
behavior of the parameter containing a ReifiedMessage. It is possible the parameter is assigned
a different value or a method of the object is called. It is the task of the plugin to perform these
service-oriented queries, while the Semantic Database and Metamodel provide the general
querying functionality and the data source.
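The core of such a plugin query can be sketched as follows; the model classes here are simplified stand-ins for the real metamodel types, and matching on the type name is an assumption about how the plugin identifies ReifiedMessage:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the metamodel: an operation with typed arguments.
public class Argument { public string TypeName = ""; }

public class Operation
{
    public string Name = "";
    public List<Argument> Arguments = new List<Argument>();
}

public static class ResourceCheckerQuery
{
    // Select every operation that takes a parameter whose (textual) type
    // name ends in ReifiedMessage, as the Resource Checker plugin needs.
    public static IEnumerable<Operation> WithReifiedMessage(IEnumerable<Operation> operations) =>
        operations.Where(op => op.Arguments.Any(
            a => a.TypeName.EndsWith("ReifiedMessage", StringComparison.Ordinal)));
}
```

Each operation returned by this query would then be examined further by following the parameter's operand through the actions of the method body.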

7.1.5     Guidelines

A general guideline is presented to which the application will adhere. By using a specified
programming model the application will be consistent.

Coding guidelines For the .NET platform there already is an exhaustive design guideline created
by Microsoft [58]. This guideline describes the naming of variables, methods, parameters, etc.
The API designed for this assignment will follow that guideline.

Naming Conventions and Assemblies To provide a consistent programming model we have to
place the code inside namespaces. Microsoft advises to use the following standard:
CompanyName.TechnologyName. In this case we will use UTwente for the company name and
SemanticAnalyzer as the technology name.

The metamodel and its API will be used from within the tools. The developers of these tools
must program against the metamodel and not the extractors, which depend heavily on the
underlying supporting tools like the Phoenix framework. For this reason we place the metamodel
in a separate assembly. Table 7.1 lists all the assemblies and their purposes. Each name begins
with the UTwente.SemanticAnalyzer namespace1.

        Assembly name                        Purpose
        SemanticExtractorRail                Provider using RAIL
        SemanticExtractorPostSharp           Provider using PostSharp
        SemanticExtractorCecil               Provider using Cecil
        SemanticExtractorPhoenix             Provider using Phoenix
        SemanticModel                        Contains the metamodel and graph algorithms
        SemanticDatabase                     Database container and API to query the model
        SemanticLibrary                      Shared functionality, such as plugin interface and
                                             provider model
        SemanticPlugins                      Standard plugins which will process the metamodel
        SemanticComposeStarPlugins           Specific plugins for Compose
        SemanticExtractorConsole             Console application
        SemanticWinForms                     Windows Forms Application

                                        Table 7.1: Assembly naming

7.2      Semantic Model

Information extracted by the Semantical Extractors is stored in a metamodel. This is an object
oriented model containing classes representing the structure and the semantical information of
the source code. The model can be stored in the SemanticalDatabaseContainer, which also
provides an interface for searching through this model.
Because multiple source languages can be converted to this model, we cannot store any
language specific information such as types. However, we do not want to lose that kind of
information, so we have to convert this to a more general representation. The metamodel consists
of elements used by most object oriented languages and it is up to the extractor to convert the
specific language elements to the correct corresponding semantical elements in the model.

7.2.1    Overall Structure

To make effective use of the model, we not only have to store the semantics, but also the location
of these items in the original hierarchy. That is, if we have a function with certain behavior,
we have to place this function in its context. A function will be in a class, and a class is in some
sort of a container like an assembly (.NET) or JAR2 file (Java).
We can distinguish three main elements in the metamodel:
      1 Note: the currently implemented code does not yet completely adhere to this standard. Also some class and
function names are implemented using UK English instead of US English, which is used in this thesis.
      2 A JAR file (or Java ARchive) is a compressed file used to distribute a set of compiled Java classes and metadata.

     Units: the base for all the semantic items, providing structure for the model, such as
     classes, methods, and so on.
     Operands: the elements on which a mathematical or logical operation is performed, like a
     field, a parameter, a local variable, or a constant value.
     Types: a language independent view to store type information.

    Besides these three core elements, there are elements to track source reference (the line numbers
    the action came from), attribute information, and so on.

In Section 7.2.5 a more detailed view of the model is given.

    7.2.2   From Instructions to Actions

    An extractor not only builds the structure of the metamodel, like the layout of all the classes
    with their functions, it also converts the instructions, such as IL opcodes, to their semantical
    representations. Code is converted to actions and each action performs some sort of task. An
action is represented in the model as a SemanticAction object and is placed inside a
SemanticBlock. Both are elements of a function.

Blocks The blocks are used for the control flow. Each block contains one or more actions, and
the blocks are linked together. Each block has a direct link to its previous and next block in the control
flow. If the extractor supports exception handling extraction, we can also request the exception
handling block for a specific block. That is the block that is executed when an exception occurs.

    A simple example of the use of blocks is found in Listing 7.2. This for loop checks for a condition,
    performs an action, and returns to the condition check part.
int j = 0;

for (int i = 0; i < 10; i++)
{
    j = j + i;
}

return;
                                     Listing 7.2: For loop in C#.NET

    Figure 7.3 shows the corresponding blocks for the code in Listing 7.2. The extractor is respon-
    sible for creating the blocks and connecting them to each other. If the information is available,
    the extractor can also indicate the start and end line number of the source code corresponding
    to a block. Blocks with no actions should be removed from the list of blocks and the links be-
    tween the blocks must be updated. It is possible an extractor has introduced more blocks than
    needed using its own control flow algorithm.
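The clean-up step described above, removing action-less blocks and repairing the previous/next links of their neighbours, can be sketched on a doubly linked block list (the Block shape is a simplified stand-in for the real SemanticBlock):

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for a SemanticBlock in the doubly linked block list.
public class Block
{
    public List<string> Actions = new List<string>();
    public Block Previous;
    public Block Next;
}

public static class BlockListCleaner
{
    // Unlink every block that holds no actions; returns the
    // (possibly new) first block of the chain.
    public static Block RemoveEmpty(Block first)
    {
        Block current = first;
        while (current != null)
        {
            Block next = current.Next;
            if (current.Actions.Count == 0)
            {
                // Repair the links of the neighbouring blocks.
                if (current.Previous != null) current.Previous.Next = current.Next;
                if (current.Next != null) current.Next.Previous = current.Previous;
                if (current == first) first = current.Next;
            }
            current = next;
        }
        return first;
    }
}
```

Exception handler links would need the same repair in the real model; this sketch only covers the linear previous/next chain.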


                                 Figure 7.3: Loop represented as blocks

Actions If we look at the code in Listing 7.3, we see a number of different kinds of actions. First
the value of x is increased by two; second, the value of y is multiplied by three. A comparison
is performed on those two results and based on this comparison a branching operation is
executed, either to label l1 or label l2.
if ((x + 2) > (y * 3)) {
  // l1
}
else {
  // l2
}
                                   Listing 7.3: Expression in C#.NET

    It is human readable in the source language C#.NET, but converted to IL code, shown in List-
    ing 7.4, it is more difficult to understand.
ldloc 0
ldc 2
add
ldarg 1
ldc 3
mul
cgt
brtrue l1
br l2
                                    Listing 7.4: Expression in IL code

    We do not know what type of local variable the statement ldloc is loading. The result of the
    add operation is not stored in a variable, but only on the stack. Same with the result of the
multiplication. Both values on the stack are used by the compare operation (cgt). The branching
    is more complex since it uses two different branch operations, one conditional branch and one
    unconditional jump.
    We would prefer a semantical representation of this expression in the way depicted by List-
    ing 7.5.
t1$ (loc) = add x(loc), 2(con)
t2$ (loc) = mul y(arg), 3(con)
t3$ (loc) = cmp t1$, t2$, gt
= branch (t3$, true), l1, l2
                         Listing 7.5: Semantical representation of the expression

We can see directly the basic actions performed; adding, multiplying, comparing, and branch-
ing. We also see the operands the actions are working on and the kind of operands, like a
variable, argument, or constant value. The actions store their results in temporary variables,
introduced by the Semantic Extractor. We can always trace the usage of an operand and where
it originated from. Some actions do not only contain the source and destination operands, but
also additional options. The compare action has information about the kind of comparison,
like greater than (gt). The branch action has direct information on where a true and a false value of
the corresponding comparison lead to.

Since the Semantic Extractors are language dependent and thus know the language they are rea-
soning about, they are responsible for converting one or more instructions to a corresponding
action. Not all the instructions provide a meaningful action and as such they do not intro-
duce new actions. On the other hand, multiple instructions can form one action. Loading two
operands onto the stack, adding the values, and storing the result is represented as one add action.

In Appendix D you can find all the available kinds of semantic actions the model can store. The
arguments are listed with each item. Besides these arguments, an action can also have a link
back to the original source line number.
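The folding of stack-based instructions into actions can be sketched with a small evaluation-stack simulation. The opcode subset and output format follow Listings 7.4 and 7.5; everything else, including the string encoding of operands, is an assumption for illustration:

```csharp
using System;
using System.Collections.Generic;

// Sketch: operands are pushed on a simulated evaluation stack, and each
// value producing opcode pops its inputs and stores its result in a
// fresh temporary introduced by the extractor.
public static class ActionFolder
{
    public static List<string> Fold(IEnumerable<string[]> il)
    {
        var stack = new Stack<string>();
        var actions = new List<string>();
        int temp = 0;

        foreach (string[] ins in il)
        {
            switch (ins[0])
            {
                // Load instructions only push; they produce no action.
                case "ldloc": stack.Push(ins[1] + "(loc)"); break;
                case "ldarg": stack.Push(ins[1] + "(arg)"); break;
                case "ldc":   stack.Push(ins[1] + "(con)"); break;

                // Binary operations fold the two loads plus the
                // operation itself into a single action.
                case "add":
                case "mul":
                    string right = stack.Pop();
                    string left = stack.Pop();
                    string result = "t" + (++temp) + "$";
                    actions.Add($"{result} = {ins[0]} {left}, {right}");
                    stack.Push(result);
                    break;
            }
        }
        return actions;
    }
}
```

Feeding it the first six instructions of Listing 7.4 yields the first two actions of Listing 7.5, with the temporaries t1$ and t2$ introduced automatically.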

7.2.3   Dealing with Operands

An operand is the data used by the IL operation. An operation can use one operand, called a
unary operation, or two operands, called a binary operation. Some OpCodes do not use any
operands, for instance a call to another function with no return value and no parameters.

In our model we have four kinds of operands:

      Argument: the argument, also called the parameter, of a function.
      Variable: a local variable; it exists only inside a function.
      Field: a variable defined outside the function or even outside a class.
      Constant: a constant value, a value that does not change, such as a number or a text.

The semantic actions can use these operands as their source or destination operands. It is
obvious that a constant operand cannot be used as a destination as it is a read-only operand.

We can access all the operands so we can follow them through a function. We might see a
variable operand get a default value such as a constant operand, being used by another action,
and finally getting the value of an argument operand assigned to it. This is called data flow analysis.

Each operand has a name and a type. The Semantic Extractor is responsible for assigning a
unique name to each operand and determining a correct semantical type. A constant operand
also contains the value it is assigned to, such as a number or a text.

7.2.4     Type Information

Although types are directly related to a specific language, we still want to maintain type
information in our model. This means we store type information in two ways: as a textual
representation and as one of a list of common types. It is up to the Semantic Extractor to map
a language type to a common type representation in the model. Appendix E lists all the possible common
types the extractor can use. The original type is still conserved as text so applications capable of
reasoning about the source language can use this representation. For example, the Compose
plugin wanting to find the ReifiedMessage object can use the full name of the object type.
A Semantic Type also contains metadata regarding the type, such as whether it is a class, an interface, a
pointer, an array, and so on. Every operand has a Semantic Type, just as almost every unit in the
model.
7.2.5     Model Layout

The previous paragraphs gave some insight into the design of the model; more details are given
in this section. Figure 7.4 shows a simplified view of the structure of the metamodel. This is
a tree-like structure and at the top there is a Semantic Container object. In .NET this is called
an assembly, in Java a JAR file. A container holds zero or more classes. Each class can contain fields
and operations. Operations are the methods or functions of a programming language. In .NET
languages there are also properties, special functions which act as accessors for private fields.
These are actually normal functions with a prefix (get_ or set_) and a special
reference in a metadata table.
An operation has zero or more arguments, the parameters of a function. Inside the operation
there are zero or more local variables and constants. If there are instructions in the function,
then there will be one or more blocks with actions.
The main unit blocks of the model all inherit from the SemanticItem class. This class has a
collection of SemanticAttributes objects so custom attributes can be applied to any kind of
item in the model.
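A minimal skeleton of this base class and the visitor interfaces used in this section might look as follows; only the names come from the text, the member shapes are assumptions:

```csharp
using System;
using System.Collections.Generic;

// Visitor contract: IVisitable items accept an IVisitor (shapes assumed).
public interface IVisitor { void Visit(SemanticItem item); }
public interface IVisitable { void Accept(IVisitor visitor); }

public class SemanticAttribute { public string Name = ""; }

public abstract class SemanticItem : IVisitable
{
    // Every item in the model can carry custom attributes.
    public List<SemanticAttribute> Attributes { get; } = new List<SemanticAttribute>();

    public virtual void Accept(IVisitor visitor) => visitor.Visit(this);
}

// Structural elements (container, class, operation, ...) derive from this.
public abstract class SemanticUnit : SemanticItem
{
    public string Name = "";
}
```

A search or an XML exporter is then just one more IVisitor implementation walking the tree.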
Figure 7.5 shows a class diagram of the SemanticItem class and its direct derived classes.
The child class SemanticUnit contains the structural elements of the model. In Figure 7.6 a
diagram is given of this class and its children. All the child classes have an interface and this
interface is used by all the other components.
Each unit also implements the interface IVisitable which contains an accept function with
an IVisitor argument (see Figure 7.7). This visitor design pattern [31] can be used to visit all
the elements in the model and process each element individually. Currently this is used for the
search mechanism and the XML exporter.

SemanticContainer

The SemanticContainer is the root element for the actual model and is like an assembly for
.NET or a JAR file for Java. It has a name and filename of the original analyzed element. A
strongly typed collection of SemanticClass objects holds all the class information of the container.


Figure 7.4: Structure of the metamodel. Blue is a unit, red is an operand and green are the parts
actually providing semantical information.

SemanticClass

A class is a collection of encapsulated variables (fields) and operations. Classes exist in a
SemanticContainer and have a unique fully qualified name, type information, a scope, and a
collection of SemanticField and SemanticOperation objects. The scope indicates whether this class
is publicly or privately accessible or whether it is a shared class.
Figure 7.8 shows the SemanticContainer and SemanticClass classes.

SemanticOperation

A SemanticOperation represents a function containing a sequence of instructions. An oper-
ation is contained inside a class and has a unique name in that class. If the operation returns
data, then the type of this data is known. An operation has a number of collections:
    The arguments or parameters of the function;
    All the used local variables in the operation, including the variables introduced by the
    Semantic Extractor;
    The constant values used in the operation;
    The blocks with the actions.


                 Figure 7.5: Semantic Item and direct derived classes


                     Figure 7.6: SemanticUnit and child classes


                        Figure 7.7: Visitor pattern in the model

             Figure 7.8: SemanticContainer and SemanticClass classes


                     Figure 7.9: SemanticOperation class


                    Figure 7.10: SemanticOperand class and derived classes

Figure 7.9 represents the SemanticOperation class with the associated collections.

SemanticOperand and Subclasses

As indicated in Section 7.2.3 there are four kinds of operands: arguments, fields, local variables,
and constants. They all inherit from the base class SemanticOperand which contains a name
and type for the operand.

Figure 7.10 shows the base class for the operands, the SemanticOperand class, and the child
classes. The SemanticConstant class has the ability to store the constant value. The Se-
mantic Extractor is responsible for supplying the correct value. The SemanticArgument and
SemanticVariable classes allow for the specification of their position in the list, since the
name of the operand is not always available. In IL, these operands are addressed by their ordinal.


Figure 7.11: SemanticBlock class with collection of SemanticAction

SemanticBlock

The SemanticOperation class has a collection of SemanticBlock objects, which contain
the SemanticAction objects. Figure 7.11 shows the SemanticBlock class with its col-
lection of SemanticAction. Each block has a unique name inside the operation and a
SourceLineNumber class provides a link back to the original source code. This can be useful
when you want to add or replace code contained in the block.
The class also contains functions to navigate to other blocks, such as the next and previous
block in the sequence. This does not imply that the next block in the sequence is also the next
block in the control flow. Only if there is no control flow action (branching, returning, switching)
as the last action in the block is the next sequential block also the next one in the control flow.
If the extractor has exception handling information, then it is possible to jump to the block
which handles the exception for the current block.

SemanticAction

This class contains the actual semantical action performed by one or more instructions in the
source. A graphical representation of this class is found in Figure 7.12.
Each SemanticAction object has an ActionType from the SemanticActionType enumeration.
Based on the ActionType some additional properties have meaning. For instance, if the type is
Branch, then the true and false label information makes sense.
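The idea that some properties only have meaning for certain action types can be sketched as follows. This is a hypothetical Python analogue for illustration; the names (SemanticActionType, true_label, and so on) mirror the model but are assumptions, not the thesis code.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class SemanticActionType(Enum):
    ASSIGNMENT = auto()
    AND = auto()
    BRANCH = auto()
    JUMP = auto()
    SWITCH = auto()

@dataclass
class SemanticAction:
    action_type: SemanticActionType
    source_operand1: Optional[str] = None
    source_operand2: Optional[str] = None
    destination_operand: Optional[str] = None
    true_label: Optional[str] = None      # only meaningful for BRANCH
    false_label: Optional[str] = None     # only meaningful for BRANCH

# A branch action: compares x against 0 and selects one of two blocks
branch = SemanticAction(SemanticActionType.BRANCH,
                        source_operand1="x", source_operand2="0",
                        true_label="block2", false_label="block3")
```

For any other action type the label properties would simply be left unset, which is the flat-record style the model uses instead of one subclass per action type.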
Also present are the source and destination operands. Only two source operands can exist, so
expressions with multiple operands have to be normalized to a binary form.

Figure 7.12: The SemanticAction class with supporting types

Figure 7.13: SemanticType class

SemanticType

The SemanticType class stores the type information. The extractor has to
select a general type from the list in Appendix E (Table E.1) to map the source type to the
corresponding common type. The extractor can also indicate whether the type has a base type
and/or implements interfaces. Figure 7.13 shows a graphical representation of this class.


Figure 7.14: SemanticAttribute class

SemanticAttributes

As described in Section 6.1.10, IL supports the concept of custom attributes: special metadata
which can be applied to almost every item in the language. Other languages support a similar
system; in Java, for example, they are called annotations. The Semantic Metamodel supports
this concept and allows the extractor to add multiple SemanticAttribute objects to all the
SemanticItem types.

As shown in Figure 7.14, the class has a SemanticType and a list of ISemanticOperand ob-
jects. The operands are normally constant values used for setting the properties of the custom
attribute.

7.2.6     Flow graphs

The Semantic Metamodel not only contains the data, it also provides graph func-
tionality to work with the data. The SemanticOperation class has a function called
RetrieveSemanticControlFlow, which generates a control flow graph for the actions in-
side the operation. It uses a class called SemanticFlowGenerator which contains all the flow
operations. We can also call this class directly by supplying a SemanticOperation as a
parameter to the constructor.
The function GenerateSemanticControlFlow in this class generates the control flow and re-
turns this graph in the form of a SemanticControlFlow object. The SemanticControlFlow
has a collection of FlowBlock objects and a start and end FlowBlock. Each FlowBlock has
a unique name, a collection of SemanticBlock objects with the actions, and three lists. One
has all the successors and the other one has the predecessors. The third list contains all the
FlowBlock objects on which the current flowblock is control dependent, i.e., these flowblocks
contain branching and can thus control whether the flowblock is reached.
The successors and predecessors are represented as FlowEdge objects. They indicate the target
flowblock and the reason for the link. This reason can be conditional, unconditional, exception
or fall through. The first two are used when the successor block is the result of a branch (con-
ditional) or a jump (unconditional) action. The exception reason is used when an exception is
raised before reaching the flowblock. The fall through reason simply connects blocks which
follow each other in the normal sequence.
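The structure just described, flowblocks with named edges carrying a reason, can be sketched like this. This is a hypothetical Python analogue of the FlowBlock and FlowEdge classes, not the thesis code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List

class EdgeReason(Enum):
    CONDITIONAL = auto()    # successor reached via a branch
    UNCONDITIONAL = auto()  # successor reached via a jump
    EXCEPTION = auto()      # successor is an exception handler
    FALL_THROUGH = auto()   # successor simply follows in sequence

@dataclass
class FlowEdge:
    target: str             # name of the successor flowblock
    reason: EdgeReason

@dataclass
class FlowBlock:
    name: str
    successors: List[FlowEdge] = field(default_factory=list)
    predecessors: List[str] = field(default_factory=list)
    control_dependencies: List[str] = field(default_factory=list)

# An entry block that branches into B1 and otherwise falls through into B2
entry = FlowBlock("B0", successors=[FlowEdge("B1", EdgeReason.CONDITIONAL),
                                    FlowEdge("B2", EdgeReason.FALL_THROUGH)])
```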


                     Figure 7.15: The flow classes

     All the classes for the generation of flow graphs are depicted in Figure 7.15.            The
     GenerateSemanticControlFlow function uses various algorithms to generate the Semantic
     FlowGraph. It begins by splitting the SemanticBlock objects in the operation into flow blocks.
     The splitting is based on the control flow actions as described in Algorithm 1.

     input : A SemanticBlocks collection
     output: A collection of FlowBlocks
 1   semanticBlock ← SemanticBlocks[0];
 2   while semanticBlock ≠ null do
 3      foreach semanticAction ∈ semanticBlock do
 4          hasLink ← false;
 5          switch actionType do
 6              case Jump
 7                  Add unconditional flowedge;
 8                  hasLink ← true;
 9                  break;
10              case Branch
11                  Add conditional flowedge to true block;
12                  Add conditional flowedge to false block;
13                  hasLink ← true;
14                  break;
15              case Switch
16                  Add conditional flowedge for all switch labels;
17                  hasLink ← true;
18                  break;
19              end
20          end
21      end
22      if has exception handler link & different exception handler then
23          hasLink ← true;
24          Add exception flowedge;
25          if next SemanticBlock exists then
26              Add fall through flowedge;
27          end
28      end
29      if hasLink ∨ last SemanticBlock then
30          Add flowBlock to output collection;
31      else
32          semanticBlock ← next block;
33      end
34   end
                               Algorithm 1: GenerateSemanticControlFlow

     The next step is to connect the successors and predecessors. Algorithm 2 shows how this is
     done.


     input : A flowBlock collection
     output: A flowBlock collection with connected successors and predecessors.
 1   // Connect successors;
 2   foreach flowBlock ∈ flowBlocks do
 3       foreach flowEdge ∈ flowBlock.Successors do
 4            Find SemanticBlock with target name;
 5            Set successor flowblock for flowEdge to found block;
 6       end
 7   end
 8   // Set the predecessors;
 9   foreach flowBlock ∈ flowBlocks do
10       foreach flowEdge ∈ flowBlock.Successors do
11            Add flowBlock to list of predecessors of the successor flowblock;
12       end
13   end
                                     Algorithm 2: Connect flow edges
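Algorithm 2 boils down to resolving each successor edge by name and then registering the reverse link. A minimal Python sketch, assuming blocks are kept in a dictionary keyed by their unique name:

```python
def connect_flow_edges(flow_blocks):
    """flow_blocks: dict mapping block name -> {'successors': [target names],
    'predecessors': []}. Fills in the predecessor lists in place."""
    # Pass 1: every successor target name must resolve to an existing block
    for name, block in flow_blocks.items():
        for target in block['successors']:
            assert target in flow_blocks, f"unknown successor {target}"
    # Pass 2: register each block as a predecessor of all its successors
    for name, block in flow_blocks.items():
        for target in block['successors']:
            flow_blocks[target]['predecessors'].append(name)
    return flow_blocks

blocks = {
    'B0': {'successors': ['B1', 'B2'], 'predecessors': []},
    'B1': {'successors': ['B2'], 'predecessors': []},
    'B2': {'successors': [], 'predecessors': []},
}
connect_flow_edges(blocks)
```

After the two passes every block knows both directions of the graph, which is what the later dependency and path algorithms rely on.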

     We now have the successors and predecessors of all the FlowBlocks and can create the control
     dependencies. Algorithm 3 shows how the DetermineControlDependency function is started
     and Algorithm 4 describes the actual function.

     input : A flowBlock collection with connected successors and predecessors.
     output: A flowBlock collection with control dependency information
 1   foreach flowBlock ∈ flowBlocks do
 2      DetermineControlDependency (flowBlock);
 3   end
                         Algorithm 3: Start DetermineControlDependency

     input : A flowBlock collection with connected successors and predecessors.
     output: A flowBlock collection with control dependency information
 1   // Search through all the predecessors;
 2   foreach flowBlock ∈ flowBlock.Predecessors do
 3       foreach flowEdge ∈ flowBlock.Successors do
 4           if flowEdge = conditional then
 5               Add to dependency list;
 6           end
 7       end
 8       if predecessor not visited before then
 9           DetermineControlDependency (flowBlock);
10       end
11   end
                                  Algorithm 4: DetermineControlDependency
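Algorithm 4 can be sketched as a recursive walk over the predecessors, collecting every predecessor that branches conditionally; a visited set corresponds to the "not visited before" check. An illustrative Python version under the same dictionary layout as before (edge kinds stored as strings here for brevity):

```python
def determine_control_dependency(name, blocks, deps=None, visited=None):
    """Walk the predecessors of `name`; every predecessor with a conditional
    edge can decide whether `name` is reached at all."""
    deps = deps if deps is not None else []
    visited = visited if visited is not None else set()
    for pred in blocks[name]['predecessors']:
        # A conditionally branching predecessor controls whether we get here
        if any(kind == 'conditional' for _, kind in blocks[pred]['successors']):
            if pred not in deps:
                deps.append(pred)
        if pred not in visited:           # avoid endless recursion in loops
            visited.add(pred)
            determine_control_dependency(pred, blocks, deps, visited)
    return deps

# Diamond: B0 branches to B1 or B2, both fall through to B3
blocks = {
    'B0': {'successors': [('B1', 'conditional'), ('B2', 'conditional')], 'predecessors': []},
    'B1': {'successors': [('B3', 'fallthrough')], 'predecessors': ['B0']},
    'B2': {'successors': [('B3', 'fallthrough')], 'predecessors': ['B0']},
    'B3': {'successors': [], 'predecessors': ['B1', 'B2']},
}
```

For the diamond above, B3 is control dependent only on B0, the block that decides which arm of the diamond runs.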

     We also have a function to determine the flow paths, all the possible paths through the control

     flow. Algorithm 5 shows how these are calculated using the internal function BuildFlowPath.

     input : A flowBlock and a previousStack with flowblocks
     output: A stack with FlowBlocks that forms a control sequence
 1   if flowBlock.Successors.Count = 0 then
 2       return previousStack
 3   else
 4       foreach flowEdge ∈ flowBlock.Successors do
 5           Create temporaryStack;
 6           temporaryStack ← previousStack;
 7           if flowBlock ∈ temporaryStack then
 8               if flowEdge.Successor ∈ temporaryStack then
 9                   Add flowBlock to temporaryStack;
10                   temporaryStack ← BuildFlowPath (flowEdge.Successor, temporaryStack);
11               end
12           else
13               Add flowBlock to temporaryStack;
14               temporaryStack ← BuildFlowPath (flowEdge.Successor, temporaryStack);
15           end
16           Add temporaryStack to global flow path collections;
17       end
18   end
                                  Algorithm 5: Determine Flow Paths
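Algorithm 5 can be sketched as a depth-first enumeration that copies the path so far at every branch and refuses to enter a block already on the current path, so each loop is unrolled at most once. An illustrative Python version, slightly simplified from the stack formulation above:

```python
def build_flow_paths(name, blocks, prefix=None, paths=None):
    """Enumerate control-flow paths from `name`; a block already on the
    current path is not entered again (cycle guard)."""
    prefix = list(prefix) if prefix else []   # copy so branches do not share state
    paths = paths if paths is not None else []
    prefix.append(name)
    successors = blocks[name]
    if not successors:                        # exit block: the path is complete
        paths.append(prefix)
    else:
        for succ in successors:
            if succ not in prefix:            # skip the back edge of a loop
                build_flow_paths(succ, blocks, prefix, paths)
    return paths

# Diamond with a back edge: B1 can loop back to B0
blocks = {'B0': ['B1', 'B2'], 'B1': ['B0', 'B3'], 'B2': ['B3'], 'B3': []}
paths = build_flow_paths('B0', blocks)
```

The back edge from B1 to B0 is simply skipped, so the enumeration terminates even for looping control flow.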

     We now have the flow paths and the control dependencies so we can add the access levels to
     the flow. An access level on a FlowBlock indicates the number of times the block is accessed.
     Some blocks are only accessed once, some are always accessed. A block in a loop with a con-
     dition at the start can be accessed multiple times, but a block in a loop with a condition at the
     end is accessed at least once and maybe more. Algorithm 6 describes how the access level is
     determined.


     input : A startFlowBlock, an endFlowBlock, flowPaths collection
     output: SemanticControlFlow with access levels
 1   foreach flowPath ∈ flowPaths do
 2      index ← flowPath.Count − 1;
 3      while index ≥ 0 do
 4         flowBlock ← flowPath [index ];
 5         if flowBlock = startFlowBlock ∨ flowBlock = endFlowBlock then
 6             accessLevel ← AtLeastOnce ;
 7         else
 8             Determine if flowBlock has itself as a parent;
 9             if HasParent then
10                 accessLevel ← MaybeMoreThenOnce ;
11             else
12                 // Determine if flowPath is in a loop;
13                 loopBlock ← loop FlowBlock;
14                 if IsInLoop then
15                      if loopBlock has no control dependencies then
16                           accessLevel ← OnceOrMore ;
17                      else
18                           accessLevel ← MaybeMoreThenOnce ;
19                      end
20                 else
21                      if flowBlock has no control dependencies then
22                           accessLevel ← MaybeOnce ;
23                      end
24                 end
25             end
26         end
27         index ← index − 1;
28      end
29   end
                                   Algorithm 6: Determine Access Levels

     Plugins can use the flow graph capabilities of the model directly in their own analysis. Each
     FlowBlock contains the Semantic Blocks and Actions so all the information is preserved. Al-
     though SemanticBlocks are usually also split on control flow conditions, the FlowBlocks pro-
     vide an optimized and more detailed representation of the control flow.

     The flow analysis classes can be extended with more flow generators such as data dependency
     graphs or call graphs. These are currently not implemented and the algorithms for them are
     also not discussed here.

7.3     Extracting Semantics

Extracting the semantics is the job of the Semantic Extractor. It reads and parses the source
language, builds the metamodel and converts the statements to corresponding actions.

7.3.1   Semantic Extractor Class

The SemanticExtractor class is responsible for the conversion. As described in section,
the extraction system is designed as a provider pattern with SemanticExtractor as the base
class. Applications wanting to convert code to the semantic model call this class, which in
turn uses a specific provider to do the actual transformation.
This allows for the creation of multiple providers and switching between the providers is han-
dled by specifying the provider name. The providers use their own tools to read the source
code. Not only for .NET, but also providers for other source code like Java, Delphi, and so on
can be created.
Each provider must inherit from the SemanticExtractorProvider class and register itself in
the configuration file (see Appendix C). The SemanticExtractor will then load this configu-
ration file, initialize the providers and select the default provider.
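The provider mechanism described here is essentially a registry of interchangeable back ends behind one facade. A hypothetical Python analogue (the real system is configured through a .NET configuration file rather than code, and the names below are illustrative):

```python
class SemanticExtractorProvider:
    """Base class every provider must implement (illustrative analogue)."""
    name = None

    def analyze(self, assembly_name):
        raise NotImplementedError

class SemanticExtractor:
    """Facade: applications call this class, which delegates to a provider."""
    def __init__(self):
        self._providers = {}
        self._default = None

    def register(self, provider, default=False):
        self._providers[provider.name] = provider
        if default or self._default is None:
            self._default = provider.name

    def analyze(self, assembly_name, provider_name=None):
        # Switching providers is just a matter of passing another name
        provider = self._providers[provider_name or self._default]
        return provider.analyze(assembly_name)

class CecilProvider(SemanticExtractorProvider):
    name = "cecil"
    def analyze(self, assembly_name):
        return f"model of {assembly_name} via Cecil"

extractor = SemanticExtractor()
extractor.register(CecilProvider(), default=True)
result = extractor.analyze("Demo.dll")
```

The same registry could hold providers for Java or Delphi front ends; callers never depend on a concrete provider class.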

                            Figure 7.16: Semantic Extractor Classes

Figure 7.16 shows the classes needed for the Semantic Extractor. An application can call the
Analyze function with the assembly name as parameter. This is passed to the correct provider
which returns a list of SemanticItem objects. Usually this will be a SemanticContainer
as it is the root of the metamodel tree structure, but a provider can also choose to return a
SemanticOperation or SemanticClass if this is more suitable.

     The application requesting the metamodel can work directly on the SemanticItem list or store
     the objects in the SemanticDatabaseContainer. By using the database container, the devel-
     oper has extra functionality to search and retrieve elements from the model. More about this
     container can be found in Section 7.4.
     In Section 6.2, four different tools were introduced. These tools were used to create four differ-
     ent providers, which are discussed in the next sections.

     7.3.2   Mono Cecil Provider

     The provider using Cecil, called SemanticProviderCecil, was relatively easy to create.
     Opening and reading an assembly is a one-line statement. The Cecil library returns an
     AssemblyDefinition object where we can inquire for all the available types. By browsing
     the types and their properties we can generate the structure of the semantic metamodel.
     The important part is the method body. It provides direct access to the local variables de-
     clared inside the method. To help us iterate over the IL instructions, Cecil provides an analysis
     framework. This framework returns a control flow in the form of blocks with instructions.
     These blocks are mapped to SemanticBlocks and the instructions of a block are given to an
     instruction visitor.

     This visitor has a method for each OpCode and holds an operand stack. Stack loading instruc-
     tions, like loading constants, variables and so on, will place the operand onto the local stack
     after converting it to a SemanticOperand object. When, for instance, an And instruction is vis-
     ited, two operands will be popped from the stack and used as the source operands. The And
     instruction places a result on the stack so we store the SemanticAction object in a temporary
     variable. If a store instruction is visited, then it will find this previously stored action and assign
     the operand to store as the destination operand for the action. See Listing 7.6 for an example.
Stack<ISemanticOperand> operandStack;
private ISemanticAction prevAction;

public override void OnLdarg(IInstruction instruction)
{
  // Load the argument indexed by the operand onto the stack
  LoadArgument((int)(instruction.Operand));
}

public override void OnAnd(IInstruction instruction)
{
  // Create an And action
  ISemanticAction action = GetAction(SemanticActionType.And);

  // Pop the two operands from the stack
  action.SourceOperand2 = operandStack.Pop();
  action.SourceOperand1 = operandStack.Pop();

  // Store this action as the previous action
  prevAction = action;
}

public override void OnStarg(IInstruction instruction)
{
  // See if a previous action exists and assign the argument
  // operand to the DestinationOperand property
  if (prevAction != null)
  {
    prevAction.DestinationOperand = GetArgument((int)(instruction.Operand));
    prevAction = null;
  }
}
                              Listing 7.6: Part of the Cecil Instruction Visitor

     Although a provider using Cecil could be created with little effort, it has some problems. We
     still have to retrieve the type of the operands. Sometimes this can be inferred from the operator,
     such as OnLdc_I4 which loads a 32-bit integer onto the stack. However, in most cases it is
     difficult to get the correct type since it is not specified.
     The flow analysis library of Cecil had problems with more complex method bodies. For in-
     stance, methods with exception handling could not be converted to a block representation.
     Cecil could not cope with the new language elements of .NET version 2.0 such as generics. It is
     possible that this is corrected in a newer version of Cecil.

     7.3.3   PostSharp Provider

     The PostSharp provider is not very different from the Cecil provider. It also has the possibility
     to create blocks separating the control flow. Instead of a Visitor pattern to visit each instruction,
     it uses an instruction reader, a stream of instructions. As long as there are instructions in the
     stream, we can read the current instruction, convert it and read the next one.
InstructionReader ireader = method.Body.GetInstructionReader();
InstructionBlock iblock = method.Body.RootInstructionBlock;

while (iblock != null)
{
  ireader.EnterInstructionBlock(iblock);
  InstructionSequence iseq = iblock.GetFirstInstructionSequence();
  ireader.EnterInstructionSequence(iseq);

  while (ireader.ReadInstruction())
  {
    Console.WriteLine("Read instruction {0}", ireader.OpCodeNumber);
    if (ireader.OpCodeNumber == OpCodeNumber.Ldloc_0)
    {
      // perform actions based on OpCode
    }
  }
  iblock = iblock.NextSiblingBlock;
}
method.ReleaseBody();
                          Listing 7.7: Using the instruction stream in PostSharp

     Listing 7.7 lists the statements needed to read the instructions in a method body. The Post-
     Sharp provider was not developed further. PostSharp was still in its early phases of development
     and did not always read assemblies correctly. Spending more time on this provider was not
     advisable at the time.
    At the time of writing of this thesis, PostSharp appears to be more mature and does support
    the .NET version 2.0 assemblies.

    7.3.4   RAIL Provider

     RAIL, the Runtime Assembly Instrumentation Library, could not even load an assembly. The
     documentation and samples were, at the time, very limited and the program crashed fre-
     quently. The provider was therefore not implemented further.

    7.3.5   Microsoft Phoenix Provider

     Because Microsoft Phoenix was the best analysis tool available at the time of implementation, a
     lot of effort was put into the development of the Phoenix provider. It consists of two main
    parts, the provider itself and an analysis phase.
    When the provider is called to analyze an assembly, it will use a PEModuleUnit to read the
    assembly into memory (see Listing 7.8). It retrieves the Symbol for the unit and uses this symbol
    to walk through the assembly. Symbols are placed in the symbol tables of a unit and provide
    the metadata of the elements in a unit.
peModuleUnit = Phx.PEModuleUnit.Open(assemblyName);
peModuleUnit.LoadGlobalSyms();

Phx.Syms.Sym rootSym = peModuleUnit.UnitSym;

WalkAssembly(rootSym);
                            Listing 7.8: Loading the assembly using Phoenix

    The WalkAssembly method builds a SemanticContainer based on the information in the root
    symbol. It uses iterators to find all the sub elements like the classes, methods, fields, and
     so on. When the assembly walker finds a method it creates a SemanticOperation object, re-
    trieves and assigns the parameters, return type, and other properties and adds this object to
    the SemanticClass it belongs to.
     To retrieve the method body and thus the instructions, the analysis phase is used. The function
    is raised to another representation level in the Intermediate Representation (IR). It will use a
    number of phases to perform this action and the provider adds our own phase to the list.
    Listing 7.9 shows the code performing this action.
// Get the FuncUnit and raise it
Phx.FuncUnit funcUnit = funcSym.FuncUnit;
funcUnit = peModuleUnit.Raise(funcSym, Phx.FuncUnit.

// Prepare a config list
Phx.Phases.PhaseConfig config =
   Phx.Phases.PhaseConfig.New(peModuleUnit.Lifetime, "temp");

// Create a new phase
SemanticExtractorPhase semtexPhase = SemanticExtractorPhase.New(config);
semtexPhase.SemanticOperation = semOp;
Phx.Phases.PhaseList phaseList = Phx.Phases.PhaseList.New(config, "SemTex Phases");

// Add our phase and provide IR dump
phaseList.AppendPhase(semtexPhase);
phaseList.DoPhaseList(funcUnit);

// Place the result in the semOp variable
semOp = semtexPhase.SemanticOperation;
                 Listing 7.9: Starting a phase for a function using the Phoenix Extractor

     The SemanticExtractorPhase is the class responsible for converting the instructions to ac-
     tions. As its input, it uses a FuncUnit, the fundamental compilation unit in Phoenix, represent-
     ing a function. Using the graph functionalities of Phoenix, a control flow graph is created. This
     provides us with the Semantic Blocks.

     Converting IL instructions to actions is similar to Cecil. An internal stack is used to keep track
     of all the stack loading and storing actions. Instructions using operands can retrieve the correct
     operand from this stack. Determining the type of the operand values is aided by Phoenix's own
     type system. Phoenix is able to retrieve the type of the operands so this information can be
     stored in a SemanticType object, a huge advantage over Cecil and PostSharp.

     Furthermore, Phoenix is able to include information stored in the associated debug files, into
     the analyzed assembly. This gives us information about the names of the local variables, which
     is not stored in the IL code. Another usage of this debug file is the ability to link actions and
     blocks to points in the original code since line numbers are available in the debug file.

     Phoenix creates blocks, where each block ends with either a branch instruction or an instruction
     that can cause an exception, and each block starts with a unique label. This gives us more blocks
     than intended, so a special block optimization is performed. The optimization algorithm is
     listed in Algorithm 7 and removes the empty blocks, which are a result of Phoenix, and updates
     all the references to the correct blocks. As input we have a list of label identifiers and associated
     blocks.

     Phoenix provides a lot of functionality to analyze .NET assemblies. For instance, the control
     flow support, the ability to read the debug files, direct access to type information, and so on.
     However, Phoenix is still under development and documentation is scarce. Because it is used
     internally by Microsoft, support and development are ongoing and improving.

     7.4   Querying the Model

     The Semantic Extractor converts the source code to the Semantic Metamodel. An application can
     now use this model for further analysis. However, this means the application has to traverse
     the whole model each time it wants to search for an item. To facilitate the use of the metamodel,
     a database container is created, which has a mechanism for searching in the model. This section
     provides more details about the database and the search options. The next chapter provides
some practical examples of how this search system can be used.


     input : A collection of labels with associated blocks
     output: Optimized collection of Semantic Blocks
 1   // Remove blocks with zero actions;
 2   foreach semanticBlock ∈ semanticBlocks do
 3       if semanticBlock has no actions then
 4           Get the next block;
 5           Update all references to current block to the next block;
 6           Remove the empty block;
 7       end
 8   end
 9   // Connect the blocks;
10   foreach semanticBlock ∈ semanticBlocks do
11       // Connect the next and previous blocks;
12       if not the first semanticBlock then
13           Connect previous block to semanticBlock;
14       end
15       if not the last semanticBlock then
16           Connect next block to semanticBlock;
17       end
18       // Connect the exception handling block;
19       if exception handling block exists then
20           Connect exceptionhandlerblock to semanticBlock;
21       end
22       // Connect the actions to the correct blocks;
23       foreach semanticAction ∈ semanticBlock do
24           switch semanticAction.ActionType do
25               case Jump RaiseException
26                    Set the semanticAction.LabelName to the correct block;
27               case Branch
28                    Set the semanticAction.TrueLabelName to the correct block;
29                    Set the semanticAction.FalseLabelName to the correct block;
30               case Switch
31                    foreach SwitchLabel ∈ semanticAction.SwitchLabels do
32                        Set the semanticAction.SwitchLabel to the correct block;
33                    end
34               end
35           end
36       end
37   end
                                 Algorithm 7: Optimization of Semantic Blocks
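The first loop of Algorithm 7, dropping empty blocks and redirecting references to the next block, can be sketched as follows. The data layout (a list of blocks plus a label-to-block-name map) is an assumption for illustration, not the Phoenix provider's actual representation:

```python
def remove_empty_blocks(blocks, references):
    """blocks: ordered list of dicts with 'name' and 'actions';
    references: dict mapping a label to the block name it points at.
    Returns the surviving blocks; updates `references` in place."""
    result = []
    for i, block in enumerate(blocks):
        if not block['actions'] and i + 1 < len(blocks):
            # Redirect every reference to this empty block to the next block
            next_name = blocks[i + 1]['name']
            for label, target in references.items():
                if target == block['name']:
                    references[label] = next_name
            # The empty block itself is dropped
        else:
            result.append(block)
    return result

blocks = [{'name': 'L1', 'actions': ['a']},
          {'name': 'L2', 'actions': []},        # empty block produced by Phoenix
          {'name': 'L3', 'actions': ['b']}]
refs = {'branch_target': 'L2'}
optimized = remove_empty_blocks(blocks, refs)
```

Because redirection always points at the immediately following block, a run of consecutive empty blocks is collapsed step by step as the loop advances.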


                       Figure 7.17: SemanticDatabaseContainer class

7.4.1   Semantic Database

The Semantic Database library contains the classes to store SemanticItem objects, which are
the root of the semantic objects and provides a system to search in the metamodel.
The SemanticDatabaseContainer holds a collection of SemanticItem objects indexed by
their object hash code. An application wanting to store a metamodel in this database, calls
the StoreItem function and passes the object to store as its parameter.
The functions GetContainer and GetContainers provide direct access to the
SemanticContainer objects in the database. The Query function allows the developer to search in the metamodel.
Finally, the ExportToXml function exports the metamodel to an eXtensible Markup Language
(XML) format. The purpose of this function is not to provide save functionality so the model
can be loaded back from file, but only to create a more user friendly view of the model. Full
support for serialization can be added, but is currently not implemented. Figure 7.17 shows
the SemanticDatabaseContainer class and the predicate used by the query function.
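The container described above can be sketched as a hash-indexed store with a predicate-based query. A hypothetical Python analogue of StoreItem and Query, using object identity in place of .NET hash codes:

```python
class SemanticDatabaseContainer:
    """Sketch: items indexed by identity, searched with a predicate."""
    def __init__(self):
        self._items = {}

    def store_item(self, item):
        # Indexed by object identity, analogous to the object hash code
        self._items[id(item)] = item

    def query(self, predicate):
        # The predicate plays the role of the Query function's delegate
        return [item for item in self._items.values() if predicate(item)]

db = SemanticDatabaseContainer()
db.store_item({'kind': 'container', 'name': 'Demo'})
db.store_item({'kind': 'class', 'name': 'Demo.Counter'})
classes = db.query(lambda item: item['kind'] == 'class')
```

An XML export or full serialization could be layered on top of the same store without changing this interface.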

7.4.2   What to Retrieve

First we have to consider what type of information we want to retrieve from the Semantic
Metamodel, and in what way.
For instance, we might want to find all the assignments of an argument to a global field (a
setter method). This assignment resides inside a semantic operation. From this it is clear that
we must search for an action with an action type of assignment inside an operation where the
source operand of this action is an argument and the destination operand is of type field. The

names of the source and destination are not important. We also do not know in which class this
operation may be so we do not specify this.

This should return a list of actions of the type assignment. Each action has a link to the block
it resides in and the name of the operation, class and container. Using this information it is
possible to find other elements related to this action. Instead of searching for actions it should
also be possible to search for all blocks with an assignment action inside. This will return a list
of all the blocks. At that point we can search for all the actions inside one of those blocks. The
same applies for operations. By retrieving the link to the operation from an action we can get
all the actions inside the operation.

How do we find out whether an action depends on a comparison? This is based on the block where
the action resides in. The action has a link to its parent block. Using this block, we have
information about the control flow and can find which actions (the branching actions) lead to
this block. Instead of retrieving the block we can also retrieve all the actions of the operation
which are referring to this block. If there are none, then there was no branching to this block
and the action inside. If the search returns actions which link to this block, we can find out if
these branching actions were conditional or not. If so, we can retrieve the comparison type
and the values used by the compare function.

Of course it is also possible to retrieve information like which variables are declared inside an
operation or what the arguments of an operation are. These are not directly semantically related
data but can be needed for reasoning about the semantics. We could create a query to select all
the operations where we look at the arguments and one of those arguments should be of type
ReifiedMessage. Since we do not have types in the semantic model we pass this type as a string.

As you can see, there are three parts in the query needed to retrieve the correct data. The three
types are as follows:

What we want to return
    This is one of the child objects of the SemanticItem object. We always have to indicate
    what type of objects we need back from the database;
Where we want to search in
    Search in actions, blocks, operations, and so on. Even if we search in operations, we can
    still return classes which are the parents of these operations;
What we want to search for
    This specifies the values of the properties of the element to search for. For instance: the
    type of the destination property must be a field and the action type itself must be a com-
    parison. Every public property of the element we search in can be used in this condition.

Another requirement of the search system is the ability to search in the search results. So we
can use the returned values for further queries. This allows us to work with already found
data, instead of searching for the same data with more detailed search parameters.

7.4.3   Query Options

To retrieve the correct data we must ask the semantic database to retrieve this from the object
store using some sort of interface. There are various ways to implement a querying mechanism;
each will be discussed with its positive and negative points.

Predicate Language

A predicate is an expression that can be true of something. For example, the predicate
HasActionOfType(X) can be true or false depending on the value of X. By using this type of logic and
combining predicates we can indicate what type of data we want returned.
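The combination of such predicates can be sketched directly in code. This is a Java illustration only; the predicate and field names are invented for the example and are not part of the Semantic Analyzer:

```java
import java.util.function.Predicate;

public class PredicateDemo {
    // A tiny stand-in for a semantic action.
    static class Action {
        final String type;
        final String destination;
        Action(String type, String destination) {
            this.type = type;
            this.destination = destination;
        }
    }

    public static void main(String[] args) {
        // Hypothetical predicates in the spirit of HasActionOfType(X).
        Predicate<Action> isAssignment = a -> a.type.equals("assignment");
        Predicate<Action> writesField  = a -> a.destination.equals("field");

        // Combining predicates narrows the selection.
        Predicate<Action> setterLike = isAssignment.and(writesField);

        System.out.println(setterLike.test(new Action("assignment", "field")));
        System.out.println(setterLike.test(new Action("call", "field")));
    }
}
```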

However, due to the complexity of all the semantic information, it is labor intensive to create
a predicate language to cover all the elements. All the information has to be converted to
predicates which will lead to many different predicates to capture all the semantics available.
Changes to the model also mean making changes to the predicate language.

Furthermore, support for predicates in the .NET language is not directly available. We have to
use some sort of Prolog interface to integrate predicate queries in the database search system.

Resource Description Framework

The Resource Description Framework (RDF) is a general-purpose language for representing
knowledge on the Web. An XML syntax is used to define triplets of information, thereby creating
RDF Graphs. This leads to relations between elements identifiable using Web identifiers (called
Uniform Resource Identifiers or URIs).

RDF does provide a way of defining relations, and thus extra data about objects, and has a
standardized framework for working with this data. However, setting up a model with all
the information beforehand is time consuming and inefficient. This would mean we have to
convert the metamodel to another XML-based model. It has an advantage for interoperability,
since XML is a language-independent format supported on a wide variety of platforms.

Traverse Over Methods

This technique allows the developer to call a function of the SemanticDatabaseContainer and
passing the search values as parameters. This method will then return the found values. There
are various ways to pass the values. These could be delegates to other functions, a text based
value (e.g. “type=’parameter’”) or events which will be raised by this function. Internally it
will traverse all the information in the metamodel, check it against the supplied
arguments, and return the found items.

By using this method, the Semantic Database does not have to create a new representation
of its data. It will look through all the data inside the metamodel and pass it to
the callback functions to do the actual comparison. It also gives the developer of the plug-
in a lot of flexibility, because he can create his own comparison functions. This could also be
a disadvantage because of the complexity. Retrieving information this way means multiple
functions or events have to be written. Changes in the model cannot be detected at compile
time when using text-based values.
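A minimal sketch of this traversal-with-callback style, in Java and with invented names (the real SemanticDatabaseContainer is a .NET class):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class TraverseDemo {
    // The container walks its own data and calls back into the
    // caller-supplied comparison function for every element.
    static <T> List<T> traverse(List<T> data, Predicate<T> callback) {
        List<T> found = new ArrayList<>();
        for (T element : data) {
            if (callback.test(element)) { // comparison done by the caller
                found.add(element);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> operands = List.of("parameter", "field", "parameter");
        // The caller decides what "matching" means.
        List<String> hits = traverse(operands, o -> o.equals("parameter"));
        System.out.println(hits.size());
    }
}
```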


Object Query Language

    An Object Query Language (OQL) based on SQL, the Structured Query Language for
    databases, gives a mechanism to define a query using standard query elements like SELECT,
    FROM, WHERE, and so on. The queries will be parsed and converted to an abstract syntax
    tree (AST). The parsing should be performed by the Semantic Database.

    The advantage of this system is the expressiveness of OQL. The three parts of our query can
    be expressed using the OQL expression. The SELECT indicates what we want to return, the
    FROM part indicates where we want to search and the WHERE part specifies the conditions.
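This mapping can be sketched with an illustrative query. The table and property names below are invented for illustration, and as noted later in this section the accepted grammar would not be standard ODMG OQL:

```sql
SELECT action                               -- what we want to return
FROM   operations                           -- where we want to search
WHERE  action.type = 'assignment'           -- what we want to search for
  AND  action.source.kind = 'argument'
  AND  action.destination.kind = 'field'
```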

    An OQL expression can be passed to the Semantic Database query function. Since it is actually a
    string, it is also possible to pass this over another transport medium like a web service, files, and
    so on. This allows looser coupling between platforms, since the query string does not depend
    on a particular implementation language.

    A disadvantage of using OQL is the fact that the query is composed of strings. It is therefore
    not strongly typed, and syntax checking must be performed by the parser. Any detected exceptions
    are raised and returned to the caller at runtime.

    Keep in mind that the implementation will not be standard ODMG OQL, since we have a
    predefined object database which will be searched through by the Semantic Database using
    its own mechanism. The OQL language is only used to pass a search string to the underlying
    query function.

Simple Object Database Access

    With the Simple Object Database Access (SODA) system, a Query object is used to find data in
    the underlying object store. An example using SODA in C# can be seen in Listing 7.10.
1   Query query = database.Query();
2   query.Constrain(typeof(Student));
3   query.Descend("age").Constrain(20).Smaller();
4   IList students = query.Execute();

                                  Listing 7.10: SODA example in C#.NET

    By using the constrain and descend methods on the query object it is possible to add con-
    straints to the query and look up descendants of objects.

    SODA enforces a stricter integration with the query object than OQL, since OQL simply uses
    a string to pass all the information instead of object operations. However, the values passed to
    the constrain and descend methods are not type safe and only tested at runtime.


Native Queries

    Native queries are a relatively new technique to define search queries [18, 17]. Microsoft is
    experimenting with this in the form of LINQ, which is expected to be released in a future
    version of the .NET Framework.
    Most query systems are string based. These queries are not accessible to development
    environment features like compile-time type checking, auto-completion, and refactoring. The
    developers must work in two different languages: the implementation language and the query
    language.
    Native queries provide an object-oriented and type-safe way to express queries native to the
    implementation language.
    A simple example in C# 2.0 is shown in Listing 7.11, while the same example in Java is listed
    in Listing 7.12.
1   List <Student> students = database.Query <Student> (
2    delegate(Student student) {
3      return student.Age < 20
4        && student.Name.Contains("f");
5    });
                                 Listing 7.11: LINQ query example in C#.NET

1   List <Student> students = database.query <Student> (
2     new Predicate <Student> () {
3      public boolean match(Student student){
4        return student.getAge() < 20
5           && student.getName().contains("f");
6      }
7    });
                                   Listing 7.12: LINQ query example in Java

    A native query uses anonymous classes in Java and delegates in .NET. The use of generics is
    recommended for a strongly typed return value.
    It is still possible to use native queries with .NET 1.1 by means of a special Predicate class. An
    example in C# 1.1 (the same construction also applies to Java) shows this (Listing 7.13).
1   IList students = db.Query(new StudentSearch());
3   public class StudentSearch: Predicate {
4      public bool Match(Student student) {
5        return student.Age < 20
6          && student.Name.Contains("f");
7      }
8   };
                                Listing 7.13: LINQ query example in C#.NET 1.1

    For a programmer, the native queries are a safe and quick way to create queries as they are
    in the same native language. There is, however, a dependency on newer techniques, like
    delegates and generics. Older systems can still be supported by using the Predicate class.

Note that the latter may result in a sort of Traverse over Methods functionality, where special
functions have to be made to support the querying.

Natural Language Queries

A natural language query is one that is expressed using normal conversational syntax; that is,
you phrase your query as if creating a spoken or written statement to another person. There
are no syntax rules or conventions to learn. For instance, a natural language query can be “what
are the write operations on field X in class Y?”.
Although this gives a lot of possibilities and flexibility in asking questions, it is very difficult
to extract the essential information from this query. The query has to be analyzed and the
semantics must be extracted. The actual search can only be performed when there is enough
information about the meaning of the query. Because of this complexity, Natural Language
Queries are outside the scope of this assignment.

7.4.4     Native Queries in Detail

Based on the advantages and disadvantages of the query options, a decision was made to use
native queries. Its major advantage is that we do not have to parse a text based query and check
for all kinds of constraints. Another advantage is the native support of the development
environment (IDE) for this type of query. We can have type safe, object-oriented, and refactorable
code, and possible errors are visible at compile time instead of at runtime. We can use elements
of the IDE, like intellisense for automatic completion of code and tooltips for information about
the code.
Microsoft is using native queries in LINQ, their Language Integrated Query system, and it will
be available in the next version of the .NET Framework, as it requires a new compiler. This
is needed because it not only depends on existing technologies in .NET version 2.0, but also
needs new technologies.
The techniques from version 2.0 are generics, delegates, and anonymous methods, which allow us to
create type safe, inline queries. For example, Listing 7.11 shows the use of a generic to create
a list consisting of Student objects and the delegate makes it possible to define an anonymous
method inside the code, instead of creating a separate new method.
In the next version of the .NET Framework, we will also have anonymous types, lambda expres-
sions, and extension methods. Anonymous types have no explicit type while programming, but
get a real type during compilation; this type is deduced from the contents of the
anonymous type. A lambda expression is a form of anonymous method with a more direct and
compact syntax. These lambda expressions can be converted to either IL code or to an expres-
sion tree, based on the context in which they are used. Expression trees are, for instance, used
by DLINQ, a system to convert native queries to database SQL queries. With extension meth-
ods, existing types can be extended with new functions at compile time. This is used in the
next version of .NET to add query operations to the IEnumerable<T> interface, the interface
used by almost all the collections.
LINQ also introduces a new query syntax, which simplifies query expressions with a declarative
syntax for the most common query operators: Where, Select, SelectMany, GroupBy, OrderBy,
ThenBy, OrderByDescending, and ThenByDescending. We can use the query expressions, the
lambda expressions, or delegates to get the same results (see Listing 7.14).

1   IEnumerable<string> expr =
2     from s in names
3     where s.Length == 5
4     orderby s
5     select s.ToUpper();

                                  (a) query expressions

1   IEnumerable<string> expr = names
2     .Where(s => s.Length == 5)
3     .OrderBy(s => s)
4     .Select(s => s.ToUpper());

                                  (b) lambda expressions

                                        Listing 7.14: LINQ query examples

         LINQ can provide us with a technique to search in objects; however, it is not yet available. A
         similar system, partly based on LINQ, is created for the Semantic Database. It uses the generics,
         delegates, and anonymous methods available in .NET version 2.0. Expression trees are not used,
         and the query itself, called the predicate, is executed for each element in the metamodel that
         conforms to the requested type.
         To determine which type of the semantic model should be searched in and what type should
         be returned, the query function requires two generic types as shown in Listing 7.15.
    1    public ExtendedList<OutType> Query<InType, OutType>(Predicate<InType> match)
                                      Listing 7.15: Query function signature

         The InType determines the type we are looking for, such as a SemanticAction or a
         SemanticClass. This same type is also used in the predicate, a delegate returning a boolean.
         The predicate contains the query and must always return true or false based on some sort of
         comparison with the InType. OutType is used to indicate the type we want to return. We
         might be searching through all the actions inside an operation, but want to return the classes
         containing the found actions. This means we upcast the action to a class. It is not allowed to
         downcast, so searching for classes and returning actions is not possible.
         The query function uses the visitor pattern to visit all the objects in the metamodel. The
         NativeQuerySearchVisitor is the visitor implementation responsible for executing the pred-
         icate for each relevant type in the model. The InType is used to determine if the visited element
         should be evaluated. If this is the case, the element is converted to the InType and the predicate
         is used to execute the query. If there is a match, the element is converted to the OutType and
         added to the result list. Listing 7.16 shows the code for this function. The conversion function,
         SafeConvert, knows about the relations between the elements and can perform the upcasting
         to other types.
     1   private void CheckForMatch(SemanticItem si)
     2   {
     3     if (inType.Equals(si.GetType()))
     4     {
     5       InType x;
     6       x = SafeConvert<InType>(si, false);
     7       if (predicateMatch(x))
     8       {
     9         OutType y;
    10         y = SafeConvert<OutType>(si, true);
    11         if (y != null) results.Add(y);

12           }
13       }
14   }
                                    Listing 7.16: Predicate matching

     An overloaded query function is available with a parameter used to provide a starting point
     for the search. This starting point is a SemanticItem and can be used to search in parts of the
     metamodel.
     The query function returns the found elements as a strongly typed ExtendedList<T>. This is
     an extended List<T> class with functionality to further search in the results. LINQ uses method
     extensions to add the same functionality to the normal list classes; since these are not yet
     available, we have to create our own list. All the lists in the semantic model are an instance of
     the ExtendedList and as such can be searched in the same way.
     The ExtendedList<T> class is shown in Figure 7.18 and provides the following operators:
     Restriction operator
           Return a new ExtendedList filtered by a predicate using the Where operator.
     Projection operator
           The Select operator performs a projection over the list.
     Partitioning operators
           The Take, TakeWhile, Skip and SkipWhile operators partition the list by either taking
           elements from the list or skipping elements before returning the elements.
     Concatenation operator
           A Concat operator concatenates two lists together.
     Ordering operators
           Sorting of the lists is handled by the OrderBy and OrderByDescending operators.
     Grouping operators
           The GroupBy operator groups the elements of the list.
     Set operators
           To return unique elements, use the Distinct operator. A Union and Intersect oper-
           ator are used to return the union or intersection of two lists, while the Except operator
           produces the set difference between two lists.
     Conversion operators
           The ExtendedList can be converted to an array, dictionary, or a list.
     Equality operator
           To check whether two lists are equal, use the EqualAll operator.
     Element operators
           Used for returning the first, last or a specific element in the list.
     Generation operators
           Create a range of numbers or repeat the creation of an element a number of times.
     Quantifier operators
           To check whether the list satisfies a condition for all or any element.
     Aggregate operators
           Used for counting the elements, getting the minimum or maximum, the sum of, or the
           average of values. A Fold operator applies a function over the elements in the list.
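The chaining style of these operators can be sketched with a stripped-down list class. This is a Java illustration under invented names; the real ExtendedList is a .NET class with the full operator set listed above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Predicate;

// A stripped-down ExtendedList: each operator returns a new list,
// so operators can be chained.
class MiniList<T> extends ArrayList<T> {
    MiniList(List<T> items) { super(items); }

    MiniList<T> where(Predicate<T> p) {          // restriction operator
        List<T> out = new ArrayList<>();
        for (T t : this) if (p.test(t)) out.add(t);
        return new MiniList<>(out);
    }

    MiniList<T> orderBy(Comparator<T> cmp) {     // ordering operator
        List<T> out = new ArrayList<>(this);
        out.sort(cmp);
        return new MiniList<>(out);
    }

    MiniList<T> distinct() {                     // set operator
        return new MiniList<>(new ArrayList<>(new LinkedHashSet<>(this)));
    }
}

public class ChainDemo {
    public static void main(String[] args) {
        MiniList<String> ops = new MiniList<>(
            List.of("Set", "Get", "Set", "Init"));
        MiniList<String> result = ops
            .where(s -> s.startsWith("S"))
            .distinct()
            .orderBy(Comparator.naturalOrder());
        System.out.println(result);
    }
}
```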
     Most of the operators return a new ExtendedList so it is possible to combine operators as
     shown in Listing 7.17. Because the ExtendedList<T> inherits from the List<T> class, it will


                        Figure 7.18: ExtendedList class

    also contain the functionality specified in the base class. The ExtendedList contains full intel-
    lisense support to help developers understand the functions, and the Semantic Metamodel has
    added functionality to make it efficient to use in the queries. For example, it contains functions
    to check whether there is an operand, whether an operand is a field, whether an operation has
    a return type, and so on (see Figure 7.12).
1   ExtendedList<SemanticOperation> operations =
2      db.Query<SemanticAction, SemanticOperation>(delegate(SemanticAction sa)
3      {
4          return (sa.HasDestinationOperand &&
5                sa.DestinationOperand.IsSemanticField &&
6                sa.DestinationOperand.Name.Equals("name"));
7      }).Distinct();
    Listing 7.17: Return all distinct operations containing actions assigning a value to a field named “name”

        (Intellisense: a form of automated autocompletion and documentation for variable names, functions, and
    methods using metadata reflection inside the Visual Studio IDE.)

                                                                                  CHAPTER         8

                                                             Using the Semantic Analyzer

    Chapter 7 provides us with the design of the Semantic Analyzer and introduces the query system
    based on native queries (Section 7.4.4). In this chapter, examples of the usage of the Semantic
    Database with the native queries are given, we examine the different applications using the ana-
    lyzer, and discuss the integration of the analyzer in the Compose/.NET project.

    8.1   Semantic Database

    Once the metamodel is stored in the Semantic Database, there are three ways to retrieve its contents. Use the
    GetContainers function to get the root elements of the structure, implement a visitor to visit
    all the elements, or use the Query function to execute a search.
    The first method allows a developer to get access to the root element of the hierarchical tree of
    items in the model and he can iterate over all the elements. The second method does basically
    the same thing, but allows the developer to specify what type of action must be performed
    based on the different kinds of elements inside the model. The third method provides an effi-
    cient way to retrieve specific elements from the model and is described in more detail in this
    section.
    The Semantic Database uses native queries to specify the query expressions in the same native
    language as the development language. To execute a query, the search terms must be enclosed
    inside a delegate and supplied to the Query function of the SemanticDatabaseContainer.
    Instead of creating a separate delegate, it is possible to use anonymous methods, so the delegate
    can be passed to the Query function as a parameter. An example is displayed in Listing 8.1.
1   ExtendedList<SemanticAction> callActions =
2     db.Query<SemanticAction,SemanticAction> (
3       delegate(SemanticAction sa)
4       {
5         return sa.ActionType==SemanticActionType.Call;

 6          })
 7          .OrderBy<string>(delegate(SemanticAction sa)
 8          {
 9            return sa.ParentBlock.ParentOperation.Name;
10          });
                        Listing 8.1: Search for all call actions and sort by operation name

     In Listing 8.1 a search for all the SemanticAction objects calling methods is performed. The
     delegate is used as an anonymous method directly as the parameter of the Query function. We
     indicate we want to search in all the SemanticAction objects in the model, by specifying this
     as the first generic type of the Query function. We want a list of SemanticAction objects to
     be returned, so we specify this type as the second generic type of the function (line 2). The
     delegate is the first parameter of the Query function and contains the actual query in the form
     of a boolean expression (lines 3–6). In this example, the ActionType of the SemanticAction
     must be a Call type.
     We also order the results by the name of the operation. As the Query function returns an
     ExtendedList<SemanticAction> we can access the OrderBy function of this list (lines 7–10).
     Again, we use a delegate to specify the string to order by. In this case, the semantic action has
     a link to its parent block and this block links back to the operation it resides in. From there, we
     can access the name of the operation and use it as the sort key.
     We can use all the public properties of the semantic items in our query. Special properties have
     been added to facilitate common tasks, like checking for null values, or determining the type of
     an operand. Some examples of this type of functions are HasDestinationOperand, HasArguments,
     IsSemanticField, or HasReturnType.
     Since the query is actually a boolean function, other operations can be included as long as the
     query returns a boolean. However, the query is evaluated for each selected type in the model
     and thus a query containing complex code is not advisable, due to processing costs.
     Listing 8.2 is another example and shows the ability to return a different type than the one searched for.
 1   ExtendedList<SemanticOperation> ops =
 2     db.Query<SemanticAction,SemanticOperation>(
 3       delegate(SemanticAction sa)
 4       {
 5         return (sa.HasDestinationOperand &&
 6                 sa.DestinationOperand.IsSemanticField &&
 7                 sa.DestinationOperand.Name.Equals("value"));
 8       }
 9     ).Distinct();
            Listing 8.2: Find all operations using a field named value as their destination operand

     In this example, we want to find all the operations with an action where the destination
     operand is a field named value. Because this field is the destination operand, it is potentially
     written to.
     We supply two different types to the Query function. The first one is the type we are looking
     for (actions) and the second one is the type we want to have returned (operations). In the
            (The action with the destination operand might be in a conditional block and as such is not always executed.)

     search query, we indicate that the SemanticAction object must contain a destination operand,
     that this operand must be a field, and the name of this field must be equal to value. Keep in
     mind that if we change the order of the elements, we can get null reference exceptions if we are
     trying to access non-existent destination operands. The Distinct command at line 9 signals the
     ExtendedList object to remove duplicate elements. Placing this command here has the same
     effect as applying it directly to the ops variable.

     To show the usage of the grouping operator, see Listing 8.3. This example finds all the actions
     performing a jump to another block and groups these actions by the name of the operation they
     are in. Lines 12 to 19 show how to use the Grouping class to display these items.

 1   ExtendedList<Grouping<string, SemanticAction>> groupedBy =
 2     db.Query<SemanticAction, SemanticAction>(
 3       delegate(SemanticAction sa)
 4       {
 5         return (sa.ActionType == SemanticActionType.Jump);
 6       })
 7     .GroupBy<string>(delegate(SemanticAction a)
 8     {
 9         return a.ParentBlock.ParentOperation.Name;
10     });
12   foreach (Grouping<string, SemanticAction> element in groupedBy)
13   {
14     Console.WriteLine("Jumps in operation {0}", element.Key);
15     foreach (SemanticAction sa in element.Group)
16     {
17       Console.WriteLine("--Jump to {0}", sa.LabelName);
18     }
19   }

                           Listing 8.3: Group jump labels by operation name

     The last example, Listing 8.4, shows how to find all actions assigning a value to an operand
     of type integer. To determine the type, we use the SemanticType object of the operand and
     indicate we want the type to be an integer (line 8). The check for the existence of a destination
     operand (line 6) is not really needed, since an assignment always uses a destination operand.
     However, if the Semantic Extractor did not assign a destination operand, the query can raise an
     exception at runtime.

 1   ExtendedList<SemanticAction> actions =
 2     db.Query<SemanticAction, SemanticAction>(
 3        delegate(SemanticAction sa)
 4        {
 5          return (sa.ActionType == SemanticActionType.Assign &&
 6            sa.HasDestinationOperand &&
 7            sa.DestinationOperand.SemanticType.CommonType ==
 8              SemanticCommonType.Integer);
 9        }
10     );

                    Listing 8.4: Retrieve all the assignments where an integer is used

8.2   Applications

Besides the metamodel, the extractors and the database, two other programs were created. One
is a command line utility primarily used for testing, the other is a Windows Forms application
showing the contents of the metamodel and the control flow in a graphical way.
The console application, called SemanticExtractorConsole, accepts as its command line argu-
ments a number of assemblies, a list of plugins, and optional settings. It uses the default
Semantic Extractor Provider to analyze the .NET assemblies, stores the results in the Semantic
Database, and executes the plugins. Each plugin is called with a pointer to the database and can
perform its own analysis. This tool is primarily used for testing the complete system because
tasks can be automated using the command line switches and the plugins. Plugins are .NET
assemblies implementing a certain interface. It is also possible to use a source file as a plugin
and the console application will compile this source in-memory first. The source file must be a
valid C# or VB.NET class implementing the plugin interface.
The Windows Forms application provides a graphical user interface (GUI) to open an assembly.
Internally, it calls the default Semantic Extractor Provider and creates a graphical representation
of the model. Figure 8.1 shows a screenshot of this application. In the tree view at the up-
per left part of the window, the metamodel is displayed. By selecting an element in this tree,
detailed information in the property grid in the lower left part is shown.
Additional support for displaying the correct data and providing metadata like descriptions of
each property has been added to the Semantic Metamodel using custom attributes. If the selected
element is an operation, a control flow graph is created and displayed in the upper right part. If
the user clicks on a flow block, then its contents (the blocks with actions) are listed in the actions list.

                   Figure 8.1: Windows Forms Semantic Analyzer application

8.3     Plugins

Various plugins are created to test the system and to provide specific analysis tasks. Each plu-
gin must implement the SemanticLibrary.Plugin.IPlugin interface as shown in Figure 8.2.

                                   Figure 8.2: Plugin interface

The SemanticExtractorConsole calls the Execute function for each plugin with as its arguments
the SemanticDatabaseContainer, which holds all the metamodels, and a dictionary of com-
mand line options not interpreted by the console application itself. This allows the user to
supply settings for the plugin. A plugin can write to the console window and when it has an
exception, it can simply raise this exception so the SemanticExtractorConsole can handle the er-
ror and display a message. The read-only IsEnabled property allows a plugin to be disabled
when needed. As a result it will not be executed by the SemanticExtractorConsole application.
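Based on this description, the plugin contract can be sketched as follows. Since Figure 8.2 is not reproduced here, the parameter types and everything beyond the Execute method and the IsEnabled property are assumptions, and the container class is a stand-in stub.

```csharp
using System;
using System.Collections.Generic;

// Stand-in stub for the real container class, for illustration only.
public class SemanticDatabaseContainer { }

// Hypothetical reconstruction of SemanticLibrary.Plugin.IPlugin;
// the exact signatures in Figure 8.2 may differ.
public interface IPlugin
{
    // When false, the SemanticExtractorConsole skips this plugin.
    bool IsEnabled { get; }

    // Called with the container holding all metamodels and the command
    // line options not interpreted by the console application itself.
    void Execute(SemanticDatabaseContainer database,
                 IDictionary<string, string> options);
}

// A trivial plugin: it may write to the console window, and any
// exception it raises is handled and displayed by the console.
public class HelloPlugin : IPlugin
{
    public bool IsEnabled { get { return true; } }

    public void Execute(SemanticDatabaseContainer database,
                        IDictionary<string, string> options)
    {
        Console.WriteLine("HelloPlugin executed with {0} option(s).",
                          options.Count);
    }
}
```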
Plugins provide a way to create separate analysis tasks and allow for the execution of multiple
plugins with one command. They are not part of the metamodel nor the Semantic Database
and are only used by the console application. The next sections give more details about the
implemented plugins.

8.3.1   ReifiedMessage Extraction

This plugin provides a partial solution to the problem discussed in Section 4.1.2, the
ReifiedMessage. The use of aspects on a certain piece of code changes the behavior of this
code. Not all the behavioral changes are expected and some may even lead to conflicts [24, 72]. In
Compose we use the FILTH module, with order specification, to compute all the possible
orders of the filter modules and select one of those orders. The SECRET module, the Semantic
Reasoning Tool, reasons about the possible semantic conflicts in the ordering of filter modules.
It does this by analyzing all the possible actions of the filters, which either accept or reject a
message. However, the meta filter introduces behavior that the SECRET module cannot handle,
because the function called by the meta filter defines the semantics of the filter. The function
executed by the meta filter has an argument containing a ReifiedMessage object, representing
the message. It can retrieve the target or the arguments, but also change the execution of the
message, like resume the regular execution of the filterset or reply, which returns the message
to the sender.
Currently developers writing a function using a ReifiedMessage should define the behavior of
this message as a custom attribute (see Section 6.1.10). This is a time consuming process, often

not executed or updated, and error prone. The Semantic Analyzer might be helpful here.
     The task of this plugin is to determine the behavior of the usage of the ReifiedMessage and
     report this to the developer. He/she can then add the custom attributes to the code. Because it
     is not possible to capture all the intended behavior, the custom attributes are not automatically
     inserted so they can be reviewed by the developer.
     We first need to find all the operations using a ReifiedMessage object in their arguments. List-
ing 8.5 shows the query used to find these operations. The reifiedMessageType constant
     (line 7) contains the full name of the type used by Compose for the ReifiedMessage.
 1   ExtendedList<SemanticOperation> ops =
 2     db.Query<SemanticOperation, SemanticOperation>(
 3       delegate(SemanticOperation obj)
 4       {
 5         foreach (SemanticArgument arg in obj.SemanticArguments)
 6         {
 7           if (arg.SemanticType.FullName.Contains(reifiedMessageType))
 8             return true;
 9         }
10         return false;
11       });
                           Listing 8.5: Find operations using a ReifiedMessage

The next step is to iterate through all the found operations and get the specific argument
containing the ReifiedMessage, as shown in Listing 8.6.
 1   // Find the argument
 2   // p is one of the SemanticOperations in ops
 4   SemanticArgument reifiedMessageArg =
 5     p.SemanticArguments.First(delegate(SemanticArgument arg)
 6     {
 7       return (arg.SemanticType.FullName.Contains(reifiedMessageType));
 8     });
                         Listing 8.6: Find the argument using a ReifiedMessage

Once we have the specific argument containing the ReifiedMessage object, we have to determine
which actions in the operation use this argument. We use the query listed in Listing 8.7 to find
those actions. This time we search for all the call actions, because properties and functions of the
     ReifiedMessage object are used. Of course, this call action should have as its first argument the
     ReifiedMessage operand we found earlier. Since we do not want to specify that the action should
     be in the operation we are currently analyzing, we supply the operation itself (represented as
     the p variable at line 20) as the second parameter to the Query function. It is then used as the
starting point for the native search visitor, making the search more time-efficient.
 1   // Find the actions which are performing an operation on the argument
 2   // passing the operation p as the start point
 3   IList<SemanticAction> actions =
 4     db.Query<SemanticAction, SemanticAction>(
 5       delegate(SemanticAction sa)
 6       {
 7         if (sa.ActionType == SemanticActionType.Call)
 8         {
 9           if (sa.HasArguments)
10           {

11               // The first argument is the object on which we are
12               // calling the function
13               // This should be the reifiedmessage
14               return (sa.Arguments[0].IsSemanticArgument &&
15                       sa.Arguments[0].Equals(reifiedMessageArg));
16            }
17          }
18          return false;
19       }
20       , p);   // <-- we start at the operation
21               //     so we do not specify the operation name, class etc
              Listing 8.7: Retrieve all the calls to methods of the ReifiedMessage argument

The collection of actions is now used to determine the behavior of the reified message us-
ing the rules specified in Staijen’s work [72]. For example, when the getTarget function is
called, we add the corresponding semantic to the list of semantics for this operation. A call to the
     reply function introduces the returnvalue.write, target.dispose, selector.dispose, args.dispose and
     message.return semantics.
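Such a rule set can be kept in a simple lookup table from member names to the semantics they introduce. Only the reply entry below is taken from the text; the getTarget value is a hypothetical placeholder, as the source does not name the semantic that call introduces.

```csharp
using System.Collections.Generic;

// Lookup table from ReifiedMessage member calls to the semantics they
// introduce, following Staijen's rules. The reply entry matches the
// text; the getTarget value is a hypothetical placeholder.
public static class ReifiedMessageSemantics
{
    private static readonly Dictionary<string, string[]> CallSemantics =
        new Dictionary<string, string[]>();

    static ReifiedMessageSemantics()
    {
        CallSemantics["getTarget"] = new string[] { "target.read" /* assumed */ };
        CallSemantics["reply"] = new string[] {
            "returnvalue.write", "target.dispose", "selector.dispose",
            "args.dispose", "message.return" };
    }

    // Collect the semantics introduced by a sequence of call actions.
    public static List<string> Collect(IEnumerable<string> calledMembers)
    {
        List<string> semantics = new List<string>();
        foreach (string member in calledMembers)
        {
            string[] found;
            if (CallSemantics.TryGetValue(member, out found))
                semantics.AddRange(found);
        }
        return semantics;
    }
}
```

The collected list corresponds to the temporary list mentioned below, which is then converted to a custom attribute for review by the developer.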
     This temporary list is then converted to a custom attribute and shown to the developer. One of
     the future ideas of Staijen is to annotate methods invoked after a proceed call with a semantic
     specification. This is implemented in this plugin as shown in Listing 8.8, which also displays
     some of the possibilities of the ExtendedList object.
 1   // Get all the semantic actions
 2   ExtendedList<ISemanticAction> allActions = p.RetrieveAllSemanticActions();
 4   // Skip the first items until we find the proceed call
 5   allActions = allActions.SkipWhile(delegate (ISemanticAction            sa)
 6   {
 7     return !sa.ActionId.Equals(proceedActionId);
 8   });
10   // Filter for only call actions
11   allActions = allActions.Where(delegate (ISemanticAction sa)
12   {
13     return sa.ActionType == SemanticActionType.Call;
14   });
           Listing 8.8: Retrieve other methods which should be analyzed after a proceed call

     If we find a proceed call, we store the unique action id so we can retrieve all the call actions
     occurring after the proceed action. The methods called by the found action are displayed to the
     developer so he/she can have a look at those methods.
     Tests using the ReifiedMessage Extraction plugin on the examples accompanying Compose
showed that the plugin retrieves the same semantics as already specified in the examples. It
had no problems automatically deducing the behavior of the reified message based
on the calls to the functions of this object.
     However, there is still no control flow information used in this analysis. Certain call actions
     may never happen or may be executed multiple times. Currently this cannot be expressed in
     the custom attribute and is not further implemented. To do this in the future, we can use the
     flow capabilities of the metamodel to generate a control flow with access level information.
     See Section 7.2.6 for more information about control flows.

8.3.2   Resource Usage

The Resource Usage plugin creates a list containing all the operands used in an operation. Not
only the name of the operand is shown, but also if the operand is created, read from or written
to. The plugin uses the flow blocks generated by the control flow generator and adds the access
level to the found operands.

The result of this plugin will be in the form of the following output:

Read/write set of checkEmpty [pacman.World]:

The variables are created first because the IL standard enforces this rule. Constant values are
read and there are some read and write actions to the variables. The last character of each line
indicates the access level, as explained in the following list:

1   The operand is accessed at least once;
?   The operand might be accessed; it could be conditional;
*   A conditional operand inside a loop; it may be accessed more than once;
+   It is accessed at least once and maybe more (a loop with the conditional at the end);
0   Unreachable code; it is never accessed.

The plugin iterates through all the flow blocks of all the operations in the database. Each flow
block contains semantic blocks with actions. The operands of each action are collected: a
destination operand indicates a write action, and a source operand indicates a read action. The
access level of the flow block is added to the retrieved operand and the list is
presented to the user. A collection of local variables is added at the top of the list, as they are
created when the operation is entered.
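The collection step can be sketched with simplified stand-ins for the metamodel classes; the real SemanticAction and flow block types carry considerably more information than these stubs.

```csharp
using System.Collections.Generic;

// Simplified stand-ins for the metamodel classes, for illustration.
public class SimpleAction
{
    public string Destination;                         // operand written to, or null
    public List<string> Sources = new List<string>();  // operands read from
}

public class FlowBlock
{
    public char AccessLevel;                           // '1', '?', '*', '+' or '0'
    public List<SimpleAction> Actions = new List<SimpleAction>();
}

public static class ReadWriteCollector
{
    // Build the read/write set: a destination operand is a write, a
    // source operand is a read; the flow block's access level is
    // attached to every entry.
    public static List<string> ReadWriteSet(IEnumerable<FlowBlock> blocks)
    {
        List<string> result = new List<string>();
        foreach (FlowBlock block in blocks)
            foreach (SimpleAction action in block.Actions)
            {
                if (action.Destination != null)
                    result.Add(action.Destination + " write " + block.AccessLevel);
                foreach (string source in action.Sources)
                    result.Add(source + " read " + block.AccessLevel);
            }
        return result;
    }
}
```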

The read/write sets returned from the examples are similar to the operands used in the actual
source code. However, the Semantic Extractor and the compiler have introduced extra variables
for normalization or optimization purposes.

8.3.3   Export Types

Compose uses a separate .NET application, called TypeHarvester, to retrieve the types from
the source assemblies. The harvester uses .NET reflection to load and parse the assemblies,
iterate through all the type information, and store this information in a types.xml file. All the
information found with the reflection API is added to this file, which can become quite large.
The xml file is imported during the Compose compilation process and stored in the repository.
A language model is created with this data to be used in the selector language.
The export types plugin performs the same action but uses its metamodel as the source. It does
not have all the properties reflection returns, but still has extensive type information available.
The GetContainers function of the SemanticDatabaseContainer is called and returns a col-
lection of SemanticContainer objects. Now we can use the collections inside the containers to
find the classes, fields, operations, and so on. This plugin does not use the search functionality
of the database, but only retrieves the top elements of the metamodel contents and uses their
properties to further export the data. The XmlTextWriter of the .NET Framework performs the
actual writing to an xml file with the same structure as used by the TypeHarvester application.
Although it does not export all the information, it does provide an example of how the metamodel
can be used to replace the TypeHarvester application and how the metamodel can be browsed
instead of being searched.
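Writing the export with XmlTextWriter can be sketched as follows; the element and attribute names are illustrative, since the actual types.xml schema of the TypeHarvester is not reproduced in this section.

```csharp
using System.Text;
using System.Xml;

// Sketch of exporting type names with the XmlTextWriter of the .NET
// Framework. The element and attribute names are illustrative stand-ins;
// the real types.xml structure is richer.
public static class TypeExporter
{
    public static void Export(string path, string[] typeNames)
    {
        XmlTextWriter writer = new XmlTextWriter(path, Encoding.UTF8);
        try
        {
            writer.Formatting = Formatting.Indented;
            writer.WriteStartDocument();
            writer.WriteStartElement("Types");
            foreach (string name in typeNames)
            {
                writer.WriteStartElement("Type");
                writer.WriteAttributeString("name", name);
                writer.WriteEndElement();              // </Type>
            }
            writer.WriteEndElement();                  // </Types>
            writer.WriteEndDocument();
        }
        finally
        {
            writer.Close();
        }
    }
}
```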

8.3.4   Natural Language

The metamodel provides information about the behavior of the functions in the form of actions.
The natural language plugin converts the actions to a natural language representation for each
operation. An example of the output of this plugin is listed below.
Description of operation ’isPlaying’ in Jukebox.Player [Jukebox.Player]
Assigning SemanticField named playing to SemanticVariable Local0
Jumping to block B_6 where we perform the following:

Exiting operation ’isPlaying’ and returning the contents of Local0

Each action and accompanying operands are converted to a textual representation. Jumps and
branching operations are handled in such a way that the flow of the function is still visible and
loops are only shown once.
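A toy version of this rendering step, in the spirit of the sample output above, could look like the following; the action type names are assumptions, and the real plugin covers many more action types while tracking jumps and loops.

```csharp
// Toy conversion of a single semantic action to an English sentence,
// in the spirit of the sample output above. The action type names are
// assumed; the real plugin handles far more cases and follows the
// control flow of the operation.
public static class NaturalLanguage
{
    public static string Describe(string actionType, string source, string destination)
    {
        switch (actionType)
        {
            case "Assign":
                return "Assigning " + source + " to " + destination;
            case "Return":
                return "Exiting operation and returning the contents of " + source;
            case "Branch":
                return "Jumping to block " + destination;
            default:
                return "Performing " + actionType + " on " + source;
        }
    }
}
```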
Although the practical use of this plugin is limited, it can also serve as a basis for a conversion of
actions to a more formal and standard representation, for instance an xml representation or a
standard for specifying software. Automatically checking requirements against the
actual implemented code is a more practical example.

8.4     Integration with Compose

The Semantic Analyzer designed for this assignment is a general purpose analyzer and can be
used for different kinds of tasks. However, one of the reasons to create this analyzer was to

provide more information to be used in Compose . As explained in the design limitations
(Section 7.1.2) we only have access to a .NET assembly (with method implementations) at the
end of the Compose compilation process. At that point, all the analysis tasks are already exe-
cuted and semantical join points can no longer be selected. This makes it difficult to integrate
the analyzer in the compilation process.
Another problem is the mismatch between the .NET Framework versions of Compose /.NET
and the Semantic Analyzer. The first runs under version 1.1 and the second uses version 2.0 be-
cause it depends on Microsoft Phoenix, which needs the latest version of the runtime. Because
of the dependency on Phoenix, which has rather large libraries (10 MB) and a restrictive license,
we cannot make the analyzer part of the standard Compose /.NET install.
Based on these limitations, we can state the following requirements for the integration:
  1. The Semantic Analyzer should be optional. If it is not installed, then Compose must still
     work correctly;
  2. The retrieved information must be placed in the internal database, called the repository,
     so the components of Compose can access the data from one central store;
  3. The analyzer must extract the semantics of the usage of the ReifiedMessage object and
     supply this information in such a way the SECRET module can work with it;
  4. Information about the resource usage of fields and arguments should be extracted from
     the sources and placed in the repository;
  5. Per function, retrieve the calls made to other functions and store them in the repository.
To be able to satisfy these requirements we had to make some changes to the compilation or-
der of the modules in Compose /.NET. We first generate a weave file specification and call
ILICIT [10] to weave the assemblies. At that point we can start the two SEMTEX modules.
The DotNETSemTexRunner runs the SemanticExtractorConsole application (see Section 8.2) with
a special Composestar plugin. This plugin creates an xml file with the analyzed data. The sec-
ond module, DotNETSemTexCollector, loads the xml file and places the data in the Compose
datastore. The next module to be executed is the SECRET module, which can now access the
semantical data. Finally, we create a runtime repository and copy the assemblies to the output
directory.
If the DotNETSemTexRunner cannot find the required files, it will simply skip the creation of the
xml file and an information message is shown to the developer indicating why this module is
not executed and how to remedy this. When the next module, DotNETSemTexCollector, cannot
find the xml file, it will continue to the following module. This takes care of point one of the
requirements.
The DotNETSemTexRunner uses the SemanticExtractorConsole application to analyze the assem-
blies and execute a special plugin. This plugin contains elements of the various separate plug-
ins described in Section 8.3. It determines the usage of the ReifiedMessage, the read, write and
create actions of fields and arguments in the functions of the assemblies, and the calls to other
functions. The techniques to retrieve this type of data are described in the Plugins section. An
example of the resulting xml file is listed in Appendix F. The plugin satisfies requirements
three, four and five.
To add the information to the datastore of Compose , we extended the existing MethodInfo

 class to hold additional collections of CallToOtherMethod objects, ResourceUsage objects
and a list of strings containing the ReifiedMessage usage. The MethodInfo class is automatically
stored in the datastore and can be used by the other components.
The DotNETSemTexCollector class parses the generated semtex.xml file, creates the necessary
objects such as CallToOtherMethod or ResourceUsage, assigns the values from the xml file to
the properties of these objects and adds them to the corresponding MethodInfo object already
in the datastore.
To be able to use the automatically extracted semantics of the ReifiedMessage in the SECRET
module, we changed the MetaFilterAction class to use the found semantics stored in the
MethodInfo object. It will still use the semantics defined by the developer first before import-
ing the automatically created semantics.

                                                                              CHAPTER         9

                                             Conclusion, Related, and Future Work

In this final chapter, the results of the research and implementation of the Semantic Analyzer are
evaluated and conclusions are drawn. First, the related work is discussed and we end with a
look at future work.

9.1      Related Work

Automatically analyzing source code is certainly not a new field in computer science; there
are various applications which use the resulting data. Examples are finding design patterns in
the source code, reverse engineering design documentation [69], generating pre- and postcondi-
tions [53], verifying software contracts [9], and checking behavioral subtyping [30]. Some ana-
lyzers are relatively simple; they might only count programming elements like the number of
lines, methods, or classes, or determine the complexity of functions. Other analyzers, like the
Semantic Analyzer, convert the source code to a higher level view, a metamodel, to reason about
the code.
This section discusses a number of analyzers, which are basically all static code analyzers work-
ing with the semantics of the code.

9.1.1    Microsoft Spec#

Spec# (Spec sharp)1 is a Microsoft Research project attempting to provide a more cost efficient
way to develop and maintain high-quality software [6]. Spec# extends the C# programming
language with specification constructions like pre- and postconditions, non-null types, checked
exceptions, and higher-level data abstraction. The Spec# compiler statically enforces non-null
types, emits run-time checks for method contracts and invariants, and records the contracts as
    metadata for consumption by other tools in the process. Another application, the Spec# static
    program verifier, generates logical verification conditions from a Spec# program and analyzes
    the verification conditions to prove the correctness of the program or find errors in the code.
    The language enhancements of Spec# are in the form of annotations the developer can add
    to the existing code. Of course, this is not possible for code in third party libraries, such as
    the .NET Framework Class Library. Microsoft is working on a project for semi-automatically
    generating contracts for existing code.
    Spec# is a tool for correctness and verification checks. Although a part of Spec# uses runtime
    checking (using inlined code for pre- and postconditions), the static checking is not much dif-
    ferent from the Semantic Analyzer. The static program verifier constructs a metadata view of
    the code in its own intermediate language, called BoogiePL. It consists of basic blocks with
four kinds of statements: assignments, asserts, assumes, and function calls. An extensive analysis
system processes the IL code and extracts additional properties, which are added to the
program in the form of assert and assume statements. An automatic theorem prover is then
    used to verify the conditions in the program.
    The verification systems of Spec# are very advanced, although the metamodel is not very differ-
    ent from the model used in the Semantic Analyzer. Our metamodel also contains basic building
    blocks with assignments and calls, but we lack the assume and assert statements.

    9.1.2   SOUL

    A project similar to the Semantic Analyzer is the SOUL logic metaprogramming system [28].
This system is designed to reason on a metalevel about the structure of object-oriented source
    code in a language independent way using logic meta-programming [21]. By using logic rules
    it is possible to reason about the structure of object-oriented programs. For the SOUL system
    the languages Smalltalk and Java are used.
    To reason about Smalltalk code, the SOUL system was built. It consists of a Prolog-like lan-
    guage and an associated logic inference engine [86]. Logic facts and rules could be used to
    query about Smalltalk programs. The mapping between the logic meta language and the object-
    oriented source language is handled by a metalevel interface (MLI) implemented as a hierarchy
    of classes. The reflection capabilities of Smalltalk are used to build the MLI and method body
statements are converted to functors (objects that model operations that can be performed; in
their simplest form they are somewhat like function pointers) and can be queried for. A logic
repository contains logic predicates to use with the reasoning engine.
An extension to SOUL for the Java platform is called SOULJava (later evolved into Irish;
˜jfabry/irish/index.html), which contains its own parser to convert Java code to the repository
and adds new methods to the MLI, for instance to query for interfaces.
Some applications of SOUL are detecting patterns like double dispatch or getters, or finding de-
sign patterns such as the visitor pattern. Using the logic rules, the MLI can be searched for certain
constructions. For example, the rule for detecting getting methods is listed in Listing 9.1.
1   gettingMethod(?class,?method,?varname) if
2     method(?class,?method),
3     methodSelector(?method,?gettingname),
4     instVar(?class,?varname),
5     gettingMethodName(?varname,?gettingname),
6     varName(?var,?varname),
7     methodStatements(?method,<return(?var)>)

                            Listing 9.1: Selecting getters in SOUL using Prolog

    Idiom rules, like the methodSelector and gettingMethodName, are used to provide a mapping
    between the source language and the items in the model to cope with the language specific
    differences between Smalltalk and Java.

    SOUL and the Semantic Analyzer share some properties, like creating a higher level metamodel,
    being language independent, and providing a search mechanism. The implementation is how-
    ever different. The Semantic Analyzer has a language independent metamodel and the Semantic
    Extractors are responsible for the correct conversion, whereas SOUL uses idioms to deal with
    different language constructs. Furthermore, SOUL relies on the Prolog predicates for searching
    in the MLI. Our analyzer has a similar search system, but based on native queries. The ma-
    jor difference is the conversion of statements to actions, the behavioral representation, in the
    Semantic Analyzer. SOUL is more a syntactical than a semantical analyzer.

    9.1.3    SetPoints

SetPoint is a system designed to evolve structure-based pointcuts into semantical pointcuts,
called setpoints [3]. The SetPoint project identifies the same problems as described in the
    motivation chapter (Section 4.1.1); join points should be applied based on the behavior of the
    code and not on naming conventions.

    The SetPoint Framework for .NET tries to solve this problem by linking program semantics,
    or views, to source code through metadata elements, such as custom attributes. The pointcut
definitions are based on these annotations. SetPoint uses OWL1 and RDF2 to represent the views.

    Developers have to specify the relations between base code and the views of the system using
    custom attributes. The SetPoints developers are working on a version where Microsoft Phoenix
    is used to find program annotations.

    SetPoints differs from the Semantic Analyzer in a number of ways. SetPoints is primarily de-
    signed to define semantical join points and performs the actual weaving of aspects, whereas
the Semantic Analyzer only provides a system to retrieve semantics. The latter is a more general
    purpose semantics extractor and eventually could serve as an automated tool to retrieve the
    views for SetPoints. Selecting the correct join points is handled in SetPoint by RDF bindings.
    The Semantic Analyzer uses native queries to find information in the model and could be used
    to add semantical information to the selector language of the Compose project.

1   Web Ontology Language
2   Resource Description Framework

     9.1.4     NDepend

     NDepend1 is a static analysis tool to generate reports, diagrams, and warnings about .NET
     assemblies. Internally it employs Cecil to parse IL byte code and create a representation of the
     code. Users can query the model with the Code Query Language (CQL), based on the SQL
     syntax. A separate tool, called VisualNDepend, offers a graphical user interface to edit CQL
     and shows the results using diagrams.
     Information about the inner workings of NDepend could not be found as it is not an open
     source project, but the Cecil library is used to analyze the assemblies and the CQL2 operates on
     the results to generate metrics. Some examples of CQL queries are listed in Listing 9.2.
 1   WARN IF Count > 0 IN SELECT METHODS WHERE NbILInstructions > 200
 2    ORDER BY NbILInstructions DESC
3   -- Warn if a method has more than 200 IL instructions
 4   SELECT METHODS WHERE ILCyclomaticComplexity > 40
 5    ORDER BY ILCyclomaticComplexity DESC
6   -- Return methods where the CC is greater than 40
 7   SELECT TYPES WHERE DepthOfInheritance > 6
 8    ORDER BY DepthOfInheritance DESC
 9   -- Select the types with an inheritance level > 6
10   SELECT TYPES WHERE Implement "System.Web.UI.IDataSource"
11   -- Select the types implementing the IDataSource interface
12   SELECT TOP 10 METHODS WHERE IsPropertyGetter OR IsPropertySetter
13    ORDER BY NbILInstructions DESC
14   -- Return a top 10 of property setters or getters ordered by the number of IL instructions
                                     Listing 9.2: Examples of CQL queries

     NDepend offers a good framework for analyzing and querying assemblies. The CQL provides
     almost the same functionality as the native queries used in the Semantic Analyzer and the ex-
     amples in Listing 9.2 could also be rewritten to native queries to be used in our analyzer. The
     instruction level capabilities in NDepend are limited. There are some parameters available to
     use in search queries, such as finding property setters or IL complexity, but most of the search
operations focus on metrics and are very structurally oriented. The Semantic Analyzer goes fur-
     ther and allows searching for actions, the behavior of the code.
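To illustrate that rewriting, one CQL example from Listing 9.2 can be expressed as a native query in the delegate style of Listing 8.4. The Operation stub and its CyclomaticComplexity property are assumed stand-ins for the real metamodel classes and the Semantic Database.

```csharp
using System.Collections.Generic;

// Stub replacing the real SemanticOperation class; a cyclomatic
// complexity property is assumed here for illustration only.
public class Operation
{
    public string Name;
    public int CyclomaticComplexity;
}

public static class CqlRewrite
{
    // SELECT METHODS WHERE ILCyclomaticComplexity > 40
    // rewritten as a native query predicate over an in-memory list.
    public static List<Operation> ComplexMethods(List<Operation> all)
    {
        return all.FindAll(delegate(Operation op)
        {
            return op.CyclomaticComplexity > 40;
        });
    }
}
```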

     9.1.5     Formal Semantics

     Besides the static analyzers described above, there is also related work to be found if we con-
     sider the semantics itself.
There are three major approaches to semantics [61, 49]:
     Operational semantics
         The computation of a construction specifies the meaning. How the effect of an operation
         is produced is important.
     Denotational semantics
         Mathematical objects are used to represent the effect of executing the constructs. The
         effect is important, not how it is obtained.
            CQL is defined in the standard located at

Axiomatic semantics
    Assertions are used to express certain properties of a construct.
One way to formalize the semantics of constructions using an operational explanation is Struc-
tural Operational Semantics, also known as small step semantics. Small step semantics de-
scribes the individual steps of the computations. There is also big step semantics, also known
as natural semantics, in which the overall results of the executions are described.
With the denotational semantics, mathematical functions are used and we can calculate the
effects of execution with a certain state using these functions. We do not look at the execution
itself, but only at a mathematical abstraction of the program. Although it is relatively simple to
reason with mathematical objects, converting a program to mathematics is not.
The axiomatic semantics approach is often used for proving the correctness of a program. For
instance, checking pre- and postconditions.
The Semantic Analyzer uses a form of operational semantics. Like the Structural Operational
Semantics, the emphasis is on the individual steps of the execution. However, our semantic
model is more a means to allow other tools to reason about software than a formal specification
of all the semantics in a source file. The control flow graph capabilities of the model and the
availability of operand data are an important part in specifying the semantics.
A related system, using both small-step and big-step semantics, is presented in the paper “A
formal executable semantics for Java” [5]. A system of 400 inference rules is used to describe the
operational semantics of Java using the Typol logical framework. A syntactically correct Java
program, an abstract syntax tree, is converted to a semantical representation and a graphical
environment shows this representation.

9.2   Evaluation and Conclusion

There were multiple reasons to create the Semantic Analyzer. Besides the wish for a general purpose
static analyzer, three main issues were identified for the Compose* project: the need for semantic
join points, program analysis, and fine-grained join points.
After determining the meaning of semantics and how different kinds of semantic constructs
are represented in the target language IL, it was possible to design the Semantic Analyzer.
Basically, this system can be divided into three parts: extractors, the metamodel, and the search functions.

The semantics used in the metamodel are based on the execution of their corresponding source
code constructions. We distinguish the basic elements used by programming languages, like
mathematical functions, control flow constructs, conversions, assignments, testing, and so on.
However, there is no formal specification of this metamodel and there is no evidence that the
model is correct and complete. The model is one of the elements in the whole Semantic Analyzer
system. We use it to store the retrieved semantics and to search for specific behavior. The abilities
added to the metamodel, such as the control flow graphs and operand information, help us
in determining the behavior of code. The emphasis of the model was more on usability for
the developers than on providing a complete formal model for semantics.
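As an illustration, the action-based shape of such a metamodel can be sketched as below. The names (ActionKind, FunctionModel, assignsTo) are hypothetical and do not mirror the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a semantic metamodel: each statement is lifted to
// an Action with a kind and normalized operands. All names are hypothetical.
enum ActionKind { ASSIGNMENT, CALL, BRANCH, CONVERSION, ARITHMETIC, TEST }

class Action {
    final ActionKind kind;
    final List<String> operands; // normalized operand names
    Action(ActionKind kind, List<String> operands) {
        this.kind = kind;
        this.operands = operands;
    }
}

class FunctionModel {
    final String name;
    final List<Action> actions = new ArrayList<>();
    FunctionModel(String name) { this.name = name; }

    // Behavioral query: does this function ever assign to the given operand?
    boolean assignsTo(String operand) {
        for (Action a : actions)
            if (a.kind == ActionKind.ASSIGNMENT && a.operands.contains(operand))
                return true;
        return false;
    }
}

public class MetamodelSketch {
    static FunctionModel sample() {
        FunctionModel f = new FunctionModel("Brake");
        f.actions.add(new Action(ActionKind.ASSIGNMENT, List.of("speed")));
        f.actions.add(new Action(ActionKind.CALL, List.of("Log")));
        return f;
    }
    public static void main(String[] args) {
        System.out.println(sample().assignsTo("speed")); // prints true
    }
}
```

A tool built on the model would combine such queries with the control flow information to reason about when, not only whether, an action occurs.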
The Semantic Analyzer uses a static analyzer instead of a dynamic one. As discussed before,

this has its advantages. For one, the source code does not have to be executed and we can
use the Common Intermediate Language to cover a wide range of higher level .NET program-
ming languages. Another advantage is the ability to compute all the possible paths in the code.
However, we try to use static analysis to obtain runtime behavior. Certain information, such
as the actually executed functions, the control flow paths taken, and the real values of the variables,
is only known at runtime and can only be determined using dynamic analysis. At runtime
it is possible to tell what the actual behavior of a function is, but that behavior can only be guar-
anteed for that specific execution run, because the user input can be different the next time.
Static analysis was selected because it can reveal all the actions that may possibly occur. If we
combine static analysis with dynamic analysis we might get additional information about
polymorphism, inheritance, and dynamic binding, but at a higher cost. Dynamic analysis is,
compared with static analysis, difficult to perform because the application must be executed and all
the paths must be visited to get a complete overview of the code, which is impractical and
very time consuming.

To perform the actual code analysis, we use Semantic Extractors. The Provider design pattern
allows us to select a specific provider, and four different kinds of providers were created. Each
provider reads the source code, parses it, builds up a metamodel, and converts the statements
to actions. How this is implemented differs per provider. Four different types of IL readers
were discussed in Section 6.2 and for each type a provider has been created. Parsing IL byte
code is a complex operation because of the amount of metadata available. The four different
IL readers can deal with this complexity and offer a form of code object model containing all
the elements of the source code. At the time of implementation, the IL readers used were lim-
ited and error prone. The only one really capable of retrieving all the information was Microsoft
Phoenix. Although very advanced in its capabilities, it was difficult to use. The documentation
was scarce, the samples were limited, and the system is still undergoing changes with every
new release. The PostSharp system has since evolved into a much more usable IL reader and it can
be interesting to see whether this reader can be used to obtain the semantics. A problem with
the Phoenix implementation was the extraction of custom attributes. This was not yet imple-
mented in Phoenix and a trick involving the default reflection capabilities of .NET had to be
used. This is a relatively slow operation, so better built-in support for attributes in Phoenix is
desirable.

Converting statements into actions is, even with Phoenix, still a difficult process. We raise the
actual code back to a higher level representation, the metamodel. This means we are losing
some information as we are combining statements into a more general action. It is up to the
applications of the metamodel to use the data to deduce semantics from it. The model lacks
actions such as assume and assert, which are present in Spec#. One can argue that the metamodel
is not really a semantical model, but merely a simplified code representation. However, the
available actions in the model represent the behavior of a function in the source code. As
such, we can reason about the intended semantics of the function using the model. So the
metamodel alone is not enough; we need applications operating on the model to do the real
semantic analysis. The metamodel is only a general purpose collection of semantically related
actions combined with control flow and operand information.

The metamodel contains enough information to reason about code and the possible execution
of code. We have information about the control flow and we can retrieve the data flow. Because
of the normalization of the statements we can use the extensive operand information to track
the usage of certain data. For example, we can follow an argument in a function and see the

possible changes made to that operand. The model also contains flow graph capabilities. A de-
cision was made to include this type of functionality directly inside the model instead of in
plugins, because the flow graphing operations are frequently used. They use the blocks
and actions to generate the flow blocks and need direct access to the model. The flow graph
functions operate on parts of the model and as such should also be present in the model. They
extend the model's capabilities and are placed inside their own namespace in the metamodel.
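A minimal sketch of the kind of operand tracking described above, with a function reduced to a list of token arrays; the encoding and names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: follow one operand through a function's actions and
// collect every action index that reads or writes it (a simple use chain).
public class OperandTrace {
    // Each action is reduced here to "kind target sources..." string tokens.
    static List<Integer> touches(List<String[]> actions, String operand) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < actions.size(); i++)
            for (String token : actions.get(i))
                if (token.equals(operand)) { hits.add(i); break; }
        return hits;
    }

    public static void main(String[] args) {
        List<String[]> actions = List.of(
            new String[] {"assign", "y", "arg0"},   // y = arg0
            new String[] {"add", "y", "y", "1"},    // y = y + 1
            new String[] {"call", "Print", "y"});   // Print(y)
        System.out.println(touches(actions, "arg0")); // prints [0]
    }
}
```

In the real model the same walk would follow normalized operands per block, so the chain can be combined with the control flow graph to see on which paths a change can occur.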

Besides the actions, the model also contains a complete hierarchical structure representing the
original code structure. The main purpose of this tree-like structure is to place the actions in
context. Actions belong to a function, a function resides inside a class, and the class is inside
a container. We need this type of information to map elements to source code, but it does not
provide a direct semantical purpose. The question arises whether it is possible to place the function
elements in another form: a description of the behavior of the class. Strictly speaking, a class contains
related functions with certain behavior, and the combined behavior can be used as the parent of
a single function. For example, a Car class can have the combined behavior of the functions
Brake, Accelerate, Go left, and Go right. Finding and representing this behavior, however, is a
difficult problem. Not all the functions are implemented in the class itself; they may be inherited,
overridden, or defined in a base class. Creating a notation for the semantics of a class requires extended
knowledge about the intended behaviors of all the functions, how they operate together, and
what type of added functionality they provide to the class.

To search the model, a number of different search techniques were discussed. Because of the
capabilities of native queries, this search system was selected. Although searching in this man-
ner is not new (Smalltalk has a similar system), it is currently gaining a lot of popularity in
the .NET programming world, mostly because Microsoft is developing LINQ, which is built
on the underlying technique of native queries. The capabilities of the search function of the
Semantic Database are very similar to those of LINQ; the actual implementation is not. Our database uses
delegates to check each element in the database for a match. This is not an efficient method
and it would be better to create some sort of an expression tree based on the query function
and use indexes in finding the information. This entails the need to parse the native query and
convert the statements to an expression tree. This solution was not chosen, because of the extra
work involved. To optimize the database in the future, we could switch to LINQ when it is
released or make use of an object-oriented database like DB4O, which contains an extensive
native query framework.
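The delegate-based lookup can be sketched as follows, loosely modeled on the Predicate/match shape popularized by db4o's native queries. The class names are illustrative and are not the actual Semantic Database API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a delegate-based native query; names hypothetical.
interface Predicate<T> {
    boolean match(T candidate);
}

class SemanticDatabase<T> {
    private final List<T> items = new ArrayList<>();

    void add(T item) { items.add(item); }

    // Linear scan: every element is handed to the predicate. This is the
    // inefficiency discussed in the text; no indexes or expression trees.
    List<T> query(Predicate<T> p) {
        List<T> result = new ArrayList<>();
        for (T item : items)
            if (p.match(item)) result.add(item);
        return result;
    }
}

public class NativeQueryDemo {
    static int countAssignments() {
        SemanticDatabase<String> db = new SemanticDatabase<>();
        db.add("Assignment");
        db.add("MethodCall");
        db.add("Assignment");
        // Anonymous class standing in for a C# delegate.
        return db.query(new Predicate<String>() {
            public boolean match(String action) {
                return action.equals("Assignment");
            }
        }).size();
    }

    public static void main(String[] args) {
        System.out.println(countAssignments()); // prints 2
    }
}
```

An expression-tree-based optimizer would inspect the body of match instead of invoking it per element, which is exactly the extra parsing work the text declines to do.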

A number of plugins were created to test the capabilities of the analyzer and to perform some
of the tasks defined in the motivation chapter. One of the reasons to create the analyzer was to
determine the behavior of the ReifiedMessage object. The plugin is now capable of reasoning
about the usage and generates the correct semantics to be used by the SECRET module. Cur-
rently, it does not take into account whether the behavior of the ReifiedMessage is conditional and
how the control flow is organized. Gathering this type of information is not that difficult, but
it is at the moment not possible to represent control flow data in the format used by SECRET.

Another plugin provides information about the resource usage and depends heavily on the
flow graph capabilities of the metamodel. Test runs with source code containing a large num-
ber of control flow statements showed that the control flow path algorithm had some serious
problems during the execution. The algorithms took a long time when the source code con-
tained multiple control flow paths (more than 50, so these were certainly not optimized methods
to begin with). It would be wise to invest in stronger and more efficient algorithms for control
flow analysis. In addition, the flow graph capabilities can be extended by adding algorithms
for data flow and method call flow. These frequently used types of flow analysis can then be
accessed directly from the applications without implementing their own algorithm.
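The blow-up the test runs exposed is inherent to explicit path enumeration: in a branching flow graph the number of paths grows exponentially with the number of decisions. Counting paths with memoization over the block graph, as in the hypothetical sketch below, avoids materializing each path.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count acyclic control flow paths from entry to exit
// with memoization, instead of enumerating every path explicitly.
public class PathCount {
    // successors.get(i) lists the blocks reachable from block i (a DAG).
    static long count(List<List<Integer>> successors, int block, int exit,
                      Map<Integer, Long> memo) {
        if (block == exit) return 1;
        Long cached = memo.get(block);
        if (cached != null) return cached;
        long total = 0;
        for (int next : successors.get(block))
            total += count(successors, next, exit, memo);
        memo.put(block, total);
        return total;
    }

    public static void main(String[] args) {
        // Two consecutive if/else diamonds: 0 -> (1|2) -> 3 -> (4|5) -> 6
        List<List<Integer>> cfg = List.of(
            List.of(1, 2), List.of(3), List.of(3),
            List.of(4, 5), List.of(6), List.of(6), List.<Integer>of());
        System.out.println(count(cfg, 0, 6, new HashMap<>())); // prints 4
    }
}
```

Each extra diamond doubles the path count but adds only two nodes to the memoized computation, which is why counting scales where enumeration does not.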

By combining the plugins, it was possible to integrate the analyzer with Compose* .NET. Be-
cause of the dependencies on the Phoenix libraries and the .NET Framework 2.0 it was not
feasible to package the analyzer directly with the default Compose* installation, so it is now
an optional part. The information found by the plugin is added to the Compose* repository,
but the actual usage of this data is still limited. The SECRET module can now take the extra
semantics found for the ReifiedMessage into account, but this has not been extensively tested. In theory,
the added semantical information should partly take care of one of the problems discussed in
the motivation, namely the need for program analysis data. There is now more information
available about the usage of variables, the methods being called, and the ReifiedMessage behav-
ior, so that it can be used to reason about potential conflicts and problems. However, it must be
noted that there is currently no module in Compose* .NET which uses this data for further
analysis. The information is available for future use.

Another problem identified in the motivation chapter was that of semantic join points:
the need to apply aspects based on semantical information instead of naming conventions or
structural properties. The Semantic Analyzer alone does not solve this, but can certainly be used
to help with this problem. The primary reason this is not yet implemented is the compilation
order used by Compose* .NET. We only have access to a compiled .NET assembly with method
bodies at the end of the process. At that time it is too late to perform the analysis and calculate
the join points based on semantical properties. If we really want to use the Semantic Analyzer
we will have to change the compilation process and perform multiple phases so we have direct
access to the assemblies, extract the semantics, add those semantics to the selector language,
and perform the actual weaving.

Although at this time not yet implemented, the Semantic Analyzer can certainly be used for se-
mantic join point determination. Semantic join points provide a better alternative to using
syntactical selection criteria like naming conventions, an opinion shared by others. Gybels and
Brichau [35] argue that we should write a crosscut as a specification in the form of patterns as
close to the intent of the crosscut as possible to make the crosscut robust to program evolutions.
Tourwé [78] argues for more sophisticated and expressive crosscut languages, which
can partially solve the fragile pointcut problem. Some possible solutions have been presented,
for example SetPoint [3].

The problem with the semantic join points also applies to fine-grained join points: applying
aspects at the statement level. The compilation structure does not allow us to perform fine grained
join point weaving, but this can be added in the future. While the statements are converted to
actions, we still save information to map the action back to the source code in the form of line
numbers and file names. The weaver can use this information to add additional code around
the original statements based on certain actions.
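The bookkeeping needed for such a mapping is small. A hypothetical sketch, with illustrative names and an invented action identifier scheme:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: actions remember the source location they came from,
// so a weaver can later insert code around the original statement.
public class SourceMap {
    static class Location {
        final String file;
        final int line;
        Location(String file, int line) { this.file = file; this.line = line; }
    }

    private final Map<String, Location> byAction = new HashMap<>();

    void record(String actionId, String file, int line) {
        byAction.put(actionId, new Location(file, line));
    }

    Location lookup(String actionId) { return byAction.get(actionId); }

    public static void main(String[] args) {
        SourceMap map = new SourceMap();
        // A hypothetical action id as an extractor might produce it.
        map.record("assign#17", "Car.cs", 42);
        System.out.println(map.lookup("assign#17").line); // prints 42
    }
}
```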

Automatically deriving semantical properties by analyzing the source code and the semantics
of the source language cannot solve all the problems. Some intent is not present in the
code but in the underlying system, so a complete behavioral description of a function is not always
possible. A call to a method in another library does not give us any semantical information if we

do not have the code of this function for further analysis. However, the Semantic Analyzer offers
developers an extensive metamodel with basic behavioral actions, control flow information,
and operand data to reason about the possible intended behavior of the code.

9.3     Future Work

As with almost any research and software product, there is always room for more work. This
section offers some possible suggestions for future work.

9.3.1   Extractors

Currently there are four different Semantic Extractors, each with its own IL reader. Only the
Phoenix extractor is working properly, although more testing is certainly advisable. Improvements
can also be gained by using more Phoenix functionality, such as its control flow
capabilities, instead of creating our own algorithms.

Using other IL readers, such as the updated PostSharp library or the .NET reflection capabilities
in .NET version 2.0, it should be possible to create properly working extractors for .NET assem-
blies. More interesting is developing an extractor capable of converting other object-oriented
languages to the semantic metamodel. For instance, analyzing Java source or byte code, or Bor-
land Delphi (Object Pascal). The metamodel should be language independent and the extractor
can map the language specific parts to the corresponding elements in the model.

9.3.2   Model

The Semantic Metamodel is based on programming language constructions found in most lan-
guages. However, there is no formal specification for the metamodel available and investing
more time in this area can make the model more complete and concise. The model was largely
designed with usability and flexibility in mind. The cooperation with the native query search
mechanism is very extensive and using the model is intuitive. This is achieved by the added
functionality, metadata, and comments.

Creating a more formal semantic model, using techniques described in [61] or [49], can lead to
a more concise metamodel that is easier to reason about than the one currently available.

9.3.3   Graphs

Only control flow graphs are now available in the model. The algorithms used for this can
certainly be optimized, but also other graph capabilities can be added. For instance data flow,
to make it possible to track the flow of an operand in the model, or call graphs, to represent
calling relationships among operations in the model.

The metamodel should contain enough information to create these kinds of graphs.

9.3.4    Applications

Only a small number of plugins make use of the Semantic Analyzer. It would be very interest-
ing to develop more applications for the metamodel. From simple structural applications, like
calculating metrics, to more advanced behavioral tools. For instance, the detection of design
patterns, automatically generating pre- and postconditions based on the contents of functions,
checking for security, performance, or other problems, and so on.
Another interesting application is to see if the model is really language independent by con-
verting a program written in one language to a program written in another language using the
metamodel. For example, using a .NET program as the input source and applying a plugin to
generate a Java application with the same behavior.

9.3.5    Integration

The integration with Compose* .NET is now limited. The analyzer is used to extract some
basic elements from the source assemblies, but the resulting information is not further applied
in any Compose* analysis task other than SECRET. Part of the reason is the compilation
process of Compose*, which makes it difficult to get a complete assembly at the start of the analysis
modules, but the two different programming languages used also play a role. It is not possible to use the
rich metamodel and search functions to get the exact data needed; instead, we have to use different
subsystems to get the data.
The analyzer is written in the .NET language C#; the Compose* modules are written in Java.
Communication between the programs is handled by the use of XML files. It is not
possible to directly call the Semantic Analyzer and work with the metamodel and native query
search functions from within the Compose* process.
One solution for this problem is to port the analyzer to Java. The difficult part is the creation
of good .NET parsers. The native query functionality can be ported to Java, but needs a recent
Java version supporting generics and anonymous inner classes to work correctly. Another solution
is to create more extensive plugins to gather information from the source code. It might be
wise to store the resulting data in another format than XML, for instance in an object-oriented
database like DB4O or ObjectStore, for performance reasons.



 [1] Ada. Ada for the web, 1996.
 [2] M. Aksit, editor. Proc. 2nd Int'l Conf. on Aspect-Oriented Software Development (AOSD-2003),
     Mar. 2003. ACM Press.
 [3] R. Altman, A. Cyment, and N. Kicillof. On the need for setpoints. In K. Gybels,
     M. D’Hondt, I. Nagy, and R. Douence, editors, 2nd European Interactive Workshop on Aspects
     in Software (EIWAS'05), Sept. 2005.

 [4] T. Archer and A. Whitechapel. Inside C#, Second Edition. Microsoft Press, Redmond, WA,
     USA, 2002. ISBN 0735616485.
 [5] I. Attali, D. Caromel, and M. Russo. A formal executable semantics for Java. In Proceedings
     of the Formal Underpinnings of Java Workshop (OOPSLA '98), 1998.
 [6] M. Barnett, K. Leino, and W. Schulte. The Spec# programming system: An overview. Con-
     struction and Analysis of Safe, Secure, and Interoperable Smart Devices: International Workshop,
     CASSIS, pages 49–69, 2004.
 [7] L. Bergmans. Composing Concurrent Objects. PhD thesis, University of Twente, 1994.

 [8] L. Bergmans and M. Aksit. Composing crosscutting concerns using composition filters.
     Comm. ACM, 44(10):51–57, Oct. 2001.
 [9] A. Beugnard, J. Jezequel, N. Plouzeau, and D. Watkins. Making components contract
     aware. Computer, 32(7):38–45, 1999. ISSN 0018-9162.
[10] S. R. Boschman. Performing transformations on .NET intermediate language code. Mas-
     ter’s thesis, University of Twente, The Netherlands, Aug. 2006.
[11] R. Bosman. Automated reasoning about Composition Filters. Master’s thesis, University
     of Twente, The Netherlands, Nov. 2004.

136                                                                               BIBLIOGRAPHY
[12] B. Cabral, P. Marques, and L. Silva. RAIL: code instrumentation for .NET. Proceedings of
     the 2005 ACM symposium on Applied computing, pages 1282–1287, 2005.
[13] W. Cazzola, J. Jezequel, and A. Rashid. Semantic Join Point Models: Motivations, Notions
     and Requirements. SPLAT 2006 (Software Engineering Properties of Languages and Aspect
     Technologies), March 2006.
[14] E. J. Chikofsky and J. H. Cross II. Reverse engineering and design recovery: a taxonomy.
     IEEE Software, 7(1):13–17, 1990.
[15] Columbia University. The Columbia Encyclopedia, Sixth Edition. Columbia University Press.
[16] O. Conradi. Fine-grained join point model in Compose*. Master’s thesis, University of
     Twente, The Netherlands, 2006. To be released.
[17] W. Cook and S. Rai. Safe query objects: statically typed objects as remotely executable
     queries. Proceedings of the 27th international conference on Software engineering, pages 97–
     106, 2005.
[18] W. Cook and C. Rosenberger. Native Queries for Persistent Objects: A Design White Pa-
     per. Dr. Dobb's Journal (DDJ), February 2006.

[19] C. De Roover. Incorporating Dynamic Analysis and Approximate Reasoning in Declarative
     Meta-Programming to Support Software Re-engineering. PhD thesis, Vrije Universiteit Brussel
     Faculteit Wetenschappen Departement Informatica en Toegepaste Informatica, 2004.
[20] F. de Saussure. Course in General Linguistics (trans. Wade Baskin). Fontana/Collins, 1916.
[21] K. De Volder. Type-Oriented Logic Meta Programming. PhD thesis, Vrije Universiteit Brussel.
[22] R. DeLine and M. Fähndrich. The Fugue protocol checker: Is your software Baroque?
     Unpublished manuscript, 2003.
[23] D. Doornenbal. Analysis and redesign of the Compose* language. Master’s thesis, Uni-
     versity of Twente, The Netherlands, 2006. To be released.
[24] P. E. A. Durr. Detecting semantic conflicts between aspects (in Compose*). Master’s thesis,
     University of Twente, The Netherlands, Apr. 2004.
[25] ECMA-335. Standard ECMA-335, 2006.

[26] T. Elrad, R. E. Filman, and A. Bader. Aspect-oriented programming. Comm. ACM, 44(10):
     29–32, Oct. 2001.
[27] M. D. Ernst. Static and dynamic analysis: Synergy and duality. In WODA 2003: ICSE
     Workshop on Dynamic Analysis, pages 24–27, Portland, OR, May 9, 2003.
[28] J. Fabry and T. Mens. Language-independent detection of object-oriented design patterns.
     Computer Languages, Systems, and Structures, 30:21–33, 2004.
[29] N. Fenton and S. Pfleeger. Software metrics: a rigorous and practical approach. PWS Publishing
     Co. Boston, MA, USA, 1997.

[30] R. B. Findler, M. Latendresse, and M. Felleisen. Behavioral contracts and behavioral sub-
     typing. In Proceedings of ACM Conference Foundations of Software Engineering, 2001.

[31] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: elements of reusable
     object-oriented software. Addison Wesley, 1995.

[32] H. Giese, J. Graf, and G. Wirtz. Closing the Gap Between Object-Oriented Modeling of
     Structure and Behavior. UML, pages 534–549, 1999.

[33] M. Glandrup. Extending C++ using the concepts of composition filters. Master's thesis,
     University of Twente, 1995.

[34] J. D. Gradecki and N. Lesiecki. Mastering AspectJ: Aspect-Oriented Programming in Java.
     John Wiley and Sons, 2003. ISBN 0471431044.

[35] K. Gybels and J. Brichau. Arranging language features for pattern-based crosscuts. In
     Aksit [2], pages 60–69.

[36] B. Harbulot and J. R. Gurd. Using AspectJ to separate concerns in parallel scientific Java
     code. In K. Lieberherr, editor, Proc. 3rd Int'l Conf. on Aspect-Oriented Software Development
     (AOSD-2004), pages 122–131. ACM Press, Mar. 2004.

[37] W. Havinga. Designating join points in Compose* - a predicate-based superimposition
     language for Compose*. Master’s thesis, University of Twente, The Netherlands, May

[38] A. Heberle, W. Zimmermann, and G. Goos. Specification and Verification of Compiler
     Frontend Tasks: Semantic Analysis. 04/96 Verifix Report UKA, 7, 1996.

[39] F. J. B. Holljen. Compilation and type-safety in the Compose* .NET environment. Master’s
     thesis, University of Twente, The Netherlands, May 2004.

[40] R. Howard. Provider Model Design Pattern and Specification, Part 1. Technical re-
     port, Microsoft Corporation, 2004.

[41] R. L. R. Huisman. Debugging Composition Filters. Master’s thesis, University of Twente,
     The Netherlands, 2006. To be released.

[42] S. H. G. Huttenhuis. Patterns within aspect orientation. Master’s thesis, University of
     Twente, The Netherlands, 2006. To be released.

[43] ECMA International. Common language infrastructure (CLI). Standard ECMA-335, ECMA
     International, 2002.

[44] J. Richter. Type Fundamentals. Technical report, Microsoft Corporation, 2000.

[45] Jython. Jython homepage.

[46] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold. An overview
     of AspectJ. In J. L. Knudsen, editor, Proc. ECOOP 2001, LNCS 2072, pages 327–353, Berlin,
     June 2001. Springer-Verlag.
[47] P. Koopmans. Sina user's guide and reference manual. Technical report, Dept. of
     Computer Science, University of Twente, 1995.

[48] C. Koppen and M. Storzer. PCDiff: Attacking the fragile pointcut problem. In K. Gy-
     bels, S. Hanenberg, S. Herrmann, and J. Wloka, editors, European Interactive Workshop
     on Aspects in Software (EIWAS), Sept. 2004.

[49] S. Krishnamurthi. Programming Languages: Application and Interpretation. January 2006.
[50] J. Lefor. Phoenix as a Tool in Research and Instrumentation. 2004.

[51] S. Lidin. Inside Microsoft .NET IL Assembler. Microsoft Press, Redmond, WA, USA, 2002.
     ISBN 0-7356-1547-0.
[52] C. Lopes, L. Bergmans, M. D’Hondt, and P. Tarr, editors. Workshop on Aspects and Di-
     mensions of Concerns (ECOOP 2000), June 2000.

[53] C. Marti. Automatic contract extraction: Developing a cil parser. Master’s thesis, ETH
     Zurich, September 2003.
[54] Microsoft Corporation. Overview of the .NET framework. Technical report, Microsoft
     Corporation, 2003.

[55] Microsoft Corporation. What is the common language specification. Technical report, Mi-
     crosoft Corporation, 2003.

[56] Microsoft Corporation. .NET compact framework - technology overview. Technical
     report, Microsoft Corporation, 2003.

[57] Microsoft Corporation. What is .NET? Technical report, Microsoft Corporation, 2005.
[58] Microsoft Corporation. Design Guidelines for Class Library Developers. Technical re-
     port, Microsoft Corporation, 2006.

[59] Microsoft Corporation. Phoenix Documentation, 2006.
[60] I. Nagy. On the Design of Aspect-Oriented Composition Models for Software Evolution. PhD
     thesis, University of Twente, The Netherlands, June 2006.

[61] H. R. Nielson and F. Nielson. Semantics with applications: a formal introduction. John Wiley
     & Sons, Inc., New York, NY, USA, 1992. ISBN 0-471-92980-8.
[62] H. Ossher and P. Tarr. Multi-dimensional separation of concerns and the Hyperspace
     approach. In M. Aksit, editor, Software Architectures and Component Technology. Kluwer
     Academic Publishers, 2001. ISBN 0-7923-7576-9.
[63] A. Popovici, T. Gross, and G. Alonso. Dynamic weaving for aspect-oriented programming.
     In G. Kiczales, editor, Proc. 1st Int’ Conf. on Aspect-Oriented Software Development (AOSD-
     2002), pages 141–147. ACM Press, Apr. 2002.
[64] A. Popovici, G. Alonso, and T. Gross. Just in time aspects. In Aksit [2], pages 100–109.
[65] J. Prosise. Programming Microsoft .NET. Microsoft Press, Redmond, WA, USA, 2002.
[66] H. Rajan and K. J. Sullivan. Generalizing AOP for aspect-oriented testing. In Proceed-
     ings of the Fourth International Conference on Aspect-Oriented Software Development (AOSD
     2005), 2005.
[67] T. Richner. Recovering Behavioral Design Views: a Query-Based Approach. PhD thesis, Uni-
     versity of Berne, May 2002.
[68] T. Richner and S. Ducasse. Recovering high-level views of object-oriented applications
     from static and dynamic information. Proceedings ICSM, 99:13–22, 1999.
[69] C. D. Roover, K. Gybels, and T. D’Hondt. Towards abstract interpretation for recovering
     design information. Electr. Notes Theor. Comput. Sci., 131:15–25, 2005.
[70] P. Salinas. Adding systemic crosscutting and super-imposition to Composition Filters.
     Master’s thesis, Vrije Universiteit Brussel, Aug. 2001.
[71] D. R. Spenkelink. Compose* incremental. Master’s thesis, University of Twente, The
     Netherlands, 2006. To be released.
[72] T. Staijen. Towards safe advice: Semantic analysis of advice types in Compose*. Master’s
     thesis, University of Twente, Apr. 2005.
[73] D. Stutz. The Microsoft shared source CLI implementation. 2002.
[74] P. Tarr, H. Ossher, S. M. Sutton, Jr., and W. Harrison. N degrees of separation: Multi-
     dimensional separation of concerns. In R. E. Filman, T. Elrad, S. Clarke, and M. Aksit,
     editors, Aspect-Oriented Software Development, pages 37–61. Addison-Wesley, Boston, 2005.
     ISBN 0-321-21976-7.
[75] J. W. te Winkel. Bringing Composition Filters to C. Master’s thesis, University of Twente,
     The Netherlands, 2006. To be released.
[76] F. Tip. A survey of program slicing techniques. Journal of programming languages, 3:121–189,
     1995.
[77] T. Rho, G. Kniesel, and M. Appeltauer. Fine-grained generic aspects. In Foundations of
     Aspect-Oriented Languages (FOAL), 2006.

[78] T. Tourwé, J. Brichau, and K. Gybels. On the existence of the AOSD-evolution paradox. In
     L. Bergmans, J. Brichau, P. Tarr, and E. Ernst, editors, SPLAT: Software engineering Properties

     of Languages for Aspect Technologies, Mar. 2003.

[79] M. D. W. van Oudheusden. Automatic Derivation of Semantic Properties in .NET. Mas-
     ter’s thesis, University of Twente, The Netherlands, Aug. 2006.
[80] C. Vinkes. Superimposition in the Composition Filters model. Master’s thesis, University
     of Twente, The Netherlands, Oct. 2004.
[81] N. Walkinshaw, M. Roper, and M. Wood. Understanding Object-Oriented Source Code
     from the Behavioural Perspective. Program Comprehension, 2005. IWPC 2005. Proceedings.
     13th International Workshop on, pages 215–224, 2005.
[82] D. Watkins. Handling language interoperability with the Microsoft .NET framework.
     Technical report, Monash University, Oct. 2000.

[83] D. A. Watt. Programming language concepts and paradigms. Prentice Hall, 1990.
[84] M. D. Weiser. Program slices: formal, psychological, and practical investigations of an automatic
     program abstraction method. PhD thesis, 1979.
[85] J. C. Wichman. The development of a preprocessor to facilitate composition filters in the
     Java language. Master's thesis, University of Twente, 1999.

[86] R. Wuyts. A Logic Meta-Programming Approach to Support the Co-Evolution of Object-
     Oriented Design and Implementation. PhD thesis, 2001.

                                                                             APPENDIX        A

                                                                       CIL Instruction Set

In the following table, all available operation codes (opcodes) of the CIL instruction set are
listed with a short description.
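To make the stack discipline behind these opcodes concrete, the following small interpreter (an illustrative sketch written for this appendix, not part of Compose* or any .NET tool) executes a handful of the instructions from the table on a simulated evaluation stack:

```python
def execute(code, args):
    """Interpret a tiny subset of CIL: ldarg.N, ldc.i4.N, add, mul, ret."""
    stack = []
    for op in code:
        if op.startswith("ldarg."):
            stack.append(args[int(op.split(".")[1])])  # load argument N onto stack
        elif op.startswith("ldc.i4."):
            stack.append(int(op.split(".")[2]))        # push int32 constant onto stack
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)                        # pop two values, push their sum
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)                        # pop two values, push their product
        elif op == "ret":
            return stack.pop()                         # return the value on top of the stack

# Equivalent of a method body computing (a + b) * 2:
print(execute(["ldarg.0", "ldarg.1", "add", "ldc.i4.2", "mul", "ret"], [3, 4]))  # → 14
```

Every instruction either pushes operands onto the stack or pops them and pushes a result, which is exactly the pattern the descriptions below follow.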

      OpCode                   Description
      nop                      Do nothing
      break                    Inform a debugger that a breakpoint has been reached.
      ldarg.0                  Load argument 0 onto stack
      ldarg.1                  Load argument 1 onto stack
      ldarg.2                  Load argument 2 onto stack
      ldarg.3                  Load argument 3 onto stack
      ldloc.0                  Load local variable 0 onto stack.
      ldloc.1                  Load local variable 1 onto stack.
      ldloc.2                  Load local variable 2 onto stack.
      ldloc.3                  Load local variable 3 onto stack.
      stloc.0                  Pop value from stack into local variable 0.
      stloc.1                  Pop value from stack into local variable 1.
      stloc.2                  Pop value from stack into local variable 2.
      stloc.3                  Pop value from stack into local variable 3.
      ldarg.s                  Load argument numbered num onto stack, short form.
      ldarga.s                 Fetch the address of argument argNum, short form
      starg.s                  Store a value to the argument numbered num, short form
      ldloc.s                  Load local variable of index indx onto stack, short form.
      ldloca.s                 Load address of local variable with index indx, short form
      stloc.s                  Pop value from stack into local variable indx, short form.
      ldnull                   Push null reference on the stack
      ldc.i4.m1                Push -1 onto the stack as int32.
      ldc.i4.0                 Push 0 onto the stack as int32.
      ldc.i4.1                 Push 1 onto the stack as int32.
      ldc.i4.2                 Push 2 onto the stack as int32.

    ldc.i4.3            Push 3 onto the stack as int32.
    ldc.i4.4            Push 4 onto the stack as int32.
    ldc.i4.5            Push 5 onto the stack as int32.
    ldc.i4.6            Push 6 onto the stack as int32.
    ldc.i4.7            Push 7 onto the stack as int32.
    ldc.i4.8            Push 8 onto the stack as int32.
    ldc.i4.s            Push num onto the stack as int32, short form.
    ldc.i4              Push num of type int32 onto the stack as int32.
    ldc.i8              Push num of type int64 onto the stack as int64.
    ldc.r4              Push num of type float32 onto the stack as F.
    ldc.r8              Push num of type float64 onto the stack as F.
    dup                 Duplicate value on the top of the stack
    pop                 Pop a value from the stack
    jmp                 Exit current method and jump to specified method
    call                Call method
    calli               Call method indicated on the stack with arguments de-
                        scribed by callsitedescr.
    ret                 Return from method, possibly returning a value
    br.s                Branch to target, short form
    brfalse.s           Branch to target if value is zero (false), short form
    brtrue.s            Branch to target if value is non-zero (true), short form
    beq.s               Branch to target if equal, short form
    bge.s               Branch to target if greater than or equal to, short form
    bgt.s               Branch to target if greater than, short form
    ble.s               Branch to target if less than or equal to, short form
    blt.s               Branch to target if less than, short form
    bne.un.s            Branch to target if unequal or unordered, short form
    bge.un.s            Branch to target if greater than or equal to (unsigned or
                        unordered), short form
    bgt.un.s            Branch to target if greater than (unsigned or unordered),
                        short form
    ble.un.s            Branch to target if less than or equal to (unsigned or un-
                        ordered), short form
    blt.un.s            Branch to target if less than (unsigned or unordered), short form
    br                  Branch to target
    brfalse             Branch to target if value is zero (false)
    brtrue              Branch to target if value is non-zero (true)
    beq                 Branch to target if equal
    bge                 Branch to target if greater than or equal to
    bgt                 Branch to target if greater than
    ble                 Branch to target if less than or equal to
    blt                 Branch to target if less than
    bne.un              Branch to target if unequal or unordered
    bge.un              Branch to target if greater than or equal to (unsigned or unordered)
    bgt.un              Branch to target if greater than (unsigned or unordered)

      ble.un      Branch to target if less than or equal to (unsigned or unordered)
      blt.un      Branch to target if less than (unsigned or unordered)
      switch      Jump to one of n values
      ldind.i1    Indirect load value of type int8 as int32 on the stack.
      ldind.u1    Indirect load value of type unsigned int8 as int32 on the stack.
      ldind.i2    Indirect load value of type int16 as int32 on the stack.
      ldind.u2    Indirect load value of type unsigned int16 as int32 on the stack.
      ldind.i4    Indirect load value of type int32 as int32 on the stack.
      ldind.u4    Indirect load value of type unsigned int32 as int32 on the stack.
      ldind.i8    Indirect load value of type int64 as int64 on the stack.
      ldind.i     Indirect load value of type native int as native int on the stack.
      ldind.r4    Indirect load value of type float32 as F on the stack.
      ldind.r8    Indirect load value of type float64 as F on the stack.
      ldind.ref   Indirect load value of type object ref as O on the stack.
      stind.ref   Store value of type object ref (type O) into memory at address
      stind.i1    Store value of type int8 into memory at address
      stind.i2    Store value of type int16 into memory at address
      stind.i4    Store value of type int32 into memory at address
      stind.i8    Store value of type int64 into memory at address
      stind.r4    Store value of type float32 into memory at address
      stind.r8    Store value of type float64 into memory at address
      add         Add two values, returning a new value
      sub         Subtract value2 from value1, returning a new value
      mul         Multiply values
      div         Divide two values to return a quotient or floating-point result
      div.un      Divide two values, unsigned, returning a quotient
      rem         Remainder of dividing value1 by value2
      rem.un      Remainder of unsigned dividing value1 by value2
      and         Bitwise AND of two integral values, returns an integral value
      or          Bitwise OR of two integer values, returns an integer.
      xor         Bitwise XOR of integer values, returns an integer
      shl         Shift an integer left (shifting in zeros), return an integer
      shr         Shift an integer right (shift in sign), return an integer
      shr.un      Shift an integer right (shift in zero), return an integer
      neg         Negate value
      not         Bitwise complement
      conv.i1     Convert to int8, pushing int32 on stack
      conv.i2     Convert to int16, pushing int32 on stack
      conv.i4     Convert to int32, pushing int32 on stack

    conv.i8             Convert to int64, pushing int64 on stack
    conv.r4             Convert to float32, pushing F on stack
    conv.r8             Convert to float64, pushing F on stack
    conv.u4             Convert to unsigned int32, pushing int32 on stack
    conv.u8             Convert to unsigned int64, pushing int64 on stack
    callvirt            Call a method associated with obj
    cpobj               Copy a value type from srcValObj to destValObj
    ldobj               Copy instance of value type classTok to the stack.
    ldstr               Push a string object for the literal string
    newobj              Allocate an uninitialized object or value type and call ctor
    castclass           Cast obj to class
    isinst              Test if obj is an instance of class, returning null or an
                        instance of that class
    conv.r.un           Convert unsigned integer to floating-point, pushing F on stack
    unbox               Extract the value type data from obj, its boxed representation
    throw               Throw an exception
    ldfld                Push the value of field of object, or value type, obj, onto
                        the stack
    ldflda               Push the address of field of object obj on the stack
    stfld                Replace the value of field of the object obj with val
    ldsfld               Push the value of field on the stack
    ldsflda              Push the address of the static field, field, on the stack
    stsfld               Replace the value of field with val
    stobj               Store a value of type classTok from the stack into memory
    conv.ovf.i1.un      Convert unsigned to an int8 (on the stack as int32) and
                        throw an exception on overflow
    conv.ovf.i2.un      Convert unsigned to an int16 (on the stack as int32) and
                        throw an exception on overflow
    conv.ovf.i4.un      Convert unsigned to an int32 (on the stack as int32) and
                        throw an exception on overflow
    conv.ovf.i8.un      Convert unsigned to an int64 (on the stack as int64) and
                        throw an exception on overflow
    conv.ovf.u1.un      Convert unsigned to an unsigned int8 (on the stack as
                        int32) and throw an exception on overflow
    conv.ovf.u2.un      Convert unsigned to an unsigned int16 (on the stack as
                        int32) and throw an exception on overflow
    conv.ovf.u4.un      Convert unsigned to an unsigned int32 (on the stack as
                        int32) and throw an exception on overflow
    conv.ovf.u8.un      Convert unsigned to an unsigned int64 (on the stack as
                        int64) and throw an exception on overflow
    conv.ovf.i.un       Convert unsigned to a native int (on the stack as native int)
                        and throw an exception on overflow
    conv.ovf.u.un       Convert unsigned to a native unsigned int (on the stack as
                        native int) and throw an exception on overflow
    box                 Convert valueType to a true object reference

      newarr        Create a new array with elements of type etype
      ldlen         Push the length (of type native unsigned int) of array on
                    the stack
       ldelema       Load the address of element at index onto the top of the stack
      ldelem.i1     Load the element with type int8 at index onto the top of
                    the stack as an int32
      ldelem.u1     Load the element with type unsigned int8 at index onto
                    the top of the stack as an int32
      ldelem.i2     Load the element with type int16 at index onto the top of
                    the stack as an int32
      ldelem.u2     Load the element with type unsigned int16 at index onto
                    the top of the stack as an int32
      ldelem.i4     Load the element with type int32 at index onto the top of
                    the stack as an int32
      ldelem.u4     Load the element with type unsigned int32 at index onto
                    the top of the stack as an int32
      ldelem.i8     Load the element with type int64 at index onto the top of
                    the stack as an int64
      ldelem.i      Load the element with type native int at index onto the top
                    of the stack as an native int
      ldelem.r4     Load the element with type float32 at index onto the top
                    of the stack as an F
      ldelem.r8     Load the element with type float64 at index onto the top
                    of the stack as an F
      ldelem.ref    Load the element of type object, at index onto the top of
                    the stack as an O
       stelem.i      Replace array element at index with the i value on the stack
       stelem.i1     Replace array element at index with the int8 value on the stack
       stelem.i2     Replace array element at index with the int16 value on the stack
       stelem.i4     Replace array element at index with the int32 value on the stack
       stelem.i8     Replace array element at index with the int64 value on the stack
      stelem.r4     Replace array element at index with the float32 value on
                    the stack
      stelem.r8     Replace array element at index with the float64 value on
                    the stack
       stelem.ref    Replace array element at index with the ref value on the stack
      conv.ovf.i1   Convert to an int8 (on the stack as int32) and throw an
                    exception on overflow
      conv.ovf.u1   Convert to a unsigned int8 (on the stack as int32) and
                    throw an exception on overflow

    conv.ovf.i2         Convert to an int16 (on the stack as int32) and throw an
                        exception on overflow
    conv.ovf.u2         Convert to a unsigned int16 (on the stack as int32) and
                        throw an exception on overflow
    conv.ovf.i4         Convert to an int32 (on the stack as int32) and throw an
                        exception on overflow
    conv.ovf.u4         Convert to a unsigned int32 (on the stack as int32) and
                        throw an exception on overflow
    conv.ovf.i8         Convert to an int64 (on the stack as int64) and throw an
                        exception on overflow
    conv.ovf.u8         Convert to a unsigned int64 (on the stack as int64) and
                        throw an exception on overflow
    refanyval           Push the address stored in a typed reference
    ckfinite             Throw ArithmeticException if value is not a finite number
    mkrefany            Push a typed reference to ptr of type class onto the stack
    ldtoken             Convert metadata token to its runtime representation
    conv.u2             Convert to unsigned int16, pushing int32 on stack
    conv.u1             Convert to unsigned int8, pushing int32 on stack
    conv.i              Convert to native int, pushing native int on stack
    conv.ovf.i          Convert to an native int (on the stack as native int) and
                        throw an exception on overflow
    conv.ovf.u          Convert to a native unsigned int (on the stack as native int)
                        and throw an exception on overflow
    add.ovf             Add signed integer values with overflow check.
    add.ovf.un          Add unsigned integer values with overflow check.
    mul.ovf             Multiply signed integer values. Signed result must fit in
                        same size
    mul.ovf.un          Multiply unsigned integer values. Unsigned result must
                        fit in same size
    sub.ovf             Subtract native int from a native int. Signed result must fit
                        in same size
    sub.ovf.un          Subtract native unsigned int from a native unsigned int.
                        Unsigned result must fit in same size
    endfinally           End finally clause of an exception block
    leave               Exit a protected region of code.
    leave.s             Exit a protected region of code, short form
    stind.i             Store value of type native int into memory at address
    conv.u              Convert to native unsigned int, pushing native int on stack
    arglist             Return argument list handle for the current method
    ceq                 Push 1 (of type int32) if value1 equals value2, else 0
    cgt                 Push 1 (of type int32) if value1 > value2, else 0
    cgt.un              Push 1 (of type int32) if value1 > value2, unsigned or un-
                        ordered, else 0
    clt                 Push 1 (of type int32) if value1 < value2, else 0
    clt.un              Push 1 (of type int32) if value1 < value2, unsigned or un-
                        ordered, else 0
    ldftn               Push a pointer to a method

      ldvirtftn    Push address of virtual method mthd on the stack
      ldarg        Load argument numbered num onto stack.
       ldarga       Fetch the address of argument argNum.
      starg        Store a value to the argument numbered num
      ldloc        Load local variable of index indx onto stack.
      ldloca       Load address of local variable with index indx
      stloc        Pop value from stack into local variable indx.
      localloc     Allocate space from the local memory pool.
      endfilter     End filter clause of SEH exception handling
      unaligned.   Subsequent pointer instruction may be unaligned
      volatile.    Subsequent pointer reference is volatile
      tail.        Subsequent call terminates current method
      initobj      Initialize a value type
      cpblk        Copy data from memory to memory
      initblk      Set a block of memory to a given byte
      rethrow      Rethrow the current exception
      sizeof       Push the size, in bytes, of a value type as a unsigned int32
      refanytype   Push the type token stored in a typed reference
                     Table A.1: CIL instruction set

                                                                        APPENDIX   B

                                                             Evaluation Stack Types

The execution engine of the common language runtime implements a coarse type system for
the evaluation stack. Only the types listed in Table B.1 can be present on the stack.

          Type         Description
          int32        Signed 4-byte integer
          native int   Native integer, size dependent on the underlying platform
          int64        Signed 8-byte integer
          Float        80-bit floating point number (covering both 32 and 64 bit)
          &            Managed or unmanaged pointer
           O            Object reference

                             Table B.1: Evaluation Stack types
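The coarse widening this type system applies can be summarized as a simple mapping (reconstructed here from Table B.1 and the conv.* descriptions in Appendix A; the table itself is illustrative, not part of any tool):

```python
# How CLI value types appear on the evaluation stack: everything narrower
# than 4 bytes is widened to int32, both float sizes become the internal
# floating-point type F, and object references become O.
STACK_TYPE = {
    "int8": "int32", "unsigned int8": "int32",
    "int16": "int32", "unsigned int16": "int32",
    "int32": "int32", "unsigned int32": "int32",
    "int64": "int64", "unsigned int64": "int64",
    "float32": "F", "float64": "F",
    "native int": "native int", "object ref": "O",
}

print(STACK_TYPE["int8"], STACK_TYPE["float32"])  # → int32 F
```

This is why, for example, ldind.u1 in Appendix A loads an unsigned int8 "as int32 on the stack".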

                                                                                 APPENDIX        C

                                                  Semantic Extractor Configuration File

     The Semantic Extractor uses the provider design pattern to select the provider that handles
     the calls to the base class. The different types of providers are defined and stored in the
     app.config file, whose contents are shown in Listing C.1.
 1   <?xml version="1.0" encoding="utf-8" ?>
 2   <configuration>
 3     <configSections>
 4       <section
 5        name="SemanticExtractors"
 6        type="SemanticLibrary.SemanticExtractorSection, SemanticLibrary"
 7        allowLocation="true" allowDefinition="Everywhere" />
 8     </configSections>
 9     <!-- Semantic Extractor Provider Settings -->
10     <SemanticExtractors defaultProvider="phoenix">
11       <providers>
12         <clear/>
13         <add name="cecil" description="Mono Cecil 0.2"
14          type="SemanticExtractorCecil.SemanticExtractorCecil,
15          SemanticExtractorCecil" />
16         <add name="phoenix" description="Microsoft Phoenix"
17          type="SemanticExtractorPhoenix.SemTexPhoenix,
18          SemanticExtractorPhoenix" />
19         <add name="rail" description="Runtime Assembly Instrumentation Library"
20          type="SemanticExtractorRail.SemTexRail, SemanticExtractorRail" />
21         <add name="postsharp" description="PostSharp reads .NET binary modules,
22          represents them as a Code Object Model, lets plug-ins analyze and
23          transforms this model and writes it back to the binary form."
24          type="SemanticExtractorPostSharp.SemTexPostSharp,
25          SemanticExtractorPostSharp" />
26       </providers>
27     </SemanticExtractors>
28   </configuration>
                               Listing C.1: Contents of the app.config file
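The lookup this configuration drives can be sketched as follows. This is not how the .NET provider framework resolves providers internally; it is only an illustration, using a hypothetical fragment of the SemanticExtractors section from Listing C.1, of how the defaultProvider attribute selects one of the registered type names:

```python
import xml.etree.ElementTree as ET

# Fragment modelled on Listing C.1 (descriptions omitted for brevity).
config = """
<SemanticExtractors defaultProvider="phoenix">
  <providers>
    <add name="cecil" type="SemanticExtractorCecil.SemanticExtractorCecil, SemanticExtractorCecil" />
    <add name="phoenix" type="SemanticExtractorPhoenix.SemTexPhoenix, SemanticExtractorPhoenix" />
  </providers>
</SemanticExtractors>
"""

root = ET.fromstring(config)
default = root.get("defaultProvider")                      # name of the default provider
providers = {p.get("name"): p.get("type") for p in root.iter("add")}
print(providers[default])  # → SemanticExtractorPhoenix.SemTexPhoenix, SemanticExtractorPhoenix
```

Adding a new extractor back-end then amounts to adding another add element and, optionally, changing defaultProvider.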

                                                                               APPENDIX     D

                                                                          Semantic Actions

Table D.1 lists all the available semantic action kinds and their properties.

 Action            Properties                    Description
 Assign            Source, destination           Assignment of a value. The value of source is
                                                 assigned to destination.
 Negate            Source, destination           Negates a value
 Add               Source1,           source2,   Add the value of source1 to source2 and store
                   destination                   result in destination
 Not               Source1, destination          Bitwise complement of the source
 Multiply          Source1,           source2,   The values of source1 and source2 will be mul-
                   destination                   tiplied and placed in the destination.
 Divide            Source1,           source2,   The value of source1 will be divided by the
                   destination                   value of the source2 and the result will be
                                                 placed in the destination.
 Remainder         Source1,           source2,   The remainder action divides source1 by
                   destination                   source2 and places the remainder result in the
                                                 destination.
 Subtract          Source1,           source2,   The value of source1 will be subtracted from the
                   destination                   value of source2 and the result is placed in the
                                                 destination.
 And               Source1,           source2,   An and operation is performed on the values of
                   destination                   source1 and source2 and the result is placed in
                                                 the destination.
 Or                Source1,           source2,   An or operation is performed on the values of
                   destination                   source1 and source2 and the result is placed in
                                                 the destination.
 Xor               Source1,           source2,   A bitwise Xor operation is performed on the
                   destination                   values of source1 and source2 and the result is
                                                 placed in the destination.

 Jump             Labelname                   A jump to the LabelName is performed.
 Branch           ConditionAction,            A branch action is performed and based on the
                  truelabel, falselabel       condition it will jump either to the TrueLabel-
                                              Name or the FalseLabelName. The condition
                                              can be found in the ConditionAction property.
 Compare        ComparisonType, source1, A comparison is performed on the source1
                source2, destination          and source2 values using the ComparisonType.
                                              The resulting boolean value is placed in the
                                              destination.
 Create         Destination,                  A new instance or a new array is created. The
                operationname                 new object or array is placed in the destination
                                              and in this operand you can find the type and
                                              name. If the create operation calls a construc-
                                              tor, then you can find this constructor in the
                                              operationname property.
 Convert        Source, destination           A conversion is performed on the source to the
                                              destination. The new type can be found in the
                                              destination properties while the old type infor-
                                              mation is still present in the source.
 Call           OperationName,                A call to another operation is made. The
                destination                   name of this operation can be found in
                                              the OperationName property. Its return
                                              value, when available, will be placed in the
                                              destination.
 Return         Source                        The control flow is returned to the calling op-
                                              eration. This means the end of the current op-
                                              eration. The return value can be placed in the
                                              source.
 RaiseException Source                        Raises an exception. The exception type is
                                              placed in source.
 Test           Source1,           source2, Test if source1 is equal to source2 and store the
                destination                   result in the destination.
 Switch         Source1, SwitchLabels         A switch construction where the source1 de-
                                              fines a value indicating the label to jump to.
                                              The labels can be found in the SwitchLabels.
                        Table D.1: Available semantic actions kinds
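As an illustration of how these action kinds describe a piece of code, the sketch below (written for this appendix; the class and property names are only modelled on the table, not taken from the Semantic Analyzer's actual API) represents the statement sequence "result = value1 * value2; if (result == 0) goto L1;" as semantic actions:

```python
from dataclasses import dataclass

@dataclass
class SemanticAction:
    kind: str          # one of the action kinds from Table D.1
    properties: dict   # the properties listed for that kind

actions = [
    # Multiply: source1 * source2 -> destination
    SemanticAction("Multiply", {"source1": "value1", "source2": "value2",
                                "destination": "result"}),
    # Compare: result == 0 -> cond
    SemanticAction("Compare", {"comparisonType": "Equal", "source1": "result",
                               "source2": "0", "destination": "cond"}),
    # Branch: jump to truelabel or falselabel depending on cond
    SemanticAction("Branch", {"conditionAction": "cond",
                              "truelabel": "L1", "falselabel": "L2"}),
]

print([a.kind for a in actions])  # → ['Multiply', 'Compare', 'Branch']
```

The point is that every action names its operands explicitly, so a later analysis can follow values from sources to destinations without re-inspecting the CIL.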

                                                                        APPENDIX      E

                                                                      Semantic Types

Table E.1 lists all the available semantic common type kinds.

        Type name             Description
         Unknown               It is an unknown type. If a type cannot be determined
                               or is subject to change, set the semantic type to Un-
                               known.
        Char                  A single character.
        String                Strings are used to hold text.
        Byte                  Represents an 8-bit signed integer.
        Short                 Represents a signed 16-bit integer.
        Integer               Represents a signed 32-bit integer.
        Long                  Represents a signed 64-bit integer.
        Float                 The float keyword denotes a simple type that stores
                              32-bit floating-point values.
        Double                The double keyword denotes a simple type that stores
                              64-bit floating-point values.
        Boolean               Represents a boolean value.
        Object                Represents a general object.
         Unsigned Short        Represents a 16-bit unsigned integer.
         Unsigned Integer      Represents a 32-bit unsigned integer.
         Unsigned Long         Represents a 64-bit unsigned integer.
        Unsigned Byte         Represents an 8-bit unsigned integer.
        DateTime              A date and/or time field.
                       Table E.1: Available semantic common types

                                                                                 APPENDIX        F

                                                                   SEMTEX Generated File

     The ComposeStarAnalyserPlugin is called during the compilation process and analyzes the as-
     semblies created by the weaver. It performs three main tasks: extracting the semantics of the
     ReifiedMessage, determining the resource usage of fields and arguments, and recording the
     calls made to other functions. See Section 8.4 for more information. The information is stored
     in an XML file so it can be imported and used by the other Compose* modules. An example of
     this XML file is shown in Listing F.1.
1    <?xml version="1.0" encoding="utf-8"?>
2    <SemanticContainers>
3       <SemanticContainer name="pacman.ConcernImplementations.ScoreIncreaser" sourcefile
4        <SemanticClass name="pacman.ConcernImplementations.ScoreIncreaser">
5          <FullName>[pacman.ConcernImplementations.ScoreIncreaser.dll]pacman.
 6         <BaseType>[mscorlib]System.Object - Object</BaseType>
 7         <SemanticMethod name="increase">
 8           <ReturnType>System.Void</ReturnType>
 9           <CallsToOtherMethods>
10             <Call operationName="getArgs" className="Composestar.RuntimeCore.FLIRT.
                   Message.ReifiedMessage" />
11             <Call operationName="toString" className="" />
12             <Call operationName="parseInt" className="java.lang.Integer" />
13             <Call operationName="handleReturnMethodCall" className="Composestar.
                   RuntimeCore.FLIRT.MessageHandlingFacility" />
14             <Call operationName="setArgs" className="Composestar.RuntimeCore.FLIRT.
                   Message.ReifiedMessage" />
15             <Call operationName="append" className="java.lang.StringBuffer" />
16             <Call operationName="getSelector" className="Composestar.RuntimeCore.FLIRT
                   .Message.ReifiedMessage" />
17             <Call operationName="append" className="java.lang.StringBuffer" />
18             <Call operationName="handleReturnMethodCall" className="Composestar.
                   RuntimeCore.FLIRT.MessageHandlingFacility" />
19             <Call operationName="append" className="java.lang.StringBuffer" />

20             <Call operationName="append" className="java.lang.StringBuffer" />
21             <Call operationName="handleReturnMethodCall" className="Composestar.
                   RuntimeCore.FLIRT.MessageHandlingFacility" />
22             <Call operationName="println" className="" />
23             <Call operationName="handleVoidMethodCall" className="Composestar.
                   RuntimeCore.FLIRT.MessageHandlingFacility" />
24           </CallsToOtherMethods>
25           <ReifiedMessageBehaviour>
26             <Semantic value="" />
27           </ReifiedMessageBehaviour>
28           <ResourceUsages>
29             <ResourceUsage name="message" operandType="SemanticArgument" accessType="
                   read" accessOccurence="AtLeastOnce" />
30             <ResourceUsage name="this" operandType="SemanticArgument" accessType="read
                   " accessOccurence="AtLeastOnce" />
31             <ResourceUsage name="this" operandType="SemanticArgument" accessType="
                   write" accessOccurence="AtLeastOnce" />
32             <ResourceUsage name="this" operandType="SemanticArgument" accessType="read
                   " accessOccurence="AtLeastOnce" />
33             <ResourceUsage name="this" operandType="SemanticArgument" accessType="read
                   " accessOccurence="MaybeMoreThenOnce" />
34             <ResourceUsage name="message" operandType="SemanticArgument" accessType="
                   read" accessOccurence="MaybeMoreThenOnce" />
35             <ResourceUsage name="message" operandType="SemanticArgument" accessType="
                   read" accessOccurence="MaybeMoreThenOnce" />
36             <ResourceUsage name="this" operandType="SemanticArgument" accessType="read
                   " accessOccurence="MaybeMoreThenOnce" />
37             <ResourceUsage name="this" operandType="SemanticArgument" accessType="read
                   " accessOccurence="MaybeMoreThenOnce" />
38             <ResourceUsage name="this" operandType="SemanticArgument" accessType="read
                   " accessOccurence="MaybeMoreThenOnce" />
39           </ResourceUsages>
40         </SemanticMethod>
41       </SemanticClass>
42     </SemanticContainer>
43   </SemanticContainers>
                    Listing F.1: Part of the SEMTEX file for the pacman example
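A consumer of this file can summarize the ResourceUsage entries per resource, for instance to decide whether an argument is only read or also written. The sketch below (illustrative only, using a fragment modelled on Listing F.1; it is not code from a Compose* module) does exactly that:

```python
import xml.etree.ElementTree as ET

# Fragment with entries like those in Listing F.1.
fragment = """
<ResourceUsages>
  <ResourceUsage name="message" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
  <ResourceUsage name="this" operandType="SemanticArgument" accessType="read" accessOccurence="AtLeastOnce" />
  <ResourceUsage name="this" operandType="SemanticArgument" accessType="write" accessOccurence="AtLeastOnce" />
</ResourceUsages>
"""

# Collect, per resource name, the set of access types that occur.
usage = {}
for ru in ET.fromstring(fragment).iter("ResourceUsage"):
    usage.setdefault(ru.get("name"), set()).add(ru.get("accessType"))

print(sorted(usage["this"]))  # → ['read', 'write']
```

With this summary in hand, an analysis task can conclude, for example, that "this" is both read and written while "message" is only read.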
