andy maule - thesis00022

					                                         4.1. Fact Extraction                                            54

     The predicates and terms in facts can be arbitrary, but for the remainder of this dissertation we shall
consider only facts with two terms, where the first term is always an identifier that represents an SOI, and
the second term represents some information that is related to the SOI by the predicate. These SOI facts
can be thought of as something interesting that we can observe about a particular SOI.
     We represent SOIs and facts more formally, as follows:

 SourceLoc    sourceLoc
 Predicate    d  ∈  { ConcAtLine, Concats, CreatesStr, CreatesStrWithId,
                      ExecutesQuery, ExecutesQueryWithId, Executes,
                      ExecutedAtLine, LdStr, LdStrAtLine, PrmAddName,
                      PrmAddNameWithId, PrmAddType, PrmAddLine, PrmAddTo,
                      ReadsColumn, ReadsColumnWithId, ReadAtLine,
                      ReadsResultObject, ReturnsResult, UsesParams }
 Term         t  ∈  { Reg + RegLoc + Loc + SourceLoc }
 Fact         e  ∈  Predicate × Term

A sourceLoc is a token that represents a location in the program source code; this would be specified in
our analysis using the full path to the file and the line number of the location within the file.
     A predicate d is a member of the set Predicate. We define predicates for the constant values
shown; the semantic meanings of these predicates are described in Appendix A. As an example,
the ConcAtLine predicate indicates that a concatenation SOI occurred at the location in the program
specified by the second term, and the Concats predicate indicates that a concatenation SOI involves
the concatenation of the regular string identifier specified by the second term. These predicates are
not exhaustive; they are the predicates that were identified during the implementation of our analysis.
We created change impact analysis scripts for a set of likely schema changes, as we shall discuss in
Section 4.2.2. We then created predicates that supply all the information required to
perform this impact calculation. The predicates defined here are suitable for analysing libraries
such as ADO.NET, and were sufficiently expressive to perform the case study described in Chapter 7. We
believe that these predicates will be sufficient for the majority of standard data access libraries. They
are not sufficient, however, when impact calculation scripts need more information. For example, if we
wanted to create an impact analysis script that searched for queries that were in a transaction, we might
need a predicate such as InTransaction that denotes whether a given query is executed within a transaction. We
can add more predicates as required, simply by expanding the definition of the Predicate
set; they can then be used in the same way as the predicates we have already defined, as we shall discuss
in the remainder of this section. Our analysis must be able to cope with new and varied DBMS features
and different persistence technologies, as specified by Requirement-4, Chapter 2. Being able to add new
predicates as required helps satisfy this requirement.
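To illustrate how the Predicate set might be extended, the following is a minimal Python sketch. It is an illustration under stated assumptions rather than a description of the prototype: the names BASE_PREDICATES and make_predicate_set are hypothetical, and only the predicate names themselves come from the definition above.

```python
from enum import Enum

# Predicate names copied from the Predicate set defined in this section.
BASE_PREDICATES = [
    "ConcAtLine", "Concats", "CreatesStr", "CreatesStrWithId",
    "ExecutesQuery", "ExecutesQueryWithId", "Executes", "ExecutedAtLine",
    "LdStr", "LdStrAtLine", "PrmAddName", "PrmAddNameWithId",
    "PrmAddType", "PrmAddLine", "PrmAddTo", "ReadsColumn",
    "ReadsColumnWithId", "ReadAtLine", "ReadsResultObject",
    "ReturnsResult", "UsesParams",
]


def make_predicate_set(extra=()):
    """Build the Predicate enumeration (hypothetical helper).

    `extra` lists additional predicate names, so extending the set
    amounts to enlarging its definition.
    """
    return Enum("Predicate", BASE_PREDICATES + list(extra))


# The set as defined above, and an extended set with InTransaction.
Predicate = make_predicate_set()
ExtendedPredicate = make_predicate_set(extra=["InTransaction"])

print(len(Predicate), len(ExtendedPredicate))
```

Under this sketch, existing predicates are unaffected by the extension, and the new predicate is used in the same way as any other member of the set.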
     A term is a regular string (r in the query analysis abstract domain), a heap location identifier (l in
the query analysis abstract domain), a regular string heap location identifier (l_reg in the query analysis
abstract domain) or a source location (sourceLoc). The identifiers are required to establish the traceability relationships between SOIs
(as we shall describe below), whilst the regular strings represent the values of queries at given locations.
A member of Term will always represent the second term in the fact, as the first term is always specified
by the identifier of the SOI to which the facts belong. For example, consider the SOI with the identifier
SOI1, which has a fact represented as {ExecutesQuery, "SELECT..."}. This representation actually
describes the fact:

                                  ExecutesQuery(SOI1, "SELECT...")

We shall see more examples of facts and SOIs in the remainder of this chapter.
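The relationship between an SOI's (predicate, term) pairs and the full two-term facts they stand for can be sketched as follows. This is a minimal illustration only; the function names expand_facts and render are hypothetical and do not come from our analysis.

```python
def expand_facts(soi_id, pairs):
    """Turn an SOI's (predicate, term) pairs into full two-term facts.

    The SOI identifier always becomes the first term of each fact.
    """
    return {(predicate, soi_id, term) for predicate, term in pairs}


def render(fact):
    """Format a fact in the predicate(first, second) notation used above."""
    predicate, soi_id, term = fact
    return f'{predicate}({soi_id}, "{term}")'


# The SOI1 example from the text: one (predicate, term) pair.
soi1_pairs = {("ExecutesQuery", "SELECT...")}
soi1_facts = expand_facts("SOI1", soi1_pairs)

for fact in soi1_facts:
    print(render(fact))  # prints: ExecutesQuery(SOI1, "SELECT...")
```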
     A statement occurrence in the query analysis becomes an SOI if it produces facts. Thus for every
type of interesting statement we define a function to discover these facts, of the form:

\[
\mathit{Facts}_t : (\sigma \times h \times h_{reg} \times \mathit{OriginalReg} \times \mathit{sourceLoc}) \to \mathcal{P}(\mathit{Fact})
\]

The Facts function, supplied with the outgoing flow state from the fixed point solution and a source
location parameter, will calculate the facts produced by the SOIs of type t. Continuing our example, we
define a Facts function for calls to OleDbCommand.ExecuteReader() as follows:
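\[
\begin{aligned}
&\mathit{Facts}_{x = \text{call OleDbCommand.ExecuteReader}(y)}(\sigma \times h \times h_{reg}, O, \mathit{sourceLoc}) = \\
&\qquad \{(\text{ExecutesQuery}, p) \mid p \in \mathit{lookup}(V, h_{reg})\} \;+ \\
&\qquad \{(\text{ExecutesQueryWithId}, o) \mid l_{reg} \in V,\ o \in O(l_{reg})\} \;+ \\
&\qquad \{(\text{Executes}, l) \mid l \in \sigma(y)\} \;+ \\
&\qquad \{(\text{ReturnsResult}, l) \mid l \in \sigma(x)\} \;+ \\
&\qquad \{(\text{ExecutedAtLine}, \mathit{sourceLoc})\} \\
&\text{where } V = \bigcup \{V \mid l \in \sigma(y),\ h(l) = V^{u}\}
\end{aligned}
\]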

The function returns a set of facts, based upon the values in the supplied flow state. A fact with the
ExecutesQuery predicate is created for every string value in the content of the heap variable y; it is
assumed that y is the variable containing a reference to the receiver object upon which the method is
being called, in this case an instance of the OleDbCommand class.
     The fact with the ExecutesQueryWithId predicate lists the original strings' identifiers for each string
that was found by the ExecutesQuery fact. This allows us to trace the constituent strings of each possible
query that could be executed here; we shall discuss exactly how this traceability is established using
these values in Section 4.1.1.
     The fact with the Executes predicate is added for every possible heap object representing a query that
is being executed here, allowing us to trace this query execution to its original definition.
     The fact with the ReturnsResult predicate is added for the values of the x variable that represents
the query results object, in this case an OleDbDataReader. This lets us trace the returned results of the
query to the places where they are used.
     The fact with the ExecutedAtLine predicate is added with the supplied source code location parameter.
Almost every SOI will include a similar location fact, so that we can find the place where it occurred in
the original source code.
      The where clause for the function contains the value V^u, which is used to represent the value of a
heap location, consisting of a value V with a uniqueness u. In this case we are only interested in the value V,
so the uniqueness, u, is a free variable that is discarded.
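The fact extraction function for OleDbCommand.ExecuteReader() can be sketched in Python as below. This is a minimal sketch under several assumptions that are not stated in this section: the flow state is modelled with plain dictionaries, lookup is assumed to return every string stored at the regular string heap locations in V, and V is assumed to be the union of the values of all heap objects that y may reference. The variable names and identifiers in the sample state (cmd, reader, l1, l2, lreg1) are hypothetical.

```python
def lookup(V, h_reg):
    """Assumed meaning of lookup: all strings stored at the locations in V."""
    return {s for l_reg in V for s in h_reg[l_reg]}


def facts_execute_reader(x, y, sigma, h, h_reg, O, source_loc):
    """Sketch of the Facts function for x = call OleDbCommand.ExecuteReader(y)."""
    # The where clause: collect the value V of each heap object y may
    # reference, discarding the uniqueness u.
    V = set()
    for l in sigma[y]:
        value, _u = h[l]
        V |= value

    facts = {("ExecutesQuery", p) for p in lookup(V, h_reg)}
    facts |= {("ExecutesQueryWithId", o) for l_reg in V for o in O[l_reg]}
    facts |= {("Executes", l) for l in sigma[y]}
    facts |= {("ReturnsResult", l) for l in sigma[x]}
    facts |= {("ExecutedAtLine", source_loc)}
    return facts


# Hypothetical flow state: cmd references heap object l1, whose value is
# the regular string location lreg1; reader references the result object l2.
sigma = {"cmd": {"l1"}, "reader": {"l2"}}
h = {"l1": ({"lreg1"}, "unique"), "l2": (set(), "unique")}
h_reg = {"lreg1": {"SELECT..."}}
O = {"lreg1": {"lreg1"}}

soi_facts = facts_execute_reader("reader", "cmd", sigma, h, h_reg, O, "Example.cs:05")
for fact in sorted(soi_facts):
    print(fact)
```

Each set comprehension corresponds to one line of the definition, so a new kind of fact would be added to the SOI by adding one more comprehension.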
      Evaluating this fact extracting function for our example, on Line 5 of Listing 3.7, we would obtain
the following:
QueryExecSOI1 = {
      (ExecutesQuery, "SELECT id, contact_name, company_name FROM Supplier WHERE id=@ID;"),
      (ExecutesQueryWithId, l1),
      (Executes, l1),
      (ReturnsResult, l2),
      (ExecutedAtLine, "Example.cs:05")
}

These tuples represent facts that provide information about the SOI. We can extract facts by evaluating
fact extraction functions, as we have shown for the OleDbCommand.ExecuteReader method, but we
will also need to extract the facts for other interesting statements. For example, consider the statement
for evaluating a string expression:

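\[
\begin{aligned}
&\mathit{Facts}_{x := (s,\, l_1^{reg})}(\sigma \times h \times h_{reg}, O, \mathit{sourceLoc}) = \\
&\qquad \{(\text{LdStr}, l_2^{reg}) \mid l_2^{reg} \in V\} \;+ \\
&\qquad \{(\text{LdStrAtLine}, \mathit{sourceLoc})\} \\
&\text{where } (V, h_2^{reg}) = E_{s,\, l_1^{reg}}(\sigma, h_{reg})
\end{aligned}
\]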

This function creates two facts. The first relates the SOI to the term using the LdStr predicate, and
uses the regular string heap location identifier of the newly added string as the term; it
simply represents that a constant string has been loaded at this location. The second relates the SOI
to the term using the LdStrAtLine predicate, and uses the source code location of the SOI as the term;
it simply represents where in the program this SOI occurred.
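The string-loading case can be sketched in the same way. In this minimal Python sketch the evaluation function E is passed in as a parameter, because its definition is not given in this section; toy_evaluate is a hypothetical stand-in that simply stores the string at the supplied location, and the identifiers lreg1 and Example.cs:03 are invented for illustration.

```python
def facts_load_string(s, l1_reg, sigma, h_reg, source_loc, evaluate):
    """Sketch of the Facts function for the statement x := (s, l1_reg)."""
    # The where clause: evaluate the string expression; only V is needed
    # here, so the updated regular string heap is ignored.
    V, _h2_reg = evaluate(s, l1_reg, sigma, h_reg)

    facts = {("LdStr", l2_reg) for l2_reg in V}
    facts |= {("LdStrAtLine", source_loc)}
    return facts


def toy_evaluate(s, l_reg, sigma, h_reg):
    """Hypothetical stand-in for E: store s at l_reg and return that location."""
    h2_reg = dict(h_reg)
    h2_reg[l_reg] = {s}
    return {l_reg}, h2_reg


load_facts = facts_load_string(
    "SELECT...", "lreg1", {}, {}, "Example.cs:03", toy_evaluate
)
for fact in sorted(load_facts):
    print(fact)
```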
      Just as we have defined fact extraction for calls to OleDbCommand.ExecuteReader() and for string
literal instructions, we have also defined fact extraction functions for other significant items in the C# language
and the .NET framework, as used by our prototype implementation, which we will describe in Chapter 6.
The fact extraction functions we have defined can be found in Appendix A. For any new API or data
access features we wish to analyse, we can add new fact extraction functions as required.

4.1.1     Relating Facts
Some of the predicates we have described are used to create facts for providing traceability to other
facts or other SOIs. In Section 3.3.1 we discussed the need for traceability in our analysis, which we can
achieve by establishing relationships between the facts of different SOIs. We shall illustrate how facts
