Erlang refactoring with relational database by coronanlime


									   Erlang refactoring with
   relational database

         Anikó Víg and Tamás Nagy
    Zoltán Horváth and Simon Thompson
Project members: László Lövei, Tamás Kozsik

     Supported by ELTE IKKK, Ericsson Hungary, in cooperation with University of Kent
 Refactoring is restructuring program code
  without altering its external behaviour
 Aims: restructuring of code, code quality
  improvement, coding conventions,
  optimization, migrating to new API
 In general not available for functional
  languages, only: HaRe (AST) and Clean
  refactoring (relational database)
 Functional programming language and runtime
    environment developed by Ericsson
   Designed to build distributed, reliable, soft realtime
    concurrent systems (telecommunication)
   Highly dynamic nature
   Lightweight processes and message passing
   Messages can be sent to a process ID or registered
    name,which are bound to function code at runtime.
   Possibility of running dynamically created code (eval,
    hot code replacement).
 Function id-s are atoms, atoms may be constructed
    runtime (and passed as arguments to apply,spawn).
    The syntactical category of an atom may be a
    function or ”string”, etc.
   Side effects are restricted to message passing and
    built-in functions
   Modules with explicit interface definitions and static
    export and import lists
   No static type system
   Variables are assigned a value only once in their life
   Variables are not typed statically, they can have a
    value of any data type.
The refactor tool

                              ODBC (?)

   Emacs        Erlang node                MySQL

  Source code   AST of the               Source in the
   In Emacs      source                   database
Code , AST
       Storing the code in database
 Every node in a tree has a unique identifier.
 Every module has a unique module identifier.
 Almost every syntax-tree node type has an
  own database table
 The records in the tables contain the
  identifiers of the current nodes and their
  children’s ones.
 We store the positions, node types and
  names in separate tables.
Semantic informations
 We store the semantic informations in
  separate tables: identical variables, function
  definitions and their calling expressions,
  scope of the nodes, hierarhy of scopes
 AST + semantic informations = graph, not
   + Searching and using a graph is more efficient
     with relational database as traversing the tree.
   - The storing-recovering of the code and
     communication with the database is more
Our own tables: visibility of variables
Our own tables: function calls
Our own tables: visibility of scopes
Problems during the building up
 The Erlang prepocessor substitutes the
  macro definitions     using epp_dodger
  instead of epp
 Every node has only the line number of
  position informations      using erl_scan1
  (modified by Huiqing) instead of erl_scan: it
  can give back the column information too
Problems during the building up
 In Erlang language there are many types of
  comments: we have not only comments, but pre- and
  postcomments too, which can be list of comment
 The erl_comment_scan:file collects the comments
  from the file, and we have to put them too with the
  correct position information into the database.
 There is the same problem with the column
  infomation so we use erl_recomment1 instead of
The algorithms
 We use postorder traverse on the syntax tree
  to give identifiers and put the nodes into the
  correct table.
 We use preorder traverse on the syntax tree
  to get the visibility information.
 The other information come from the
  database (collected by separate processes),
  for example the information of the function
Analysis for Refactoring Steps
 Syntax analysis (AST)
 Static semantics (Annotated AST or relational
  database): scope, visibility, binding
  structure, type information.
 Side-condition analysis
 Compensations
 Dynamic function calls (apply, spawn, etc.)
 Syntactical, semantical and library
Rename variable
 Definition: Find every occurrence of the
  variable (i.e. the variables with the same
  name in the visibility range of the variable)
  and replace every occurrence with the new
 Precondition: The new variable name is not
  visible at any occurrence of the variable
 Limited to one module (no global variables)
Rename function
 Definition: The refactoring rely on finding the
  definition and every place of call for a given
  function and substitute it with a new name.
 Preconditions:
      No name clash in the current module (existing
       functions, import list)
      No name clash in other modules, if the
       function is exported
Reorder arguments
 Definition: Change the order of arguments in
  the same way at the definition and every
  place of call for a given function.
 Preconditions:
      No side effect of the parameters (just planned)
 Tricky implicit function calls   delete,
  create subtree (the same problem will be at
  the tuple arguments refactor step too)
Implicit function example
Tuple arguments
 Definition: Change the way of using some
  arguments at the definition and at every place of call
  for a given function by grouping some arguments into
  one tuple argument.
 Preconditions:
      The given position must be within a formal argument of
       a function definition
      The function must be a declared function, not a fun-
      The given number must not be too large
      No name clash if the arity is changing (not only in the
       current module if the function is exported)
Eliminate variable
 Definition: All instances of a variable are replaced
  with its bound value in that region where the variable
  is visible. The variable can be left out where its value
  is not used.
 Preconditions:
      It has exactly one binding occurrence on the left hand
       side of a pattern matching expression, and not a part of
       a compound pattern.
      The expression bound to the variable has no side
      Every variable of the expression is visible (that is, not
       shadowed) at every occurrence of the variable to be
Eliminate variable cont.
 Decide if an occurrence is needed (remove or
  replace). Remove if:
      Not at the end of block expression
      Not at the end of clause body
      Not at the end of the recieve expression action
      Not at the end of try expression body, after branch and
 Replicate subtree, because we need unique id-s in
  the new subtrees. The subtree can contain every
  node type, we had to implement the most of the
  syntax tool module for database representation.
Planned refactor steps (short term)
 Merge subexpression duplicates: All instances of
  the same subexpressions are stored in a variable that
  the user gives, then all instances of the original
  subexpression are changed to the variable.
 Extract function: An alternative of a function
  definition might contain a sequence of expressions
  which can be considered as a logical unit, hence a
  function definition can be created from it. The
  extracted function is lifted to the module level, and it
  is parameterised with the variables that the
  expressions depend on. The sequence of
  expressions is replaced with a function call
Planned refactor steps (middle term)
 Tuple to record
 Specialisation of functions
 Generalisation of functions
 Fusion of functions
 Modification of data structures
Future work
 Testing in big projects
 Compare the two approaches
 Making a common release version

To top