Source Code Exploration with Google


Denys Poshyvanyk, Maksym Petrenko, Andrian Marcus,
              Xinrong Xie, Dapeng Liu

             Wayne State University



        Presented by: Roli Shrivastava
                   HISTORY

  Global Regular Expression Print (grep, from the ed command g/re/p)
  File searches in existing Integrated Development
  Environments (IDEs)
  Both are based on regular expression matching

Limitations of GREP and IDEs
  Support only specific development or maintenance
  tasks
  Not in the mainstream of software development
  practice
  Case sensitive
  Limited interaction with potential users
                  MOTIVATION

To understand large and new parts of software
systems.
People search source code for:
–   Concept location in source code
–   Impact analysis
–   Change propagation
–   Debugging
–   Comprehension of software in general
Hence, fast and accurate tools and techniques are
needed to support these tasks.
         PROPOSAL OF PAPER

A new approach to source code exploration

Integration of Google Desktop Search (GDS) with IBM’s
Eclipse development environment

Known as Google Eclipse Search (GES)
         EXISTING APPROACH

Searching based on Information Retrieval (IR)
indexing techniques
IR allows queries to be formulated with multiple
words
More popular than regular expression matching
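To make the contrast with regular expression matching concrete, here is a minimal sketch of IR-style searching: an inverted index maps each term to the files that contain it, and a multi-word query is answered by scoring every file on all query terms (here with a simple tf-idf weight) rather than by matching one pattern. The tokenizer, the weighting formula, and the toy file contents are assumptions for illustration; the slides do not say how GDS builds or ranks its index.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case runs of letters and digits (underscores and punctuation split words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query):
    """Rank documents by the summed tf-idf weight of the query terms they contain."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # unknown term: adds nothing, but does not veto the whole query
        idf = math.log(1 + num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy file contents, invented for illustration.
    docs = {
        "ClassDiagramGraph.java": "class diagram graph edge node",
        "StateDiagramGraph.java": "state diagram graph transition edge",
        "ArrowHead.java": "arrow head draw path",
    }
    index = build_index(docs)
    for doc_id, score in search(index, len(docs), "edge class diagram"):
        print(f"{score:.2f}  {doc_id}")
```

Because every query term contributes to the score, a file that matches only some of the words still appears, just lower in the ranked list.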

Problems:
– Computational efficiency
– Online re-indexing of the software
                           GES
Allows developers to search software projects in a manner similar to
searching the Internet or their own desktops.

Searches within projects or working sets of files

Uses natural language queries

GES combines the advantages of GDS with Eclipse’s extensibility.

GES is based on an IR indexing technique

The idea has also been integrated with MS Visual Studio

Uses GDS to index and search source code files and project files

Is as efficient as GDS

Re-indexes as the search space changes
Problems with GDS as a standalone tool?
           LIMITATIONS OF GDS!!

  GDS search is not project-specific
  – It searches files across the entire system

  Needs a web browser

  Awkward !!
   – The user has to switch between the IDE and the browser


Solution is definitely GES
           GDS + ECLIPSE !!!
On-the-fly preprocessing and indexing of the
context
Continual indexing
 – maintains and updates current location changes
 – accurate results
Immediate response to queries
History of searches
Advanced search options
 – project-specific search
Sorting of the results by
 – relevance
 – date
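Continual indexing means that only what changed since the last pass has to be re-indexed. A minimal sketch, assuming each file is identified by a path plus a modification stamp (on a real file system this could be `os.stat(path).st_mtime`); the `IncrementalIndex` class and its methods are hypothetical and are not part of GES or the GDS API.

```python
import re
from collections import Counter


class IncrementalIndex:
    """Toy continually updated index (hypothetical design): a file is re-tokenized
    only when its modification stamp differs from the one seen last time."""

    def __init__(self):
        self.stamps = {}  # path -> modification stamp at last indexing
        self.terms = {}   # path -> Counter of lower-case terms in that file

    def refresh(self, snapshot):
        """snapshot maps path -> (mtime, text); returns the sorted paths re-indexed."""
        changed = []
        for path, (mtime, text) in snapshot.items():
            if self.stamps.get(path) != mtime:  # new or modified file
                self.terms[path] = Counter(re.findall(r"[a-z0-9]+", text.lower()))
                self.stamps[path] = mtime
                changed.append(path)
        for path in set(self.stamps) - set(snapshot):  # files deleted since last refresh
            del self.stamps[path]
            del self.terms[path]
        return sorted(changed)

    def files_containing(self, term):
        """Paths whose indexed text contains `term` (case-insensitive)."""
        return sorted(p for p, counts in self.terms.items() if counts[term.lower()] > 0)
```

For example, after an edit to a single file only that file is re-tokenized on the next `refresh`, and files that disappeared from the snapshot are dropped from the index.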
                  ADVANTAGES
Features specific to IR-based searching
 – multiple-term queries
 – natural language queries
 – Boolean operators
 – ranking of search results
Scalability and high reliability of a proven search engine
(i.e., GDS)
 – important for massive file repositories, such as
   large-scale software systems
Display of and access to the search results within Eclipse’s
IDE
 – through its native interfaces, which provide direct links
   between the search results and the actual source code
   in the editor
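Boolean operators can be evaluated directly on an inverted index by intersecting, uniting, and subtracting the sets of files that contain each term. The sketch below assumes a hypothetical `boolean_query` helper that takes AND, OR, and NOT term lists; it illustrates the set operations only and does not reproduce the GDS query syntax.

```python
import re


def build_postings(docs):
    """Inverted index without frequencies: term -> set of doc ids containing it."""
    postings = {}
    for doc_id, text in docs.items():
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            postings.setdefault(term, set()).add(doc_id)
    return postings


def boolean_query(postings, all_docs, must=(), should=(), must_not=()):
    """AND over `must`, OR over `should` (if given), NOT over `must_not`."""
    result = set(all_docs)
    for term in must:  # AND: intersect with each term's postings set
        result &= postings.get(term.lower(), set())
    if should:  # OR: keep docs that contain at least one of these terms
        result &= set().union(*(postings.get(t.lower(), set()) for t in should))
    for term in must_not:  # NOT: drop docs that contain the term
        result -= postings.get(term.lower(), set())
    return sorted(result)


if __name__ == "__main__":
    docs = {  # toy contents, invented for illustration
        "ClassDiagramGraph": "class diagram edge",
        "SequenceDiagramGraph": "sequence diagram edge call",
        "NoteNode": "note node text",
    }
    postings = build_postings(docs)
    # diagram AND edge AND NOT sequence
    print(boolean_query(postings, docs, must=["diagram", "edge"], must_not=["sequence"]))
```

Ranking (as in a tf-idf scorer) and Boolean filtering are independent: a tool can first restrict the candidate set with the Boolean operators and then rank what remains.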
       SYSTEM REQUIREMENTS

To run GES, you will need:

Eclipse SDK 3.2 or higher;

Google Desktop Search (GDS) 2.0 or higher;

Java Runtime Environment (JRE) 1.5 or higher.
GES DESIGN & IMPLEMENTATION

GES is similar to File Search in Eclipse.
Type a query into the GES dialog box.
Specify the scope of the search:
 – workspace
 – selected resources
 – enclosing projects
 – working sets
After the query runs, the results are displayed in the
GES Search Results tab.
Results can be explored by browsing in the editor.
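Limiting results to a scope can be seen as a filter over the ranked list of matching files. A minimal sketch, assuming each scope (enclosing project, working set, or selected resources) is represented as a list of workspace-relative root paths and that an empty list stands for the whole workspace; this representation and the helper names are hypothetical, not how Eclipse models search scopes internally.

```python
from pathlib import PurePosixPath


def in_scope(path, scope_roots):
    """True if `path` is one of the scope roots or lies underneath one.

    An empty root list is taken to mean the whole workspace.
    """
    if not scope_roots:
        return True
    p = PurePosixPath(path)
    for root in map(PurePosixPath, scope_roots):
        # Comparing path components (not raw string prefixes) keeps
        # "violet/srcgen/..." out of a "violet/src" scope.
        if p == root or root in p.parents:
            return True
    return False


def filter_results(ranked_paths, scope_roots):
    """Keep the ranking order, dropping results that fall outside the scope."""
    return [path for path in ranked_paths if in_scope(path, scope_roots)]


if __name__ == "__main__":
    ranked = [  # hypothetical workspace-relative result paths
        "violet/src/ClassDiagramGraph.java",
        "aoi/src/Scene.java",
        "violet/srcgen/Parser.java",
    ]
    scopes = {
        "workspace": [],
        "enclosing project": ["violet"],
        "selected resources": ["violet/src"],
        "working set": ["aoi/src/Scene.java"],
    }
    for name, roots in scopes.items():
        print(name, filter_results(ranked, roots))
```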
GES SCREENSHOT
[Screenshot not reproduced]
             PILOT CASE STUDY
  Performed on Violet
  (http://www.horstmann.com/violet/)
  Violet is a cross-platform UML editor written in
  Java
  Has 65 classes, 448 methods, and 9,000 LOC

Approach:
  Task: a request for a new feature
  GOAL: “introduce a user-defined arrow type for the
  class diagram”.
            QUERIES FOR PCS-I

  Q2: “arrow class diagram”

OOPS… It did not return any matches

  Q3: “edge class diagrams”

Worked
                    RESULTS
11 files were returned as search results:
–   UseCaseDiagramGraph
–   StateDiagramGraph
–   SequenceDiagramGraph
–   StateTransitionEdge
–   ObjectDiagramGraph
–   NoteNode
–   ObjectNode
–   FieldNode
–   ImplicitParameterNode
–   ClassDiagramGraph
–   CallNode
           ANALYSIS OF RESULTS

 ClassDiagramGraph was the relevant result.

To verify this finding:

 The ‘draw’ and ‘getPath’ methods in ‘ArrowHead’ were
 modified.
 Related methods in the ArrowHeadEditor file were also
 modified successfully.
            GES vs. FILE SEARCH

Problem: a “concept location task” in Violet

Goal: “to locate the place in the source code which
  specifies the width of the class diagrams”

File: “value saved in DEFAULT_WIDTH variable”
                 GES BEHAVIOR

  Q1: “default width”

“Bingo” in the first step itself…!!
         FILE SEARCH BEHAVIOR

Q1: “default width”
            “OOPS!!! No results”

Q2: “default”
               “Yes…. Hmm, closer”

Q3: “width”
               “Yes… much closer”
CAN FILE SEARCH BE MADE BETTER??

 In this particular case…
 “Default *Width” would have worked fine.
 It would have given the same result as GES on the first attempt.
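The File Search behavior described above can be reproduced on a toy example. A minimal sketch, assuming a case-insensitive text search over single lines: a literal search for the phrase finds nothing because the identifier joins the two words with an underscore, each word on its own returns too much, and a wildcard pattern finds the declaration. The wildcard is written here as `default*width` (without a space) so that `*` can span the underscore in `DEFAULT_WIDTH`; the helper functions and source lines are hypothetical, and the matching rules only approximate Eclipse's File Search.

```python
import re

# Hypothetical source lines; the constant's value is invented for illustration.
LINES = [
    "private static final int DEFAULT_WIDTH = 80;",
    "int w = getWidth();",
    "// default constructor",
]


def literal_search(lines, text, case_sensitive=False):
    """Plain substring search, like a text search with regex and wildcards off."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [line for line in lines if re.search(re.escape(text), line, flags)]


def wildcard_search(lines, pattern, case_sensitive=False):
    """'*' matches any run of characters, '?' any single character; all else is literal."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    flags = 0 if case_sensitive else re.IGNORECASE
    return [line for line in lines if re.search(regex, line, flags)]


if __name__ == "__main__":
    print(literal_search(LINES, "default width"))   # → []
    print(literal_search(LINES, "default"))         # declaration and the comment
    print(literal_search(LINES, "width"))           # declaration and getWidth()
    print(wildcard_search(LINES, "default*width"))  # only the declaration
```

The wildcard only works because the user already knows that the two words appear joined and in this order, which is exactly the drawback listed next.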

Drawback:
 To construct such expressions,
  – the programmer needs additional information
    about the identifiers
  – it is impractical to construct such complex
    expressions all the time (and this one was
    relatively simple)
  – What happens when the expression has to be more
    complex?!
  FILE SEARCH vs. GES RESULTS

File Search queries had to be refined to get to the result
– narrowing down the results by searching within the
  results of the previous query
GES gave the result with the first query.
GES is faster than File Search.
With GES, fewer LOC have to be investigated.
GES returns a ranked list of results.
Developers learn relevant information faster than with
File Search.
           STILL NOT SURE !!

The authors say: “This study has a proof-of-concept
role, we do not generalize these conclusions”.

More detailed case studies are needed to extend the
results.
        OTHER CASE STUDIES
Bigger projects than Violet were needed
Queries were run on a Pentium 4 (2.8 GHz, 1 GB of RAM) with
 – the GES plug-in
 – File Search in Eclipse 3
Art of Illusion: a 3D modeling studio
 – written in Java
 – has 442 classes, 20 interfaces, and 100,838 LOC

Eclipse version 3.1 with complete sources
 – 20,000 files
 – 2 million LOC
            METHODOLOGY

10 queries were run on each system

The average response time was measured for GES and
File Search
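Measuring an average response time only requires timing each query and taking the mean. A minimal sketch, assuming the search engine can be called as a Python function; `average_response_time` and the toy linear-scan search are hypothetical stand-ins, not the harness used in the study.

```python
import statistics
import time


def average_response_time(search_fn, queries, repeats=1):
    """Return the mean wall-clock time (in seconds) that search_fn takes per query."""
    timings = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            search_fn(query)
            timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Hypothetical toy corpus and a linear-scan stand-in for a text search.
    lines = [f"int field{i} = {i};" for i in range(100_000)]

    def scan(query):
        return [line for line in lines if query in line]

    queries = ["field1", "field42", "= 7;"]  # the study ran 10 queries per system
    print(f"average: {average_response_time(scan, queries) * 1000:.2f} ms")
```

Running the same query list against each tool and each system gives directly comparable averages; `repeats` can be raised to smooth out noise from caching and background activity.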
COMPARING THE RESULTS
         DERIVED RESULTS !!!

GES is more effective in terms of response time

GES scales up very well with the size of the search
space
                   LIMITATIONS
GES relies on GDS

GDS indexes files in the background

only when the user’s computer is idle

The user has to wait for a file to be (re)indexed.

None of the GDS APIs handles this issue.
        Q: Is this really an issue??

A: As this is a one-time step, it only affects the
     first search on a software system
               CONCLUSION
Integrating GDS into Eclipse
 – improves source code searching
 – produces an approach that is easier to adopt

GES allows searches over
– all the source code
– the associated documentation

It is faster than File Search

Queries do not take into account the format of the
identifiers in the source code
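Whether a natural language query such as “default width” reaches an identifier like DEFAULT_WIDTH depends on how identifiers are broken into words. One common technique is identifier splitting on underscores and camelCase boundaries before indexing. The sketch below is an assumption for illustration; the slides do not state whether GDS or GES splits identifiers this way, and the helper names are hypothetical.

```python
import re


def split_identifier(identifier):
    """Split an identifier such as DEFAULT_WIDTH or getPath into lower-case words."""
    words = []
    for part in re.split(r"[\W_\d]+", identifier):  # underscores, digits, punctuation
        # camelCase/PascalCase boundaries; an all-caps run such as "HTML" stays one word
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [w.lower() for w in words]


def query_matches(query, identifier):
    """True if every query word is one of the identifier's words."""
    words = set(split_identifier(identifier))
    return all(term in words for term in query.lower().split())


if __name__ == "__main__":
    for name in ["DEFAULT_WIDTH", "getPath", "ArrowHeadEditor", "HTMLParser"]:
        print(name, split_identifier(name))
    print(query_matches("default width", "DEFAULT_WIDTH"))
```

With this kind of splitting, the user can type ordinary words and still hit identifiers written in any of the usual naming conventions, without building a pattern such as a wildcard expression.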
               RELATED WORKS
JIRiSS (Information Retrieval based Software Search for Java):
an Eclipse plug-in for source code exploration

http://mercury.cs.wayne.edu/~vip/publications/Poshyvanyk.ICPC.2006.JIRiSS.pdf

JIRiSS includes other advanced features
 – automatically generated software vocabulary
 – advanced query formulation options, including
   spell-checking as well as fragment-based search

Information Retrieval, a book by C. J. van Rijsbergen

http://www.dcs.gla.ac.uk/Keith/Preface.html
DISCUSSIONS ‘n’ QUESTIONS??

				