Source Code Exploration with Google
Denys Poshyvanyk, Maksym Petrenko, Andrian Marcus,
Xinrong Xie, Dapeng Liu
Wayne State University
Presented by: Roli Shrivastava
Global Regular Expression Print (G/RE/P)
Existing Integrated Development Environments
(IDE) File Searches
Both are based on Regular Expression Matching
Limitations of GREP and IDEs
Supports only specific development or maintenance
Not in the mainstream of the software development
Limited interaction with potential users
To understand large and new parts of the software,
people search code for:
– Concept location in source code
– Impact analysis
– Change propagation
– Comprehension of software in general
Hence, to support them, we need fast and
accurate tools and techniques.
PROPOSAL OF PAPER
New approach to Source Code Exploration
Integration of Google Desktop Search (GDS) + IBM’s
Eclipse Development Environment.
Known as Google Eclipse Search (GES)
Searching based on Information Retrieval (IR)
IR allows formulation of queries with multiple terms
More popular than regular expression matching
Online re-indexing of the software
Allows developers to search software projects in a manner similar to
searching the internet or their own desktops.
Searching Within Projects / working set of files
Uses Natural Language Queries
GES has the advantages of GDS + Eclipse’s extensibility.
GES is based on an IR indexing technique
The idea is also integrated with MS Visual Studio
Uses GDS to index and search source code files and project files
Is as efficient as GDS
Re-indexing as the search space changes
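One way to picture re-indexing as the search space changes is to compare a stored content digest per file with the current contents and re-index only what differs. This is a minimal sketch under that assumption; the slides do not describe how GDS or GES actually detect changes, and the function names and file paths below are hypothetical.

```python
import hashlib


def digest(text):
    """Return a stable fingerprint of a file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(indexed, current):
    """Decide which files need work after the search space changed.

    indexed: dict mapping path -> digest recorded at the last indexing
    current: dict mapping path -> file text as it is now
    Returns (to_index, to_remove), both sorted lists of paths.
    """
    # New files and files whose contents changed must be (re)indexed.
    to_index = sorted(
        path for path, text in current.items()
        if indexed.get(path) != digest(text)
    )
    # Files that disappeared must be dropped from the index.
    to_remove = sorted(path for path in indexed if path not in current)
    return to_index, to_remove


# Hypothetical project state, for illustration only.
old_index = {
    "src/ClassNode.java": digest("int DEFAULT_WIDTH = 100;"),
    "src/Unused.java": digest("class Unused {}"),
}
now = {
    "src/ClassNode.java": "int DEFAULT_WIDTH = 120;",   # edited
    "src/ArrowHead.java": "class ArrowHead {}",         # newly added
}
to_index, to_remove = plan_reindex(old_index, now)
```

Unchanged files are skipped entirely, which is why the cost is paid mostly on the first indexing pass.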
Problems with GDS as a standalone tool:
LIMITATIONS OF GDS!!
GDS is not a project-specific search
– Searches files in the entire system
Needs an internet browser
– User has to switch between IDE and the browser
Solution is definitely GES
GDS + ECLIPSE !!!
On-the-Fly preprocessing and indexing of the
– maintains and updates current location changes
– Accurate results
Immediate response for queries
History of searches
Advanced Search Options
– Project specific search
Sorting of the results
features specific to IR-based searching
– multiple term queries
– natural language queries
– Boolean operators
– ranking of search results
scalability & high reliability of the proven search engine
– important for massive file repositories, such as large-scale software systems
display of and access to the search results within Eclipse
– its native interfaces provide direct links between the search results and the actual source code in the editor
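The IR-specific features listed above (multiple-term queries and ranked results) can be sketched with a tiny inverted index. This is an illustrative toy, not the GDS ranking algorithm, which the slides do not describe. The TF-IDF-style weighting, the identifier-splitting tokenizer, and the file contents are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase terms, breaking identifiers such as
    DEFAULT_WIDTH or getPath into their component words."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        # Runs of capitals stay together; camelCase is split at humps.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(part.lower() for part in parts)
    return terms


def build_index(files):
    """Map each term to a Counter of {file name: occurrences}."""
    postings = defaultdict(Counter)
    for name, text in files.items():
        for term in tokenize(text):
            postings[term][name] += 1
    return postings


def search(postings, n_docs, query):
    """Return file names ranked by a simple TF-IDF score (assumed weighting)."""
    scores = Counter()
    for term in tokenize(query):
        docs = postings.get(term)
        if not docs:
            continue
        idf = math.log(1 + n_docs / len(docs))
        for name, tf in docs.items():
            scores[name] += tf * idf
    return [name for name, _ in scores.most_common()]


# Hypothetical file contents, for illustration only.
FILES = {
    "ClassNode.java": "private static int DEFAULT_WIDTH = 100; "
                      "private static int DEFAULT_HEIGHT = 60;",
    "NoteNode.java": "int width = 40; // default colour is yellow",
    "ArrowHead.java": "public void draw() {} "
                      "public GeneralPath getPath() { return path; }",
}
index = build_index(FILES)
results = search(index, len(FILES), "default width")
```

Files that mention the query terms more often rank higher, and files that match no term are left out of the result list.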
To run GES, you will need:
Eclipse SDK 3.2 or higher;
Google Desktop Search (GDS) 2.0 or higher;
Java Run-Time Environment (JRE) 1.5 or higher.
GES DESIGN & IMPLEMENTATION
GES is similar to File Search in Eclipse.
Type a query into the GES dialog box.
Specify the Scope of the search
– selected resources
– enclosing projects
– working sets
After the query runs, the results are displayed in the GES
Search Results tab.
Results can be explored by browsing in the editor.
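Restricting a system-wide result list to the chosen scope can be sketched as a path filter that keeps the original ranking. This is only an assumed illustration of the idea; the slides do not say how GES applies the scope, and the paths and function names are hypothetical.

```python
from pathlib import PurePosixPath


def in_scope(path, scope_roots):
    """True if path is one of the scope roots or lies beneath one."""
    p = PurePosixPath(path)
    return any(
        PurePosixPath(root) == p or PurePosixPath(root) in p.parents
        for root in scope_roots
    )


def filter_results(ranked_paths, scope_roots):
    """Keep only in-scope results, preserving the ranked order."""
    return [path for path in ranked_paths if in_scope(path, scope_roots)]


# Hypothetical ranked results from a system-wide search.
ranked = [
    "/home/dev/notes/width.txt",
    "/home/dev/workspace/violet/src/ClassNode.java",
    "/home/dev/workspace/violet-old/src/ClassNode.java",
    "/home/dev/workspace/violet/src/NoteNode.java",
]
scoped = filter_results(ranked, ["/home/dev/workspace/violet"])
```

Comparing path components rather than string prefixes keeps a sibling directory such as `violet-old` out of the `violet` scope.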
GES SCREEN SHOT
PILOT CASE STUDY
Performed on Violet
Violet is a cross-platform UML editor written in Java
Has 65 classes + 448 methods + 9000 LOC
To request for a new feature
GOAL: “introduce a user-defined arrow type for the
QUERIES FOR PCS-I
Q2 : “arrow class diagram”
OOPS… Did not return any matches
Q3: “edge class diagrams”
11 files as search results
ANALYSIS OF RESULTS
ClassDiagramGraph had the relevant result.
To verify this finding:
‘draw’ and ‘getPath’ methods in ‘ArrowHead’ are
Related methods in ArrowHeadEditor file are also
GES vs. FILE SEARCH
Problem : “concept location task” in Violet
Goal : “to locate the place in the source code which
specifies the width of the class diagrams”
File : “value saved in DEFAULT_WIDTH variable”
Q1: “default width”
“Bingo” in the first step itself…!!
FILE SEARCH BEHAVIOR
Q1: “default width”
“OOPSS !!! No results”
“yes …. Hmmm closer”
“yes… Much Closer”
FILE SEARCH can be made BETTER??
In this particular case …
“Default *Width” would have worked fine.
Gave same result as GES in the 1st attempt
To construct such expressions,
– programmer should have additional information
– Unusable to construct such complex
expressions all the time (this was a relatively
– What will happen if the expression was more
complex ?? !!!
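The difference between the two search styles on this task can be sketched as follows. The source line is hypothetical, the wildcard is written as the Python regular expression `default.*width` (an assumed equivalent of the expression on the slide), and the term matcher is a simplification of what an IR engine does.

```python
import re

# Hypothetical line from the file that holds the constant.
LINE = "private static int DEFAULT_WIDTH = 100;"


def literal_match(query, text):
    """Plain substring search, ignoring case."""
    return query.lower() in text.lower()


def regex_match(pattern, text):
    """Regular expression search, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None


def term_match(query, text):
    """IR-style match: every query word must occur as a term in the text."""
    terms = {t.lower() for t in re.findall(r"[A-Za-z]+", text)}
    return all(word.lower() in terms for word in query.split())


literal_hit = literal_match("default width", LINE)    # False: '_' is not ' '
regex_hit = regex_match(r"default.*width", LINE)      # True, but needs a crafted pattern
term_hit = term_match("default width", LINE)          # True with the plain query
```

The plain two-word query succeeds only when the text is treated as a bag of terms; the regular expression works too, but the programmer has to guess that something may sit between the two words.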
FILE SEARCH vs. GES RESULTS
File Search had to be modified to get to the result
– Narrow down the result by performing the search within
GES gave results in the first query itself.
GES is faster than File Search.
GES investigates fewer LOC.
GES returns the ranked list of results.
Developers learn relevant information faster than
STILL NOT SURE !!
Authors say “This study has a proof-of-concept
role, we do not generalize these conclusions”.
Need more detailed case study to extend the
OTHER CASE STUDIES
Needed a bigger project than “violet”
Queries were run on
– Pentium 4 2.8 GHz with 1 GB of RAM
– GES plug-in
– File Search in Eclipse 3
Art of Illusion: 3D modeling studio
– Written in Java
– Has 442 classes, 20 interfaces, 100,838 LOC
Eclipse Version 3.1 + complete sources
– 20,000 files
– 2 million LOC
10 queries were run on each system
Average response time needed for GES and File
COMPARING THE RESULTS
DERIVED RESULTS !!!
GES is more effective in terms of response time
GES scales up very well with the size of the search
GES uses GDS
GDS’s background indexing
Only when user’s computer is idle
User has to wait for the (re)-indexing of the file.
None of the GDS APIs handles this issue.
Q: Is this really an issue??
A: As this is a one-time step, it only affects the
first search on a software system
Integrating GDS into Eclipse
– Improves source code searching
– Produces an easier-to-adopt approach
GES allows searches in
– all the source code
– Associated documentation
Faster than the file search
Queries do not take into account the format of the
identifiers in the source code
JIRiSS – an Eclipse plug-in for Source Code Exploration
(Information Retrieval based Software Search for Java)
JIRiSS includes other advanced features
– automatically generated software vocabulary
– advanced query formulation options
– including spell-checking as well as fragment-based search.
Information Retrieval – A book by C. J. van RIJSBERGEN
DISCUSSIONS ‘n’ QUESTIONS??