                                            Susan A. Mengel, Joseph Ulans
                                                Texas Tech University
                                                  Computer Science
                                                 306 EC, Box 43104
                                              Lubbock, TX 79409-3104
Abstract - It is difficult for teachers and graders to give an in-depth evaluation of student
programs to the point of checking every line of code due to the amount of time checking would
take. The difficulty worsens when a typical introductory programming course may have over 100
students. Solutions to this difficulty may involve only checking to see if the program executes
correctly (dynamic analysis), glancing over the program to see if appropriate documentation is
present (static analysis), and glancing over the code for any problems (static analysis).
     Automated solutions are difficult to construct, as can be seen by the fact that only a few
exist in homegrown versions (which typically are not available for general use and few examples
are given in the literature) and commercial solutions cost thousands of dollars. One commercial
solution, however, is Verilog LOGISCOPE, which offers a limited number of licenses free to
educators. LOGISCOPE is a static analysis checker capable of taking hundreds of individual
measurements of a program, such as lines of code, McCabe's cyclomatic complexity, and number of
operators. It also shows the control flow graph of a program, which is a depiction of the
statements, if structures, and looping structures in a program. LOGISCOPE enables the complexity
and quality of a program to be analyzed, yielding valuable feedback to both students and
educators. It allows visualization of the measurements taken through the control flow graphs and
Kiviat diagrams.
     The operation of LOGISCOPE is shown by using typical student programs taken from the
introductory computing course at Texas Tech University. Then the results of analyzing several
programs from the same class are given to show the diversity of results. Finally, how LOGISCOPE
can be used in education to help students improve their programming and help instructors
evaluate programs better is considered.

                                   INTRODUCTION

When students are not programming as well as they should, it may be difficult to get to the root
cause of the problem. Certainly looking at their code can help unless one is grading over 100
programs, in which case in-depth analysis becomes prohibitive for most instructors. In order to
get a more detailed look at the way students are programming, the instructor could perform an
automated static analysis of a set of programs and flag programs with anomalous or out-of-range
values. The instructor could also perform an automated dynamic analysis of the programs and
correlate the static analysis to programs which work properly and those which do not work well.
In this way, the instructor could find out what coding habits directly contribute to poor
program submissions.
     Static analysis, which is the focus of this paper, involves looking at the code on paper.
Dynamic analysis, which is not addressed in this paper, encompasses the checking of the program
during its execution for correct operation and ability to handle incorrect input.
     Several commercial tools perform static analysis, but they are very expensive. Fortunately,
Verilog has a program whereby universities may receive a few free licenses of LOGISCOPE, which
can analyze a large number of commercially available programming languages. LOGISCOPE allows a
set of user-specified metrics to be collected on a program and shows the metric values via a
Kiviat diagram or tabular format. It also shows the control flow diagram for each module in the
program.
     The use of LOGISCOPE in analyzing the C++ programs for a freshman-level programming course,
CS 1462 Fundamentals of Computer Science I, in the Computer Science Department at Texas Tech
University is shown in this paper. Three different programming assignments were analyzed and a
comparison in student performance for each group of programs is made. Example control flow
diagrams are given to show the diversity in how students performed on the programming
assignments. The results are quite varied and show the wide range of measurement values among
students.

                                   SIMILAR WORK

While there is a significant body of literature on software metrics, there is much less on
static metrics and even less still on using metrics to evaluate programs. Those that discuss the
evaluation of programs generally do so through the use of a custom designed software package
that is not generally available. Although the survey of literature conducted did not turn up any
discussion of the evaluation of student programs through the use of static metrics generated by
a generally available software package, there was literature that touched on aspects of that
topic as given below.
     Dr. Ronald Leach [6] discussed the use of software metrics to evaluate student programs
written in C++. The metrics were originally developed for determining the quality of the
program, but in this article they were used instead to detect possible plagiarism. There were
three metrics used in
the analysis. The first was the Halstead effort metric, which looks at the number of operators
and operands and their usage. It was used because changing the names of the variables in a
program has no effect on the metric value. The second was the McCabe cyclomatic complexity
value. This metric gives a measure of the number of logical predicates in a program. It was used
since changes in the format of the control structure do not result in changes in its value. The
third metric was a count of the number of instances of coupling in a program. It is invariant to
rearranging the order of files or moving functions to different files. By analyzing the metrics
of different programs submitted for the same assignment, he showed that incidents of possible
plagiarism could be readily identified.
     A more recent article [5] evaluated student programs by how well they met absolute, ideal
criteria and how much they differed from the performance of a model program. The article
discussed in detail metrics to measure correctness, efficiency, data coverage, complexity, and
style. Correctness was determined by measuring how well the output generated by the student
program matched the expected structure of that output. Efficiency was computed by measuring the
execution time and the total number of statements executed. Data coverage was calculated using
the test effectiveness ratio, which is the number of statements executed at least one time
divided by the total number of executable statements. McCabe's cyclomatic complexity value was
used to assess the complexity. Finally, style was determined as a weighted sum of module length,
identifier length, percentage of commented lines, percentage of indentation, percentage of blank
lines, characters per line, spaces per line, number of reserved words, and number of
     A technique for assessing style, which can be subjective, was articulated by Berry and
Meekings [2]. They developed a procedure whereby each program was given a score for: module
length, identifier length, percentage of lines containing comments, extent of indentation,
percentage of blank lines, the average length of a line of code, the number of embedded spaces
per line, the percentage of all user identifiers that are defined constants, the number of
reserved words, standard functions used, the number of "include" files, and the number of goto
statements. Each metric was assigned a weight indicating its contribution to the overall score.
Minimum and maximum values for each metric were determined, and values outside the
maximum-minimum range did not contribute to the overall style score. Between the maximum and
minimum values, two values representing the boundaries of the ideal range were also computed.
Any values within that ideal range received the maximum possible number of points. Any values
between the maximum and minimum, but outside the ideal range, received points based on how far
outside the ideal range they were. It has been suggested that the Berry-Meekings approach needs
additional study [3], but Hung et al. [4] arrive at the same methodology, concluding that the
technique is quite effective in discriminating between good and poor programmers.
     Flow graphs along with metrics can show the level of complexity of a program and can be
evaluated by measuring "knots" [7, 8]. Knots are those cases in which a line in the control flow
structure crosses another line. For example, a loop which is nested inside another loop and
jumps outside of the outer loop crosses the line of the outer loop, creating a knot, which is an
undesirable programming habit. A nested loop that remains within the outer loop does not create
a knot. Knots can usually be reduced by simply reordering the sequence of statements.

                                VERILOG LOGISCOPE

The version of Verilog LOGISCOPE used in this study works with Microsoft Visual Studio 97 to
analyze C and C++ programs. It integrates itself into the Visual Studio environment and
generates the metric values and control flow diagrams as a program is compiled in the Visual
Studio environment. Once the program is compiled, the user may bring up the Winviewer component
of LOGISCOPE to view the metric values and control flow diagrams graphically on the screen. The
control flow diagrams are shown for each module in the program, not the entire program. Further,
if a knot is present in a control flow diagram, LOGISCOPE shows how the program can be
restructured to eliminate the knot. Other diagrams are also available to view, such as the call
graph and code diagrams.
     The user may specify which metrics to view on the screen by modifying the appropriate
definition file (.ref extension), a text file that can be easily edited. The user may also
combine metric values into equations via the definition file and view the results in Winviewer.

                            ANALYSIS OF THE PROGRAMS

All of the programs submitted for grading were collected from the Fall 1997 CS 1462 course. CS
1462 is the first majors course and helps the student to learn sound programming skills through
the vehicle of C++ in Microsoft Visual Studio 97. The prerequisite for the course is programming
knowledge through looping in any programming language (although some students take the course
without programming proficiency through looping). In the course, the student goes to a weekly
lab reinforcing lecture material and is given four progressively difficult programming
assignments.
     Programs were randomly selected from the first three programming assignments. The fourth
programming assignment submissions were omitted since the assignment consisted of filling in
code to make a partially completed program work. The other three assignments had to be completed
by the students from scratch. The first assignment's objectives were to give the student
practice in working with a new programming environment (Visual Studio) and to work with a simple
program having assignment, I/O, and if-then-else statements. For this assignment, the students
had to determine the minimum, maximum, and average of a set of four numbers. The second
assignment's objectives were to use file I/O, looping, modularization, and the math library. The
students read in one neatly formatted polynomial and one poorly formatted polynomial. They
output both polynomials neatly after computing values for the polynomial equations. The third
assignment's objectives were to implement a multi-way branching construct, a count-controlled
loop, an enumeration type, and modularization. The students implemented an algebraic system with
pre-defined numeric values and operators using a finite state machine.
     The objectives of the initial study were to look at how the same students performed over
the course semester and to choose from five to ten students randomly for analysis to get an idea
of the range of metric values and of the structure of the control flow diagrams. This led to
interesting problems in the data collection with randomly choosing programs from assignment
three submissions and discovering missing or non-compiling submissions in the previous two
assignments. Seven students were chosen eventually that had compiling programs that could be
analyzed with LOGISCOPE (more have since been discovered, but these fulfilled the objectives).
     Table I shows the summarized metric values for the three programs submitted by the seven
students. The metrics collected include errors in the program, warnings generated by the Visual
Studio C++ compiler, the number of distinct operands (n2), the number of distinct operators
(n1), number of internal comments (LCOM), number of levels plus one in the flow graph (LEVL),
parameters passed by address (PARAadd), parameters passed by value (PARAval), comment frequency
(COMF), McCabe's cyclomatic complexity (VG), lines of code (STMT), number of call graph nodes
(GANode), and the number of knots (Knots). These metrics were chosen to get an idea of how
efficiently students were using the programming language (n1, n2, errors, warnings), how well
they were implementing concepts such as "call by address" and documentation (LCOM, PARAadd,
PARAval, COMF), how well they were modularizing the code (GANode), and how complex their
programs were (VG, STMT, Knots). The minimum, maximum, mean, median, and standard deviation
values are given for the metrics.
     In looking at programs one and two, on average, the performance on the second programs was
much better in terms of complexity. The mean values for the second programs were lower for the
number of levels, cyclomatic complexity, the number of distinct operators, and the number of
statements. The number of distinct operands was essentially the same. Initially, this may seem
contradictory. How can the challenge be greater and yet the complexity be reduced? The answer
appears to lie in the fact that the second programs also showed an increase in the use of
functions. This is reflected in an increase in passes by reference, passes by value, and the
number of call graph nodes. There is also an increase in comment frequency and the number of
lines of comments. While the trend in warnings is a potential source of concern, the metrics
seem to indicate that the code was probably better documented, made more use of functions, and
was less complex, all very positive trends. In general, the control flow paths were simpler.
     In looking at programs two and three, the students' performance on the third program was
much worse on complexity. The mean values for the third program were higher for the number of
levels, cyclomatic complexity, the number of distinct operators, and the number of statements.
The number of distinct operands was lower. Add to these results the dramatic increases in passes
by reference and cyclomatic complexity. What has apparently happened is that the programmers
have learned how to create functions, but are not disciplined enough to make them relatively
small and simple. This is indicated by the fact that the mean number of call graph nodes for
program 3 is essentially the same as that of program 2 while the number of operators is up
significantly. Another troubling area is that of documentation. The results for program 3 show a
reduction in the mean number and percentage of comment lines. This could well be due to the fact
that the programmers were overwhelmed with the complexity of the task and did not have time to
"go back and add comments."
     Example control flow graphs are shown for programs one through three in Figures 1 through
6. Flow graphs show structure in the program by showing if-statement bodies and looping bodies
as a level above the current level. For example, a program that is a sequence of statements
would be represented as a flat line. Programs with if and looping structures would have several
levels; i.e., the program starts out at the bottom and goes up a level when an if or looping
structure is encountered. The taller the flow graph, the more nested structures the program has.
Two flow graphs are shown for each program to demonstrate the extremes in complexity of the
student programs.
     The flow graph in Figure 1 is much less complicated than the flow graph in Figure 2. Figure
2's flow graph has a cyclomatic complexity of 11 versus Figure 1's 1.4. Similarly, the levels
are 5 and 1.4. The explanation is in the fact that Figure 2's program has no parameters passed
by either value or reference and has an average of 31 statements per function. Figure 1's
program, on the other hand, has parameters passed by both value and reference and the
                                                           TABLE I

               Errors Warnings       n2       n1     LCOM     LEVL  PARAadd  PARAval     COMF       VG     STMT   GANode    Knots
Program One
  min            0.00     0.00     7.20     7.20     0.00     1.40     0.00     0.00     0.00     1.40     4.60     8.00     0.00
  max            0.00     0.00    22.00    22.00     7.00     5.00     2.00     2.00     0.58    11.00    31.00    11.00     0.00
  mean           0.00     0.00    16.60    17.60     2.96     3.49     1.06     0.36     0.20     6.20    20.44    10.00     0.00
  median         0.00     0.00    19.00    19.00     3.20     4.00     1.00     0.00     0.16     7.00    22.00    10.00     0.00
  std dev        0.00     0.00     4.81     4.89     2.68     1.30     0.55     0.69     0.18     3.14     8.88     1.07     0.00
Program Two
  min            0.00     0.00    11.00    14.50     0.00     2.00     2.33     0.00     0.00     2.33     9.50    11.00     0.00
  max            0.00     3.00    24.00    29.00    14.33     5.00     5.00     2.00     0.96     5.00    20.00    19.00     0.00
  mean           0.00     1.29    15.29    17.83     3.76     2.86     3.12     0.62     0.35     3.12    13.05    14.71     0.00
  median         0.00     2.00    15.00    16.50     0.50     2.50     2.50     0.50     0.27     2.50    13.00    14.00     0.00
  std dev        0.00     1.16     4.07     4.67     5.07     0.91     0.93     0.72     0.28     0.93     3.31     2.25     0.00
Program Three
  min            0.00     0.00    15.40    12.00     0.00     3.00     1.00     0.00     0.02     5.00    14.00    10.00     0.00
  max            3.00     3.00    30.50    27.00     6.00     6.25    26.00     1.40     0.21    80.00   182.00    18.00     0.00
  mean           0.43     0.43    21.74    15.94     3.14     4.18    13.24     0.67     0.11    24.47    60.27    15.00     0.00
  median         0.00     0.00    19.70    13.50     3.00     4.00    10.40     0.75     0.09    19.40    44.60    16.00     0.00
  std dev        1.05     1.05     6.45     5.22     1.77     1.37     9.49     0.54     0.07    24.09    53.83     2.62     0.00
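
The five summary rows reported for each program in Table I (minimum, maximum, mean, median, and
standard deviation) could be produced from per-student metric values along the following lines.
This is a minimal Python sketch rather than the authors' procedure: the function name summarize
and the sample values are hypothetical, and since the paper does not say which form of the
standard deviation was used, the sketch assumes the sample form.

```python
import statistics


def summarize(values):
    """Return the five summary figures reported per metric in Table I."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std dev": statistics.stdev(values),
    }


if __name__ == "__main__":
    # Hypothetical cyclomatic complexity (VG) values for seven students.
    vg = [1.4, 3.0, 5.0, 7.0, 8.0, 8.0, 11.0]
    for name, value in summarize(vg).items():
        print(f"{name:8s}{value:8.2f}")
```

Swapping statistics.stdev for statistics.pstdev gives the population form if that is what a
given tool reports.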

average number of statements per function is 4.6.
     The flow graph in Figure 3 has a cyclomatic complexity
of 4 and 2.5 levels versus Figure 4's values of 2.33 and 2.
More than just greater complexity, Figure 3's flow graph
represents a much more cluttered and disorderly structure.
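
One way to put a number on how "cluttered" a flow graph is uses the knot count described under
SIMILAR WORK: two jumps form a knot when their spans in the program text cross without one being
nested in the other. The Python sketch below illustrates that idea only. It is not LOGISCOPE's
implementation, and the function name count_knots and the line numbers are hypothetical.

```python
from itertools import combinations


def count_knots(jumps):
    """Count pairs of jumps whose line spans cross without nesting.

    Each jump is a (from_line, to_line) pair of source line numbers.
    """
    spans = [(min(a, b), max(a, b)) for a, b in jumps]
    knots = 0
    for (lo1, hi1), (lo2, hi2) in combinations(spans, 2):
        # Partial overlap: one span starts inside the other and ends outside it.
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            knots += 1
    return knots


if __name__ == "__main__":
    # Back edges of an outer loop (lines 1-10) and an inner loop (lines 3-8).
    nested = [(10, 1), (8, 3)]
    # The same loops plus a jump from line 5 to line 12, past the outer loop.
    with_escape = nested + [(5, 12)]
    print(count_knots(nested))
    print(count_knots(with_escape))
```

Properly nested loops give no knots, while the escaping jump crosses both loop back edges.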
     The flow graph in Figure 6 has a cyclomatic complexity
of 5 while Figure 5’s has a cyclomatic complexity of 80.
Figure 5's program had no functions (other than main) and had
an average of 182 statements per function, 32 distinct
operands per function, and 27 distinct operators per function. On
the other hand, Figure 6's program, with functions, had an
average of 15 statements per function, 15.4 distinct
operands per function, and 13.4 distinct operators per function.
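
The cyclomatic complexity values quoted above follow McCabe's definition: for a control flow
graph with E edges, N nodes, and P connected components, V(G) = E - N + 2P, which for a single
structured function equals the number of decision points plus one. The paper does not describe
how LOGISCOPE computes the value, and the fractional figures appear to be per-function averages,
so the Python sketch below only illustrates the formula on two hand-built graphs. The function
name and node numbering are hypothetical.

```python
def cyclomatic_complexity(edges, num_nodes, components=1):
    """McCabe's V(G) = E - N + 2P for a control flow graph."""
    return len(edges) - num_nodes + 2 * components


# Straight-line code: entry -> statement -> exit (a "flat line" flow graph).
sequence = [(0, 1), (1, 2)]

# One if-else nested in one loop.
# Nodes: 0 entry, 1 loop test, 2 if test, 3 then, 4 else, 5 join, 6 exit.
loop_with_branch = [
    (0, 1), (1, 2), (1, 6),   # loop test: enter the body or leave the loop
    (2, 3), (2, 4),           # if test: then branch or else branch
    (3, 5), (4, 5), (5, 1),   # branches rejoin, then control returns to the loop test
]

if __name__ == "__main__":
    print(cyclomatic_complexity(sequence, 3))          # 1
    print(cyclomatic_complexity(loop_with_branch, 7))  # 3
```

The second graph has two decision points (the loop test and the if test), so the result of 3
matches the "decisions plus one" reading.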

Figure 1. Program One Flow Graph A.
Figure 2. Program One Flow Graph B.
Figure 3. Program Two Flow Graph A.
Figure 4. Program Two Flow Graph B.
Figure 5. Program Three Flow Graph A.
Figure 6. Program Three Flow Graph B.

     The results were surprising in that the measurements for programs of the same assignment
could be spread so far apart. Some students use very few operands and operators (probably not
completing the programming assignment) and others seem to use too many. Students need to see
these results so that they can start considering how to improve their own programming style.
Instructors need to have anomalous values flagged so students having trouble can be helped. Left
unattended, students will form bad habits and have difficulty later when working on larger
programs particularly if they have not mastered the concept of

                                   FUTURE WORK

Introducing LOGISCOPE to the Students

Gathering metrics and flow graphs on programs has been a historically sensitive subject in
industry and academia. Most individuals do not want their inefficiencies so glaringly presented
before supervisors or peers in tabular and graphical format. Most individuals do not want to be
characterized by a set of numbers and a graph. Another embarrassing fact about collecting
metrics and flow graphs is finding out that grading accuracy really has not been very good.
     In order to ease the way for using static analysis, it is planned in the future to make
LOGISCOPE available to the students with example programs so that they can see the difference
for themselves between poorly and well written programs. They will run LOGISCOPE on their own
programs. Later, as the students become used to the idea of getting to know their programming
style through static analysis, they will be graded on their ability to stay within an acceptable
range of metrics. Hopefully, the students will get used to the tool and view it as helpful
rather than as a punitive measure taken in grading.

The small set of data shown above illustrates the ability of LOGISCOPE to make a distinction
among student programs written within an acceptable range of limits and those showing little
mastery of the concept of modularization. To continue with the use of LOGISCOPE requires the
analysis of more of the student program data to be able to set an acceptable range of limits on
metrics for CS 1462. Instructors at other universities would have to go through the same
procedure. The process of analyzing the acceptable range would be continuous, as students would
over a period of years get better at programming through progressively better example programs
being given and better ways to do LOGISCOPE analysis being formulated.
     More metrics need to be collected to determine their usefulness in determining if a program
is written well. Too many metrics, however, can mask the influence of individual metrics on a
desired outcome [1], and students would become confused keeping track of so many metrics. Too
few metrics and students would optimize their programs for only those metrics. The metrics also
need to be correlated to a ranking of the programs in terms of an expert's opinion of how well
the programs were written.
     Two studies are currently being performed to correlate static analysis to well-written
programs and to code coverage on tests run during dynamic analysis. It is hoped that these
studies will show that static analysis is indicative of well-written student code and of how
well test cases will execute and cover all of the student's code. These studies will be
published in later papers.
      Later, it is hoped to use LOGISCOPE in programming
classes coming next in the course sequence after CS 1462.
In using LOGISCOPE throughout a programming sequence,
it is hoped that students will have increased competency in
programming so that more advanced topics may be
introduced, such as safety and survivability of software.
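
The plan described in this section, to set an acceptable range for each metric and flag programs
that fall outside it, could be prototyped as below. This is a minimal Python sketch under stated
assumptions: the limits, the sample metric values, and the names ACCEPTABLE and flag_out_of_range
are hypothetical and are not the ranges the authors intend to set for CS 1462.

```python
# Hypothetical acceptable ranges (low, high) for function-level metrics.
ACCEPTABLE = {
    "VG": (1, 10),       # cyclomatic complexity
    "STMT": (1, 50),     # statements per function
    "COMF": (0.2, 1.0),  # comment frequency
}


def flag_out_of_range(metrics, limits=ACCEPTABLE):
    """Return {metric: value} for every value outside its acceptable range."""
    flagged = {}
    for name, value in metrics.items():
        if name in limits:
            low, high = limits[name]
            if not low <= value <= high:
                flagged[name] = value
    return flagged


if __name__ == "__main__":
    # Illustrative submissions; metrics without a configured range are ignored.
    submissions = {
        "student_a": {"VG": 5, "STMT": 15, "COMF": 0.21},
        "student_b": {"VG": 80, "STMT": 182, "COMF": 0.02},
    }
    for student, metrics in submissions.items():
        print(student, flag_out_of_range(metrics))
```

Keeping the limits in a single table makes it straightforward to revise them as more student
data is analyzed.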

All terms known to be trademarks or registered trademarks
have been capitalized.
     The authors would like to thank the CS 1462 faculty,
Tom English, Susan Mengel, and Nancy Van Cleave, and
teaching assistants, Chandra Pallemoni, Seelam Reddy
Muralidhar, Kirk Watson, and Pramob Kumar Yenmanpra,
for their help and cooperation in gathering the programs.
     The authors would like to thank Verilog for the use of
LOGISCOPE at Texas Tech University.

[1] V.R. Basili and D. Weiss. "A Methodology For
    Collecting Valid Software Engineering Data." IEEE
    Transactions on Software Engineering, Volume SE-10,
    Issue 6, 1984, pp. 728-738.

[2] R.E. Berry and B.A.E. Meekings. "A Style Analysis
    of C Programs." Communications of the ACM, Volume
    28, Issue 1, January 1985, pp. 80-88.

[3] W. Harrison and C. Cook. "A Note on the Berry-
    Meekings Style Metric." Communications of the ACM,
    Volume 29, Issue 2, February 1986, pp. 123-125.

[4] S. Hung, L. Kwok, and R. Chan. "Automatic Program
    Assessment." Computers and Education, Volume 20,
    Issue 2, 1993, pp. 183-190.

[5] D. Jackson. "A Software System For Grading Student
    Computer Programs." Computers and Education,
    Volume 27, Issue 3/4, 1996, pp. 171-180.

[6] R.J. Leach. "Using Metrics to Evaluate Student
    Programs." SIGCSE Bulletin, Volume 27, Issue 2,
    June 1995, pp. 41-43.

[7] M.R. Woodward, M.A. Hennell, and D. Hedley. "A
    Measure Of Control Flow Complexity in Program
    Text." IEEE Transactions on Software Engineering,
    Volume SE-5, Issue 1, November 1979, pp. 45-50.

[8] M.R. Woodward, D. Hedley, and M.A. Hennell.
    "Experience With Path Analysis and Testing of
    Programs." IEEE Transactions on Software
    Engineering, Volume SE-6, Issue 3, May 1980, pp.
    278-286.
