COBOL-Tool
12/19/2011Confidential Page 1 12/19/2011
KCS COBOL Tool: An Overview and Current Status
Introduction
The KCS COBOL Tool (sometimes called the COBOL Gopher) is based on the Knowledge-
Centric Software (KCS) technology – a framework for developing tools for software evolution
and maintenance. An overview of the KCS technology and its application for developing COBOL
tools is given in [1]. This document gives an overview of the internals of the KCS
COBOL Tool and its current status, and should help readers understand the evolution
path for the tool. The COBOL tool has three major categories of capabilities: analysis (e.g.
data modeling), knowledge extraction (e.g. extracting business rules), and restructuring (e.g.
eliminating GO TO statements and dead code). This report focuses only on the analysis capabilities.
Internal Structure
Figure 1 shows the organization of the key components of the tool. The eXtensible Common
Intermediate Language (XCIL) format in KCS technology enables creation of an integrated set of
tools for different programming languages. Currently, C, C++, COBOL, FORTRAN, and Java
are supported. Different types of graph objects are produced depending on the type of analysis.
The graph objects currently available are the Paragraph Flow Diagram (PFD) for viewing
intra-program structure, the Component View Diagram (CVD) for viewing inter-program
interactions, and the Variable Trace Diagram (VTD) for a pictorial view of the trace. The Database
Generator can support any SQL-based database. The parser and some aspects of the analyzer (e.g.
the analysis of different types of MOVE, RENAME, and REDEFINE) are language specific;
the others are standard components used across KCS tools for different languages.
[Figure 1 components: Source Code, COBOL Parser, XCIL Format, Loader, Analyzer, Graph Object,
Database Generator, Visualizer, Report Generator]
Figure 1: Internal Structure of the COBOL Tool
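To make the idea of a Paragraph Flow Diagram more concrete, the following Python sketch collects
paragraph-to-paragraph edges from GO TO and PERFORM statements. It is only an illustration under
stated assumptions, not the KCS tool's algorithm: the function paragraph_flow_edges, the regular
expressions, and the NOT_PARAGRAPHS exclusion list are hypothetical, the input is assumed to have
sequence numbers already stripped, and forms such as PERFORM UNTIL or PERFORM VARYING are not handled.

```python
import re

# Hypothetical illustration only; this is not how the KCS tool builds a PFD.
PARA_RE = re.compile(r"^([A-Z0-9][A-Z0-9-]*)\.\s*$", re.IGNORECASE)
GOTO_RE = re.compile(r"\bGO\s+TO\s+([A-Z0-9-]+)", re.IGNORECASE)
PERFORM_RE = re.compile(r"\bPERFORM\s+([A-Z0-9-]+)", re.IGNORECASE)

# One-word statements that look like paragraph labels (incomplete list).
NOT_PARAGRAPHS = {"END-EXEC", "END-IF", "END-PERFORM", "END-EVALUATE", "EXIT"}


def paragraph_flow_edges(lines):
    """Return a sorted list of (from_paragraph, kind, to_paragraph) edges."""
    edges = set()
    current = None
    for raw in lines:
        text = raw.strip()
        label = PARA_RE.match(text)
        if label and label.group(1).upper() not in NOT_PARAGRAPHS:
            current = label.group(1).upper()
            continue
        if current is None:
            continue  # statements before the first paragraph are ignored
        for m in GOTO_RE.finditer(text):
            edges.add((current, "GO TO", m.group(1).upper()))
        for m in PERFORM_RE.finditer(text):
            edges.add((current, "PERFORM", m.group(1).upper()))
    return sorted(edges)


# A shortened paraphrase of the excerpt given at the end of this document.
sample = [
    "A000-MAINLINE.",
    "    IF CB-DATA-FLAG (001) = SPACE",
    "        GO TO R001-END.",
    "    PERFORM P005-READ-RECORD THRU P005-EXIT.",
    "R001-END.",
    "    IF CB-DATA-FLAG (002) = SPACE",
    "        GO TO R002-END.",
    "    PERFORM P005-READ-RECORD THRU P005-EXIT.",
]
edges = paragraph_flow_edges(sample)
```

Each edge names the paragraph containing the statement, the kind of transfer, and the target
paragraph; a visualizer could draw these as arrows between paragraph nodes.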
Current Status
This section briefly reports the current status of the components and their integration, and
indicates how the COBOL tool will evolve to the next stage. The evolution path will change if
new pressing needs are identified.
A. Parser
Currently we are using our own parser for COBOL. We plan to change to a commercial-grade
parser and build an XCIL converter for it. We use Edison Design Group (EDG) front-ends for
other languages. COBOL is not supported by EDG. We are currently examining the different
alternatives for parsers; an open COBOL parser appears to be one viable alternative at this point.
The parsing capabilities of the current parser are summarized in Table 1. Based on the hand
tracing and testing we have done, we believe that the current parsing capabilities are adequate for
performing the necessary analysis for data modeling.
Supported syntax:
    IFCOND, ELSECOND, ENDIF, GOTO, PERFORM, PERFORMTHRU, STARTPROC, CALL
    EXEC CICS, EXEC CICS (READ | WRITE | REWRITE | SEND | RECEIVE | XCTL | LINK), ENDEXEC
    READ, WRITE, REWRITE, FILE, DATASET, MAP, MAPSET, INTO, FROM
    COMMAREA, PROGRAM, PROGRAM-ID
    MOVE, SMOVE, NMOVE, COPY
    EVALUATE, WHEN
    RENAME, REDEFINE, COPY REPLACE
Syntax not supported:
    STRING, UNSTRING, DELIMITED, SET, COMPUTE, MOVE LENGTH, ERR, EXEC CICS UNLOCK
Table 1: A Summary of Current Capabilities of the Parser
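One practical use of Table 1 is to check a program for constructs the current parser does not
handle before running an analysis. The Python sketch below is a minimal illustration of such a
check, assuming plain source lines as input; the function find_unsupported and its matching rules
are hypothetical and are not part of the KCS tool. It does not skip comments or string literals,
so it can over-report.

```python
import re

# Items taken from the "Syntax not supported" entry of Table 1.
UNSUPPORTED = [
    "EXEC CICS UNLOCK", "MOVE LENGTH", "UNSTRING", "STRING",
    "DELIMITED", "COMPUTE", "SET", "ERR",
]


def _pattern(item):
    # COBOL names may contain hyphens, so a plain word boundary is not enough:
    # SET must not match inside MAPSET or DATASET-NM. Words of a multi-word
    # item may be separated by any amount of whitespace.
    words = r"\s+".join(re.escape(word) for word in item.split())
    return re.compile(r"(?<![A-Z0-9-])" + words + r"(?![A-Z0-9-])", re.IGNORECASE)


PATTERNS = [(item, _pattern(item)) for item in UNSUPPORTED]


def find_unsupported(lines):
    """Return (line_number, item) pairs for unsupported syntax, 1-based."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for item, pattern in PATTERNS:
            if pattern.search(line):
                hits.append((number, item))
    return hits


# Four statements from the excerpt at the end of this document.
sample = [
    "MOVE 'LNFILE' TO DATASET-NM.",
    "String '01ML' KEY-LPO KEY-LOAN ' 01'",
    "delimited size into NMI-RCD-KEY.",
    "MOVE KEY-LPO TO PA-LPO-NBR.",
]
hits = find_unsupported(sample)  # [(2, "STRING"), (3, "DELIMITED")]
```

The lookbehind and lookahead are the main design choice here: they keep DATASET-NM from being
reported as a SET statement.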
B. Analyzer
All the COBOL-specific algorithms have been developed, but not all of them are implemented. A
summary of the pending implementations is provided in Table 2.
Loop analysis:
    In transaction-processing programs, many loops iterate over transactions, and for data
    modeling an analysis of loop iteration is not required. We believe this to be the case for
    the given programs. We would like confirmation from domain experts.
MOVE analysis:
    A MOVE to an unstructured buffer is currently analyzed in a conservative way, which is
    likely to give false positives. We believe that a more accurate analysis is possible if
    domain knowledge is used, and we will implement it after discussions with the domain
    experts. We ran a test to check whether a large number of false positives are produced when
    the given program is analyzed. We found that no false positives are produced beyond those
    generated by the conservative IF analysis, discussed next.
Domain-specific IF analysis:
    In a conservative static analysis, one has to consider the possibility that any of the
    execution paths generated by IF conditions may be taken, and one has to take the union of
    the results to be accurate. This type of conservative analysis can produce false positives.
    In the current program this is a dominant factor and completely overshadows the conservative
    MOVE analysis. We believe that a precise analysis can be performed instead if the meanings
    of the IF flags are clarified by domain experts.
Table 2: A Summary of Pending Enhancements for the Analyzer
We support batch processing so that hundreds of variables can be analyzed in a matter of minutes,
which can save significant time compared to manual processing. The systematic automated tracing
can be verified intrinsically by checking the algorithm used for the analysis. This is an important
advantage because it is a formal check, as opposed to heuristics or random sampling, neither of
which is completely reliable. Errors in coding the algorithm are still possible, so we sampled a
few variables and traced them by hand to check the results. Manual tracing is, however, tedious
and prone to errors. Another way to verify the results would be to use domain knowledge about the
application.
An example is given at the end of this document to illustrate the difference between conservative
IF analysis and a more precise form of analysis. The example is taken from the actual code and
shows how some false positives may be generated in the absence of domain-specific knowledge. One
important aspect of the KCS technology is its support for customizing analysis using domain
knowledge. The example illustrates an opportunity for such customization.
C. Visualization
The visualization components are tested separately and are used in many of our other tools. We
have encountered some integration problems, and the current visualization in the COBOL tool needs
further improvement. If visualization becomes a priority, this can be done fairly quickly.
D. Report Generation
This capability is flexible and easily extensible. The available reporting is based on the current
needs identified by the domain experts.
E. Database Generation
Since our output is XML-based, it is very easy for us to store the analysis results in any standard
SQL-based database. We have tested Microsoft Access and a Linux-based public domain database
as two possibilities. We have designed a database schema that we believe will be useful for
the data modeling exercise. This database facility can be easily customized to suit the specific
needs of a given analysis or knowledge extraction project.
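As a rough illustration of moving XML-based results into a SQL database, the Python sketch below
parses a small XML fragment and stores it in an in-memory SQLite database. Everything specific in
it is an assumption made for the example: the XML element and attribute names, the program name
SAMPLE-PGM, the single-table schema, and the use of SQLite as a stand-in for the databases named
above. The tool's actual output format and schema are not shown in this document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the tool's real output format is not shown here.
SAMPLE_XML = """
<analysis program="SAMPLE-PGM">
  <trace variable="BORS1101-DEF">
    <source name="NMI-RCD-AREA" line="052020"/>
  </trace>
  <trace variable="BORS1001-DEF">
    <source name="NMI-RCD-AREA" line="047850"/>
  </trace>
</analysis>
"""


def load_traces(xml_text, conn):
    """Store one row per (program, variable, source, line); return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace ("
        "program TEXT, variable TEXT, source TEXT, line TEXT)"
    )
    root = ET.fromstring(xml_text)
    program = root.get("program")
    rows = [
        (program, trace.get("variable"), src.get("name"), src.get("line"))
        for trace in root.iter("trace")
        for src in trace.iter("source")
    ]
    # Parameterized inserts avoid building SQL text from the XML content.
    conn.executemany("INSERT INTO trace VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_traces(SAMPLE_XML, conn)
variables = conn.execute("SELECT variable FROM trace ORDER BY line").fetchall()
```

Line numbers are stored as text so that leading zeros in COBOL sequence numbers are preserved.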
REFERENCE
1. An Overview of Knowledge-Centric Software Technology with Applications to Legacy
COBOL Code.
Example: Substituting a More Precise IF Analysis for the Conservative Analysis
047370 A000-MainLine.
.....
.....
047530 MOVE 'LNFILE' TO DATASET-NM.
047540 String '01ML' KEY-LPO KEY-LOAN ' 01'
047550 delimited size into NMI-RCD-KEY.
047560 MOVE KEY-LPO TO PA-LPO-NBR.
047570 MOVE KEY-LOAN TO PA-NEW-LOAN-NBR.
047580 Exec CICS Read
047590 DATASET (DATASET-NM)
047600 RIDFLD (NMI-RCD-KEY)
047610 LENGTH (RECORD-LEN)
047620 INTO (NMI-RCD-AREA)
047630 KEYLENGTH (22)
047640 RESP (CICS-RESP)
047650 END-Exec.
.....
.....
047770*>>PROCESS-FILES
047780 IF CB-DATA-FLAG (001) = SPACE
047790 GO TO R001-END.
047800 MOVE 'BORSEL ' TO DATASET-NM.
047810 MOVE 804 TO RECORD-LEN.
.....
.....
047850 MOVE NMI-RCD-AREA TO BORS1001-DEF. 1
047860 PERFORM P005-READ-RECORD THRU P005-EXIT.
047870 MOVE NMI-RCD-AREA TO BORS1001-INPT.
.....
.....
051930 R001-END.
051940
051950 IF CB-DATA-FLAG (002) = SPACE
051960 GO TO R002-END.
051970 MOVE 'BORSEL ' TO DATASET-NM.
.....
.....
052020 MOVE NMI-RCD-AREA TO BORS1101-DEF.
052030 PERFORM P005-READ-RECORD THRU P005-EXIT. 2
052040 MOVE NMI-RCD-AREA TO BORS1101-INPT.
NOTE: The conservative analysis will assume that either of the two marked possibilities can
occur, and it will save both LNFILE-NMI-RCD-AREA and BORSEL-NMI-RCD-AREA to BORS1101-DEF.
Most likely, only one of the CB-DATA-FLAG entries is non-blank at runtime, so only one of the
two possibilities occurs and hence only LNFILE-NMI-RCD-AREA should be saved to BORS1101-DEF.
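The difference described in the NOTE can be reproduced with a small Python model of the excerpt.
This is a sketch under stated assumptions, not the analyzer's implementation: it assumes that
P005-READ-RECORD (not shown in the excerpt) reads the dataset named in DATASET-NM into
NMI-RCD-AREA, and it encodes the domain knowledge as "exactly one CB-DATA-FLAG entry is
non-blank". The function names are hypothetical.

```python
from itertools import product


def saved_to_bors1101_def(flag1_set, flag2_set):
    """Simulate the excerpt for one combination of the two flags.

    Returns the label of the record copied into BORS1101-DEF, or None when
    the block marked 2 is skipped.
    """
    area = "LNFILE-NMI-RCD-AREA"      # EXEC CICS READ with DATASET-NM = 'LNFILE'
    if flag1_set:                     # block marked 1 is executed
        area = "BORSEL-NMI-RCD-AREA"  # assumed effect of P005-READ-RECORD
    if flag2_set:                     # block marked 2 is executed
        return area                   # MOVE NMI-RCD-AREA TO BORS1101-DEF
    return None


def analyze(allowed):
    """Union the results over every flag combination accepted by allowed."""
    results = set()
    for flag1_set, flag2_set in product([False, True], repeat=2):
        if allowed(flag1_set, flag2_set):
            saved = saved_to_bors1101_def(flag1_set, flag2_set)
            if saved is not None:
                results.add(saved)
    return results


# Conservative: every combination of the two IF outcomes is considered.
conservative = analyze(lambda f1, f2: True)
# Precise: assumed domain rule that exactly one flag is non-blank.
precise = analyze(lambda f1, f2: f1 != f2)
```

Under these assumptions the conservative result contains both LNFILE-NMI-RCD-AREA and
BORSEL-NMI-RCD-AREA, while the constrained result contains only LNFILE-NMI-RCD-AREA, which
matches the NOTE. Whether the "exactly one flag" rule actually holds is a question for the
domain experts.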