Systems Programming Lecture Notes by Bakudan

SCHOOL OF COMPUTING AND INFORMATICS TECHNOLOGY

COURSE:               CSC 2209 SYSTEMS PROGRAMMING
PRE-REQUISITE:               CSC 1100 Computer Literacy

Lecture Hours:        Day Class             Thursdays      3.00 p.m. – 5.00 p.m.
                                            Fridays        9.00 a.m. - 10.00 a.m.
                                            Venue:         LLT1A

                      Evening Class         Thursdays      5.00 p.m. - 6.00 p.m.
                                            Fridays        6.00 p.m. – 8.00 p.m.
                                            Venue:                LLT2A

Course Objectives.
The course introduces students to the design of different kinds of systems software and
considers the implementation of such software on a variety of real machines.

Grading Policy:
Grades will be based on your performance on two in-class tests, a
comprehensive final examination and Course work.
       40% Course work, which includes tests and course work assignments.
       60% Final Examination.

N.B.    Late homework decreases your overall score by 20% per day.


Recommended Text Books
1.     SYSTEMS PROGRAMMING AND OPERATING SYSTEMS by D. M. Dhamdhere
2.     SYSTEM SOFTWARE: An Introduction to Systems Programming by Leland L. Beck
3.     SYSTEMS PROGRAMMING by John J. Donovan
4.     MICROCOMPUTERS FOR ENGINEERS AND SCIENTISTS by Glenn A. Gibson and Yu-cheng Liu


COURSE OUTLINE:
1.  Review Micro Computer Architecture
    1.1   CPU
    1.2   Memory
    1.3   The Intel 8085/8088 CPU
    1.4   Machine Language Instructions
    1.5   Instruction Formats and Addressing Modes

2.    The Simplified Instructional Computer
      2.1    The SIC Machine Structure
      2.2    The SIC / XE Machine Structure

3.    Assemblers
      3.1  Assembler Tables and Logic
      3.2  Instruction Formats and addressing Modes.
      3.3  Program Relocation
      3.4  Literals
      3.5  Program Blocks and Control sections

4.    Loaders and Linkers
      4.1   Design of an Absolute Loader
      4.2   Relocation, Program Linking
      4.3   Tables for a Linking Loader
      4.4   Loader Options and Overlay Programs.

5.    Compilers
       5.1 Basic Compiler Functions
           5.1.1 Grammars
           5.1.2 Lexical Analysis
           5.1.3 Syntactic Analysis
           5.1.4 Code Generation
       5.2 Machine Dependent Compiler Features
           Code Optimisation
       5.3 Machine Independent Code Optimization
           5.3.1 Storage Allocation
           5.3.2 Structured Variables
       5.4 Compiler Design Options


6.    Macro Processors
      6.1  Macro Definitions
      6.2  Macro Processor Tables and Logic
      6.3  Macro Expansion


                  SYSTEMS PROGRAMMING
A computer system is sometimes subdivided into two functional entities:
Hardware and Software.
The hardware of a computer consists of all the electronic components
and the electro mechanical devices that comprise the physical entity of
the device.
Software consists of the instructions and data that the computer
manipulates to perform various data processing tasks.
Three types of software exist:
1.    Systems Software
2.    Development Software
3.    Application Software
The system software of a computer consists of a collection of programs
whose purpose is to make more effective use of the computer. They
control the operation of the machine and carry out the most basic
functions the computer performs. They control the way in which the
computer receives input, produces output, manages and stores data,
carries out or executes instructions of other programs etc.
Examples of systems programs include language processors (compilers
and assemblers, which accept human-readable languages and translate them
into machine language), loaders (which prepare machine language
programs for execution), macro processors (which allow programmers to use
abbreviations), operating systems and file systems (which allow flexible
storage and retrieval of information).
Application programs are written by the user for the purpose of solving
particular problems using a computer as a tool e.g. application
packages.
Development Software is used to create, update and maintain other
programs e.g. programming languages.


Systems software supports the operation and use of the computer itself
rather than any particular application. It is therefore closely related to the
structure of the machine on which it is to run.
There are, however, some aspects of systems software that do not
directly depend on the type of computing system being supported,
e.g. the general design and logic of an assembler is basically the same
on most computers.



MICRO COMPUTER ARCHITECTURE

[Figure: block diagram of a microcomputer. The microprocessor (CPU), driven by the
timing circuitry, is connected through the bus control logic to the system bus.
Interfaces on the system bus connect the memory modules and the I/O sub system
(mass storage devices and I/O devices).]


The microprocessor
At the centre of all operations is the MPU (Microprocessor Unit). In a
microcomputer the CPU is the microprocessor. Its purpose is to:
(i)    Decode the instructions and use them to control the activities within
       the system.
(ii)   Perform the arithmetic (+, -, *, /) and logical (>, >=, <, <=, =, !=)
       computations.


Timing Circuitry (Clock)
Used to synchronise the activities within the microprocessor and the bus
control logic.


Memory
(i)    Stores both data and instructions that are currently being used.
(ii)   Memory is broken down into modules where each module contains
       several thousand locations.
(iii)  Each location is associated with an identifier called a memory
       address.


The I/O Sub System
It consists of a variety of devices for communicating with the external
world and for storing large quantities of information, e.g. keyboards and light
pens for input and CRT monitors, printers and plotters for output.
Computer components for permanently storing programs and data are
referred to as mass storage units, e.g. magnetic tapes, disk units and
magnetic bubble memory.
N.B. Although programs and data can be stored on mass storage devices,
they must be transferred to memory before the CPU can use them.


System Bus.
A set of conductors that connect the CPU to its memory and I/O devices.
The bus conductors are normally separated into 3 groups:
1.    The Data Lines: for transmitting information
2.    Address Lines: Indicate where information is to come from or
      where it is to be placed.
3.    Control Lines: To regulate the activities on the bus.


Interface
The circuitry needed to connect the bus to a device. Memory interfaces
consist of logic to:
(i)    Decode the address of the memory location being accessed.
(ii)   Buffer data onto/off the bus.
(iii)  Perform memory reads or writes.


I/O interfaces must:
(i)    Buffer data onto/off the system bus.
(ii)   Receive commands from the CPU.
(iii)  Transmit information from their devices to the CPU.


Bus Control Unit
Co-ordinates the CPU activities with the external world.


THE CPU

[Figure: block diagram of a typical CPU. The control unit (with its control memory,
program counter, instruction register, processor status word and stack pointer),
the working registers (address registers and arithmetic registers), the
arithmetic/logic unit and the bus control unit.]



A typical CPU contains a control unit, which holds the following
registers:
1.     Program Counter (PC): holds the address of the main
       memory location from which the next instruction is to be fetched.
2.     Instruction Register (IR): receives the instruction when it is brought
       from memory and holds it while it is decoded and executed.
3.     Processor Status Word (PSW): contains condition flags which
       indicate the current status of the CPU and the important
       characteristics of the result of the previous instruction.
4.     Stack Pointer (SP): accesses a special part of memory called the
       stack, which is used to temporarily store important information while
       subroutines are being executed. It holds the address of the top of
       the stack.


Working Registers
These are the arithmetic registers (or accumulators) and the address registers.
(i)    Arithmetic Registers: They temporarily hold the operands and the
       result of the arithmetic operations.
       Accessing a register is faster than accessing memory. If several
       operations are to be performed it is better to have the operands in
       registers than in memory and return only the result to memory. The
       more arithmetic registers a CPU has the faster it can execute
       computations.


(ii)   Address Registers: for addressing data and instructions in main
       memory.
If a register can be used for both arithmetic operations and addressing it
is then called a general purpose register.


Arithmetic/Logic Unit
It performs arithmetic and logical operations on the contents of the
working registers, the PC, memory locations etc. It also sets and clears
the appropriate flags.


EXAMPLES OF CPUs
The Intel 8085

[Figure: organisation of the Intel 8085. Address/data lines (16), control lines (20),
bus control and clock; processor status word with flags S, Z, AC, P and C;
program counter (16 bits) and stack pointer (16 bits); accumulator (8 bits);
general purpose registers B, C, D, E, H and L (8 bits each); ALU.]





(i)    It is an 8 bit processor, i.e. it has 8 data lines (1 byte of data can be
       transmitted at a time).
(ii)   It has 6 general purpose registers, namely B, C, D, E, H and L, with 8 bits each
       and associated in pairs.
(iii)  One 8 bit accumulator.
(iv)   One 16 bit stack pointer.
(v)    One 16 bit program counter.
(vi)   One PSW with 5 flags:
       Zero (Z): set when the result of the operation is zero.
       Sign (S): set when the sign of the result is negative.
       Parity (P): set when the parity of the bits in the result is even.
       Carry (C): set when an addition results in a carry, or a subtraction or
       comparison results in a borrow.
       Auxiliary Carry (AC): set when there is a carry out of bit 3; used in BCD arithmetic.
(vii)  It has 16 address lines, the lower 8 of which also carry the data
       (the address space is 0 to 2^16 - 1).
(viii) 20 control lines, i.e. it has 20 control signals.
The address and data share the same bus lines and must take
turns to use them (they are time multiplexed). The address must be
sent first and then the data is sent or received.
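
The flag definitions above can be illustrated with a short Python sketch. It is only a model of how the five PSW flags might be derived after an 8 bit addition; the function name add8_flags and the dictionary layout are assumptions made for this illustration and are not part of the 8085 itself.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) in the style of the 8085 PSW."""
    total = a + b
    result = total & 0xFF                          # keep only the low 8 bits
    flags = {
        "Z": result == 0,                          # zero result
        "S": bool(result & 0x80),                  # bit 7 set: negative in 2's complement
        "P": bin(result).count("1") % 2 == 0,      # even number of 1 bits
        "C": total > 0xFF,                         # carry out of bit 7
        "AC": (a & 0x0F) + (b & 0x0F) > 0x0F,      # carry out of bit 3
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

Adding 7FH and 01H gives 80H, so in this model the sign and auxiliary carry flags are set while the zero, parity and carry flags are clear.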


MACHINE LANGUAGE INSTRUCTIONS
At the time of execution all instructions are made up of a sequence of
bytes (combinations of zeros and ones).
Because instructions in their 0's and 1's form can be directly understood
by the computer, they are called machine language
instructions.
All other forms of programs (assembler, high level, etc.) must be reduced
to their machine level form.


INSTRUCTION FORMATS
The arrangement of an instruction with respect to assigning meaning to
its various groups of bits is called its format.
The portion of the instruction that specifies what the instruction does is
called the operation code (opcode).


Any address or piece of data that is required by the instruction to
complete its execution is called an operand.
An instruction therefore consists of an operation code and a number of
operands.
Most computers are designed so that not more than 2 operands are
needed by a single instruction.
Some instructions require only one operand and are called single
operand instructions; others are double operand instructions.
If a quantity is taken from a location, that location is called the source operand.
The location that is changed, or to which the source operand is moved, is
called the destination.


All instruction formats reserve the first bits of the instruction for at least
part of the opcode but beyond this the formats vary considerably from
one computer to the next. The remaining bits designate the operands or
their locations.
Instructions vary in length from 1 byte to 3 or 6 bytes, depending on the processor.
e.g. (8085 instructions)
Register to register transfer (1 byte)
         0  1  |  0  0  0  |  1  1  1
         Opcode   Destination    Source
                  (Register B)   (Register A)

Load accumulator from memory (3 bytes)
         0  0  1  1  1  0  1  0          Opcode
         Low order address
         High order address

Transfer of immediate data (2 bytes)
         0  0  |  0  0  1  |  1  1  0
         Opcode   Destination    Opcode
                  (Register C)
         Data

Conditional branch on zero result (3 bytes)
         1  1  |  0  0  1  |  0  1  0
         Opcode   Condition      Opcode
                  code
         Low order branch address
         High order branch address
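
The way an instruction format assigns meaning to groups of bits can be sketched in Python. The sketch below splits the one byte register transfer shown above into its opcode, destination and source fields, and rebuilds the 16 bit address of a three byte instruction from its low and high order bytes. The function names, and the sample address 2050H, are assumptions made for this illustration.

```python
# 3-bit register codes; 110 selects the memory operand M on the 8085.
REGISTER_NAMES = {0b000: "B", 0b001: "C", 0b010: "D", 0b011: "E",
                  0b100: "H", 0b101: "L", 0b110: "M", 0b111: "A"}


def decode_mov(byte):
    """Split a one-byte register transfer into opcode, destination and source."""
    opcode = byte >> 6              # top 2 bits
    dest = (byte >> 3) & 0b111      # middle 3 bits
    src = byte & 0b111              # low 3 bits
    if opcode != 0b01:
        raise ValueError("not a register-to-register transfer")
    return f"MOV {REGISTER_NAMES[dest]}, {REGISTER_NAMES[src]}"


def decode_load_address(instruction):
    """Return the 16-bit address held in bytes 2 and 3 of a 3-byte instruction."""
    opcode, low, high = instruction
    return (high << 8) | low        # low order byte comes first in memory


print(decode_mov(0b01000111))                                # MOV B, A
print(hex(decode_load_address(bytes([0x3A, 0x50, 0x20]))))   # 0x2050
```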




ADDRESSING MODES
They are the methods used to locate and fetch an operand from an
internal CPU register or from a memory location.
Each processor has its own addressing modes.
1.    Immediate addressing: The operand is part of the instruction. No
      addressing is needed to get the information.
      It is mostly used for quantities that are constants.
      On an 8 bit machine these are typically 2 byte instructions in which
      the operand is the second byte.
2.    Direct addressing: The address of the operand is part of the instruction.
3.    Register addressing: The operand is in a register and the
      register's address is part of the instruction.
4.    Indirect addressing: The address of the operand is in the location whose
      address is specified as part of the instruction. This location may be a
      register (register indirect addressing) or it may be a memory
      location,
      e.g. add the contents of register R1 to the memory location whose
      address is in register R2.
5.    Base addressing: The address is formed by adding the contents of
      a memory location or register to a number called a displacement,
      which is part of the instruction. It is used primarily to reference
      arrays or in relocating a program in memory.
6.    Indexing: The process of incrementing or decrementing an
      address as the computer sequences through a set of consecutive
      or evenly spaced addresses. This is done by successively
      changing an address that is stored in a register called an index
      register.
7.    Auto incrementing / decrementing: The index is automatically
      incremented or decremented as part of an instruction that uses it.
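
The addressing modes listed above can be compared side by side with a small Python sketch of a toy machine. Everything here is hypothetical: the register names R1 and R2, the memory contents and the function fetch_operand are assumptions chosen only to show how each mode locates its operand.

```python
def fetch_operand(mode, field, regs, mem):
    """Return the operand selected by an addressing mode on a toy machine."""
    if mode == "immediate":            # the operand is the field itself
        return field
    if mode == "direct":               # the field is the operand's address
        return mem[field]
    if mode == "register":             # the field names a register
        return regs[field]
    if mode == "register_indirect":    # the register holds the operand's address
        return mem[regs[field]]
    if mode == "memory_indirect":      # a memory location holds the operand's address
        return mem[mem[field]]
    if mode == "base":                 # address = contents of register + displacement
        reg, disp = field
        return mem[regs[reg] + disp]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 7, "R2": 0x2000}
mem = {0x2000: 42, 0x2004: 99, 0x3000: 0x2000}

print(fetch_operand("immediate", 5, regs, mem))             # 5
print(fetch_operand("direct", 0x2000, regs, mem))           # 42
print(fetch_operand("register_indirect", "R2", regs, mem))  # 42
print(fetch_operand("base", ("R2", 4), regs, mem))          # 99
```

Indexing would use the same calculation as the base mode, with the register contents changed between accesses so that successive elements are reached.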


ASSEMBLER LANGUAGE
Assembler language is a low-level language that is close to machine language.
There is an assembler language instruction for each machine language
instruction.
An assembler converts assembler language into machine instructions.
There are 2 types of statements in assembler:
(i)  Instructions: These are translated into machine code by the
      assembler.
(ii) Directives: These give directions to the assembler during the assembly
      process but are not translated into machine code.
Acronyms called mnemonics indicate the type of instruction.
Character strings called symbols or identifiers represent addresses and
perhaps numbers.
A typical assembler instruction would be
      MOV A, M
A typical assembler directive would be
      COST: DS      1
This directive causes the assembler to reserve a byte and associate the
symbol COST with it.


Example:
The assignment ANS := X + Y can be carried out on the 8085
microprocessor as follows.


               LDA       X
               MOV       B, A
               LDA       Y
               ADD       B
               STA       ANS
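
To follow the movement of data through the five instructions above, the program can be traced with a very small Python model. This is a sketch under simplifying assumptions: only the four mnemonics used in the example are handled, memory is a dictionary keyed by the symbols X, Y and ANS rather than by numeric addresses, and the sample values 25 and 17 are invented.

```python
def run(program, memory):
    """Execute a list of (mnemonic, operands...) tuples on a tiny 8085-like model."""
    regs = {"A": 0, "B": 0}
    for mnemonic, *operands in program:
        if mnemonic == "LDA":            # load accumulator from memory
            regs["A"] = memory[operands[0]]
        elif mnemonic == "MOV":          # register to register transfer
            dest, src = operands
            regs[dest] = regs[src]
        elif mnemonic == "ADD":          # add a register to the accumulator (8 bits)
            regs["A"] = (regs["A"] + regs[operands[0]]) & 0xFF
        elif mnemonic == "STA":          # store accumulator to memory
            memory[operands[0]] = regs["A"]
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return memory


PROGRAM = [("LDA", "X"), ("MOV", "B", "A"), ("LDA", "Y"), ("ADD", "B"), ("STA", "ANS")]

memory = run(PROGRAM, {"X": 25, "Y": 17, "ANS": 0})
print(memory["ANS"])   # 42
```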


Three points are worth noting.
(i)       Most instructions involve movement of information from one part of
          the computer to another.
(ii)      Computers do not operate on data in a flexible manner, e.g. for
          the ADD instruction, one of the operands must already be in the
          accumulator.
(iii)     All programs, whatever the language, involve inputting, processing
          and outputting.


Assembler Instruction Format
The General format is
Label:          Mnemonic            Operand, Operand    ; remarks
Label: A symbol assigned to the address of the first byte of the
instruction in which it appears.
Its presence is optional; if present it provides a symbolic name that can
be used in branch instructions to branch to the instruction.
If there is no label then there is no colon.
All instructions must contain a mnemonic.
The presence of operands depends on the instruction. If there is more
than one operand they are separated by commas.
Remarks are for documenting the program; they are optional.

e.g.            BRADDR:             MOV            A, M

        BRADDR: label; MOV: mnemonic; A: destination operand; M: source operand

Register to register transfer
e.g.            NOW:                MOV            B, D

        NOW: label; MOV: mnemonic; B: destination operand; D: source operand

Load accumulator from memory
e.g.            LDA          NUM

        LDA: mnemonic; NUM: address of the operand to be loaded into the accumulator

Transfer of immediate operand to a register
e.g.               MVI           E, 6

        MVI: mnemonic; E: destination register; 6: immediate data

Conditional branch on non-zero result
e.g.               JNZ           HERE

        JNZ: mnemonic (including the branch condition); HERE: branch address

(i)       All instructions are 1, 2, or 3 bytes long.
(ii)      Instructions that involve only register or register indirect addressing
          are 1 byte long;
(iii)     Those that involve I/O or immediate operands are 2 or 3 bytes
          long.


The working registers are A, B, C, D, E, H and L.
Register addresses are:

Register    Address    Register pair    Pair address
B           000        BC               00
C           001        BC               00
D           010        DE               01
E           011        DE               01
H           100        HL               10
L           101        HL               10
A           111        -                -

The registers are sometimes considered in the pairs BC, DE and HL.
Both registers in a pair have the same high order 2 bits in their addresses.
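
As a sketch of how an assembler might use the register addresses above, the Python fragment below builds the machine code for MOV and MVI from the 3 bit register codes, and derives the register pair code from the high order 2 bits. The function names are assumptions made for this illustration; the bit layouts used are 01 DDD SSS for MOV and 00 DDD 110 for MVI.

```python
REG_CODE = {"B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
            "H": 0b100, "L": 0b101, "A": 0b111}


def encode_mov(dest, src):
    """One-byte MOV: 01 DDD SSS."""
    return bytes([0b01000000 | (REG_CODE[dest] << 3) | REG_CODE[src]])


def encode_mvi(dest, value):
    """Two-byte MVI: 00 DDD 110 followed by the immediate data byte."""
    return bytes([0b00000110 | (REG_CODE[dest] << 3), value & 0xFF])


def pair_code(register):
    """Register pair code: the high order 2 bits of the register address."""
    return REG_CODE[register] >> 1


print(encode_mov("B", "D").hex())    # 42
print(encode_mvi("E", 6).hex())      # 1e06
print(pair_code("H"), pair_code("L"))  # 2 2
```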


Assembler Directives
The directives direct the assembler during the assembly process.
Three ASM-85 directives are considered here: DS, DB and DW.
They have the format
                   Label:        Mnemonic            Operand, Operand
The label is optional.


DS (Define Storage)
It is used to reserve memory and perhaps to assign a label to the first
byte of the reserved area, e.g.
      ARRAY:      DS   20
reserves 20 bytes and assigns the label ARRAY to the byte with the lowest address.


DB (Define Byte)
Used to put values into (pre-assign values to) memory locations as well
as to reserve space and assign labels.
It plays a role similar to the DATA statement in Fortran. It can include up to 8
operands, where each operand is either a string constant with no more than
128 characters or a constant expression that evaluates to a 2's
complement number from -128 to 127.
e.g.
      NUM:        DB   14H, 'ABC', 01101000B
reserves 5 bytes and associates the label NUM with the first byte.

      Contents (hex):     14   41   42   43   68
                          ^
                          NUM
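
The bytes produced by a DB directive can be worked out with a few lines of Python. This is only a sketch of the idea: the function define_bytes is hypothetical, it accepts Python integers and strings instead of parsing assembler operands, and the last operand is written as the bit pattern for 68H shown in the diagram.

```python
def define_bytes(*operands):
    """Return the bytes a DB directive would place in memory for its operands."""
    out = bytearray()
    for operand in operands:
        if isinstance(operand, str):
            out += operand.encode("ascii")     # one byte per character
        else:
            if not -128 <= operand <= 127:
                raise ValueError("operand does not fit in a signed byte")
            out.append(operand & 0xFF)         # 2's complement for negatives
    return bytes(out)


num = define_bytes(0x14, "ABC", 0b01101000)
print(num.hex())    # 1441424368
print(len(num))     # 5
```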


DW (Define Word)
Similar to DB except that it reserves words instead of bytes. Each of its
possible 8 operands should evaluate to a 16 bit number or be a single string
of one or two characters.
The low order byte of each word is stored at the lower byte address and
the high order byte at the higher byte address, e.g.

      TABLE:      DW   TASK1, TASK2, 092AH

TASK1 and TASK2 are labels. Assuming that TASK1 and TASK2 have
been assigned memory locations 2010H and 108CH respectively, the
reserved bytes are:

      Contents (hex):     10   20   8C   10   2A   09
                          ^
                          TABLE
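
The byte order used by DW (low order byte at the lower address) can be checked with Python's struct module, where the format "<H" packs an unsigned 16 bit value in little-endian order. The symbol table dictionary and the function define_words below are assumptions for this sketch; the values for TASK1 and TASK2 are the ones assumed in the example above.

```python
import struct


def define_words(*values):
    """Return the bytes a DW directive would store: each word low byte first."""
    return b"".join(struct.pack("<H", value & 0xFFFF) for value in values)


symbols = {"TASK1": 0x2010, "TASK2": 0x108C}   # addresses assumed in the example

table = define_words(symbols["TASK1"], symbols["TASK2"], 0x092A)
print(table.hex())   # 10208c102a09
```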


            THE SIMPLIFIED INSTRUCTIONAL COMPUTER
It is similar to a typical microcomputer. It comes in two versions: the
standard model and the XE version. (XE = Extra Equipment or “Extra
Expensive”)


THE SIC MACHINE
Memory:
Memory consists of 8-bit bytes; any three consecutive bytes form a word
(24 bits). There are a total of 32,768 (2^15) bytes in the computer memory.


Registers
There are 5 registers where each has a special use. Each register is 24
bits.
Mnemonic                 Number       Use
A                        0            Accumulator; used for arithmetic Operations
X                        1            Index register; used for addressing
L                        2            Linkage register; the Jump to subroutine (JSUB)
                                      instruction stores the return address in this register.
PC                       8            Program Counter; Contains the address of the next
                                      instruction to be fetched for execution.
SW                       9            Status word; contains the condition codes

Data Formats
Integers are stored as 24 bit binary numbers; 2's complement
representation is used for negative numbers. Characters are stored
using their 8 bit ASCII codes. There is no floating point hardware on the
simple standard version.
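
The 24 bit 2's complement representation can be illustrated in Python. The helper names to_word and from_word are assumptions for this sketch; they convert between a Python integer and the bit pattern that would be held in a SIC word.

```python
WORD_BITS = 24


def to_word(n):
    """Encode a signed integer as a 24-bit 2's complement word."""
    if not -(1 << (WORD_BITS - 1)) <= n < (1 << (WORD_BITS - 1)):
        raise ValueError("value does not fit in a 24-bit word")
    return n & 0xFFFFFF


def from_word(word):
    """Decode a 24-bit 2's complement word into a signed integer."""
    return word - (1 << WORD_BITS) if word & 0x800000 else word


print(f"{to_word(5):06X}")     # 000005
print(f"{to_word(-5):06X}")    # FFFFFB
print(from_word(0xFFFFFB))     # -5
print(f"{ord('A'):02X}")       # 41, the 8-bit ASCII code stored for 'A'
```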


Instruction formats
All machine instructions have the following 24 bit format.

            Field:    opcode    x    address
            Bits:     8         1    15

The flag bit x is used to indicate indexed addressing mode.


Addressing Modes
There are 2 addressing modes designated by the bit x. When x = 1 the
addressing mode is indexed, when it is 0 it is direct.
Mode                  Indication    Target address calculation
Direct                x=0           TA = address
Indexed               x=1           TA = address + (X)
(X) means contents of register X.
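
The target address calculation for the two SIC addressing modes can be sketched in Python by splitting a 24 bit instruction into its opcode, x and address fields. The function name and the sample instruction words are assumptions for this illustration; opcode 00 is used only as a placeholder value.

```python
def sic_target_address(instruction, x_register):
    """Split a 24-bit SIC instruction and compute its target address."""
    opcode = instruction >> 16              # top 8 bits
    x = (instruction >> 15) & 1             # indexed addressing flag
    address = instruction & 0x7FFF          # low 15 bits
    target = address + (x_register if x else 0)
    return opcode, x, target


# Direct: x = 0, so TA = address.
print(sic_target_address(0x001000, 0x90))   # (0, 0, 4096)
# Indexed: x = 1, so TA = address + (X).
print(sic_target_address(0x009000, 0x90))   # (0, 1, 4240)
```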


Instruction Set
Instructions available include LDA, LDX, STA, STX, ADD, SUB, MUL,
DIV etc.
All Arithmetic operations involve register A and a word in memory; the
result is left in the register.
The instruction COMP compares the value in register A with a word in
memory and sets the condition code CC to indicate the result (<, = or >).
The conditional jump instructions are JLT, JEQ and JGT. For subroutine linkage,
JSUB (jump to subroutine) places the return address in
register L, and RSUB (return from subroutine) returns
by jumping to the address contained in register L.


Input and Output
This is performed by transferring one byte at a time to or from the
rightmost 8 bits of register A. Each device is assigned a unique 8 bit
code.
There are three I/O instructions, each of which specifies the device code
as an operand.
(i)    TD (Test Device): tests whether the addressed device is ready to
       send or receive a byte of data. The condition code is set to indicate
       the result of this test:
       <  the device is ready to send or receive;
       =  the device is busy.
(ii)   RD (Read Data): reads a byte of data from the device once it is
       ready; until then the operation is delayed.
(iii)  WD (Write Data): writes a byte of data to the device.
The sequence is repeated for each byte of data to be read or written.
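
The test-then-transfer sequence described above amounts to a polling loop, which can be sketched in Python. FakeDevice is a hypothetical stand-in for an input device that reports busy ("=") for a couple of tests before each byte becomes available; read_bytes repeats the test until the device reports ready ("<") and then reads one byte, much as a program would loop on TD and JEQ before executing RD.

```python
class FakeDevice:
    """Hypothetical input device that is busy for a few polls before each byte."""

    def __init__(self, data, busy_polls=2):
        self.data = list(data)
        self.busy_polls = busy_polls
        self._countdown = busy_polls

    def td(self):
        """Test Device: return '<' when ready, '=' when busy."""
        if self._countdown > 0:
            self._countdown -= 1
            return "="
        return "<"

    def rd(self):
        """Read Data: return the next byte and make the device busy again."""
        self._countdown = self.busy_polls
        return self.data.pop(0)


def read_bytes(device, count):
    """Read count bytes, testing the device before every transfer."""
    received = bytearray()
    for _ in range(count):
        while device.td() == "=":    # busy: test again
            pass
        received.append(device.rd())
    return bytes(received)


print(read_bytes(FakeDevice(b"SIC"), 3))   # b'SIC'
```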


THE SIC/XE MACHINE
Memory:
The maximum memory is one megabyte (2^20 bytes). This increase leads
to a change in instruction formats and addressing modes.


Registers
The following additional registers are provided
Mnemonic            Number      Use
B                   3           Base register, used for addressing
S                   4           General Working register, no special use.
T                   5           General Working register, no special use.
F                   6           Floating Point Accumulator.


Data Formats
In addition to the data formats of the standard version there is a 48
bit floating point data type with the following format.

            Field:    s    exponent    fraction
            Bits:     1    11          36

The fraction is a value between 0 and 1. For normalized floating point
numbers, the high order bit of the fraction must be a 1. The exponent is
an unsigned binary number between 0 and 2047.
If the exponent is e and the fraction is f, the absolute value of the number
represented is f * 2^(e - 1024).
The sign of the floating point number is indicated by the value of s (0 =
positive, 1 = negative).
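
The 48 bit floating point format can be decoded with a short Python sketch that applies the rule stated above: the value is the fraction f multiplied by 2 raised to (e - 1024), negated when s = 1. The function names are assumptions for this illustration, and the two sample values were chosen so that the arithmetic is easy to check by hand.

```python
FRACTION_BITS = 36
EXPONENT_BITS = 11


def pack_float48(sign, exponent, fraction_bits):
    """Assemble the three fields into one 48-bit integer."""
    return (sign << 47) | (exponent << FRACTION_BITS) | fraction_bits


def decode_float48(bits):
    """Return the value represented by a 48-bit SIC/XE floating point number."""
    sign = (bits >> 47) & 1
    exponent = (bits >> FRACTION_BITS) & ((1 << EXPONENT_BITS) - 1)
    fraction = (bits & ((1 << FRACTION_BITS) - 1)) / (1 << FRACTION_BITS)
    value = fraction * 2.0 ** (exponent - 1024)
    return -value if sign else value


# 1.0 = 0.5 * 2^1: fraction 0.1 (binary), exponent 1025
print(decode_float48(pack_float48(0, 1025, 1 << 35)))        # 1.0
# -6.5 = -(0.8125 * 2^3): fraction 0.1101 (binary), exponent 1027
print(decode_float48(pack_float48(1, 1027, 0b1101 << 32)))   # -6.5
```

Both sample fractions have their high order bit set, so they are normalized in the sense described above.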




Instruction formats
Format 1 (1 byte)
             Field:    opcode
             Bits:     8

Format 2 (2 bytes)
             Field:    opcode    r1    r2
             Bits:     8         4     4

Format 3 (3 bytes)
             Field:    opcode    n    i    x    b    p    e    disp
             Bits:     6         1    1    1    1    1    1    12

Format 4 (4 bytes)
             Field:    opcode    n    i    x    b    p    e    address
             Bits:     6         1    1    1    1    1    1    20

Since there is now more memory, an address no longer fits into a 15 bit
field. Two options are available in the extended version: use
some form of relative addressing (format 3) or extend the address
field to 20 bits (format 4). There are also instructions that do not
reference memory at all (formats 1 and 2).
Bit e is used to distinguish between formats 3 and 4
(e = 0 means format 3, e = 1 means format 4).


Addressing Modes
Two new relative addressing modes are available for use with
format 3 instructions: base relative addressing and program counter
relative addressing.
Mode                      Indication                  Target address calculation
Base relative             b = 1, p = 0                TA = (B) + disp (0 <= disp <= 4095)
Program counter relative  b = 0, p = 1                TA = (PC) + disp (-2048 <= disp <= 2047)


For base relative addressing disp is a 12 bit unsigned integer. For program
counter relative addressing it is a 12 bit signed integer with negative
values represented in 2's complement.


If bits b and p are set to 0, the disp field in format 3 is taken to be the
target address. For format 4 bits b and p must be 0 and the target
address is taken from the address field of the instruction. This is called
direct addressing.
Any of these addressing modes can also be combined with indexed
addressing if bit x is set to 1. In that case the contents of register X, (X), are
added in the target address calculation.
Bits i and n specify how the target address is used. If i = 1
and n = 0 the target address itself is used as the operand value and no
memory reference is made. This is immediate addressing.
If i = 0 and n = 1 the word at the location given by the target address is
fetched and the value in this word is taken as the address of the operand
value. This is indirect addressing. If bits i and n are both 0 or both 1,
the target address is taken as the location of the operand. This is
referred to as simple addressing. Indexing cannot be used with
immediate or indirect addressing.
If bits n and i are both 0 then bits b, p and e are considered to be part of
the address field of the instruction rather than flags indicating addressing
modes. This makes Instruction Format 3 identical to the format used on
the standard version of SIC.
b   p   Addressing Mode               Target Address ((X) is added only when x = 1)
0   0   Direct                        TA = disp + (X)
0   1   PC Relative                   TA = (PC) + disp + (X)
1   0   Base Relative                 TA = (B) + disp + (X)
1   1   (not allowed)



n   i   Addressing Mode       Operand
0   0   Simple Addressing     Operand = Contents of TA (bits b,p,e are part of address bits)
0   1   Immediate             Operand = TA
1   0   Indirect Addressing   Word at TA is the address of the Operand
1   1   Simple addressing     Operand = Contents of TA


Example
The figure below gives the different addressing modes available. The
contents of registers B, PC, and X and some selected memory locations
are shown.
The machine code for a series of LDA instructions is given. The target
address generated by each instruction and the value that is loaded into
register A are also shown.
(B) = 006000    (PC) = 003000    (X) = 000090

Memory address (hex)    Contents (hex)
3030                    003600
3600                    103000
6390                    00C303
C303                    003030

Machine code                                                                 Target     Value loaded
(hex)      Op       n i    x   b   p   e   disp/address                  address    into A
032600     000000   1 1    0   0   1   0   0110 0000 0000                3600       103000
03C300     000000   1 1    1   1   0   0   0011 0000 0000                6390       00C303
022030     000000   1 0    0   0   1   0   0000 0011 0000                3030       103000
010030     000000   0 1    0   0   0   0   0000 0011 0000                30         000030
003600     000000   0 0    0   0   1   1   0110 0000 0000                3600       103000
0310C303   000000   1 1    0   0   0   1   0000 1100 0011 0000 0011      C303       003030
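
The target addresses and loaded values in this example can be reproduced with a Python sketch that decodes the n, i, x, b, p and e bits of each LDA instruction. The function decode_lda and the dictionaries REGS and MEMORY are assumptions for this illustration; they hold only the register contents and memory words given in the example, and the instruction length is inferred from the number of hex digits.

```python
REGS = {"B": 0x006000, "PC": 0x003000, "X": 0x000090}
MEMORY = {0x3030: 0x003600, 0x3600: 0x103000, 0x6390: 0x00C303, 0xC303: 0x003030}


def decode_lda(hex_code, regs=REGS, memory=MEMORY):
    """Return (target_address, value_loaded) for a format 3 or format 4 LDA."""
    code = int(hex_code, 16)
    first24 = code >> 8 if len(hex_code) == 8 else code   # format 4 has 4 bytes
    n, i, x, b, p, e = ((first24 >> bit) & 1 for bit in (17, 16, 15, 14, 13, 12))

    if n == 0 and i == 0:
        ta = first24 & 0x7FFF            # SIC compatible: b, p, e are address bits
    elif e == 1:
        ta = code & 0xFFFFF              # format 4: 20-bit address
    else:
        disp = first24 & 0xFFF
        if p:                            # PC relative: disp is signed
            ta = regs["PC"] + (disp - 0x1000 if disp & 0x800 else disp)
        elif b:                          # base relative: disp is unsigned
            ta = regs["B"] + disp
        else:                            # direct
            ta = disp
    if x:
        ta += regs["X"]                  # indexed addressing

    if n == 0 and i == 1:
        value = ta                       # immediate: TA is the operand
    elif n == 1 and i == 0:
        value = memory[memory[ta]]       # indirect: word at TA is the operand address
    else:
        value = memory[ta]               # simple addressing
    return ta, value


for code in ("032600", "03C300", "022030", "010030", "003600", "0310C303"):
    ta, value = decode_lda(code)
    print(f"{code:<8}  TA = {ta:X}  A = {value:06X}")
```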


                                    Addressing Modes
The following addressing modes apply to Format 3 and 4 instructions. Combinations of
addressing bits not included in this table are treated as errors.

4          Format 4 Instruction
D          Direct addressing
A          Assembler selects either program counter relative or base relative mode

Addressing Mode      n i     x b p e          Calculation of target   Operand   Notes
                                              Address TA
Simple               1   1   0   0    0   0   disp                    (TA)      D
                     1   1   0   0    0   1   addr                    (TA)      4 D
                     1   1   0   0    1   0   (PC) + disp             (TA)      A
                     1   1   0   1    0   0   (B) + disp              (TA)      A
                     1   1   1   0    0   0   disp + (X)              (TA)      D
                     1   1   1   0    0   1   addr + (X)              (TA)      4 D
                     1   1   1   0    1   0   (PC) + disp + (X)       (TA)      A
                     1   1   1   1    0   0   (B) + disp + (X)        (TA)      A
                     0   0   0   -    -   -   b/p/e/disp              (TA)      D
                     0   0   1   -    -   -   b/p/e/disp + (X)        (TA)      D

Indirect             1   0   0   0    0   0   disp                    ((TA))    D
                     1   0   0   0    0   1   addr                    ((TA))    4 D
                     1   0   0   0    1   0   (PC) + disp             ((TA))    A
                     1   0   0   1    0   0   (B) + disp              ((TA))    A

Immediate            0   1   0   0    0   0   disp                    TA        D
                     0   1   0   0    0   1   addr                    TA        4 D
                     0   1   0   0    1   0   (PC) + disp             TA        A
                     0   1   0   1    0   0   (B) + disp              TA        A



Instruction Set
All instructions in the standard version are still available. In addition there
are instructions to load and store the new registers (LDB, STB, etc.) and
to perform floating point arithmetic (ADDF, SUBF, MULF, DIVF).
Other instructions work on registers, e.g. RMO, ADDR, SUBR, MULR,
DIVR.
In the instruction set table below, uppercase letters refer to specific registers. The notation
m indicates a memory address, n indicates an integer between 1 and 16, and r1 and r2
represent register identifiers.
Parentheses are used to indicate the contents of a register or memory location. Thus
A ← (m..m+2) specifies that the contents of memory locations m through m+2 are loaded
into register A; m..m+2 ← (A) specifies that the contents of register A are stored in the word
that begins at address m.
P     Privileged instruction
X     Instruction available only on XE version
F     Floating point instruction
C     Condition code CC set to indicate result of operation


DIRECTIVES
RESW         Reserve the indicated number of words for a data area.
RESB         Reserve the indicated number of bytes for a data area
WORD         Generate a one word integer constant
BYTE         Generate character or hexadecimal constant, occupying as
             many bytes as needed to represent the constant
START        Specifies the name and starting address for the program.
END          Indicates the end of the source program and optionally
             specifies the first executable instruction in the program.


                            SIC/XE INSTRUCTION SET
MNEMONIC      FORMAT   OPCODE   EFFECT                                                    NOTES
ADD m         3/4      18       A ← (A) + (m..m+2)
ADDF m        3/4      58       F ← (F) + (m..m+5)                                        X F
ADDR r1,r2    2        90       r2 ← (r2) + (r1)                                          X
AND m         3/4      40       A ← (A) & (m..m+2)
CLEAR r1      2        B4       r1 ← 0                                                    X
COMP m        3/4      28       (A) : (m..m+2)                                            C
COMPF m       3/4      88       (F) : (m..m+5)                                            X F C
COMPR r1,r2   2        A0       (r1) : (r2)                                               X C
DIV m         3/4      24       A ← (A) / (m..m+2)
DIVF m        3/4      64       F ← (F) / (m..m+5)                                        X F
DIVR r1,r2    2        9C       r2 ← (r2) / (r1)                                          X
FIX           1        C4       A ← (F) [convert to integer]                              X F
FLOAT         1        C0       F ← (A) [convert to float]                                X F
HIO           1        F4       Halt I/O channel number (A)                               P X
J m           3/4      3C       PC ← m
JEQ m         3/4      30       PC ← m if CC set to =
JGT m         3/4      34       PC ← m if CC set to >
JLT m         3/4      38       PC ← m if CC set to <
JSUB m        3/4      48       L ← (PC); PC ← m
LDA m         3/4      00       A ← (m..m+2)
LDB m         3/4      68       B ← (m..m+2)                                              X
LDCH m        3/4      50       A [rightmost byte] ← (m)
LDF m         3/4      70       F ← (m..m+5)                                              X F
LDL m         3/4      08       L ← (m..m+2)
LDS m         3/4      6C       S ← (m..m+2)                                              X
LDT m         3/4      74       T ← (m..m+2)                                              X
LDX m         3/4      04       X ← (m..m+2)
LPS m         3/4      D0       Load processor status from information                    P X
                                beginning at address m
MUL m         3/4      20       A ← (A) * (m..m+2)
MULF m        3/4      60       F ← (F) * (m..m+5)                                        X F
MULR r1,r2    2        98       r2 ← (r2) * (r1)                                          X
NORM          1        C8       F ← (F) [normalised]                                      X F
OR m          3/4      44       A ← (A) | (m..m+2)
RD m          3/4      D8       A [rightmost byte] ← data from device specified by (m)    P
RMO r1,r2     2        AC       r2 ← (r1)                                                 X
RSUB          3/4      4C       PC ← (L)
SHIFTL r1,n   2        A4       r1 ← (r1) left shift n bits                               X
SHIFTR r1,n   2        A8       r1 ← (r1) right shift n bits                              X
SIO           1        F0       Start I/O channel number (A)                              P X
SSK m         3/4      EC       Protection key for address m ← (A)                        P X
STA m         3/4      0C       m..m+2 ← (A)
STB m         3/4      78       m..m+2 ← (B)                                              X
STCH m        3/4      54       m ← (A) [rightmost byte]
STF m         3/4      80       m..m+5 ← (F)                                              X F
STI m         3/4      D4       Interval timer value ← (m..m+2)                           P X
STL m         3/4      14       m..m+2 ← (L)
STS m         3/4      7C       m..m+2 ← (S)                                              X
STSW m        3/4      E8       m..m+2 ← (SW)                                             P
STT m         3/4      84       m..m+2 ← (T)                                              X
STX m         3/4      10       m..m+2 ← (X)
SUB m         3/4      1C       A ← (A) - (m..m+2)
SUBF m        3/4      5C       F ← (F) - (m..m+5)                                        X F
SUBR r1,r2    2        94       r2 ← (r2) - (r1)                                          X
SVC n         2        B0       Generate SVC interrupt                                    X
TD m          3/4      E0       Test device specified by (m)                              P C
TIO           1        F8       Test I/O channel number (A)                               P X C
TIX m         3/4      2C       X ← (X) + 1; (X) : (m..m+2)                               C
TIXR r1       2        B8       X ← (X) + 1; (X) : (r1)                                   X C
WD m          3/4      DC       Device specified by (m) ← (A) [rightmost byte]            P


EXAMPLES

1.   Sample data movement operations for SIC
                   LDA           FIVE
                   STA           ALPHA
                   LDCH          CHARZ
                   STCH          C1

     ALPHA         RESW          1
     FIVE          WORD          5
     CHARZ         BYTE          C’Z’
     C1            RESB          1


     Same problem for the SIC/XE
                   LDA           #5
                   STA           ALPHA
                   LDCH          #90           (Load ASCII code for Z)
                   STCH          C1

     ALPHA         RESW          1
     C1            RESB          1




2.   Sample arithmetic operation for SIC
     All arithmetic operations are done using register A with the result being left in
     Register A.

             BETA = (ALPHA + INCR – 1); DELTA = (GAMMA + INCR – 1)

                   LDA           ALPHA
                   ADD           INCR
                   SUB           ONE
                   STA           BETA
                   LDA           GAMMA
                   ADD           INCR
                   SUB           ONE
                   STA           DELTA
     ONE           WORD          1
     ALPHA         RESW          1
     BETA          RESW          1
     GAMMA         RESW          1
     DELTA         RESW          1
     INCR          RESW          1


     Same problem for the SIC/XE
                   LDS           INCR
                   LDA           ALPHA
                   ADDR          S, A
                   SUB           #1
                   STA           BETA
                   LDA           GAMMA
                    ADDR           S, A
                    SUB            #1
                    STA            DELTA

     ALPHA          RESW           1
     BETA           RESW           1
     GAMMA          RESW           1
     DELTA          RESW           1
     INCR           RESW           1




3.   Sample looping and Indexing operations for SIC
     The loop copies one 11-byte character string to another.
                    LDX            ZERO
     MOVECH         LDCH           STR1,X
                    STCH           STR2,X
                    TIX            ELEVEN
                    JLT            MOVECH

     STR1           BYTE           C’TEST STRING’
     STR2           RESB           11
     ZERO           WORD           0
     ELEVEN         WORD           11


     Same problem for the SIC/XE
                    LDT            #11
                    LDX            #0
     MOVECH         LDCH           STR1,X
                    STCH           STR2,X
                    TIXR           T
                    JLT            MOVECH


     STR1           BYTE           C’TEST STRING’
     STR2           RESB           11

4.   Sample looping and Indexing operations for SIC
     The variables ALPHA, BETA and GAMMA are arrays of 100 words each. The loop
     adds the corresponding elements of ALPHA and BETA and stores the result in the
     elements of GAMMA.
     The value in the index register must be incremented by 3 for each iteration of the loop
     because each iteration processes a 3 byte (1 word) element of the array.

                    LDA            ZERO
                    STA            INDEX
     ADDLP          LDX            INDEX
                    LDA            ALPHA, X
                    ADD            BETA, X
                    STA            GAMMA, X
                    LDA            INDEX
                    ADD            THREE
                    STA            INDEX
                    COMP           K300
                    JLT            ADDLP

     INDEX         RESW           1
     ALPHA         RESW           100
     BETA          RESW           100
     GAMMA         RESW           100
     ZERO          WORD           0
     THREE         WORD           3
     K300          WORD           300


     Same problem for the SIC/XE
                   LDS            #3
                   LDT            #300
                   LDX            #0
     ADDLP         LDA            ALPHA, X
                   ADD            BETA, X
                   STA            GAMMA, X
                   ADDR           S, X
                   COMPR          X, T
                   JLT            ADDLP

     ALPHA         RESW           100
     BETA          RESW           100
     GAMMA         RESW           100



5.   Sample input and output operations for SIC
     The same instructions will also work on SIC/XE.
     The program reads 1 byte of data from device F1 and copies it onto device 05.
     If the device is ready the condition code is set to “Less than”; if not ready the
     condition code is set to “equal”.

     INLOOP        TD             INDEV
                   JEQ            INLOOP
                   RD             INDEV
                   STCH           DATA

     OUTLP         TD             OUTDEV
                   JEQ            OUTLP
                   LDCH           DATA
                   WD             OUTDEV

     INDEV         BYTE           X’F1’
     OUTDEV        BYTE           X’05’
     DATA          RESB           1

6.   Sample Subroutine call and record input operations for SIC
     The program reads a 100 byte record from an input device into memory. The read
     operation is placed in a subroutine which is called from the main program by using the
     JSUB instruction. At the end of the subroutine there is an RSUB instruction which
     returns control to the instruction that follows the JSUB.
     The READ subroutine also consists of a loop. Each execution of this loop reads one
     byte of data from the input device. The bytes that are read are stored in a 100 byte
     buffer area labeled RECORD.

               JSUB       READ

     READ      LDX        ZERO
     RLOOP     TD         INDEV
               JEQ        RLOOP
               RD         INDEV
               STCH       RECORD,X
               TIX        K100
               JLT        RLOOP
               RSUB

     INDEV     BYTE       X’F1’
     RECORD    RESB       100
     ZERO      WORD       0
     K100      WORD       100


     Same problem for the SIC/XE
               JSUB       READ


     READ      LDX        #0
               LDT        #100
     RLOOP     TD         INDEV
               JEQ        RLOOP
               RD         INDEV
               STCH       RECORD,X
               TIXR       T
               JLT        RLOOP
               RSUB

     INDEV     BYTE       X’F1’
     RECORD    RESB       100


                                  ASSEMBLERS
Basic Assembler Functions
An assembler is a program that accepts an assembler language
program as input and produces its machine language equivalent along with
information for the loader.

[Figure: Assembler language program → Assembler → Machine language program (to linker) and Listing]

COPY      START        1000             Copy file from input to output
FIRST     STL          RETADR           Save Return Address
CLOOP     JSUB         RDREC            Read input record
          LDA          LENGTH           Test for EOF (length = 0)
          COMP         ZERO
          JEQ          ENDFIL           Exit if EOF found
          JSUB         WRREC            Write output record
          J            CLOOP            Loop
ENDFIL    LDA          EOF              Insert end of file marker
          STA          BUFFER
          LDA          THREE            Set length = 3
          STA          LENGTH
          JSUB         WRREC            Write EOF
          LDL          RETADR           Get Return Address
          RSUB                          Return to Caller
EOF       BYTE         C’EOF’
THREE     WORD         3
ZERO      WORD         0
RETADR    RESW         1
LENGTH    RESW         1                Length of Record
BUFFER    RESB         4096             4096 Byte Buffer Area
                        Subroutine to read record into Buffer
RDREC      LDX         ZERO             Clear Loop Counter
           LDA         ZERO             Clear A to Zero
RLOOP      TD          INPUT            Test input device
           JEQ         RLOOP            Loop until ready
           RD          INPUT            Read character into register A
           COMP        ZERO             Test for End of Record (X ‘00’)
           JEQ         EXIT             Exit Loop if End of Record
           STCH        BUFFER,X         Store character in buffer
           TIX         MAXLEN           Loop unless MAX Length
           JLT         RLOOP               Has been reached
EXIT       STX         LENGTH           Save Record Length
           RSUB                         Return to Caller
INPUT      BYTE        X’F1’            Code for Input Device
MAXLEN     WORD        4096
                        Subroutine to write record from Buffer
WRREC      LDX         ZERO             Clear Loop counter
WLOOP      TD          OUTPUT           Test output device
           JEQ         WLOOP            Loop until ready
           LDCH        BUFFER,X         Get character from buffer
           WD          OUTPUT           Write Character
           TIX         LENGTH           Loop Until all characters
           JLT         WLOOP                 have been written
           RSUB                         Return to Caller
OUTPUT     BYTE        X’05’            Code for output Device
           END         FIRST


The example above shows an assembler language program that
contains a main routine that reads records from an input device (code F1)
and copies them to an output device (code 05).
The main routine calls subroutine RDREC to read a record into a buffer
and subroutine WRREC to write the record from the buffer to the output
device. Only one character is transferred at a time. The end of each
record is marked with a null character (hexadecimal 00). If a record is
longer than the length of the buffer (4096 bytes) only the first 4096 bytes
are copied.
The end of a file to be copied is indicated by a zero length record.
When the end of file is detected the program writes EOF on the output
device and terminates by executing an RSUB instruction.


A simple SIC Assembler
The code for the program is rewritten below with the generated object
code for each statement.
The translation of the assembler program to object code needs the
following:
1.    Convert mnemonic operation codes to their machine language
      equivalents, e.g. translate STL to 14.
2.    Convert symbolic operands to their equivalent machine addresses,
      e.g. translate RETADR to 1033.
3.    Build the machine instructions in the correct format.
4.    Convert the data constants into their internal machine
      representations, e.g. translate EOF to 454F46.
5.    Write the object program and the assembly listing.
All these functions except 2 can easily be accomplished by processing
the source program one line at a time. The translation of addresses is
more complicated because a statement may refer to a symbol that is
defined later in the program (a forward reference). For example, the first
instruction, STL RETADR, uses RETADR before its address is known.
Because of this, most assemblers make 2 passes over the source
program.
1000       COPY                 START          1000
1000       FIRST                STL            RETADR            141033
1003       CLOOP                JSUB           RDREC             482039
1006                            LDA            LENGTH            001036
1009                            COMP           ZERO              281030
100C                            JEQ            ENDFIL            301015
100F                            JSUB           WRREC             482061
1012                            J              CLOOP             3C1003
1015       ENDFIL               LDA            EOF               00102A
1018                            STA            BUFFER            0C1039
101B                            LDA            THREE             00102D
101E                            STA            LENGTH            0C1036
1021                            JSUB           WRREC             482061
1024                            LDL            RETADR            081033
1027                            RSUB                             4C0000
102A       EOF                  BYTE           C’EOF’            454F46
102D       THREE                WORD           3                 000003
1030       ZERO                 WORD           0                 000000
1033       RETADR               RESW           1
1036       LENGTH               RESW           1
1039       BUFFER               RESB           4096
                        Subroutine to read record into Buffer
2039       RDREC                 LDX           ZERO              041030
203C                             LDA           ZERO              001030
203F       RLOOP                 TD            INPUT             E0205D
2042                             JEQ           RLOOP             30203F
2045                             RD            INPUT             D8205D
2048                             COMP          ZERO              281030
204B                             JEQ           EXIT              302057
204E                             STCH          BUFFER,X          549039
2051                             TIX           MAXLEN            2C205E
2054                             JLT           RLOOP             38203F
2057       EXIT                  STX           LENGTH            101036
205A                             RSUB                            4C0000
205D       INPUT                 BYTE          X’F1’             F1
205E       MAXLEN                WORD          4096              001000
                        Subroutine to write record from Buffer
2061       WRREC                LDX             ZERO             041030
2064       WLOOP                TD              OUTPUT           E02079
2067                            JEQ             WLOOP            302064
206A                            LDCH            BUFFER,X         509039
206D                            WD              OUTPUT           DC2079
2070                            TIX             LENGTH           2C1036
2073                            JLT             WLOOP            382064
2076                            RSUB                             4C0000
2079       OUTPUT               BYTE            X’05’            05
                                END             FIRST
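
The object code column in the listing above follows directly from the SIC
instruction format: an 8 bit opcode, the index bit x and a 15 bit address. The
Python sketch below is only an illustration; the function names and the small
OPTAB and SYMTAB dictionaries are hypothetical and hold just the few entries
needed for the lines that are tried out.

```python
# Opcodes from the instruction set table and addresses from the listing
# (only the entries needed for this illustration).
OPTAB = {"STL": 0x14, "LDA": 0x00, "STCH": 0x54, "RSUB": 0x4C}
SYMTAB = {"RETADR": 0x1033, "LENGTH": 0x1036, "BUFFER": 0x1039}


def assemble_instruction(mnemonic, operand=""):
    """Return the 3-byte SIC object code as a 6-digit hex string."""
    opcode = OPTAB[mnemonic]
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    address = SYMTAB[symbol] if symbol else 0      # RSUB has no operand
    word = (opcode << 16) | (0x8000 if indexed else 0) | address
    return f"{word:06X}"


def assemble_byte(constant):
    """Return the object code for a BYTE constant such as C'EOF' or X'F1'."""
    kind, body = constant[0], constant[2:-1]       # text between the quotes
    if kind == "C":
        return "".join(f"{ord(ch):02X}" for ch in body)   # ASCII codes
    if kind == "X":
        return body.upper()                        # already hexadecimal
    raise ValueError("unknown constant type: " + constant)


print(assemble_instruction("STL", "RETADR"))     # first instruction of COPY
print(assemble_instruction("STCH", "BUFFER,X"))  # indexed addressing sets bit x
print(assemble_instruction("RSUB"))
print(assemble_byte("C'EOF'"))
print(assemble_byte("X'F1'"))
```

Setting bit x adds 8000 (hexadecimal) to the address field, which is why
STCH BUFFER,X assembles to 549039 although BUFFER is at address 1039.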



Pass 1 (Define symbols)
1.    Scan the source program for label definitions and assign
      addresses to all statements in the program.
2.    Save the addresses assigned to all labels for use in pass 2.
3.    Perform some processing of the assembler directives, e.g.
      determining the length of data areas defined by BYTE, RESW etc.


Pass 2 (assemble the instructions and generate machine code)
1.    Assemble instructions (translating operation codes and looking up
      addresses)
2.    Generate data values defined by BYTE, WORD, etc.
3.    Perform processing of the directives not done during pass 1.
4.    Write the object program and assembly listing onto some output
      device. The object program will later be loaded into memory for
      execution.


A simple object program contains three types of records:
1.    The Header record contains the program name, the starting address of
      the program and the length of the whole object program.
2.    The Text record contains the translated instructions and the data
      of the program together with an indication of the addresses where
      they are to be loaded.
3.    The End record marks the end of the object program and specifies
      the address in the program where execution is to begin.


Header Record
      Col. 1       H
      Col 2-7      Program Name
      Col 8-13     Starting address of the object program
      Col 14-19    Length of object program in bytes.

Text Record
      Col. 1       T
      Col 2-7      Starting address for object code in this record
      Col 8-9      Length of object code in this record in bytes.
      Col 10-69    Object code in hexadecimal.

End Record
      Col. 1       E
      Col 2-7      Address of first executable instruction in object
                   program.

H^COPY  ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030^302057^549039^2C205E^38203F
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064^509039^DC2079^2C1036
T^002073^07^382064^4C0000^05
E^001000
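
The way the object code is divided into Text records can be sketched as
follows. This Python fragment is only an illustration and its function names
are hypothetical. It assumes that a Text record holds at most 30 bytes of
object code (columns 10-69) and that a new record is started whenever the
addresses are not contiguous, as happens after RESW and RESB. The ^ separators
are used only to make the fields visible, as in the object program above.

```python
def header_record(name, start, length):
    """Header: program name padded to 6 columns, start address, length."""
    return f"H^{name:<6}^{start:06X}^{length:06X}"


def text_records(entries, max_bytes=30):
    """Group (address, object_code_hex) pairs, given in address order,
    into Text records of at most max_bytes bytes each."""
    records = []
    start, codes, length = None, [], 0
    for address, code in entries:
        size = len(code) // 2                      # two hex digits per byte
        contiguous = start is not None and address == start + length
        if codes and (not contiguous or length + size > max_bytes):
            records.append((start, length, codes))  # close the current record
            start, codes, length = None, [], 0
        if start is None:
            start = address
        codes.append(code)
        length += size
    if codes:
        records.append((start, length, codes))
    return ["^".join(["T", f"{s:06X}", f"{n:02X}"] + c) for s, n, c in records]


def end_record(first_instruction):
    """End: address of the first executable instruction."""
    return f"E^{first_instruction:06X}"


# Object code of the last part of the listing (locations 2057 to 2079).
TAIL = [
    (0x2057, "101036"), (0x205A, "4C0000"), (0x205D, "F1"), (0x205E, "001000"),
    (0x2061, "041030"), (0x2064, "E02079"), (0x2067, "302064"), (0x206A, "509039"),
    (0x206D, "DC2079"), (0x2070, "2C1036"), (0x2073, "382064"), (0x2076, "4C0000"),
    (0x2079, "05"),
]

print(header_record("COPY", 0x1000, 0x107A))
for record in text_records(TAIL):
    print(record)
print(end_record(0x1000))
```

The instruction at 2073 would make the first of these records 31 bytes long,
so it starts a new record. This gives the last two Text records of the object
program.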


Assembler Tables and Logic
Two major internal tables are used: The Operation Code table (OPTAB)
and the Symbol Table (SYMTAB).
OPTAB is used to look up mnemonic operation codes and translate
them into their machine language equivalents. SYMTAB is used to store
addresses assigned to labels.
A location counter LOCCTR is used to help in the assignment of
addresses. It is initialized to the beginning address specified in the
START statement. After each source statement is processed the length
of the assembled instruction or data area to be generated is added to the
LOCCTR. Whenever a label in the source program is encountered the
current value of the LOCCTR gives the address to be associated with
that label and it is then said to be defined. If the same label is defined
twice in the source module, an error occurs.
During pass 1 OPTAB is used to look up and validate operation codes in
the source program. In pass 2 it is used to translate the opcodes to
machine language.
The SYMTAB includes the name and address for each label in the
source program together with flags to indicate error conditions.
In pass 1 the labels are entered into the SYMTAB as they are
encountered in the source program along with their assigned addresses.
In the second pass symbols used as operands are looked up in
SYMTAB to obtain the addresses to be inserted in the assembler
instructions.
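
The use of LOCCTR, OPTAB and SYMTAB in pass 1 can be sketched as follows. This
is a minimal illustration in Python and not a complete assembler: the function
name pass1, the tuple layout of the statements and the shortened source
fragment are all assumptions made for the example, so the addresses it prints
differ from those in the full listing of COPY.

```python
# Mnemonics that assemble to 3-byte SIC instructions (a small subset).
OPTAB = {"STL": 0x14, "LDA": 0x00, "RSUB": 0x4C}


def pass1(statements):
    """Build SYMTAB from (label, opcode, operand) tuples.

    Returns (symtab, program_length).
    """
    symtab = {}
    start = locctr = 0
    for label, opcode, operand in statements:
        if opcode == "START":
            start = locctr = int(operand, 16)   # operand is a hex address
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError("duplicate symbol: " + label)
            symtab[label] = locctr              # the label is now defined
        # Add the length of the instruction or data area to LOCCTR.
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            body = operand[2:-1]                # text between the quotes
            locctr += len(body) if operand[0] == "C" else len(body) // 2
        else:
            raise ValueError("invalid operation code: " + opcode)
    return symtab, locctr - start


# Hypothetical shortened fragment in the style of the COPY program.
SOURCE = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "LDA", "THREE"),
    ("", "RSUB", ""),
    ("EOF", "BYTE", "C'EOF'"),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("INPUT", "BYTE", "X'F1'"),
    ("", "END", "FIRST"),
]

symtab, length = pass1(SOURCE)
for name, address in symtab.items():
    print(f"{name:<8}{address:04X}")
print(f"length  {length:04X}")
```

Note that RETADR and THREE are used as operands before they are defined. Pass 1
does not need their addresses; it only records them in SYMTAB for pass 2.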


Machine Dependent Assembler Features
We now consider assembler features that depend on the machine, by
comparing the previous program, which ran on the standard SIC machine,
with a version written for the SIC/XE machine. We examine the effect of the
hardware on the structure and functions of the assembler.
0000     COPY      START        0
0000     FIRST     STL          RETADR             17202D
0003               LDB          #LENGTH            69202D
                   BASE         LENGTH
0006     CLOOP     +JSUB        RDREC              4B101036
000A               LDA          LENGTH             032026
000D               COMP         #0                 290000
0010               JEQ          ENDFIL             332007
0013               +JSUB        WRREC              4B10105D
0017               J            CLOOP              3F2FEC
001A     ENDFIL    LDA          EOF                032010
001D               STA          BUFFER             0F2016
0020               LDA          #3                 010003
0023               STA          LENGTH             0F200D
0026               +JSUB        WRREC              4B10105D
002A               J            @RETADR            3E2003
002D     EOF       BYTE         C’EOF’             454F46
0030     RETADR    RESW         1
0033     LENGTH    RESW         1
0036     BUFFER    RESB         4096

                       Subroutine to read record into Buffer
1036      RDREC     CLEAR         X               B410
1038                CLEAR         A               B400
103A                CLEAR         S               B440
103C                +LDT          #4096           75101000
1040      RLOOP     TD            INPUT           E32019
1043                JEQ           RLOOP           332FFA
1046                RD            INPUT           DB2013
1049                COMPR         A,S             A004
104B                JEQ           EXIT            332008
104E                STCH          BUFFER,X        57C003
1051                TIXR          T               B850
1053                JLT           RLOOP           3B2FEA
1056      EXIT      STX           LENGTH          134000
1059                RSUB                          4F0000
105C      INPUT     BYTE          X’F1’           F1
                       Subroutine to write record from Buffer
105D      WRREC     CLEAR         X                B410
105F                LDT           LENGTH           774000
1062      WLOOP     TD            OUTPUT           E32011
1065                JEQ           WLOOP            332FFA
1068                LDCH          BUFFER,X         53C003
106B                WD            OUTPUT           DF2008
106E                TIXR          T                B850
1070                JLT           WLOOP            3B2FEF
1073                RSUB                           4F0000
1076      OUTPUT    BYTE          X’05’            05
                    END           FIRST
The program above shows the previous program rewritten using the
SIC/XE features. Indirect addressing is indicated by adding the prefix
@ to the operand. Immediate operands are denoted with the prefix #.
Instructions that refer to memory are usually assembled using either
program counter relative or base relative mode. The assembler
directive BASE is used in conjunction with base relative addressing. If
the displacement required is too large to fit into a 3 byte instruction, the 4
byte extended format (Format 4) is used. It is specified with the prefix +.
The main difference between the two versions is the use of
register-register instructions wherever possible, e.g. COMP ZERO is
replaced by COMPR A, S and TIX MAXLEN by TIXR T.
In addition, immediate and indirect addressing have been used (COMP #0,
LDA #3, J @RETADR).
These changes result in increased execution speed of the program.
Register-register instructions are faster than the corresponding
register-memory operations since they are shorter and do not require another
memory reference. With immediate addressing the operand is already part
of the instruction and need not be fetched from memory. The use of indirect
addressing often avoids the need for another instruction.


1.    Instruction Formats and addressing Modes
For translation of register-register instructions, e.g. CLEAR and COMPR,
the assembler simply converts the mnemonic operation code to machine
language using the OPTAB and changes each register mnemonic to its
numeric equivalent. The conversion of register mnemonics to
numbers can be done with a separate table or by using SYMTAB, which
can be preloaded with the register names.
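
As an illustration, the Format 2 object codes in the listing can be produced
with a separate register table. The sketch below is written in Python; the
names REGISTERS and format2 are hypothetical, and the register numbers are the
standard SIC/XE numbers (A = 0, X = 1, L = 2, B = 3, S = 4, T = 5, F = 6).

```python
# Register numbers of SIC/XE and a few Format 2 opcodes from the table.
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6}
OPTAB = {"CLEAR": 0xB4, "COMPR": 0xA0, "TIXR": 0xB8}


def format2(mnemonic, r1, r2=None):
    """Return the 2-byte object code: opcode, then r1 and r2 (4 bits each)."""
    second = REGISTERS[r2] if r2 else 0     # unused r2 field is left as 0
    byte2 = (REGISTERS[r1] << 4) | second
    return f"{OPTAB[mnemonic]:02X}{byte2:02X}"


print(format2("CLEAR", "X"))        # CLEAR X
print(format2("COMPR", "A", "S"))   # COMPR A,S
print(format2("TIXR", "T"))         # TIXR T
```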
Most of the register-memory instructions are assembled using either
program counter relative addressing or base relative addressing. The
assembler must in either case calculate the displacement to be
assembled as part of the object code. This is computed so that the
correct target address results when the displacement is added to the
contents of the program counter or the base register.
If neither program counter relative nor base relative addressing can be
used (because the displacement is too large), Format 4 is used. It holds the
full address, so there is no displacement to be calculated. For example, in
the instruction
       0006           CLOOP          +JSUB       RDREC         4B101036
the operand address is 1036. This full address is stored in the instruction
with bit e set to 1 to indicate extended instruction format.
The instruction
0000          FIRST       STL   RETADR     17202D
is an example of a typical program counter relative assembly. During
execution the PC will contain the address of the next instruction (LDB),
which is 0003. From SYMTAB, RETADR is assigned the address 0030.
The displacement needed is 0030 - 0003 = 2D.
At execution time the target address calculation performed will be (PC) +
disp, resulting in the correct address 0030.
Bit p is set to 1 to indicate program counter relative addressing, making
the last two bytes of the instruction 202D.
Bits n and i are both set to 1, indicating neither indirect nor immediate
addressing, which makes the first byte 17 instead of 14.
The instruction
0017          J       CLOOP     3F2FEC
is another example of program counter relative assembly. The operand
address is 0006. During execution the PC will contain 001A. The
displacement required is 0006 - 001A = -14 (hexadecimal), which is FEC as
a 2’s complement number in a 12 bit field.
The displacement calculation for base relative addressing is much the
same. The difference is that the assembler knows what the contents of
the program counter will be at execution time, whereas the base register
is under the control of the programmer. The programmer must therefore
tell the assembler what the base register will contain. The statement
BASE LENGTH, for example, informs the assembler that the base register
will contain the address of LENGTH, which is loaded by the preceding
instruction LDB #LENGTH.
The instruction
       104E                          STCH        BUFFER,X      57C003
is an example of base relative assembly.
Register B contains 0033. The address of BUFFER is 0036. The
displacement is therefore 0036 - 0033 = 3. Bits x and b are set to 1 to
indicate indexed and base relative addressing.
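The displacement calculations above can be sketched in Python as follows. The function name encode_format3 and its parameters are assumptions made for this illustration; the sketch tries program counter relative addressing first and falls back to base relative addressing, which is one reasonable order and not necessarily the only one.

```python
def encode_format3(opcode, target, pc, base=None, n=1, i=1, x=0):
    """Assemble a Format 3 instruction and return its object code in hex.

    pc   -- address of the NEXT instruction (what the PC holds at execution)
    base -- contents of register B as declared by BASE, or None if unknown
    """
    disp = target - pc
    if -2048 <= disp <= 2047:                  # fits a signed 12-bit field
        b, p = 0, 1
    elif base is not None and 0 <= target - base <= 4095:
        disp, b, p = target - base, 1, 0       # unsigned 12-bit field
    else:
        raise ValueError("displacement out of range; Format 4 is needed")
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1)     # bit e is 0 in Format 3
    return f"{first_byte:02X}{flags:X}{disp & 0xFFF:03X}"

print(encode_format3(0x14, 0x0030, pc=0x0003))                    # STL RETADR    -> 17202D
print(encode_format3(0x3C, 0x0006, pc=0x001A))                    # J CLOOP       -> 3F2FEC
print(encode_format3(0x54, 0x0036, pc=0x1051, base=0x0033, x=1))  # STCH BUFFER,X -> 57C003
```

Masking the displacement with 0xFFF is what turns -14 (hexadecimal) into the 12-bit 2's complement value FEC.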
Immediate addressing just requires converting the operand to its internal
representation and inserting it into the instruction.
The instruction
       0020                          LDA         #3            010003
is an example of immediate addressing. The operand is stored within the
instruction as 003. Bit i is set to 1 (and bit n to 0) to indicate
immediate addressing.
For the instruction
       103C                          +LDT        #4096         75101000
the operand is too large to fit into the 12-bit displacement field, so
extended format is used.
The instruction
       0003                          LDB         #LENGTH       69202D
is also immediate addressing. The value of a symbol is the address
assigned to it, so the instruction loads the address of LENGTH into
register B. Here program counter relative addressing is combined with
immediate addressing.
The instruction
       002A                          J           @RETADR       3E2003
combines program counter relative and indirect addressing.
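As a rough illustration of how the n, i and e bits are set for immediate operands and extended format, consider the Python sketch below. The helper names (first_byte, encode_format4, encode_immediate) are hypothetical, and the `extended` argument stands for the + prefix written by the programmer.

```python
def first_byte(opcode, mode=""):
    """Merge the n and i bits into the opcode: '#' immediate, '@' indirect."""
    n, i = {"#": (0, 1), "@": (1, 0), "": (1, 1)}[mode]
    return (opcode & 0xFC) | (n << 1) | i

def encode_format4(opcode, address, mode=""):
    """Extended format: bit e = 1 followed by a 20-bit address field."""
    return f"{first_byte(opcode, mode):02X}1{address & 0xFFFFF:05X}"

def encode_immediate(opcode, value, extended=False):
    """Assemble an instruction whose operand is an immediate constant."""
    if extended:                                   # programmer wrote the + prefix
        return encode_format4(opcode, value, mode="#")
    if not 0 <= value <= 4095:
        raise ValueError("constant does not fit in 12 bits; use the + prefix")
    return f"{first_byte(opcode, '#'):02X}0{value:03X}"   # x = b = p = e = 0

print(encode_format4(0x48, 0x1036))                    # +JSUB RDREC -> 4B101036
print(encode_immediate(0x00, 3))                       # LDA #3      -> 010003
print(encode_immediate(0x74, 4096, extended=True))     # +LDT #4096  -> 75101000
```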


2.      Program Relocation
It is usually impossible to plan in advance where a program will be
executed in memory, because memory is shared with other programs that
are running at the same time. In such a case the actual starting address
of a program is not known until load time.
The SIC program on page 41 is an example of an absolute program. It
must be loaded at address 1000 (the address that was specified at
assembly time) in order to execute properly.
Example: In the instruction
       101B                          LDA         THREE         00102D
register A is to be loaded from memory address 102D. If an attempt is
made to load the program at address 2000 instead of 1000, the address
102D will not contain the expected value.
The address portion of this instruction must be changed before the
program can execute at address 2000.
An object program that contains the information necessary to perform
this kind of modification is called a relocatable program.
For the SIC/XE program on page 44, loaded at address 0000, the JSUB
instruction is at address 0006. The address field of this instruction
contains 01036, the address of the instruction labeled RDREC. If the
program is loaded at address 5000, the address of RDREC becomes 6036 and
the JSUB instruction must be modified as shown below to contain this new
address. Likewise, if the program is loaded at address 7420, RDREC is
at address 8456.
No matter where the program is loaded, RDREC will always be 1036
bytes past the starting address.
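
Program loaded at    +JSUB RDREC instruction     RDREC      End of
(starting address)   Address    Contents         address    program
0000                 0006       4B101036         1036       1076
5000                 5006       4B106036         6036       6076
7420                 7426       4B108456         8456       8496

(The instruction at RDREC itself, B410, is the same in all three cases.)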

The relocation problem is solved in the following way:
1.    When the assembler generates the object code for the JSUB
      instruction it will insert the address of RDREC relative to the start of
      the program.
2.    The assembler will also produce a command for the loader
      instructing it to add the beginning address of the program to the
      address field in the JSUB instruction at load time.

The command for the loader must also be a part of the object program.
This is accomplished by having a modification record with the format:
      Col 1         M
      Col 2-7       Starting location of the address field to be modified,
                    relative to the beginning of the program
      Col 8-9       Length of the address field to be modified, in
                    half-bytes


For the JSUB instruction the modification record would be M00000705.
It specifies that the beginning address of the program is to be added to
a field that begins at address 000007 and is 5 half-bytes in length.
Similar modification records are needed for the +JSUB instructions at
addresses 0013 and 0026.
The other instructions in the program need not be modified. In some
cases the operand is not a memory address at all, e.g. CLEAR S or
LDA #3. In other cases the operand is specified using program counter
relative or base relative addressing, so the displacement stays the same
wherever the program is loaded.
The only parts of the program that require modification at load time are
those that specify direct addresses.
The object code for the above program would be
H^COPY ^000000^001077
T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010
T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46
T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850
T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850
T^001070^07^3B2FEF^4F0000^05
M^000007^05
M^000014^05
M^000027^05
E^000000
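What the loader does with a modification record can be sketched as follows. This is a minimal illustration only: the function name apply_modification and the use of a Python bytearray as the memory image are assumptions, and only the first text record is loaded, so the third modification record (M^000027^05) is not exercised.

```python
def apply_modification(image, start, half_bytes, load_address):
    """Add load_address to the address field described by one M record.

    start      -- offset of the first byte of the field (cols 2-7)
    half_bytes -- length of the field in half-bytes (cols 8-9)
    """
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1            # the low-order half-bytes
    relocated = (field & ~mask) | ((field + load_address) & mask)
    image[start:start + nbytes] = relocated.to_bytes(nbytes, "big")

# First text record of the object program (29 bytes starting at 000000).
image = bytearray.fromhex(
    "17202D 69202D 4B101036 032026 290000 332007 4B10105D 3F2FEC 032010")

for start in (0x07, 0x14):                        # M^000007^05 and M^000014^05
    apply_modification(image, start, 5, load_address=0x5000)

print(image[0x06:0x0A].hex().upper())             # 4B106036
print(image[0x13:0x17].hex().upper())             # 4B10605D
```

Because the field is 5 half-bytes long, it starts in the middle of a byte. The mask keeps the high-order half-byte (which holds the x, b, p and e flags) unchanged while the 20-bit address is updated.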


Machine Independent Assembler Features
These are features that are not closely related to machine structure.
The presence or absence of such features depends more on issues like
programmer's convenience and software environment than on machine
structure.
0000         COPY     START            0
0000         FIRST    STL              RETADR                            17202D
0003                  LDB              #LENGTH                           69202D
                      BASE             LENGTH
0006         CLOOP    +JSUB            RDREC                             4B101036
000A                  LDA              LENGTH                            032026
000D                  COMP             #0                                290000
0010                  JEQ              ENDFIL                            332007
0013                  +JSUB            WRREC                             4B10105D
0017                  J                CLOOP                             3F2FEC
001A         ENDFIL   LDA              =C’EOF’                           032010
001D                  STA              BUFFER                            0F2016
0020                  LDA              #3                                010003
0023                  STA              LENGTH                            0F200D
0026                  +JSUB            WRREC                             4B10105D
002A                  J                @RETADR                           3E2003
                      LTORG
002D         *        =C’EOF’                                            454F46
0030         RETADR   RESW            1
0033         LENGTH   RESW            1
0036         BUFFER   RESB            4096
1036         BUFEND   EQU             *
1000         MAXLEN   EQU             BUFEND-BUFFER
                                Subroutine to read record into Buffer
1036         RDREC    CLEAR           X                                  B410
1038                  CLEAR           A                                  B400
103A                  CLEAR           S                                  B440
103C                  +LDT            #4096                              75101000
1040         RLOOP    TD              INPUT                              E32019
1043                  JEQ             RLOOP                              332FFA
1046                  RD              INPUT                              DB2013
1049                  COMPR           A,S                                A004
104B                  JEQ             EXIT                               332008
104E                  STCH            BUFFER,X                           57C003
1051                  TIXR            T                                  B850
1053                  JLT             RLOOP                              3B2FEA
1056         EXIT     STX             LENGTH                             134000
1059                  RSUB                                               4F0000
105C         INPUT    BYTE             X’F1’                             F1
                                Subroutine to write record from Buffer
105D         WRREC    CLEAR            X                                 B410
105F                  LDT              LENGTH                            774000
1062         WLOOP    TD               =X’05’                            E32011
1065                  JEQ              WLOOP                             332FFA
1068                  LDCH             BUFFER,X                          53C003
106B                  WD               =X’05’                            DF2008
106E                  TIXR             T                                 B850
1070                  JLT              WLOOP                             3B2FEF
1073                  RSUB                                               4F0000
                      END              FIRST
1076         *        =X’05’                                             05



1.     Literals
Writing the value of a constant operand as part of the instruction that
uses it avoids having to define the constant elsewhere in the program.
Such an operand is called a literal because the value is stated literally in
the instruction.
For example, the literal in the statement
       001A           ENDFIL         LDA         =C'EOF'       032010
specifies a 3-byte operand whose value is the character string EOF.
Similarly, the literal in
       1062           WLOOP          TD          =X'05'        E32011
specifies a 1-byte operand with the hexadecimal value 05.
There is a difference between a literal and immediate addressing. With
immediate addressing the operand value is assembled as part of the
machine instruction. With a literal the assembler generates the specified
value as a constant at some other memory location. The address of this
generated constant is used as the target address for the machine
instruction.
All the literal operands used in a program are gathered together into
one or more literal pools. Normally the literal pool is placed at the
end of the program, and the listing shows the assigned addresses and the
generated data values.
In other cases it is desirable to place a literal pool at some other
location in the object program. This is done with the directive LTORG.
When the assembler encounters LTORG it creates a literal pool that
contains all the literal operands used since the previous LTORG or the
beginning of the program.
The assembler handles literals by means of a literal table, LITTAB. For
each literal used the table contains the literal name, the operand value
and length, and the address assigned to the operand when it is placed in
the literal pool.
During pass 1 the assembler enters each literal it finds into LITTAB
(unless it is already there) and assigns addresses to the pending
literals when it reaches an LTORG statement or the end of the program.
During pass 2 the data values specified by the literals in each literal
pool are inserted at the appropriate places in the object program.
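A possible shape for LITTAB and its pass 1 processing is sketched below in Python. The class and method names (LiteralTable, add, place_pool) are invented for this illustration, and a real assembler would also handle errors and literal forms other than =C'...' and =X'...'.

```python
def literal_bytes(literal):
    """Convert =C'...' or =X'...' into the constant it denotes."""
    kind, text = literal[1], literal[3:-1]
    return text.encode("ascii") if kind == "C" else bytes.fromhex(text)

class LiteralTable:
    def __init__(self):
        self.entries = {}    # literal name -> {"value": bytes, "address": int or None}

    def add(self, literal):
        """Pass 1: enter a literal operand the first time it is seen."""
        self.entries.setdefault(
            literal, {"value": literal_bytes(literal), "address": None})

    def place_pool(self, locctr):
        """At LTORG or END: give addresses to pending literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr

littab = LiteralTable()
littab.add("=C'EOF'")                    # LDA =C'EOF'
locctr = littab.place_pool(0x002D)       # LTORG met with LOCCTR = 002D
littab.add("=X'05'")                     # TD =X'05'
littab.add("=X'05'")                     # WD =X'05' (same literal, entered once)
locctr = littab.place_pool(0x1076)       # END met with LOCCTR = 1076
print(hex(littab.entries["=C'EOF'"]["address"]))   # 0x2d
print(hex(littab.entries["=X'05'"]["address"]))    # 0x1076
```

The addresses 002D and 1076 are the ones shown for the two literal pools in the program listing.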


2.    Symbol Defining Statements
Most assemblers provide the EQU (equate) directive, which allows the
programmer to define symbols and specify their values. The general
form of the statement is
      Symbol      EQU          Value
It defines the given symbol (enters it into the SYMTAB) and assigns it
the specified value. The value may be a constant or an expression
involving constants and previously defined symbols.
Example:
Instead of the statement
      +LDT        #4096
we could include the statement
      MAXLEN      EQU          4096
and then write in the program
      +LDT        #MAXLEN
EQU is also used to define mnemonic names for registers, e.g. A, X, L.
On machines that have only general purpose registers, the programmer
may, for example, name the registers used as base and index registers:
            BASE        EQU            R1
            INDEX       EQU            R2
Another common directive that can be used to assign values to symbols
is ORG (origin). Its form is
            ORG         VALUE
where VALUE is a constant or an expression involving constants and
previously defined symbols.
When the assembler encounters this statement it resets the LOCCTR to
the specified value. Because the values of symbols used as labels are
taken from LOCCTR, ORG affects the values of all labels defined until
the next ORG.


3.    Expressions
Most assemblers allow the use of expressions wherever a single
operand is permitted. Each expression must be evaluated by the
assembler to produce a single operand address or value. Individual
terms in the expression may be constants, user-defined symbols or
special terms. The most common special term is the current value of
the location counter (designated by *), which represents the value of
the next unassigned memory location.
The statement
        BUFEND    EQU    *
in the previous program gives BUFEND a value that is the address of the
next byte after the buffer area.
Expressions are classified as either absolute expressions or relative
expressions depending upon the type of value they produce. A term is
absolute if its value is independent of the program location (e.g. a
constant) and relative if its value is an address that changes when the
program is relocated (e.g. a label). An expression that contains only
absolute terms is an absolute expression. An absolute expression may
also contain relative terms, so long as the relative terms occur in
pairs and the terms in each pair have opposite signs.
A relative expression is one in which all the relative terms except one
can be paired in this way. The remaining unpaired relative term must
have a positive sign. (None of the relative terms may enter into a
multiplication or division operation.) In the expression
        MAXLEN    EQU    BUFEND-BUFFER
both BUFEND and BUFFER are relative terms, but the expression
represents an absolute value: the difference between the two addresses.
Expressions such as BUFEND+BUFFER, 100-BUFFER or 3*BUFFER represent
neither absolute values nor locations within the program, and are
therefore errors.
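The pairing rule can be checked mechanically. The Python sketch below is one simple way to do it, under the assumption that an expression has already been broken into signed terms; the function name classify and the term representation are made up for this example, and multiplication or division of relative terms (always an error) is not modelled.

```python
def classify(terms):
    """Classify an expression given as a list of (sign, kind) terms.

    sign is +1 or -1; kind is 'abs' for a constant or absolute symbol
    and 'rel' for a relocatable symbol such as a label.
    """
    # Relative terms of opposite sign cancel in pairs, so only the net
    # count of relative terms matters.
    net = sum(sign for sign, kind in terms if kind == "rel")
    if net == 0:
        return "absolute"
    if net == 1:
        return "relative"      # exactly one unpaired, positive relative term
    return "invalid"

print(classify([(+1, "rel"), (-1, "rel")]))   # BUFEND-BUFFER -> absolute
print(classify([(+1, "rel"), (+1, "abs")]))   # BUFFER+4      -> relative
print(classify([(+1, "rel"), (+1, "rel")]))   # BUFEND+BUFFER -> invalid
print(classify([(+1, "abs"), (-1, "rel")]))   # 100-BUFFER    -> invalid
```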


4. Program Blocks
Program blocks are segments of code that are rearranged within a single
object program unit.
In the example below three program blocks are used. The first
(unnamed) block contains the executable instructions. The second block,
CDATA, contains all data areas that are a few words or less in length.
The third block, CBLKS, contains all data areas that consist of larger
blocks of memory.
The assembler directive USE indicates which portions of the source
program belong to the various blocks.
0000   0      COPY          START          0
0000   0      FIRST         STL            RETADR                   172063
0003   0      CLOOP         JSUB           RDREC                    4B2021
0006   0                    LDA            LENGTH                   032060
0009   0                    COMP           #0                       290000
000C   0                    JEQ            ENDFIL                   332006
000F   0                    JSUB           WRREC                    4B203B
0012   0                    J              CLOOP                    3F2FEE
0015   0      ENDFIL        LDA            =C’EOF’                  032055
0018   0                    STA            BUFFER                   0F2056
001B   0                    LDA            #3                       010003
001E   0                    STA            LENGTH                   0F2048
0021   0                    JSUB           WRREC                    4B2029
0024   0                    J              @RETADR                  3E203F
0000   1                    USE            CDATA
0000   1      RETADR        RESW           1
0003   1      LENGTH        RESW           1
0000   2                    USE            CBLKS
0000   2      BUFFER        RESB           4096
1000   2      BUFEND        EQU            *
1000          MAXLEN        EQU            BUFEND-BUFFER
                           Subroutine to read record into Buffer
0027   0                    USE
0027   0      RDREC         CLEAR          X                        B410
0029   0                    CLEAR          A                        B400
002B   0                    CLEAR          S                        B440
002D   0                    +LDT           #4096                    75101000
0031   0      RLOOP         TD             INPUT                    E32038
0034   0                    JEQ            RLOOP                    332FFA
0037   0                    RD             INPUT                    DB2032
003A   0                    COMPR          A,S                      A004
003C   0                    JEQ            EXIT                     332008
003F   0                    STCH           BUFFER,X                 57A02F
0042   0                    TIXR           T                        B850
0044   0                    JLT            RLOOP                    3B2FEA
0047   0      EXIT          STX            LENGTH                   13201F
004A   0                    RSUB                                    4F0000
0006   1                    USE            CDATA
0006   1      INPUT         BYTE           X’F1’                    F1
                           Subroutine to write record from Buffer
004D   0                    USE
004D   0      WRREC         CLEAR          X                        B410
004F   0                    LDT            LENGTH                   772017
0052   0      WLOOP         TD             =X’05’                   E3201B
0055   0                    JEQ            WLOOP                    332FFA
0058   0                    LDCH           BUFFER,X                 53A016
005B   0                    WD             =X’05’                   DF2012
005E   0                    TIXR           T                        B850
0060   0                    JLT            WLOOP                    3B2FEF
0063   0                    RSUB                                    4F0000
0007   1                    USE            CDATA
                            LTORG
0007   1      *             =C’EOF’                                 454F46
000A   1      *             =X’05’                                  05
                            END            FIRST


The assembler will rearrange these segments to gather together the
pieces of each block. These blocks are then assigned addresses in the
object program, with the blocks appearing in the same order in which
they were first begun in the source program.
During pass 1 a separate location counter is maintained for each block.
It is initialized to 0 when the block is first begun. When the program
switches to another block the current value is saved, and it is restored
when the block is resumed. Each label is assigned an address relative to
the start of the block that contains it.
At the end of pass 1 the latest value of the location counter for each
block indicates the length of that block.
MAXLEN is shown without a block number because it is an absolute
symbol, whose value is not relative to the start of any block.
At the end of pass 1 the assembler constructs a working table that
contains the starting addresses and lengths of all blocks:
      Block Name         Block Number        Address    Length
      Default            0                   0000       0066
      CDATA              1                   0066       000B
      CBLKS              2                   0071       1000


For the instruction
0006   0                    LDA            LENGTH                   032060
the operand LENGTH has relative address 0003 within the CDATA block.
The starting address of CDATA is 0066. The desired target address for
this instruction is therefore 0003 + 0066 = 0069. The instruction is to
be assembled using program counter relative addressing.
The address of the next instruction is 0009 within the default block,
which starts at 0000. The required displacement is therefore
0069 - 0009 = 60. Calculations of this kind are performed during pass 2.
Because the large buffer area is moved to the end of the object program,
there is no longer any need to use extended format for the JSUB
instructions. The base register is also no longer necessary.
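The block table and the address calculation for LDA LENGTH can be reproduced with a short Python sketch. The names used here (assign_block_addresses, absolute_address, the layout of symtab) are assumptions for illustration; the block lengths are the ones found at the end of pass 1 for the example program.

```python
def assign_block_addresses(block_lengths):
    """block_lengths: (name, length) pairs in order of first appearance."""
    table, address = {}, 0
    for name, length in block_lengths:
        table[name] = {"address": address, "length": length}
        address += length                  # the next block starts where this one ends
    return table

blocks = assign_block_addresses(
    [("default", 0x0066), ("CDATA", 0x000B), ("CBLKS", 0x1000)])

# Hypothetical SYMTAB layout: symbol -> (block name, address within block).
symtab = {"LENGTH": ("CDATA", 0x0003), "BUFFER": ("CBLKS", 0x0000)}

def absolute_address(symbol):
    """Address of a symbol relative to the start of the object program."""
    block, offset = symtab[symbol]
    return blocks[block]["address"] + offset

target = absolute_address("LENGTH")                 # 0003 + 0066
next_instruction = blocks["default"]["address"] + 0x0009
print(f"{blocks['CDATA']['address']:04X} {blocks['CBLKS']['address']:04X}")  # 0066 0071
print(f"{target:04X} {target - next_instruction:X}")                         # 0069 60
```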


5.    Control Sections and Program Linking
A control section is a part of the program that maintains its identity
after assembly; each control section can be loaded and relocated
independently of the others. Different control sections are most often
used for subroutines or other logical subdivisions of a program.
0000     COPY       START           0
                    EXTDEF          BUFFER,BUFEND, LENGTH
                    EXTREF          RDREC,WRREC
0000     FIRST      STL             RETADR                         172027
0003     CLOOP      +JSUB           RDREC                          4B100000
0007                LDA             LENGTH                         032023
000A                COMP            #0                             290000
000D                JEQ             ENDFIL                         332007
0010                +JSUB           WRREC                          4B100000
0014                J               CLOOP                          3F2FEC
0017     ENDFIL     LDA             =C’EOF’                        032016
001A                STA             BUFFER                         0F2016
001D                LDA             #3                             010003
0020                STA             LENGTH                         0F200A
0023                +JSUB           WRREC                          4B100000
0027                J               @RETADR                        3E2000
002A     RETADR     RESW            1
002D     LENGTH     RESW            1
                    LTORG
0030     *          =C’EOF’                                        454F46
0033     BUFFER     RESB            4096
1033     BUFEND     EQU             *
1000     MAXLEN     EQU             BUFEND-BUFFER

0000    RDREC      CSECT
                           Subroutine to read record into Buffer
                    EXTREF          BUFFER,LENGTH,BUFEND
0000                CLEAR           X                              B410
0002                CLEAR           A                              B400
0004                CLEAR           S                              B440
0006                LDT             MAXLEN                         77201F
0009     RLOOP      TD              INPUT                          E3201B
000C                JEQ             RLOOP                          332FFA
000F                RD              INPUT                          DB2015
0012                COMPR           A,S                            A004
0014                JEQ             EXIT                           332009
0017                +STCH           BUFFER,X                       57900000
001B                TIXR            T                              B850
001D                JLT             RLOOP                          3B2FE9
0020     EXIT       +STX            LENGTH                         13100000
0024                RSUB                                           4F0000
0027     INPUT      BYTE            X’F1’                          F1
0028     MAXLEN     WORD            BUFEND-BUFFER                  000000

0000    WRREC      CSECT
                        Subroutine to write record from Buffer
                    EXTREF          LENGTH,BUFFER
0000                CLEAR           X                              B410
0002                +LDT            LENGTH                         77100000
0006      WLOOP     TD              =X’05’                         E32012
0009                JEQ             WLOOP                          332FFA
000C                +LDCH           BUFFER,X                       53900000
0010                WD              =X’05’                         DF2008
0013                TIXR            T                              B850
0015                JLT             WLOOP                          3B2FEE
0018                RSUB                                           4F0000
                    END             FIRST
001B      *         =X’05’                                         05

When control sections form logically related parts of a program it is
necessary to provide some means for linking them together. For example,
instructions in one section might refer to instructions or data in
another section. Because control sections are independently loaded and
relocated, the assembler is unable to process these references in the
usual way: it does not know where any other control section will be
loaded at execution time.
Such references between control sections are called external references.
Control sections differ from program blocks in that they are handled
separately by the assembler. (It is not even necessary for all control
sections in a program to be assembled at the same time.) Symbols
defined in one control section may not be used directly by another
control section; they must be identified as external references for the
loader to handle.
The program above has been written using three control sections: the
main program and one section for each subroutine. The directive CSECT
signals the start of a new control section.
The EXTDEF statement names symbols, called external symbols, that are
defined in this control section and may be used by other sections.
The EXTREF statement names symbols that are used in this section but
are defined elsewhere.
In the instruction
0003     CLOOP      +JSUB           RDREC                          4B100000
the operand RDREC is named in the EXTREF statement. The assembler has no
idea where the section that contains RDREC will be loaded, so it cannot
assemble the address for this instruction. Instead it inserts an address
of zero and passes information to the loader, which will cause the
proper address to be inserted at load time. Relative addressing is not
possible, so an extended format instruction must be used to provide room
for the actual address. This is true for any instruction whose operand
involves an external reference.
The instruction
0017                +STCH           BUFFER,X                       57900000
is also assembled using extended format with an address of zero, but the
x bit is set to 1 to indicate indexed addressing.
In the statement
0028     MAXLEN     WORD            BUFEND-BUFFER                  000000
both BUFEND and BUFFER are external references, so the value of the
expression is unknown at assembly time and the word is stored as zeros.
In the statement
1000     MAXLEN     EQU             BUFEND-BUFFER
BUFEND and BUFFER are defined in the same control section, so the value
of the expression can be calculated immediately by the assembler.
Since the assembler leaves room for inserting the values of external
symbols, it must also include information in the object program that
will cause the loader to insert the proper values where they are
required. Two new record types are defined for this purpose: Define and
Refer.
A Define record gives information about the external symbols named in
EXTDEF, and a Refer record lists the symbols named in EXTREF.
The Define record:
Col 1       D
Col 2-7     Name of external symbol defined in this control section
Col 8-13    Relative address of symbol within this section
Col 14-73   Repeat information in col 2-13 for other external symbols.


The Refer record:
Col 1       R
Col 2-7     Name of external symbol referred to in this control section
Col 8-73    Names of other external reference symbols.


The other needed information is added to the modification record whose
format is revised as follows:
Col 1       M
Col 2-7     Starting address of the field to be modified, relative to the beginning of the
            control section
Col 8-9     Length of the field to be modified in half bytes.
Col 10      Modification flag (+ or -)
Col 11-16   External symbol whose value is to be added to or subtracted from the indicated field.


The figure below shows the object program corresponding to the source
code in the previous program. Note that there is a separate set of object
program records, from Header through End, for each control section.
The modification record M^000004^05^+RDREC specifies that the address of
RDREC is to be added to the 5 half-byte field beginning at address 000004
(the address field of the +JSUB instruction at 0003) in order to produce
the correct machine instruction for execution.
For the word at address 0028 in RDREC, both BUFEND and BUFFER are in a
different control section. The assembler generates an initial value of zero
for this word. The last two modification records in RDREC direct that the
address of BUFEND be added to this field and the address of BUFFER
be subtracted from it.
When an expression involves relative terms, both terms in each pair
must be relative within the same control section. If the terms are in
different control sections, their difference depends on where the
sections are loaded, so its value is unpredictable at assembly time.
H^COPY ^000000^001033
D^BUFFER^000033^BUFEND^001033^LENGTH^00002D
R^RDREC ^WRREC
T^000000^1D^172027^4B100000^032023^290000^332007^4B100000^3F2FEC^032016^0F2016
T^00001D^0D^010003^0F200A^4B100000^3E2000
T^000030^03^454F46
M^000004^05^+RDREC
M^000011^05^+WRREC
M^000024^05^+WRREC
E^000000

H^RDREC ^000000^00002B
R^BUFFER^LENGTH^BUFEND
T^000000^1D^B410^B400^B440^77201F^E3201B^332FFA^DB2015^A004^332009^57900000^B850
T^00001D^0E^3B2FE9^13100000^4F0000^F1^000000
M^000018^05^+BUFFER
M^000021^05^+LENGTH
M^000028^06^+BUFEND
M^000028^06^-BUFFER
E

H^WRREC ^000000^00001C
R^LENGTH^BUFFER
T^000000^1C^B410^77100000^E32012^332FFA^53900000^DF2008^B850^3B2FEE^4F0000^05
M^000003^05^+LENGTH
M^00000D^05^+BUFFER
E
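To see how the loader uses the revised modification records, the Python sketch below applies the four M records of the RDREC section. Everything beyond the records themselves is an assumption made for illustration: the function name, the table name estab, and the load address 4000 chosen for COPY (which gives BUFFER = 4033, BUFEND = 5033 and LENGTH = 402D).

```python
def apply_external_modification(image, start, half_bytes, sign, symbol, estab):
    """Add (+) or subtract (-) an external symbol's address in a field."""
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    field = int.from_bytes(image[start:start + nbytes], "big")
    delta = estab[symbol] if sign == "+" else -estab[symbol]
    updated = (field & ~mask) | ((field + delta) & mask)
    image[start:start + nbytes] = updated.to_bytes(nbytes, "big")

# Text records of RDREC (43 bytes, hexadecimal length 2B).
rdrec = bytearray.fromhex(
    "B410 B400 B440 77201F E3201B 332FFA DB2015 A004 332009 57900000 B850"
    "3B2FE9 13100000 4F0000 F1 000000")

# Assumed external symbol values with COPY loaded at address 4000.
estab = {"BUFFER": 0x4033, "BUFEND": 0x5033, "LENGTH": 0x402D}

for start, length, sign, symbol in [(0x18, 5, "+", "BUFFER"),
                                    (0x21, 5, "+", "LENGTH"),
                                    (0x28, 6, "+", "BUFEND"),
                                    (0x28, 6, "-", "BUFFER")]:
    apply_external_modification(rdrec, start, length, sign, symbol, estab)

print(rdrec[0x17:0x1B].hex().upper())   # +STCH BUFFER,X -> 57904033
print(rdrec[0x20:0x24].hex().upper())   # +STX LENGTH    -> 1310402D
print(rdrec[0x28:0x2B].hex().upper())   # MAXLEN         -> 001000
```

Whatever load address is chosen, the word MAXLEN ends up as 001000, because the two modifications add and subtract addresses that move together.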



ASSEMBLER DESIGN OPTIONS
Two Pass Assembler with Overlay Structure
Most assemblers divide the processing of the source program into two
passes. The internal tables and subroutines that are used only during
pass 1 are not needed after the pass is completed; others, like the
SYMTAB, are needed for both passes.
Since pass 1 and pass 2 are not needed at the same time, they can
occupy the same locations in memory during execution of the
assembler.

                     Driver, Shared Tables and Routines
                        /                        \
     Pass 1 Tables and Routines        Pass 2 Tables and Routines

Three program segments exist. The root segment contains a simple
driver program whose function is to call the other two segments in turn.
It also contains the tables and routines needed by both passes.
Initially the root and pass 1 segments are loaded into memory and the
assembler makes the first pass. At the end of the first pass, the pass 2
segment is loaded into memory, replacing the pass 1 segment.
In this way the assembler needs less memory than it would if both
passes were loaded at the same time.


One Pass assemblers
The main problem in trying to assemble a program in one pass is
forward references: instruction operands are often symbols that
have not yet been defined.
There are two types of one pass assembler. One type produces object
code directly in memory for immediate execution; the other produces the
usual kind of object program for later execution.
The program below illustrates both types. It is similar to the one on page
41 but has the data item definitions placed ahead of the code that
references them, which removes forward references to data items. In the
first type of assembler (load-and-go) no object program is written out
and no loader is needed. This avoids the overhead of writing the object
program out and reading it back in.

       Object Code in Memory and symbol table entries for the program below after the
                               instruction at address 2021

Memory
Address   Contents
1000      454F4600  00030000  00xxxxxx  xxxxxxxx
1010      xxxxxxxx  xxxxxxxx  xxxxxxxx  xxxxxxxx
...
2000      xxxxxxxx  xxxxxxxx  xxxxxxxx  xxxxxx14
2010      100948--  --00100C  28100630  ----48--
2020      --3C2012

Symbol    Value
LENGTH    100C
RDREC     *  -> 2013 -> ø
THREE     1003
ZERO      1006
WRREC     *  -> 201F -> ø
EOF       1000
ENDFIL    *  -> 201C -> ø
RETADR    1009
BUFFER    100F
CLOOP     2012
FIRST     200F


       Object Code in Memory and symbol table entries for the program below after the
                               instruction at address 2052

Memory
Address   Contents
1000      454F4600  00030000  00xxxxxx  xxxxxxxx
1010      xxxxxxxx  xxxxxxxx  xxxxxxxx  xxxxxxxx
...
2000      xxxxxxxx  xxxxxxxx  xxxxxxxx  xxxxxx14
2010      10094820  3D00100C  28100630  202448--
2020      --3C2012  0010000C  100F0010  030C100C
2030      48----08  10094C00  00F10010  00041006
2040      001006E0  20393020  43D82039  28100630
2050      ----5490  0F

Symbol    Value
LENGTH    100C
RDREC     203D
THREE     1003
ZERO      1006
WRREC     *  -> 201F -> 2031 -> ø
EOF       1000
ENDFIL    2024
RETADR    1009
BUFFER    100F
CLOOP     2012
FIRST     200F
MAXLEN    203A
INPUT     2039
EXIT      *  -> 2050 -> ø
RLOOP     2043


The assembler generates object code instructions as it scans the source
program. If the operand is a symbol that has not yet been defined, the
operand address is omitted when the instruction is assembled. The
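symbol used as an operand is entered in the symbol table. This entry is
flagged as undefined (shown as * in the tables above), and the address of
the operand field that refers to it is added to a list of forward references
attached to the entry. When the definition of the symbol is encountered,
this list is scanned and the proper address is inserted into each of the
instructions previously generated.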
Any SYMTAB entries that are still marked with * at the end of the
program indicate undefined symbols and they should be flagged by the
assembler as errors.
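
The bookkeeping described above can be sketched in Python. This is only an
illustration under simplifying assumptions: the names memory, symtab,
reference and define are hypothetical, operand fields are treated as plain
16-bit addresses (the SIC index bit is ignored), and only the patching of
operands is modelled, not the encoding of instructions.

```python
# Sketch of forward-reference handling in a load-and-go one pass assembler.
# All names and the data layout here are assumptions made for illustration.

memory = {}   # address -> byte value
symtab = {}   # symbol -> {"value": int or None, "refs": [operand addresses]}


def store_address(addr, value):
    """Write a 16-bit address into the two bytes starting at addr."""
    memory[addr] = (value >> 8) & 0xFF
    memory[addr + 1] = value & 0xFF


def reference(symbol, operand_addr):
    """Assemble an operand field that names `symbol`."""
    entry = symtab.setdefault(symbol, {"value": None, "refs": []})
    if entry["value"] is None:
        entry["refs"].append(operand_addr)      # patch this location later
    else:
        store_address(operand_addr, entry["value"])


def define(symbol, value):
    """Record a label definition and patch all earlier forward references."""
    entry = symtab.setdefault(symbol, {"value": None, "refs": []})
    entry["value"] = value
    for operand_addr in entry["refs"]:
        store_address(operand_addr, value)
    entry["refs"] = []


def undefined_symbols():
    """Symbols still flagged as undefined (these are errors at END)."""
    return sorted(s for s, e in symtab.items() if e["value"] is None)


# Forward references taken from the listing: the operand field starts one
# byte after the opcode of JSUB RDREC (2012), JEQ ENDFIL (201B), JSUB WRREC (201E).
reference("RDREC", 0x2013)
reference("ENDFIL", 0x201C)
reference("WRREC", 0x201F)
define("ENDFIL", 0x2024)
reference("WRREC", 0x2031)
define("RDREC", 0x203D)
```

Running the sketch leaves WRREC as the only undefined symbol, with both of
its operand addresses (201F and 2031) still waiting to be patched.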
1000      COPY     START             1000
1000      EOF      BYTE              C’EOF’          454F46
1003      THREE    WORD              3               000003
1006      ZERO     WORD              0               000000
1009      RETADR   RESW              1
100C      LENGTH   RESW              1
100F      BUFFER   RESB              4096

200F      FIRST    STL               RETADR          141009
2012      CLOOP    JSUB              RDREC           48203D
2015               LDA               LENGTH          00100C
2018               COMP              ZERO            281006
201B               JEQ               ENDFIL          302024
201E               JSUB              WRREC           482062
2021               J                 CLOOP           3C2012
2024      ENDFIL   LDA               EOF             001000
2027               STA               BUFFER          0C100F
202A               LDA               THREE           001003
202D               STA               LENGTH          0C100C
2030               JSUB              WRREC           482062
2033               LDL               RETADR          081009
2036               RSUB                              4C0000
                          Subroutine to read record into Buffer
2039      INPUT    BYTE             X’F1’            F1
203A      MAXLEN   WORD             4096             001000

203D      RDREC    LDX              ZERO             041006
2040               LDA              ZERO             001006
2043      RLOOP    TD               INPUT            E02039
2046               JEQ              RLOOP            302043
2049               RD               INPUT            D82039
204C               COMP             ZERO             281006
204F               JEQ              EXIT             30205B
2052               STCH             BUFFER,X         54900F
2055               TIX              MAXLEN           2C203A
2058               JLT              RLOOP            382043
205B      EXIT     STX              LENGTH           10100C
205E               RSUB                              4C0000
                       Subroutine to write record from Buffer
2061      OUTPUT   BYTE              X’05’           05

2062      WRREC    LDX               ZERO            041006
2065      WLOOP    TD                OUTPUT          E02061
2068               JEQ               WLOOP           302065
206B               LDCH              BUFFER,X        50900F
206E               WD                OUTPUT          DC2061
2071               TIX               LENGTH          2C100C
2074               JLT               WLOOP           382065
2077               RSUB                              4C0000
                   END               FIRST

The second type of one pass assembler, which produces an object program
as output, is needed on systems where external working storage devices
(for the intermediate file between the two passes) are not available.
Forward references are entered into lists as before, but when the
definition of a symbol is encountered the assembler generates another text
record with the correct operand address. When the program is loaded, this
address will be inserted into the instruction by the action of the loader.
In the object program below, the second text record contains the object
code generated from address 200F through 2021. The operand addresses for
the instructions at addresses 2012, 201B and 201E have been generated as
0000. When the definition of ENDFIL at address 2024 is encountered the
assembler generates a third text record. This record specifies that the
value 2024 is to be loaded at location 201C. When this program is loaded
the value 2024 will replace the 0000 which was previously loaded.

H^COPY ^001000^00107A
T^001000^09^454F46^000003^000000
T^00200F^15^141009^480000^00100C^281006^300000^480000^3C2012
T^00201C^02^2024
T^002024^19^001000^0C100F^001003^0C100C^480000^081009^4C0000^F1^001000
T^002013^02^203D
T^00203D^1E^041006^001006^E02039^302043^D82039^281006^300000^54900F^2C203A^382043
T^002050^02^205B
T^00205B^07^10100C^4C0000^05
T^00201F^02^2062
T^002031^02^2062
T^002062^18^041006^E02061^302065^50900F^DC2061^2C100C^382065^4C0000
E^00200F

                      LOADERS AND LINKERS
An object program contains translated instructions and data values from
the source program and specifies addresses in memory where these
items are to be loaded.
Loading: brings the object program into memory for execution.
Relocation: modifies the object program so that it can be loaded at an
address different from the location originally specified.
Linking: Combines two or more separate object programs and supplies
the information needed to allow references between them.


A loader is a system program that performs the loading function. Many
loaders also support relocation and linking. Some systems have a
linker to perform the linking operation and a separate loader to handle
relocation and loading. In most cases all the translators on a system
produce object programs in the same format, so one system loader or linker
can be used regardless of the original source programming language.


Basic Loader Functions.
The most basic function of a loader is to bring an object program into
memory and to start executing it. For a simple absolute object module like
the one for the simple SIC machine on page 41, no linking or program
relocation is needed, so all functions are accomplished in one pass.
The header record is checked to verify that the correct program has
been presented for loading and that it will fit into the available memory.
As each text record is read the object code it contains is moved to the
indicated address in memory. When the END record is encountered the
loader jumps to the specified address to begin execution of the object
program.
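
The one pass procedure above can be sketched in Python as follows. This is a
minimal illustration, not a definitive loader: the function name
absolute_load and the memory size are assumptions, and the records are an
abridged version of the one pass assembler output shown earlier in these
notes (with the Header start address written as six digits).

```python
def absolute_load(records, memory_size=0x8000):
    """Load absolute object records; return (memory, execution start address)."""
    memory = bytearray(memory_size)
    start = None
    for record in records:
        fields = record.split("^")
        kind = fields[0]
        if kind == "H":
            # Check that the program fits into the available memory.
            load_addr = int(fields[2], 16)
            length = int(fields[3], 16)
            if load_addr + length > memory_size:
                raise MemoryError("program does not fit in memory")
        elif kind == "T":
            # Move the object code to the address given in the record.
            addr = int(fields[1], 16)
            code = bytes.fromhex("".join(fields[3:]))
            if len(code) != int(fields[2], 16):
                raise ValueError("text record length mismatch")
            memory[addr:addr + len(code)] = code
        elif kind == "E":
            # The End record gives the address at which execution begins.
            if len(fields) > 1 and fields[1]:
                start = int(fields[1], 16)
    return memory, start


records = [
    "H^COPY ^001000^00107A",
    "T^00200F^15^141009^480000^00100C^281006^300000^480000^3C2012",
    "T^00201C^02^2024",
    "E^00200F",
]
memory, start = absolute_load(records)
```

Because text records are processed in order, the later record for location
201C overwrites the 0000 placeholder, so the JEQ instruction at 201B ends
up as 302024.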

Machine Dependent Loader Features
The absolute loader is simple but it has some disadvantages. The actual
load address must be specified when the program is assembled, yet we do
not usually know in advance where a program will be loaded, so there is a
need to write relocatable programs. Similarly, subroutine libraries cannot
be used efficiently if their routines are pre-assigned absolute addresses.


1.      Relocation
Loaders that allow program relocation are called relocating or relative
loaders. There are two methods for specifying relocation as part of the
object program.
The first method uses modification records which describe each part of
the object code that must be changed when the program is relocated.
Using the program on page 44, the only portions that must be relocated
are the extended format instructions at addresses 0006, 0013 and 0026,
because only these contain actual addresses.

H^COPY ^000000^001077
T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010
T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46
T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850
T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850
T^001070^07^3B2FEF^4F0000^05
M^000007^05^+COPY
M^000014^05^+COPY
M^000027^05^+COPY
E^000000

There is one modification record for each value that must be changed
during relocation. Each modification record specifies the starting address
and length of the field whose value is to be altered. It then specifies the
modification to be performed. Here all modifications add the value of the
symbol COPY which represents the starting address of the program.
This method is not well suited to a program that uses mainly direct
(absolute) addressing, since almost every instruction would then need its
own modification record.
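
As an illustration of how a loader might act on one of these records, the
Python sketch below applies M^000007^05^+COPY to the instruction 4B101036
at relative address 0006. The function name apply_modification and the load
address 5000 are assumptions chosen for the example. Note that a field of
05 half-bytes starts in the low-order half of its first byte.

```python
def apply_modification(memory, field_addr, half_bytes, delta):
    """Add delta to a field of `half_bytes` hex digits starting at field_addr.

    When half_bytes is odd the field begins in the low-order half of the
    first byte, so the high-order half of that byte is left unchanged.
    """
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    raw = int.from_bytes(memory[field_addr:field_addr + nbytes], "big")
    raw = (raw & ~mask) | ((raw + delta) & mask)
    memory[field_addr:field_addr + nbytes] = raw.to_bytes(nbytes, "big")


PROGADDR = 0x5000                      # assumed actual starting address (COPY)
memory = bytearray(0x8000)

# +JSUB RDREC from the object program, placed at relative address 0006.
memory[PROGADDR + 0x0006:PROGADDR + 0x000A] = bytes.fromhex("4B101036")

# M^000007^05^+COPY: add the value of COPY to the 5 half-byte field at 0007.
apply_modification(memory, PROGADDR + 0x0007, 5, PROGADDR)
```

After the call the instruction reads 4B106036: the address field 01036 has
become 06036, while the opcode and flag bits are untouched.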

A second method can be used on a machine that uses primarily direct
addressing and has a fixed instruction format, like the SIC machine: a
relocation bit is associated with each word of the object code.
The figure below illustrates this method. There are no modification
records. Since all SIC instructions occupy one word, there is one
relocation bit for each possible instruction. The relocation bits are
gathered together into a bit mask following the length indicator in each
text record.
0000      COPY      START             0
0000      FIRST     STL               RETADR       140033
0003      CLOOP     JSUB              RDREC        481039
0006                LDA               LENGTH       000036
0009                COMP              ZERO         280030
000C                JEQ               ENDFIL       300015
000F                JSUB              WRREC        481061
0012                J                 CLOOP        3C0003
0015      ENDFIL    LDA               EOF          00002A
0018                STA               BUFFER       0C0039
001B                LDA               THREE        00002D
001E                STA               LENGTH       0C0036
0021                JSUB              WRREC        481061
0024                LDL               RETADR       080033
0027                RSUB                           4C0000
002A      EOF       BYTE              C’EOF’       454F46
002D      THREE     WORD              3            000003
0030      ZERO      WORD              0            000000
0033      RETADR    RESW              1
0036      LENGTH    RESW              1
0039      BUFFER    RESB              4096
                           Subroutine to read record into Buffer
1039      RDREC      LDX             ZERO          040030
103C                 LDA             ZERO          000030
103F      RLOOP      TD              INPUT         E0105D
1042                 JEQ             RLOOP         30103F
1045                 RD              INPUT         D8105D
1048                 COMP            ZERO          280030
104B                 JEQ             EXIT          301057
104E                 STCH            BUFFER,X      548039
1051                 TIX             MAXLEN        2C105E
1054                 JLT             RLOOP         38103F
1057      EXIT       STX             LENGTH        100036
105A                 RSUB                          4C0000
105D      INPUT      BYTE            X’F1’         F1
105E      MAXLEN     WORD            4096          001000
                        Subroutine to write record from Buffer
1061      WRREC     LDX             ZERO          040030
1064      WLOOP     TD              OUTPUT        E01079
1067                JEQ             WLOOP         301064
106A                LDCH            BUFFER,X      508039
106D                WD              OUTPUT        DC1079
1070                TIX             LENGTH        2C0036
1073                JLT             WLOOP         381064
1076                RSUB                          4C0000
1079      OUTPUT    BYTE            X’05’         05
                    END             FIRST

In the object program for the listing above this mask is represented as
three hexadecimal digits, placed immediately after the length field of each
text record. If the relocation bit corresponding to a word of object code
is set to 1, the program's starting address is to be added to this word
when the program is relocated. A bit value of 0 indicates that no
modification is necessary. If a text record contains fewer than 12 words of
object code, the bits corresponding to the unused words are set to 0.
H^COPY ^000000^00107A
T^000000^1E^FFC^140033^481039^000036^280030^300015^481061^3C0003^00002A^0C0039^00002D
T^00001E^15^E00^0C0036^481061^080033^4C0000^454F46^000003^000000
T^001039^1E^FFC^040030^000030^E0105D^30103F^D8105D^280030^301057^548039^2C105E^38103F
T^001057^0A^800^100036^4C0000^F1^001000
T^001061^19^FE0^040030^E01079^301064^508039^DC1079^2C0036^381064^4C0000^05
E^000000
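
The use of the bit mask can be sketched in Python as below. This is a
minimal illustration: the function name relocate_text_record and the load
address 5000 are assumptions, and the starting address is simply added to
the whole 3-byte word, which is adequate as long as the address field does
not overflow.

```python
def relocate_text_record(record, progaddr):
    """Return (actual load address, relocated object code) for one text record.

    The record layout is T^start^length^mask^word^word..., where the mask
    holds 12 relocation bits and the leftmost bit belongs to the first word.
    """
    fields = record.split("^")
    start = int(fields[1], 16)
    mask = int(fields[3], 16)
    code = bytearray.fromhex("".join(f.strip() for f in fields[4:]))
    for i in range(len(code) // 3):           # complete 3-byte words only
        if mask & (0x800 >> i):               # relocation bit set for word i
            word = int.from_bytes(code[3 * i:3 * i + 3], "big")
            word = (word + progaddr) & 0xFFFFFF
            code[3 * i:3 * i + 3] = word.to_bytes(3, "big")
    return start + progaddr, bytes(code)


record = "T^00001E^15^E00^0C0036^481061^080033^4C0000^454F46^000003^000000"
addr, code = relocate_text_record(record, 0x5000)
```

The mask E00 (binary 1110 0000 0000) marks only the first three words, so
0C0036, 481061 and 080033 become 0C5036, 486061 and 085033, while RSUB and
the data words are loaded unchanged.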




Program Linking
Concepts of program linking were discussed under control sections.
The example below consists of three separately assembled programs,
each having a list of items: LISTA, LISTB and LISTC. The ends of the lists
are marked by ENDA, ENDB and ENDC. The labels on the beginnings and
ends of the lists are external symbols. Each program has the same set
of references to these external symbols.
In PROGA, REF1 is a reference to a label within the program, which is
assembled using program counter relative addressing. No modification is
necessary.
In PROGB, REF1 refers to an external symbol. The assembler uses an
extended format instruction with the address field set to 00000. There is
a modification record in the object program instructing the loader to add
the value of LISTA to this address field when the program is linked.
REF2 and REF3 are explained similarly.
In PROGA the assembler can evaluate all of the expression in REF4
except for the value of LISTC. This results in an initial value of 000014
and one modification record.

0000   PROGA   START    0
               EXTDEF   LISTA,ENDA
               EXTREF   LISTB, ENDB,LISTC,ENDC
               .

0020   REF1    LDA      LISTA                     03201D
0023   REF2    +LDT     LISTB + 4                 77100004
0027   REF3    LDX      #ENDA - LISTA             050014


0040   LISTA   EQU      *


0054   ENDA    EQU      *
0054   REF4    WORD     ENDA-LISTA+LISTC          000014
0057   REF5    WORD     ENDC-LISTC-10             FFFFF6
005A   REF6    WORD     ENDC-LISTC+LISTA-1        00003F
005D   REF7    WORD     ENDA-LISTA-(ENDB-LISTB)   000014
0060   REF8    WORD     LISTB-LISTA               FFFFC0
               END      REF1

0000   PROGB   START    0
               EXTDEF   LISTB,ENDB
               EXTREF   LISTA, ENDA,LISTC,ENDC
               .

0036   REF1    +LDA     LISTA                     03100000
003A   REF2    LDT      LISTB + 4                 772027
003D   REF3    +LDX     #ENDA - LISTA             05100000


0060   LISTB   EQU      *


0070   ENDB    EQU      *
0070   REF4    WORD     ENDA-LISTA+LISTC          000000
0073   REF5    WORD     ENDC-LISTC-10             FFFFF6
0076   REF6    WORD     ENDC-LISTC+LISTA-1        FFFFFF
0079   REF7    WORD     ENDA-LISTA-(ENDB-LISTB)   FFFFF0
007C   REF8    WORD     LISTB-LISTA               000060
               END

0000   PROGC   START    0
               EXTDEF   LISTC,ENDC
               EXTREF   LISTA, ENDA,LISTB,ENDB
               .

0018   REF1    +LDA     LISTA                     03100000
001C   REF2    +LDT     LISTB + 4                 77100004
0020   REF3    +LDX     #ENDA - LISTA             05100000


0030   LISTC   EQU      *


0042   ENDC    EQU      *
0042   REF4    WORD     ENDA-LISTA+LISTC          000030
0045   REF5    WORD     ENDC-LISTC-10             000008
0048   REF6    WORD     ENDC-LISTC+LISTA-1        000011
004B   REF7    WORD     ENDA-LISTA-(ENDB-LISTB)   000000
004E   REF8    WORD     LISTB-LISTA               000000
               END

H^PROGA ^000000^000063
D^LISTA ^000040^ENDA ^000054
R^LISTB ^ENDB ^LISTC ^ENDC
T^000020^0A^03201D^77100004^050014
T^000054^0F^000014^FFFFF6^00003F^000014^FFFFC0
M^000024^05^+LISTB
M^000054^06^+LISTC
M^000057^06^+ENDC
M^000057^06^-LISTC
M^00005A^06^+ENDC
M^00005A^06^-LISTC
M^00005A^06^+PROGA
M^00005D^06^-ENDB
M^00005D^06^+LISTB
M^000060^06^+LISTB
M^000060^06^-PROGA
E^000020


H^PROGB ^000000^00007F
D^LISTB ^000060^ENDB ^000070
R^LISTA ^ENDA ^LISTC ^ENDC
T^000036^0B^03100000^772027^05100000
T^000070^0F^000000^FFFFF6^FFFFFF^FFFFF0^000060
M^000037^05^+LISTA
M^00003E^05^+ENDA
M^00003E^05^-LISTA
M^000070^06^+ENDA
M^000070^06^-LISTA
M^000070^06^+LISTC
M^000073^06^+ENDC
M^000073^06^-LISTC
M^000076^06^+ENDC
M^000076^06^-LISTC
M^000076^06^+LISTA
M^000079^06^+ENDA
M^000079^06^-LISTA
M^00007C^06^+PROGB
M^00007C^06^-LISTA
E

H^PROGC ^000000^000051
D^LISTC ^000030^ENDC ^000042
R^LISTA ^ENDA ^LISTB ^ENDB
T^000018^0C^03100000^77100004^05100000
T^000042^0F^000030^000008^000011^000000^000000
M^000019^05^+LISTA
M^00001D^05^+LISTB
M^000021^05^+ENDA
M^000021^05^-LISTA
M^000042^06^+ENDA
M^000042^06^-LISTA
M^000042^06^+PROGC
M^000048^06^+LISTA
M^00004B^06^+ENDA
M^00004B^06^-LISTA
M^00004B^06^-ENDB
M^00004B^06^+LISTB
M^00004E^06^+LISTB
M^00004E^06^-LISTA
E
The same expression in PROGB contains no terms that can be
evaluated by the assembler. The object code therefore contains an initial
value of 000000 and three modification records.
In PROGC the assembler can supply the value of LISTC relative to the
beginning of the program which is not known until the program is loaded.
The initial value for this data word contains the relative address of LISTC
= 000030. The modification records instruct the loader to add the
beginning address of the program (PROGC), to add the value of ENDA
and to subtract the value of LISTA.
Assume that PROGA has been loaded starting at address 4000, followed
immediately by PROGB and PROGC. REF4 through REF8 then result in
the same value after relocation and linking in all three programs.
For example, REF4 in PROGA is located at address 4054. The initial value
was 000014. To this is added the address assigned to LISTC, which is 4112
(40E2 + 30). This results in the value 004126. The same value is obtained
for REF4 at relative address 0070 in PROGB (actual address 40D3) and at
relative address 0042 in PROGC (actual address 4124).
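
The REF4 arithmetic can be checked with a few lines of Python. This is only
a worked example: the helper name link_value is hypothetical, and the load
addresses follow from the assumption above (PROGA at 4000) together with
the control section lengths in the Header records.

```python
# Load addresses: each program starts where the previous one ends.
PROGA = 0x4000
PROGB = PROGA + 0x63          # 4063
PROGC = PROGB + 0x7F          # 40E2

symbols = {
    "PROGA": PROGA, "LISTA": PROGA + 0x40, "ENDA": PROGA + 0x54,
    "PROGB": PROGB, "LISTB": PROGB + 0x60, "ENDB": PROGB + 0x70,
    "PROGC": PROGC, "LISTC": PROGC + 0x30, "ENDC": PROGC + 0x42,
}


def link_value(initial, modifications):
    """Apply +symbol / -symbol modifications to a 24-bit initial value."""
    value = initial
    for sign, name in modifications:
        value += symbols[name] if sign == "+" else -symbols[name]
    return value & 0xFFFFFF


# REF4 = ENDA-LISTA+LISTC, using each program's initial value and M records.
ref4_a = link_value(0x000014, [("+", "LISTC")])
ref4_b = link_value(0x000000, [("+", "ENDA"), ("-", "LISTA"), ("+", "LISTC")])
ref4_c = link_value(0x000030, [("+", "ENDA"), ("-", "LISTA"), ("+", "PROGC")])
```

All three results are 004126, even though each assembler could evaluate a
different part of the expression.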


Tables and Logic for a Linking Loader
Modification records are used for relocation so that the linking and
relocating functions are performed using the same mechanism.
The input to a linking loader consists of a set of object programs, (i.e. the
control sections) that are to be linked together. Since it is possible for a
control section to make an external reference to a symbol whose
definition does not appear until later in the input, the required linking
cannot be performed until an address is assigned to the external symbol
involved.
A linking loader therefore makes two passes just like the assembler.
Pass 1 assigns addresses to all external symbols and pass 2 performs
the actual loading, relocation and linking.
The main data structure used is the External symbol table ESTAB which
is similar to the SYMTAB. It is used to store names and addresses of
each external symbol in the set of control sections being loaded. It also
indicates in which control section the symbol is defined.
The beginning address in memory where the linked program is loaded is
called PROGADDR. Its value is supplied to the loader by the operating
system.
CSADDR is the starting address assigned to the control section currently
being scanned by the loader. It is added to all relative addresses within
the control section to convert them to actual addresses.
During pass 1 the loader is concerned with only the Header and Define
record types. The beginning load address for the linked program
PROGADDR becomes the starting address CSADDR for the first control
section.
The control section name for the Header record is entered into ESTAB
with a value given by CSADDR. All the external symbols appearing in
the Define record are also entered into ESTAB. Their addresses are
obtained by adding the value specified in the Define record to CSADDR.
When the End record is read, the control section length CSLTH, which
was saved from the Header record, is added to CSADDR. This calculation
gives the starting address of the next control section.
At the end of pass 1 ESTAB contains all external symbols defined in the
control sections together with the address assigned to each.
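
Pass 1 as described above can be sketched in Python. The function name
pass1 and the use of a dictionary for ESTAB are assumptions for
illustration; error handling (for example duplicate external symbols) is
left out. Only the Header, Define and End records of the three example
programs are needed.

```python
def pass1(object_programs, progaddr):
    """Build ESTAB (symbol -> address) from Header and Define records."""
    estab = {}
    csaddr = progaddr                    # first control section starts at PROGADDR
    for records in object_programs:
        cslth = 0
        for record in records:
            fields = [f.strip() for f in record.split("^")]
            if fields[0] == "H":
                cslth = int(fields[3], 16)          # save control section length
                estab[fields[1]] = csaddr           # control section name
            elif fields[0] == "D":
                # Define records hold (symbol, relative address) pairs.
                for name, rel in zip(fields[1::2], fields[2::2]):
                    estab[name] = csaddr + int(rel, 16)
            elif fields[0] == "E":
                csaddr += cslth                     # start of next control section
    return estab


proga = ["H^PROGA ^000000^000063", "D^LISTA ^000040^ENDA ^000054", "E^000020"]
progb = ["H^PROGB ^000000^00007F", "D^LISTB ^000060^ENDB ^000070", "E"]
progc = ["H^PROGC ^000000^000051", "D^LISTC ^000030^ENDC ^000042", "E"]

estab = pass1([proga, progb, progc], 0x4000)
```

With PROGADDR = 4000 the resulting table places PROGB at 4063 and PROGC at
40E2, with LISTC at 4112 and ENDC at 4124.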

             Control      Symbol
             Section      Name        Address     Length
             PROGA                    4000        0063
                          LISTA       4040
                          ENDA        4054
             PROGB                    4063        007F
                          LISTB       40C3
                          ENDB        40D3
             PROGC                    40E2        0051
                          LISTC       4112
                          ENDC        4124


A printout of ESTAB is called a load map.
Pass 2 performs the actual loading, relocation and linking of the
program. As each text record is read the object code is moved to the
specified address (plus the current value of CSADDR). When the
modification record is encountered the symbol whose value is to be used
for modification is looked up in ESTAB. This value is then added to or
subtracted from the indicated location in memory.
The last step performed by the loader is the transferring of control to the
loaded program to begin execution.
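
A minimal Python sketch of pass 2 for a single control section is given
below. The function name load_section is hypothetical, the ESTAB values are
copied from the load map above, and only the second text record of PROGA
(REF4 to REF8) with its modification records is loaded.

```python
ESTAB = {
    "PROGA": 0x4000, "LISTA": 0x4040, "ENDA": 0x4054,
    "PROGB": 0x4063, "LISTB": 0x40C3, "ENDB": 0x40D3,
    "PROGC": 0x40E2, "LISTC": 0x4112, "ENDC": 0x4124,
}


def load_section(records, csaddr, estab, memory):
    """Load Text records and apply Modification records for one control section."""
    for record in records:
        fields = record.split("^")
        if fields[0] == "T":
            addr = csaddr + int(fields[1], 16)
            code = bytes.fromhex("".join(fields[3:]))
            memory[addr:addr + len(code)] = code
        elif fields[0] == "M":
            addr = csaddr + int(fields[1], 16)
            half_bytes = int(fields[2], 16)
            sign, name = fields[3][0], fields[3][1:].strip()
            delta = estab[name] if sign == "+" else -estab[name]
            nbytes = (half_bytes + 1) // 2
            mask = (1 << (4 * half_bytes)) - 1
            raw = int.from_bytes(memory[addr:addr + nbytes], "big")
            raw = (raw & ~mask) | ((raw + delta) & mask)
            memory[addr:addr + nbytes] = raw.to_bytes(nbytes, "big")


proga = [
    "T^000054^0F^000014^FFFFF6^00003F^000014^FFFFC0",
    "M^000054^06^+LISTC",
    "M^000057^06^+ENDC", "M^000057^06^-LISTC",
    "M^00005A^06^+ENDC", "M^00005A^06^-LISTC", "M^00005A^06^+PROGA",
    "M^00005D^06^-ENDB", "M^00005D^06^+LISTB",
    "M^000060^06^+LISTB", "M^000060^06^-PROGA",
]

memory = bytearray(0x8000)
load_section(proga, ESTAB["PROGA"], ESTAB, memory)
```

After loading, the five words starting at address 4054 hold 004126, 000008,
004051, 000004 and 000083 for REF4 through REF8.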


MACHINE INDEPENDENT LOADER FEATURES
1.   Automatic Library Search
This feature allows the programmer to use standard subroutines without
explicitly including them in the program to be loaded. The routines are
automatically retrieved from a library, linked with the main program and
they are loaded. The programmer just mentions the subroutine names
as external references in the source program.
Linking loaders that support automatic library search keep track of
external symbols that are referred to but not defined in the input to the
loader. This is done by entering the symbols from each Refer record into
ESTAB, unless they are already present, and marking them as undefined
until their definitions are encountered. At the end of pass 1, symbols in
ESTAB that remain undefined represent unresolved external references.
The loader searches the
libraries specified for routines that contain definitions of these symbols
and processes the subroutines found by this search.
Subroutines fetched from a library in this way may themselves contain
external references. The library search process must therefore be
repeated until all external references are resolved.
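
The repeated search can be sketched in Python as follows. Everything here
is hypothetical: the function name library_search, the dictionary layout of
the library, and the routines MEAN and SQRT that STDEV is assumed to call.

```python
def library_search(defined, referenced, library):
    """Return (routines loaded from the library, symbols left unresolved).

    `library` maps a routine name to the external symbols it defines and
    the external symbols it refers to.
    """
    defined = set(defined)
    pending = set(referenced) - defined
    loaded, unresolved = [], set()
    while pending:
        symbol = min(pending)               # fixed order keeps the sketch repeatable
        pending.remove(symbol)
        if symbol in defined:
            continue
        routine = next(
            (name for name in library if symbol in library[name]["defines"]),
            None,
        )
        if routine is None:
            unresolved.add(symbol)          # no library routine defines it
            continue
        loaded.append(routine)
        defined |= library[routine]["defines"]
        # A fetched routine may itself contain external references.
        pending |= library[routine]["refers"] - defined
    return loaded, unresolved


library = {
    "STDEV": {"defines": {"STDEV"}, "refers": {"MEAN", "SQRT"}},
    "MEAN": {"defines": {"MEAN"}, "refers": set()},
    "SQRT": {"defines": {"SQRT"}, "refers": set()},
    "PLOT": {"defines": {"PLOT"}, "refers": set()},
}

loaded, unresolved = library_search({"MAIN"}, {"STDEV", "MISSING"}, library)
```

Here fetching STDEV brings in MEAN and SQRT as well, PLOT is never loaded
because nothing refers to it, and MISSING is reported as unresolved.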
The libraries to be searched by the loader usually contain assembled or
compiled versions of the subroutines.


2.      Loader Options
Many loaders allow the user to specify options that modify the standard
processing of the program. Below are a few of them:
(i)     An option that allows the selection of alternative sources of input
        e.g. INCLUDE program_name (library_name) directs the loader to read
        the designated object program from a library and treat it as if it
        were part of the primary loader input.
(ii)    An option to allow the user to delete external symbols or entire
        control sections e.g. DELETE csect_name instructs the loader to
        delete the named control section from the set of programs being
        loaded.
(iii)   An option to change external symbols e.g. CHANGE name1, name2
        causes the external symbol name1 to be changed to name2
        wherever it appears in the object programs.
(iv)    An option to specify alternative libraries to be searched e.g.
        LIBRARY MYLIB will cause library MYLIB to be searched before
        the standard system libraries.
(v)     Loaders that perform automatic library search may be asked to
        leave certain external references unresolved e.g.
        NOCALL STDEV, PLOT, CORREL will instruct the loader not to search
        the libraries for the named routines. This avoids the overhead of
        loading and linking the unneeded routines and saves on memory space.
(vi)    Options to control outputs from the loader. The user can specify
        whether an output is needed or not.
(vii)   Options to specify whether external references will be resolved by
        library search.
(viii)  An option to specify the location at which execution is to begin thus
        overriding any information given in the object program.
(ix)    An option to control whether or not the loader should attempt to
        execute the program if errors are detected during the load.


3.     Overlay Programs
Overlay programs are designed so that parts of the program that are never
needed in memory at the same time can share the same memory space: one
part executes first, and another part is later loaded into the same
locations once the first is no longer needed.

                                   A
                                   |
                +------------------+------------------+
                B                  C                 D/E
                |                                     |
         +------+------+                    +---------+---------+
        F/G            H                    I         J         K

Control            Length             Control         Length
Section            (bytes)            Section         (bytes)
A                  1000               G               400
B                  1800               H               800
C                  4000               I               1000
D                  2800               J               800
E                  800                K               2000
F                  1000
In the example above the letters represent control sections and the lines
show possible transfers of control between them. Control section A (the root)
can call B, C, or D/E etc. D/E means that control sections D and E are
closely related and they are always used together. The nodes in the tree
are called segments. The root segment (A) is loaded when execution of
the program begins and it remains in memory until the program ends.
The other segments are loaded as they are called.
If H is being executed both B and A should be in memory since H was
called by B, and B was called by A. Thus the three sections A, B and H
must be active. The other segments cannot be active since there is no
path from them to H. If for example the segment containing K was called
previously it must have returned to D/E and then to A before B could be
called by A.
Because segments at the same level e.g. B, C and D/E can be called
only from the level above, they cannot be required at the same time;
thus they can be assigned to the same locations in memory. If a
segment is loaded it overlays any segments at the same level and their
subordinate segments that may be in memory. The entire program can
therefore be executed in a smaller total amount of memory. This is
the main reason for the use of overlay structures.
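
The rule for which segments are active can be sketched in Python. The
parent table is read off the tree in the figure above; the function names
path_from_root and load_segment are hypothetical, and the sketch assumes a
segment is always called from its parent.

```python
# Overlay tree from the figure: segment -> parent (None marks the root).
PARENT = {
    "A": None,
    "B": "A", "C": "A", "D/E": "A",
    "F/G": "B", "H": "B",
    "I": "D/E", "J": "D/E", "K": "D/E",
}


def path_from_root(segment):
    """Segments that must be active while `segment` is executing."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


def load_segment(resident, segment):
    """Return (segments resident after the load, segments that were overlaid)."""
    path = path_from_root(segment)
    overlaid = [s for s in resident if s not in path]
    return path, overlaid


active = path_from_root("H")
resident, overlaid = load_segment(["A", "D/E", "K"], "B")
```

While H executes the active segments are A, B and H. If A, D/E and K are
resident and A then calls B, loading B overlays D/E and its subordinate
segment K.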
The structure of an overlay program is defined to the loader using the
following commands:
     SEGMENT           seg_name(control-section….) and
     PARENT            seg_name

SEGMENT seg_name(control-section….) defines a segment (i.e a
node in the tree structure), gives it a name and lists the control sections
to be included in it. The first segment defined is the root. Two
consecutive SEGMENT statements specify a parent child relationship
between the segments defined.
PARENT           seg_name identifies the (already existing) segment
that is to be the parent of the next segment defined.

The statements below define the above overlay structure.
            SEGMENT       SEG1(A)
            SEGMENT       SEG2(B)
            SEGMENT       SEG3(F,G)
            PARENT        SEG2
            SEGMENT       SEG4(H)
            PARENT        SEG1
            SEGMENT       SEG5(C)
            PARENT        SEG1
            SEGMENT       SEG6(D,E)
            SEGMENT       SEG7(I)
            PARENT        SEG6
            SEGMENT       SEG8(J)
            PARENT        SEG6
            SEGMENT       SEG9(K)

Once the overlay structure has been defined the starting addresses for
the segments can be found because each segment begins immediately
after the end of its parent.
The figure below shows the length and the relative starting address of
each segment in our example. It assumes that the beginning load
address for the program is 8000.
                           Starting Address
                Segment    Relative        Actual    Length
                1          0000            8000      1000
                2          1000            9000      1800
                3          2800            A800      1400
                4          2800            A800      800
                5          1000            9000      4000
                6          1000            9000      3000
                7          4000            C000      1000
                8          4000            C000      800
                9          4000            C000      2000
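
The table can be reproduced with a short Python calculation, using the rule
that each segment begins immediately after the end of its parent. The
dictionary layout and the function name relative_start are assumptions; the
parents and lengths (in hexadecimal) are taken from the SEGMENT and PARENT
statements and the table above.

```python
# segment number -> (parent segment, length in bytes)
SEGMENTS = {
    1: (None, 0x1000),
    2: (1, 0x1800),
    3: (2, 0x1400),
    4: (2, 0x0800),
    5: (1, 0x4000),
    6: (1, 0x3000),
    7: (6, 0x1000),
    8: (6, 0x0800),
    9: (6, 0x2000),
}


def relative_start(segment):
    """A segment starts where its parent ends; the root starts at 0."""
    parent, _ = SEGMENTS[segment]
    if parent is None:
        return 0
    return relative_start(parent) + SEGMENTS[parent][1]


LOAD_ADDRESS = 0x8000
table = {
    seg: (relative_start(seg), LOAD_ADDRESS + relative_start(seg))
    for seg in SEGMENTS
}

# Memory needed by the overlay program versus loading every segment at once.
peak = max(relative_start(seg) + length for seg, (_, length) in SEGMENTS.items())
total = sum(length for _, length in SEGMENTS.values())
```

The overlay structure needs 6000 (hexadecimal) bytes at most, compared with
EC00 bytes if all the segments were in memory together.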
The loader can assign an actual starting address to every segment in the
overlay program once the initial load address is supplied. Thus the
addresses of all external symbols are known and all relocation and
linking operations can be performed.
During the execution of the program different combinations of segments
may be in memory together. Below are some possibilities.




                8000
                       A             A               A
                9000
                                     B               D
               A000

               B000
                                     H
               C000
                                                     E
               D000

                E000


The root segment can be loaded directly into memory; the other
segments with their linking information are loaded into a special working
file called SEGFILE that is created by the loader.
The actual loading of the segments during program execution is handled
by an overlay manager, OVLMGR. This is a special control section which is
automatically included in the root segment of the overlay program by the
loader. OVLMGR uses a segment table SEGTAB which has all the
information about the overlay program. SEGTAB also includes a special
transfer area for each segment except the root. If a segment is currently
loaded in memory the transfer area contains a jump instruction to the
entry point of that segment. If the segment is not currently loaded the
transfer area contains instructions that invoke OVLMGR and pass to it
information concerning the segment to be loaded.
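
The behaviour of the transfer areas can be modelled with a small
sketch. This is only a simulation under stated assumptions, not the
actual OVLMGR: the class OverlayManager, its methods and the PARENT map
are hypothetical, and the sketch assumes that loading a segment also
loads its ancestors and displaces every loaded segment that is not on
the path from the root to the new segment.

```python
# Assumed overlay tree: parent of each segment (None marks the root).
PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 1, 7: 6, 8: 6, 9: 6}


class OverlayManager:
    def __init__(self, parent):
        self.parent = parent
        # The root segment is always resident.
        self.loaded = {seg for seg, p in parent.items() if p is None}
        self.loads = []  # segments fetched from the working file, in order

    def path_to_root(self, seg):
        path = []
        while seg is not None:
            path.append(seg)
            seg = self.parent[seg]
        return path[::-1]

    def load(self, seg):
        path = self.path_to_root(seg)
        # Segments off this path share memory with it, so drop them.
        self.loaded &= set(path)
        for s in path:
            if s not in self.loaded:
                self.loads.append(s)
                self.loaded.add(s)

    def transfer(self, seg):
        """Model of a transfer area: call the manager only if needed."""
        if seg not in self.loaded:
            self.load(seg)
        return f"jump to entry point of segment {seg}"


mgr = OverlayManager(PARENT)
mgr.transfer(4)   # loads segments 2 and 4
mgr.transfer(3)   # segment 3 displaces segment 4
print(sorted(mgr.loaded), mgr.loads)
```

A second transfer to a segment that is already loaded goes straight to
the jump and records no further load.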


LOADER DESIGN OPTIONS
1.   Linkage Editors
A linking loader performs all linking and relocation operations including
automatic library search if specified and loads the linked program directly
into memory for execution. A linkage editor on the other hand produces a
linked version of the program (called a load module or an executable image)
which is written to a file or library for later execution. When the user is
ready to run the linked program a simple relocating loader can be used
to load the program in memory. The only object code modification
required is the addition of an actual load address to relative values within
the program.
If a program is to be executed many times without being reassembled
the use of linkage editors substantially reduces the overhead required.
Resolution of external references and library searching are only done
once (when the program is link edited). In contrast a linking loader
searches libraries and resolves external references every time the
program is executed.




2.    Dynamic Linking
With dynamic linking the linking function is performed at execution
time. A subroutine is loaded and linked to the rest of the program when
it is first called. This makes it possible to load routines only when
(and if) they are needed.
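
The idea of linking on first call can be illustrated by analogy in
Python, where a module is located and loaded at run time. This is a
loose sketch and not how a SIC/XE loader works: the class LazyRoutine is
a made-up name, and importlib.import_module from the standard library
stands in for the load-and-link step.

```python
import importlib


class LazyRoutine:
    """Stub that loads and links its target the first time it is called."""

    def __init__(self, module_name, routine_name):
        self.module_name = module_name
        self.routine_name = routine_name
        self.target = None  # not yet linked

    def __call__(self, *args):
        if self.target is None:
            # First call: load the module and resolve the routine.
            module = importlib.import_module(self.module_name)
            self.target = getattr(module, self.routine_name)
        return self.target(*args)


sqrt = LazyRoutine("math", "sqrt")
print(sqrt.target is None)  # nothing has been linked yet
print(sqrt(16.0))           # the first call links, then runs the routine
```

Later calls reuse the stored target, so the cost of loading and linking
is paid at most once, and never for routines that are not called.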


                             COMPILERS
A compiler bridges the semantic gap between a Programming Language
domain and an execution domain. Two aspects of the compiler are:
1.    To generate code to implement meaning of a source program in
      the execution domain and
2.    To provide diagnostics for violations of the programming language
      semantics in the source program.
For purposes of compiler construction a high level language is usually
described in terms of a grammar. The grammar specifies the form or
syntax of legal statements in the language.
For example an assignment statement might be defined by the grammar
as a variable name, followed by an assignment operator (:=) followed by
an expression. The problem of compilation becomes the matching of the
statements written by the programmer to structures defined by the
grammar, and generating the appropriate object code for each
statement.
The source program statements are regarded as sequences of tokens.
Tokens are the fundamental building blocks of the language. A token
might be a keyword, a variable name, an integer, an arithmetic operator
etc. The task of scanning the source statement, recognizing and
classifying the various tokens is known as lexical analysis. The part of
the compiler that performs this analytical function is called the scanner.
After the token scan, each statement in the program must be recognized
as some language construct, such as a declaration, or an assignment
statement, described by the grammar. This process, which is called
syntactic analysis or parsing, is performed by the part of the compiler
that is called the parser. The last step in the basic translation process
is the generation of object code.


GRAMMARS
A grammar for a programming language is a formal description of the
syntax or form of programs and individual statements written in the
language. The grammar does not describe the semantics or meaning of
the various statements.
A number of different notations are used to write grammars. One simple
and widely used notation is BNF (Backus–Naur Form).
A BNF grammar consists of a set of rules each of which defines the
syntax of some construct in the programming language. Below is a BNF
grammar of a restricted Pascal Language.
1.  <prog>      ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
2.  <prog-name> ::= id
3.  <dec-list>  ::= <dec> | <dec-list> ; <dec>
4.  <dec>       ::= <id-list> : <type>
5.  <type>      ::= INTEGER
6.  <id-list>   ::= id | <id-list> , id
7.  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
8.  <stmt>      ::= <assign> | <read> | <write> | <for>
9.  <assign>    ::= id := <exp>
10. <exp>       ::= <term> | <exp> + <term> | <exp> - <term>
11. <term>      ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor>    ::= id | int | ( <exp> )
13. <read>      ::= READ ( <id-list> )
14. <write>     ::= WRITE ( <id-list> )
15. <for>       ::= FOR <index-exp> DO <body>
16. <index-exp> ::= id := <exp> TO <exp>
17. <body>      ::= <stmt> | BEGIN <stmt-list> END

Below is an example of a Pascal program that conforms to the above
grammar.

1.  PROGRAM STATS
2.  VAR
3.     SUM, SUMQ, I, VALUE, MEAN, VARIANCE : INTEGER
4.  BEGIN
5.     SUM := 0;
6.     SUMQ := 0;
7.     FOR I := 1 TO 100 DO
8.        BEGIN
9.           READ (VALUE);
10.          SUM := SUM + VALUE;
11.          SUMQ := SUMQ + VALUE * VALUE
12.       END;
13.    MEAN := SUM DIV 100;
14.    VARIANCE := SUMQ DIV 100 - MEAN * MEAN;
15.    WRITE (MEAN, VARIANCE)
16. END.

Consider rule 13 in the grammar, <read> ::= READ ( <id-list> )
The symbol ::= means “is defined to be.”
Character strings enclosed between the angle brackets < and > are
called non terminal symbols (i.e. names of the constructs defined in
the grammar). Entries not enclosed in angle brackets are terminal
symbols of the grammar (i.e. tokens).
In this rule the non terminal symbols are <read> and <id-list>, and the
terminal symbols are the tokens READ, (, and ). Thus the rule specifies
that a <read> consists of the token READ, followed by the token “(”,
followed by a language construct <id-list>, followed by the token “)”.
To recognize a <read> we of course need the definition of <id-list>,
which is provided by rule 6.
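
How rules 13 and 6 work together can be sketched as a small recognizer.
This is an illustration only, not the parsing method developed later in
these notes. It assumes tokens are given as (kind, specifier) pairs, and
it handles the left-recursive rule 6 as "id followed by any number of
, id". The function names are made up for the example.

```python
def expect(tokens, pos, kind):
    """Check that tokens[pos] has the given kind; return the next position."""
    if pos >= len(tokens) or tokens[pos][0] != kind:
        raise SyntaxError(f"expected {kind} at token {pos}")
    return pos + 1


def parse_id_list(tokens, pos):
    """Rule 6: <id-list> ::= id | <id-list> , id"""
    names = []
    pos = expect(tokens, pos, "id")
    names.append(tokens[pos - 1][1])
    while pos < len(tokens) and tokens[pos][0] == ",":
        pos = expect(tokens, pos + 1, "id")
        names.append(tokens[pos - 1][1])
    return names, pos


def parse_read(tokens, pos=0):
    """Rule 13: <read> ::= READ ( <id-list> )"""
    pos = expect(tokens, pos, "READ")
    pos = expect(tokens, pos, "(")
    names, pos = parse_id_list(tokens, pos)
    pos = expect(tokens, pos, ")")
    return ("read", names), pos


statement = [("READ", None), ("(", None), ("id", "VALUE"), (")", None)]
print(parse_read(statement))
```

A statement such as READ ( ) is rejected with a SyntaxError because
rule 6 requires at least one id.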
It is often convenient to display the analysis of a source statement in
terms of a grammar as a tree called the parse tree or syntax tree.
Below are parse trees for statement number 9, READ (VALUE), and
statement 14, VARIANCE := SUMQ DIV 100 - MEAN * MEAN.
(Each node's children are listed beneath it, indented one level.)

<read>
   READ
   (
   <id-list>
      id {VALUE}
   )

<assign>
   id {VARIANCE}
   :=
   <exp>
      <exp>
         <term>
            <term>
               <factor>
                  id {SUMQ}
            DIV
            <factor>
               int {100}
      -
      <term>
         <term>
            <factor>
               id {MEAN}
         *
         <factor>
            id {MEAN}



Lexical Analysis
This involves scanning the program to be compiled and recognizing the
tokens that make up the source statements. Scanners are usually
designed to recognize keywords, operators, identifiers, integers, floating
point numbers, character strings and other similar items that are written
as part of the source program.
Items such as identifiers and integers are usually recognized directly as
single tokens. Alternatively they could be defined as part of the
grammar e.g.
<ident>      ::=     <letter> | <ident> <letter> | <ident> <digit>
<letter>     ::=     A | B | C | D | ... | Z
<digit>      ::=     0 | 1 | 2 | 3 | ... | 9

In such a case the scanner would recognize as tokens the single
characters A, B, 0, 1 etc., and the parser would have to assemble them
into an <ident>. In practice the scanner usually recognizes both single
character and multiple character tokens directly, so that READ is one
token rather than the four tokens R, E, A, D.
The output of a scanner consists of a sequence of tokens. Each token is
usually represented by some fixed length code such as an integer.
Below is the token coding scheme for the grammar considered:


       Token     Code          Token   Code       Token    Code
       PROGRAM   1             WRITE   9          -        17
       VAR       2             TO      10         *        18
       BEGIN     3             DO      11         DIV      19
       END       4             ;       12         (        20
       END.      5             :       13         )        21
       INTEGER   6             ,       14         id       22
       FOR       7             :=      15         int      23
       READ      8             +       16



In such a coding scheme the token PROGRAM would be represented by
the integer value 1 and an identifier id by 22.
Line      Token Type    Token Specifier Line      Token Type    Token Specifier
1         1                             10        22            SUM
          22            STATS                     15
2         2                                       22            SUM
3         22            SUM                       16
          14                                      22            VALUE
          22            SUMQ                      12
          14                            11        22            SUMQ
          22            I                         15
          14                                      22            SUMQ
          22            VALUE                     16
          14                                      22            VALUE
          22            MEAN                      18
          14                                      22            VALUE
          22            VARIANCE        12        4
          13                                      12
          6                             13        22            MEAN
4         3                                       15
5         22            SUM                       22            SUM
          15                                      19
          23            #0                        23            #100
          12                                      12
6         22            SUMQ            14        22            VARIANCE
          15                                      15
          23            #0                        22            SUMQ
          12                                      19
7         7                                       23            #100
          22            I                         17
          15                                      22            MEAN
          23            #1                        18
          10                                      22            MEAN
          23            #100                      12
          11                            15        9
8         3                                       20
9         8                                       22            MEAN
          20                                      14
          22            VALUE                     22            VARIANCE
          21                                      21
          12                            16        5


In the case of an identifier or an integer it is necessary to specify the
particular identifier name or value that was scanned. A token specifier is
therefore associated with the type code for such tokens.
The table above shows the output from a scanner for the Pascal program
we considered.
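
A scanner for this token coding scheme can be sketched as follows. It
is a minimal illustration that assumes blanks delimit tokens and that
identifiers are letters followed by letters or digits. The names
TOKEN_CODES, TOKEN_RE and scan are made up for the example. Integer
specifiers are written with a leading # as in the scanner output shown.

```python
import re

TOKEN_CODES = {
    "PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END": 4, "END.": 5, "INTEGER": 6,
    "FOR": 7, "READ": 8, "WRITE": 9, "TO": 10, "DO": 11, ";": 12, ":": 13,
    ",": 14, ":=": 15, "+": 16, "-": 17, "*": 18, "DIV": 19, "(": 20,
    ")": 21,
}
ID, INT = 22, 23

# END. and := are tried before the shorter tokens they begin with.
TOKEN_RE = re.compile(
    r"\s*(END\.|[A-Za-z][A-Za-z0-9]*|[0-9]+|:=|[;:,+\-*()])",
    re.IGNORECASE,
)


def scan(line):
    """Return the (token type, token specifier) pairs for one source line."""
    line = line.rstrip()
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise SyntaxError(f"unrecognized character at column {pos}")
        text = match.group(1).upper()
        pos = match.end()
        if text in TOKEN_CODES:
            tokens.append((TOKEN_CODES[text], None))
        elif text[0].isdigit():
            tokens.append((INT, "#" + text))
        else:
            tokens.append((ID, text))
    return tokens


print(scan("SUM := SUM + VALUE;"))
```

For line 10 of the program this gives the type codes 22, 15, 22, 16, 22,
12, with SUM, SUM and VALUE as the specifiers of the three identifiers.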
Apart from recognizing tokens the scanner is also responsible for
reading the lines of the source program and possibly printing the source
listing.
The scanner must take into account any special format required of the
source statements. For example, in Fortran a number in columns 1-5 of a
source statement is a statement number, not an integer. The scanner
must also know whether blanks function as delimiters for tokens (as in
Pascal) or not, and whether statements can be continued freely from one
line to the next (as in Pascal) or whether special continuation flags are
necessary (as in Fortran).


Syntactic Analysis
During syntactic analysis the source statements written by the
programmer are recognized as language constructs described by the
grammar being used. This may be regarded as building the parse tree
for the statements being translated. Parsing techniques are of two types;
bottom up and top down according to the way in which the parse tree is
being constructed.
Top down methods begin with the rule of the grammar that specifies the
goal of the analysis (i.e. the root of the tree), and attempt to construct the
tree so that the terminal nodes match the statements being analyzed.
Bottom up methods begin with terminal nodes of the tree (the statements
being analyzed), and attempt to combine these into successively higher-
level nodes until the root is reached.


A large number of different parsing techniques have been devised; one
of the bottom-up parsing techniques is the operator-precedence method
which is based on examining pairs of consecutive operators in the
source program and making decisions about which operation should be
performed first.
Consider     A + B * C – D
Multiplication and division usually have higher precedence than addition
and subtraction.
So for the first pair of operators (+ and *), + has lower precedence
than *, i.e. + < *; similarly * > – for the next pair.
So for the expression
A + B * C – D
    <   >
This implies that the subexpression B * C is to be computed before either
of the other operations. In terms of a parse tree this means that the *
operation appears at a lower level than the + or – operators.
During this process the statement being analyzed is scanned for a sub
expression whose operators have higher precedence than the
surrounding operators. This sub expression then is interpreted in terms
of the rules of the grammar under consideration. This process continues
until the root of the tree is reached.
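
The process just described can be sketched for arithmetic expressions
only. As a simplifying assumption the sketch uses numeric precedence
levels for +, -, * and DIV instead of a full table of precedence
relations, and it treats operators of equal precedence as left
associative. PRECEDENCE and parse_expression are illustrative names.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "DIV": 2}


def parse_expression(tokens):
    """Build a tree of (operator, left, right) tuples bottom up."""
    operands, operators = [], []

    def reduce():
        # Combine the two most recent operands under the top operator.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            # Relation "stack top > incoming": reduce before shifting.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                reduce()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        reduce()
    return operands[0]


print(parse_expression(["A", "+", "B", "*", "C", "-", "D"]))
```

For A + B * C - D the subexpression B * C is reduced first, so it ends
up at the lowest level of the resulting tree.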
The first step in constructing an operator precedence parser is to
determine the precedence relations between the operators of the
grammar. In this context, operator means any terminal symbol (a token).
From the table below:
PROGRAM = VAR means that the two tokens involved have equal
precedence.
BEGIN < FOR means that BEGIN has lower precedence than FOR.
Precedence relations do not follow the ordinary rules for comparisons,
e.g. ; > END but also END > ;


Where no precedence relation is given for a pair of tokens, the two
tokens cannot appear together in any legal statement. If such a
combination occurs during parsing it should be recognized as a syntax
error.
           VAR    BEGIN    END       INTEGER              FOR   READ    WRITE       TO   DO    ;   :    ,    := + -      *   DIV   (   )   id   int
PROGRAM    =                                                                                                                               <
VAR               =                                                                            <   <    <                                  <
BEGIN                      =                              <     <       <                      <                                           <
END                        >                                                                   >
INTEGER           >                                                                      =     >                                           <
FOR
READ                                                                                                                               =
WRITE                                                                                                                              =
TO                                                                                       >                       <   <   <   <     <       <    <
DO                <        >                              <     <       <                      >                                           <
;                 >        >                              <     <       <                      >   <    <                                  <
:                 >                  <                                                         >
,                                                                                                                                          =
:=                         >                                                        =          >                 <   <   <   <     <       <    <
+                          >                                                        >    >     >                 >   >   <   <     <   >   <    <
-                          >                                                        >    >     >                 >   >   <   <     <   >   <    <
*                          >                                                        >    >     >                 >   >   >   >     <   >   <    <
DIV                        >                                                        >    >     >                 >   >   >   >     <   >   <    <
(                          >                                                                            <        <   <   <   <     <   =   <    <
)                          >                                                        >    >     >                 >   >   >   >         >
Id         >               >                                                        >    >     >   >    >    =   >   >   >   >         >
Int                        >                                                        >    >     >                 >   >   >   >         >
The statements are scanned from left to right one token at a time. For
each pair of operators the precedence relation between them is
determined.
Examples:
1.   READ (VALUE);

(i)       ... BEGIN READ ( id )
                   <    = <  >
          Reduce id {VALUE} to <N1>.

(ii)      ... BEGIN READ ( <N1> ) ;
                   <    =   =    >
          Reduce READ ( <N1> ) to <N2>.

(iii)     ... BEGIN <N2> ;

The resulting parse tree (each node's children are listed beneath it,
indented one level):

<N2>
   READ
   (
   <N1>
      id {VALUE}
   )


In part (i) the parser identifies the portion of the statement delimited by
the precedence relations < and >, which consists of the single token id.
This portion could be identified as a <factor> (rule 12), a <prog-name>
(rule 2) or an <id-list> (rule 6). It is simply interpreted as some non
terminal symbol <N1>.
Precedence relations hold only between terminal symbols, so <N1> is
not involved in this process. In part (ii) the portion delimited by < and >
is READ ( <N1> ), which matches rule 13 and is interpreted as <N2>.
2.   VARIANCE := SUMQ DIV 100 – MEAN * MEAN ;

Here id1, id2, id3 and id4 stand for VARIANCE, SUMQ, MEAN and MEAN, and
int stands for 100. At each step the portion delimited by < and > is
reduced to a new non terminal.

(i)       id1 := id2 DIV int - id3 * id4 ;
        <    =  <   >
          Reduce id2 {SUMQ} to <N1>.

(ii)      id1 := <N1> DIV int - id3 * id4 ;
        <    =    <      <   >
          Reduce int {100} to <N2>.

(iii)     id1 := <N1> DIV <N2> - id3 * id4 ;
        <    =    <        >
          Reduce <N1> DIV <N2> to <N3>.

(iv)      id1 := <N3> - id3 * id4 ;
        <    =    <    <   >
          Reduce id3 {MEAN} to <N4>.

(v)       id1 := <N3> - <N4> * id4 ;
        <    =    <      <    <   >
          Reduce id4 {MEAN} to <N5>.

(vi)      id1 := <N3> - <N4> * <N5> ;
        <    =    <      <      >
          Reduce <N4> * <N5> to <N6>.

(vii)     id1 := <N3> - <N6> ;
        <    =    <      >
          Reduce <N3> - <N6> to <N7>.

(viii)    id1 := <N7> ;
        <    =    >
          Reduce id1 := <N7> to <N8>.

(ix)      <N8> ;

The resulting parse tree (each node's children are listed beneath it,
indented one level):

<N8>
   id1 {VARIANCE}
   :=
   <N7>
      <N3>
         <N1>
            id2 {SUMQ}
         DIV
         <N2>
            int {100}
      -
      <N6>
         <N4>
            id3 {MEAN}
         *
         <N5>
            id4 {MEAN}



Note that each portion of the parse tree is constructed from the terminal
nodes up towards the root, hence the term bottom up parsing.
There are a few differences between these parse trees and the first
ones. For example, in the first tree the id SUMQ is interpreted as a
<factor> and then as a <term>, whereas here it is interpreted directly
as the single non terminal <N1>. This is because the operator
precedence parser is not concerned with the names of the non terminals,
so it is not necessary to perform this additional step in the recognition
process.


Code Generation
After the syntax has been analysed the last task of the compilation is the
generation of object code. A simple code generation technique is the
one that creates the object code for each part of the program as soon as
its syntax has been recognized.
The technique involves a set of routines one for each rule or alternative
rule in the grammar. When the parser recognizes a portion of the source
program according to some rule of the grammar, the corresponding
routine is executed. Such routines are called semantic routines because
the processing performed is related to the meaning associated with the
corresponding construct in the language. These semantic routines
generate object code directly so they can also be called code generation
routines.
The code to be generated depends upon the computer for which the
program is being compiled. Our examples generate object code for the
SIC/XE machine.
The code generation routines create segments of object code for the
compiled program, which will be represented here using SIC assembler
language. The actual code generated is machine language, not
assembler. As each piece of object code is generated a location counter
is updated to reflect the next available address in the compiled program.
Regardless of the method used to generate the parse tree, the parser
will always recognize at each step the left most substring of the input
that can be interpreted according to a rule of the grammar. In the
operator precedence method this recognition occurs when a substring of
the input is reduced to some non terminal <Ni>. The assembler code
below shows the symbolic representation of the object code to be
generated for the READ statement.

+JSUB       XREAD
WORD        1
WORD        VALUE

It involves a call to subroutine XREAD which would be part of a standard
library associated with the compiler. It can be called by any program that
wants to perform a READ operation. XREAD is linked together with the
generated object program by a linking loader or a linkage editor.
Since XREAD may be used to perform any READ operation it must be
passed parameters that specify the details of the READ. In this case the
parameter list for XREAD is defined immediately after the JSUB that
calls it. The first word in this parameter list contains a value that specifies
the number of variables that will be assigned values by the READ. The
following words give the addresses of these variables. Thus the second
line specifies that one variable is to be read and the third line gives the
address of this variable.
The parser in generating the parse tree recognizes first <id-list> and
then <read>. At each step the parser calls the appropriate code
generation routine.
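
A code generation routine for the READ statement might be sketched as
below. The class CodeGenerator and its methods are hypothetical names
for this illustration. The sketch produces the symbolic form only and
updates a location counter, assuming 3 bytes for WORD and for ordinary
instructions and 4 bytes for an instruction with the + (extended
format) prefix.

```python
class CodeGenerator:
    """Collects symbolic SIC/XE code and tracks the location counter."""

    def __init__(self):
        self.lines = []   # (operation, operand) pairs in generation order
        self.locctr = 0   # next available address, in bytes

    def emit(self, operation, operand):
        self.lines.append((operation, operand))
        # '+' marks a 4-byte extended instruction; everything else
        # generated here occupies 3 bytes.
        self.locctr += 4 if operation.startswith("+") else 3

    def read_statement(self, id_list):
        """Semantic routine for <read> ::= READ ( <id-list> )."""
        self.emit("+JSUB", "XREAD")
        self.emit("WORD", str(len(id_list)))  # number of variables
        for name in id_list:
            self.emit("WORD", name)           # address of each variable


gen = CodeGenerator()
gen.read_statement(["VALUE"])
for operation, operand in gen.lines:
    print(f"{operation:<8}{operand}")
```

For READ (VALUE) this yields the three lines shown above, and the
location counter advances by 10 bytes.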
For the assignment statement
VARIANCE := SUMQ DIV 100 – MEAN * MEAN
most of the work involves the analysis of the <exp> on the
right hand side of the :=. The parser first recognizes the id SUMQ as a
<factor> and a <term>; then it recognizes the int 100 as a <factor>; then it
recognizes SUMQ DIV 100 as a <term> and so on. As each portion of
the statement is recognized a code-generation routine is called to create
the corresponding object code.
The assembler code below shows the symbolic representation of the
object code to be generated for the assignment statement
VARIANCE := SUMQ DIV 100 – MEAN * MEAN



LDA    SUMQ
DIV    #100
STA    T1
LDA    MEAN
MUL    MEAN
STA    T2
LDA    T1
SUB    T2
STA    VARIANCE
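
One simple way to arrive at such code is to walk the parse tree of the
expression and store each intermediate result in a temporary variable.
The sketch below assumes the expression is given as nested
(operator, left, right) tuples, with strings for identifiers and Python
integers for constants. It is an illustration rather than the routine an
actual compiler would use, and ExpressionCoder is a made-up name.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}


class ExpressionCoder:
    def __init__(self):
        self.code = []   # (mnemonic, operand) pairs
        self.temps = 0   # number of temporaries T1, T2, ... used so far

    def operand(self, node):
        """Return an operand for node, saving subexpressions in temporaries."""
        if isinstance(node, int):
            return f"#{node}"          # immediate operand
        if isinstance(node, str):
            return node                # variable name
        self.load(node)                # result is now in register A
        self.temps += 1
        temp = f"T{self.temps}"
        self.code.append(("STA", temp))
        return temp

    def load(self, node):
        """Generate code that leaves the value of node in register A."""
        if isinstance(node, tuple):
            op, left, right = node
            left_operand = self.operand(left)
            right_operand = self.operand(right)
            self.code.append(("LDA", left_operand))
            self.code.append((OPCODES[op], right_operand))
        else:
            self.code.append(("LDA", self.operand(node)))

    def assign(self, target, expression):
        self.load(expression)
        self.code.append(("STA", target))


coder = ExpressionCoder()
coder.assign("VARIANCE", ("-", ("DIV", "SUMQ", 100), ("*", "MEAN", "MEAN")))
for mnemonic, operand in coder.code:
    print(f"{mnemonic:<7}{operand}")
```

For the assignment statement above the sketch produces the same nine
instructions, using the temporaries T1 and T2.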

Below is the symbolic representation of the object code generated from
the Pascal program given earlier.
Line   Symbolic representation of the generated code
1      STATS          START           0                {Program header}
                      EXTREF          XREAD, XWRITE
                      STL             RETADR           {Save return address}
                      J               {EXADDR}
       RETADR         RESW            1                {Variable declarations}
3      SUM            RESW            1
       SUMQ           RESW            1
       I              RESW            1
       VALUE          RESW            1
       MEAN           RESW            1
       VARIANCE       RESW            1
5      {EXADDR}       LDA             #0               {SUM := 0}
                      STA             SUM
6                     LDA             #0               {SUMQ := 0}
                      STA             SUMQ
7                     LDA             #1               {FOR I := 1 TO 100}
       {L1}           STA             I
                      COMP            #100
                      JGT             {L2}
9                     +JSUB           XREAD            {READ(VALUE)}
                      WORD            1
                      WORD            VALUE
10                    LDA             SUM              {SUM := SUM + VALUE}
                      ADD             VALUE
                      STA             SUM
11                    LDA             VALUE            {SUMQ := SUMQ + VALUE * VALUE}
                      MUL             VALUE
                      ADD             SUMQ
                      STA             SUMQ
                      LDA             I                {End of FOR loop}
                      ADD             #1
                      J               {L1}
13     {L2}           LDA             SUM              {MEAN := SUM DIV 100}
                      DIV             #100
                      STA             MEAN
14                    LDA             SUMQ             {VARIANCE := SUMQ DIV 100 - MEAN * MEAN}
                      DIV             #100
                      STA             T1
                      LDA             MEAN
                      MUL             MEAN
                      STA             T2
                      LDA             T1
                      SUB             T2
                      STA             VARIANCE
15                    +JSUB           XWRITE           {WRITE(MEAN, VARIANCE)}
                      WORD            2
                      WORD            MEAN
                      WORD            VARIANCE
                      LDL             RETADR           {Return}
                      RSUB
       T1             RESW            1                {Working variables used}
       T2             RESW            1
                      END


Machine Dependent Compiler Features.
Most high level programming languages are designed to be relatively
independent of the machine being used. This means that the process of
analyzing the syntax of the program should also be machine
independent. The only machine dependencies of a compiler are related
to the generation and optimization of the object code.
The code optimization is done using an intermediate form of the program
being analysed. In the intermediate form the syntax and the semantics of
the source statements have been completely analysed but the actual
translation into machine code has not yet been performed. It is much
easier to analyse and manipulate the intermediate form of the program
for the purposes of code optimization than to perform the corresponding
operations on either the source program or the machine code.


Intermediate form of the program
One of the methods used in representing a program in an intermediate
form represents the executable instructions of the program with a
sequence of quadruples. Each quadruple is of the form
      Operation, op1, op2, result
where operation is the function to be performed by the object code, op1
and op2 are the operands and result is where the resulting value is to be
placed.
      SUM := SUM + VALUE
could be represented with quadruples as
      + , SUM, VALUE, i1
      :=, i1  ,      ,SUM
where i1 represents the intermediate result (SUM + VALUE); the second
quadruple assigns the value of this intermediate result to SUM.
Assignment is treated as a separate operation (:=).
Similarly, the statement
      VARIANCE := SUMQ DIV 100 - MEAN * MEAN
could be represented with quadruples as
      DIV, SUMQ, #100, i1
      *  , MEAN, MEAN, i2
      -  , i1  , i2  , i3
      := , i3  ,     , VARIANCE
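
The translation of a statement into quadruples can be sketched in a few
lines of Python. This is only an illustration, not the method of any
particular compiler: the function name gen_quads and the representation
of an expression as nested tuples (op, left, right) are assumptions made
for the example. Intermediate results are numbered i1, i2, ... in the
order in which they are produced.

```python
import itertools


def gen_quads(target, expr):
    """Return quadruples (operation, op1, op2, result) for `target := expr`.

    `expr` is either an operand name such as 'SUMQ' or '#100', or a tuple
    (operation, left, right) describing a binary operation.
    """
    quads = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):        # a variable or an immediate value
            return node
        operation, left, right = node
        op1 = walk(left)                 # left operand is translated first
        op2 = walk(right)
        result = f"i{next(counter)}"     # fresh intermediate result
        quads.append((operation, op1, op2, result))
        return result

    value = walk(expr)
    quads.append((":=", value, "", target))   # assignment is a separate quadruple
    return quads


variance = ("-", ("DIV", "SUMQ", "#100"), ("*", "MEAN", "MEAN"))
for quad in gen_quads("VARIANCE", variance):
    print(quad)
```

Running the sketch on the VARIANCE statement produces the same four
quadruples as listed above.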

Many types of analysis and manipulation can be performed on the
quadruples for code optimization purposes, e.g. the intermediate results
(i1, i2, ...) can be assigned to registers or to temporary variables to
make their use more efficient. After optimization has been performed the
modified quadruples are translated into machine code.
Below is the intermediate code for the Pascal program.
       Operation    Op1        Op2          Result
(1)    :=           #0                      SUM            {SUM := 0}
(2)    :=           #0                      SUMQ           {SUMQ := 0}
(3)    :=           #1                      I              {FOR I := 1 TO 100}
(4)    JGT          I          #100         (15)
(5)    CALL         XREAD                                  {READ(VALUE)}
(6)    PARAM        VALUE
(7)    +            SUM        VALUE        i1             {SUM:= SUM + VALUE}
(8)    :=           i1                      SUM
(9)    *            VALUE      VALUE        i2             {SUMQ := SUMQ +
(10)   +            SUMQ       i2           i3                  VALUE * VALUE}
(11)   :=           i3                      SUMQ
(12)   +            I          #1           i4             {end of FOR loop}
(13)   :=           i4                      I
(14)   J                                    (4)
(15)   DIV          SUM        #100         i5             {MEAN := SUM DIV 100}
(16)   :=           i5                      MEAN
(17)   DIV          SUMQ       #100         i6         {VARIANCE := SUMQ DIV 100 -
(18)   *            MEAN       MEAN         i7                  MEAN * MEAN}
(19)   -            i6         i7           i8
(20)   :=           i8                      VARIANCE
(21)   CALL         XWRITE                                 {WRITE(MEAN,VARIANCE)}
(22)   PARAM        MEAN
(23)   PARAM        VARIANCE


The READ and WRITE statements are represented with a CALL
operation followed by PARAM quadruples that specify the parameters of
the READ and WRITE.


Code-Optimization
Machine instructions that use registers as operands are usually faster
than the corresponding instructions that refer to locations in memory. It
is therefore better to keep in registers all variables and intermediate
results that will be used later in the program.
Consider the variable VALUE, which is used once in quadruple 7 and
twice in quadruple 9. It is possible to fetch this value once and retain it in
a register for use by the code generated from quadruple 9. Similarly,
quadruple 16 assigns i5 to MEAN, so if i5 is kept in a register it can be
used wherever the variable MEAN is required, for example in quadruple 18.
Another possibility for code optimization involves rearranging quadruples
before machine code is generated. Consider the quadruples
      DIV   SUMQ        #100         i1
      *     MEAN        MEAN         i2
      -     i1          i2           i3
      :=    i3                       VARIANCE


This corresponds to
      LDA   SUMQ
      DIV   #100
      STA   T1
      LDA   MEAN
      MUL   MEAN
      STA   T2
      LDA   T1
      SUB   T2
      STA   VARIANCE


The value of the intermediate result i1 is calculated first and stored in a
temporary variable T1, then i2 is calculated. The third quadruple requires
subtracting i2 from i1. Since i2 has just been computed, its value is in
register A. It is therefore necessary to store the value of i2 in another
temporary variable T2 and then load the value of i1 from T1 into register A
before performing the subtraction.
An optimizing compiler can rearrange the quadruples so that the second
operand of subtraction is computed first as shown below.
      *     MEAN        MEAN         i2
      DIV   SUMQ        #100         i1
      -     i1          i2           i3
      :=    i3                       VARIANCE


corresponding to
         LDA          MEAN
         MUL          MEAN
         STA          T1
         LDA          SUMQ
         DIV          #100
         SUB          T1
         STA          VARIANCE
The resulting machine code requires fewer instructions and uses only
one temporary variable instead of two.
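
The effect of the rearrangement can be checked with a small Python
sketch of a code generator that uses only register A. It is an
illustration under stated assumptions, not a real compiler back end: the
function name generate is hypothetical, every intermediate result that
is still in register A when the register is needed for something else is
saved to a temporary (no check is made that it will be used again), and
only the four arithmetic operations and := are handled.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}
COMMUTATIVE = {"+", "*"}


def generate(quads):
    """Translate quadruples into SIC-like instructions using register A only."""
    intermediates = {q[3] for q in quads if q[0] != ":="}
    code = []
    home = {}        # intermediate result -> temporary variable holding it
    in_a = None      # name of the value currently held in register A

    def loc(name):
        return home.get(name, name)

    def spill(name):
        home[name] = f"T{len(home) + 1}"
        code.append(f"STA {home[name]}")

    for op, a, b, result in quads:
        if op in COMMUTATIVE and in_a == b:
            a, b = b, a                       # use the value already in A
        must_reload = in_a != a
        # Save an unsaved intermediate before A is overwritten, or when the
        # same intermediate is also needed as the memory operand.
        if in_a in intermediates and in_a not in home and (must_reload or b == a):
            spill(in_a)
        if must_reload:
            code.append(f"LDA {loc(a)}")
        if op == ":=":
            code.append(f"STA {result}")
        else:
            code.append(f"{OPCODES[op]} {loc(b)}")
        in_a = result
    return code


original = [("DIV", "SUMQ", "#100", "i1"), ("*", "MEAN", "MEAN", "i2"),
            ("-", "i1", "i2", "i3"), (":=", "i3", "", "VARIANCE")]
reordered = [original[1], original[0], original[2], original[3]]

print(len(generate(original)), "instructions before rearranging")
print(len(generate(reordered)), "instructions after rearranging")
```

With the original order the sketch emits the nine instructions shown
earlier, using T1 and T2; with the rearranged order it emits the seven
instructions above, using only T1.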


Machine Independent Compiler Features.
Storage Allocation
The type of storage assignment where all programmer defined variables
are assigned fixed addresses within the program is called static
allocation. It is often used for languages like FORTRAN that do not allow
recursive use of procedures or subroutines.


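[Figure: procedure calls under static allocation. (a) The system calls
MAIN (call 1); MAIN contains a fixed location RETADR. (b) MAIN calls SUB
(call 2); SUB contains its own fixed location RETADR. (c) SUB calls
itself (call 3), using the same location RETADR within SUB.]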
If procedures may be called recursively, as in Pascal, static allocation
cannot be used. In the figure the program MAIN has been called by the
operating system (call 1). MAIN stores its return address at a fixed
location RETADR within MAIN.
MAIN calls SUB (call 2). The return address of this call is stored at a fixed
location within SUB. If SUB calls itself recursively as in fig (c), a problem
occurs because SUB stores the return address for call 3 into RETADR
from register L. This destroys the return address for call 2 and as a result
there is no possibility of ever making a correct return to MAIN.
A similar difficulty occurs with respect to any variables used by SUB.
When recursive calls are made, variables within SUB may be set to new
values; however the previous values may be needed by call 2 of SUB
after the return from the recursive call.
It is therefore necessary to preserve the previous values of any variables
used in SUB including parameters, temporaries, return addresses,
register save areas etc.
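
The loss of values under static allocation can be imitated in Python.
In the sketch below (the names STATIC, fact_static and fact_dynamic are
made up for the illustration) the dictionary STATIC plays the part of a
single fixed memory location for the variable N of a recursive factorial
procedure. Python itself keeps the local name partial separate for each
call, so only N is affected.

```python
STATIC = {"N": 0}     # the one fixed location shared by every invocation


def fact_static(n):
    STATIC["N"] = n                           # all invocations store N here
    if STATIC["N"] <= 1:
        return 1
    partial = fact_static(STATIC["N"] - 1)    # the recursive call overwrites N
    return STATIC["N"] * partial              # N no longer holds this call's value


def fact_dynamic(n):
    # Each invocation has its own copy of n, as with activation records.
    if n <= 1:
        return 1
    return n * fact_dynamic(n - 1)


print(fact_static(4), fact_dynamic(4))
```

The first result is wrong because, by the time each invocation resumes,
N contains the value stored by the innermost call.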
This is usually accomplished by the dynamic storage allocation
technique where each procedure call creates an activation record that
contains storage for all the variables used by the procedure. If the
procedure is called recursively another activation record is created. Each
activation record is associated with a particular invocation of the
procedure. An activation record is not deleted until a return has been
made from the corresponding invocation. The starting address for the
current activation record is usually contained in a base register which is
used by the procedure to address its variables. In this way the values of
variables used by the different invocations of a procedure are kept
separate from one another.
Activation records are typically allocated on a stack, with the current
record on top of the stack.
In the diagram below, (a) shows the stack after MAIN has been called; its
activation record appears on the stack. The base register B is set to
indicate the starting address of the current activation record. The first
word in an activation record normally contains a pointer PREV to the
previous record on the stack. Since this record is the first, the pointer
value is null.
[Figure: stack of activation records. Each record holds, from its first
word, PREV, NEXT, RETADR and the variables of the procedure; register B
points to the current record. (a) The system has called MAIN (call 1):
the stack holds the record for MAIN only, with PREV = 0. (b) MAIN has
called SUB (call 2): the record for SUB lies above the record for MAIN.
(c) SUB has called itself (call 3): a second record for SUB lies on top.
(d) After the return from the recursive call: the stack holds the
records for MAIN and SUB, and B points to the record for SUB.]


The second word of the activation record contains a pointer NEXT to the
first unused word of the stack, which will be the starting address for the
next activation record created. The third word contains the return
address for this invocation of the procedure, and the remaining words
contain the values of all the variables used by the procedure.
In diagram (b) MAIN has called SUB. On the top of the stack a new
activation record has been created, with register B set to indicate the new
current record. The pointers PREV and NEXT are set as shown.
In (c) SUB has called itself recursively and another activation record has
been created.
When a procedure returns to its caller the current activation record is
deleted. The pointer PREV in the deleted record is used to reestablish
the previous activation record as the current one and execution
continues.
Fig (d) shows how the stack would appear after SUB returns from the
recursive call. Register B has been reset to point to the activation record
for the previous invocation of SUB. The return address and all the
variable values in this activation record are exactly the same as they
were before the recursive call.
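
The behaviour of the stack in diagrams (a) to (d) can be imitated with
the following Python sketch. It is a simplified model, not actual
generated code: memory is a list with one slot per word, the null
pointer is represented by 0 (so word 0 is left unused and the first
record starts at word 1), and the class and method names are chosen for
the illustration only.

```python
NULL = 0       # null pointer; word 0 of memory is deliberately left unused
HEADER = 3     # PREV, NEXT and RETADR occupy the first three words


class ActivationStack:
    def __init__(self, size=64):
        self.mem = [0] * size      # word-addressed memory
        self.b = NULL              # base register B: start of current record

    def call(self, retadr, nvars):
        """Create a new activation record and make it the current one."""
        start = 1 if self.b == NULL else self.mem[self.b + 1]
        self.mem[start] = self.b                       # PREV
        self.mem[start + 1] = start + HEADER + nvars   # NEXT
        self.mem[start + 2] = retadr                   # RETADR
        self.b = start

    def ret(self):
        """Delete the current record; return the saved return address."""
        retadr = self.mem[self.b + 2]
        self.b = self.mem[self.b]                      # PREV becomes current
        return retadr

    def store(self, k, value):
        self.mem[self.b + HEADER + k] = value          # k-th variable of record

    def load(self, k):
        return self.mem[self.b + HEADER + k]


stack = ActivationStack()
stack.call(retadr=100, nvars=2)    # (a) the system calls MAIN
stack.call(retadr=200, nvars=1)    # (b) MAIN calls SUB
stack.store(0, 10)
stack.call(retadr=300, nvars=1)    # (c) SUB calls itself
stack.store(0, 20)
returned_to = stack.ret()          # (d) return from the recursive call
print(returned_to, stack.load(0), stack.mem[stack.b + 2])
```

After the return, the variable and the return address seen through
register B are again those of the first invocation of SUB (10 and 200).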
This technique is called automatic allocation of storage. In this technique
the compiler generates code for references to variables using some sort
of relative addressing. The compiler assigns each variable an address
which is relative to the beginning of the activation record instead of an
actual location within the program. The address of the current activation
record is contained in register B, so a reference to a variable can be
translated into an instruction that uses base relative addressing. The
displacement in this instruction is the relative address of the variable
within the activation record.
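
As a rough illustration, the assignment of relative addresses could look
like the Python sketch below. It assumes the record layout described
above (PREV, NEXT and RETADR in the first three words) and 3-byte SIC
words; the function name and the example variables are hypothetical.

```python
WORD = 3       # bytes per SIC word
HEADER = 3     # words used by PREV, NEXT and RETADR


def assign_displacements(variables, word=WORD):
    """Map each variable to its byte displacement from register B.

    `variables` is a list of (name, size_in_words) pairs in declaration
    order. Returns the table and the total record size in bytes.
    """
    offset = HEADER * word
    table = {}
    for name, nwords in variables:
        table[name] = offset
        offset += nwords * word
    return table, offset


table, size = assign_displacements([("I", 1), ("SUM", 1), ("BUF", 10)])
print(table, size)
```

For these three variables the displacements are 9, 12 and 15 bytes and
the record occupies 45 bytes.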
The compiler also generates additional code to manage the activation
records themselves. At the beginning of each procedure there must be
code to create a new activation record, linking it to the previous one and
setting the appropriate pointers. This code is often called the prologue
for the procedure. At the end of the procedure there must be code to
delete the current activation record and reset the pointers as needed.
This code is called the epilogue.
Other types of dynamic storage allocation allow the programmer to
specify when storage is to be assigned. In PL/I the statement
ALLOCATE (A) allocates storage for the variable A while FREE (A)
releases the storage assigned to A by the previous ALLOCATE. This
feature is called controlled storage in PL/I.
In Pascal the statement NEW(P) allocates storage for a variable and
sets the pointer P to indicate the variable just created. The statement
DISPOSE(P) releases the storage that was previously assigned to the
variable pointed to by P.


Structured Variables
These include arrays, records, strings, sets etc.
Consider an array A: ARRAY[1..10] of integer. If each integer variable
occupies one word of memory, then ten words have to be allocated to store
this array.
In general an array ARRAY[l..u] of integer needs an allocation of u-l+1
words of storage.
For a two-dimensional array like B: ARRAY[0..3,1..6] of integer, the first
subscript can take on 4 different values (0-3) and the second subscript can
take on 6 values. We need to allocate a total of 4 * 6 = 24 words to store
the array. In general an array ARRAY[l1..u1, l2..u2] of integer needs to be
allocated a storage of (u1-l1+1)*(u2-l2+1) words.
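
The storage requirement is easy to check with a short Python sketch
(the function name words_needed is made up for the example, and each
element is assumed to occupy one word):

```python
from math import prod


def words_needed(bounds):
    """Number of words for an array whose subscript ranges are (l, u) pairs."""
    return prod(u - l + 1 for l, u in bounds)


print(words_needed([(1, 10)]))          # A: ARRAY[1..10]
print(words_needed([(0, 3), (1, 6)]))   # B: ARRAY[0..3,1..6]
```

The two calls give 10 and 24 words, agreeing with the figures above.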
To generate code for array references it is important to know which array
element corresponds to each word of allocated storage. For a one
dimensional array there is an obvious correspondence, e.g. in the array A
above the first word contains A[1], the second word A[2] etc.
There are two possible ways of storing the elements of a two dimensional
array. In the first, all array elements that have the same value of the
first subscript are stored in contiguous locations. This is called row
major order.
      0,1   0,2 0,3 0,4 0,5 0,6 1,1 1,2 1,3 1,4 1,5 1,6 2,1 2,2 2,3 2,4 2,5 2,6 3,1 3,2 3,3 3,4 3,5 3,6

The rightmost subscript varies most rapidly.


In the second, all elements that have the same value of the second
subscript are stored together. This is called column major order.
      0,1   1,1 2,1 3,1 0,2 1,2 2,2 3,2 0,3 1,3 2,3 3,3 0,4 1,4 2,4 3,4 0,5 1,5 2,5 3,5 0,6 1,6 2,6 3,6

The leftmost subscript varies most rapidly.
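
The two orderings can be generated mechanically. The following Python
sketch (function names chosen for the illustration) lists the subscript
pairs of B: ARRAY[0..3,1..6] in the order in which the elements would be
stored:

```python
def row_major(bounds1, bounds2):
    """Subscript pairs in storage order; the second subscript varies fastest."""
    (l1, u1), (l2, u2) = bounds1, bounds2
    return [(i, j) for i in range(l1, u1 + 1) for j in range(l2, u2 + 1)]


def column_major(bounds1, bounds2):
    """Subscript pairs in storage order; the first subscript varies fastest."""
    (l1, u1), (l2, u2) = bounds1, bounds2
    return [(i, j) for j in range(l2, u2 + 1) for i in range(l1, u1 + 1)]


print(row_major((0, 3), (1, 6))[:7])
print(column_major((0, 3), (1, 6))[:5])
```

The first list begins 0,1 0,2 ... 0,6 1,1 and the second begins
0,1 1,1 2,1 3,1 0,2, as in the two layouts shown above.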


Compilers for most high level languages store arrays using row major
order; FORTRAN compilers however store arrays in column major order.
To refer to an array element we calculate the address of the referenced
element relative to the base address of the array. The compiler will
generate code to place this relative address in an index register.
Assume a one dimensional array A: ARRAY[1..10] of integer and
suppose that a statement refers to an array element A[6]. There are five
array elements preceding A[6]; on a SIC machine each element will
occupy 3 bytes, thus the address of A[6] relative to the starting address
of the array is given by 5 x 3 = 15.
In general, for an array element A[s] of a one dimensional array A:
ARRAY [l..u] where each array element occupies w bytes of storage, the
address of the element relative to the start of the array is
      w * (s - l)


For a multi dimensional array the calculation depends on whether row
major or column major order is used.
Assume row major order and an array B: ARRAY[0..3,1..6] of integer. To
reach the array element B[2,5] we must skip two complete rows, row 0 and
row 1. Each row contains 6 elements, so this involves 2 x 6 = 12
elements. We must also skip the first 4 elements in row 2 to get to
B[2,5]. This makes a total of 16 array elements. Each element is three
bytes long, so the array element B[2,5] is at address 16 x 3 = 48
relative to the beginning of the array.
In general, for an array declaration B: ARRAY [l1..u1, l2..u2] stored in
row major order, with each element occupying w bytes, the relative
address of element B[s1,s2] is given by
      w * [(s1 - l1) * (u2 - l2 + 1) + (s2 - l2)]
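
Both address formulas are instances of the same calculation, which the
Python sketch below carries out for any number of dimensions in row
major order. The function name relative_address is hypothetical, and the
default element size of 3 bytes is the SIC word size used in the
examples above.

```python
def relative_address(subscripts, bounds, w=3):
    """Byte offset of an element from the array base, row major order.

    `subscripts` are the subscript values, `bounds` the (l, u) range of
    each dimension, and `w` the size of one element in bytes.
    """
    offset = 0
    for s, (l, u) in zip(subscripts, bounds):
        if not l <= s <= u:
            raise IndexError(f"subscript {s} outside {l}..{u}")
        offset = offset * (u - l + 1) + (s - l)   # elements skipped so far
    return w * offset


print(relative_address([6], [(1, 10)]))              # A[6]
print(relative_address([2, 5], [(0, 3), (1, 6)]))    # B[2,5]
```

The results, 15 and 48, match the two worked examples.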


                       MACROPROCESSORS
A macro instruction (often abbreviated as a macro) represents a
commonly used group of statements in the source program language.
The macro processor replaces each macro instruction with the
corresponding group of source language statements. This is called
expanding the macros. Macro instructions therefore allow the
programmer to write a shorthand version of a program. The functions of
a macro processor essentially involve the substitution of one group of
characters or lines for another.


Macro Definition and Expansion.
A macro consists of a name, a set of formal parameters and a body.
The table below shows an example of a SIC/XE program using macro
instructions. It uses two macro instructions, RDBUFF and WRBUFF, in
place of the subroutines RDREC and WRREC.
Two new assembler directives (MACRO and MEND) are used.
MACRO identifies the beginning of a macro definition. The symbol in the
label field (RDBUFF) is the name of the macro and the operands are the
parameters of the macro instruction. Each parameter begins with the
character &. Following the MACRO directive are the statements that make
up the body of the macro definition (lines 15 to 90). These are the
statements that will be generated as the expansion of the macro.
MEND marks the end of the macro definition.
The main program itself begins on line 180. The statement on line 190
is a macro invocation (macro call).



5     COPY      START     0                                  COPY FILE FROM INPUT TO OUTPUT
10    RDBUFF    MACRO     &INDEV,&BUFADR,&RECLTH
15    .
20    .         MACRO TO READ RECORD INTO BUFFER
25    .
30              CLEAR     X             CLEAR LOOP COUNTER
35              CLEAR     A
40              CLEAR     S
45              +LDT      #4096         SET MAXIMUM RECORD LENGTH
50              TD        =X’&INDEV’    TEST INPUT DEVICE
55              JEQ       *-3           LOOP UNTIL READY
60              RD        =X’&INDEV’    READ CHARACTER INTO REGISTER A
65              COMPR     A,S           TEST FOR END OF RECORD
70              JEQ       *+11          EXIT LOOP IF EOR
75              STCH      &BUFADR,X     STORE CHARACTER IN BUFFER
80              TIXR      T             LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85              JLT       *-19
90              STX       &RECLTH       SAVE RECORD LENGTH
95              MEND

100   WRBUFF    MACRO     &OUTDEV,&BUFADR,&RECLTH
105   .
110   .         MACRO TO WRITE RECORD FROM BUFFER
115   .
120             CLEAR     X                CLEAR LOOP COUNTER
125             LDT       &RECLTH
130             LDCH      &BUFADR,X        GET CHARACTER FROM BUFFER
135             TD        =X’&OUTDEV’      TEST OUTPUT DEVICE
140             JEQ       *-3              LOOP UNTIL READY
145             WD        =X’&OUTDEV’      WRITE CHARACTER
150             TIXR      T                LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
155             JLT       *-14
160             MEND
165   .
170   .         MAIN PROGRAM
175   .
180    FIRST     STL       RETADR                    SAVE RETURN ADDRESS
190    CLOOP     RDBUFF    F1, BUFFER, LENGTH        READ RECORD INTO BUFFER
195              LDA       LENGTH                    TEST FOR END OF FILE
200              COMP      #0
205              JEQ       ENDFIL                    EXIT IF EOF FOUND
210              WRBUFF    05, BUFFER, LENGTH        WRITE OUTPUT RECORD
215              J         CLOOP                     LOOP
220    ENDFIL    WRBUFF    05, EOF, THREE            INSERT EOF MARKER
225              J         @RETADR
230    EOF       BYTE      C‘EOF’
235    THREE     WORD      3
240    RETADR    RESW      1
245    LENGTH    RESW      1                         LENGTH OF RECORD
250    BUFFER    RESB      4096                      4096 - - BYTE BUFFER AREA
255              END       FIRST



5      COPY      START    0                    COPY FILE FROM INPUT TO OUTPUT
180    FIRST     STL      RETADR               SAVE RETURN ADDRESS
190    .CLOOP    RDBUFF   F1, BUFFER, LENGTH   READ RECORD INTO BUFFER
190a   CLOOP     CLEAR    X                    CLEAR LOOP COUNTER
190b             CLEAR    A
190c             CLEAR    S
190d             +LDT     #4096                SET MAXIMUM RECORD LENGTH
190e             TD       =X’F1’               TEST INPUT DEVICE
190f             JEQ      *-3                  LOOP UNTIL READY
190g             RD       =X’F1’               READ CHARACTER INTO REGISTER A
190h             COMPR    A,S                  TEST FOR END OF RECORD
190i             JEQ      *+11                 EXIT LOOP IF EOR
190j             STCH     BUFFER,X             STORE CHARACTER IN BUFFER
190k             TIXR     T                    LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
190l             JLT      *-19
190m             STX      LENGTH               SAVE RECORD LENGTH
195              LDA      LENGTH               TEST FOR END OF FILE
200              COMP     #0
205              JEQ      ENDFIL               EXIT IF EOF FOUND
210    .         WRBUFF   05, BUFFER, LENGTH   WRITE OUTPUT RECORD
210a             CLEAR    X                    CLEAR LOOP COUNTER
210b             LDT      LENGTH
210c             LDCH     BUFFER,X             GET CHARACTER FROM BUFFER
210d             TD       =X’05’               TEST OUTPUT DEVICE
210e             JEQ      *-3                  LOOP UNTIL READY
210f             WD       =X’05’               WRITE CHARACTER
210g             TIXR     T                    LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
210h             JLT      *-14
215              J        CLOOP                LOOP
220    .ENDFIL   WRBUFF   05, EOF, THREE       INSERT EOF MARKER
220a   ENDFIL    CLEAR    X                    CLEAR LOOP COUNTER
220b             LDT      THREE
220c             LDCH     EOF,X                GET CHARACTER FROM BUFFER
220d             TD       =X’05’               TEST OUTPUT DEVICE
220e             JEQ      *-3                  LOOP UNTIL READY
220f             WD       =X’05’               WRITE CHARACTER
220g             TIXR     T                    LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
220h             JLT      *-14
225              J        @RETADR
230    EOF       BYTE     C‘EOF’
235    THREE     WORD     3
240    RETADR    RESW     1
245    LENGTH    RESW     1                    LENGTH OF RECORD
250    BUFFER    RESB     4096                 4096 - - BYTE BUFFER AREA
255              END      FIRST



The figure above shows the output that would be generated. In the
expanded form:
     • The macro instruction definitions have been deleted.
     • Each macro instruction has been expanded into the statements
       that form the body of the macro, with the arguments from the macro
       invocation substituted for the parameters in the macro prototype.
     • The macro invocation statement itself has been included as a
       comment line.


Differences between Macros and Subroutine calls
The statements from the body of the macro WRBUFF are generated
twice, i.e. lines 210a to 210h and lines 220a to 220h in the above figure.
In the figure on page 40 the corresponding statements appear only once.
In general the statements that form the expansion of the macro are
generated (and assembled) each time the macro is invoked. Statements
in a subroutine appear only once regardless of how many times the
subroutine is called.
Macro instructions are written with no labels in the body of the macro.
Line 140 contains “JEQ *-3” and line 155 contains “JLT *-14” instead of
JEQ WLOOP and JLT WLOOP, where WLOOP would be a label on the TD
instruction that tests the output device. If such a label appeared on
line 135 of the macro body it would be generated twice, on lines 210d and
220d, and the assembler would report a duplicate label definition.


Macro processor Tables and Logic.
There are 3 main data structures involved in macro processors.
     • The macro definitions are stored in a definition table, DEFTAB,
       which contains the macro prototype and the statements that make
       up the macro body. Comment lines from the macro definition are
       not entered into DEFTAB.
     • Macro names are entered into NAMTAB, which serves as an index
       to DEFTAB. For each defined macro NAMTAB contains pointers to
       the beginning and end of the definition in DEFTAB.
     • The third structure is the argument table, ARGTAB, used during
       the expansion of macro calls. The arguments are stored in
       ARGTAB according to their positions in the argument list.

      NAMTAB                     DEFTAB

      RDBUFF  (pointers to       RDBUFF   &INDEV,&BUFADR,&RECLTH
              the first and      CLEAR    X
              last lines of      CLEAR    A
              the definition     CLEAR    S
              in DEFTAB)         +LDT     #4096
                                 TD       =X’?1’
                                 JEQ      *-3
                                 RD       =X’?1’
                                 COMPR    A,S
                                 JEQ      *+11
                                 STCH     ?2,X
                                 TIXR     T
                                 JLT      *-19
                                 STX      ?3
                                 MEND

      ARGTAB

      1   F1
      2   BUFFER
      3   LENGTH

In DEFTAB the parameters have been converted to positional notation:
&INDEV becomes ?1, &BUFADR becomes ?2 and &RECLTH becomes ?3. The first
argument in the figure above is F1, so each ?1 is replaced by F1 during
expansion.
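
The way the three tables work together can be sketched in Python. This
is a much simplified model and not the full macro processor algorithm:
source lines are given as (label, opcode, operand) tuples, macro
definitions may not be nested, and the function name process is chosen
for the illustration.

```python
def process(source):
    """Expand macros in a list of (label, opcode, operand) tuples.

    Returns the expanded program together with DEFTAB and NAMTAB.
    """
    deftab, namtab, output = [], {}, []
    lines = iter(source)
    for label, opcode, operand in lines:
        if opcode == "MACRO":                        # macro definition
            params = operand.split(",")
            start = len(deftab)
            deftab.append((label, opcode, operand))  # macro prototype
            for blabel, bop, boperand in lines:      # body, up to MEND
                if blabel.startswith("."):
                    continue                         # comments are not stored
                # Longest names first, so that a name is never mistaken
                # for the beginning of a longer one.
                for p in sorted(params, key=len, reverse=True):
                    boperand = boperand.replace(p, f"?{params.index(p) + 1}")
                deftab.append((blabel, bop, boperand))
                if bop == "MEND":
                    break
            namtab[label] = (start, len(deftab) - 1)
        elif opcode in namtab:                       # macro invocation
            argtab = [arg.strip() for arg in operand.split(",")]
            output.append(("." + label, opcode, operand))   # kept as a comment
            start, end = namtab[opcode]
            first = True
            for blabel, bop, boperand in deftab[start + 1:end]:
                for n in range(len(argtab), 0, -1):  # ?10 before ?1
                    boperand = boperand.replace(f"?{n}", argtab[n - 1])
                output.append((label if first else blabel, bop, boperand))
                first = False
        else:
            output.append((label, opcode, operand))
    return output, deftab, namtab


source = [
    ("RDBUFF", "MACRO", "&INDEV,&BUFADR,&RECLTH"),
    ("", "TD", "=X'&INDEV'"),
    ("", "STCH", "&BUFADR,X"),
    ("", "STX", "&RECLTH"),
    ("", "MEND", ""),
    ("CLOOP", "RDBUFF", "F1,BUFFER,LENGTH"),
    ("", "J", "CLOOP"),
]
expanded, deftab, namtab = process(source)
for line in expanded:
    print(line)
```

The body is stored in DEFTAB with ?1, ?2 and ?3 in place of the
parameter names, and the label of the invocation (CLOOP) is attached to
the first generated statement, as in the expanded program shown earlier.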


Generation of Unique Labels.
Since the body of a macro instruction cannot contain ordinary labels
(they would be defined again each time the macro is expanded), relative
addressing is used. However for large jumps over many instructions such
a notation is very inconvenient, error prone and difficult to read.
Special types of labels are therefore used. Labels within the macro body
begin with the special character $. In the expansion each symbol
beginning with $ is modified by replacing $ with $XX, where XX is a two
character alphanumeric counter of the number of macro instructions
expanded. For the first macro expansion in a program XX will have the
value AA. For succeeding macro expansions XX will be set to AB, AC etc.
The definition of RDBUFF below uses such labels; it is followed by an
invocation and the resulting expansion.

25     RDBUFF    MACRO    &INDEV,&BUFADR,&RECLTH
30               CLEAR    X          CLEAR LOOP COUNTER
35               CLEAR    A
40               CLEAR    S
45               +LDT     #4096      SET MAXIMUM RECORD LENGTH
50     $LOOP     TD       =X’&INDEV’ TEST INPUT DEVICE
55               JEQ      $LOOP      LOOP UNTIL READY
60               RD       =X’&INDEV’ READ CHARACTER INTO REGISTER A
65               COMPR    A,S        TEST FOR END OF RECORD
70               JEQ      $EXIT      EXIT LOOP IF EOR
75               STCH     &BUFADR,X  STORE CHARACTER IN BUFFER
80               TIXR     T          LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85               JLT      $LOOP
90     $EXIT     STX      &RECLTH    SAVE RECORD LENGTH
95               MEND

                 RDBUFF     F1, BUFFER, LENGTH

30                CLEAR   X             CLEAR LOOP COUNTER
35                CLEAR   A
40                CLEAR   S
45                +LDT    #4096         SET MAXIMUM RECORD LENGTH
50     $AALOOP    TD      =X’F1’        TEST INPUT DEVICE
55                JEQ     $AALOOP       LOOP UNTIL READY
60                RD      =X’F1’        READ CHARACTER INTO REGISTER A
65                COMPR   A,S           TEST FOR END OF RECORD
70                JEQ     $AAEXIT       EXIT LOOP IF EOR
75                STCH    BUFFER,X      STORE CHARACTER IN BUFFER
80                TIXR    T             LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85                JLT     $AALOOP
90     $AAEXIT    STX     LENGTH        SAVE RECORD LENGTH
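
The renaming of labels can be sketched in Python as follows. The notes
only give the first values of the counter (AA, AB, AC); the order
assumed here after AZ (A0 to A9, then BA and so on) is an assumption of
the sketch, as are the function names.

```python
import itertools
import string

ALPHANUMERIC = string.ascii_uppercase + string.digits


def label_prefixes():
    """Yield $AA, $AB, $AC, ... one prefix for each macro expansion."""
    for first, second in itertools.product(ALPHANUMERIC, repeat=2):
        yield f"${first}{second}"


def make_unique(body, prefix):
    """Replace $ in every line of a macro body by the given prefix."""
    return [line.replace("$", prefix) for line in body]


body = ["$LOOP  TD   =X'F1'", "       JEQ  $LOOP", "$EXIT  STX  LENGTH"]
prefixes = label_prefixes()
print(make_unique(body, next(prefixes)))   # first expansion uses $AA
print(make_unique(body, next(prefixes)))   # second expansion uses $AB
```

Because each expansion receives a different prefix, the labels generated
by two invocations of the same macro never coincide.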


Conditional Macro Expansion
Most macro processors can modify the sequence of statements
generated during a macro expansion depending on the arguments
supplied in the invocation. The first figure below shows a definition of a
macro RDBUFF with two additional parameters: &EOR (a hexadecimal
character code that marks the end of the record) and &MAXLTH
(specifying the maximum record length that can be read). It is possible
for either or both of these parameters to be omitted in an invocation of
RDBUFF. Statements on lines 44 to 48 of this definition illustrate a
simple macro time conditional structure. If the value of the expression
in the IF statement is TRUE, the statements following the IF are
generated until an ELSE is encountered; otherwise the statements
following the ELSE are generated. Thus if the parameter &MAXLTH is equal
to the null string the statement on line 45 is generated, otherwise the
statement on line 47 is generated. A similar structure appears on lines
26-28, where the macro time variable &EORCK is set to 1 if &EOR has been
supplied. The second figure shows the expansion of an invocation in
which both parameters are supplied.

25   RDBUFF    MACRO    &INDEV,&BUFADR,&RECLTH,&EOR,&MAXLTH
26             IF       (&EOR NE ‘’)
27   &EORCK    SET      1
28             ENDIF
30             CLEAR    X                  CLEAR LOOP COUNTER
35             CLEAR    A
38             IF       (&EORCK EQ 1)
40             LDCH     =X’&EOR’           SET EOR CHARACTER
42             RMO      A,S
43             ENDIF
44             IF       (&MAXLTH EQ ‘’)
45             +LDT     #4096              SET MAXIMUM RECORD LENGTH
46             ELSE
47             +LDT     #&MAXLTH           SET MAXIMUM RECORD LENGTH
48             ENDIF
50   $LOOP     TD       =X’&INDEV’         TEST INPUT DEVICE
55             JEQ      $LOOP              LOOP UNTIL READY
60             RD       =X’&INDEV’         READ CHARACTER INTO REGISTER A
65             COMPR    A,S                TEST FOR END OF RECORD
70             JEQ      $EXIT              EXIT LOOP IF EOR
75             STCH     &BUFADR,X          STORE CHARACTER IN BUFFER
80             TIXR     T                  LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85             JLT      $LOOP
90   $EXIT     STX      &RECLTH            SAVE RECORD LENGTH
95             MEND

               RDBUFF     F3, BUF, RECL, 04, 2048

30              CLEAR   X               CLEAR LOOP COUNTER
35              CLEAR   A
40              LDCH    =X’04’          SET EOR CHARACTER
42              RMO     A,S
47              +LDT    #2048           SET MAXIMUM RECORD LENGTH
50   $AALOOP    TD      =X’F3’          TEST INPUT DEVICE
55              JEQ     $AALOOP         LOOP UNTIL READY
60              RD      =X’F3’          READ CHARACTER INTO REGISTER A
65              COMPR   A,S             TEST FOR END OF RECORD
70              JEQ     $AAEXIT         EXIT LOOP IF EOR
75              STCH    BUF,X           STORE CHARACTER IN BUFFER
80              TIXR    T               LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85              JLT     $AALOOP
90   $AAEXIT    STX     RECL            SAVE RECORD LENGTH
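
The handling of IF, ELSE, ENDIF and SET during expansion can be sketched
in Python. The sketch is a simplified model: body lines are
(label, opcode, operand) tuples, conditions have the form
(operand EQ operand) or (operand NE operand), the null string is written
'', and an omitted argument is passed as an empty string. The function
name expand and the regular expression are assumptions made for the
example.

```python
import re

CONDITION = re.compile(r"\(\s*(\S+)\s+(EQ|NE)\s+(\S+)\s*\)")


def expand(body, arguments):
    """Expand a macro body, obeying IF, ELSE, ENDIF and SET."""
    symbols = dict(arguments)     # parameters and macro time variables
    generated = []
    emitting = [True]             # one entry per open IF; the top one governs

    def value(token):
        if token.startswith("&"):
            return symbols.get(token, "")
        return token.strip("'")

    def substitute(text):
        # Longest names first, so that &EORCK is not mistaken for &EOR.
        for name in sorted(symbols, key=len, reverse=True):
            text = text.replace(name, symbols[name])
        return text

    for label, opcode, operand in body:
        if opcode == "IF":
            left, relation, right = CONDITION.fullmatch(operand).groups()
            truth = (value(left) == value(right)) == (relation == "EQ")
            emitting.append(emitting[-1] and truth)
        elif opcode == "ELSE":
            taken = emitting.pop()
            emitting.append(emitting[-1] and not taken)
        elif opcode == "ENDIF":
            emitting.pop()
        elif not emitting[-1]:
            continue                          # statement is skipped
        elif opcode == "SET":
            symbols[label] = value(operand)   # macro time assignment
        else:
            generated.append((label, opcode, substitute(operand)))
    return generated


body = [
    ("", "IF", "(&EOR NE '')"),
    ("&EORCK", "SET", "1"),
    ("", "ENDIF", ""),
    ("", "CLEAR", "X"),
    ("", "CLEAR", "A"),
    ("", "IF", "(&EORCK EQ 1)"),
    ("", "LDCH", "=X'&EOR'"),
    ("", "RMO", "A,S"),
    ("", "ENDIF", ""),
    ("", "IF", "(&MAXLTH EQ '')"),
    ("", "+LDT", "#4096"),
    ("", "ELSE", ""),
    ("", "+LDT", "#&MAXLTH"),
    ("", "ENDIF", ""),
]
for line in expand(body, {"&EOR": "04", "&MAXLTH": "2048"}):
    print(line)
```

With both arguments supplied the sketch generates the statements of
lines 30, 35, 40, 42 and 47 of the expansion above; with both omitted
(empty strings) it generates only the two CLEAR statements and
+LDT #4096.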

								