                       Mike Smith
   Department of Electrical and Computer Engineering,
                 University of Calgary,
          Calgary, Alberta, Canada T2N 1N4

            Email: smith @
                Phone: (403) 220-6142
                 Fax: (403) 282-6855

   First Rule in Assembly Language Programming

           Published in Circuit Cellar Magazine
                     December 1998

         Published CCI magazine, December 1998.
              ASSEMBLING SOME “C” CODE
                                            Mike Smith
It might seem strange, but the best advice you can give somebody learning how to program in assembly language is:

            Make sure you know when NOT to program in
                        assembly language!

There are many valid reasons for giving this advice, but what should strike home to the industrial programmer is value for the time and effort involved. It takes roughly as much time to develop, debug, test, integrate and maintain one line of code in any language. Therefore, the best value for your money is obtained when you code using the highest level of abstraction that you can get away with for your line of work. If it is true that one picture is worth a thousand words, then in many cases one line of “C” is worth a hundred lines of assembly code.

You also have to remember that “C” was originally designed to provide an efficient, processor-independent assembly language. This means that many of the constructions behind the language translate directly into special features available within a processor’s instruction set.

The most obvious of these is the use of the plus-plus operator in a loop (as in value++) rather than a slower arithmetic addition (value = value + CONST) (see Table I). Other “C/C++” constructs, such as the += and -= operators, also reflect the typical processor instruction where the destination is identical to one of the instruction sources.

In the “good old days”, it was necessary for the programmer to write “C” that took advantage of these special features, as the compilers were rather unsophisticated. However, a modern compiler can typically do as good a job as, and frequently a better job than, an experienced assembly language programmer. For example, if CONST = 1 for the instructions in Table I, most compilers would automatically generate the faster increment instruction. Most modern “C++” compilers can analyze your code and automatically apply a plethora of optimizations to handle speed and memory trade-offs for a variety of industrial scenarios. Table II shows some of the switches available on the Software Development Systems (SDS) compiler.

  “C” construct            Motorola Processor             Intel Processor
                           (SDS Compiler)                 (Borland Compiler)
  value++                  ADDQ.L #1, D3       FAST       INC si          FAST
  value = value + CONST    ADD.L #CONST, D3    SLOW       MOV ax, si      SLOW
                                                          ADD ax, CONST
                                                          MOV si, ax

  Table I: “C” language constructs were originally designed to take advantage of the
           processor’s instruction set.
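A minimal C sketch of the Table I comparison. CONST_STEP and both function names are hypothetical stand-ins for the article's CONST. The two functions compute the same result; whether they also compile to the same increment instruction depends on the compiler and optimization level, which you can check by asking for an assembly listing (for example, gcc -O2 -S).

```c
/* CONST_STEP is a hypothetical name standing in for CONST in Table I. */
#define CONST_STEP 1UL

/* The "plus-plus" form: maps naturally onto INC (Intel) or ADDQ (68K). */
unsigned long add_with_increment(unsigned long value)
{
    value++;
    return value;
}

/* The explicit addition form: with CONST_STEP == 1 an optimizing
 * compiler is free to emit the same increment instruction. */
unsigned long add_with_addition(unsigned long value)
{
    value = value + CONST_STEP;
    return value;
}
```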

Despite all the wonderful things that “C” can do for you, there still comes a time when the programmer needs to tweak one piece of code at the assembly language level to exploit a special match between an algorithm and an unusual processor feature. Indeed, SDS actually recommends that hardware registers be accessed directly from assembly language rather than from “C” in all but the simplest situations. Both situations mean that the programmer must pick up the skill of mixing and matching subroutines quickly made functional in “C” with hand-written, highly customized, assembly language code sequences. The purpose of this article is to provide an introduction to the key elements of that skill.

             THINGS TO PONDER

Linking between “C” and assembly code is not particularly difficult if you are willing to accept several basic facts.

  • There is nothing particularly magic about the code generated by a “C/C++” compiler. It must make use of the same instructions and system resources as the assembly language programmer.

  • There is more than one way to perform any given operation. Some approaches take advantage of one processor feature whilst others take advantage of different features. The relative advantages are not always apparent and are frequently not particularly significant in terms of increased program speed.

    If it takes a programmer an hour to shave 1 msec from a section of code, then that code block must be run 3,600,000 times before there is a pay-back. You have to remember that with today’s faster processors, the programmer probably only saved 0.01 msec!

However, probably the most important and difficult thing for programmers to come to terms with is the same problem experienced by four-year-olds playing with their siblings: the concept of SHARING.

  • There is just one set of processor and system resources. Resource sharing must occur whether it is a team of programmers using common assembly language subroutines within a project, or one programmer switching back and forth between “C/C++” and assembler.
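The pay-back figure in the second point above is simply the programmer's time divided by the time saved per run. A small C sketch (payback_runs() is a hypothetical name) makes the arithmetic explicit, working in whole microseconds so the division stays exact, and shows how the 0.01 msec case pushes the break-even point to 360,000,000 runs.

```c
#include <stdint.h>

/* Runs needed before the time spent optimizing is paid back.
 * effort_us:         programmer time invested, in microseconds
 * saving_per_run_us: time saved each time the code runs, in microseconds */
uint64_t payback_runs(uint64_t effort_us, uint64_t saving_per_run_us)
{
    if (saving_per_run_us == 0)
        return UINT64_MAX;   /* no saving at all: never pays back */

    /* Round up: a partial run does not complete the pay-back. */
    return (effort_us + saving_per_run_us - 1) / saving_per_run_us;
}
```

With one hour expressed as 3,600,000,000 microseconds, a 1 msec saving (1000 microseconds) gives 3,600,000 runs, while a 0.01 msec saving (10 microseconds) gives 360,000,000 runs.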

                        A allocate registers based on frequency
                        B perform branch optimization
                        C put frequent constants in registers
                        D called functions cleanup
                        E dead-code elimination
                        H local common subexpression elimination
                        I allow inline functions (C++ only)
                        L perform lifetime analysis
                        R enable automatic register allocation
                        S optimize for size (vs optimize for speed)
                        T volatile variables must be declared volatile
                        U remove unreachable code
                        Y enable aggressive switch algorithms

    Table II: Optimizations, such as these from the SDS “CC68000” compiler, provide the programmer
                      with a wide range of register and memory usage optimizations.
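Option T in Table II presumably lets the compiler assume that any variable able to change outside the program's control, such as a hardware register, has been declared volatile, leaving it free to keep everything else in registers. A minimal C sketch of why the qualifier matters when polling hardware; wait_until_ready(), status_reg and ready_mask are hypothetical names, and on a host a plain variable can stand in for the device register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll a memory-mapped status register until a ready bit is set.
 * Because status_reg points to volatile data, every loop iteration
 * performs a fresh read. Without volatile, an optimizing compiler may
 * read the register once and spin forever on the cached value. */
bool wait_until_ready(const volatile uint32_t *status_reg,
                      uint32_t ready_mask, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if ((*status_reg & ready_mask) != 0)
            return true;     /* device reported ready */
    }
    return false;            /* gave up; the caller decides what to do */
}
```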

      THE DEVICE TO CONTROL
   USING “C” AND ASSEMBLY CODE

A recent article in New Scientist magazine (15th November, 1997) described some interesting new crime detection hardware, which we will use as a basis for showing how to link “C” and assembly code. Apparently, it is very difficult to detect small blood splatters, or other tissue remains, around a corpse during daylight hours. The new hardware overcomes this problem by illuminating the crime scene with rapid flashes of light, causing the blood splashes to fluoresce. However, the fluorescence can’t be seen above normal daylight reflection. To overcome this problem, the detective uses a pair of glasses with active lenses that can be made to darken. By making the darkening rate of the glasses slightly different from the frequency of the fluorescence, the tissue samples can be made to flicker on and off like Christmas lights. Presumably the detective can then identify the evidence and scrape it up to send to the forensic laboratory.

For this article, let us assume that there are five hardware registers necessary to control the Generalized Locator of Blood (GLOB) device. These registers are shown in Table III and have been deliberately chosen to have size and offset characteristics that could cause problems when accessed from within a “C” program.

  • The 8-bit wide TRANSMIT and RECEIVE registers are used to communicate with a small handheld screen and touch pad.
  • The 16-bit FLASH and DARKEN register values are used to reload the timers that control the fluorescence and the glasses.
  • The CONTROL register is 32 bits wide.
      - The READREADY and WRITEREADY bits for the serial communications line registers are bits 0 and 1 respectively.
      - The activation bits for the FLASH and DARKEN timers are bits 4 and 5.
      - If the OVERHEAT warning bit (bit 8) becomes set, quickly switch off the powerful fluorescence lamp (controlled by bit 9) to avoid destroying it.
      - Interrupt handling information is stored in bits 16 to 31.

             GETTING STARTED

The code for the three main components of the first GLOB device prototype is given in Figures 1, 2 and 3. The main.c file (Figure 1) has a main() function that calls an assembly language routine (void CallAsm(void)).

           Register Name              Register Offset
           CONTROL      (32-bit)      0x00
           TRANSMIT     (8-bit)       0x04
           RECEIVE      (8-bit)       0x07
           FLASH TIME   (16-bit)      0x08
           DARKEN TIME  (16-bit)      0x0A

  Table III: The GLOB device register characteristics (sizes and offsets) have been chosen
             to emphasize some of the problems of interfacing “C” and the device hardware.
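Table III and the CONTROL bit list can be captured as C definitions. A minimal sketch: the GLOB_ prefix, the timer-activation names and glob_overheat_guard() are hypothetical, and the offsets are deliberately kept in bytes to match Table III. The guard assumes that clearing bit 9 switches the lamp off, consistent with the LAMPON value of 0x200 used in Figure 5.

```c
#include <stdint.h>

/* GLOB register offsets from Table III, measured in BYTES. */
#define GLOB_CONTROL   0x00u   /* 32-bit */
#define GLOB_TRANSMIT  0x04u   /*  8-bit */
#define GLOB_RECEIVE   0x07u   /*  8-bit */
#define GLOB_FLASH     0x08u   /* 16-bit */
#define GLOB_DARKEN    0x0Au   /* 16-bit */

/* CONTROL register bit masks from the bit descriptions. */
#define GLOB_READREADY   (1UL << 0)
#define GLOB_WRITEREADY  (1UL << 1)
#define GLOB_FLASH_ON    (1UL << 4)    /* hypothetical name: FLASH timer activation  */
#define GLOB_DARKEN_ON   (1UL << 5)    /* hypothetical name: DARKEN timer activation */
#define GLOB_OVERHEAT    (1UL << 8)
#define GLOB_LAMPON      (1UL << 9)    /* 0x200, as in Figure 5 */
#define GLOB_IRQ_MASK    0xFFFF0000UL  /* interrupt information, bits 16 to 31 */

/* Return the CONTROL value to write back: if OVERHEAT is set, clear the
 * lamp bit. Assumption: writing 0 to bit 9 switches the lamp off. */
uint32_t glob_overheat_guard(uint32_t control)
{
    if (control & GLOB_OVERHEAT)
        control &= ~(uint32_t)GLOB_LAMPON;
    return control;
}
```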

Also included in the main.c file is a simple “C” utility (void ShowTitle(void)) that will be called directly from within the assembly language routine CallAsm(). Why spend time writing assembly language output routines when a simple call to “C” will do the job?

 #include <stdio.h>
 void main(void);
 void CallAsm(void);
 void ShowTitle(void);

 void main(void) {
      // Switch to Assembly Code
      CallAsm();
 }

 void ShowTitle(void) {
      printf("ACME GLOB V1\n");
 }

  Figure 1: The main.c code contains a call to an assembly language routine (CallAsm())
            and a “C” utility called from assembly code.

The init.s file shown in Figure 2 contains the 68K code necessary to establish the system stack used by both the “C” code and the programmer’s custom assembly code. It is this code that is activated during embedded system startup, prior to calling the first subroutine, main().

Various important initializations (ResetInit()) are also necessary before calling the first “C” subroutine (main()). After the main() routine has exited, program control must be returned to the embedded system’s kernel. In this example, the return is achieved through the use of a TRAP instruction and an associated parameter. Equivalent software interrupts for transferring control back to an operating system (kernel or monitor) can be found on other processors.

        .EXPORT START
        .IMPORT _main
        .IMPORT STKTOP, ResetInit

 START:
        // Establish the stack needed for
        //       "C" and assembly code
        MOVEA.L #STKTOP, SP
        // Call necessary initialization routines
        JSR ResetInit
        // Transfer control to "C" main()
        JSR _main
        // Trap back to the system kernel
        TRAP #15
        DC.W RETURN_TO_KERNEL

  Figure 2: The stack and various other parameters are initialized within the init.s
            assembly code before program control is transferred to “C”.

Note one key concept in the init.s assembly code. For many compilers it is necessary to use the function name _main rather than main when transferring control to the “C” function main(). The coding convention of names having a leading underscore will be all too familiar to programmers who have ever received an error message after accidentally linking “C” code segments containing a missing function or misspelled name.

Naming conventions are very language specific. Anybody attempting to link legacy FORTRAN code (_MAIN_) will quickly find this out. Note also that the utilities are provided in a file named main.c (“C”) rather than main.cpp (“C++”). This is because the naming convention for “C++” functions is far more complex than for “C” functions, in order to handle the function overloading possible in “C++”. More on the concept of name-mangling later.

The final component of the GLOB device prototype is the asm.s file, which contains an
assembly language routine to call other routines written in both “C” and assembly code. There is little point in developing a complicated assembly language sequence to print out a title when a simple “C” call will do the same job efficiently. Since the message will be limited by the slow transmission rate over the serial line to the display device, there is no speed advantage for the programmer in doing any assembly coding.

Also note the practice of providing two entry points into each assembly code function. One entry point (with the leading underscore) makes it easy to call the subroutine from “C”. The other entry point (without the underscore) is more natural when calling the routine directly from other assembly code. I have developed a coding practice of always providing both entry points, whether needed or not. This makes code maintainability that much easier to achieve and avoids the common error of forgetting to code the entry point you actually end up using in the final program.

        .IMPORT _ShowTitle
        .EXPORT _CallAsm, CallAsm

 // Provide two entry points to
 //       each assembly code function

        // "C" callable entry point
 _CallAsm:
        // Natural assembler entry point
 CallAsm:
        JSR ResetDevice
        JSR _ShowTitle
        RTS

        .EXPORT _ResetDevice, ResetDevice
        // void ResetDevice(void) {
 _ResetDevice:
 ResetDevice:
        // register long int *pt;
 pt     SET A0
        // pt = (long int *) BASEADDR;
        MOVEA.L #BASEADDR, pt
        // (Reset the CONTROL register)
        MOVE.L #RESET, CONTROL(pt)
        RTS
        // }

  Figure 3: The asm.s code demonstrates how to call both assembly language and “C” code
            routines from assembly code. Note that there is already one possible source
            of error present in the code.

      HIDDEN ERRORS ALREADY?

Although it does not seem likely in the 20 or so lines of code we have developed so far, there is in fact already one possible error source that could crash the processor. This error is not specific to the fact that we have mixed assembly and “C” subroutines. It could occur at any time when an algorithm is developed with one routine calling, and then returning from, another routine. However, the problem is far less obvious with mixed code.

In the file asm.s, we have made use of an address register, A0, one of the limited processor resources. During the function ResetDevice() we destroyed the original value held in this address register in order to access the hardware CONTROL register using an instruction with a convenient indirect addressing mode. However, it is possible that an earlier subroutine (e.g. main()) may be relying on the original value stored in A0 for some critical, but non-obvious, purpose.

Figure 4 shows two approaches to get around this problem. No register values are destroyed in ResetDeviceV2(), where an absolute addressing mode instruction is used to set the CONTROL register. In fact, this version generates code that runs faster than the original ResetDevice(). In ResetDeviceV3(), the original address register value is first saved to the processor stack

(PROLOGUE), the register used, and then the old value recovered from the stack (EPILOGUE).

        // Uses absolute addressing mode
 ResetDeviceV2:
        MOVE.L #RESET, (BASEADDR + CONTROL)
        RTS

        // Save and then recover the register value
 ResetDeviceV3:
        // PROLOGUE
        MOVE.L A0, -(SP)

 pt     SET A0
        MOVEA.L #BASEADDR, pt
        MOVE.L #RESET, CONTROL(pt)

        // EPILOGUE
        MOVEA.L (SP)+, A0
        RTS

  Figure 4: There are many approaches to avoid destroying register values within a
            subroutine. An absolute addressing mode is used within ResetDeviceV2().
            The register is saved to, and later recovered from, the stack in the
            ResetDeviceV3() routine.

The first approach is very inconvenient and would not lead to easily maintainable code if many adjacent hardware register locations had to be accessed. The second approach looks like overkill, especially as the address register, A0, would not be in use to store anything useful during 8 subroutines out of 10.

However, for those 2 times out of 10 when the register is in use to store some critical value, your program is heading for never-never land.

The trouble is that it is always necessary to save all possibly-important registers upon entry to each subroutine. You never know when one or other members of your design team will have made use of that particular address register in their code. This leads to slow code whether it has been written in “C” or assembly code.

One approach around this problem is to identify two classes of registers when your team writes code. The volatile or temporary registers are those that everybody in your team agrees will not be relied upon to hold useful values across a subroutine call. These registers can be used within a subroutine without having to be saved to slow external memory. The non-volatile registers must be saved, and later recovered, if they are used within a subroutine. Registers are being discussed here because placing frequently used variables into registers is often the route to fast code: on-processor register-to-register operations are significantly faster than external memory accesses.

Note the key words in the paragraph above: everybody on your team agrees. The designation of which processor registers are classified as volatile or non-volatile is totally arbitrary. If your project requires little repeated use of variables, then designate most of the processor registers as being freely available for use. If a later project has different characteristics, then change the register use convention to optimize that code.

The trouble with such a general approach is maintainability of code. What happens if you decide to cut costs by reusing code segments from one project in a later project? Selecting a totally arbitrary, project-by-project designation of volatile registers lays your team open to problems. A better approach is to choose a convenient, but arbitrary, register classification and then stick with it.

If one member of your team is going to be the “C” or “C++” compiler, then it makes sense for the arbitrary register convention to be dictated by the compiler. Even if you plan to code only

in assembler, it is still necessary to adopt some register usage approach if you actually intend to produce code within a reliable coding process. I recommend that my students adopt a C-compatible register convention at all times.

A C-compatible register usage convention is a balance between making the maximum number of volatile registers available for free usage, and avoiding having to save frequently used variables from registers into (slow) external memory when you need to call a subroutine and have insufficient non-volatile registers available. On some windowed RISC processors, such as the SPARC, it is possible to have a large number of volatile and non-volatile registers for general programming use and still have other registers available for special operating system related operations. Other processors offer far fewer opportunities for such flexible register usage. Table IV shows the volatile and non-volatile register allocations used by the SDS 68K “C” compiler; similar, equally arbitrary, selections are found with other compilers.

                      Volatile         Non-volatile
                      Registers        Registers
   SDS Compiler       D0, D1           D2 to D7
                      A0, A1           A2 to A6, SP

  Table IV: The designation of volatile and non-volatile registers is an arbitrary
            convention that depends on available processor resources and the balance
            considered appropriate by the compiler developers.

         RETURNING PARAMETERS

Many functions need to return a parameter. Since this parameter will typically be used immediately within the calling routine, it makes sense for the return parameter to be placed in a (volatile) register for faster operation. This is demonstrated in Figure 5 for the function int IsLightOn(void), which returns a 1 in register D0 if the GLOB device fluorescence light is turned on. The routine simply checks whether the LAMPON bit is set in the GLOB device’s CONTROL register (bit 9).

        .EXPORT _IsLightOn, IsLightOn

        // int IsLightOn(void) {
 IsLightOn:
 _IsLightOn:
        // register long int *pt;
 pt       SET A0
        // register long int temp;
 temp     SET D1
        // register int rtnvalue;
 rtnvalue SET D0
        // #define LAMPON 0x200
 LAMPON   EQU 0x200

        // pt = (long int *)BASEADDR;
        MOVEA.L #BASEADDR, pt
        // pt = pt + CONTROL;
        // temp = *pt;
        MOVE.L CONTROL(pt), temp

        // rtnvalue = 0;
        MOVE.L #0, rtnvalue
        // if (temp & LAMPON)
        //     rtnvalue = 1;
        AND.L #LAMPON, temp
        // (Lamp is not on)
        BEQ IsLightOnEXIT
        MOVE.L #1, rtnvalue
        // return(rtnvalue);
 IsLightOnEXIT:
        RTS
        // }

  Figure 5: The function int IsLightOn(void) demonstrates the use of a volatile register
            to return a parameter. Note that there are two hidden sources of error in
            this simple code.
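For comparison, a C sketch of Figure 5 that pins down every width. The comments in Figure 5 declare rtnvalue as int, whose width varies with the compiler and its switches; here int32_t and uint32_t from C99 <stdint.h> (a later addition to the language than the compilers discussed here) play the role of long int. The name is_light_on() and its pointer parameter are hypothetical: the original takes no arguments and uses BASEADDR, whose CONTROL offset of 0 lets the pointer address the register directly.

```c
#include <stdint.h>

#define GLOB_LAMPON 0x200UL   /* bit 9 of CONTROL, as in Figure 5 */

/* control points at the 32-bit CONTROL register (offset 0x00 in Table III).
 * The return type is int32_t rather than int, so its size no longer
 * depends on whether the compiler treats int as 16 or 32 bits. */
int32_t is_light_on(const volatile uint32_t *control)
{
    uint32_t temp = *control;              /* one 32-bit read, like MOVE.L */
    return (temp & GLOB_LAMPON) ? 1 : 0;   /* 1 in the return register if lit */
}
```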

Some compilers will return a pointer value in a volatile address register and a data value in a volatile data register. This is not the case for the SDS compiler, which returns both pointers and data in the (volatile) data register D0. It is easy to pick examples where one or the other of these two approaches leads to a speed advantage.

Very long variables or complex numbers (64 bit) may be returned using two volatile registers. Structures are typically returned by moving the return address down the stack and placing the structure above the return address. It is then the responsibility of the programmer of the calling routine to pull the structure from the stack and make the necessary stack pointer adjustments.

Even the simple code shown for the function int IsLightOn() has two possible sources of error in its seven lines of code. The first possible error arises in the definition of the size of a variable of type int. With a 16-bit variant of the 68K processor and an algorithm using only small numbers, there is a considerable speed advantage in using 16-bit integer operations rather than 32-bit ones. On 32-bit processors there is no speed penalty for using 32-bit integer operations, which can handle large values without risk of overflow. Which type of int is intended for the function int IsLightOn(), 16 or 32 bit? Most “C” compilers have options to accept either, which can lead to further code compatibility problems.

When I plan to mix “C” and assembly code functions, I adopt a process where I never use int variables in any of my code. I specify long int when I intend to manipulate 32-bit variables and short int when I want 16-bit variables. I also take the time to explicitly state the size of the variable being manipulated in each assembly language instruction (MOVE.L and MOVE.W) rather than relying on the default extension (MOVE means MOVE.W). I find these two approaches cut out many STCEs (Stupid Time Consuming Errors).

          HIDDEN POINTER ARITHMETIC PROBLEMS

The second possible error in Figure 5 is a lot more subtle. Suppose we want to write the function long int ReturnTimerValues(void), which reads both the 16-bit FLASH and DARKEN timer register values as a single 32-bit hardware access. This code is shown in Figure 6.

        .EXPORT _ReturnTimerValues
        .EXPORT ReturnTimerValues
        // long int ReturnTimerValues(void) {
 ReturnTimerValues:
 _ReturnTimerValues:
        // register long int *pt;
 pt       SET A0
        // register long int rtnvalue;
 rtnvalue SET D0
        // #define FLASH 0x08
 FLASH    EQU 0x08

        // pt = (long int *)BASEADDR;
        MOVEA.L #BASEADDR, pt
        // Grab both 16-bit registers at once
        // pt = pt + FLASH;
        // return(*pt);
        MOVE.L FLASH(pt), rtnvalue
        RTS
        // }

  Figure 6: The function long int ReturnTimerValues(void), which accesses a 32-bit
            hardware register, works correctly at the assembly code level but not at
            the “C” code level.
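The “C” comments in Figure 6 hide a scaling step. A hedged C sketch of both the trap and one way around it: the function names and FLASH_BYTE_OFFSET are hypothetical, uint32_t stands in for the 68K's 32-bit long int, and a plain array can stand in for the device registers when running on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the FLASH/DARKEN register pair from Table III. */
#define FLASH_BYTE_OFFSET 0x08u

/* Figure 6's comments written out: adding an offset to a pointer to a
 * 32-bit type moves offset * sizeof(*pt) bytes, not offset bytes.
 * Returns how many bytes the pointer actually moved. */
ptrdiff_t bytes_moved_by_scaled_add(volatile uint32_t *pt, size_t offset)
{
    volatile uint32_t *moved = pt + offset;            /* scaled by 4 */
    return (volatile uint8_t *)moved - (volatile uint8_t *)pt;
}

/* One correct C version: do the offset arithmetic on a byte pointer,
 * matching the assembler's FLASH(pt), then read 32 bits at that address. */
uint32_t return_timer_values(volatile void *base)
{
    volatile uint8_t *p = (volatile uint8_t *)base + FLASH_BYTE_OFFSET;
    return *(volatile uint32_t *)p;
}
```

Passing 8 to bytes_moved_by_scaled_add() reports 32 bytes moved, which is exactly the Figure 6 mistake, while an offset of 2 reports the intended 8 bytes. On the big-endian 68K, return_timer_values() would place the FLASH value in the upper 16 bits of the result and DARKEN in the lower 16 bits; on a little-endian host the two halves are swapped.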

Look back at the code from Figure 5. Both the assembler code and the code that would be generated by compiling the “C” comments would execute correctly, despite the hidden errors.

Now closely examine the code from Figure 6. The code written by the assembly language programmer will work as expected. However, if a code maintainer decided to upgrade the program to use the supplied “C” comments, then the code would stop functioning correctly! The difference between the “C” code comments being functional in Figure 5 and invalid in Figure 6 is the ancient friend/enemy of the programmer: LUCK!

If you wish to access a 32-bit register that is offset from a hardware base address by 8 bytes, then the correct assembly language code sequence is

pt SET A0
rtnvalue SET D0
    MOVEA.L #BASEADDR, pt
    MOVE.L FLASH(pt), rtnvalue

However, if you want to write the same code in “C”, then the correct sequence is not

register long int *pt;
register long int rtnvalue;
#define FLASH 0x08
    pt = (long int *)BASEADDR;
    pt = pt + FLASH;
    rtnvalue = *pt;

The correct sequence must take into account the standard relationship in “C” between pointer value changes and the type of “C” variable being pointed to! The constant FLASH must be defined as

            0x08 / sizeof (long int)

and not by the byte offset defined in Table 3. With a 4-byte long int, the correct code is therefore

register long int *pt;
register long int rtnvalue;
#define FLASH 2 // !!!!!!!!!!!
    pt = (long int *)BASEADDR;
    pt = pt + FLASH;
    rtnvalue = *pt;

Pointer arithmetic in assembly code is based (in most cases) on byte arithmetic, so the TIMER registers are offset by 8 bytes from the hardware BASEADDRESS. However, pointer arithmetic in “C” is based on the type of variable being pointed to. Thus, if OFFSET = 1, then incrementing a pointer by OFFSET in “C” will change the pointer value by 1 (char *), by 2 (short int *), by 4 (long int *) or possibly by some very strange amount (struct mystruct *). This means that there must be two register offset definition files, one for “C” and the other for assembly code, in order to take into account the way that “C” handles pointers.

The code in Figure 5 worked because the offset to the CONTROL register was, by good fortune, 0, whether the offset is measured in bytes, short ints or long ints. On the other hand, the offset to the TIMER registers used for Figure 6 was 8 bytes but only 2 long ints.

This sort of problem is one of the reasons that it is recommended that hardware register access be handled in assembly code directly rather than in “C”. However, with many hardware registers on “real” devices remaining 8 bits wide, much “C” code has been written without the programmer or code reviewer being aware of the problems that would occur if the hardware were upgraded to a 16-bit version.
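The byte-offset versus element-offset distinction above can be checked on an ordinary host. This is a minimal sketch, not the article's code: it assumes a plain array stands in for the hardware at BASEADDR, and the names TIMER_BYTE_OFFSET, TIMER_INDEX, read_timer_scaled() and reg32_at() are hypothetical. It uses int32_t from <stdint.h> (C99, which postdates this article) because long int is 64 bits on many current hosts. The first helper applies the 0x08 / sizeof divide; the second shows a commonly used alternative that keeps byte offsets by doing the arithmetic on an unsigned char pointer, so a single byte-offset file could in principle serve both languages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map entry: a byte offset, as the assembler uses it. */
#define TIMER_BYTE_OFFSET 0x08u

/* Element offset for C pointer arithmetic on 32-bit registers: 8 / 4 = 2. */
#define TIMER_INDEX (TIMER_BYTE_OFFSET / sizeof(int32_t))

/* Approach 1: scale the byte offset down to an element count. */
static int32_t read_timer_scaled(volatile int32_t *base)
{
    return *(base + TIMER_INDEX);   /* moves 2 elements = 8 bytes */
}

/* Approach 2: add the byte offset to a byte pointer, then convert back
   to a pointer to the 32-bit register type. */
static volatile int32_t *reg32_at(volatile void *base, size_t byte_offset)
{
    /* A 32-bit register is expected to sit on a 4-byte boundary. */
    assert(byte_offset % sizeof(int32_t) == 0);
    return (volatile int32_t *)((volatile unsigned char *)base + byte_offset);
}

static int32_t read_timer_bytes(volatile void *base)
{
    return *reg32_at(base, TIMER_BYTE_OFFSET);
}
```

With the mistaken `base + TIMER_BYTE_OFFSET`, the pointer would instead move 8 elements (32 bytes), well past a small register block.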

PASSING SUBROUTINE PARAMETERS
A common requirement in a program is the ability to pass parameters between a “C” code routine and an assembly code routine. You may pass a pointer and a data value to control two identical devices placed at different BASEADDRESS values on a processor bus: for example, controlling the DARKEN timer register for the glasses of the detective and her assistant, who are both using the GLOB device.

There is again a wide variety of approaches to handling parameter passing:

  • Parameters are passed on the memory stack.
  • Parameters are passed in registers, especially with windowed processors such as the SPARC.
  • Parameters are passed using a combination of both registers and the memory stack.

Originally, the concept behind subroutines was to avoid repeating sections of code. However, common coding practice is now to use subroutines as a method of abstracting ideas to make the code more maintainable: the concept of ensuring that any piece of code does not contain more than 7 ± 2 ideas.

Many compilers can be made to analyze the code in a file to determine whether a call to a subroutine can be optimized by replacing it with in-line code. In many situations when a call is made to a short subroutine, the subroutine code may be physically placed many times within the main code, and the final code to be placed in ROM may still be shorter than the code needed to pass parameters the standard way. In the newer language extensions, it is possible for the high-level language programmer to specify that a subroutine be handled in this INLINE manner (see Table III). This automated approach, handled by the “C” compiler, combines the advantages of code maintainability and the speed of straight-line code.

Figure 7 shows one of the approaches that can be taken with the SDS 68K compiler to pass parameters to the “C” routines void PassMany(long int outpar1, short int outpar2, long int *outpar3) and void PassOne(long int outpar1) from within an assembly language code sequence.

  • First, a stack frame is established. A compiler can keep track of what variables are (or are not) pushed onto the memory stack. However, if the programmer adjusts code where a variety of values are pushed onto a constantly changing stack, it is very easy to introduce errors. Having a fixed stack frame size, determined by the maximum number of parameters to be passed, can frequently avoid such errors and offer speed advantages.
  • Note that space is allocated on the stack for all local variables, despite the fact that many of the variables will be optimized directly into registers. This is useful for code maintainability. There is no speed disadvantage in adjusting the stack pointer by 200 bytes rather than 48 bytes. If stack space becomes a very limited resource, you can go back and remove the unnecessary storage locations once the code has become stable.
  • Copies, rather than originals, of variables are passed as the “C” parameters. I often joke that Kernighan and Ritchie must have been Pepsi rather than Coca-Cola drinkers when they specified the “C” language. They never used The Real Thing, as they only ever passed a copy. Many optimizing compilers will account for memory and register usage by placing the originals directly into the outgoing parameter location when this offers a speed advantage. Obviously, this should only happen if the
    value is no longer needed after the subroutine. Remember that incoming and outgoing parameters will be treated by the compiler as volatile, that is, not preserved across the call.
  • Note that when the address of a variable is passed, it is the address of the local variable on the stack that is passed. The actual memory value must be pulled back into a register before use.
  • Parameters are promoted to long before being passed. In order to gain code speed, the promotion of short int outpar2 to a long in Figure 7 occurs in an implicit manner (using a MOVE.W instruction to an offset stack location) rather than an explicit manner (EXT.L followed by MOVE.L). Make sure that your team is familiar with this optimization and does not assume that the top 16 bits of the passed parameter are meaningful!
  • The OUTPARAMETERs of the calling subroutine become the INPARAMETERs of the called subroutine. When passing parameters between “C” and assembly code, it is very important to understand the processor architecture associated with the CALL-TO-SUBROUTINE instruction. On a windowed processor the return address is part of the stack frame the programmer establishes. Many DSP processors use a hardware stack for return address storage. On other processors, such as the 68K used here (JSR), the stack frame is modified by the CALL-TO-SUBROUTINE instruction pushing a return address onto the stack.
  • Some processors have stack pointers that point to the last used stack location, whilst others point to the next empty location. These variants mean that the assembly language programmer must ensure correct stack usage to avoid unintentionally passing or using the wrong parameter.
  • Prior to exiting the calling subroutine, the local variables and OUTPARAMETERs must be removed from the stack, possibly using a frame pointer register (A6).

      // void CodeExample(long int value) {

      // Stack Frame Definition
      // Offsets from the frame pointer A6 (after LINK)
INPAR1 SET 8
      // Old Return Address SET 4
      // Old Frame Pointer Location SET 0

      // Offsets from the stack pointer SP (after LINK)
      // long int var3; (optimize to D0)
VAR3 SET 16
var3 SET D0
      // short int var2; (optimize to D1)
VAR2 SET 12
var2 SET D1
OUTPAR3 SET 8
OUTPAR2 SET 4
OUTPAR1 SET 0

      // (Establish the stack frame: 20 bytes of locals and outgoing parameters)
   LINK A6, #-20

      // var3 = 2; var2 = 4;
   MOVE.L #2, var3
   MOVE.W #4, var2

      // PassMany(value + 2, var2, &var3);
      .IMPORT _PassMany
   ADD.L #2, INPAR1(A6)
   MOVE.L INPAR1(A6), OUTPAR1(SP)
   MOVE.W var2, (OUTPAR2 + 2)(SP)
      // Store the variable,
   MOVE.L var3, VAR3(SP)
      // Generate its address to pass
   LEA VAR3(SP), A0
   MOVE.L A0, OUTPAR3(SP)
   JSR _PassMany
      // Recover the variable
   MOVE.L VAR3(SP), var3

      // PassOne(value + 2);
      // (INPAR1 already holds value + 2, so it is simply reused)
      .IMPORT _PassOne
   MOVE.L INPAR1(A6), OUTPAR1(SP)
   JSR _PassOne

      // Destroy the stack frame
   UNLK A6
   RTS

        Figure 7: Examples of passing one or many parameters between “C” and assembly code routines.
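The implicit promotion in Figure 7, where MOVE.W var2, (OUTPAR2 + 2)(SP) writes only the low 16 bits of a 4-byte stack slot, can be modelled on a host. This is a minimal simulation under stated assumptions: the slot is laid out big-endian as on the 68K, the helpers slot_t, store_move_w(), store_ext_move_l(), read_long() and read_short() are hypothetical, and the 0xAA fill bytes stand for whatever was previously left in that stack location.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-byte big-endian stack slot: b[0] is the most significant byte,
   so the low 16 bits of the long live at offset +2. */
typedef struct { uint8_t b[4]; } slot_t;

/* Implicit promotion: like MOVE.W var2, (OUTPAR2 + 2)(SP),
   only bytes 2 and 3 are written; bytes 0 and 1 keep stale contents. */
static void store_move_w(slot_t *s, int16_t v)
{
    uint16_t u = (uint16_t)v;
    s->b[2] = (uint8_t)(u >> 8);
    s->b[3] = (uint8_t)(u & 0xFFu);
}

/* Explicit promotion: like EXT.L followed by MOVE.L, all four bytes are written. */
static void store_ext_move_l(slot_t *s, int16_t v)
{
    uint32_t u = (uint32_t)(int32_t)v;   /* sign-extend to 32 bits */
    s->b[0] = (uint8_t)(u >> 24);
    s->b[1] = (uint8_t)(u >> 16);
    s->b[2] = (uint8_t)(u >> 8);
    s->b[3] = (uint8_t)u;
}

/* What code sees if it (wrongly) reads the whole slot as a long. */
static uint32_t read_long(const slot_t *s)
{
    return ((uint32_t)s->b[0] << 24) | ((uint32_t)s->b[1] << 16) |
           ((uint32_t)s->b[2] << 8) | (uint32_t)s->b[3];
}

/* What a callee expecting a short int reads: the low 16 bits only. */
static int16_t read_short(const slot_t *s)
{
    return (int16_t)(uint16_t)(((uint16_t)s->b[2] << 8) | s->b[3]);
}
```

A callee declared with a short int parameter gets -3 either way; only code that treats the slot as a full long sees 0xAAAAFFFD after the MOVE.W version instead of 0xFFFFFFFD.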

ARRAY OPERATIONS

Figure 8 shows the wide range of different styles of arrays that it is possible to describe within “C”. It is important to recognize what will happen at the assembly code level for each type of array.

    #include <stdio.h>
    #include <stdlib.h>

    /* Func1() is not shown; this prototype is assumed from the call below */
    void Func1(long int *, short int, char *);

    char style4[] = "Hello World";
    short int style3[200];

    char * DemoCode(void) {
       long int style1[100];
       static short int style2[100];
       short int style6[10] = {1, 2, 3, 4};

          char *style5 = malloc(200);

          Func1(style1, style2[3], style4);
          printf("Hello World");
          return(style5);
    }

         Figure 8: Each different “C” array type requires a different underlying assembly language programming construct to implement.

  • Arrays such as style1[] come into existence by allocating space on the stack. Such automatic arrays only exist while the function containing them is active, and therefore do not have a fixed starting address. When the function exits, the space on the stack is de-allocated and the array (and any useful values it contains) vanishes into the bit-bucket.
  • Static (style2[]) and global (style3[]) arrays exist independently of the stack and are located in a RAM section specifically set aside for all static and global variables. These arrays will have a fixed starting address once the program is loaded into memory.
  • Constant arrays (style4[]) can be found in a memory section set aside for constant values. However, be careful if you are one of those programmers who initialize a string variable (style4[]) and then change its contents during the course of the program. Some compilers will use the same memory for both the array style4[] and the “Hello World” array used as a printf() parameter.
  • Arrays (style5[]) generated using calls to “C” memory allocation functions such as malloc(), or the “C++” new operator, exist within a memory section called The Heap. Provided that the memory allocated for the array is not freed, its starting address will be fixed, although that address depends on when the malloc() call is performed.
  • Initialization of variables, including arrays (style6[]), can occur in a number of ways. Downloading code using S-Records generated from the SDS compiler places the values directly into the array. This can cause a problem if the code is rerun without being downloaded a second time. A better approach, also available from the SDS compiler, is to store these initialization constants in ROM and copy them into the variables as part of the ResetInit() routine used in init.s (Figure 2).

The programmer should also be aware of a number of other “C” array conventions that can sneak up and bite the unwary. You must allocate space for the END-OF-STRING character at the end of a string array, and then remember to pad the array (style4[]) with additional NULL characters so that the next integer array (style3[]) allocated within the code starts at the proper word (16-bit) or long-word (32-bit) boundary, as is appropriate for the processor.
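When string and integer arrays are laid out by hand, for example as data declarations in an assembly file, the number of filler bytes can be computed rather than counted. A minimal sketch; padding_needed() is a hypothetical helper, and the alignments of 2 (word) and 4 (long word) follow the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Filler bytes needed after data ending at byte `offset` so that the next
   array starts on an `align`-byte boundary (2 = word, 4 = long word). */
static size_t padding_needed(size_t offset, size_t align)
{
    assert(align != 0);
    return (align - offset % align) % align;
}
```

For style4[] in Figure 8, the 11 characters of "Hello World" plus the END-OF-STRING character occupy 12 bytes (sizeof("Hello World") is 12), which happens to be long-word aligned already; a 6-byte string such as "Hello" would need 2 filler bytes before a long-word-aligned array.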

We have already discussed one problem with using “C” to access hardware registers: the difficulty associated with pointer arithmetic. However, there is another equally serious problem that can occur when accessing a hardware register within a loop. Consider the following “C” code to generate an average of 8 readings of a hardware input register (the shift by 3 divides by 8, so max is 8 here):

        // Hardware register
long int *pt = (long int *)0xA0000;
long int sum = 0;
int count;
        for (count = 0;
                count < max; count++) {
            sum += *pt;
        }
        sum = sum >> 3;

An optimizing “C” compiler may rewrite this code into a form essentially equivalent to

        // Hardware register
long int *pt = (long int *)0xA0000;
long int temp;
long int sum = 0;
        temp = *pt;
        sum = (temp * max) >> 3;

The rationale behind this optimization is that the pointer pt always accesses the same memory location, and therefore the same value should always be returned. The loop can therefore be optimized by moving the (apparently) loop-invariant read outside the loop. Normally, this assumption would be valid if standard memory operations were being performed. However, in this case the pointer is being used to access a hardware register whose value could be changing under the influence of some external operation.

In such cases, it is necessary to use the keyword volatile to ensure that the memory location is actually read on each pass around the loop:

volatile long int *pt = (volatile long int *)0xA0000;  // Hardware register
long int sum = 0;
int count;
        for (count = 0; count < max; count++)
                sum += *pt;
        sum = sum >> 3;

OTHER THINGS THAT CAN GO “BUMP IN THE DARK”

I’ve covered many of the important things that must be considered when trying to cross-link “C” and assembly code programs. However, there are many more that can creep up on the unwary. Unfortunately, many of the problems are compiler and processor dependent. Here are a few that have given me some sleepless nights.

Many programmers use the compiler’s -S option to generate assembly code from “C” code as a starting point for producing customized code. These programmers should be aware of the many possible trade-offs that can occur during compiling.

For example, one optimization of the SDS compiler is to use the RTD rather than the RTS instruction. The RTD instruction not only pulls the return address off the stack, but also removes a specified number of bytes of pushed parameters. This approach can reduce the number of instructions to be stored in program ROM, as the stack is adjusted within one commonly called routine rather than repeatedly in each calling routine.

Other code optimizations (see Table II) performed by the “C” compiler can be an inconvenience if you want to use the “C” code assembler listing as a starting point for future optimized code. Dead-code removal optimization may cause values you want to vanish before you get the chance to use them in
a customized way. One advantage of using a frame pointer is that the position of the incoming parameters stays constant (relative to the frame pointer) regardless of how much the stack is adjusted. However, use of the frame pointer has some speed and stack disadvantages. Some compilers make use of a virtual frame pointer to overcome these problems, which causes the generation of some very interesting code.

Figure 9 shows the complications that can occur if the programmer decides to link between “C++” and assembly code. The upper assembly language sequence is generated by placing the “C” code from Figure 1 into a file called main.c before activating the Borland x86 C++ compiler. The generated subroutine name _CallAsm starts with the anticipated leading underscore.

The lower assembly language sequence is produced from the same “C” code using the same compiler, but with the code placed into a file main.cpp. Note the change in function name from _CallAsm to @CallAsm$qv. This name mangling encodes the parameter list into the symbol name, which is what makes it possible in C++ to distinguish between functions with the same name but different parameters (function overloading). The concept is straightforward enough, but can cause problems for the programmer trying to link an object file generated from a C++ subroutine with a hand-crafted, custom assembly language routine.

_TEXT
;    void Asm(void);
;    void main(void) {
      assume cs:_TEXT
_main proc near
      push bp
      mov bp,sp
;           CallAsm();
      call   near ptr _CallAsm
;    }
      pop    bp
      ret
_main endp
_TEXT        ends

_TEXT
;    void Asm(void);
;    void main(void) {
      assume cs:_TEXT
_main proc near
      push bp
      mov bp,sp
;           CallAsm();
      call   near ptr @CallAsm$qv
;    }
      pop    bp
      ret
_main endp
_TEXT        ends

        Figure 9: The upper and lower X86 code sequences were both generated from the same “C” code (Figure 1). The upper code is produced by invoking a “C” compiler translation and the lower code is obtained from a “C++” translation (Borland).
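A standard C++ remedy for the mangled name in Figure 9, not covered above, is to give the assembly routine C linkage with extern "C", so that main.cpp refers to the same plain C symbol (_CallAsm with the Borland tools shown) as main.c. The header below is a sketch: the header name callasm.h and the guard macro CALLASM_H are hypothetical, and the C definition of CallAsm() at the end is only a stand-in so the example builds and runs without the real assembly file.

```c
#include <assert.h>

/* ---- callasm.h (hypothetical header shared by C and C++ callers) ---- */
#ifndef CALLASM_H
#define CALLASM_H

#ifdef __cplusplus
extern "C" {            /* C linkage: the C++ compiler does not mangle these names */
#endif

void CallAsm(void);     /* normally implemented in hand-written assembly */

#ifdef __cplusplus
}
#endif

#endif /* CALLASM_H */

/* ---- stand-in for the assembly routine, for a host build only ---- */
static int callasm_calls = 0;

void CallAsm(void)
{
    callasm_calls++;    /* count calls so the declaration can be exercised */
}
```

A C compiler skips the extern "C" lines entirely, so the same header can be included from main.c and main.cpp.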

In this article I’ve demonstrated the ease with which it is possible to link “C” code routines with assembly code routines. A programmer having the skill to mix-and-match these routines is provided with the ability to generate a large amount of code quickly, yet still customize those portions that need customizing. Knowledge of “C” coding conventions also provides a useful framework upon which to place your assembly language programs to provide fast, but easily maintainable, code.


Mike Smith is an instructor within the
Department of Electrical and Computer
Engineering at the University of Calgary in
Canada where he teaches about embedded
systems and does research into high speed
hardware and software applications in
telecommunications and bio-engineering. If
you wish to annoy him with your own ill-
reasoned arguments about the relative merits of
assembly language and “C” programming, he
can be contacted at
