A Radix-4 Redundant Cordic Algorithm with Fast On-line Variable

					    A RADIX-4 REDUNDANT CORDIC ALGORITHM WITH FAST ON-LINE
              VARIABLE SCALE FACTOR COMPENSATION

Chieh-Chih Li
Opto-Electronics & Systems Laboratories
Industrial Technology Research Institute
Hsinchu, Taiwan, ROC

Sau-Gee Chen
Department of Electronics Engineering
National Chiao Tung University
Hsinchu, Taiwan, ROC
E-mail: sgchen@cc.nctu.edu.tw


ABSTRACT

In this work, a fast radix-4 redundant CORDIC algorithm with a variable scale factor is proposed. The algorithm includes an on-line scale factor decomposition algorithm that transforms the complicated variable scale factor into a sequence of simple shift-and-add operations and performs the variable scale factor compensation in the same fashion. Moreover, the on-line decomposition algorithm itself can be realized with simple and fast hardware. The new CORDIC algorithm needs about 0.8n iterations, the smallest number among all the CORDIC algorithms, and only about two-thirds of the rotations required by the best existing (hybrid radix-2 and radix-4) redundant algorithms. The new algorithm therefore achieves fast rotation iterations together with high-speed, low-overhead scale factor compensation, which are hard to attain simultaneously with the existing algorithms. The on-line scale factor compensation can also be applied to the existing on-line CORDIC algorithms.

1. INTRODUCTION

The CORDIC algorithm [1,2] is an efficient scheme for computing elementary functions, especially the trigonometric functions. Since the algorithm can be realized as a sequence of shift-and-add operations followed by a scale factor compensation operation, it is well suited to VLSI implementation and is widely used in DSP applications.

Most CORDIC algorithms assume a constant scale factor for ease of scale factor compensation. However, they must either perform an accurate but slow decision operation for the rotation direction or make a rough direction decision at the expense of extra compensation operations [4], [6]. In addition, they have to keep rotating even after the rotation angle has converged. To speed up CORDIC operations, the following techniques are widely used: (1) applying a carry-free redundant addition scheme [3-8]; (2) deciding the rotation direction quickly from only a few most significant digits (MSDs) of the related parameters [3-8]; (3) skipping unnecessary rotations; (4) recoding the rotation angle to save rotation iterations; and (5) applying a radix-4 rotation scheme [5,10,13]. The 2nd to 4th techniques result in variable scale factors. Variable scale factors suffer from a complicated scale factor computation followed by a compensation penalty [7,8]. Because of the considerable overhead generated by a variable scale factor, the existing radix-4 CORDIC algorithms resort to a constant scale factor approach [5,10,13]. However, these constant-scale-factor CORDICs are not pure radix-4 algorithms; they are all hybrid radix-2 and radix-4 algorithms. As a result, all these approaches reduce the iteration count only slightly, at the cost of control overhead. Ideally, a pure radix-4 algorithm would achieve the best performance.

To alleviate these disadvantages of the prior art, a pure fast radix-4 redundant CORDIC algorithm with a variable scale factor is proposed. The algorithm includes an on-line variable scale factor decomposition algorithm that transforms the complicated variable scale factor $\prod_{i=0}^{n/2} 1/\sqrt{1+\delta_i^2 2^{-4i-2}}$ into a sequence of simple shift-and-add operations $\prod_{i=0}^{n/2} (1+s_i 2^{-2i-1})$ in an on-line fashion, where $\delta_i, s_i \in \{-2,-1,0,1,2\}$. Here $s_i$ depends only on $\delta_i$. Both $\delta_i$ and $s_i$ can be easily determined by estimating their corresponding intermediate variables with a very short wordlength. In all, the new algorithm has the smallest number of shift-and-add steps, 0.8n, among all the CORDIC algorithms. The new CORDIC algorithm therefore achieves fast rotation iterations together with high-speed, low-overhead scale factor compensation, which are hard to attain simultaneously with the existing algorithms. The on-line scale factor compensation can also be applied to the existing on-line CORDIC algorithms.
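As background for the sections that follow, the conventional radix-2 scheme described at the start of this introduction (shift-and-add iterations followed by one scale factor compensation) can be sketched in floating-point Python. This is only an illustrative model, not the redundant-arithmetic hardware discussed in the paper: the function name `cordic_rotate`, the counter-clockwise sign convention and the use of floating point are assumptions made for the example.

```python
import math

def cordic_rotate(x, y, z, n=32):
    """Rotate (x, y) counter-clockwise by z radians using n radix-2 CORDIC steps.

    Illustrative floating-point model (assumption): real designs use
    fixed-point shifts and a precomputed arctangent table.
    """
    scale = 1.0  # constant scale factor, accumulated here for clarity
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0      # rotation direction from the sign of z
        t = 2.0 ** -i                    # the "shift" by i bits
        x, y = x - d * t * y, y + d * t * x
        z -= d * math.atan(t)            # remaining angle
        scale *= math.sqrt(1.0 + t * t)  # each micro-rotation stretches the vector
    # Scale factor compensation: one final correction of both coordinates.
    return x / scale, y / scale, z

x, y, residual = cordic_rotate(1.0, 0.0, math.pi / 6)
print(x, y, residual)
```

Because the direction digit is always ±1 here, the scale factor is the same for every input angle. Once zero digits or radix-4 digits are allowed, it becomes data dependent, which is the problem the proposed algorithm addresses.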
2. THE NEW RADIX-4 CORDIC ALGORITHM FOR ROTATION MODE

The proposed redundant CORDIC algorithm is based on fast signed-digit addition (SDA) [12]. The proposed radix-4 rotation mode algorithm, which rotates an initial vector [X0,Y0] by an angle Z0, is given as follows:

for i=0 to n/2+1
$$X_{i+1} = X_i + \delta_i 2^{-2i-1} Y_i, \qquad Y_{i+1} = Y_i - \delta_i 2^{-2i-1} X_i,$$
$$R_{i+1} = 4\left(R_i - 2^{2i}\tan^{-1}(\delta_i 2^{-2i-1})\right) = 2^{2(i+1)} Z_{i+1}.$$

The final scale factor is
$$K^{-1} = \prod_{i=0}^{n/2} \frac{1}{\sqrt{1+\delta_i^2 2^{-4i-2}}} =
\begin{cases}
\prod_{i=1}^{n/2} (1+s_i 2^{-2i-1}) & \text{if } \delta_0 = 0 \\
(1-2^{-2}) \prod_{i=1}^{n/2} (1+s_i 2^{-2i-1}) & \text{if } \delta_0 \ne 0
\end{cases}.$$

A simple selection rule (derived in Appendix B) for $\delta_i$ is as follows:
$$\delta_i =
\begin{cases}
2 & \text{if } \hat{R}_i > 5/8 \\
1 & \text{if } 1/4 \le \hat{R}_i \le 5/8 \\
0 & \text{if } -1/4 < \hat{R}_i < 1/4 \\
-1 & \text{if } -5/8 \le \hat{R}_i \le -1/4 \\
-2 & \text{if } \hat{R}_i < -5/8
\end{cases}$$
where $\hat{R}_i$ consists of the three most-significant fractional digits of $R_i$. On the other hand, a simple selection rule (derived in Appendix A) for $s_i$ can be obtained by defining the following iterative operations:
$$W_{i+1} = 4\left[W_i - 2^{2i}\ln(1+s_i 2^{-2i-1})\right] - 2^{2i+1}\ln\left(1+\delta_{i+1}^2 2^{-4(i+1)-2}\right),$$
$$A_{i+1} = A_i (1+s_i 2^{-2i-1}),$$
where $W_0 = -2^{-1}\ln(1+\delta_0^2 2^{-2})$, $\delta_0 \in \{0, \pm 1, \pm 2\}$, $A_0 = 1$, and $K^{-1} = A_{n/2+1}$ for n-bit precision. The selection rule for $s_i$ is
$$s_i =
\begin{cases}
2 & \text{if } 3/4 < \hat{W}_i \\
1 & \text{if } 1/4 \le \hat{W}_i \le 3/4 \\
0 & \text{if } -1/4 < \hat{W}_i < 1/4 \\
-1 & \text{if } -13/16 \le \hat{W}_i \le -1/4 \\
-2 & \text{if } \hat{W}_i < -13/16
\end{cases}$$
where $\hat{W}_i$ consists of the five most-significant fractional digits of $W_i$. The scale factor compensation can then be combined with the rotation iterations or executed after all the rotation iterations are finished.

3. THE NEW RADIX-4 CORDIC ALGORITHM FOR VECTORING MODE

Since the iterated vectors are scaled in magnitude in each iteration and can only be tested after rotation, the decision operations are slower and more complicated than those of the rotation mode. For this reason, the proposed vectoring mode algorithm is still a hybrid radix-2 and radix-4 one. However, the new algorithm reduces the number of radix-2 iterations to four, which is much smaller than the n/2 of the existing algorithms. The derivation of the new algorithm is more involved than, but similar to, that of the rotation mode algorithm.

The new algorithm starts with four radix-2 iterations based on Ercegovac and Lang’s algorithm [7], followed by (n-4)/2+1 radix-4 iterations based on a fixed selection rule as follows, for i=0,1,...,(n-4)/2:
$$\delta_i^* =
\begin{cases}
2 & \text{if } \hat{W}_i^* > 3\hat{X}_4/2 \\
1 & \text{if } \hat{X}_4/2 \le \hat{W}_i^* \le 3\hat{X}_4/2 \\
0 & \text{otherwise} \\
-1 & \text{if } -3\hat{X}_4/2 \le \hat{W}_i^* \le -\hat{X}_4/2 \\
-2 & \text{if } \hat{W}_i^* < -3\hat{X}_4/2
\end{cases},$$
$$X_{i+1}^* = X_i^* + \delta_i^* 2^{-4i-2r} W_i^*,$$
$$W_{i+1}^* = 4(W_i^* - \delta_i^* X_i^*) = 4^{i+1} Y_{i+1}^*,$$
$$Z_{i+1}^* = Z_i^* + \tan^{-1}(\delta_i^* 2^{-2i-r}),$$
where $\hat{W}_i^*$ and $\hat{X}_4$ are the 6 and 5 most significant fractional digits of $W_i^*$ and $X_4$, respectively, and
$X_0^* = X_4$, $Y_0^* = Y_4$, $Z_0^* = Z_4$; $X_4$, $Y_4$, $Z_4$ are the results of the first four radix-2 iterations. The final results are
$$X_{(n-4)/2+1}^* = K R = K\sqrt{X_0^2 + Y_0^2}, \qquad Z_{(n-4)/2+1}^* = \tan^{-1}(X_0/Y_0).$$
The resulting variable scale factor decomposition can be performed similarly to the rotation mode on-line decomposition algorithm.

4. PERFORMANCE COMPARISONS

To compare different redundant CORDICs for rotation mode, we assume that a basic iteration step consists of a shift operation and a 4-2 SDA. Combined with the CORDIC rotation iterations, the new scale factor decomposition algorithm can compensate the final results in two different schemes:
•    Scheme-I: The n/2 additional shift-and-add compensation operations are performed right after all the n/2 redundant CORDIC iterations have been done, namely
     $$X_{i+1}^* = (1+s_i 2^{-2i-1}) X_i^*, \qquad Y_{i+1}^* = (1+s_i 2^{-2i-1}) Y_i^*,$$
     where $X_0^* = X_{n/2}$ and $Y_0^* = Y_{n/2}$; $X_{n/2}$ and $Y_{n/2}$ are the CORDIC rotation results before scale factor compensation. The final compensated results are $X_{n/2+1}^*$ and $Y_{n/2+1}^*$. Consequently, the rotation and the compensation each need n/2 shift-and-add operations. However, the probability of a nonzero $\delta_i, s_i \in \{-2,-1,0,1,2\}$ is 4/5. As a result, on average there are (n/2)×2×4/5=0.8n shift-and-add operations.
•    Scheme-II: Each compensation iteration is combined with the rotation iteration immediately after its corresponding $s_i$ has been determined, that is
     $$X_{i+1} = (1+s_i 2^{-2i-1})(X_i + \delta_i 2^{-2i-1} Y_i),$$
     $$Y_{i+1} = (1+s_i 2^{-2i-1})(Y_i - \delta_i 2^{-2i-1} X_i).$$
     Similarly, there are 0.8n shift-and-add operations for this CORDIC operation.

On the other hand, on average, the hybrid radix-2 and radix-4 algorithms in [5] and [10] need n/2 radix-2 iterations, (n/4)×4/5 radix-4 iterations and n/4 iterations for scale factor compensation. These amount to 0.95n shift-and-add operations for those two algorithms. Note that the algorithm in [10] is based on slower non-redundant additions.

The double rotation method [4] needs 2.25n basic steps: 2n steps for rotations and 0.25n for scale factor compensation. The branch CORDIC algorithm [6] needs 1.25n basic steps: n steps for rotations and 0.25n for scale factor compensation. However, this algorithm needs two copies of the CORDIC hardware operating in parallel. Hence, in effect this algorithm needs 2.5n basic steps.

Table 1 summarizes the comparison results. In the table, all the algorithms are assumed to be realized with unfolded (sequential) hardware, which mainly consists of the required barrel shifters, adders and ROM tables, excluding other minor components. As shown, the new radix-4 redundant CORDIC has the best performance. The comparison statistics for the existing vectoring-mode CORDICs show performance results similar to those of the rotation mode.

5. CONCLUSION

The new CORDIC algorithm achieves the best performance among all the existing algorithms in terms of iteration number and hardware complexity. The algorithm can be applied to the computation of hyperbolic functions as well. Moreover, the new algorithm includes a ROM table of $\ln(1+s_i 2^{-2i-1})$, which can be used to compute logarithm and exponential functions, and in turn the hyperbolic functions, by using the well-known CCM algorithm. In this way, no scale factor compensation is required. As a result, a unified algorithm for the computation of a broad set of elementary functions can be obtained, which is under further investigation.
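The rotation-mode recurrences of Section 2 can be exercised with a small floating-point sketch. It applies the Section 2 thresholds for δi to an estimate of Ri, runs the X, Y and angle recurrences, and then divides out the accumulated variable scale factor directly. Several things here are assumptions made for illustration and are not taken from the paper: the names `select_delta` and `radix4_rotate`, the modelling of the three-digit estimate as rounding Ri to the nearest multiple of 1/8, the square-root form of the per-step scale factor, and the use of ordinary floats in place of signed-digit arithmetic.

```python
import math

def select_delta(r):
    """Section 2 thresholds applied to an estimate of R_i.

    Assumption: the estimate is R_i rounded to three fractional bits;
    the paper takes it from the MSDs of a redundant number instead.
    """
    r_hat = math.floor(r * 8 + 0.5) / 8
    if r_hat > 5 / 8:
        return 2
    if r_hat >= 1 / 4:
        return 1
    if r_hat > -1 / 4:
        return 0
    if r_hat >= -5 / 8:
        return -1
    return -2

def radix4_rotate(x, y, z, steps=16):
    """Rotate (x, y) clockwise by z radians, |z| <= pi/3 (sign convention of Section 2)."""
    k = 1.0      # variable scale factor K, accumulated as the digits are chosen
    deltas = []
    for i in range(steps):
        d = select_delta(z * 4.0 ** i)   # R_i = 2^(2i) * Z_i
        a = d * 2.0 ** (-2 * i - 1)      # delta_i * 2^(-2i-1)
        x, y = x + a * y, y - a * x      # X and Y recurrences
        z -= math.atan(a)                # Z_{i+1} = Z_i - atan(delta_i 2^(-2i-1))
        k *= math.sqrt(1.0 + a * a)      # this step's contribution to K
        deltas.append(d)
    # Direct division stands in for the shift-and-add compensation.
    return x / k, y / k, deltas

x, y, deltas = radix4_rotate(1.0, 0.0, 0.5)
print(x, y, deltas)
```

Since the chosen digits differ from one input angle to the next, `k` changes with the angle as well; this is the data-dependent factor that Scheme-I and Scheme-II compensate with shift-and-add steps instead of a division.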
REFERENCES
[1] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electronic Comput., vol. EC-8, no. 3, pp. 330-334, 1959.
[2] J. S. Walther, “A unified algorithm for elementary functions,” AFIPS Spring Joint Comput. Conf., pp. 379-385, 1971.
[3] J. R. Cavallaro and N. D. Hemkumar, “Redundant and On-line CORDIC for Unitary Transformations,” IEEE Trans. Comput., vol. 43, no. 8, pp. 941-954, August 1994.
[4] N. Takagi, et al., “Redundant CORDIC methods with a constant scale factor for sine and cosine computation,” IEEE Trans. Comput., vol. 40, no. 9, pp. 989-995, Sept. 1991.
[5] J. A. Lee and T. Lang, “Constant-Factor Redundant CORDIC for Angle Calculation and Rotation,” IEEE Trans. Comput., vol. 41, no. 8, pp. 1016-1025.
[6] J. Duprat and J. M. Muller, “The CORDIC algorithm: New results for fast VLSI implementation,” IEEE Trans. Comput., vol. 42, no. 2, pp. 168-178, Feb. 1993.
[7] M. D. Ercegovac and T. Lang, “Redundant and on-line CORDIC: Application to matrix triangularization and SVD,” IEEE Trans. Comput., vol. 39, no. 6, pp. 725-740, June 1990.
[8] H. X. Lin and H. J. Sips, “On-Line CORDIC Algorithms,” IEEE Trans. Comput., vol. 39, no. 8, pp. 1038-1052, Aug. 1990.
[9] T. C. Chen, “Automatic computation of logarithms, exponential, ratios and square roots,” IBM J. Res. and Dev., vol. 16, pp. 380-388, 1972.
[10] M. R. D. Rodrigues et al., “Hardware evaluation of mathematical functions,” IEE Proc., vol. 128, Pt. E, no. 4, July 1981.
[11] M. D. Ercegovac, “On-line arithmetic: An overview,” in Proc. SPIE Real-Time Signal Processing, 495 (VII), Aug. 1984, pp. 86-93.
[12] K. Hwang, Computer Arithmetic, Principles, Architecture, and Design. New York: Wiley, 1979.
[13] D. Timmermann et al., “Low Latency Time CORDIC Algorithms,” IEEE Trans. Comput., vol. 41, no. 8, pp. 1010-1015, Aug. 1992.

APPENDIX A
Derivation of the radix-4 on-line decomposition algorithm for variable scale factor

The decision $s_i = k$ should make $W_{i+1}$ remain bounded in $[L_{-2}, U_2]$ whenever $W_i$ is in the interval $[L_k, U_k]$. The following equations then have to be satisfied:
$$U_2 = 4\left(U_k - 2^{2i}\ln(1+k 2^{-2i-1})\right),$$
$$L_{-2} + 2^{2i+1}\ln(1+2^{-4(i+1)}) = 4\left(L_k - 2^{2i}\ln(1+k 2^{-2i-1})\right).$$
By substituting k=-2, -1, 0, 1, 2 successively into the above equations, the exact bounds for i≥2 are found to be:
$$U_2 = \frac{2^{2i+2}}{3}\ln(1+2^{-2i}) \ge 1.2933, \qquad U_0 = \frac{2^{2i}}{3}\ln(1+2^{-2i}) \ge 0.3233,$$
$$U_1 = \frac{2^{2i}}{3}\ln(1+2^{-2i}) + 2^{2i}\ln(1+2^{-2i-1}) \ge 0.8157,$$
$$U_{-1} = \frac{2^{2i}}{3}\ln(1+2^{-2i}) + 2^{2i}\ln(1-2^{-2i-1}) \ge -0.1846,$$
$$U_{-2} = \frac{2^{2i}}{3}\ln(1+2^{-2i}) + 2^{2i}\ln(1-2^{-2i}) \ge -0.7093.$$
Similarly,
$$L_{-2} \le -4/3, \quad L_{-1} \le -5/6, \quad L_0 \le -1/3, \quad L_1 \le 1/6, \quad L_2 \le 2/3.$$
It can be shown that the overlap intervals [L2,U1], [L1,U0], [L0,U-1] exist for all i, while [L-1,U-2] exists for i≥2. From the overlap regions, a simple selection rule for $s_i$ can be obtained as shown in the second section. The term $\ln(1+s_i 2^{-2i-1})$ does not exist when i=0 and $s_0 = -2$. To solve this problem and ensure convergence, we introduce the initial step as follows (with i=0):
$$1 + s_0 2^{-2i-1} =
\begin{cases}
1 & \text{if } \delta_0 = 0 \\
(1-2^{-2}) & \text{if } \delta_0 \ne 0
\end{cases}.$$

APPENDIX B
Derivation of selection rule for rotation direction of the radix-4 CORDIC in rotation mode

Similar to the on-line scale factor decomposition algorithm, the decision rule for the rotation direction must keep $R_{i+1}$ bounded if $R_i$ is bounded in the interval $[L_k, U_k]$. Therefore, $L_k$ and $U_k$ can be found from $U_2 = 4(U_k - 2^{2i}\tan^{-1}(k 2^{-2i-1}))$ and $L_{-2} = 4(L_k - 2^{2i}\tan^{-1}(k 2^{-2i-1}))$. The smallest (largest) values of $U_k$ ($L_k$) can be found by letting i=0. Specifically,
$$U_2 = -L_{-2} \ge \frac{\pi}{3}, \quad U_1 = -L_{-1} \ge 0.7254, \quad U_0 = -L_0 \ge \frac{\pi}{12}, \quad U_{-1} = -L_1 \ge -0.2019, \quad U_{-2} = -L_2 \ge -0.5235.$$
From the overlap intervals [L2,U1], [L1,U0], [L0,U-1] and [L-1,U-2], a simple decision rule for $\delta_i$ can be deduced as shown in the second section.
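The decomposition analysed in Appendix A can be illustrated numerically. The sketch below tracks the scaled residual Wi and the partial product Ai on-line, one δi at a time, and checks the idea that the variable scale factor can be matched by factors of the form (1 + si 2^(-2i-1)). Note the assumptions: the digit si is picked greedily as the one whose log term lies closest to the exact Wi, which is a stand-in for the paper's threshold rule on the five-digit estimate and not the same rule; the function name `decompose_scale_factor`, the square-root form of the per-step scale factor, and the floating-point arithmetic are also illustrative choices.

```python
import math

DIGITS = (-2, -1, 0, 1, 2)

def decompose_scale_factor(deltas):
    """Approximate K^-1 = prod 1/sqrt(1 + d_i^2 2^(-4i-2)) by a product of
    shift-and-add factors (1 + s_i 2^(-2i-1)), processing the d_i on-line.

    Assumption: s_i is chosen greedily from the exact residual, not with
    the paper's threshold rule on a short estimate.
    """
    # Initial step from Appendix A: factor 1 if delta_0 = 0, else (1 - 2^-2).
    a = 1.0 if deltas[0] == 0 else 1.0 - 2.0 ** -2
    # Residual after the initial step: W_0 minus the log of the initial factor.
    w = -0.5 * math.log(1.0 + deltas[0] ** 2 * 2.0 ** -2) - math.log(a)
    s_digits = []
    for i in range(1, len(deltas)):
        # Advance to W_i: rescale by 4 and fold in the contribution of delta_i.
        w = 4.0 * w - 2.0 ** (2 * i - 1) * math.log(
            1.0 + deltas[i] ** 2 * 2.0 ** (-4 * i - 2))
        # Greedy choice: the digit whose scaled log term is nearest to W_i.
        s = min(DIGITS, key=lambda k: abs(
            w - 4.0 ** i * math.log(1.0 + k * 2.0 ** (-2 * i - 1))))
        w -= 4.0 ** i * math.log(1.0 + s * 2.0 ** (-2 * i - 1))
        a *= 1.0 + s * 2.0 ** (-2 * i - 1)   # one shift-and-add in hardware
        s_digits.append(s)
    return a, s_digits

deltas = [1, 0, 1, 1, -2, 2, 0, -1, 2, 0, 1, -2, 0, 0, 1, 2]
a, s_digits = decompose_scale_factor(deltas)
k_inv = math.prod(1.0 / math.sqrt(1.0 + d * d * 2.0 ** (-4 * i - 2))
                  for i, d in enumerate(deltas))
print(a, k_inv, s_digits)
```

The residual stays bounded because neighbouring log terms overlap once scaled by 2^(2i), which is the same overlap argument the appendix uses to justify its fixed thresholds.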
Table 1. Comparisons of the rotation-mode redundant CORDIC algorithms.

Algorithm                     New CORDIC    Lee & Lang’s CORDIC [5]    Rodrigues’ CORDIC [10]    Branch CORDIC [6]    Double rotation CORDIC [4]
Area                          ∼A            ∼A                         ∼A                        ∼2A                  ∼A
No. of shift-and-add steps    0.8n          0.95n                      0.95n                     1.25n                2.25n
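
The step counts in Table 1 follow from the averages quoted in Section 4, and the arithmetic can be reproduced with a few lines of Python. The function name `shift_add_steps` and the dictionary keys are chosen here for illustration only; the 4/5 probability of a nonzero digit and the per-algorithm breakdowns are the ones stated in Section 4.

```python
from fractions import Fraction

def shift_add_steps(n):
    """Average shift-and-add step counts for n-bit precision, per Section 4."""
    p_nonzero = Fraction(4, 5)  # probability that a digit in {-2,...,2} is nonzero
    return {
        # n/2 rotations plus n/2 compensations, each skipped when its digit is 0
        "new": (Fraction(n, 2) + Fraction(n, 2)) * p_nonzero,
        # n/2 radix-2 steps, (n/4)*4/5 radix-4 steps, n/4 compensation steps
        "hybrid": Fraction(n, 2) + Fraction(n, 4) * p_nonzero + Fraction(n, 4),
        # n rotations plus 0.25n compensation (on two parallel CORDIC copies)
        "branch": n + Fraction(n, 4),
        # 2n rotations plus 0.25n compensation
        "double_rotation": 2 * n + Fraction(n, 4),
    }

for name, steps in shift_add_steps(32).items():
    print(name, float(steps), float(steps) / 32)
```

Exact fractions are used so that the ratios to n come out as the table entries without floating-point rounding.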