A RADIX-4 REDUNDANT CORDIC ALGORITHM WITH FAST ON-LINE VARIABLE SCALE FACTOR COMPENSATION

Chieh-Chih Li
Opto-Electronics & Systems Laboratories
Industrial Technology Research Institute
Hsinchu, Taiwan, ROC

Sau-Gee Chen
Department of Electronics Engineering
National Chiao Tung University
Hsinchu, Taiwan, ROC
E-mail: sgchen@cc.nctu.edu.tw

ABSTRACT

In this work, a fast radix-4 redundant CORDIC algorithm with variable scale factor is proposed. The algorithm includes an on-line scale factor decomposition algorithm that transforms the complicated variable scale factor into a sequence of simple shift-and-add operations and performs the variable scale factor compensation in the same fashion. The on-line decomposition algorithm itself can be realized with simple and fast hardware. The new CORDIC algorithm needs 0.8n iterations, the smallest number among all CORDIC algorithms, and requires only about two thirds of the rotations of the best existing (hybrid radix-2 and radix-4) redundant algorithms. The new algorithm therefore achieves fast rotation iterations together with high-speed, low-overhead scale factor compensation, which are hard to attain simultaneously in the existing algorithms. The on-line scale factor compensation can also be applied to the existing on-line CORDIC algorithms.

1. INTRODUCTION

The CORDIC algorithm [1,2] is an efficient scheme for computing elementary functions, especially the trigonometric functions. Since the algorithm can be realized as a sequence of shift-and-add operations followed by a scale factor compensation operation, it is well suited to VLSI implementation and is widely used in DSP applications.

Most CORDIC algorithms assume a constant scale factor for ease of scale factor compensation. However, they must either perform an accurate but slow decision operation for the rotation direction, or make a rough direction decision at the expense of extra compensation operations [4], [6]. In addition, they have to keep rotating even after the rotation angle has converged. To speed up CORDIC operations, the following techniques are widely used: (1) applying a carry-free redundant addition scheme [3-8]; (2) deciding the rotation direction quickly from only a few most significant digits (MSDs) of the related parameters [3-8]; (3) skipping unnecessary rotations; (4) recoding the rotation angle to save rotation iterations; and (5) applying a radix-4 rotation scheme [5,10,13]. The second through fourth techniques result in variable scale factors. A variable scale factor requires a complicated scale factor computation followed by a costly compensation [7,8]. Because of this considerable overhead, the existing radix-4 CORDIC algorithms resort to a constant scale factor approach [5,10,13]. However, these constant-scale-factor CORDICs are not pure radix-4 algorithms; they are all hybrid radix-2 and radix-4 algorithms. As a result, all these approaches reduce the iteration count only slightly, at the cost of control overhead. Ideally, a pure radix-4 algorithm would achieve the best performance.

To alleviate these disadvantages of the prior art, a fast pure radix-4 redundant CORDIC algorithm with variable scale factor is proposed. The algorithm includes an on-line variable scale factor decomposition algorithm that transforms the complicated variable scale factor

\[ \prod_{i=0}^{n/2} \frac{1}{\sqrt{1+\delta_i^2 2^{-4i-2}}} \]

into a sequence of simple shift-and-add operations

\[ \prod_{i=0}^{n/2} \left(1+s_i 2^{-2i-1}\right) \]

in an on-line fashion, where $\delta_i, s_i \in \{-2,-1,0,1,2\}$. Here $s_i$ depends only on $\delta_i$. Both $\delta_i$ and $s_i$ can be determined easily by estimating their corresponding intermediate variables with a very short wordlength. In all, the new algorithm needs 0.8n shift-and-add steps, the smallest number among all CORDIC algorithms. The new CORDIC algorithm therefore achieves fast rotation iterations together with high-speed, low-overhead scale factor compensation, which are hard to attain simultaneously in the existing algorithms. The on-line scale factor compensation can also be applied to the existing on-line CORDIC algorithms.

2. THE NEW RADIX-4 CORDIC ALGORITHM FOR ROTATION MODE

The new redundant CORDIC algorithm is based on fast signed-digit addition (SDA) [12]. The proposed radix-4 rotation mode algorithm, for an initial vector $[X_0, Y_0]$ to be rotated by an angle $Z_0$, is given as follows:

for $i = 0$ to $n/2+1$

\[ X_{i+1} = X_i + \delta_i 2^{-2i-1} Y_i, \qquad Y_{i+1} = Y_i - \delta_i 2^{-2i-1} X_i, \]

\[ R_{i+1} = 4\left(R_i - 2^{2i}\tan^{-1}\left(\delta_i 2^{-2i-1}\right)\right) = 2^{2(i+1)} Z_{i+1}. \]

The final scale factor is

\[ K^{-1} = \prod_{i=0}^{n/2} \frac{1}{\sqrt{1+\delta_i^2 2^{-4i-2}}} = \begin{cases} \prod_{i=1}^{n/2} \left(1+s_i 2^{-2i-1}\right) & \text{if } \delta_0 = 0, \\ \left(1-2^{-2}\right)\prod_{i=1}^{n/2} \left(1+s_i 2^{-2i-1}\right) & \text{if } \delta_0 \neq 0. \end{cases} \]

A simple selection rule for $\delta_i$ (derived in Appendix B) is as follows:

\[ \delta_i = \begin{cases} 2 & \text{if } \hat{R}_i > 5/8, \\ 1 & \text{if } 1/4 \le \hat{R}_i \le 5/8, \\ 0 & \text{if } -1/4 < \hat{R}_i < 1/4, \\ -1 & \text{if } -5/8 \le \hat{R}_i \le -1/4, \\ -2 & \text{if } \hat{R}_i < -5/8, \end{cases} \]

where $\hat{R}_i$ consists of the three most-significant fractional digits of $R_i$. A simple selection rule for $s_i$ (derived in Appendix A) can be obtained by defining the following iterative operations:

\[ W_{i+1} = 4\left[W_i - 2^{2i}\ln\left(1+s_i 2^{-2i-1}\right)\right] - 2^{2i+1}\ln\left(1+\delta_{i+1}^2 2^{-4(i+1)-2}\right), \]

\[ A_{i+1} = A_i\left(1+s_i 2^{-2i-1}\right), \]

where $W_0 = -2^{-1}\ln\left(1+\delta_0^2 2^{-2}\right)$, $\delta_0 \in \{0, \pm 1, \pm 2\}$, $A_0 = 1$, and $K^{-1} = A_{n/2+1}$ for n-bit precision. The selection rule is

\[ s_i = \begin{cases} 2 & \text{if } 3/4 < \hat{W}_i, \\ 1 & \text{if } 1/4 \le \hat{W}_i \le 3/4, \\ 0 & \text{if } -1/4 < \hat{W}_i < 1/4, \\ -1 & \text{if } -13/16 \le \hat{W}_i \le -1/4, \\ -2 & \text{if } \hat{W}_i < -13/16, \end{cases} \]

where $\hat{W}_i$ consists of the five most-significant fractional digits of $W_i$. The scale factor compensation can then either be combined with the rotation iterations or be executed after all the rotation iterations are finished.

3. THE NEW RADIX-4 CORDIC ALGORITHM FOR VECTORING MODE

Since the iterated vectors are scaled in magnitude in each iteration and can only be tested after rotation, the decision operations are slower and more complicated than those of the rotation mode. For this reason, the proposed vectoring mode algorithm is still a hybrid radix-2 and radix-4 one. However, the new algorithm reduces the number of radix-2 iterations to four, which is much smaller than the n/2 of the existing algorithms. The derivation of the new algorithm is similar to, but more involved than, that of the rotation mode algorithm.

The new algorithm starts with four radix-2 iterations based on Ercegovac and Lang's algorithm [7], followed by (n-4)/2+1 radix-4 iterations based on a fixed selection rule as follows, for i = 0, 1, ..., (n-4)/2:

\[ \delta_i^* = \begin{cases} 2 & \text{if } \hat{W}_i^* > 3\hat{X}_4/2, \\ 1 & \text{if } \hat{X}_4/2 \le \hat{W}_i^* \le 3\hat{X}_4/2, \\ 0 & \text{otherwise}, \\ -1 & \text{if } -3\hat{X}_4/2 \le \hat{W}_i^* \le -\hat{X}_4/2, \\ -2 & \text{if } \hat{W}_i^* < -3\hat{X}_4/2, \end{cases} \]

\[ X_{i+1}^* = X_i^* + \delta_i^* 2^{-4i-2r} W_i^*, \]

\[ W_{i+1}^* = 4\left(W_i^* - \delta_i^* X_i^*\right) = 4^{i+1} Y_{i+1}^*, \]

\[ Z_{i+1}^* = Z_i^* + \tan^{-1}\left(\delta_i^* 2^{-2i-r}\right), \]

where $\hat{W}_i^*$ and $\hat{X}_4$ are the 6 and 5 most significant fractional digits of $W_i^*$ and $X_4$ respectively, and $X_0^* = X_4$, $Y_0^* = Y_4$, $Z_0^* = Z_4$; here $X_4$, $Y_4$, $Z_4$ are the results of the first four radix-2 iterations. The final results are

\[ X_{(n-4)/2+1}^* = K\sqrt{X_0^2 + Y_0^2}, \qquad Z_{(n-4)/2+1}^* = \tan^{-1}\left(Y_0/X_0\right). \]

The resulting variable scale factor decomposition can be performed similarly to the rotation mode on-line decomposition algorithm.

4. PERFORMANCE COMPARISONS

To compare different redundant CORDICs in rotation mode, we assume that a basic iteration step consists of a shift operation and a 4-2 SDA. Combined with the CORDIC rotation iterations, the new scale factor decomposition algorithm can compensate the final results in two different schemes:

• Scheme-I: The n/2 additional shift-and-add compensation operations are performed right after all the n/2 redundant CORDIC iterations have been done, namely

\[ X_{i+1}^* = \left(1+s_i 2^{-2i-1}\right) X_i^*, \qquad Y_{i+1}^* = \left(1+s_i 2^{-2i-1}\right) Y_i^*, \]

where $X_0^* = X_{n/2}$ and $Y_0^* = Y_{n/2}$; $X_{n/2}$ and $Y_{n/2}$ are the CORDIC rotation results before scale factor compensation. The final compensated results are $X_{n/2+1}^*$ and $Y_{n/2+1}^*$. Consequently, the rotation and the compensation each need n/2 shift-and-add operations. However, $\delta_i, s_i \in \{-2,-1,0,1,2\}$ are nonzero with probability 4/5, and a zero digit needs no operation. As a result, on average there are (n/2)×2×4/5 = 0.8n shift-and-add operations.

• Scheme-II: Each compensation iteration is combined with the rotation iteration and performed immediately after its corresponding $s_i$ has been determined, that is

\[ X_{i+1} = \left(1+s_i 2^{-2i-1}\right)\left(X_i + \delta_i 2^{-2i-1} Y_i\right), \]

\[ Y_{i+1} = \left(1+s_i 2^{-2i-1}\right)\left(Y_i - \delta_i 2^{-2i-1} X_i\right). \]

Similarly, there are 0.8n shift-and-add operations for this scheme.

By comparison, the hybrid radix-2 and radix-4 algorithms in [5] and [10] need, on average, n/2 radix-2 iterations, (n/4)×4/5 radix-4 iterations and n/4 iterations for scale factor compensation. These amount to 0.5n + 0.2n + 0.25n = 0.95n shift-and-add operations for those two algorithms. Note that the algorithm in [10] is based on slower non-redundant additions.

The double rotation method [4] needs 2.25n basic steps: 2n steps for rotations and 0.25n for scale factor compensation. The branch CORDIC algorithm [6] needs 1.25n basic steps: n steps for rotations and 0.25n for scale factor compensation. However, this algorithm needs two copies of the CORDIC hardware operating in parallel, so it in fact needs 2.5n basic steps.

Table 1 summarizes the comparison results. In the table, all the algorithms are assumed to be realized with unfolded (sequential) hardware, which mainly consists of the required barrel shifters, adders and ROM tables, excluding other minor components. As shown, the new radix-4 redundant CORDIC has the best performance. The comparison for the existing vectoring-mode CORDICs gives results similar to those of the rotation mode.

5. CONCLUSION

The new CORDIC algorithm achieves the best performance among all the existing algorithms in terms of iteration number and hardware complexity. The algorithm can be applied to the computation of hyperbolic functions as well. Moreover, the new algorithm includes a ROM table of $\ln(1+s_i 2^{-2i-1})$, which can be used to compute logarithm and exponential functions, and in turn the hyperbolic functions, by using the well-known CCM algorithm. In this way, no scale factor compensation is required. As a result, a unified algorithm for the computation of a broad set of elementary functions can be obtained, which is under further investigation.

REFERENCES

[1] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electronic Comput., vol. EC-8, no. 3, pp. 330-334, 1959.
[2] J. S. Walther, "A unified algorithm for elementary functions," AFIPS Spring Joint Comput. Conf., pp. 379-385, 1971.
[3] J. R. Cavallaro and N. D. Hemkumar, "Redundant and On-line CORDIC for Unitary Transformations," IEEE Trans. Comput., vol. 43, no. 8, pp. 941-954, Aug. 1994.
[4] N. Takagi et al., "Redundant CORDIC methods with a constant scale factor for sine and cosine computation," IEEE Trans. Comput., vol. 40, no. 9, pp. 989-995, Sept. 1991.
[5] J. A. Lee and T. Lang, "Constant-Factor Redundant CORDIC for Angle Calculation and Rotation," IEEE Trans. Comput., vol. 41, no. 8, pp. 1016-1025, Aug. 1992.
[6] J. Duprat and J. M. Muller, "The CORDIC algorithm: New results for fast VLSI implementation," IEEE Trans. Comput., vol. 42, no. 2, pp. 168-178, Feb. 1993.
[7] M. D. Ercegovac and T. Lang, "Redundant and on-line CORDIC: Application to matrix triangularization and SVD," IEEE Trans. Comput., vol. 39, no. 6, pp. 725-740, June 1990.
[8] H. X. Lin and H. J. Sips, "On-Line CORDIC Algorithms," IEEE Trans. Comput., vol. 39, no. 8, pp. 1038-1052, Aug. 1990.
[9] T. C. Chen, "Automatic computation of logarithms, exponential, ratios and square roots," IBM J. Res. and Dev., vol. 16, pp. 380-388, 1972.
[10] M. R. D. Rodrigues et al., "Hardware evaluation of mathematical functions," IEE Proc., vol. 128, Pt. E, no. 4, July 1981.
[11] M. D. Ercegovac, "On-line arithmetic: An overview," in Proc. SPIE Real-Time Signal Processing, 495 (VII), Aug. 1984, pp. 86-93.
[12] K. Hwang, Computer Arithmetic, Principles, Architecture, and Design. New York: Wiley, 1979.
[13] D. Timmermann et al., "Low Latency Time CORDIC Algorithms," IEEE Trans. Comput., vol. 41, no. 8, pp. 1010-1015, Aug. 1992.

APPENDIX A

Derivation of the radix-4 on-line decomposition algorithm for variable scale factor

The decision $s_i = k$ should keep $W_{i+1}$ bounded in $[L_{-2}, U_2]$ whenever $W_i$ is in the interval $[L_k, U_k]$. The following equations then have to be satisfied:

\[ U_2 = 4\left(U_k - 2^{2i}\ln\left(1+k 2^{-2i-1}\right)\right), \]

\[ L_{-2} + 2^{2i+1}\ln\left(1+2^{-4(i+1)}\right) = 4\left(L_k - 2^{2i}\ln\left(1+k 2^{-2i-1}\right)\right). \]

By substituting k = -2, -1, 0, 1, 2 successively into the above equations, the exact bounds for i ≥ 2 are found to be:

\[ U_2 = \frac{2^{2i+2}}{3}\ln\left(1+2^{-2i}\right) \ge 1.2933, \qquad U_0 = \frac{2^{2i}}{3}\ln\left(1+2^{-2i}\right) \ge 0.3233, \]

\[ U_1 = \frac{2^{2i}}{3}\ln\left(1+2^{-2i}\right) + 2^{2i}\ln\left(1+2^{-2i-1}\right) \ge 0.8157, \]

\[ U_{-1} = \frac{2^{2i}}{3}\ln\left(1+2^{-2i}\right) + 2^{2i}\ln\left(1-2^{-2i-1}\right) \ge -0.1846, \]

\[ U_{-2} = \frac{2^{2i}}{3}\ln\left(1+2^{-2i}\right) + 2^{2i}\ln\left(1-2^{-2i}\right) \ge -0.7093. \]

Similarly,

\[ L_{-2} \le -4/3, \quad L_{-1} \le -5/6, \quad L_0 \le -1/3, \quad L_1 \le 1/6, \quad L_2 \le 2/3. \]

It can be shown that the overlap intervals $[L_2, U_1]$, $[L_1, U_0]$, $[L_0, U_{-1}]$ exist for all i, while $[L_{-1}, U_{-2}]$ exists for i ≥ 2. From the overlap regions, a simple selection rule for $s_i$ can be obtained, as shown in Section 2. The term $\ln(1+s_i 2^{-2i-1})$ does not exist when i = 0 and $s_0 = -2$. To solve this problem and ensure convergence, we introduce the following initial step:

\[ 1 + s_0 2^{-1} = \begin{cases} 1 & \text{if } \delta_0 = 0, \\ 1 - 2^{-2} & \text{if } \delta_0 \neq 0. \end{cases} \]

APPENDIX B

Derivation of selection rule for rotation direction of the radix-4 CORDIC in rotation mode

As in the on-line scale factor decomposition algorithm, the decision rule for the rotation direction must keep $R_{i+1}$ bounded whenever $R_i$ is bounded in the interval $[L_k, U_k]$. Therefore, $L_k$ and $U_k$ can be found from

\[ U_2 = 4\left(U_k - 2^{2i}\tan^{-1}\left(k 2^{-2i-1}\right)\right), \qquad L_{-2} = 4\left(L_k - 2^{2i}\tan^{-1}\left(k 2^{-2i-1}\right)\right). \]

The smallest (largest) values of $U_k$ ($L_k$) are found by letting i = 0. Specifically,

\[ U_2 = -L_{-2} \ge \frac{\pi}{3}, \qquad U_1 = -L_{-1} \ge 0.7254, \qquad U_0 = -L_0 \ge \frac{\pi}{12}, \]

\[ U_{-1} = -L_1 \ge -0.2019, \qquad U_{-2} = -L_2 \ge -0.5235. \]

From the overlap intervals $[L_2, U_1]$, $[L_1, U_0]$, $[L_0, U_{-1}]$ and $[L_{-1}, U_{-2}]$, a simple decision rule for $\delta_i$ can be deduced, as shown in Section 2.

Table 1. Comparisons of the rotation-mode redundant CORDIC algorithms.

                             New         Lee & Lang's   Rodrigues'    Branch       Double rotation
                             algorithm   CORDIC [5]     CORDIC [10]   CORDIC [6]   CORDIC [4]
Area                         ~A          ~A             ~A            ~2A          ~A
No. of shift-and-add steps   0.8n        0.95n          0.95n         1.25n        2.25n
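
The rotation-mode iteration of Section 2 and the on-line scale factor decomposition can be tried out numerically with the following minimal sketch. It is an illustration under stated assumptions, not the authors' hardware design: it uses double-precision floating point instead of signed-digit arithmetic, it models the short-wordlength estimates of R_i and W_i by rounding to the nearest multiple of 2^-3 and 2^-5, it runs i = 0, ..., n/2 to match the limits of the scale factor product, it applies the Appendix A initial step for i = 0, and it applies the compensation as one multiplication by the accumulated product instead of n/2 separate shift-and-add steps. The function names `quantize`, `select_delta`, `select_s` and `cordic_radix4_rotate` are hypothetical.

```python
import math


def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits.

    Assumption: a stand-in for the truncated signed-digit estimate in the paper.
    """
    scale = 1 << frac_bits
    return math.floor(value * scale + 0.5) / scale


def select_delta(r):
    """Section 2 rule for delta_i, from a 3-fractional-bit estimate of R_i."""
    r_hat = quantize(r, 3)
    if r_hat > 5 / 8:
        return 2
    if r_hat >= 1 / 4:
        return 1
    if r_hat > -1 / 4:
        return 0
    if r_hat >= -5 / 8:
        return -1
    return -2


def select_s(w):
    """Section 2 rule for s_i, from a 5-fractional-bit estimate of W_i."""
    w_hat = quantize(w, 5)
    if w_hat > 3 / 4:
        return 2
    if w_hat >= 1 / 4:
        return 1
    if w_hat > -1 / 4:
        return 0
    if w_hat >= -13 / 16:
        return -1
    return -2


def cordic_radix4_rotate(x, y, z, n=32):
    """Rotate (x, y) by the angle z with |z| <= pi/3, using the Section 2 signs.

    Returns (x_out, y_out, k_inv), where k_inv is the accumulated product A
    that approximates the scale factor K^-1.
    """
    r = z          # R_0 = Z_0
    a = 1.0        # A_0 = 1
    w_carry = 0.0  # 4 * [W_{i-1} - 2^{2(i-1)} ln(1 + s_{i-1} 2^{-2(i-1)-1})]
    for i in range(n // 2 + 1):
        t = 2.0 ** (-2 * i - 1)
        d = select_delta(r)
        # Radix-4 micro-rotation and scaled residual angle R_{i+1}.
        x, y = x + d * t * y, y - d * t * x
        r = 4.0 * (r - 4.0 ** i * math.atan(d * t))
        # W_i includes the contribution of delta_i.
        w = w_carry - 2.0 ** (2 * i - 1) * math.log1p(d * d * t * t)
        if i == 0:
            # Initial step from Appendix A (avoids ln(0) for s_0 = -2).
            factor = 1.0 if d == 0 else 1.0 - 2.0 ** -2
        else:
            factor = 1.0 + select_s(w) * t
        a *= factor
        w_carry = 4.0 * (w - 4.0 ** i * math.log(factor))
    # Compensation, applied here as a single multiplication by A.
    return x * a, y * a, a
```

With the sign convention of Section 2, a positive Z_0 turns the vector clockwise, so `cordic_radix4_rotate(1.0, 0.0, 0.5)` is expected to return values close to (cos 0.5, -sin 0.5) together with the scale factor estimate.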