DSP Architectures for Future Wireless Base-Stations
Sridhar Rajagopal and Joseph Cavallaro
ECE Department
Rice University
April 10, 2000
This work is supported by Texas Instruments, Nokia, the Texas Advanced Technology Program, and NSF
Overview
Future Base-Stations
Current DSP Implementation
Our Approach
– Make Algorithms Computationally effective
– Task Partitioning for pipelining, parallelism
DSP/ARM Extensions for Performance Acceleration
4/10/00 TI Meeting 2
Evolution of Wireless Comm
First Generation
Voice
Second/Current Generation
Voice + Low-rate Data (9.6 Kbps)
Third Generation +
Voice + High-rate Data (2 Mbps) + Multimedia
W-CDMA
Communication System Uplink
[Figure: uplink from User 1 and User 2 to the Base Station over a Direct Path and Reflected Paths, with Noise + MAI added at the receiver]
Main Processing Blocks
Baseband Layer of Base-Station Receiver:
– Channel Estimation
– Detection
– Decoding
No Multiuser Detection
Proposed Base-Station
TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)
Real-Time Requirements
Multiple Data Rates by Varying Spreading Factors
Detection needs to be done in real-time
– 1953 cycles available in a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps
Spreading Factor   Number of Bits / Frame   Data Rate Requirement
4                  10240                    1024 Kbps
32                 1280                     128 Kbps
256                160                      16 Kbps
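The cycle budget and the table rows can be checked with a few lines of arithmetic. This is a minimal sketch: the 10 ms frame length and the 40960 chips per frame are not stated on the slide; they are assumptions inferred from the table (10240 bits per frame at 1024 Kbps, and spreading factor times bits per frame being constant). The function names are illustrative only.

```python
# Assumptions inferred from the table, not stated on the slide:
FRAMES_PER_SECOND = 100      # 10 ms frames (10240 bits/frame <-> 1024 Kbps)
CHIPS_PER_FRAME = 40960      # 4 * 10240 = 32 * 1280 = 256 * 160
CLOCK_HZ = 250_000_000       # C6x DSP at 250 MHz (from the slide)


def bits_per_frame(spreading_factor):
    """Bits carried by one frame at the given spreading factor."""
    return CHIPS_PER_FRAME // spreading_factor


def data_rate_bps(spreading_factor):
    """Data rate requirement in bits per second."""
    return bits_per_frame(spreading_factor) * FRAMES_PER_SECOND


def cycles_per_bit(spreading_factor, clock_hz=CLOCK_HZ):
    """Clock cycles available to detect one bit in real time."""
    return clock_hz / data_rate_bps(spreading_factor)


for sf in (4, 32, 256):
    print(sf, bits_per_frame(sf), data_rate_bps(sf) // 1000, "Kbps",
          int(cycles_per_bit(sf)), "cycles/bit")
```

At a spreading factor of 32 this gives 250,000,000 / 128,000 = 1953.125, which matches the 1953 cycles quoted above.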
Current DSP Implementation
[Figure: Data Rate Comparisons for Matched Filter and Multiuser Detector. Y axis: Data Rates Achieved (x 10^4); X axis: Number of Users (9 to 15). Curves: Matched Filter (C67), Multiuser Detector (C67), Matched Filter (C64)*, Multiuser Detector (C64)*, Targeted Data Rate. Annotations: Targeted Data Rate = 128 Kbps; C67 at 166 MHz; * Projected (8x).]
Complexity
Algorithm Choice Limited by Complexity
– Multistage reduces data rate by half.
Main Features
– Matrix based operations
– High levels of parallelism
– Bit level computations
32x32 problem size for the Detector shown
Estimation, Decoding assumed pipelined.
Reasons
Sophisticated, Compute-Intensive Algorithms
Need more MIPS/FLOPS performance
Unable to fully exploit pipelining or parallelism
Bit-level computations / storage
Our Approach
Make algorithms computationally effective
– without sacrificing error rate performance
Task Partitioning on Multiple Processing Elements
– DSPs : Core
– FPGAs : Application Specific / Bit-level Computations
VLSI Implementation to find extensions for DSPs.
Algorithms
Channel Estimation
– Avoid inversion by iterative scheme
Detection
– Avoid block-based detection by pipelining
Computations Involved
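[Figure: timing diagram (delay vs. time) showing the received vector r_i overlapping the bits b_i and b_{i+1}]
Model
$r_i = A_i b_i + \eta_i$
$b_i \in \mathbb{R}^{2K}$: 2K bits of K async. users aligned at times i and i-1
$r_i \in \mathbb{C}^{N}$: received bits of spreading length N for K users
$R_{br} = \sum_{i=1}^{L} b_i r_i^H$
$R_{bb} = \sum_{i=1}^{L} b_i b_i^T$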
Multishot Detection
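Solve for the channel estimate, $A_i \in \mathbb{C}^{N \times 2K}$:
$R_{bb} A_i^H = R_{br}$, with $A_i = [A_0 \;\; A_1]$
Multishot model over a window of D bits, $\mathcal{A} \in \mathbb{C}^{DN \times DK}$:
$r = \mathcal{A} b$
$\mathcal{A} = \begin{bmatrix} A_0 & A_1 & 0 & \cdots & 0 \\ 0 & A_0 & A_1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & A_0 \end{bmatrix}$
$b = [\, b_{1,1}, \ldots, b_{K,1}, \; \ldots, \; b_{1,D}, \ldots, b_{K,D} \,]^T$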
Differencing Multistage Detection
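Stage 0 - Matched Filter
$y^{(0)} = \mathrm{Re}[A^H r]$, $S = \mathrm{diag}(A^H A)$
$d^{(0)} = \mathrm{sign}(y^{(0)})$
Stage 1
$y^{(1)} = y^{(0)} - \mathrm{Re}[A^H A - S] \, d^{(0)}$
$d^{(1)} = \mathrm{sign}(y^{(1)})$
Successive Stages
$x^{(l)} = d^{(l)} - d^{(l-1)}$
$y^{(l+1)} = y^{(l)} - \mathrm{Re}[A^H A - S] \, x^{(l)}$
$d^{(l+1)} = \mathrm{sign}(y^{(l+1)})$
y - soft decision
d - detected bits (hard decision)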
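The stage recursion can be sketched in a few lines: a matched filter produces soft decisions y, hard decisions d are their signs, stage 1 subtracts Re[A^H A - S] d from y, and each later stage subtracts Re[A^H A - S] applied only to the difference x between the last two hard-decision vectors. This is a minimal pure-Python sketch, not the authors' implementation; the function name, the `stages` parameter, the early exit when x is all zeros, and the two-user toy data are assumptions made for illustration.

```python
def sign(values):
    """Hard decision: +1 for non-negative soft values, -1 otherwise."""
    return [1 if v >= 0 else -1 for v in values]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def differencing_multistage(A, r, stages=3):
    """A: N x K matrix as a list of rows; r: received vector of length N.

    Returns (d, y): hard and soft decisions after `stages` cancellation
    stages (stages=0 gives the plain matched-filter decision).
    """
    n_rows, n_cols = len(A), len(A[0])
    # Conjugate transpose A^H (K x N).
    AH = [[A[i][j].conjugate() for i in range(n_rows)] for j in range(n_cols)]
    # Re[A^H A - S]: real part of the Gram matrix with its diagonal removed.
    G = [[0.0 if a == b else
          sum(AH[a][i] * A[i][b] for i in range(n_rows)).real
          for b in range(n_cols)] for a in range(n_cols)]
    # Stage 0: matched filter.
    y = [v.real for v in matvec(AH, r)]
    d = sign(y)
    # Stage 1 cancels with d itself; later stages use only the change in d.
    x = d
    for _ in range(stages):
        if not any(x):          # no bit changed, so further stages are no-ops
            break
        y = [yi - gi for yi, gi in zip(y, matvec(G, x))]
        d_new = sign(y)
        x = [new - old for new, old in zip(d_new, d)]
        d = d_new
    return d, y


# Hypothetical near-far example: user 2 is three times stronger than user 1.
A = [[1, 3], [1, -3], [1, 3], [1, 3]]
r = [-2, 4, -2, -2]             # noise-free r = A b for b = [1, -1]
print(differencing_multistage(A, r, stages=0)[0])   # matched filter only
print(differencing_multistage(A, r, stages=3)[0])   # after cancellation
```

In this toy case the matched filter alone gets the weak user's bit wrong and the first cancellation stage corrects it. Because x has nonzero entries only where a bit flipped, later stages touch few columns of the matrix, which is where the differencing form saves work.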
Iterative Scheme
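$R_{bb} = \sum_i b_i b_i^T$, updated as $R_{bb} \leftarrow R_{bb} + b_L b_L^T - b_0 b_0^T$
$R_{br} = \sum_i b_i r_i^H$, updated as $R_{br} \leftarrow R_{br} + b_L r_L^H - b_0 r_0^H$
$R_{bb} A_i^H = R_{br}$ is solved iteratively: $A^H \leftarrow A^H - \mu \, (R_{bb} A^H - R_{br})$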
Tracking
Method of Steepest Descent
Stable convergence behavior
Same Performance
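The tracking idea can be sketched as follows: keep the correlation sums over a sliding window by adding the newest (b, r) pair and dropping the oldest, then take one steepest-descent step toward the solution of R_bb A^H = R_br instead of inverting R_bb. This is a minimal pure-Python sketch under stated assumptions: the step size `mu`, the function names, the window length, and the noise-free toy channel `A_true` are all hypothetical and chosen only so the example converges quickly.

```python
def correlations(bits, received):
    """R_bb = sum b b^T and R_br = sum b r^H over an initial window."""
    m, n = len(bits[0]), len(received[0])
    Rbb = [[sum(b[p] * b[q] for b in bits) for q in range(m)]
           for p in range(m)]
    Rbr = [[sum(b[p] * r[q].conjugate() for b, r in zip(bits, received))
            for q in range(n)] for p in range(m)]
    return Rbb, Rbr


def slide_window(Rbb, Rbr, b_new, r_new, b_old, r_old):
    """Tracking update in place: add the newest pair, drop the oldest."""
    for p in range(len(Rbb)):
        for q in range(len(Rbb)):
            Rbb[p][q] += b_new[p] * b_new[q] - b_old[p] * b_old[q]
        for q in range(len(Rbr[0])):
            Rbr[p][q] += (b_new[p] * r_new[q].conjugate()
                          - b_old[p] * r_old[q].conjugate())


def descent_step(AH, Rbb, Rbr, mu):
    """One steepest-descent step: AH <- AH - mu * (Rbb AH - Rbr)."""
    m, n = len(Rbr), len(Rbr[0])
    return [[AH[p][q] - mu * (sum(Rbb[p][k] * AH[k][q] for k in range(m))
                              - Rbr[p][q])
             for q in range(n)] for p in range(m)]


# Hypothetical noise-free toy channel: N = 3 chips, 2K = 2 bit positions.
A_true = [[1 + 1j, 0.5], [0.5j, -1], [2, 1 - 1j]]


def bits_at(t):
    return [1, 1] if t % 2 == 0 else [1, -1]


def received_at(t):
    b = bits_at(t)
    return [sum(a * x for a, x in zip(row, b)) for row in A_true]


L = 4                                   # window length (assumption)
Rbb, Rbr = correlations([bits_at(t) for t in range(L)],
                        [received_at(t) for t in range(L)])
AH_est = [[0j] * 3 for _ in range(2)]   # start from an all-zero estimate
for t in range(L, L + 40):
    slide_window(Rbb, Rbr, bits_at(t), received_at(t),
                 bits_at(t - L), received_at(t - L))
    AH_est = descent_step(AH_est, Rbb, Rbr, mu=0.5 / L)
```

Each step costs matrix products only, with no inversion. For the step to converge, `mu` has to be small relative to the largest eigenvalue of R_bb; in the toy data R_bb is 4 times the identity, so `mu = 0.5 / L` halves the error at every step.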
Simulations - AWGN Channel
[Figure: Comparison of Bit Error Rates (BER). Y axis: BER (log scale, 10^-3 to 10^-1); X axis: Signal to Noise Ratio (SNR), 4 to 12. Curves: MF, ActMF, ML, ActML.]
Simulation parameters: Detection Window = 12, SINR = 0, Paths = 3, Preamble L = 150, Spreading N = 31, Users K = 15, 10000 bits/user
Legend: MF – Matched Filter, ML – Maximum Likelihood, ACT – using inversion
Complexity annotations: O(K^2 N) and O(K^3 + K^2 N)
Block Based Detector
Window 1 (bits 1-12): Matched Filter, Stage 1, Stage 2, Stage 3; output: Bits 2-11
Window 2 (bits 11-22): Matched Filter, Stage 1, Stage 2, Stage 3; output: Bits 12-21
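The window pattern on this slide (a 12-bit window that yields its 10 interior bits, with the next window starting 10 bits later) can be generated as below. This is a small sketch; the function name and the `edge` parameter are illustrative, and the reading that exactly one bit is discarded at each end of a window is taken from the bit ranges shown on the slide.

```python
def detection_windows(num_bits, window=12, edge=1):
    """Yield (first, last, out_first, out_last), 1-indexed and inclusive.

    Each window covers `window` bits; the `edge` bits at each end are used
    only as context, so consecutive windows overlap by 2 * edge bits.
    """
    step = window - 2 * edge
    start = 1
    while start + window - 1 <= num_bits:
        end = start + window - 1
        yield (start, end, start + edge, end - edge)
        start += step


for first, last, out_first, out_last in detection_windows(22):
    print(f"window {first}-{last} -> bits {out_first}-{out_last}")
```

Every window repeats the matched filter and all stages for its two edge bits, and no output is available until a full window has been received, which is the cost the pipelined detector is meant to avoid.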
Pipelined Detector
[Figure: the bit stream 1 2 3 ... 12 passes through four pipelined rows: Matched Filter, Stage 1, Stage 2, Stage 3]
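A pipelined schedule of this kind can be illustrated with a short sketch in which every stage works on a different bit during the same time step. The one-bit lag between consecutive stages is an assumption made for illustration (the actual offset in the original figure is not recoverable), and the function name and `lag` parameter are hypothetical.

```python
STAGES = ["Matched Filter", "Stage 1", "Stage 2", "Stage 3"]


def pipeline_schedule(num_bits, stages=STAGES, lag=1):
    """Map each time step to the (stage, bit) pairs active in that step.

    Stage number s processes bit t - s * lag at time step t, so a bit
    enters the next stage `lag` steps after leaving the previous one.
    """
    schedule = {}
    last_step = num_bits + (len(stages) - 1) * lag
    for t in range(1, last_step + 1):
        active = []
        for s, name in enumerate(stages):
            bit = t - s * lag
            if 1 <= bit <= num_bits:
                active.append((name, bit))
        schedule[t] = active
    return schedule


for step, active in pipeline_schedule(12).items():
    print(step, active)
```

Once the pipeline is full, all four stages are busy in every step and one detected bit leaves Stage 3 per step, rather than one block of bits per window.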
Task Decomposition [Asilomar99]
Block I: Correlation Matrices (Per Bit)
– Rbb: O(K^2)
– Rbr[R]: O(KN)
– Rbr[I]: O(KN)
Block II: Inverse
– Rbb A^H = Rbr[R]: O(K^2 N)
– Rbb A^H = Rbr[I]: O(K^2 N)
Block III: Matrix Products
– A0^H A1: O(K^2 N)
– A0^H A0: O(K^2 N)
– A1^H A1: O(K^2 N)
Block IV: Multistage Detection (Per Window)
– O(D K^2 Me)
Matched Filter
– A^H r: O(KND)
[Figure labels: Channel Estimation, Matched Filter, Multistage Detector; MUX elements route Pilot, Data, Data', b and d between the blocks]
Achieved Data Rates
[Figure: Data Rates for Different Levels of Pipelining and Parallelism. Y axis: Data Rates (x 10^5); X axis: Number of Users (9 to 15). Curves: Sequential A + B; AB; (Parallel A) B; (Parallel A) (Pipe B); (Parallel A) (Parallel+Pipe B). Reference line: Data Rate Requirement = 128 Kbps.]
VLSI Implementation
Channel Estimation as a Case Study
Area-Time Efficient Architecture
Real-Time Implementation
Bit-Level Computations - FPGAs
Core Operations - DSPs
DSP Extensions for Performance
Bit-level storage / processing support
– Registers / Memory / ALU
Efficient Matrix-Based operations
– Matrix-Vector Multiply
Support for Complex-valued data
Efficient memory accesses
Pre-fetching Data - C64
Use of ARM Core
Work on Higher Base Station Layers
Processor   OSI Layer   Function
ARM         3-7         User Interface, Translation, Synchronization, Transport, Network
ARM         2           Data Link Layer (converts frames to bits)
DSP         1           Physical Layer (hardware; raw bit stream)
Software Suggestions
Limited OS Support
Compiler Efficiency
– No more Assembly!
Performance Analysis Tools
Code Composer Studio 1.2
Conclusions
DSPs to play a major role in Future Base-Station Implementations.
Search for Computationally Efficient Algorithms and Better Processor Designs to meet Real-Time Requirements.