Implementing Multiuser Channel
Estimation and Detection for W-CDMA
Sridhar Rajagopal, Srikrishna Bhashyam,
Joseph R. Cavallaro and Behnaam Aazhang
Rice University
{sridhar,skrishna,cavallar,aaz}@rice.edu
This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF
Organization
Joint Estimation & Detection
An Implementation-Friendly Scheme
Simulations
Architectural Features
– Task Partitioning
– Area-Time Tradeoffs
Conclusions
Future Work
Base-Station with MUD
[Figure: Base-station Receiver block diagram. Labeled blocks: Antenna, Demodulator (multiple users), MUX (pilot / data), Channel Estimation, MUX (d / b), Multiuser Detection, Delay, Decision Feedback, Decoder, Detected Bits.]
Joint Estimation & Detection
Jointly estimate the channel response and detect
the bits of all users.
Shown to have better performance as well as
reduced computational complexity.
Maximum Likelihood Based Channel Estimation
– [C.Sengupta et al. : PIMRC’1998 WCNC’1999]
Differencing Multistage Detection based on Parallel
Interference Cancellation
– [G.Xu et al. : SPIE’1999]
Computations Involved
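Model
[Figure: timing diagram along the time axis; because of the users' delays, the received window $r_i$ overlaps bits $b_i$ and $b_{i-1}$.]
$r_i = A_i b_i + \eta_i$
$b_i \in \mathbb{R}^{2K}$: bits of K async. users aligned at times i and i-1
$r_i \in \mathbb{C}^{N}$: received bits of spreading length N for K users
Compute Correlation Matrices
$R_{br} = \sum_i b_i r_i^H$
$R_{bb} = \sum_i b_i b_i^T$
Solve for the channel estimate, $A_i \in \mathbb{C}^{N \times 2K}$
$R_{bb} A_i^H = R_{br}, \quad A_i = [A_0 \;\; A_1]$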
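A minimal sketch of the correlation step, assuming a noiseless model r_i = A b_i so that the relation R_bb A^H = R_br can be checked exactly. The toy sizes (K = 2 users, N = 3 chips, 20 bits), the random channel and the helper names are illustrative assumptions, not values from the slides.

```python
import random

def mat_vec(A, b):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * x for a, x in zip(row, b)) for row in A]

def correlations(bits, received):
    """Accumulate R_bb = sum b b^T (2K x 2K) and R_br = sum b r^H (2K x N)."""
    two_k, n = len(bits[0]), len(received[0])
    R_bb = [[0] * two_k for _ in range(two_k)]
    R_br = [[0j] * n for _ in range(two_k)]
    for b, r in zip(bits, received):
        for j in range(two_k):
            for k in range(two_k):
                R_bb[j][k] += b[j] * b[k]
            for m in range(n):
                R_br[j][m] += b[j] * r[m].conjugate()
    return R_bb, R_br

# Hypothetical toy problem: 2K = 4 bit positions, N = 3 chips, 20 bit vectors.
rng = random.Random(0)
A = [[complex(rng.randint(-3, 3), rng.randint(-3, 3)) for _ in range(4)]
     for _ in range(3)]
bits = [[rng.choice((-1, 1)) for _ in range(4)] for _ in range(20)]
received = [mat_vec(A, b) for b in bits]          # noiseless r_i = A b_i
R_bb, R_br = correlations(bits, received)

# Residual of R_bb A^H - R_br; it vanishes here because there is no noise.
A_H = [[A[m][j].conjugate() for m in range(3)] for j in range(4)]
residual = max(
    abs(sum(R_bb[j][k] * A_H[k][m] for k in range(4)) - R_br[j][m])
    for j in range(4) for m in range(3)
)
```

With noise present the residual is no longer zero, and solving R_bb A^H = R_br for A gives the estimate rather than the exact channel.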
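Multishot Detection
$r = A b + \eta, \quad r \in \mathbb{C}^{DN}, \quad A \in \mathbb{C}^{DN \times DK}$
$b = [\, b_{1,1}, \dots, b_{1,K}, \; \dots, \; b_{D,1}, \dots, b_{D,K} \,]^T$
$A = \begin{bmatrix} A_0 & 0 & \cdots & 0 \\ A_1 & A_0 & \ddots & \vdots \\ 0 & A_1 & \ddots & 0 \\ \vdots & \ddots & \ddots & A_0 \end{bmatrix}$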
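A small sketch of the stacked (multishot) model, assuming that A_0 multiplies the bits at time i and A_1 the bits at time i-1, so that A_0 sits on the block diagonal and A_1 on the block sub-diagonal. That ordering, the toy matrices and the function names are assumptions made for illustration.

```python
def stack_channel(A0, A1, D):
    """Build the DN x DK block bi-diagonal matrix from two N x K blocks."""
    N, K = len(A0), len(A0[0])
    A = [[0j] * (D * K) for _ in range(D * N)]
    for d in range(D):
        for n in range(N):
            for k in range(K):
                A[d * N + n][d * K + k] = A0[n][k]            # diagonal block
                if d + 1 < D:
                    A[(d + 1) * N + n][d * K + k] = A1[n][k]  # sub-diagonal block
    return A

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical toy sizes: N = 2 chips, K = 2 users, D = 3 bits per window.
A0 = [[1 + 1j, 2], [0, 1j]]
A1 = [[1j, 0], [3, 1 - 1j]]
bits = [[1, -1], [-1, -1], [1, 1]]

A = stack_channel(A0, A1, 3)
r_stacked = mat_vec(A, [b for row in bits for b in row])

# Same result bit by bit: r_i = A0 b_i + A1 b_{i-1}, with no bits before the window.
r_per_bit, previous = [], [0, 0]
for b in bits:
    r_per_bit += [x + y for x, y in zip(mat_vec(A0, b), mat_vec(A1, previous))]
    previous = b
```

The two vectors agree, which is all the stacked form expresses: one window of D bits written as a single matrix-vector product.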
Differencing Multistage Detection
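Stage 0 [Matched Filter Detector]
$y^{(0)} = \mathrm{Re}[A^H r]$
$d^{(0)} = \mathrm{sign}(y^{(0)})$
$S = \mathrm{diag}(A^H A)$
Stage 1 [to build differencing vector]
$y^{(1)} = y^{(0)} - \mathrm{Re}[A^H A - S]\, d^{(0)}$
$d^{(1)} = \mathrm{sign}(y^{(1)})$
y - soft decision
d - detected bits (hard decision)
Successive Stages
$x^{(l)} = d^{(l)} - d^{(l-1)}$
$y^{(l+1)} = y^{(l)} - \mathrm{Re}[A^H A - S]\, x^{(l)}$
$d^{(l+1)} = \mathrm{sign}(y^{(l+1)})$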
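The stages can be sketched as follows, starting from Re[A^H A] and the matched-filter output y0 = Re[A^H r]. The 3-user correlation matrix and the noiseless y0 below are hypothetical values chosen so that the matched filter makes one error, which the first cancellation stage corrects.

```python
def sign(values):
    return [1 if v >= 0 else -1 for v in values]

def differencing_detect(G_re, y0, stages):
    """Differencing multistage detection.

    G_re is Re[A^H A] as a list of rows, y0 is Re[A^H r], and `stages`
    counts the cancellation stages run after the matched filter.
    """
    n = len(y0)
    # Off-diagonal part Re[A^H A - S], where S = diag(A^H A).
    off = [[0 if i == j else G_re[i][j] for j in range(n)] for i in range(n)]

    def interference(v):
        return [sum(o * x for o, x in zip(row, v)) for row in off]

    d_prev = sign(y0)                                        # stage 0
    y = [a - c for a, c in zip(y0, interference(d_prev))]    # stage 1
    d = sign(y)
    for _ in range(stages - 1):
        x = [a - c for a, c in zip(d, d_prev)]   # entries are -2, 0 or +2
        y = [a - c for a, c in zip(y, interference(x))]
        d_prev, d = d, sign(y)
    return d, y

# Hypothetical example: strong user 0 interferes with weak user 1.
G_re = [[4.0, 1.5, 0.0], [1.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
true_bits = [1, -1, 1]
y0 = [sum(g * b for g, b in zip(row, true_bits)) for row in G_re]

matched_filter_bits = sign(y0)                   # [1, 1, 1]: user 1 is wrong
detected, soft = differencing_detect(G_re, y0, stages=2)
```

Because x is zero wherever a decision did not change between stages, later stages only touch the columns of the few bits that flipped, which is where the computational saving of the differencing form comes from.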
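Structure of $A^H A$
Not difficult to compute $A^H A$
Block Bi-Diagonal Matrix : Use Structure
$A^H A$ has size $DK \times DK$ and is built from three $K \times K$ block products:
$A^H A = \begin{bmatrix} A_0^H A_0 + A_1^H A_1 & A_1^H A_0 & 0 & \cdots \\ A_0^H A_1 & A_0^H A_0 + A_1^H A_1 & A_1^H A_0 & \cdots \\ 0 & A_0^H A_1 & A_0^H A_0 + A_1^H A_1 & \ddots \\ \vdots & \vdots & \ddots & \ddots \end{bmatrix}$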
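A sketch of how the structure can be used: the full DK x DK product is assembled from the K x K products A_0^H A_0, A_1^H A_1 and A_1^H A_0 and compared with a brute-force A^H A. It assumes A has A_0 on the block diagonal and A_1 on the block sub-diagonal, and that the window is truncated after D bits, so the last diagonal block contains only A_0^H A_0. The function names and toy matrices are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def herm(X):
    """Hermitian (conjugate) transpose."""
    return [[X[i][j].conjugate() for i in range(len(X))] for j in range(len(X[0]))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def gram_from_blocks(A0, A1, D):
    """Assemble A^H A from three K x K block products."""
    K = len(A0[0])
    interior = add(matmul(herm(A0), A0), matmul(herm(A1), A1))
    last = matmul(herm(A0), A0)          # truncated window: no A1 below it
    upper = matmul(herm(A1), A0)         # block (d, d+1)
    lower = matmul(herm(A0), A1)         # block (d+1, d)
    G = [[0j] * (D * K) for _ in range(D * K)]
    for d in range(D):
        diag = interior if d + 1 < D else last
        for i in range(K):
            for j in range(K):
                G[d * K + i][d * K + j] = diag[i][j]
                if d + 1 < D:
                    G[d * K + i][(d + 1) * K + j] = upper[i][j]
                    G[(d + 1) * K + i][d * K + j] = lower[i][j]
    return G

def stack_channel(A0, A1, D):
    """Brute-force reference: the full DN x DK block bi-diagonal matrix."""
    N, K = len(A0), len(A0[0])
    A = [[0j] * (D * K) for _ in range(D * N)]
    for d in range(D):
        for n in range(N):
            for k in range(K):
                A[d * N + n][d * K + k] = A0[n][k]
                if d + 1 < D:
                    A[(d + 1) * N + n][d * K + k] = A1[n][k]
    return A

A0 = [[1 + 1j, 2], [0, 1j]]
A1 = [[1j, 0], [3, 1 - 1j]]
G_fast = gram_from_blocks(A0, A1, 3)
A = stack_channel(A0, A1, 3)
G_ref = matmul(herm(A), A)
max_diff = max(abs(G_fast[i][j] - G_ref[i][j]) for i in range(6) for j in range(6))
```

The block version needs three K x K products regardless of the window length D, whereas the brute-force product grows with D.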
Drawbacks
Matrix Inversion/ Decomposition Needed
Result not available till end of computation
– Delay before Detection
Difficult for Tracking
Higher Precision Needed
– Floating Point Units
Larger Memory Requirements
– Storage of elements to compute inverse
– Float = 32 bits / Input accuracy = 12-14 bits
SLOW! - Difficult to meet Real-Time
– [S.Rajagopal et al. : TI DSPFest’1999]
Proposed Base-Station
No Multiuser Detection
TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)
New Scheme
Iterative Method to find the Channel Estimates
– [S.Bhashyam et al. : WCNC’2000 (submitted)]
Can be easily adapted to Tracking for Fading Channels
Fixed Point Implementation
Estimates ready for detection Immediately
Simpler Hardware and Software.
– Computation Savings only Per Bit
Iterative Scheme
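$R_{bb} = \sum_i b_i b_i^T \;\;\Rightarrow\;\; R_{bb} \leftarrow R_{bb} + b_L b_L^T - b_0 b_0^T$
$R_{br} = \sum_i b_i r_i^H \;\;\Rightarrow\;\; R_{br} \leftarrow R_{br} + b_L r_L^H - b_0 r_0^H$
$R_{bb} A_i^H = R_{br} \;\;\Rightarrow\;\; A \leftarrow A - \mu\,(A R_{bb} - R_{br}^H)$
($b_L$, $r_L$: newest bit entering the window of length L; $b_0$, $r_0$: oldest bit leaving it)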
Tracking
– Slow Fading : Large Window L
– Fast Fading : Smaller Window L
Method of Steepest Descent
Stable convergence behavior
μ fixed : Bit-by-Bit update
Matches Closely to the Scheme with Inversions
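A minimal sketch of the bit-by-bit scheme: each new bit slides the window (add the newest outer product, subtract the oldest) and is followed by one steepest-descent step A <- A - mu (A R_bb - R_br^H) with a fixed mu. The noiseless toy channel, the periodic bit pattern, the window length L = 4 and mu = 1/8 are assumptions chosen so that the run is deterministic.

```python
def slide(R_bb, R_br, b_new, r_new, b_old, r_old):
    """In place: R_bb += b_L b_L^T - b_0 b_0^T and R_br += b_L r_L^H - b_0 r_0^H."""
    for j in range(len(b_new)):
        for k in range(len(b_new)):
            R_bb[j][k] += b_new[j] * b_new[k] - b_old[j] * b_old[k]
        for n in range(len(r_new)):
            R_br[j][n] += (b_new[j] * r_new[n].conjugate()
                           - b_old[j] * r_old[n].conjugate())

def descend(A, R_bb, R_br, mu):
    """One step A <- A - mu (A R_bb - R_br^H); A is N x 2K, R_br is 2K x N."""
    rows, cols = len(A), len(A[0])
    return [[A[n][k] - mu * (sum(A[n][j] * R_bb[j][k] for j in range(cols))
                             - R_br[k][n].conjugate())
             for k in range(cols)] for n in range(rows)]

# Hypothetical toy channel: N = 2 chips, 2K = 2 bit positions, no noise.
A_true = [[1 + 2j, -1j], [0.5, 2 - 1j]]
pattern = [(1, 1), (1, 1), (1, -1), (-1, -1)]
L = 4
stream_b = [pattern[t % 4] for t in range(L + 120)]
stream_r = [[sum(a * x for a, x in zip(row, b)) for row in A_true]
            for b in stream_b]

R_bb = [[0, 0], [0, 0]]
R_br = [[0j, 0j], [0j, 0j]]
for t in range(L):                       # fill the first window
    slide(R_bb, R_br, stream_b[t], stream_r[t], (0, 0), [0j, 0j])

A = [[0j, 0j], [0j, 0j]]
mu = 1 / 8
for t in range(L, len(stream_b)):        # one window slide and one step per bit
    slide(R_bb, R_br, stream_b[t], stream_r[t], stream_b[t - L], stream_r[t - L])
    A = descend(A, R_bb, R_br, mu)

error = max(abs(A[n][k] - A_true[n][k]) for n in range(2) for k in range(2))
```

The step size must stay below 2 divided by the largest eigenvalue of R_bb for the iteration to converge; here R_bb = [[4, 2], [2, 4]] has eigenvalues 2 and 6, so mu = 1/8 is safely inside that range.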
Simulations - AWGN Channel
[Figure: Comparison of BER using Channel Estimates by inversion and by iteration. x-axis: SNR (4 to 12); y-axis: BER (log scale, 10^-3 to 10^-1). Curves: MF, ActMF, ML, ActML.]
MF – Matched Filter
ML – Maximum Likelihood
ACT – using inversion
Detection Window = 12
SINR = 0
Paths = 3
Preamble = 150
10000 bits/user
Fading Channel with Tracking
Doppler = 10 Hz, 1000 Bits,15 users, 3 Paths
[Figure: BER versus SNR for the fading channel. x-axis: SNR (4 to 12); y-axis: BER (log scale, 10^-3 to 10^0). Curves: MF - Static, MF - Tracking, ML - Static, ML - Tracking.]
DSP Implementation
C6201 Texas Instruments
– Fixed Point Processor
– 200 MHz
32-bit VLIW Architecture
8 Functional Units
– 2 Multipliers
– 4 Adders
– 2 Load/Store
TI C Compiler
Simulation
Work in Progress!
Why better?
– Fixed Point Implementation - Faster on DSPs
– Higher Clock Speeds / Faster Multiplications
– More SIMD Parallelism due to smaller wordlength.
– Software Code Simpler to write
Smaller Program Size
Problems
– Input Bit Precision Analysis
– Overflows
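The two problems listed can be illustrated with a short sketch: quantizing an input sample to a signed 8-bit word with saturation, and the wraparound that occurs when a sum no longer fits the accumulator width. The function names, the 8-bit width and the scaling convention are assumptions for illustration, not details of the C6201 code.

```python
def quantize(x, bits=8, full_scale=1.0):
    """Round x to a signed `bits`-bit integer, saturating at the word limits."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    q = round(x / full_scale * hi)
    return max(lo, min(hi, q))

def wrap(value, bits):
    """Two's-complement wraparound to `bits` bits, i.e. what an overflow produces."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

a = quantize(1.0)                  # largest positive 8-bit sample
b = quantize(2.0)                  # out of range, saturates instead of wrapping
narrow_sum = wrap(a + b, 8)        # sum kept in 8 bits overflows
wide_sum = wrap(a + b, 16)         # a 16-bit accumulator holds it
```

Choosing the input precision therefore goes hand in hand with choosing accumulator widths: the fewer bits per sample, the more samples can be summed before the accumulator overflows.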
Task - Partitioning the Algorithm
[Figure: Base-station Receiver block diagram, as on the "Base-Station with MUD" slide: Antenna, Demodulator (multiple users), MUX, Channel Estimation, Multiuser Detection, Delay, Decision Feedback, Decoder, Detected Bits.]
Task Decomposition
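S. Das et al. : Asilomar’99
[Figure: task decomposition along the TIME axis, from Channel Estimation to Multistage Detection, with the labels Task A and Task B. Inputs: bits d or b through a MUX, and Pilot or Data samples through a MUX.]
Block I: Correlation Matrices (Per Bit)
– $R_{br}$[R]: O(KN)
– $R_{br}$[I]: O(KN)
– $R_{bb}$: O(K^2)
Block II: Iterate
– A[R]: O(K^2 N)
– A[I]: O(K^2 N)
Block III: Matrix Products
– $A_0^H A_1$: O(K^2 N)
– $A_0^H A_0$: O(K^2 N)
– $A_1^H A_1$: O(K^2 N)
Block IV: Multistage Detection (Per Window)
– d: O(DK^2 M)
Data path
– $A^H r$: O(KND)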
Channel Estimation Architecture
Detection Architecture
– One version already ready
– [G.Xu - Master’s Thesis 1999]
Advantages over DSP Implementation:
– Optimal Memory Utilization
– Custom Blocks for exploiting available pipelining and parallelism
– Parts could be mapped to FPGA / Reconfigurable logic
– Shows theoretical bounds for maximum achievable Data Rates
– Shows how tasks could be split among different processors
Block Diagram
Each block shows no. of “operations” in it.
[Figure: channel estimation block diagram. A window buffer holds the 1-bit values b and the 8-bit values r. Auto-correlation path: bb' (2K^2), b0b0' (2K^2), Inverter (2K^2), MUX (2K^2), Rbb (2K^2). REAL and IMAG paths, one copy each: Inverter (2K), MUX (2K), MUX (N), Rbr (KN), Multiplier (2K^2 N), Atmp with shift >> (4K^2), A (KN).]
Channel Estimation
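Each block shows no. of “operations” in it.
$R_{bb} \leftarrow R_{bb} + b_L b_L^T - b_0 b_0^T$
$R_{br} \leftarrow R_{br} + b_L r_L^H - b_0 r_0^H$
$A \leftarrow A - \mu\,(A R_{bb} - R_{br}^H)$
[Figure: REAL path of the channel estimation block diagram, annotated with the three updates. Window buffer with 1-bit b and 8-bit r; bb' (2K^2), b0b0' (2K^2), Inverter (2K^2), MUX (2K^2), Rbb (2K^2); Inverter (2K), MUX (2K), MUX (N), Rbr[R] (KN); Multiplier (2K^2 N), Atmp[R] with shift >> (4K^2), A[R] (KN).]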
Auto-correlation Structure
$R_{bb} \leftarrow R_{bb} + b_L b_L^T - b_0 b_0^T$
• b, b0 are 1-bit
• Subtraction by using inverter
• Rbb using a Counter
• Fully Parallel: 2K^2 elements, O(1) Time
• Pipelined [with LOAD]: 2K elements, O(K) Time
• Serial [with LOAD]: 1 element, O(2K^2) Time
[Figure: auto-correlation datapath. bb' (2K^2), b0b0' (2K^2), Inverter (2K^2), MUX (2K^2), Rbb (2K^2).]
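A sketch of the counter idea, assuming each bit is stored as 0 or 1 (standing for -1 and +1). Under that encoding the product of two bits is +1 exactly when the stored values are equal, so each element of Rbb becomes an up/down counter driven by 1-bit logic. The encoding, the equality test standing in for the gate logic, and the function name are assumptions of this sketch.

```python
def update_rbb_counters(counters, b_new, b_old):
    """In place: R_bb += b_L b_L^T - b_0 b_0^T using only 1-bit comparisons.

    Bits are encoded 0/1 for -1/+1. The entering product counts up or down
    by one; the leaving product is applied with its sign inverted.
    """
    n = len(b_new)
    for j in range(n):
        for k in range(n):
            entering = 1 if b_new[j] == b_new[k] else -1
            leaving = 1 if b_old[j] == b_old[k] else -1
            counters[j][k] += entering - leaving

# Hypothetical 2-element example: the window gains (+1, -1) and loses (+1, +1).
counters = [[0, 0], [0, 0]]
update_rbb_counters(counters, b_new=[1, 0], b_old=[1, 1])
```

The diagonal never changes, since every bit squared is +1, and each off-diagonal counter moves by -2, 0 or +2 per bit.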
Cross-Correlation Structure
$R_{br} \leftarrow R_{br} + b_L r_L^H - b_0 r_0^H$
• r is 8-bit, b is 1-bit
• Rbr using 8-bit Adders
• Based on sign of b
• Fully Parallel: KN, O(1)
• Pipelined: N, O(K)
• Serial: 1, O(KN)
[Figure: cross-correlation datapath. Inverter (2K), MUX (2K), MUX (N), Rbr[R] (KN).]
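A sketch of the multiplier-free update: because b is a single bit, each term of b r^H is either +r or -r, so the update needs only adders and a sign selection. The sketch works on one integer component (real or imaginary) at a time, with bits encoded 0/1 for -1/+1; the conjugation in r^H would simply negate the imaginary component beforehand. The encoding and the function name are assumptions.

```python
def update_rbr(R_br, b_new, r_new, b_old, r_old):
    """In place: R_br += b_L r_L - b_0 r_0 for one component, adders only.

    R_br is a 2K x N table of integer accumulators; r_new and r_old are
    lists of N integer samples; bits are 0/1 for -1/+1.
    """
    for j in range(len(b_new)):
        for n in range(len(r_new)):
            R_br[j][n] += r_new[n] if b_new[j] else -r_new[n]   # entering bit
            R_br[j][n] -= r_old[n] if b_old[j] else -r_old[n]   # leaving bit

# Hypothetical example with 2 bit positions and N = 2 samples.
R_br = [[0, 0], [0, 0]]
update_rbr(R_br, b_new=[1, 0], r_new=[5, -3], b_old=[0, 0], r_old=[2, 2])
```
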
Iterative Update Structure
$A \leftarrow A - \mu\,(A R_{bb} - R_{br}^H)$
• 8-bit Multipliers
• 16-bit Adders for Multiplier
• 8-bit Adders for A
• Parallel: KN, O(K)
• Pipelined: N, O(K^2)
• Serial: 1, O(K^2 N)
[Figure: iterative update datapath, REAL part. Rbb (2K^2), Multiplier (2K^2 N), Rbr[R] (KN), Atmp[R] with shift >> (4K^2), A[R] (KN).]
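A fixed-point sketch of the update for the real part only, assuming the step size is a power of two, mu = 2^-shift, so that the multiplication by mu becomes the right shift drawn as >> in the diagram. Python integers are unbounded, so word widths are not modeled here; the values, the shift of 5 and the function name are illustrative assumptions.

```python
def fixed_point_step(A, R_bb, R_br_h, shift):
    """A <- A - ((A R_bb - R_br^H) >> shift) using integers only.

    A and R_br_h are N x 2K integer tables (R_br_h holds R_br^H);
    R_bb is 2K x 2K. The right shift is arithmetic, so it rounds toward
    minus infinity.
    """
    rows, cols = len(A), len(A[0])
    out = []
    for n in range(rows):
        new_row = []
        for k in range(cols):
            acc = sum(A[n][j] * R_bb[j][k] for j in range(cols))  # wide accumulator
            new_row.append(A[n][k] - ((acc - R_br_h[n][k]) >> shift))
        out.append(new_row)
    return out

# Hypothetical case: R_bb = 16 I, true channel row (40, -40), so R_br^H = (640, -640).
R_bb = [[16, 0], [0, 16]]
R_br_h = [[640, -640]]
A = [[0, 0]]
for _ in range(10):
    A = fixed_point_step(A, R_bb, R_br_h, shift=5)
```

In this example the estimate approaching from below reaches 40 exactly, while the one approaching from above stalls at -39: the floor behavior of the shift leaves a residual of at most one least significant bit.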
Elements in each block
Block              Requires                              Area-Time Tradeoff   Fully Parallel Implementation
bb^T, b0 b0^T      1-bit AND Gates                       2K^2                 2K^2
Rbb                8-bit UP/DOWN Counters [with LOAD]    2K                   2K^2
Rbr[R,I]           8-bit Adders                          2N                   4KN
Y[R,I]             8-bit Adders                          4K                   4KN
Multiplier [R,I]   8-bit Multipliers                     4K                   4KN
                   16-bit Adders                         4K                   4K
Window             Shift Registers: 1-bit                L                    L
Buffer             Shift Registers: 8-bit                2L                   2L
Atmp[R,I]          8-bit Subtractors                     2K                   4KN
TIME                                                     O(K^2)               O(K)
Example : N = 32, L = 100, K = 32
Fully Parallel Solution : about 4,000 Multipliers, about 12,000 Adders : O(K) = O(32) Time
Pipelined Solution : about 100 Multipliers, about 300 Adders : O(K^2), roughly O(1,000) Time
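The example figures can be checked against the per-block expressions in the table. The sketch below sums the multiplier and adder rows for both columns; counting the 8-bit subtractors of Atmp together with the adders is an assumption of this sketch, and the function name is illustrative.

```python
def resources(K, N):
    """Multiplier and adder totals implied by the per-block expressions."""
    pipelined = {
        "multipliers": 4 * K,
        # Rbr adders + Y adders + 16-bit adders + Atmp subtractors
        "adders": 2 * N + 4 * K + 4 * K + 2 * K,
    }
    fully_parallel = {
        "multipliers": 4 * K * N,
        "adders": 4 * K * N + 4 * K * N + 4 * K + 4 * K * N,
    }
    return pipelined, fully_parallel

pipelined, fully_parallel = resources(K=32, N=32)
```

For K = N = 32 the expressions give 4,096 multipliers and 12,416 adders for the fully parallel case, and 128 multipliers and 384 adders for the pipelined case, the same order of magnitude as the rounded figures quoted in the example.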
Conclusions
Iterative Scheme for Joint Estimation & Detection
No loss in algorithm performance
Suitable for Hardware Implementation
– On DSPs, FPGAs and ASICs
Supports Tracking for Fading Channels
Fixed Point Implementation Feasible
ASIC architecture
– To exploit available pipelining and parallelism
Multiuser Channel Estimation and Detection algorithms
POSSIBLE to IMPLEMENT for W-CDMA.
Future Work
MS
Extend Architecture to Long Codes
Task Partition the algorithm on the Sundance Multi-
DSP/FPGA board to achieve real-time
Post-MS
Downlink
Architectures to minimize power consumption / area
Implement coding/decoding blocks and integrate
RENE’
EXTRA SLIDES
Data Rates Achieved
Assuming Channel Estimation Real-Time
[Figure: Data Rates for Different Levels of Pipelining and Parallelism. x-axis: Number of Users (9 to 15); y-axis: Data Rates (0 to 3 x 10^5). Curves: Sequential A + B; AB; (Parallel A) B; (Parallel A) (Pipe B); (Parallel A) (Parallel+Pipe B). Reference line: Data Rate Requirement = 128 Kbps.]
Fading Channel
SNR = 10 dB, Doppler = 10 Hz, 1000 Bits
[Figure: Error rates of users for fading channel. x-axis: User index (0 to 15); y-axis: Error Rate (0 to 0.2). Curves: ML, MF, MLact, MFact.]