thesis v1

Document Sample
thesis v1 Powered By Docstoc
					An Efficient FPGA Implementation of IEEE
802.16e LDPC Encoder




     Speaker: Chau-Yuan-Yu
      Advisor: Mong-Kai Ku
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Better Encoder scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Better Encoding scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
Low-Density Parity-Check Code

   Benefit of LDPC Codes.
     Approaching    Shannon limit

     Low   error floor

     LDPC  code is adopted by various standards
     (e.g. DVB-S2, 802.11n, 802.16e)
Low-Density Parity-Check Code
   Parity check matrix H is sparse
        Very few 1’s in each row and column




   Null space of H is the codeword space




                          Valid Codeword
Low-Density Parity-Check Code
   In (n, k) block codes, k-bit information data can be
    encoded as n-bit codeword.

   In systematic block codes, the information bits directly
    exist in the bits of codeword.




    Systematic Part              Parity Part
Low-Density Parity-Check Code
   General encoding of systematic linear block codes
       Finding generator matrix G via H.
       C = sG = [s | p]


   Issues with LDPC codes
       The size of G is very large.
       G is not generally sparse.
       Encoding complexity will be very high.
Structured LDPC Codes
   Quasi-Cyclic LDPC Codes
       In QC-LDPC, H can be partitioned into square sub-blocks of size z x z.




       Each sub-blocks can be Z x Z zero sub-block or identity matrix with
        permutation.
Structured LDPC Codes
   QC Codes With Dual-Diagonal Structure
       In IEEE standards QC-LDPC Codes have Dual-Diagonal
        parity structure.
       We take 802.16e code rate ½ matrix for example.



                                                      0 represent
                                                      identity matrix.
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Better Encoding scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
General Encoding for LDPC Codes
   Richardson and Urbanke (RU) algorithm
       Partition the H matrix into several sub-matrix.
       In H, the part T is a low triangle matrix.
General Encoding for LDPC Codes
   Richardson and Urbanke (RU) algorithm




    p0                                      O(n+g2)




    p1                                      O(n+g2)
Efficient Encoding for Dual-Diagonal LDPC Codes
 A valid codeword c = [s|p] must satisfy



 Replace by dual-diagonal matrix




                                    Information bits      Parity bits
  Define lambda value as              From equation, we obtained
  Related Work (1) Sequential Encoding


Encoding scheme
                                         One-way derivation
Step 1
Compute lambda value by doing
matrix operation x = HsS

Step 2
Determines parity vector P0 by
adding all the lambda value

Step 3
Rest of parity vector is obtained
by exploiting dual-diagonal
matrix T
    Related Work (2) Arbitrary Bit-generation and Correction Encoding

    In [1], an alternative encoding for standard matrix was presented.

    Matrix will be modify by
     parity portion of weight-3                                              A                               Q                  U
     column set.

    H can be sectorized into
     three sub matrices
       The information bit region A
       The parity bit region Q
        for bit-flipping operation
       The parity bit region U
        for non bit-flipping.
                                                                                                                  Replace with zero
                                                                                                                     cyclic shift
[1] C. Yoon, E. Choi, M. Cheong, and S.-K. Lee, "Arbitrary bit generation and correction technique for encoding QC-LDPC codes
with dual-diagonal parity structure," IEEE Wireless Communications and Networking Conference, (WCNC 2007), pp. 662-666,
March 2007.
  Related Work (2) Arbitrary Bit-generation and Correction Encoding
Encoding scheme
Step 1                                            One-way derivation
Compute lambda value by doing
matrix operation x = As

Step 2
Set P0 as arbitrary binary values.
solve unknown parity bits

Step 3
Computed correction vector f
from P0

Step 4
Add correction vector to parity
bits in region Q to correct them
Related Work (2) Arbitrary Bit-generation and Correction Encoding


   Advantage
       Low-complexity encoding
       The number of addition required is less than RU scheme
   Drawback
       Can not directly applicable to standard code
       Modifying matrix will decrease code performance
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Better Encoding scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
Better encoding scheme

   Advantages of the encoding scheme
    proposed in [2]

          Low-complexity encoding

          Can directly applicable to matrices defined in
           IEEE standards without any modification

          Achieve higher level parallelism




[3] C.-Y. Lin, C.-C. Wei, and M.-K. Ku, "Efficient Encoding for Dual-Diagonal Structured LDPC Code Based on Parity bits Prediction and
Correction," IEEE Asia Pacific Conference on Circuits and Systems (APPCCAS), pp.1648-1651, Dec. 2008.
 Better Encoding Scheme
Step 1
Set P0’ as any binary vector
                                                   Correct prediction vector by f
Step 2
Compute lambda value by
doing matrix operation Hs

Step 3
[Forward Derivation]

Step 4
[Backward Derivation]

Step 5
Compute the P0 by adding
prediction parity vector

Step 6
Compute the correction vector f

Step 7
                                  Compute P0 by adding prediction vector
Correct prediction parity by
adding f                          Compute correction vector f   f = (P0)d
 Better Encoding Scheme
Step 1
Set P0’ as any binary vector.      Reduce encoding delay !!
Step 2                               Two-way derivation
Compute lambda value by
doing matrix operation Hs.

Step 3
[Forward Derivation]

Step 4
[Backward Derivation]

Step 5
Compute the P0 by adding
prediction parity vector.

Step 6
Compute the correction vector f.

Step 7
Correct prediction parity by
adding f.
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Better Encoding scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
LDPC Encoder Architecture
Based on the encoding scheme proposed bedore, we design both
parallel and serial architecture.

   Parallel architecture
      Achieve higher level parallelism
      High-speed


   Serial architecture
Parallel architecture

                                  Barrel shifter#1




                      divider
                                                                   Prediction
      Matrix                                                       Parity
                                                     Accumulator                Correct
                                                                   memory




                                  Barrel shifter#6




      Input data register       lambda position
   Parallel architecture (Stage 1)

                                            Barrel shifter#1




                                divider
                                                                             Prediction
                Matrix                                                       Parity
                                                               Accumulator                 Correct
                                                                             memory




                                            Barrel shifter#6




                Input data register       lambda position




                                                                                  Benefit:
 In this stage, matrix                                                            1.When the input data
select the shift values                                                           is coming, it can work
and multiply specific                                                             immediately without all
  value according to                                                              the input data are
   the code length.                                                               coming.
                                                                                  2.Reduce the numbers
                                                                                  of barrel shifter.
Shifter Value Computation
Equation for computing shift value


Normal code rate           :




Code rate 2 ∕ 3 A code :



Two type of matrix implement result with multiple rate and length

                                        Slice    FFs      LUTs     CLK       Total     gate

                                                                   (MHz)     count


                   One    matrix    +   14,179   4,071    26,846   141.391   227,076

                   calculate IP


                   Using matrices to    41,409   12,078   76,977   165.591   635,691

                   save shifter value
Parallel architecture (Stage 2)

                                       Barrel shifter#1




                           divider
                                                                           Prediction
           Matrix                                                          Parity
                                                          Accumulator                   Correct
                                                                           memory




                                       Barrel shifter#6




           Input data register       lambda position




Divide the datas from                       This module used to save
matrix.                                     the input data. These data
                                            are used in barrel shifters.
Parallel architecture (Stage 3)

                                       Barrel shifter#1




                           divider
                                                                        Prediction
           Matrix                                                       Parity
                                                          Accumulator                      Correct
                                                                        memory




                                       Barrel shifter#6




           Input data register       lambda position




                                                                                     Lambda position = 3




 These module are                                                               This module records the
 used to circulated                                                             row position of the
 shift the input data                                                           shifter values
                                                                                     Lambda position = 8


                                                                                     Lambda position = 11

                                       Shifter value
 Parallel architecture (Stage 4)

                                            Barrel shifter#1




                                divider
                                                                              Prediction
                Matrix                                                        Parity
                                                                Accumulator                 Correct
                                                                              memory




                                            Barrel shifter#6




                Input data register       lambda position




According to the                                                                   Computed the
lambda position, in                                                                lambda value by
this clock cycle λ1, λ2,                                                           accumulating the
λ5, λ8, λ9, λ11 need to be                                                         shifted data after Kb
accumulated.                                                                       clock cycle


                                                               Kb
Parallel architecture (Stage 5)

                                 Barrel shifter#1




                     divider
                                                                  Prediction
     Matrix                                                       Parity
                                                    Accumulator                Correct
                                                                  memory




                                 Barrel shifter#6




     Input data register       lambda position




                                                                       Computed the
                                                                       prediction vector
                                                                       Pi‘ by equation
Parallel architecture (Stage 5)
P_0 <= acc_out0;
P_1 <= acc_out0 ^ acc_out1;
P_2 <= acc_out0 ^ acc_out1 ^ acc_out2;
P_3 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
P_4 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
P_5 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
P_6 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7 ^ acc_out6;
P_7 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7;
P_8 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8;
P_9 <= acc_out11 ^ acc_out10 ^ acc_out9;
P_10 <= acc_out11 ^ acc_out10;
P_11 <= acc_out11;




  For saving the hardware area, we use                                            2 3,         P_3
                                                                     In code rate 1 / 2, P_0 ~ P_11
 one architecture to compute the                                     P_8~P_11are the
                                                                     are the prediction prediction
 prediction values for four different
 code rate.
Parallel architecture (Stage 5)
P_0 <= acc_out0;
P_1 <= acc_out0 ^ acc_out1;
P_2 <= acc_out0 ^ acc_out1 ^ acc_out2;
P_3 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
P_4 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
P_5 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
P_6 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7 ^ acc_out6;
P_7 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7;
P_8 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8;
P_9 <= acc_out11 ^ acc_out10 ^ acc_out9;
P_10 <= acc_out11 ^ acc_out10;
P_11 <= acc_out11;




  For saving the hardware area, we use                                            5 6,         P_1
                                                                     In code rate 3 / 4, P_0 ~ P_2
 one architecture to compute the                                     P_10~P_11are the prediction
                                                                     P_9~P_11 are the prediction
 prediction values for four different                                vectors
 code rate.
Parallel architecture (Stage 6)

                                      Barrel shifter#1




                          divider
                                                                       Prediction
          Matrix                                                       Parity
                                                         Accumulator                Correct
                                                                       memory




                                      Barrel shifter#6




          Input data register       lambda position




  Step2:                                                                    Step1:
  Correct the other Pi.                                                     Compute the P0. In
  Using the equation                                                        code rate = 1 / 2,
  Pi= Pi’^ P0                                                               P0 = P5 ^ P6
Serial architecture (Stage 1)

                                             Barrel shifter#1   Accumulator
                                                                &




                             divider
               Matrix
                                                                Predict          Correct
                                                                memory
                                             Barrel shifter#2




            Input data    Input
            register      control


                                     2

                                 3
 As the stage1 in            1                                            In the first Kb clock
 parallel architecture.                                                   cycle, encoder order are
                                                                          from top->middle and
                                     3
                                         3                                down ->middle,
                             2
                                                                          column by column
                             1
Serial architecture (Stage 1)

                                             Barrel shifter#1   Accumulator
                                                                &




                           divider
              Matrix
                                                                Predict           Correct
                                                                memory
                                             Barrel shifter#2




           Input data   Input
           register     control


                                             1       2
                                     3


 Reason:                                                                  In the last clock cycle,
 1.Prepare the input                                                      encoder order are from
 data                                                                     left->right, row by row
 2.Reduce the slice
                                                 3

                                         1               2
Serial architecture (Stage 2)

                                       Barrel shifter#1   Accumulator
                                                          &




                             divider
                Matrix
                                                          Predict          Correct
                                                          memory
                                       Barrel shifter#2




             Input data   Input
             register     control




Divide the datas from                                             Choose the corresponding
matrix.                                                           input value to barrel shifter
                                                                  (Take clock cycle #2 for
                                                                  example)
Serial architecture (Stage 3)

                                   Barrel shifter#1   Accumulator
                                                      &




                        divider
         Matrix
                                                      Predict       Correct
                                                      memory
                                   Barrel shifter#2




      Input data     Input
      register       control




                   Shift the input data
                   according to the shifter
                   value chosen form Mux
Serial architecture (Stage 4)

                                         Barrel shifter#1   Accumulator
                                                            &




                               divider
              Matrix
                                                            Predict           Correct
                                                            memory
                                         Barrel shifter#2




           Input data       Input
           register         control




  In normal, this module                                              In this module, there
  accumulate the shifted                                              are three works:
  data to compute λi .                                                1.Compute λi
  When the data is the                                                2.Compute Pi’
  last value in this row,                                             3.Compute P0
  also compute Pi’.
Serial architecture (Stage 4)

                                       Barrel shifter#1   Accumulator
                                                          &




                             divider
              Matrix
                                                          Predict       Correct
                                                          memory
                                       Barrel shifter#2




           Input data    Input
           register      control




  When all Pi have been
  computed, compute the
  P0 by Xor Px’ and Px+1’
  which are the middle
  prediction vector in the
  matrix.
Serial architecture (Stage 5)

                                Barrel shifter#1   Accumulator
                                                   &




                      divider
         Matrix
                                                   Predict           Correct
                                                   memory
                                Barrel shifter#2




      Input data   Input
      register     control




                                                             Correct the other Pi.
                                                             Using the equation
                                                             Pi= Pi’^ P0
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Better Encoding scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
Implementation Results


   The proposed encoder based on IEEE 802.16e LDPC codes can
    encode the code with code rate 1/2 2/3 3/4 5/6 and code length
    ranging from 576 to 2304.


   The hardware implementation was performed and verification on Xilinx
    Virtex-4 and Altera Stratix Field Programmable Gate Array (FPGA)
    device.
Implementation Results
Parallel architecture
                                                          Rate 1/2    Rate 2/3    Rate 3/4    Rate 5/6

         Z    N     Slice    FFs     LUTs     CLK (MHz)   IT (Gbps)   IT (Gbps)   IT (Gbps)   IT (Gbps)

        24   576                                           2.262       2.468       2.545        2.61

        40   960                                            3.77       4.113       4.241        4.35

        60   1440   14,179   4,071   26,846    141.391     5.656        6.17       6.363       6.526

        80   1920                                          7.541       8.226       8.483       8.701

        96   2304                                          9.049       9.872       10.18       10.441

   Information throughput ranging from 2.262 to 10.441 Gbps
   The encoder area is constant in any code rate or code length.
   For a given code rate, an increase in the code length will increase the throughput.
Implementation Results
Serial architecture




   Information throughput ranging from 0.867 to 4.019 Gbps
   For a given code rate, an increase in the code length will increase the
    throughput.
Implementation Results
Parallel architecture using row by row




 Area comparison
Implementation Results
IT comparison




IT/Area comparison
                                                 Table 4.5a The synthesis result of [22] at code rate 1/2



Compare to Related Work
     We compare implementation with [3].

      Code Length   Area (LE)   Clk (MHz)   IT (Gbps)   IT/Total Area                    Code Length   Area (LE)    Clk (MHz)   IT (Gbps)   IT/Total Area

                                                         (Mb per Le)                                                            Rate 1/2     (Mb per Le)

          576         3391       192.23      2.129         0.0612                                                                              rate1/2

[2]       960         5100       159.57      2.253         0.0648                            576                                 1.561        0.07447

                                                                             Proposed        960            20960     97.58      2.602        0.12414
         1440         7012       164.83      2.697         0.0776

                                                                                            1440                                 3.903        0.18621
         1920         8924       148.72      2.644         0.0761

                                                                                            1920                                 5.204        0.24828
         2304        10339       148.41      2.758         0.0793

                                                                                            2304                                 6.245        0.29794
                     34766




       Better throughput for longer code length
       Using less area to implement multiple code length and code rate
       The clock cycle is shorter the [3].


  [3] S. Kopparthi and D. M. Gruenbacher, "Implementation of a fiexible encoder for structured low-density parity-check codes," IEEE Pacic Rim
  Conference on Communications, Computers and Signal Processing (PacRim 2007), pp.438-441, Aug. 2007.
Compare to Related Work
The comparison of throughput




   The proposed encoder outperforms the work in [3] in terms of throughput
    when the code length longer then 1200
   The proposed encoder architecture provides better throughput for a longer
    code length while the work in [3] does not have this kind of speed-up
Compare to Related Work
The comparison of throughput/area ratio




   The proposed encoder outperforms the work in [3] in terms of
    throughput/area ratio by 1.216 to 3.757 times
   The proposed encoder utilizes hardware resources more efficiently
Compare to Related Work
   We compare implementation with [2].
Compare to Related Work
The comparison of throughput




   The throughput in our proposed encoder is higher then [2] in all code rate
    and code length
   The proposed encoder outperforms the work in [2] in terms of throughput
    ratio by 1.237 to 1.963 times
Compare to Related Work
The comparison of throughput/area




   The proposed encoder outperforms the work in [2] in terms of throughput
    ratio by 2.427 to 5.256 times
   The result shows that our proposed encoder utilizes hardware resources
    efficiently
Compare to Related Work (Serial)
   We compare implementation with [4].

                                 Slices         FFs           LUTs        Block rams       CLK             IT

                    [4]          4,724         1,807          8,335           81            186           3.34

                 Proposed       12,567         3,885         22,050           0          123.502         4.626




   Our proposed encoder achieve higher IT in low clock.

   In our proposed encoder, the matrix information are built in it without
    additional blockrams.

   The IT/Area of our serial encoder is 0.3681(Mbps) per slice and the
    IT/Area of [4] is 0.1768.
[4] Jeong Ki KIM1, Hyunseuk YOO1 and Moon Ho LEE1, "Efficient Encoding Architecture for IEEE 802.16e LDPC Codes, " IEICE Transactions
on Fundamentals of Electronics, Communications and Computer Sciences 2008.
Outline
   Introduction
       Low-Density Parity-Check Codes
   Related work
       General encoding for LDPC codes
       Efficient encoding for Dual-Diagonal matrix
   Proposed Encoding scheme
   LDPC Encoder Architecture
       Parallel Encoder
       Serial Encoder
   Result
   Conclusion
Conclusion

   An efficient encoding architecture for IEEE 802.16e LDPC
    codes with multiple code lengths and code rates are
    implemented.

   In our design, change between different code rate or code
    length only to change the type in information data.

   This architecture is also suitable the IEEE 802.11n standard.

   Our encoder achieve higher throughput and better
    throughput/area ratio than conventional encoding scheme
    when code length longer than 1200.
Thank you!!

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:2/8/2013
language:Unknown
pages:56