Ldpc Final .doc - 123SeminarsOnly.com

					                                      CHAPTER I

                                    INTRODUCTION

1.1 Introduction to LDPC

       Due to their near Shannon limit performance and inherently parallelizable
decoding scheme, low-density parity-check (LDPC) codes have been extensively
investigated in research and practical applications. Recently, LDPC codes have been
considered for many industrial standards of next generation communication systems
such as DVB-S2, WLAN (802.11n), WiMAX (802.16e), and 10GBase-T (802.3an).
For high throughput applications, the decoding parallelism is usually very high.
Hence, a complex interconnect network is required which consumes a significant
amount of silicon area and power.

       A message broadcasting technique was proposed to reduce the routing
congestion in a fully parallel LDPC decoder. Because all check nodes and variable
nodes are directly mapped to hardware, the implementation cost is very high. Other
reported decoders target specific LDPC codes that have very simple
interconnections between check nodes and variable nodes. The constraints imposed
on the H matrix structure to reduce routing complexity unavoidably limit the
performance of the LDPC codes. Another reported LDPC decoder is based on the
two-phase message-passing (TPMP) decoding scheme.

        Recently, the layered decoding approach has been of great interest in LDPC
decoder design because it converges much faster than the TPMP decoding approach. A
previously reported 4.6 Gb/s LDPC decoder adopted the layered decoding approach.
However, it is only suited for array LDPC codes, which are a sub-class of LDPC
codes. It should be noted that a shuffled iterative decoding algorithm based on vertical
partitioning of the parity-check matrix can also speed up LDPC decoding in
principle.

       In practice, LDPC codes have attracted considerable attention due to their
excellent error correction performance and the regularity in their parity check
matrices which is well suited for VLSI implementation. In this paper, we present a
high-throughput low-cost layered decoding architecture for generic QC-LDPC codes.
       A row permutation approach is proposed to significantly reduce the
implementation complexity of shuffle network in the LDPC decoder. An approximate
layered decoding approach is explored to increase clock speed and hence to increase
the decoding throughput. An efficient implementation technique which is based on
Min-Sum algorithm is employed to minimize the hardware complexity. The
computation core is further optimized to reduce the computation delay.

       Low-density parity-check (LDPC) codes were invented by R. G. Gallager
(Gallager 1963; Gallager 1962) in 1962. He discovered an iterative decoding
algorithm which he applied to a new class of codes. He named these codes low-
density parity-check (LDPC) codes since the parity-check matrices had to be sparse to
perform well. Yet LDPC codes were ignored for a long time, mainly because of the
high computational complexity required when very long codes are considered. In
1993, C. Berrou et al. invented turbo codes (Berrou, Glavieux, and Thitimajshima
1993) and their associated iterative decoding algorithm. The remarkable performance
observed with turbo codes raised many questions and much interest in
iterative techniques. In 1995, D. J. C. MacKay and R. M. Neal (MacKay and Neal
1995; MacKay and Neal 1996; MacKay 1999) rediscovered LDPC codes and
linked their iterative algorithm to Pearl's belief propagation algorithm (Pearl 1988)
from the artificial intelligence community (Bayesian networks). At the same time, M.
Sipser and D. A. Spielman (Sipser and Spielman 1996) used the first decoding
algorithm of R. G. Gallager (algorithm A) to decode expander codes.

1.2 Objectives:

       The objective of this project is a high-throughput decoder architecture for
generic quasi-cyclic low-density parity-check (QC-LDPC) codes. Various
optimizations are employed to increase the clock speed. A row permutation scheme is
proposed to significantly simplify the implementation of the shuffle network in LDPC
decoder. An approximate layered decoding approach is explored to reduce the critical
path of the layered LDPC decoder. The computation core is further optimized to
reduce the computation delay. It is estimated that 4.7 Gb/s decoding throughput can
be achieved at 15 iterations using the current technology.



       Low-density parity-check (LDPC) codes, which have capacity-approaching
performance, were first invented by Gallager in 1962 and rediscovered by MacKay in
1996. As linear block codes, LDPC codes show excellent error correction capability
even in low signal-to-noise ratio applications. Also, the inner independence of the
parity-check matrix enables parallel decoding and thus makes high-speed LDPC
decoders possible. Hence, LDPC codes have been suggested in
many recent wire-line and wireless communication standards such as IEEE 802.11n,
DVB-S2 and IEEE 802.16e (WiMax). LDPC codes can be effectively decoded by the
standard belief propagation (BP) algorithm, which is also called the sum-product
algorithm (SPA). Later, the min-sum algorithm (MSA) was introduced to reduce the
computational complexity of check node processing in SPA, which makes the
algorithm suitable for VLSI implementation. VLSI implementation of LDPC decoders
has attracted attention from researchers in the past few years, including fully parallel
and partly parallel architectures. A fully parallel architecture directly maps the
standard BP algorithm into hardware by specifying connections between check nodes
and variable nodes. However, the interconnections become more complex as the
block length increases, which leads to large chip area and power consumption. Partly
parallel architecture can effectively balance the hardware complexity and system
throughput by employing architecture-aware LDPC (AA-LDPC) codes that have
regularly-constructed parity-check matrix. However, the decoder complexity is still a
great challenge for LDPC codes that have irregular parity check matrix, such as the
codes used in the IEEE 802.16e standard for WiMax systems. The decoding
throughput for irregular LDPC codes will decrease due to the irregular parity check
matrix which destroys the inherent parallelism in partly parallel decoding
architectures. Layered decoding algorithm (LDA), either by horizontal partitioning or
vertical partitioning, uses the newest data from the current iteration rather than data
from the previous iteration and thus can double the convergence speed. Conventional
LDA processes messages in serial, from the first layer to the last, leading to limited
decoding throughput. Grouped layered decoding can improve the throughput but
employs more hardware resources. In this paper, we introduce a new parallel layered
decoding architecture (PLDA) to enable different layers to operate concurrently.
Precisely scheduled message passing paths among different layers guarantee that
newly calculated messages can be delivered to their designated locations before they
are used by the next layer. By adding offsets to the permutation values of the sub-
matrices in the base parity check matrix, time intervals among different layers become
large enough for message passing. In PLDA, the decoding latency per iteration can be
reduced greatly and hence the decoding throughput is improved. The remainder of
this paper is organized as follows. Section II introduces code structure used in
WiMax, MSA and LDA. Corresponding hardware implementation of PLDA and
message passing network are presented in Section III. Section IV shows experimental
results of the proposed decoder, including FPGA implementation results, ASIC
implementation results and comparisons with existing WiMax LDPC decoders.




                                     CHAPTER II



2.1 Turbo codes

       Turbo Coding is an iterated soft-decoding scheme that combines two or more
relatively simple convolutional codes and an interleaver to produce a block code that
can perform to within a fraction of a decibel of the Shannon limit. Predating LDPC
codes in terms of practical application, they now provide similar performance.

       One of the earliest commercial applications of turbo coding was the
CDMA2000 1x (TIA IS-2000) digital cellular technology developed by Qualcomm
and sold by Verizon Wireless, Sprint, and other carriers. It is also used for the
evolution of CDMA2000 1x specifically for Internet access, 1xEV-DO (TIA IS-856).
Like 1x, EV-DO was developed by Qualcomm, and is sold by Verizon Wireless,
Sprint, and other carriers. (Verizon's marketing name for 1xEV-DO is Broadband
Access; Sprint's consumer and business marketing names for 1xEV-DO are Power
Vision and Mobile Broadband, respectively.)

2.1.1 Characteristics of Turbo Codes

1) Turbo codes have extraordinary performance at low SNR.

           a) Very close to the Shannon limit.

           b) Due to a low multiplicity of low weight code words.

2) However, turbo codes have a BER “floor”.

       - This is due to their low minimum distance.

3) Performance improves for larger block sizes.

           a) Larger block sizes mean more latency (delay).

           b) However, larger block sizes are not more complex to decode.

           c) The BER floor is lower for larger frame/interleaver sizes

4) The complexity of a constraint length K_TC turbo code is the same as that of a
   K = K_CC convolutional code, where:

          K_{CC} \approx 2 + K_{TC} + \log_2(\text{number of decoder iterations})
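
As a rough worked example of the relation in item 4, reading it as an approximate equality and assuming purely illustrative values (K_TC = 4 and 8 decoder iterations, neither taken from the text):

\[
K_{CC} \approx 2 + K_{TC} + \log_2(8) = 2 + 4 + 3 = 9
\]

Under these assumed values, the turbo decoder would be comparable in complexity to the decoder of a constraint length 9 convolutional code.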

2.2 Performance of Error Correcting Codes

        The performance of error correcting codes is compared by referring to their
gap to the Shannon limit, as mentioned in section 1.1.1. This section aims at defining
exactly what the Shannon limit is, and what exactly is measured when the gap to the
Shannon bound is referred to. It is important to know exactly what is measured, since
many “near Shannon limit” codes have now been discovered. The results hereafter are
classical in information theory and may be found in many references. Yet, the first
part is inspired by the work of (Schlegel 1997).

        Forward Error Correction (FEC) is an important feature of most modern
communication systems, including wired and wireless systems. Communication
systems use a variety of FEC coding techniques to permit correction of bit errors in
transmitted symbols.

2.2.1 Forward error correction

        In telecommunication and information theory, forward error correction (FEC)
(also called channel coding) is a system of error control for data transmission,
whereby the sender adds (carefully selected) redundant data to its messages, also
known as an error-correcting code. This allows the receiver to detect and correct
errors (within some bound) without the need to ask the sender for additional data. The
advantages of forward error correction are that a back-channel is not required and
retransmission of data can often be avoided (at the cost of higher bandwidth
requirements, on average). FEC is therefore applied in situations where
retransmissions are relatively costly or impossible. In particular, FEC information is
usually added to most mass storage devices to protect against damage to the stored
data.

        FEC processing often occurs in the early stages of digital processing after a
signal is first received. That is, FEC circuits are often an integral part of the
analog-to-digital conversion process, also involving digital modulation and
demodulation, or line coding and decoding. Many FEC coders can also generate a
bit-error rate (BER) signal which can be used as feedback to fine-tune the analog
receiving electronics. Soft-decision algorithms, such as the Viterbi decoder, can take
(quasi-)analog data in, and generate digital data on output.

       The maximum fraction of errors that can be corrected is determined in
advance by the design of the code, so different forward error correcting codes are
suitable for different conditions.

How it works

       FEC is accomplished by adding redundancy to the transmitted information
using a predetermined algorithm. Each redundant bit is invariably a complex function
of many original information bits. The original information may or may not appear in
the encoded output; codes that include the unmodified input in the output are
systematic, while those that do not are nonsystematic.

       An extremely simple example would be an analog to digital converter that
samples three bits of signal strength data for every bit of transmitted data. If the three
samples are mostly zero, the transmitted bit was probably a zero, and if three samples
are mostly one, the transmitted bit was probably a one. The simplest example of error
correction is for the receiver to assume the correct output is given by the most
frequently occurring value in each group of three.

                          Triplet received Interpreted as
                                 000             0
                                 001             0
                                 010             0
                                 100             0
                                 111             1
                                 110             1
                                 101             1
                                 011             1


       This allows an error in any one of the three samples to be corrected by
"democratic voting". This is a highly inefficient FEC, but it does illustrate the
principle. In practice, FEC codes typically examine the last several dozen, or even the
last several hundred, previously received bits to determine how to decode the current
small handful of bits (typically in groups of 2 to 8 bits).

        Such triple modular redundancy, the simplest form of forward error correction,
is widely used.
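
The triplet voting scheme described above can be sketched in a few lines of Python. This is only an illustration of the principle; the function names `encode_repetition` and `decode_majority` are chosen here and do not come from any library.

```python
def encode_repetition(bits, n=3):
    """Repeat every data bit n times (n = 3 gives the triplet code)."""
    return [b for b in bits for _ in range(n)]


def decode_majority(received, n=3):
    """Decode each group of n samples to its most frequent value."""
    decoded = []
    for k in range(0, len(received), n):
        group = received[k:k + n]
        decoded.append(1 if sum(group) > n // 2 else 0)
    return decoded


data = [1, 0, 1]
sent = encode_repetition(data)      # [1, 1, 1, 0, 0, 0, 1, 1, 1]

# One flipped sample per triplet is corrected by the vote.
noisy = sent.copy()
noisy[1] ^= 1
noisy[5] ^= 1
recovered = decode_majority(noisy)

# Two flipped samples in the same triplet defeat the vote.
too_noisy = sent.copy()
too_noisy[0] ^= 1
too_noisy[1] ^= 1
wrong = decode_majority(too_noisy)
```

The second case shows why the scheme is inefficient: it triples the transmitted data yet fails as soon as two of the three samples of one bit are corrupted.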

Averaging noise to reduce errors:

        FEC could be said to work by "averaging noise"; since each data bit affects
many transmitted symbols, the corruption of some symbols by noise usually allows
the original user data to be extracted from the other, uncorrupted received symbols
that also depend on the same user data.

       Because of this "risk-pooling" effect, digital communication systems that use
        FEC tend to work well above a certain minimum signal-to-noise ratio and not
        at all below it.

       This all-or-nothing tendency -- the cliff effect -- becomes more pronounced as
        stronger codes are used that more closely approach the theoretical limit
        imposed by the Shannon limit.

       Interleaving FEC coded data can reduce the all or nothing properties of
        transmitted FEC codes. However, this method has limits; it is best used on
        narrowband data.

    Most telecommunication systems use a fixed channel code designed to tolerate
the expected worst-case bit error rate, and then fail to work at all if the bit error rate is
ever worse. However, some systems adapt to the given channel error conditions:
hybrid automatic repeat-request uses a fixed FEC method as long as the FEC can
handle the error rate, then switches to ARQ when the error rate gets too high; adaptive
modulation and coding uses a variety of FEC rates, adding more error-correction bits
per packet when there are higher error rates in the channel, or taking them out when
they are not needed.




Types of FEC:

         The two main categories of FEC codes are block codes and convolutional
codes.

        Block codes work on fixed-size blocks (packets) of bits or symbols of
         predetermined size. Practical block codes can generally be decoded in
         polynomial time to their block length.

        Convolutional codes work on bit or symbol streams of arbitrary length. They
         are most often decoded with the Viterbi algorithm, though other algorithms are
         sometimes used. Viterbi decoding allows asymptotically optimal decoding
         efficiency with increasing constraint length of the convolutional code, but at
         the expense of exponentially increasing complexity. A convolutional code can
         be turned into a block code, if desired.

   There are many types of block codes, but among the classical ones the most
notable is Reed-Solomon coding because of its widespread use on the Compact disc,
the DVD, and in hard disk drives. Golay, BCH, Multidimensional parity, and
Hamming codes are other examples of classical block codes.

   Hamming ECC is commonly used to correct NAND flash memory errors. This
provides single-bit error correction and 2-bit error detection. Hamming codes are only
suitable for more reliable single level cell (SLC) NAND. Denser multi level cell
(MLC) NAND requires stronger multi-bit correcting ECC such as BCH or
Reed-Solomon.

   Classical block codes are usually implemented using hard-decision
algorithms, which means that for every input and output signal a hard decision is
made whether it corresponds to a one or a zero bit. In contrast, soft-decision
algorithms like the Viterbi decoder process (discretized) analog signals, which allow
for much higher error-correction performance than hard-decision decoding. Nearly all
classical block codes apply the algebraic properties of finite fields.
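
To make hard-decision decoding of a classical block code concrete, the following is a minimal sketch of the textbook Hamming (7,4) code with parity bits at positions 1, 2 and 4. It corrects a single bit error only; the 2-bit detection mentioned above needs an extra overall parity bit, which is not included here. The function names are illustrative.

```python
def hamming74_encode(data):
    """Encode 4 data bits; parity bits sit at positions 1, 2 and 4."""
    c = [0] * 8                       # index 0 unused, indices = bit positions
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]         # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]         # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]         # covers positions 4, 5, 6, 7
    return c[1:]


def hamming74_decode(received):
    """Return (data bits, syndrome); a nonzero syndrome is the error position."""
    c = [0] + list(received)
    syndrome = 0
    for position in range(1, 8):
        if c[position]:
            syndrome ^= position      # XOR of the positions holding a 1
    if syndrome:
        c[syndrome] ^= 1              # flip the single bit in error
    return [c[3], c[5], c[6], c[7]], syndrome


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

corrupted = codeword.copy()
corrupted[5] ^= 1                     # hard-decision error at position 6
decoded, position = hamming74_decode(corrupted)
```

Every received bit is already a hard 0 or 1 when it reaches the decoder, which is what distinguishes this from the soft-decision approach described in the paragraph above.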

Concatenated FEC codes for improved performance:

         Classical (algebraic) block codes and convolutional codes are frequently
combined in concatenated coding schemes in which a short constraint-length
Viterbi-decoded convolutional code does most of the work and a block code (usually
Reed-Solomon) with larger symbol size and block length "mops up" any errors made
by the convolutional decoder.

        Concatenated codes have been standard practice in satellite and deep space
communications since Voyager 2 first used the technique in its 1986 encounter with
Uranus.

Low-density parity-check (LDPC):

        Low-density parity-check (LDPC) codes are a class of recently
rediscovered, highly efficient linear block codes. They can provide performance very
close to the channel capacity (the theoretical maximum) using an iterated
soft-decision decoding approach, at linear time complexity in terms of their block
length. Practical implementations can draw heavily on the use of parallelism.

        LDPC codes were first introduced by Robert G. Gallager in his PhD thesis in
1960, but due to the computational effort of implementing the encoder and decoder
and the introduction of Reed-Solomon codes, they were mostly ignored until recently.

        LDPC codes are now used in many recent high-speed communication
standards, such as DVB-S2 (digital video broadcasting), WiMAX (IEEE 802.16e
standard for microwave communications), high-speed wireless LAN (IEEE
802.11n), 10GBase-T Ethernet (802.3an) and G.hn/G.9960 (ITU-T standard for
networking over power lines, phone lines and coaxial cable).

Channel Capacity:

        Stated by Claude Shannon in 1948, the noisy-channel coding theorem describes
the maximum possible efficiency of error-correcting methods versus levels of noise
interference and data corruption. The theorem does not describe how to construct the
error-correcting method; it only tells us how good the best possible method can be.
Shannon's theorem
has wide-ranging applications in both communications and data storage. This theorem
is of foundational importance to the modern field of information theory. Shannon only
gave an outline of the proof. The first rigorous proof is due to Amiel Feinstein in
1954.




       The Shannon theorem states that given a noisy channel with channel capacity
C and information transmitted at a rate R, then if R < C there exist codes that allow
the probability of error at the receiver to be made arbitrarily small. This means that,
theoretically, it is possible to transmit information nearly without error at any rate
below a limiting rate, C.

       The converse is also important. If R > C, an arbitrarily small probability of
error is not achievable. All codes will have a probability of error greater than a certain
positive minimal level, and this level increases as the rate increases. So, information
cannot be guaranteed to be transmitted reliably across a channel at rates beyond the
channel capacity. The theorem does not address the rare situation in which rate and
capacity are equal.

       Simple schemes such as "send the message 3 times and use a best 2 out of 3
voting scheme if the copies differ" are inefficient error-correction methods, unable to
asymptotically guarantee that a block of data can be communicated free of error.
Advanced techniques such as Reed–Solomon codes and, more recently, turbo codes
come much closer to reaching the theoretical Shannon limit, but at a cost of high
computational complexity. Using low-density parity-check (LDPC) codes or turbo
codes and with the computing power in today's digital signal processors, it is now
possible to reach very close to the Shannon limit. In fact, it was shown that LDPC
codes can reach within 0.0045 dB of the Shannon limit (for very long block lengths).

Mathematical statement:




Theorem (Shannon, 1948):

1. For every discrete memoryless channel, the channel capacity

       C = \max_{p_X} I(X;Y)

       has the following property. For any ε > 0 and R < C, for large enough N, there
exists a code of length N and rate ≥ R and a decoding algorithm, such that the
maximal probability of block error is ≤ ε.

2. If a probability of bit error pb is acceptable, rates up to R(pb) are achievable, where

       R(p_b) = \frac{C}{1 - H_2(p_b)}

and H2(pb) is the binary entropy function

       H_2(p_b) = -\left[ p_b \log_2 p_b + (1 - p_b) \log_2 (1 - p_b) \right]

3. For any pb, rates greater than R(pb) are not achievable.

       (MacKay (2003), p. 162; cf Gallager (1968), ch.5; Cover and Thomas (1991),
p. 198; Shannon (1948) thm. 11)
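
The quantities in the theorem can be evaluated numerically. The sketch below assumes a binary symmetric channel, which is not discussed in the text and is chosen here only because its capacity has the simple closed form C = 1 - H2(f) for crossover probability f. The function names and the sample values 0.1 and 0.01 are illustrative.

```python
import math


def binary_entropy(p):
    """H2(p) in bits, with H2(0) = H2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def bsc_capacity(f):
    """Capacity of a binary symmetric channel with crossover probability f."""
    return 1.0 - binary_entropy(f)


def max_rate(capacity, pb):
    """R(pb) = C / (1 - H2(pb)), valid for 0 <= pb < 0.5."""
    return capacity / (1.0 - binary_entropy(pb))


C = bsc_capacity(0.1)            # assumed channel: 10 % of bits are flipped
R_tolerant = max_rate(C, 0.01)   # rate limit if 1 % residual bit errors are acceptable

print(f"C = {C:.3f} bits per channel use")
print(f"R(0.01) = {R_tolerant:.3f}")
```

Accepting a small residual bit error probability raises the achievable rate slightly above C, while requiring pb to approach 0 brings the limit back to C itself.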




                        Figure 1: Error Correction in Communication systems

           (Block diagram: binary information -> Encoder (redundancy added) ->
           encoded information -> noise -> noisy information -> Decoder (error
           detection and correction) -> corrected information.)

           Error correction is widely used in most communication systems.

2.3 Row Permutation of Parity Check Matrix of LDPC Codes

           The parity check matrix of an LDPC code is an array of circulant
submatrices. To achieve very high decoding throughput, an array of cyclic shifters is
needed to shuffle soft messages corresponding to multiple submatrices for check
nodes and variable nodes. In order to reduce the VLSI implementation complexity of
the shuffle network, the shifting structure in circulant matrices is extensively
exploited. Suppose the parity check matrix H of an LDPC code is a J×C array of P×P
circulant submatrices. With row permutation, it can be converted to a form as shown
in Fig. 2.




                       Figure 2: Array of circulant sub matrices




                             Figure 3: Permuted Matrix




           Where      is a P×P permutation matrix representing a single left or right
cyclic shift. The submatrix            can be obtained by cyclically shifting the submatrix
         for a single step. Ai is a J×P matrix determined by the shift offsets of the
circulant submatrices in block column i (i = 1, 2, ..., C), and m is an integer that
divides P.

       For example, the matrix Ha shown in Fig. 1 is a 2×3 array of 8×8 cyclically
shifted identity submatrices. With the row permutation described in the following, a
new matrix Hb can be obtained, which has the form shown in (1). First, the first four
rows of the first block row of Ha are distributed to the four block rows of Hb in a
round-robin fashion (i.e., rows 1-4 of Ha become rows 1, 5, 9, and 13 of Hb). Then the
second four rows are distributed in the same way. The permutation continues
until all rows in the first block row of matrix Ha are moved to matrix Hb. Then the
second block row of Ha is distributed in the same way. It can be seen from Fig. 2 that
Hb has the form shown in (1). In the previous example, the row distribution starts
from the first row of each block row. In general, the distribution can start from
any row of a block row. To minimize the data dependency between two adjacent
block rows, an optimum row distribution scheme is desired.
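
The round-robin row distribution of the example can be sketched as follows. The shift offsets in `shifts` are hypothetical, since the actual offsets of Ha are not given in the text, and all function names are illustrative. The sketch starts the distribution from the first row of each block row and uses m = 4, as in the example.

```python
def circulant_identity(P, shift):
    """P x P identity matrix whose 1s are cyclically moved `shift` columns right."""
    return [[1 if c == (r + shift) % P else 0 for c in range(P)] for r in range(P)]


def build_H(shifts, P):
    """Assemble a J x C array of shifted identities into one 0/1 matrix."""
    H = []
    for row_shifts in shifts:
        blocks = [circulant_identity(P, s) for s in row_shifts]
        for r in range(P):
            H.append([bit for block in blocks for bit in block[r]])
    return H


def permute_rows(H, J, P, m):
    """Send row r of block row j to new block row r % m (round-robin)."""
    rows_per_new_block = J * P // m
    Hb = [None] * (J * P)
    for j in range(J):
        for r in range(P):
            dest = (r % m) * rows_per_new_block + j * (P // m) + r // m
            Hb[dest] = H[j * P + r]
    return Hb


def shift_block_columns(rows, P):
    """Cyclically shift every P-wide block column one position to the right."""
    shifted = []
    for row in rows:
        new_row = []
        for start in range(0, len(row), P):
            block = row[start:start + P]
            new_row.extend(block[-1:] + block[:-1])
        shifted.append(new_row)
    return shifted


J, C, P, m = 2, 3, 8, 4
shifts = [[0, 3, 5], [1, 6, 2]]       # hypothetical offsets, one per submatrix
Ha = build_H(shifts, P)
Hb = permute_rows(Ha, J, P, m)

n = J * P // m                        # rows in each block row of Hb
block_rows = [Hb[k * n:(k + 1) * n] for k in range(m)]
single_shift = all(
    shift_block_columns(block_rows[k], P) == block_rows[k + 1] for k in range(m - 1)
)
```

With these assumptions, `single_shift` confirms that each block row of Hb equals the previous block row with every block column cyclically shifted by one step, which is the property that lets the shuffle network be built from simple shifters.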

            For an LDPC decoder which can process all messages corresponding to
the 1-components in an entire block row of matrix Hp (e.g., Hb in Fig.2), the shuffle
network for LDPC decoding can be implemented with very simple data shifters.




 Message Passing (Row processing)

        Each decoding iteration runs the following steps on the parity check matrix H:
initial value (received information from the channel), row processing (α), column
processing (β), error correction, and parity check. The example matrix is

               0 0 1 1 0 0 0 1 0
               1 0 0 0 1 0 0 0 1
          H =  0 1 0 0 0 1 1 0 0
               0 0 1 0 1 0 1 0 0
               1 0 0 0 0 1 0 1 0
               0 1 0 1 0 0 0 0 1

MinSum:

       \alpha_{ij} = \left( \prod_{j' : h_{ij'} = 1,\ j' \neq j} \operatorname{sign}(\beta_{ij'}) \right) \times \min_{j' : h_{ij'} = 1,\ j' \neq j} |\beta_{ij'}|

 Message Passing (Column processing)

       \beta_{ij} = \lambda_j + \sum_{i' : h_{i'j} = 1,\ i' \neq i} \alpha_{i'j}

λj is the received information.

 Error correction

        Taking y_j as the updated soft value of bit j, that is, the received information
plus all incoming row messages, and using the sign convention that a negative soft
value indicates bit 1, the hard decision is

       y_j = \lambda_j + \sum_{i : h_{ij} = 1} \alpha_{ij}

       \hat{v}_j = \begin{cases} 1 & \text{if } y_j < 0 \\ 0 & \text{if } y_j \geq 0 \end{cases}

 Parity check

       H \cdot [\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_9]^T = 0   (Stop decoding)
       H \cdot [\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_9]^T \neq 0   (Repeat decoding)




LDPC Codes:

     An LDPC code is defined by a binary matrix called parity check matrix H.

     Rows define parity check equations (constrains) between encoded symbols in
      a code word and columns define the length of the code.

     V is a valid code word if H٠Vt=0.


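   •   The decoder in the receiver checks whether the condition H·V^T = 0 holds.

   •   Example: parity-check matrix of a length-9 LDPC code with row weight = 3 and
       column weight = 2 (6 parity checks, 5 of them linearly independent, hence a
       (9, 4) code):

\[
H \cdot V^T =
\begin{bmatrix}
0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \\ v_7 \\ v_8 \\ v_9 \end{bmatrix}
\quad
\begin{cases}
\neq 0 & \text{(there is an error)} \\
= 0 & \text{(there is no error)}
\end{cases}
\]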


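The parity check H·V^T = 0 can be illustrated with a short Python sketch. It uses the 6 × 9 matrix of the example; the vector `valid` is a codeword picked by hand for illustration and the helper name `syndrome` is an assumption of this sketch, not something taken from the original report.

```python
# Parity-check matrix of the example (6 checks, 9 code bits).
H = [
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
]

def syndrome(H, v):
    """Return H * v^T over GF(2) as a list with one 0/1 entry per parity check."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

valid = [1, 0, 1, 0, 1, 0, 0, 1, 0]  # hand-picked codeword (illustrative assumption)
corrupted = valid.copy()
corrupted[0] ^= 1                     # flip the first bit to emulate a channel error

print(syndrome(H, valid))      # [0, 0, 0, 0, 0, 0] -> no error detected
print(syndrome(H, corrupted))  # [0, 1, 0, 0, 1, 0] -> the two checks using bit 1 fail
```

The non-zero entries of the second syndrome are exactly the two rows of H that contain a one in the first column, which is what the decoder exploits to locate errors.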


                                 CHAPTER III

                            CHANNEL CODING

       This chapter introduces the channel code decoding issue and the problem
of optimal code decoding in the case of linear block codes. First, the main notations
used in the thesis are presented, especially those related to the graph
representation of linear block codes. Then optimal decoding is discussed: it is
shown that under the cycle-free hypothesis, optimal decoding can be performed
using an iterative algorithm. Finally, the performance of error-correcting codes is
discussed.

3.1 Optimal decoding:

       Communication over noisy channels can be improved by the use of a channel
code C, as demonstrated by C. E. Shannon in his famous channel coding theorem:
“Let a discrete channel have the capacity C and a discrete source the entropy per
second H. If H ≤ C there exists a coding system such that the output of the source can
be transmitted over the channel with an arbitrarily small frequency of errors. If H > C
it is possible to encode the source so that the equivocation is less than H − C + ε,
where ε is arbitrarily small.” This theorem states that below a maximum rate R, which
is equal to the capacity of the channel, it is possible to find error correction codes
that achieve any given probability of error. Since this theorem does not explain how to
construct such a code, it triggered a great deal of activity in the coding theory
community. When Shannon announced his theory in the July and October issues of the
Bell System Technical Journal in 1948, the largest communications cable in operation
at that time carried 1800 voice conversations. Twenty-five years later, the highest
capacity cable was carrying 230000 simultaneous conversations. Today a single optical
fiber as thin as a human hair can carry more than 6.4 million conversations. In the
quest for capacity-achieving codes, the performance of a code is measured by its gap
to capacity. For a given code, the smallest gap is obtained by an optimal decoder: the
maximum a posteriori (MAP) decoder. Before dealing with optimal decoding,
some notations within a model of the communication scheme are presented hereafter.



                Figure 4: Basic scheme for channel code encoding/decoding.

3.1.1 Communication model

       Figure 4 depicts a classical communication scheme. The source block delivers
information by means of sequences which are row vectors x of length K. The
encoder block delivers the codeword c of length N, which is the coded version of x.
The code rate is defined by the ratio R = K/N. The codeword c is sent over the
channel and the vector y is the received word: a distorted version of c. The matched
filters, the modulator and the demodulator, and the synchronization are assumed to
work perfectly. Hence, the channel is represented by a discrete-time equivalent model.
The channel is a non-deterministic mapper between its input c and its output y. We
assume that y depends on c via a conditional probability density function (pdf)
p(y|c). We also assume that the channel is memoryless:

\[
p(\mathbf{y} \mid \mathbf{c}) = \prod_{n=1}^{N} p(y_n \mid c_n)
\]




       For example: If the channel is the binary-input additive white Gaussian noise
(BI-AWGN) channel, and if the modulation is a binary phase-shift keying (BPSK)
modulation with the




       In Figure 4, two types of decoder are depicted: decoders of type 1 have to
compute the best estimate x̂ of the source word x; decoders of type 2 compute the
best estimate ĉ of the sent codeword c. In the latter case, x̂ is extracted from ĉ by
post-processing (the reverse of the encoding) when the code is non-systematic.
Both decoders can perform two types of decoding:

3.2 Classes of LDPC codes:

         R. Gallager defined an (N, j, k) LDPC code as a block code of length N
having a small fixed number (j) of ones in each column of the parity-check matrix H,
and a small fixed number (k) of ones in each row of H. This class of codes is then
decoded by the iterative algorithm described in chapter 1. This algorithm computes
exact a posteriori probabilities, provided that the Tanner graph of the code is cycle
free. Generally, LDPC codes do have cycles. The sparseness of the parity-check
matrix aims at reducing the number of cycles and at increasing their length.
Moreover, as the length N of the code increases, the cycle-free hypothesis becomes
more and more realistic. The iterative algorithm is run on these graphs;
although it is then not optimal, it performs quite well. Since then, the class of LDPC
codes has been enlarged to all sparse parity-check matrices, thus creating a very wide
class of codes, including the extension to codes over GF(q) and irregular LDPC codes.

Irregularity:

         In Gallager’s original LDPC code design, there is a fixed number of ones
in both the rows (k) and the columns (j) of the parity-check matrix: each
bit is involved in j parity-check constraints and each parity-check constraint is the
exclusive-OR (XOR) of k bits. This class of codes is referred to as regular LDPC
codes. On the contrary, irregular LDPC codes do not have a constant number of
non-zero entries in the rows or in the columns of H. They are specified by the degree
distribution of the bits λ(x) and of the parity-check constraints ρ(x),
where:




       Similarly, denoting by ρi the proportion of rows having weight i:




Code rate:

       The rate R of an LDPC code satisfies R ≥ Rd, where Rd = 1 − M/N is the design
code rate of an M × N parity-check matrix. Rd = R if the
parity-check matrix has full rank. It has been shown that as N increases, the
parity-check matrix is almost sure to be full rank. Hereafter, we will assume that R =
Rd unless the contrary is mentioned. The rate R is then linked to the other parameters
of the class by




Note that in general, for random constructions, when j is odd:




and when j is even:



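The difference between the design rate Rd = 1 − M/N and the true rate R can be checked numerically by computing the rank of H over GF(2). The sketch below is illustrative only: it reuses the 6 × 9 example matrix with column weight 2, and the function name `gf2_rank` is an assumption of this sketch. Because every column of that matrix has even weight, its rows sum to zero modulo 2, so the rank is at most M − 1 and R is strictly larger than Rd.

```python
from fractions import Fraction

H = [
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
]

def gf2_rank(H):
    """Rank of a binary matrix over GF(2); each row is packed into an integer."""
    rows = [int("".join(map(str, row)), 2) for row in H]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low_bit = pivot & -pivot  # lowest set bit of the pivot row
        # Cancel that bit in every remaining row that contains it.
        rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank

M, N = len(H), len(H[0])
design_rate = 1 - Fraction(M, N)           # Rd = 1 - M/N
actual_rate = 1 - Fraction(gf2_rank(H), N)  # R = 1 - rank(H)/N

print(gf2_rank(H))   # 5
print(design_rate)   # 1/3
print(actual_rate)   # 4/9
```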

3.2.1 Optimization of LDPC codes:

       The bounds and performance of LDPC codes are derived from their
parameter set. The large number of independent parameters makes it possible to tune
them so as to fit some external constraint, such as a particular channel. Two
algorithms can be used to design a class of irregular LDPC codes under some channel
constraints: the density evolution algorithm and the extrinsic information transfer
(EXIT) charts.

Density evolution algorithm:

       Richardson designed capacity-approaching irregular codes with the density
evolution (DE) algorithm. This algorithm tracks the probability density function (pdf)
of the messages through the graph nodes under the assumption that the cycle-free
hypothesis holds. It is a kind of belief propagation algorithm with pdf messages
instead of log-likelihood ratio messages. Density evolution describes the
asymptotic performance of the class of LDPC codes: an infinite number
of iterations is run on an LDPC code of infinite length. If the length of the
code tends to infinity, the probability that a randomly chosen node belongs to a cycle
of a given length tends towards zero.



       Usually, either the channel threshold or the code rate is optimized under the
constraints of the degree distributions and of the SNR. The threshold of the channel is
the value of the channel parameter above which the probability of error tends towards
zero as the number of iterations (and the code length) tends to infinity. Optimization
tries to lower the threshold or to raise the rate as much as possible. For example, a
rate-1/2 irregular LDPC code has been designed for binary-input AWGN channels that
approaches the Shannon limit very closely (to within 0.0045 dB). Optimizations based on
the DE algorithm are often carried out by means of a differential evolution algorithm
when they are non-linear, as for example in the optimization of an
irregular LDPC code for uncorrelated flat Rayleigh fading channels. The Gaussian
approximation in the DE algorithm can also be used: the probability density functions
of the messages are assumed to be Gaussian and the only parameter that has to be
tracked in the nodes is the mean.
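To make the idea of a threshold concrete, the sketch below runs density evolution in the simplest possible setting. It is not the AWGN case discussed above: as a deliberate simplification it assumes a binary erasure channel (BEC), where the message density reduces to a single erasure probability x and the recursion for a regular (dv, dc) ensemble is x ← ε·(1 − (1 − x)^(dc−1))^(dv−1). On the BEC the channel parameter is the erasure probability ε, so the threshold is the largest ε for which x tends to zero. The function names are hypothetical.

```python
def de_converges(eps, dv, dc, max_iter=5000, tol=1e-10):
    """BEC density evolution for a regular (dv, dc) ensemble.

    x is the erasure probability of a variable-to-check message.
    Returns True if x falls below tol within max_iter iterations.
    """
    x = eps
    for _ in range(max_iter):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False

def bec_threshold(dv, dc, steps=30):
    """Bisection on the erasure probability: largest eps for which DE converges."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if de_converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo

threshold_36 = bec_threshold(3, 6)
print(round(threshold_36, 3))  # about 0.429 for the regular (3, 6) ensemble
```

A rate-1/2 code cannot work beyond ε = 0.5 on the BEC, so the gap between roughly 0.429 and 0.5 is what an optimized irregular degree distribution tries to close.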

EXIT chart:

       Extrinsic information transfer (EXIT) charts are 2D graphs on which
the mutual information transfers through the two constituent codes of a
turbo code are superposed. EXIT charts have also been applied to LDPC code optimization.

3.2.2 Regular vs. Irregular LDPC codes:

   •   An LDPC code is regular if the rows and columns of H have uniform weight,
       i.e. all rows have the same number of ones (dc) and all columns have the same
       number of ones (dv)

           – The codes of Gallager and MacKay were regular (or as close as
               possible)

           – Although regular codes have impressive performance, they are still
               about 1 dB from capacity and generally perform worse than turbo
               codes

   •   An LDPC code is irregular if the rows and columns have non-uniform weight

           – Irregular LDPC codes tend to outperform turbo codes for block lengths
               of about n > 10^5
   •   The degree distribution pair (λ, ρ) for an LDPC code is defined as

\[
\lambda(x) = \sum_{i=2}^{d_v} \lambda_i x^{i-1}, \qquad
\rho(x) = \sum_{i=1}^{d_c} \rho_i x^{i-1}
\]

   •   λi, ρi represent the fraction of edges emanating from variable (check) nodes of
       degree i
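The edge-perspective coefficients λi and ρi can be read directly from a parity-check matrix. The sketch below is a minimal illustration on a small hand-made irregular 4 × 6 matrix (an assumption of this example, not a code from the report). It also evaluates the standard identity Rd = 1 − (Σ ρi/i) / (Σ λi/i), which gives the same design rate as 1 − M/N.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical irregular parity-check matrix: column weights 3,2,2,2,2,2 and
# row weights 3,4,3,3 (13 edges in the Tanner graph).
H = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

def edge_degree_distributions(H):
    """Return (lam, rho): fraction of edges attached to nodes of each degree."""
    col_deg = [sum(col) for col in zip(*H)]   # variable-node degrees
    row_deg = [sum(row) for row in H]         # check-node degrees
    edges = sum(row_deg)
    lam = {d: Fraction(d * n, edges) for d, n in sorted(Counter(col_deg).items())}
    rho = {d: Fraction(d * n, edges) for d, n in sorted(Counter(row_deg).items())}
    return lam, rho

def design_rate(lam, rho):
    """Rd = 1 - (sum_i rho_i / i) / (sum_i lam_i / i)."""
    return 1 - sum(r / d for d, r in rho.items()) / sum(l / d for d, l in lam.items())

lam, rho = edge_degree_distributions(H)
print(lam)                    # lambda_2 = 10/13, lambda_3 = 3/13
print(rho)                    # rho_3 = 9/13, rho_4 = 4/13
print(design_rate(lam, rho))  # 1/3, equal to 1 - M/N = 1 - 4/6
```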

3.2.3 Constructing Regular LDPC Codes:

   •   Around 1996, MacKay and Neal described methods for constructing sparse H
       matrices

   •   The idea is to randomly generate a M × N matrix H with weight dv columns
       and weight dc rows, subject to some constraints

   •   Construction 1A: Overlap between any two columns is no greater than 1

          – This avoids length 4 cycles

   •   Construction 2A: M/2 columns have dv =2, with no overlap between any pair
       of columns. Remaining columns have dv =3. As with 1A, the overlap between
       any two columns is no greater than 1

   •   Construction 1B and 2B: Obtained by deleting select columns from 1A and 2A

          – Can result in a higher rate code
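The overlap constraint of Construction 1A can be sketched as follows. This is a simplified illustration, not MacKay and Neal's actual procedure: columns of weight dv are drawn at random among those that share at most one row with every earlier column, and the row weights are not forced to be uniform. The function names and the parameters M = 20, N = 30 are assumptions chosen so that a valid column always exists.

```python
import random
from itertools import combinations

def construct_overlap_free(M, N, dv, seed=0):
    """Random M x N matrix with column weight dv and pairwise column overlap <= 1."""
    rng = random.Random(seed)
    used_pairs = set()   # row pairs already shared by some column
    columns = []
    for _ in range(N):
        candidates = [rows for rows in combinations(range(M), dv)
                      if not any(p in used_pairs for p in combinations(rows, 2))]
        if not candidates:
            raise ValueError("no column satisfies the overlap constraint")
        rows = rng.choice(candidates)
        used_pairs.update(combinations(rows, 2))
        columns.append(rows)
    H = [[0] * N for _ in range(M)]
    for n, rows in enumerate(columns):
        for m in rows:
            H[m][n] = 1
    return H

def max_column_overlap(H):
    """Largest number of rows in which two distinct columns both contain a one."""
    cols = list(zip(*H))
    return max(sum(a & b for a, b in zip(c1, c2))
               for c1, c2 in combinations(cols, 2))

H = construct_overlap_free(M=20, N=30, dv=3)
print(max_column_overlap(H))  # 0 or 1: no length-4 cycles in the Tanner graph
```

Two columns that overlap in two rows form a length-4 cycle, so keeping the maximum overlap at 1 is exactly the condition stated for Construction 1A.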

3.2.4 Constructing Irregular LDPC Codes:

   •   Luby developed LDPC codes based on irregular LDPC Tanner graphs

   •   Message and check nodes have conflicting requirements

          – Message nodes benefit from having a large degree

           – LDPC codes perform better with check nodes having low degrees

   •   Irregular LDPC codes help balance these competing requirements

           – High degree message nodes converge to the correct value quickly

           – This increases the quality of information passed to the check nodes,
               which in turn helps the lower degree message nodes to converge

   •   Check node degree kept as uniform as possible and variable node degree is
       non-uniform

           – Code 14: Check node degree =14, Variable node degree =5, 6, 21, 23

   •   No attempt made to optimize the degree distribution for a given code rate

3.3 Constructions of LDPC codes:

       By constructions of LDPC codes, we mean the construction, or design, of a
particular LDPC parity-check matrix H. The design of H is the moment when the
asymptotic constraints (the parameters of the designed class, such as the degree
distribution and the rate) have to meet the practical constraints (finite dimension, girth).
Hereafter are described some recipes taking into account some practical constraints.
Two techniques exist in the literature: random and deterministic ones. The design
compromise is that for increasing the girth, the sparseness has to be increased,
yielding poor code performance due to a low minimum distance. On the contrary, for
a high minimum distance, the sparseness has to be decreased, which creates
short cycles because the dimensions of H are finite, and thus yields a poor
convergence of the belief propagation algorithm.

3.3.1 Random based construction:

       The first constructions of LDPC codes were random ones. The parity-check
matrix is the concatenation and/or superposition of sub-matrices; these sub-matrices
are created by applying permutations to a particular (random or not) sub-matrix
which usually has a column weight of 1. R. Gallager’s construction, for
example, is based on a short matrix H0. Then j − 1 matrices Пi(H0) are vertically stacked
on H0, where Пi(H0) denotes a column permutation of H0 (see figure 5), so that every
column of H has weight j.
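Gallager's stacking construction can be sketched in a few lines. The example below is an illustration under stated assumptions: the base matrix H0 places k consecutive ones in each row, the permutations are drawn from a seeded pseudo-random generator, and no attempt is made to avoid cycles of length 4. The function name `gallager_matrix` is hypothetical. The parameters match the regular (3, 4) code of length N = 12 mentioned in the caption of Figure 5.

```python
import random

def gallager_matrix(N, j, k, seed=0):
    """Regular (N, j, k) parity-check matrix: H0 plus j - 1 column permutations."""
    if N % k:
        raise ValueError("N must be a multiple of k")
    rng = random.Random(seed)
    # H0: row r has ones in columns r*k .. r*k + k - 1, so each column has weight 1.
    H0 = [[1 if r * k <= n < (r + 1) * k else 0 for n in range(N)]
          for r in range(N // k)]
    H = [row[:] for row in H0]
    for _ in range(j - 1):
        perm = list(range(N))
        rng.shuffle(perm)                      # random column permutation
        H.extend([[row[perm[n]] for n in range(N)] for row in H0])
    return H

H = gallager_matrix(N=12, j=3, k=4)
print(len(H), len(H[0]))                # 9 12
print({sum(row) for row in H})          # {4}: every row has weight k
print({sum(col) for col in zip(*H)})    # {3}: every column has weight j
```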

       Regular and irregular codes can also be constructed by creating the two sets of
nodes, each node appearing as many times as its degree. Then a
one-to-one association is randomly drawn between the nodes of the two sets, as
illustrated in figure 2.3. D. MacKay compares random constructions of regular and
irregular LDPC codes: small girths have to be avoided, especially between low-weight
variables.

       All the constructions described above should be constrained by the girth’s
value. Yet, increasing the girth from 4 to 6 and above is not trivial; some random
constructions specifically address this issue. The authors generate a parity check
matrix optimizing the length of the girth or the rate of the code when M is




       Figure 5: Some random constructions of regular LDPC parity-check matrices
based on Gallager’s (a) and MacKay’s constructions (b, c) (MacKay, Wilson, and
Davey 1999). Example of a regular (3, 4) LDPC code of length N = 12. Cycles of
length 4 have not been avoided. The permutations can be either column permutations
(a, b) or row permutations.




3.3.2 Deterministic based construction:

       Random constructions do not have many constraints: they can fit the
parameters of the desired class quite well. The problem is that they do not guarantee
that the girth will be large enough. So either post-processing or additional
constraints are added to the random design, sometimes at the cost of much complexity.
To circumvent the girth problem, deterministic constructions have been developed.
Moreover, explicit constructions can lead to easier encoding, and can also be easier
to handle in hardware. Two branches of combinatorial mathematics are involved in such
designs: finite geometry and balanced incomplete block designs (BIBDs). They seem to be
more efficient than a previous algebraic construction based on expander graphs.
High-rate LDPC codes based on Steiner systems have also been designed.

       Their conclusion was that the minimum distance was not high enough and that
difference set cyclic (DSC) codes should outperform them when combined
with one-step majority logic decoding. LDPC code
constructions based on finite geometry have also been presented, as in (Johnson and
Weller 2003a) for constructing very high rate LDPC codes. Balanced incomplete block
designs (BIBDs) have also been used.

       The major drawback of deterministic constructions of LDPC codes is that
they exist only for a few combinations of parameters. So it may be difficult to find one
that fits the specifications of a given system.

3.4 Code construction

       For large block sizes, LDPC codes are commonly constructed by first studying
the behaviour of decoders. As the block size tends to infinity, LDPC decoders can be
shown to have a noise threshold below which decoding is reliably achieved, and
above which decoding is not achieved. This threshold can be optimized by finding the
best proportion of arcs from check nodes and arcs from variable nodes. An
approximate graphical approach to visualizing this threshold is an EXIT chart.

       The construction of a specific LDPC code after this optimization falls into two
main types of techniques:

   •   Pseudo-random approaches

   •   Combinatorial approaches

        Construction by a pseudo-random approach builds on theoretical results that,
for large block size, a random construction gives good decoding performance. In
general, pseudo-random codes have complex encoders; however pseudo-random
codes with the best decoders can have simple encoders. Various constraints are often
applied to help ensure that the desired properties expected at the theoretical limit of
infinite block size occur at a finite block size.

        Combinatorial approaches can be used to optimize properties of small block-
size LDPC codes or to create codes with simple encoders.

3.4.1 Random number generation

        A random number generator (often abbreviated as RNG) is a computational or
physical device designed to generate a sequence of numbers or symbols that lack any
pattern, i.e. appear random.

        The many applications of randomness have led to the development of several
different methods for generating random data. Many of these have existed since
ancient times, including dice, coin flipping, the shuffling of playing cards, the use of
yarrow stalks (for divination) in the I Ching, and many other techniques. Because of
the mechanical nature of these techniques, generating large amounts of sufficiently
random numbers (important in statistics) required a lot of work and/or time. Thus,
results would sometimes be collected and distributed as random number
tables. Nowadays, after the advent of computational random number generators, a
growing number of government-run lotteries and lottery games use RNGs
instead of more traditional drawing methods, such as ping-pong or rubber balls.
RNGs are also used today to determine the odds of modern slot machines.

        Several computational methods for random number generation exist, but they
often fall short of the goal of true randomness, though they may meet, with varying
success, some of the statistical tests for randomness intended to measure how
unpredictable their results are (that is, to what degree their patterns are discernible).
Only in 2010 was the first truly random computational number generator produced,
resorting to principles of quantum physics.

3.4.2 Pseudorandom number generator

        A pseudorandom number generator (PRNG), also known as a deterministic
random bit generator (DRBG), is an algorithm for generating a sequence of numbers
that approximates the properties of random numbers. The sequence is not truly
random in that it is completely determined by a relatively small set of initial values,
called the PRNG's state. Although sequences that are closer to truly random can be
generated using hardware random number generators, pseudorandom numbers are
important in practice for simulations (e.g., of physical systems with the Monte Carlo
method), and are central in the practice of cryptography and procedural generation.
Common classes of these algorithms are linear congruential generators, Lagged
Fibonacci generators, linear feedback shift registers, feedback with carry shift
registers, and generalized feedback shift registers. Recent instances of pseudorandom
algorithms include Blum Blum Shub, Fortuna, and the Mersenne twister.

        Careful mathematical analysis is required to have any confidence a PRNG
generates numbers that are sufficiently "random" to suit the intended use. Robert
R. Coveyou of Oak Ridge National Laboratory once titled an article, "The generation
of random numbers is too important to be left to chance." As John von Neumann
joked, "Anyone who considers arithmetical methods of producing random digits is, of
course, in a state of sin."

Periodicity:

        A PRNG can be started from an arbitrary starting state using a seed state. It
will always produce the same sequence thereafter when initialized with that state. The
maximum length of the sequence before it begins to repeat is determined by the size
of the state, measured in bits. However, since the length of the maximum period
potentially doubles with each bit of 'state' added, it is easy to build PRNGs with
periods long enough for many practical applications.

        If a PRNG's internal state contains n bits, its period can be no longer than 2^n
results. For some PRNGs the period length can be calculated without walking through
the whole period. Linear feedback shift registers (LFSRs) are usually chosen to have
periods of exactly 2^n − 1. Linear congruential generators have periods that can be
calculated by factoring. Mixes (no restrictions) have periods of about 2^(n/2)
on average, usually after walking through a nonrepeating starting sequence. Mixes that
are reversible (permutations) have periods of about 2^(n−1) on average, and the period
will always include the original internal state. Although PRNGs will repeat their
results after they reach the end of their period, a repeated result does not imply that
the end of the period has been reached, since the internal state may be larger than the
output; this is particularly obvious with PRNGs with a 1-bit output.
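The period of an LFSR can be verified on a toy example. The sketch below assumes a 4-bit Fibonacci LFSR with feedback taps at bits 4 and 3 (a maximal-length choice); the function names are hypothetical. Starting from any non-zero seed it visits all 2^4 − 1 = 15 non-zero states before repeating, while the all-zero state is a fixed point. The output bit stream, by contrast, repeats individual values long before the period ends, as noted above for 1-bit outputs.

```python
def lfsr_step(state, taps=(4, 3), nbits=4):
    """Advance a Fibonacci LFSR by one step; taps are 1-indexed bit positions."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    return ((state << 1) | feedback) & ((1 << nbits) - 1)

def period(seed, taps=(4, 3), nbits=4):
    """Number of steps until the register returns to its seed state."""
    state = lfsr_step(seed, taps, nbits)
    count = 1
    while state != seed:
        state = lfsr_step(state, taps, nbits)
        count += 1
    return count

print(period(1))  # 15 = 2**4 - 1
print(period(0))  # 1: the all-zero state never leaves zero

# The same seed always reproduces the same sequence.
state, bits = 1, []
for _ in range(15):
    bits.append(state & 1)
    state = lfsr_step(state)
print(bits)
```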

       Most pseudorandom generator algorithms produce sequences which are
uniformly distributed by any of several tests. It is an open question, and one central to
the theory and practice of cryptography, whether there is any way to distinguish the
output of a high-quality PRNG from a truly random sequence without knowing the
algorithm(s) used and the state with which it was initialized. The security of most
cryptographic algorithms and protocols using PRNGs is based on the assumption that
it is infeasible to distinguish use of a suitable PRNG from use of a truly random
sequence. The simplest examples of this dependency are stream ciphers, which (most
often) work by exclusive or-ing the plaintext of a message with the output of a PRNG,
producing ciphertext. The design of cryptographically adequate PRNGs is extremely
difficult, because they must meet additional criteria. The size of its period
is an important factor in the cryptographic suitability of a PRNG, but not the only one.

3.4.3 "True" random numbers vs. pseudorandom numbers:

       There are two principal methods used to generate random numbers. One
measures some physical phenomenon that is expected to be random and then
compensates for possible biases in the measurement process. The other uses
computational algorithms that produce long sequences of apparently random results,
which are in fact completely determined by a shorter initial value, known as a seed or
key. The latter types are often called pseudorandom number generators.

       A "random number generator" based solely on deterministic computation
cannot be regarded as a "true" random number generator, since its output is inherently
predictable. How to distinguish a "true" random number from the output of a
pseudo-random number generator is a very difficult problem. However, carefully chosen
pseudo-random number generators can be used instead of true random numbers in
many applications. Rigorous statistical analysis of the output is often needed to have
confidence in the algorithm.

3.5 Encoding of LDPC codes:

   •   A linear block code is encoded by performing the matrix multiplication c = uG

   •   A common method for finding G from H is to first make the code systematic
       by adding rows and exchanging columns to get the H matrix in the form
       H = [P^T I]

             – Then G = [I P]

             – However, the result of the row reduction is a non-sparse P matrix

             – The multiplication c = [u uP] is therefore very complex

   •   As an example, for a (10000, 5000) code, P is 5000 by 5000

             – Assuming the density of 1’s in P is 0.5, then 0.5 × (5000)^2 = 1.25 × 10^7
                additions are required per codeword

   •   This is especially problematic since we are interested in large n (> 10^5)

   •   An often used approach is to use the all-zero codeword in simulations

   •   Richardson and Urbanke show that even for large n, the encoding complexity
       can be an (almost) linear function of n

         -    “Efficient encoding of low-density parity-check codes”, IEEE Trans. Inf.
              Theory, Feb., 2001

   •   Using only row and column permutations, H is converted to an approximately
       lower triangular matrix

         -    Since only permutations are used, H is still sparse

         -    The resulting encoding complexity is almost linear as a function of n

         -    An alternative involving a sparse-matrix multiply followed by
              differential encoding has been proposed by Ryan, Yang, & Li

         -    “Lowering the error-rate floors of moderate-length high-rate irregular
              LDPC codes,” ISIT, 2003
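The systematic encoding c = [u uP] and its relation to H = [P^T I] can be checked on a tiny example. The 4 × 3 matrix P below is a hypothetical Hamming-like choice made for this sketch (it is not sparse and not an LDPC code); the point is only that every codeword produced by G = [I P] satisfies H·c^T = 0.

```python
# Hypothetical parity part P (K = 4 information bits, 3 parity bits).
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

def encode(u, P):
    """Systematic encoding c = [u, uP] over GF(2)."""
    parity = [sum(u[i] * P[i][j] for i in range(len(P))) % 2
              for j in range(len(P[0]))]
    return list(u) + parity

def parity_check_matrix(P):
    """Build H = [P^T I] from the parity part P."""
    K, R = len(P), len(P[0])
    return [[P[i][j] for i in range(K)] + [1 if t == j else 0 for t in range(R)]
            for j in range(R)]

def syndrome(H, c):
    """H * c^T over GF(2)."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]

H = parity_check_matrix(P)
c = encode([1, 0, 1, 1], P)
print(c)               # [1, 0, 1, 1, 0, 1, 0]
print(syndrome(H, c))  # [0, 0, 0]
```

Each parity bit costs one pass over a column of P, so for a dense K × (N − K) matrix P the work per codeword grows with K·(N − K), which is the quadratic cost the bullet points above refer to.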

         The weak point of LDPC codes is their encoding process: a sparse
parity-check matrix does not necessarily have a sparse generator matrix. Moreover, the
generator matrix tends to be particularly dense. So encoding by a multiplication with G
leads to a complexity of O(N^2). A first encoding scheme is to deal with lower
triangular shape parity-check matrices. The other encoding schemes mainly deal with
cyclic parity-check matrices.

3.5.1 Lower-triangular shape based encoding:

         A first approach is to create a parity-check matrix with an almost
lower-triangular shape, as depicted on figure 2.4(a). The performance is slightly
affected by the lower-triangular shape constraint. Instead of computing the product
c = uG^T, the equation H·c^T = 0 is solved, where c is the unknown variable. The encoding
is systematic:




         The last                                          have to be solved without
reduced complexity. Thus, the higher M1 is, the less complex the encoding is. T.
Richardson and R. Urbanke propose an efficient encoding of a parity-check matrix H.
It is based on the shape depicted on figure 2.4(b). They also propose some “greedy”
algorithms which transform any parity-check matrix H into an equivalent parity-check
matrix H0 using column and row permutations, minimizing g. So H0 is still sparse.
The encoding complexity scales as O(N + g^2), where g is a small fraction of N. As a
particular case, some authors construct parity-check matrices of the same shape
with g = 0.
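For the particular case g = 0, solving H·c^T = 0 reduces to a substitution that computes one parity bit per row. The sketch below assumes H = [A | T], where T is lower triangular with ones on its diagonal; the 3 × 6 matrix and the function name are hypothetical choices for illustration.

```python
# Hypothetical parity-check matrix H = [A | T] with T lower triangular (g = 0).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 1],
]

def encode_lower_triangular(H, u):
    """Solve H * c^T = 0 for the parity bits, row by row, with c = [u, p]."""
    M, N = len(H), len(H[0])
    K = N - M
    c = list(u) + [0] * M
    for m in range(M):
        # Row m involves the information bits and the parity bits p_0 .. p_m.
        # H[m][K + m] is assumed to be 1, so p_m is the sum of all other terms.
        c[K + m] = sum(H[m][n] * c[n] for n in range(K + m)) % 2
    return c

def syndrome(H, c):
    """H * c^T over GF(2)."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]

c = encode_lower_triangular(H, [1, 0, 1])
print(c)               # [1, 0, 1, 1, 0, 0]
print(syndrome(H, c))  # [0, 0, 0]
```

Since each row of a sparse H holds only a few ones, the number of operations grows linearly with N, which is the motivation for the lower-triangular shape.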

3.5.2 Other encoding schemes:

Iterative encoding

       A class of parity-check codes has been derived which can be iteratively
encoded using the same graph-based algorithm as the decoder. But for irregular cases,
the codes do not seem to perform as well as random ones.

Low-density generator matrices:

       The generator matrices of LDPC codes are usually not sparse, because of the
inversion. But if H is constructed both sparse and systematic, then:




       where G is a sparse generator matrix; such codes are called low-density
generator matrix (LDGM) codes and correspond to parallel concatenated codes. They seem
to have high error floors (asymptotically bad codes). Yet, the constituent codes can be
carefully chosen and concatenated to lower the error floor. Note that this may be a
drawback for applications with high rate codes.

Cyclic parity-check matrices:

       The most popular codes that can be easily encoded are the cyclic or
pseudo-cyclic ones. A Gallager-like construction using cyclic shifts makes a cyclic
encoder possible. LDPC codes constructed from finite geometries or BIBDs are also
cyclic or pseudo-cyclic. Table 2 gives a summary of the different encoding schemes.




             Table 2: Summary of the different LDPC encoding schemes

3.6 Decoding LDPC codes:

   •   Like turbo codes, LDPC codes can be decoded iteratively

           – Instead of a trellis, the decoding takes place on a Tanner graph

           – Messages are exchanged between the v-nodes and c-nodes

           – Edges of the graph act as information pathways

   •   Hard decision decoding

           – Bit-flipping algorithm

   •   Soft decision decoding

           – Sum-product algorithm

               •   Also known as message passing / belief propagation algorithm

           – Min-sum algorithm

               •   Reduced complexity approximation to the sum-product
                   algorithm

   •   In general, the per-iteration complexity of LDPC codes is less than it is for
       turbo codes

           – However, many more iterations may be required (max 100; avg 30)

           – Thus, overall complexity can be higher than for turbo codes
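The hard-decision case can be illustrated with a minimal bit-flipping sketch. It is one simple variant among several, written under stated assumptions: at each iteration the syndrome is computed, every bit counts the unsatisfied checks it takes part in, and all bits with the highest count are flipped. Decoding stops when the syndrome is zero and repeats otherwise. The matrix is the 6 × 9 example from the LDPC Codes section, and the codeword is hand-picked for illustration.

```python
H = [
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
]

def syndrome(H, v):
    """H * v^T over GF(2): entry m is 1 when parity check m is unsatisfied."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def bit_flip_decode(H, received, max_iter=10):
    """Return (estimate, success) after at most max_iter flipping rounds."""
    v = list(received)
    for _ in range(max_iter):
        s = syndrome(H, v)
        if not any(s):
            return v, True                       # = 0: stop decoding
        # Number of unsatisfied checks each bit participates in.
        counts = [sum(s[m] for m in range(len(H)) if H[m][n])
                  for n in range(len(v))]
        worst = max(counts)
        v = [bit ^ 1 if counts[n] == worst else bit for n, bit in enumerate(v)]
    return v, not any(syndrome(H, v))            # != 0 so far: one last check

codeword = [1, 0, 1, 0, 1, 0, 0, 1, 0]           # hand-picked valid codeword
received = codeword.copy()
received[4] ^= 1                                 # single bit error on the channel

decoded, ok = bit_flip_decode(H, received)
print(ok, decoded == codeword)  # True True
```

With a column weight of only 2 this toy code corrects a single error; soft-decision algorithms such as sum-product or min-sum replace the integer counts by reliability messages exchanged along the edges of the Tanner graph.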




                                    CHAPTER IV

                              TERMINOLOGY

       WiMAX (Worldwide Interoperability for Microwave Access) is a
telecommunications protocol that provides fixed and fully mobile internet access. The
current WiMAX revision provides up to 40 Mbit/s, with the IEEE 802.16m update
expected to offer up to 1 Gbit/s fixed speeds. The name "WiMAX" was created by the
WiMAX Forum, which was formed in June 2001 to promote conformity and
interoperability of the standard. The forum describes WiMAX as "a standards-based
technology enabling the delivery of last mile wireless broadband access as an
alternative to cable and DSL".




       WiMAX base station equipment with a sector antenna and wireless modem on
top




       A pre-WiMAX CPE of a 26 km (16 mi) connection mounted 13 metres (43 ft)
above the ground (2004, Lithuania).


        WiMAX refers to interoperable implementations of the IEEE 802.16 wireless-
networks standard (ratified by the WiMAX Forum), in similarity with Wi-Fi, which
refers to interoperable implementations of the IEEE 802.11 Wireless LAN standard
(ratified by the Wi-Fi Alliance). The WiMAX Forum certification allows vendors to
sell their equipment as WiMAX (Fixed or Mobile) certified, thus ensuring a level of
interoperability with other certified products, as long as they fit the same profile.

   The IEEE 802.16 standard forms the basis of 'WiMAX' and is sometimes referred
to colloquially as "WiMAX", "Fixed WiMAX", "Mobile WiMAX", "802.16d" and
"802.16e." The formal names are as follows:

       802.16-2004 is also known as 802.16d, which refers to the working party that
        has developed that standard. It is sometimes referred to as "Fixed WiMAX,"
        since it has no support for mobility.

       802.16e-2005, often abbreviated to 802.16e, is an amendment to 802.16-2004.
        It introduced support for mobility, among other things and is therefore also
        known as "Mobile WiMAX".

   Mobile WiMAX is the WiMAX incarnation that has the most commercial interest
to date and is being actively deployed in many countries. Mobile WiMAX is also the
basis of future revisions of WiMAX. As such, references to and comparisons with
"WiMAX" in this chapter mean "Mobile WiMAX".

Uses:

        The bandwidth and range of WiMAX make it suitable for the following
potential applications:
      Providing portable mobile broadband connectivity across cities and countries
       through a variety of devices.

      Providing a wireless alternative to cable and DSL for "last mile" broadband
       access.

      Providing data, telecommunications (VoIP) and IPTV services (triple play).

      Providing a source of Internet connectivity as part of a business continuity
       plan.

      Providing a network to facilitate machine to machine communications, such as
       for Smart Metering.

Broadband:

       Companies are deploying WiMAX to provide mobile broadband or at-home
broadband connectivity across whole cities or countries. In many cases this has
resulted in competition in markets which typically only had access to broadband
through an existing incumbent DSL (or similar) operator.

       Additionally, given the relatively low cost to deploy a WiMAX network (in
comparison to GSM, DSL or Fiber-Optic), it is now possible to provide broadband in
places where it may have not been economically viable.




                      A WiMAX USB modem for mobile internet

       There are numerous devices on the market that provide connectivity to a
WiMAX network. These are known as the "subscriber unit" (SU).


        There is an increasing focus on portable units. This includes handsets (similar
to cellular smart phones); PC peripherals (PC Cards or USB dongles); and embedded
devices in laptops, which are now available for Wi-Fi services. In addition, there is
much emphasis by operators on consumer electronics devices such as Gaming
consoles, MP3 players and similar devices. It is notable that WiMAX is more similar
to Wi-Fi than to 3G cellular technologies.

        The WiMAX Forum website provides a list of certified devices. However, this
is not a complete list of devices available as certified modules are embedded into
laptops, MIDs (Mobile internet devices), and other private labeled devices.

WiMAX Gateways:

    WiMAX gateway devices are available as both indoor and outdoor versions from
several manufacturers. Many of the WiMAX gateways offered by manufacturers such
as ZyXEL, Motorola, and Greenpacket are stand-alone, self-install indoor units. Such
devices typically sit near the customer's window with the best WiMAX signal, and
provide:

       An integrated Wi-Fi access point to provide the WiMAX Internet connectivity
        to multiple devices throughout the home or business.

       Ethernet ports should you wish to connect directly to your computer or DVR
        instead.

       One or two PSTN telephone jacks to connect your land-line phone and take
        advantage of VoIP.

    Indoor gateways are convenient, but radio losses mean that the subscriber may
need to be significantly closer to the WiMAX base station than with professionally-
installed external units.

    Outdoor units are roughly the size of a laptop PC, and their installation is
comparable to the installation of a residential satellite dish. A higher-gain directional
outdoor unit will generally result in greatly increased range and throughput but with
the obvious loss of practical mobility of the unit.



WiMAX Mobiles:

        HTC announced the first WiMAX-enabled mobile phone, the Max 4G, on
November 12, 2008. The device was only available in certain markets in Russia on the
Yota network.

        HTC unveiled the second WiMAX-enabled mobile phone, the EVO 4G, on March
23, 2010 at the CTIA conference in Las Vegas. The device, made available on June 4,
2010, is capable of EV-DO (3G) and WiMAX (4G) as well as simultaneous data and
voice sessions. The device also has a front-facing camera that enables video
conversations. A number of WiMAX mobiles are expected to reach the US market in
2010.

Technical information:




                             Illustration of a WiMAX MIMO board

4.1 WiMAX and the IEEE 802.16 Standard:

        The current WiMAX revision is based upon IEEE Std 802.16e-2005, approved
in December 2005. It is a supplement to the IEEE Std 802.16-2004, and so the actual
standard is 802.16-2004 as amended by 802.16e-2005. Thus, these specifications need
to be considered together.

        IEEE 802.16e-2005 improves upon IEEE 802.16-2004 by:




      Adding support for mobility (soft and hard handover between base stations).
       This is seen as one of the most important aspects of 802.16e-2005, and is the
       very basis of Mobile WiMAX.

      Scaling of the Fast Fourier Transform (FFT) to the channel bandwidth in order
       to keep the carrier spacing constant across different channel bandwidths
       (typically 1.25 MHz, 5 MHz, 10 MHz or 20 MHz). Constant carrier spacing
       results in a higher spectrum efficiency in wide channels, and a cost reduction
       in narrow channels. Also known as Scalable OFDMA (SOFDMA). Other
       bands not multiples of 1.25 MHz are defined in the standard, but because the
       allowed FFT subcarrier numbers are only 128, 512, 1024 and 2048, other
       frequency bands will not have exactly the same carrier spacing, which might
       not be optimal for implementations.

      Advanced antenna diversity schemes, and hybrid automatic repeat-request
       (HARQ)

      Adaptive antenna Systems (AAS) and MIMO technology

      Denser sub-channelization, thereby improving indoor penetration

      Introducing Turbo Coding and Low-Density Parity Check (LDPC)

      Introducing downlink sub-channelization, allowing administrators to trade
       coverage for capacity or vice versa

      Fast Fourier Transform algorithm

      Adding an extra QoS class for VoIP applications.
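
       The SOFDMA scaling rule in the list above can be checked numerically. The
sketch below is illustrative only: it assumes a sampling factor of 28/25, which is
not stated in this chapter and is taken here as the 802.16e value for channel
bandwidths that are multiples of 1.25 MHz. The function name and the profile table
are hypothetical. It shows that pairing each listed bandwidth with the matching
FFT size gives the same carrier spacing.

```python
from fractions import Fraction

# Assumption (not from this chapter): OFDMA sampling factor n = 28/25,
# used for channel bandwidths that are multiples of 1.25 MHz.
SAMPLING_FACTOR = Fraction(28, 25)


def subcarrier_spacing_hz(bandwidth_hz, fft_size):
    """Carrier spacing = sampling frequency / FFT size."""
    return SAMPLING_FACTOR * bandwidth_hz / fft_size


# Channel bandwidth (Hz) -> FFT size, as listed in the text above.
profiles = {1_250_000: 128, 5_000_000: 512, 10_000_000: 1024, 20_000_000: 2048}

spacings = {bw: subcarrier_spacing_hz(bw, n) for bw, n in profiles.items()}

for bw, spacing in spacings.items():
    print(f"{bw / 1e6:g} MHz, FFT {profiles[bw]}: {float(spacing):.1f} Hz")
```

Under this assumption every profile yields the same spacing, which is the property
that SOFDMA is designed to preserve.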

       SOFDMA (used in 802.16e-2005) and OFDM256 (802.16d) are not
compatible thus equipment will have to be replaced if an operator is to move to the
later standard (e.g., Fixed WiMAX to Mobile WiMAX).

Physical layer:

       The original version of the standard on which WiMAX is based (IEEE 802.16)
specified a physical layer operating in the 10 to 66 GHz range. 802.16a, updated in
2004 to 802.16-2004, added specifications for the 2 to 11 GHz range. 802.16-2004
was updated in 2005 by 802.16e-2005, which uses Scalable Orthogonal Frequency-
Division Multiple Access (SOFDMA) as opposed to the fixed Orthogonal Frequency-
Division Multiplexing (OFDM) version with 256 sub-carriers (of which 200 are
used) in 802.16d. More advanced versions, including 802.16e, also bring multiple
antenna support through MIMO (see WiMAX MIMO). This brings potential benefits
in terms of coverage, self installation, power consumption, frequency re-use and
bandwidth efficiency.

MAC (data link) layer:

        The WiMAX MAC uses a Scheduling algorithm for which the subscriber
station needs to compete only once for initial entry into the network. After network
entry is allowed, the subscriber station is allocated an access slot by the base station.
The time slot can enlarge and contract, but remains assigned to the subscriber station,
which means that other subscribers cannot use it.

        In addition to being stable under overload and over-subscription, the
scheduling algorithm can also be more bandwidth efficient. The scheduling algorithm
also allows the base station to control Quality of Service (QoS) parameters by
balancing the time-slot assignments among the application needs of the subscriber
stations.

Spectrum allocation:

        There is no uniform global licensed spectrum for WiMAX; however the
WiMAX Forum has published three licensed spectrum profiles: 2.3 GHz, 2.5 GHz
and 3.5 GHz, in an effort to drive standardization and decrease cost.

        In the USA, the biggest segment available is around 2.5 GHz, and is already
assigned, primarily to Sprint Nextel and Clearwire. Elsewhere in the world, the most
likely bands to be used are the Forum-approved ones, with 2.3 GHz probably being
most important in Asia. Some countries in Asia, such as India and Indonesia, will use
a mix of 2.5 GHz, 3.3 GHz and other frequencies. Pakistan's Wateen Telecom uses
3.5 GHz.

        Analog TV bands (700 MHz) may become available for WiMAX usage, but
await the complete roll out of digital TV, and there will be other uses suggested for
that spectrum. In the USA the FCC auction for this spectrum began in January 2008
and, as a result, the biggest share of the spectrum went to Verizon Wireless and the
next biggest to AT&T. Both of these companies have stated their intention of
supporting LTE, a technology which competes directly with WiMAX. EU
commissioner Viviane Reding has suggested re-allocation of 500–800 MHz spectrum
for wireless communication, including WiMAX.

       WiMAX profiles define channel size, TDD/FDD and other necessary
attributes in order to have inter-operating products. The current fixed profiles are
defined for both TDD and FDD profiles. At this point, all of the mobile profiles are
TDD only. The fixed profiles have channel sizes of 3.5 MHz, 5 MHz, 7 MHz and
10 MHz. The mobile profiles are 5 MHz, 8.75 MHz and 10 MHz. (Note: the 802.16
standard allows a far wider variety of channels, but only the above subsets are
supported as WiMAX profiles.)

       In October 2007, the Radiocommunication Sector of the International
Telecommunication Union (ITU-R) decided to include WiMAX technology in the
IMT-2000 set of standards.[21] This enables spectrum owners (specifically in the
2.5–2.69 GHz band at this stage) to use WiMAX equipment in any country that
recognizes IMT-2000.

Spectral efficiency:

       One of the significant advantages of advanced wireless systems such as
WiMAX is spectral efficiency. For example, 802.16-2004 (fixed) has a spectral
efficiency of 3.7 (bit/s)/Hertz, and other 3.5–4G wireless systems offer spectral
efficiencies that are similar to within a few tenths of a percent. This multiplies the
effective spectral efficiency through multiple reuse and smart network deployment
topologies. The direct use of frequency domain organization simplifies designs using
MIMO-AAS compared to CDMA/WCDMA methods, resulting in more effective
systems.

Inherent Limitations:

       A commonly-held misconception is that WiMAX will deliver 70 Mbit/s over
50 kilometers. Like all wireless technologies, WiMAX can either operate at higher

bitrates or over longer distances but not both: operating at the maximum range of
50 km (31 miles) increases bit error rate and thus results in a much lower bitrate.
Conversely, reducing the range (to under 1 km) allows a device to operate at higher
bitrates.

         A recent city-wide deployment of WiMAX in Perth, Australia, has
demonstrated that customers at the cell-edge with an indoor CPE typically obtain
speeds of around 1–4 Mbit/s, with users closer to the cell tower obtaining speeds of up
to 30 Mbit/s.

         Like all wireless systems, available bandwidth is shared between users in a
given radio sector, so performance could deteriorate in the case of many active users
in a single sector. However, with adequate capacity planning and the use of WiMAX's
Quality of Service, a minimum guaranteed throughput for each subscriber can be put
in place. In practice, most users will have a range of 4-8 Mbit/s services and
additional radio cards will be added to the base station to increase the number of users
that may be served as required.

Silicon implementations:

         A critical requirement for the success of a new technology is the availability of
low-cost chipsets and silicon implementations.

         WiMAX has a strong silicon ecosystem with a number of specialized
companies producing baseband ICs and integrated RFICs for implementing full-
featured WiMAX Subscriber Stations in the 2.3, 2.5 and 3.5 GHz bands (refer to
'Spectrum allocation' above). It is notable that most of the major semiconductor
companies have not developed WiMAX chipsets of their own and have instead
chosen to invest in and/or use the well-developed products from smaller specialists
or start-up suppliers. These companies include, but are not limited to, Beceem,
Sequans and picoChip. The chipsets from these companies are used in the majority of
WiMAX devices.

         Intel Corporation is a leader in promoting WiMAX, but has limited its
WiMAX chipset development and instead chosen to invest in these specialized



companies producing silicon compatible with the various WiMAX deployments
throughout the globe.

Comparison with Wi-Fi:

       Comparisons and confusion between WiMAX and Wi-Fi are frequent because
both are related to wireless connectivity and Internet access.

      WiMAX is a long range system, covering many kilometers that uses licensed
       or unlicensed spectrum to deliver connection to a network, in most cases the
       Internet.

      Wi-Fi uses unlicensed spectrum to provide access to a local network.

      Wi-Fi is more popular in end user devices.

      Wi-Fi runs on the Media Access Control's CSMA/CA protocol, which is
       connectionless and contention based, whereas WiMAX runs a connection-
       oriented MAC.

      WiMAX and Wi-Fi have quite different quality of service (QoS) mechanisms:

           o      WiMAX uses a QoS mechanism based on connections between the
                  base station and the user device. Each connection is based on specific
                  scheduling algorithms.

           o      Wi-Fi uses contention access - all subscriber stations that wish to pass
                  data through a wireless access point (AP) are competing for the AP's
                  attention on a random interrupt basis. This can cause subscriber
                  stations distant from the AP to be repeatedly interrupted by closer
                  stations, greatly reducing their throughput.

      Both 802.11 and 802.16 define Peer-to-Peer (P2P) and ad hoc networks, where
       an end user communicates with users or servers on another Local Area Network
       (LAN) using its access point or base station. However, 802.11 also supports
       direct ad hoc or peer-to-peer networking between end user devices without an
       access point, while 802.16 end user devices must be in range of the base
       station.

       Wi-Fi and WiMAX are complementary. WiMAX network operators typically
provide a WiMAX Subscriber Unit which connects to the metropolitan WiMAX
network and provides Wi-Fi within the home or business for local devices (e.g.,
Laptops, Wi-Fi Handsets, smartphones) for connectivity. This enables the user to
place the WiMAX Subscriber Unit in the best reception area (such as a window), and
still be able to use the WiMAX network from any place within their residence.

WiMAX Forum:

       The WiMAX Forum is a non profit organization formed to promote the
adoption of WiMAX compatible products and services.

       A major role for the organization is to certify the interoperability of WiMAX
products. Those that pass conformance and interoperability testing achieve the
"WiMAX Forum Certified" designation, and can display this mark on their products
and marketing materials. Some vendors claim that their equipment is "WiMAX-
ready", "WiMAX-compliant", or "pre-WiMAX" if it is not officially WiMAX
Forum Certified.
       Another role of the WiMAX Forum is to promote the spread of knowledge
about WiMAX. In order to do so, it has a certified training program that is currently
offered in English and French. It also offers a series of member events and endorses
some industry events.

WiMAX Spectrum Owners Alliance:




                                            WiSOA logo

WiSOA was the first global organization composed exclusively of owners of WiMAX
spectrum with plans to deploy WiMAX technology in those bands. WiSOA focussed
on the regulation, commercialisation, and deployment of WiMAX spectrum in the
2.3–2.5 GHz and the 3.4–3.5 GHz ranges. WiSOA merged with the Wireless
Broadband Alliance in April 2008.


                                           CHAPTER V

   Approximate Layered Decoding Approach for Pipelined
                       Decoding
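         Recently, the layered decoding approach has been found to converge much
faster than the conventional TPMP decoding approach. With the layered decoding
approach, the parity check matrix $H$ of an LDPC code is partitioned into $L$ layers,
$H = [H_1^T \; H_2^T \; \cdots \; H_L^T]^T$. The layer $H_t$ defines a supercode $C_t$,
and the original LDPC code is the intersection of all supercodes:
$C = C_1 \cap C_2 \cap \cdots \cap C_L$. The column weight of each layer is at most 1.

         Let $R_{cv}$ denote the check-to-variable message from the check node $c$ to
the variable node $v$, and let $L_{cv}$ represent the variable-to-check message from
the variable node $v$ to the check node $c$. In the $k$th iteration, the log-likelihood
ratio (LLR) message from layer $t$ to the next layer for variable node $v$ is
represented by $L_v^{(k,t)}$, where $t = 1, 2, \ldots, L$. The layered message passing
with the Min-Sum algorithm can be formulated as (2)–(4).

$$L_{cv}^{(k,t)} = L_v^{(k,t-1)} - R_{cv}^{(k-1,t)} \tag{2}$$

$$R_{cv}^{(k,t)} = \prod_{n \in N(c) \setminus v} \operatorname{sign}\big(L_{cn}^{(k,t)}\big) \cdot \min_{n \in N(c) \setminus v} \big|L_{cn}^{(k,t)}\big| \tag{3}$$

$$L_v^{(k,t)} = L_{cv}^{(k,t)} + R_{cv}^{(k,t)} \tag{4}$$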

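         The layered message passing with the Min-Sum algorithm described above can
be sketched in software. The following is a minimal illustration, not the decoder
proposed in this chapter: the function name, the data layout, and the small (7, 4)
Hamming parity check matrix used as input are all assumptions made for the example.
Each row is placed in its own layer so that the column weight of every layer is at
most 1, and no scaling factor is applied.

```python
def layered_min_sum(H, channel_llr, layers, max_iter=15):
    """Layered min-sum decoding sketch (every check row needs weight >= 2).

    H           -- parity check matrix as a list of 0/1 rows
    channel_llr -- intrinsic LLRs (a positive value favours bit 0)
    layers      -- list of row-index lists; column weight per layer <= 1
    """
    n = len(channel_llr)
    llr = list(channel_llr)          # LLR message of each variable node
    r_msg = {}                       # check-to-variable messages, keyed by (c, v)
    nbrs = [[v for v in range(n) if row[v]] for row in H]
    hard = [1 if x < 0 else 0 for x in llr]
    for _ in range(max_iter):
        for rows in layers:
            for c in rows:
                vs = nbrs[c]
                # Variable-to-check: remove this check's previous contribution.
                l_cv = [llr[v] - r_msg.get((c, v), 0.0) for v in vs]
                mags = [abs(x) for x in l_cv]
                i_min = min(range(len(vs)), key=mags.__getitem__)
                min1 = mags[i_min]
                min2 = min(m for i, m in enumerate(mags) if i != i_min)
                sign_all = 1
                for x in l_cv:
                    if x < 0:
                        sign_all = -sign_all
                for i, v in enumerate(vs):
                    # Sign of the product over all *other* inputs.
                    s = -sign_all if l_cv[i] < 0 else sign_all
                    r_new = s * (min2 if i == i_min else min1)
                    r_msg[(c, v)] = r_new
                    llr[v] = l_cv[i] + r_new   # updated LLR passed to next layer
        hard = [1 if x < 0 else 0 for x in llr]
        if all(sum(hard[v] for v in nbrs[c]) % 2 == 0 for c in range(len(H))):
            break                    # all parity checks satisfied
    return hard, llr


# Hypothetical input: all-zero codeword with one unreliable (wrong-sign) bit.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
layers = [[0], [1], [2]]
received = [2.0, 2.0, 2.0, 2.0, -1.0, 2.0, 2.0]
decoded, posterior = layered_min_sum(H, received, layers)
print(decoded)
```

Because each layer uses the LLR values already updated by the previous layer, the
update of one layer cannot start until the previous layer has finished, which is the
data dependency that complicates pipelining.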
         In (3), $N(c) \setminus v$ denotes the set of variable nodes connected to the
check node $c$, excluding the variable node $v$. In an LDPC decoder, the check node
unit (CNU) performs the computation shown in (3), and the variable node unit (VNU)
performs (2) and (4). When all soft messages corresponding to the 1-components in an
entire block row of the parity check matrix are processed in one clock period, the
computations shown in (2)–(4) are performed sequentially. The long computation delay
in the CNU inevitably limits the maximum achievable clock speed. Usually, pipelining
can be used to shorten the critical path in computing units. However, because of the
data dependency between two consecutive layers in layered decoding, pipelining
cannot be applied directly. For instance, suppose that a one-stage pipelining latch is
introduced into every CNU. To compute the variable-to-check messages corresponding
to the third block row of the parity check matrix, the LLR messages from layer 2 are
needed, which cannot be determined until the check-to-variable messages of layer 2
are computed with (3). Because of the one-clock delay caused by the pipelining stage
in the CNUs, these check-to-variable messages are not available in the required clock
cycle. The data dependency between layer 3 and layer 2 occurs at columns 4, 8, 9, and
13, as marked by bold squares in Fig. 2. To enable pipelined decoding, we propose an
approximation of the layered decoding approach. Let us rewrite (3) as the following:




         where              is the variable node set            The data dependency between

layer     and         occurs in the column positions corresponding to the variable node

set        For the variable nodes            belonging to the variable node set               the
following equation is satisfied:




         The item                                        in (6) is the incremental change of

message          corresponding to layer             in the       decoding iteration, where

represents the check node in the layer                 that connects to         In iterative LDPC
decoding,       the   (6)      can    be     approximated      using      (7)     if   the   item

                                  is used for updating            instead of           . Thus, by


slightly changing the updating order of message      corresponding to the variable
node set    the (2) can be approximated by (7).




       Based on the previous consideration, an approximate layered decoding
approach is formulated as (8)–(10).




       Where         is a small integer. In order to demonstrate the decoding
performance of the proposed approach, a (3456, 1728), (3, 6) rate-0.5 QC-LDPC code
constructed with progressive edge-growth (PEG) approach [12] is used. Its parity
check matrix is permuted as discussed in Section II. The number of rows in each layer
is 144. The parameter        in (8) and (10) is set to 2 to enable two stage pipelines. The
maximum iteration number is set to 15. It can be observed from Fig. 3 that the
proposed approach has about 0.05 dB performance degradation compared with the
standard layered decoding scheme. The conventional TPMP approach has about 0.2
dB performance loss compared with the standard layered decoding scheme because of
its slow convergence speed. It should be noted that, by increasing the maximum
iteration number, the performance gap among the three decoding schemes decreases.
However, the achievable decoding throughput is reduced.

5.1 Decoder Architecture with Layered Decoding Approach

5.1.1 Overall Decoder Architecture:

      The proposed decoder computes the check-to-variable messages, variable-to-
check messages, and LLR messages corresponding to an entire block row of the parity
check matrix in one clock cycle. The decoder architecture is shown in Fig. 6. It
consists of the following five portions.

1) L layers of R-register arrays. Each layer stores the check-to-variable messages
corresponding to the 1-components in a block row of the parity check matrix. At each
clock cycle, the check-to-variable messages in one layer are vertically shifted down to
the adjacent layer.

2) A check node unit (CNU) array for generating the check-to-variable messages for
one layer of the R-register array in a clock cycle. The dashed lines in the CNU array
denote two pipeline stages.

3) LLR-register arrays. Each LLR-register array stores the LLR messages
corresponding to a block column of the parity check matrix.

4) Variable node unit (VNU) arrays. Each VNU array computes the variable-to-check
messages and LLR messages corresponding to a block column of the parity check
matrix. Each VNU is composed of two adders.

5) Data shifters. The LLR messages corresponding to a block column of the parity
check matrix are shifted one step by a data shifter array.




                                 Figure 6: Decoder Architecture

In Fig. 6, each VNU, MUX, and data shifter block represents an array of computing
units.

          In the decoding initialization, the intrinsic messages are transferred to the
LLR-register arrays via the MUX1 arrays. During the first few clock cycles, the
check-to-variable messages are not available because of the pipeline stages in the
CNU array. Therefore, the MUX2 arrays are needed to prevent the LLR-registers from
being updated. In one clock cycle, only a portion of the LLR messages are updated.
The updated LLR messages, which correspond to the 1-components in the current
layer of the parity check matrix, are sent to the data shifter via the computation path.
The remaining LLR messages are sent directly to the data shifter from the LLR-
register array.




5.1.2 Critical path of the Proposed Architecture:

               The computation path of the proposed architecture is shown in Fig. 5.
The equations shown in (8)–(10) are performed sequentially. The computation results
of (8) are represented in two’s complement format. It is convenient to use the
sign-magnitude representation for the computation expressed in (9). Thus, two’s
complement to sign-magnitude data conversion is needed before data are sent to the
CNU. The check-to-variable messages from the CNU array and the R-register arrays
are in a compressed form to reduce the memory requirement, as explained below. To
recover the individual check-to-variable messages, a data distributor is needed. The
messages sent out by the data distributor are in sign-magnitude representation.
Consequently, sign-magnitude to two’s complement conversion is needed before data
are sent to the VNU. In this design, the computation path is divided into three
segments. The implementation of the SM-to-2’S unit and the adder in segment-1 can
be optimized by merging the adder into the SM-to-2’S unit to reduce computation
delay.

               With the Min-Sum algorithm, the critical task of a CNU is to find the two
smallest magnitudes among all input data and to identify the relative position of the
input with the smallest magnitude. An efficient CNU implementation approach has
been proposed previously, so the dataflow in a CNU is discussed only briefly here.
Since six inputs are considered for this design, four computation steps are needed in a
CNU. The first step is compare-and-swap. Then, two pseudo rank order filter (PROF)
stages are needed. In the last step, the two smallest magnitudes are corrected using a
scaling factor (usually set to 3/4). In this way, the check-to-variable messages output
by a CNU are in a compressed form with four elements, i.e., the smallest magnitude,
the second smallest magnitude, the index of the smallest magnitude, and the signs of
all check-to-variable messages.

               It can be observed that the critical path of segment-1 is three adders and
four multiplexers. The longest logic path of segment-2 includes three adders and two
multiplexers. The adder in the last stage can be implemented with a [4:2] compressor
and a fast adder. The data shifter can be implemented with one-level multiplexers, as
described in the Data Shifter subsection below. Thus, the computation delay of
segment-3 is less than that of either segment-1 or segment-2. By inserting two pipeline
stages among the three segments, the critical path of the overall decoder architecture
is reduced to three adders and four 2:1 multiplexers.
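
               The compressed check-to-variable format and the data distributor
described above can be illustrated with a short behavioural sketch. It is not the
hardware dataflow (compare-and-swap followed by PROF stages); the function names are
hypothetical, and storing the output signs as a list of booleans is an assumption made
for readability. The scaling factor defaults to the 3/4 value mentioned in the text.

```python
def cnu_compress(inputs, alpha=0.75):
    """Return (min1, min2, index_of_min1, output_signs) for one check node.

    inputs -- variable-to-check messages of one check node (at least two)
    alpha  -- scaling factor applied to the two smallest magnitudes
    """
    mags = [abs(x) for x in inputs]
    idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx]
    min2 = min(m for i, m in enumerate(mags) if i != idx)
    neg_in = [x < 0 for x in inputs]
    odd_parity = sum(neg_in) % 2 == 1
    # Output i is negative when the product of the *other* input signs is negative.
    neg_out = [odd_parity != neg for neg in neg_in]
    return alpha * min1, alpha * min2, idx, neg_out


def distribute(min1, min2, idx, neg_out):
    """Data distributor: expand the compressed form into individual messages."""
    out = []
    for i, neg in enumerate(neg_out):
        mag = min2 if i == idx else min1   # the minimum's own edge gets min2
        out.append(-mag if neg else mag)
    return out


# Hypothetical six-input example, matching the six-input CNU of this design.
vtc = [4.0, -2.0, 8.0, -1.0, 6.0, 3.0]
compressed = cnu_compress(vtc)
ctv = distribute(*compressed)
print(compressed)
print(ctv)
```

Only two magnitudes, one index and one sign bit per edge are stored, instead of one
full-width message per edge, which is where the memory saving comes from.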

Data Shifter:

          It can be seen from Fig. 2 that, after a single left cyclic shift, each block in
one layer is identical to the corresponding block in the adjacent layer. Therefore,
repeated single-step left cyclic-shift operations can ensure the message alignment for
all layers in a decoding iteration. After the messages corresponding to the last block
row are processed, a reverse cyclic-shift operation is needed for the next decoding
iteration. Based on this observation, only the edges of the Tanner graph for the first
layer of the parity check matrix are mapped to fixed hardware interconnection in the
proposed decoder. A very simple data shifter, composed of one level of two-input
one-output multiplexers, performs the shifting operation for one block column of the
parity check matrix. Fig. 7 shows the structure of a data shifter. When the control
signal is 1, the shifting network performs a single-step left cyclic shift. If it is set to
0, the reverse cyclic shift is performed.
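
          A behavioural sketch of such a one-level multiplexer shifter is given below.
The size of the reverse shift is not stated in the text; the sketch assumes that it
undoes the left shifts accumulated over one iteration, and exposes it as a
hypothetical parameter `reverse_steps`. The function name and the list representation
of a block column are likewise illustrative.

```python
def data_shifter(block, control, reverse_steps):
    """One level of 2:1 multiplexers: each output selects one of two inputs.

    block         -- messages of one block column, in their current order
    control       -- 1: single-step left cyclic shift, 0: reverse cyclic shift
    reverse_steps -- assumed size of the reverse (right) cyclic shift
    """
    n = len(block)
    out = []
    for i in range(n):
        left_input = block[(i + 1) % n]                  # mux input chosen when control = 1
        reverse_input = block[(i - reverse_steps) % n]   # mux input chosen when control = 0
        out.append(left_input if control == 1 else reverse_input)
    return out


# Hypothetical example: 4 layers need 3 left shifts, then one reverse shift.
num_layers = 4
messages = list(range(8))
state = messages
for _ in range(num_layers - 1):
    state = data_shifter(state, 1, num_layers - 1)
after_left_shifts = state
restored = data_shifter(state, 0, num_layers - 1)
print(after_left_shifts, restored)
```

Because every output only ever chooses between two fixed sources, the shifter needs
no general-purpose permutation network.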

Hardware Requirement and Throughput Estimation:

          The hardware requirement of the decoder for the example LDPC code is
estimated, excluding the control block and parity check block. In Table 5.1, the gate
count for the computing blocks is provided. Each MUX stands for a 1-bit 2-to-1
multiplexer. Each XOR represents a 1-bit two-input XOR logic unit. The register
requirement is estimated in Table 5.2. In the two tables,       represent the word length

of each                message and              message, respectively. As analyzed in
Section 5.1.2, the critical path of the proposed decoder is three adders and four
multiplexers.

          In the decoder architecture presented in [6], each soft message is represented
with 4 bits. The critical path consists of an R-select unit, two adders, a CNU, a
shifting unit and a MUX. The computation path of a CNU has a 2’S-SM unit, a
two-least-minimum computation unit, an offset computation unit, an SM-to-2’S unit
stage, and an R-selector unit. The overall critical path is longer than 10 4-bit adders
and 7 multiplexers. The post-routing frequency is 100 MHz with 0.13-µm CMOS
technology. Because the critical path of the proposed decoder architecture is about
one-third of that of the architecture presented in [6], using 4 bits for each soft
message, the clock speed of the proposed decoder architecture is estimated to be
250 MHz with the same CMOS technology. In a decoding iteration, the required
number of clock cycles is 12. To finish a decoding process of 15 iterations, we need
15 × 12 + 3 = 183 clock cycles. Among them, one cycle is needed for initialization
and two cycles are due to pipeline latency. Thus, the throughput of the layered
decoding architecture is at least (3456 × 250 MHz)/183 ≈ 4.7 Gb/s. Because a real
design using the proposed architecture has not been completed, we can only provide a
rough comparison with other designs.
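
       The throughput estimate above can be reproduced with a few lines of
arithmetic. The sketch below uses only figures quoted in this chapter (block length
3456 bits, 250 MHz clock, 12 cycles per iteration, 15 iterations, one initialization
cycle and two pipeline-latency cycles); the function names are illustrative, and the
throughput is counted in coded bits per second.

```python
def decoding_cycles(iterations, cycles_per_iteration, init_cycles=1, pipeline_cycles=2):
    """Total clock cycles needed to decode one code block."""
    return iterations * cycles_per_iteration + init_cycles + pipeline_cycles


def throughput_bps(block_length_bits, clock_hz, cycles):
    """Coded bits decoded per second when one block takes `cycles` clock cycles."""
    return block_length_bits * clock_hz / cycles


cycles = decoding_cycles(iterations=15, cycles_per_iteration=12)
throughput = throughput_bps(block_length_bits=3456, clock_hz=250e6, cycles=cycles)
print(cycles, f"{throughput / 1e9:.2f} Gb/s")
```

The same functions show the trade-off noted earlier: raising the maximum iteration
number increases the cycle count and lowers the achievable throughput.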

       Lin et al. designed an LDPC decoder for a (1200, 720) code. The decoder
achieves 3.33 Gb/s throughput with 8 iterations. Sha et al. proposed a 1.8 Gb/s
decoder with 20 iterations. The decoder is targeted at an (8192, 7168) LDPC code. The
decoding throughput of both decoders is less than that of the proposed architecture
with 15 iterations. Gunnam et al. presented an LDPC decoder architecture for (2082,
1041) array LDPC codes. With 15 iterations, it can achieve 4.6 Gb/s decoding
throughput. The numbers of CNUs and VNUs are 347 and 2082, respectively.




                              Figure 7: Structure of a data shifter




                     Table 5.1 Gate Count Estimation for Computing Blocks
                           Table 5.2 Storage Requirement Estimate

       It can be seen from Table 5.1 that fewer than half of the computing units are
needed in our pipelined architecture. The register requirement in our design is higher
than that of the design by Gunnam et al. because an LDPC code with a larger block
length, chosen for better decoding performance, is considered in our design. The two
pipeline stages in the CNU array also require additional registers. The design by
Gunnam et al. is only suitable for array LDPC codes, whereas the proposed decoder
architecture is for generic QC-LDPC codes. We would like to mention that the
proposed architecture is scalable. For example, the LDPC code considered here can be
partitioned into 8, 12, or 18 layers for different trade-offs between hardware cost and
decoding throughput.

PLDA Architecture:

       The architecture of the PLDA-based LDPC decoder is shown in Fig. 8. As
described in Section III-A, the APP messages, instead of the CTV messages, are
passed among different layers. Therefore, for each sub-matrix, two memory blocks
are needed: one to store the APP messages and another to store the CTV messages.
The memory blocks are dual-port RAMs because, at every clock cycle, the decoder
must not only fetch messages for variable node and check node processing but also
receive messages from other connected layers. The VNU performs the variable node
processing, calculating the VTC messages from the data in the APP memories and the
CTV memories. The CNU performs the check node processing to calculate the new CTV
messages. These newly updated CTV messages are then stored back to the same
locations in the CTV memories, while the updated APP messages are passed to the APP
memories in other connected layers. The VNU is a simple adder that updates the VTC
message by subtracting the CTV value from the APP message. Several VNUs in the same
row operate in parallel so that the newly calculated VTC messages can be used
immediately to complete the horizontal processing. The CNU performs the check node
processing as in the MSA. Each CNU contains 6-number or 7-number comparators,
which find the minimum and second minimum values. Here we adopt the comparator.
Then, the absolute value of the VTC message is compared with the minimum value from
the comparator. If the absolute value of the VTC message is larger than the minimum
value, the compare-and-select unit outputs the minimum value; otherwise it outputs
the second minimum value. The dedicated message passing paths among different
layers are predetermined by the modified permutation values, as described in Section
III-A. Since the message passing paths are fixed, only static wiring connections are
necessary to connect the APP memories among different layers, instead of an l × l
switching network. Therefore, the area and power consumption of the message passing
network in PLDA become minimal. In addition, mode switching in this decoder becomes
much easier. The depth of the APP memories and CTV memories is set to 96, which
accommodates the maximum sub-matrix size. Different modes can then be selected by
adjusting the operating period to the actual size of the sub-matrix.
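
The VNU subtraction and the CNU compare-and-select step described above can be sketched in software as follows. This is only an illustrative sketch, not the hardware design: the function names `vnu`, `two_minima` and `cnu` and the example message values are hypothetical, and the scaling factor of the normalized min-sum algorithm is left out so that only the datapath behaviour is shown.

```python
def vnu(app, ctv):
    """Variable node unit: VTC = APP - CTV for each edge of one check row."""
    return [a - c for a, c in zip(app, ctv)]


def two_minima(magnitudes):
    """Return (minimum, second minimum) of a list with at least two entries."""
    min1, min2 = float("inf"), float("inf")
    for m in magnitudes:
        if m < min1:
            min1, min2 = m, min1
        elif m < min2:
            min2 = m
    return min1, min2


def cnu(vtc):
    """Check node unit: min-sum CTV outputs for one check row (no scaling)."""
    mags = [abs(v) for v in vtc]
    min1, min2 = two_minima(mags)
    # An odd number of negative inputs makes the product of all signs negative.
    total_sign = -1 if sum(v < 0 for v in vtc) % 2 else 1
    out = []
    for v, mag in zip(vtc, mags):
        # Compare and select: larger than the minimum -> minimum,
        # otherwise -> second minimum.
        magnitude = min1 if mag > min1 else min2
        # Multiplying by the edge's own sign removes it from the sign product.
        own_sign = -1 if v < 0 else 1
        out.append(total_sign * own_sign * magnitude)
    return out


# Hypothetical APP and CTV values for one check row of degree 6.
app = [5, -3, 2, -7, 4, 6]
ctv = [1, 1, -1, -2, 0, 2]
vtc = vnu(app, ctv)
new_ctv = cnu(vtc)
print(vtc, new_ctv)
```

Finding the two smallest magnitudes once per row is what lets every edge output be produced by a single comparison, instead of recomputing a minimum over all other edges.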




                       Figure 8: Parallel layered decoding Architecture




       To evaluate the performance of the proposed PLDA, a rate-1/2 WiMax LDPC
decoder with 19 different modes was synthesized and implemented on both FPGA and
ASIC platforms. On a Xilinx XC2VP30 device, the maximum frequency is 66.4 MHz,
which corresponds to a decoding throughput of 160 Mbps with a maximum of 10
iterations. The same architecture was synthesized using TSMC 90 nm ASIC technology.
In total, 152 dual-port RAMs of 96 × 6 bits each are used to store the APP messages
and CTV messages. The synthesized decoder achieves a maximum throughput of 492 Mbps
for 10 decoding iterations. PLDA needs only (l + 1) × Iter clock cycles for the
decoding process to converge, where l is the size of the sub-matrix. In the two
designs used for comparison, this number rises to l × (4 × Iter + 1) and
l × (5 × Iter) + 12, respectively. Hence, for the same number of iterations, PLDA
reduces the decoding latency by approximately 75%. The core area of the decoder is
2.1 mm² and the power consumption is 61 mW at the maximum frequency of 204 MHz.
Fig. 9 shows the layout of the decoder. The dual-port memories are generated by the
Synopsys DesignWare tool and are thus flattened during synthesis and place and
route.
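
The cycle-count expressions above can be checked with a few lines of Python. This is a rough worked example only: the function names are hypothetical, and the throughput estimate assumes that one codeword of 24 × l bits (24 base-matrix columns) is decoded every (l + 1) × Iter cycles, which the text does not state explicitly.

```python
def plda_cycles(l, iters):
    """Clock cycles quoted in the text for PLDA: (l + 1) x Iter."""
    return (l + 1) * iters


def reference_cycles(l, iters):
    """Cycle counts quoted in the text for the two compared designs."""
    return l * (4 * iters + 1), l * (5 * iters) + 12


l, iters = 96, 10                      # largest sub-matrix size, 10 iterations
plda = plda_cycles(l, iters)
ref_a, ref_b = reference_cycles(l, iters)
reduction_a = 1 - plda / ref_a
reduction_b = 1 - plda / ref_b

# ASSUMPTION: one codeword of 24 * l bits is decoded every `plda` cycles.
codeword_bits = 24 * l
throughput_mbps = codeword_bits * 204e6 / plda / 1e6

print(plda, ref_a, ref_b)
print(f"latency reduction: {reduction_a:.1%} and {reduction_b:.1%}")
print(f"estimated throughput at 204 MHz: {throughput_mbps:.0f} Mbps")
```

For l = 96 and 10 iterations the counts are 970, 3936 and 4812 cycles, a reduction of about 75% and 80%. Under the stated assumption the estimated throughput is roughly 485 Mbps, close to but not identical with the reported 492 Mbps, so the reported figure may count cycles slightly differently.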

         Table 5.3 shows the decoder implementation results compared with other
LDPC decoders for WiMax presented in the literature. Our proposed decoder can
achieve significantly higher decoding throughput with comparable chip area and
power dissipation. The energy efficiency is measured by the energy required to
decode one bit per iteration (pJ/bit/iter). As shown in Table 5.3, the energy
efficiency of the proposed decoder is improved by about an order of magnitude
compared with other existing designs.
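
As a rough worked example of this metric, assume that the energy per bit per iteration is the power divided by the product of throughput and iteration count. The symbols P, T and Iter are introduced here for illustration, and the values are the ASIC figures quoted earlier (61 mW, 492 Mbps, 10 iterations):

```latex
\[
E_{\text{bit,iter}} = \frac{P}{T \cdot \mathit{Iter}}
= \frac{61 \times 10^{-3}\ \text{W}}{492 \times 10^{6}\ \text{bit/s} \times 10}
\approx 12.4\ \text{pJ/bit/iter}
\]
```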




            Table 5.3: Overall Comparisons between Proposed Decoder And Other
                         Existing LDPC Decoders For WiMax




                        Figure 9: Layout of the proposed decoder chip




5.1.3 Generic Implementation of an LDPC Decoder about Genericity:

       The main specification of the LDPC decoder described in this chapter is
genericity, which has a twofold meaning:

     • The decoder should be generic in the sense that it should be able to decode any
       LDPC code, provided that the code has the size fixed by the parameters of the
       architecture. This means that any degree distribution of the variable and
       check nodes should be allowed, given N, M and P.
     • The decoder should also be generic in the sense that many parameters and
       components can be modified: for example, the parity-check matrix size, the
       decoding algorithm (BP, λ-min, BP-based), the dynamic range and the number
       of bits used for the fixed-point coding. Modifying these parameters, however,
       would require a new hardware synthesis.

       Both of these goals are very challenging, since LDPC decoders have so far
always been designed for a particular class of parity-check matrices. Moreover,
genericity in the architecture description increases the complexity level. Our
motivation for designing such a decoder is the possibility of running simulations
much faster on an FPGA than on a PC. In the end, simulations give the final judgment
on the comparison of the codes and of their associated architectures.
5.2 Code Structure and Decoding Algorithms:

       Code Structure of WiMax. The IEEE 802.16e standard for WiMax systems
uses an irregular LDPC code as the error-correction code because of its competitive
bit error performance compared with regular LDPC codes. It is a quasi-cyclic LDPC
code whose parity-check matrix can be decomposed into several sub-matrices, each of
which is either an identity matrix or a transformation of one. An M_b × N_b base
parity-check matrix is defined for the rate-1/2 WiMax code, where M_b is 12 and N_b
is 24. The parity-check matrix H is generated by expanding each blank entry into an
l × l zero matrix and each non-blank entry into an l × l circularly right-shifted
identity matrix. As there are 19 different modes, with the sub-matrix size l ranging
from 24 to 96, the shift value of each sub-matrix can be expressed as:




                                                                         (5.1)

       where p(i, j) is the permutation value when the size of sub matrix
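
The expansion of a base matrix into H can be sketched as follows. Equation (5.1) is not reproduced in the text, so the scaling rule in `scaled_shift`, floor(p × l / 96) for positive shift values, is an assumption taken from the IEEE 802.16e rule for the rate-1/2 code. The function names and the tiny 2 × 3 base matrix are hypothetical and are not the standard's 12 × 24 matrix; -1 marks a blank entry.

```python
Z0 = 96  # largest sub-matrix size, for which the base shift values are defined


def scaled_shift(p, l, z0=Z0):
    """Scale a base shift p to sub-matrix size l.

    ASSUMPTION: floor(p * l / z0) for p > 0, unchanged otherwise.
    """
    return p if p <= 0 else (p * l) // z0


def circulant(l, shift):
    """l x l identity matrix circularly right-shifted by `shift` columns."""
    return [[1 if c == (r + shift) % l else 0 for c in range(l)]
            for r in range(l)]


def expand(base, l):
    """Expand a base matrix into the full parity-check matrix H."""
    H = []
    for base_row in base:
        blocks = [
            [[0] * l for _ in range(l)] if p < 0      # blank -> zero block
            else circulant(l, scaled_shift(p, l))     # shift -> circulant
            for p in base_row
        ]
        for r in range(l):
            H.append([bit for block in blocks for bit in block[r]])
    return H


toy_base = [[94, -1, 0],
            [-1, 73, 0]]       # hypothetical toy example
H = expand(toy_base, 24)       # smallest mode, l = 24
print(len(H), len(H[0]))
```

Because every non-blank entry becomes a permutation matrix, each row of H has as many ones as its base row has non-blank entries, whatever the mode l.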

5.2.1 Min-Sum Decoding Algorithm:

       Before presenting the MSA, we first make some definitions. Let c_n
denote the n-th bit of a codeword and y_n denote the corresponding received value
from the channel.

       Let r_mn[k] be the check-to-variable (CTV) message and q_mn[k] the
variable-to-check (VTC) message between check node m and variable node n at the
k-th iteration. Let N(m) denote the set of variables that participate in check m
and M(n) denote the set of checks in which variable n participates. The set N(m)
without variable n is denoted as N(m) \ n and the set M(n) without check m is
denoted as M(n) \ m. Detailed steps of the MSA are described below:

1. Initialization:

       Under the assumption of equal a priori probabilities, compute the channel
probability p_n (intrinsic information) of variable node n by

                                                           (5.2)

       The CTV message r_mn is initialized to zero.

2. Iterative Decoding:

At the k-th iteration, for variable node n, calculate the VTC message q_mn[k] by




                                                                         (5.3)

Meanwhile, the decoder can make a hard decision by calculating the APP (a posteriori
probability) by




                                                                      (5.4)




       Decide the n-th bit of the decoded codeword as x_n = 0 if the APP value from
(5.4) is greater than 0, and x_n = 1 otherwise. The decoding process terminates when
the entire codeword x = [x_1, x_2, ..., x_N] satisfies all M parity-check equations
Hx = 0, or when the preset maximum number of iterations is reached. If the decoding
process does not stop, calculate the CTV message r_mn for check node m by




                                                                           (5.5)




       Here, a normalization factor is introduced to compensate for the performance
loss of the min-sum algorithm compared to the standard BP algorithm. In this paper,
the factor is set to 0.75.
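
The typeset equations (5.2) to (5.5) are not reproduced in the text. Under the definitions given in this section, the standard normalized min-sum updates take the following form. This is a hedged reconstruction: the log-likelihood-ratio form of p_n and the symbols Λ_n (APP value) and α (normalization factor) are assumptions, since the original notation is not available.

```latex
\begin{align}
p_n &= \ln\frac{P(c_n = 0 \mid y_n)}{P(c_n = 1 \mid y_n)},
      \qquad r_{mn}[0] = 0 \tag{5.2}\\
q_{mn}[k] &= p_n + \sum_{m' \in M(n)\setminus m} r_{m'n}[k-1] \tag{5.3}\\
\Lambda_n[k] &= p_n + \sum_{m \in M(n)} r_{mn}[k-1] \tag{5.4}\\
r_{mn}[k] &= \alpha \prod_{n' \in N(m)\setminus n} \operatorname{sign}\bigl(q_{mn'}[k]\bigr)
      \cdot \min_{n' \in N(m)\setminus n} \bigl|q_{mn'}[k]\bigr|,
      \qquad \alpha = 0.75 \tag{5.5}
\end{align}
```

With this sign convention a positive APP value corresponds to the decision x_n = 0, which matches the hard-decision rule stated above.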

5.2.2 Layered Decoding Algorithm:

       In the BP algorithm and the MSA, CTV messages are updated during the
horizontal step using VTC messages received from the previous iteration. During the
vertical step, all VTC messages are updated using the newly obtained CTV messages
from the current iteration. In other words, these two decoding steps execute
alternately, with no overlap between them. The layered decoding algorithm (LDA)
allows the check node update to be completed layer by layer. Therefore, VTC messages
can be updated using CTV messages from the current iteration instead of old values
from the previous iteration. The selected H base matrix in WiMax is well suited
to a horizontal LDA implementation, as it can be decomposed into 12 rows, each of
which can be treated as a horizontal layer. Different layers have some vertically
overlapped positions, and APP messages, instead of CTV messages, can be passed from
an upper layer to a lower layer at these positions within the same iteration. Recent
work by E. Sharon has shown theoretically that such a layered decoding algorithm,
whether horizontal or vertical, doubles the convergence speed in comparison with the
BP algorithm.
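
The layered schedule can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the decoder described in this report: each check is treated as its own layer, the names `check_update` and `layered_decode` are hypothetical, the messages are floating-point log-likelihood ratios (positive means bit 0), and the example code is a small (7, 4) Hamming-style parity-check matrix rather than the WiMax code.

```python
ALPHA = 0.75  # normalization factor quoted in the text


def check_update(vtc):
    """Normalized min-sum CTV outputs for the VTC inputs of one check."""
    mags = sorted(abs(v) for v in vtc)
    min1, min2 = mags[0], mags[1]
    total_sign = -1 if sum(v < 0 for v in vtc) % 2 else 1
    out = []
    for v in vtc:
        mag = min1 if abs(v) > min1 else min2
        own_sign = -1 if v < 0 else 1
        out.append(ALPHA * total_sign * own_sign * mag)
    return out


def layered_decode(checks, llr, max_iter=10):
    """checks[m] lists the variable indices of check m (one layer each)."""
    app = list(llr)                               # APP starts at channel LLRs
    ctv = [[0.0] * len(row) for row in checks]    # CTV messages start at zero
    bits = [0 if a > 0 else 1 for a in app]
    for it in range(1, max_iter + 1):
        for m, row in enumerate(checks):
            # VTC = APP - old CTV, using APP values already refined by
            # the layers processed earlier in this same iteration.
            vtc = [app[n] - ctv[m][i] for i, n in enumerate(row)]
            ctv[m] = check_update(vtc)
            for i, n in enumerate(row):
                app[n] = vtc[i] + ctv[m][i]       # pass updated APP onward
        bits = [0 if a > 0 else 1 for a in app]
        if all(sum(bits[n] for n in row) % 2 == 0 for row in checks):
            return bits, it                       # all parity checks satisfied
    return bits, max_iter


# Hypothetical example: all-zero codeword with one unreliable, flipped bit.
checks = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
llr = [2.0, 2.0, -1.0, 2.0, 2.0, 2.0, 2.0]
bits, iterations = layered_decode(checks, llr)
print(bits, iterations)
```

The only difference from a flooding schedule is that `app` is updated inside the loop over layers, so later layers in the same iteration already see the refined values.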




                                 CHAPTER VI

       Xilinx ISE Tool for Synthesis and Place and Route

6.1 Xilinx ISE Tool Flow:

       The Integrated Software Environment (ISE) is the Xilinx design software
suite that allows you to take the design from design entry through Xilinx device
programming. The ISE Project Navigator manages and processes the design through
the following steps in the ISE design flow.

Design Entry:

       Design entry is the first step in the ISE design flow. During design entry, you
create the source files based on the design objectives. You can create the top-level
design file using a Hardware Description Language (HDL), such as VHDL, Verilog,
or ABEL, or using a schematic. You can use multiple formats for the lower-level
source files in the design. If you start with a synthesized EDIF or NGC/NGO file,
you can skip the design entry and synthesis steps and start with the implementation
process.

Synthesis:

       After design entry and optional simulation, you run synthesis. During this
step, VHDL, Verilog, or mixed language designs become netlist files that are
accepted as input to the implementation step.

Implementation:

       After synthesis, you run design implementation, which converts the logical
design into a physical file format that can be downloaded to the selected target
device. From Project Navigator, you can run the implementation process in one step,
or you can run each of the implementation processes separately. Implementation
processes vary depending on whether you are targeting a Field Programmable Gate
Array (FPGA) or a Complex Programmable Logic Device (CPLD).

Verification:


       You can verify the functionality of the design at several points in the design
flow. You can use simulator software to verify the functionality and timing of the
design or a portion of the design. The simulator interprets VHDL or Verilog code
into circuit functionality and displays logical results of the described HDL to
determine correct circuit operation. Simulation allows you to create and verify
complex functions in a relatively small amount of time. You can also run in-circuit
verification after programming the device.

Device Configuration:

       After generating a programming file, you configure the device. During
configuration, you generate configuration files and download the programming files
from a host computer to a Xilinx device. This project uses the Xilinx ISE tool for
synthesis and FPGA implementation of the LDPC decoder. The device chosen is the
Spartan-3 XC3S400.

Synthesis using Xilinx ISE can be done with the following steps:

   1) Create a new project and select the Spartan3 device family, then the XC3S400 device.

   2) Add the source code.

   3) Go to Implementation.

   4) Run Synthesize - XST.

   5) Switch to Behavioral Simulation.

   6) Run the source code.

       The figure below shows the ISE tool flow.




Figure 7: ISE Simulation Flow




                          CHAPTER VII

                           RESULTS

7.1 Simulation Results:




7.2 Schematic:




Internal Schematic:




                                CHAPTER VIII

                               APPLICATIONS

       In 2003, an LDPC code beat six turbo codes to become the error correcting
code in the new DVB-S2 standard for the satellite transmission of digital television.

       In 2008, LDPC beat convolutional turbo codes as the FEC scheme for the
ITU-T G.hn standard. G.hn chose LDPC over turbo codes because of its lower
decoding complexity (especially when operating at data rates close to 1 Gbit/s) and
because the proposed turbo codes exhibited a significant error floor at the desired
range of operation. LDPC is also used for 10GBase-T Ethernet, which sends data at
10 gigabits per second over twisted-pair cables.




                                 CHAPTER IX

                                CONCLUSION

       In this project, a high-throughput, low-complexity decoder architecture for
generic LDPC codes has been presented. To enable pipelining in layered decoding, an
approximate layered decoding approach has been explored. It has been estimated that
the proposed decoder can achieve more than 4.7 Gb/s decoding throughput at 15
iterations.

9.1 Future Scope:

       The IEEE 802.16m standard is the core technology for the proposed WiMAX
Release 2, which enables more efficient, faster, and more converged data
communications. The IEEE 802.16m standard has been submitted to the ITU for
IMT-Advanced standardization and is one of the major candidates for IMT-Advanced
technologies. Among many enhancements, IEEE 802.16m systems can provide data speeds
four times faster than the current WiMAX Release 1 based on IEEE 802.16e technology.

       WiMAX Release 2 will provide strong backward compatibility with Release 1
solutions. It will allow current WiMAX operators to migrate their Release 1 solutions
to Release 2 by upgrading channel cards or software of their systems. Also, the
subscribers who use currently available WiMAX devices can communicate with new
WiMAX Release 2 systems without difficulty.

       It is anticipated that in a practical deployment, using 4×2 MIMO in the urban
microcell scenario with only a single 20 MHz TDD channel available system-wide,
the 802.16m system can support both 120 Mbit/s downlink and 60 Mbit/s uplink per
site simultaneously. WiMAX Release 2 is expected to be commercially available in
the 2011-2012 timeframe.




                                  REFERENCES

[1] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol.
IT-8, pp. 21–28, Jan. 1962.

[2] A. J. Blanksby and C. J. Howland, “A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density
parity-check code decoder,” IEEE J. Solid-State Circuits, vol. 37, no. 3, pp.
404–412, Mar. 2002.

[3] A. Darabiha, A. C. Carusone, and F. R. Kschischang, “Multi-Gbit/sec low density
parity check decoders with reduced interconnect complexity,” in Proc. ISCAS, May
2005, vol. 5, pp. 5194–5197.

[4] C. Lin, K. Lin, H. Chang, and C. Lee, “A 3.33 Gb/s (1200, 720) low-density parity
check code decoder,” in Proc. ESSCIRC, Sep. 2005, pp. 211–214.

[5] J. Sha, M. Gao, Z. Zhang, L. Li, and Z. Wang, “Efficient decoder implementation
for QC-LDPC codes,” in Proc. ICCCAS, Jun. 2006, vol. 4, pp. 2498–2502.

[6] K. K. Gunnam, G. S. Choi, and M. B. Yeary, “A parallel VLSI architecture for
layered decoding for array LDPC codes,” in Proc. VLSID, Jan. 2007, pp. 738–73.

[7] E. Sharon, S. Litsyn, and J. Goldberger, “An efficient message-passing schedule
for LDPC decoding,” in Proc. 23rd IEEE Convention Elect. Electron. Eng. Israel,
Sep. 2004, pp. 223–226.

[8] D. E. Hocevar, “A reduced complexity decoder architecture via layered decoding
of LDPC codes,” in Proc. IEEE Workshop Signal Process. Syst., 2004, pp. 107–112.

[9] L. Chen, J. Xun, I. Djurdjevic, and S. Lin, “Near-Shannon-limit quasi-cyclic
low-density parity-check codes,” IEEE Trans. Commun., vol. 52, no. 7, pp. 1038–1042,
Jul. 2004.

[10] Y. Zhang, W. E. Ryan, and Y. Li, “Structured eIRA codes with low floors,” in
Proc. Int. Symp. Inf. Theory, Sep. 2005, pp. 174–178.

[11] Z. Wang and Z. Cui, “A memory efficient partially parallel decoder architecture
for QC-LDPC codes,” in Proc. 39th Asilomar Conf. Signals, Syst. Comput., 2005, pp.
729–733.
[12] H. Xiao-Yu, E. Eleftheriou, and D. M. Arnold, “Regular and irregular
progressive edge-growth Tanner graphs,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp.
386–398, Jan. 2005.

[13] J. Zhang and M. P. C. Fossorier, “Shuffled iterative decoding,” IEEE Trans.
Commun., vol. 53, no. 2, pp. 209–213, Feb. 2005.

[14] M. Ardakani and F. R. Kschischang, “Designing irregular LPDC codes using EXIT
charts based on message error rate,” in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2002.

[15] M. Ardakani, T. H. Chan, and F. R. Kschischang, “Properties of the EXIT chart
for one-dimensional LDPC decoding schemes,” in Proc. CWIT, May 2003.

[16] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear
codes for minimizing symbol error rate,” IEEE Trans. Inf. Theory, vol. 20, pp.
284–287, Mar. 1974.

[17] J. R. Barry, “Low-density parity-check codes,” Oct. 2001. Available at
http://www.ece.gatech.edu/~barry/6606/handsout/ldpc.pdf.

[18] G. Battail and A. H. M. El-Sherbini, “Coding for radio channels,” Annales des
Télécommunications, vol. 37, pp. 75–96, 1982.



