
    Performance Characteristics of an Operational
                 WiMAX Network
                    James M. Westall, Affiliate, IEEE CS and James J. Martin, Member, IEEE

     Abstract—The term WiMAX is used to refer to a collection of standards, products, and service offerings derived from the IEEE 802.16
     family of standards for wireless networks. These standards define physical and MAC layer elements that ensure interoperability of
     compatible equipment. However, the standards leave both the details of the packet scheduling algorithms and the values of performance
     related configuration parameters to the discretion of the equipment vendor or network operator. These algorithms and parameters
     ultimately determine fundamental performance characteristics such as round-trip latency and sustainable throughput on the network.
     In this paper we examine performance characteristics of an operational WiMAX testbed upon which we were able to conduct controlled
     experiments in the absence of competing traffic. We characterize latency, throughput, protocol overhead, and the impact of WiMAX on
     TCP dynamics. We show that scheduling policies and parameter values impact actual performance in ways that are not possible to
     characterize in generic studies of WiMAX.

     Index Terms—WiMAX, IEEE 802.16, wireless, performance.


• The authors are with the School of Computing, Clemson University, Clemson, SC 29634.
  E-mail: see

1    INTRODUCTION
The term WiMAX, an acronym for Worldwide Interoperability for Microwave Access, is commonly used to refer to a collection of standards, products, and service offerings derived from the IEEE 802.16 family of standards [1], [2]. The IEEE standards include many implementation options that are left to the discretion of equipment vendors or network operators. Conflicting design choices can make interoperation of the equipment of multiple vendors problematic. The WiMAX forum was organized by equipment vendors in 2001 to define operational profiles, certify interoperability, and promote the use of the technology. A discussion of the roles of the IEEE and the WiMAX forum in the development of the standards and profiles can be found in [3].

   The equipment described in this paper is compliant with the IEEE 802.16-2004 standard [1], which is sometimes called IEEE 802.16d or “fixed WiMAX” because it does not support seamless handoff for mobile clients. The subsequent amendment, IEEE 802.16e-2005 [2], sometimes called 802.16e, added support for mobile clients. Perspectives on the evolution of WiMAX may be found in [4], [5], [6]. A very thorough discussion of the WiMAX physical layer is provided in [4]. A discussion of WiMAX as it relates to alternative wireless technologies is found in [7].

   At present WiMAX usage is not widespread when compared to competing access network technologies such as cable and DSL. Furthermore, it is very difficult to conduct controlled experiments measuring best case throughput and latency on an operational public network. Equipment vendors obviously have the capability to conduct such studies but have been generally reluctant to publish detailed results. Therefore, virtually all published studies of the performance characteristics of WiMAX systems have been derived from simulation or analytic models.

1.1 Factors underlying WiMAX performance
The WiMAX standards define a large collection of operational parameters whose values can have a profound impact on the performance delivered to the application layer. The values of these parameters must be chosen by the equipment vendor or by the network administrator. The standards also do not specify the details of the traffic scheduling algorithms to be used, and so these must also be designed by the equipment vendor.

   Although some performance bounds may be inferred from the standards, it is not possible to make precise characterizations of the performance delivered to end systems by a “generic” WiMAX network. Performance characteristics reported by vendors tend to reflect best case scenarios at the physical layer. Performance characteristics reported by simulation or analytic models reflect explicit or implicit assumptions made in constructing the models.

   The objective of this paper is to augment the results obtained in simulation studies of hypothetical equipment with measured results obtained from an operational WiMAX testbed. We describe an approach for analyzing elements of the performance of an operational WiMAX network. We show that it is possible to infer some characteristics of the underlying scheduling algorithms, and we demonstrate the impact of the choice of operational parameters on system overhead. We also show that some elements of the observed performance are so intricately tied to details of the proprietary implementation that it is not possible either to predict or explain them.

   Our primary focus is upon measuring the impact of MAC layer parameterization and the vendor provided scheduling system on the latency, throughput, and TCP dynamics experienced by an end system on an uncongested WiMAX network. To maintain this focus, we do not address the impact of factors such as radio signal propagation or improved scheduling algorithms on overall system performance.

                        TABLE 1
                    OFDM Timing Data

   Parameter                    Value
   Bandwidth (Hz)               5.00 × 10^6
   Nfft                         2.56 × 10^2
   Sampling frequency (Hz)      5.76 × 10^6
   Subcarrier spacing (Hz)      2.25 × 10^4
   Useful symbol time (s)       4.44 × 10^-5
   Cyclic prefix time (s)       5.55 × 10^-6
   OFDM symbol time (s)         5.00 × 10^-5
   Frame time (s)               1.00 × 10^-2
   Symbols / frame              2.00 × 10^2
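The entries of Table 1 can be reproduced from the channel bandwidth and FFT size alone. The following is a minimal sketch, assuming the 802.16-2004 OFDM relations used in this paper (sampling factor n = 144/125, sampling frequency rounded down to a multiple of 8000 Hz, cyclic prefix equal to 1/8 of the useful symbol time). The function name `ofdm_timing` and the dictionary keys are illustrative and are not taken from the standard or from the vendor software.

```python
from fractions import Fraction
import math


def ofdm_timing(bandwidth_hz, n_fft=256, frame_time_s=Fraction(1, 100)):
    """Derive OFDM timing values from the channel bandwidth.

    Exact rational arithmetic is used so that the floor() step is not
    disturbed by floating point rounding.
    """
    n = Fraction(144, 125)      # sampling factor assumed for multiples of 1.25 MHz
    cp_ratio = Fraction(1, 8)   # cyclic prefix as a fraction of useful symbol time

    # Sampling frequency: floor(n * BW / 8000) * 8000
    fs = math.floor(n * bandwidth_hz / 8000) * 8000
    spacing = Fraction(fs, n_fft)       # subcarrier spacing in Hz
    t_useful = 1 / spacing              # useful symbol time in seconds
    t_cp = t_useful * cp_ratio          # cyclic prefix time in seconds
    t_symbol = t_useful + t_cp          # full OFDM symbol time in seconds

    return {
        "sampling_frequency_hz": fs,
        "subcarrier_spacing_hz": spacing,
        "useful_symbol_time_s": t_useful,
        "cyclic_prefix_time_s": t_cp,
        "ofdm_symbol_time_s": t_symbol,
        "symbols_per_frame": frame_time_s / t_symbol,
    }


if __name__ == "__main__":
    for name, value in ofdm_timing(5_000_000).items():
        print(f"{name:24s} {float(value):.3e}")
```

Rational arithmetic is a design choice for the sketch only; it keeps the symbols-per-frame result an exact integer for the 5 MHz channel and 10 ms frame considered here.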

The testbed used in this study is deployed on the campus of Clemson University. The network operates in the 4.9 gigahertz (GHz) public safety band, which comprises ten channels of five MHz each spanning 50 MHz of spectrum between 4940 and 4990 MHz. Base and subscriber stations operating in this spectrum are limited to no more than 27 dBm of transmitter power and no more than 40 dBm of effective isotropic radiated power. Although a WiMAX Forum profile for 4.9 GHz has not yet been defined, WiMAX equipment vendors have agreed on a set of operating parameters allowing interoperability. These parameters are consistent with the 802.16-2004 standard and are used in equipment currently offered by Airspan, M/A-COM (whose wireless products division was subsequently acquired by the Harris Corp), and Nortel. The specific equipment used in the study includes a M/A-COM VIDA Broadband MAVM-VMXBD hardened base station, M/A-COM VIDA Broadband MAVM-VMCLL subscriber stations, and Airspan EasyST subscriber stations. In the remainder of this section we review aspects of the 802.16-2004 standard that pertain to this equipment and this study.

2.1 The Physical Layer
The 802.16-2004 standard defines single carrier (SC), orthogonal frequency division multiplexing (OFDM), and orthogonal frequency division multiple access (OFDMA) modes of operation at the physical layer. The M/A-COM equipment implements only OFDM operation on a 5 MHz channel.

   Operational parameters that bound the capacity of an OFDM WiMAX physical layer include:
  • channel bandwidth,
  • number of data-carrying subchannels,
  • modulation and forward error correction (FEC) technique, and
  • duplexing mode (time or frequency division duplexing).

   The 5 MHz bandwidth is partitioned into 256 subchannels as specified in the standard: eight pilot channels are used in physical layer synchronization; 55 channels are used as guard bands; and 192 channels carry data. A null carrier is transmitted on the remaining center frequency channel. The OFDM timing data used by the M/A-COM equipment is summarized in the terminology of p. 428 of the standard in Table 1.

   The sampling frequency is computed as ⌊n × Bandwidth / 8000⌋ × 8000 where n = 144/125 for a channel whose bandwidth is an even multiple of 1.25 MHz. The subcarrier spacing is the sampling frequency divided by the total number of subchannels (the FFT size, Nfft). The useful symbol time is the inverse of the subcarrier spacing. The cyclic prefix time is the useful symbol time divided by 8, and the OFDM symbol time is the sum of the useful symbol time and the cyclic prefix time.

   The data carrying capacity of each symbol is a function of the number of data carrying subchannels, the modulation technique, and the number of bits reserved for forward error correction. Values supported by 802.16-2004 are shown in Table 2. The column labeled bits per sample shows the aggregate number of data and FEC bits that can be encoded on a single channel using the given modulation scheme during a single OFDM symbol time. The number of data bits per symbol is obtained by multiplying bits per sample by the number of data channels (192) and the coding rate (fraction of bits representing data) shown in the leftmost column. The number of symbols per network layer protocol data unit (NPDU) is 1500 bytes divided by the number of data bytes per symbol.

                              TABLE 2
                          Symbol Capacity

   Modulation     Bits per    Data Bits    Kbps    Syms per
                  Sample      per Sym              NPDU
   BPSK 1/2          1           96         960     125
   QPSK 1/2          2          192        1920      62.5
   QPSK 3/4          2          288        2880      41.67
   16-QAM 1/2        4          384        3840      31.25
   16-QAM 3/4        4          576        5760      20.83
   64-QAM 2/3        6          768        7680      15.63
   64-QAM 3/4        6          864        8640      13.89

   The modulation technique may change dynamically in response to signal quality. In the M/A-COM equipment, modulation changes are triggered by changes in the carrier to interference and noise ratio (CINR). CINR levels needed to trigger change are configurable by the system administrator. A dual level triggering mechanism is used to prevent “modulation flapping.”

   The standard defines an optional MAC level ARQ

mechanism designed to provide fast recovery from physical layer errors not corrected by the FEC. ARQ is not implemented in our M/A-COM equipment.

   The M/A-COM equipment employs time division duplexing (TDD). A single transmit frequency is used, and the equipment rapidly alternates between transmit and receive modes. The standard supports seven different frame durations ranging from 2.5 to 20 ms. Smaller frame durations provide shorter round-trip latency but a larger fraction of the frame is then required for overhead. Frame time in the M/A-COM testbed is 10 ms, which yields 200 physical layer symbols per frame. A frame consists of a downlink subframe, in which the base station transmits and the subscribers receive, followed by an uplink subframe in which the reverse is true. Each subframe may be further subdivided into transmission bursts with modulation and/or coding rate changing dynamically from burst to burst within a single subframe. Relative lengths of uplink and downlink subframes are configurable. We employed a nominal 50/50 split but later discovered that downlink MAC overhead was substantially larger than on the uplink, and the resulting split at the network layer was actually closer to 44/56.

   The upper bound on physical layer throughput in Kbps is shown in Table 2. It assumes a 50/50 downlink-uplink split. Throughput values are obtained by multiplying data bits per symbol by 100 symbols per subframe by 100 frames per second. Because of PHY and MAC overhead, actual network layer throughput is considerably smaller. Sources of symbol consuming overhead at both the PHY and MAC layers are identified on p. 449 of the standard. Transmit-receive and receive-transmit transition gaps (TTG, RTG) are required between subframes. The downlink subframe begins with a two-symbol long preamble, and the uplink subframe employs a one-symbol short preamble. The downlink MAC layer data begins with the frame control header (FCH), which is always transmitted using BPSK 1/2. It contains the downlink frame prefix (DLFP), which contains the burst profiles of up to the first three bursts of the downlink subframe. In OFDM systems, a burst profile contains the starting offset measured in symbols and the modulation technique used in the burst.

   The first burst following the FCH is sometimes called the broadcast burst. It is transmitted using the least robust modulation technique that all subscribers are thought to be able to presently decode. This burst always contains the uplink MAP (UL-MAP), which describes the allocation of uplink symbols. In the M/A-COM implementation the UL-MAP in frame n describes the uplink symbol allocation in frame n + 1. In the default configuration the M/A-COM implementation always includes downlink and uplink channel descriptors (DCD, UCD) and a downlink map (DL-MAP) in the broadcast burst, but the frequency at which the DCD and UCD are sent is configurable. If a downlink subframe contains more than three bursts, the DL-MAP carries their burst profiles.

   The precise number of management-related overhead symbols per frame is not directly specified in the standard because it depends upon both configuration (e.g., how often the DCD and UCD are sent) and provisioning (e.g., the number of active RTPS flows). Nevertheless, if the maximum network layer throughput is known, then the average number of overhead symbols per subframe may be inferred as described later in this paper.

2.2 WiMAX provisioning and scheduling
2.2.1 Service flows
Unlike WiFi networks, WiMAX networks support very fine grained control over provisioning of network traffic flows. An individual flow is a unidirectional entity referred to as a service flow. A three-phase model for activation of a service flow is described on p. 223 of the standard. Nevertheless, three-phase activation is not required, and M/A-COM’s implementation does not support dynamic service activation at all.

   QoS attributes that may be assigned to service flows are identified on p. 695 of the standard. They include traffic priority, maximum sustained traffic rate, maximum traffic burst, minimum reserved traffic rate, minimum tolerable traffic rate, maximum latency (delay within subscriber or base station from the time a packet is received on the wire interface until it is transmitted on the RF interface), and tolerated jitter.

2.2.2 Scheduling
Traffic scheduling in a WiMAX network is similar to scheduling in a DOCSIS cable network. Allocation of transmission opportunities for both downlink and uplink traffic flows is vested in the base station. The DL-MAP and UL-MAP data structures, which contain starting offset and encoding of each burst, provide the mechanism by which the results of the scheduling policies are made known.

   The standard does distinguish four distinct scheduling types that pertain to the allocation of uplink capacity. For unsolicited grant service (UGS), periodic grants of sufficient capacity to carry the provisioned bit rate are conveyed in the UL-MAP to each subscriber station provisioned with a UGS flow. For real-time polling service (RTPS) and non-real-time polling service (nRTPS) flows, the UL-MAP periodically identifies dedicated slots in which the subscriber station can make contention-free requests for uplink capacity. Contention slots, also identified in the UL-MAP, may be used by subscriber stations to request capacity for both best effort (BE) and nRTPS flows. Contention requests can be destroyed by collisions among competing subscriber stations. Collisions trigger a binary exponential backoff. As in DOCSIS, contention is minimized by permitting a subscriber station with backlogged best effort traffic to “piggyback” a request for additional capacity onto the packet currently being transmitted. Two additional MAC layer capabilities also

found in DOCSIS facilitate scheduling and reduce overhead. Fragmentation allows a large NPDU to be broken up and carried in multiple MAC layer PDUs. Fragmentation can be used to ensure that available capacity in a frame is not wasted because it is not sufficient to hold a full NPDU. Concatenation allows a single MAC layer PDU to carry multiple small NPDUs thus reducing MAC layer overhead. The M/A-COM equipment supports both fragmentation and concatenation.
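The effect of fragmentation and concatenation on a single allocation can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one fixed header per MAC PDU (`MAC_HEADER_BYTES`, an illustrative value), no subheaders or CRC, and capacity expressed in bytes. The function `fill_allocation` is hypothetical and does not represent the M/A-COM scheduler, whose algorithms are proprietary.

```python
MAC_HEADER_BYTES = 6  # illustrative per-PDU overhead; subheaders and CRC are ignored


def fill_allocation(npdu_sizes, capacity, fragment=True, concatenate=True):
    """Fill one allocation of `capacity` bytes from a queue of NPDU sizes.

    Returns (payload_bytes_carried, mac_pdus_used, unused_bytes, remaining_queue).
    """
    queue = list(npdu_sizes)
    room = capacity
    carried = 0
    pdus = 0
    while queue:
        # A new MAC PDU (and header) is needed unless the NPDU can be
        # appended to a PDU that is already open.
        header = 0 if (concatenate and pdus > 0) else MAC_HEADER_BYTES
        avail = room - header
        if avail <= 0:
            break
        if queue[0] <= avail:
            take = queue.pop(0)      # the whole NPDU fits
        elif fragment:
            take = avail             # send a fragment, keep the rest queued
            queue[0] -= take
        else:
            break                    # no fragmentation: leftover capacity is wasted
        room -= header + take
        carried += take
        if header:
            pdus += 1
    return carried, pdus, room, queue


if __name__ == "__main__":
    # Two 1500-byte NPDUs offered to a 2000-byte allocation.
    print(fill_allocation([1500, 1500], 2000, fragment=True))
    print(fill_allocation([1500, 1500], 2000, fragment=False))
    # Four 56-byte NPDUs with and without concatenation.
    print(fill_allocation([56] * 4, 300, concatenate=True))
    print(fill_allocation([56] * 4, 300, concatenate=False))
```

In this model, disabling fragmentation leaves the tail of the allocation unused whenever the next NPDU does not fit, and disabling concatenation spends one header per small NPDU.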
   Although service flows, QoS attributes, and scheduling mechanisms are well-defined by the standard, scheduling policies are not defined. The standard does not specify how the QoS attributes are to be incorporated into the underlying scheduling algorithms, nor does it specify any required behavior in the event of overcommitment of resources. Because the scheduling policies are not defined, it is possible to infer from the standards or published research on WiMAX only very coarse bounds on the actual performance of a particular WiMAX implementation.

   M/A-COM’s scheduling algorithms are proprietary and were not revealed to us. It will be shown subsequently in this paper that these algorithms produced behavior that ranged from the expected to the unexplained to the truly anomalous.

2.3 Provisioning M/A-COM equipment
The M/A-COM base station employs an Intel IXP2350 network processor and runs an ADEOS (Adaptive Domain Environment for Operating Systems) real-time variant of the 2.4.20 Linux kernel. Provisioning the M/A-COM equipment is accomplished using M/A-COM’s web-based Unified Administration System (UAS) which runs on an auxiliary Linux computer. The provisioning data is distributed to the WiMAX base station via SNMP. The base station then conveys provisioning information to the subscribers over the air link.

   Provisioning a network with UAS requires defining and configuring a hierarchy of four entity types: base stations; subscriber stations; service flows; and classifier rules. An instance of each entity type has a name and attributes. Elements higher in the hierarchy bind to entities in lower levels by using the name of the lower level entity.

3   LATENCY
Round-trip latency has been reported in a vendor position paper¹ as follows: “The average TDD latency in a PMP system is about two frame times and the best case latency is about one frame time.” One can correctly infer from this that latency is directly tied to frame time, but, because of scheduling considerations, latency experienced by end systems is invariably much larger. Another web-based source² reports latency on a commercial WiMAX network as ranging between 55 and 66 ms.

  1. FDD TDD WiMAX Position Paper FINAL.pdf
  2. showthread.php?t=134468

   These values were presumably observed for best effort traffic and are reflective of what might realistically be expected at the end systems with a 10 ms frame time in use. Nevertheless, we found that because of the underlying scheduling policies, the round-trip latency experienced by best effort traffic on our network was considerably longer.

   In the remainder of this section we characterize the latency experienced by unsolicited grant service (UGS), real-time polling service (RTPS), and best effort (BE) service classes when ping type probe packets are sent both uplink and downlink across the WiMAX network. We show how the measurements obtained can be used to understand aspects of the underlying packet scheduling algorithms. Elements of the WiMAX network used in the study are shown in Fig. 1. The systems named wimaxgw and wimax24 are both multi-homed Linux hosts in the School of Computing at Clemson University. In addition to the WiMAX network, these two hosts are also connected to the School of Computing’s gigabit LAN.

Fig. 1. WiMAX network configuration

   While the tests were being conducted, three subscriber stations were provisioned and powered on. One of these, which was not involved in the test, was provisioned with a single RTPS flow. Thus the UL-MAP in each downlink subframe described a reservation slot in the next uplink subframe in which a bandwidth request could be made. However, during the measurement period no competing applications generated traffic.

   Measurements were collected using a UDP client and server pair. The client periodically sends a small probe packet to the server located on the other side of the WiMAX network. The probe packet is carried in an
                         BE                                   In the remainder of this section we use empirical cumu-
                       RTPS                                   lative distribution functions to explore the underlying
                 0.8                                          dynamics that produce these results.

                                                              3.1 Measuring one-way latency
                 0.4                                          Round-trip latency is easily measured to near microsec-
                                                              ond accuracy using the Linux gettimeofday() facility.
                 0.2                                          However, determining one-way latency is more chal-
                                                              lenging unless a global time source is available on both
                       30     40    50       60   70   80     client and server, and in this case it was not. Neverthe-
                                   Time in msec               less, the fact that client and server were connected both
                                                              by WiMAX and a gigabit LAN (not shown in figure 1)
Fig. 2. Uplink probe: Observed round-trip latency             make it possible to determine one-way latency to sub-
                                                              millisecond accuracy even in the face of clock skew and
                                                              clock drift.
NPDU of 56 bytes, small enough to be transmitted in              Suppose measured round-trip latency over the gigabit
a single WiMAX subframe regardless of the modulation          LAN is trg . Then the best estimate for one-way delay is
in use. Before sending the probe, the client stores a         teg = trg /2. Now suppose tmg is the measured one-way
sequence number and the local time in the packet. When        delay obtained by subtracting the timestamp stored in
the server receives the packet, it adds its own local         the probe packet by the client sender from the timestamp
timestamp to the packet before echoing the packet back        stored by the server. This is the actual one-way delay if
to the client. When the echoed probe is received by           and only if the client and server clocks are synchronized.
the client, the sequence number, the two timestamps,          If clock skew exists, then tmg is inaccurate by the amount
and the time at which the response was received are           of the clock skew, and thus the best estimate for present
logged to a file. Packet send times are controlled by the      clock skew is teg − tmg . Furthermore, this estimate is
pthread cond timedwait() facility, and the client may be      bounded by ±trg /2 because we know that the actual
configured to send randomly or synchronously.                  one-way time is greater than 0 and less than trg .
   The WiMAX network operates synchronously with                 It should be clear that if we were trying to determine
respect to its 10 ms frame time. Nevertheless, even when      the one-way delay over the gigabit LAN, this analysis is of
probes are issued synchronously with an interprobe time       no benefit. But since trg is on the order of 200 µs, we have
that is an integral multiple of 10 ms, round-trip times       an estimate of clock skew whose maximum error is on
experience a circular drift because the clocks of the         the order of ± 100 µs. Since both one-way and round-trip
WiMAX base station and the issuing host do not run            times exceed 10 ms on the WiMAX path, using teg − tmg
at precisely the same rate.                                   as the estimate of clock skew imparts an error of no more
   For this reason, it is easier to understand the underly-   than 1% to measurements made on the WiMAX path.
ing dynamics when the probe packets are issued with           Therefore, we estimate the one-way delay on the much
random interprobe times. Our interprobe times were            longer WiMAX path as t1w = tmw +(teg −tmg ) where tmw
uniformly distributed in [0.5, 1.5] seconds. With these       is the difference between the timestamps in the probe
random interprobe times, it is to be expected that ob-        that traversed the WiMAX network.
served round-trip times would be uniformly distributed           The above technique addresses the issue of instan-
across a time interval of 10 ms with the actual value         taneous clock skew, but it does not address the issue
of an individual observation being a function of the          of continuous clock drift. It was observed in the out-
offset of the generation time within the 10 ms frame.         put of several runs that the clock on wimaxgw lost 0.1
For UGS, RTPS, and BE provisioning, 1200 probes were          seconds of time relative to the clock on wimax24 in
sent, requiring approximately 20 minutes of real time for     1200 seconds. For the average one-way latency of 60
each of the three experiments. No loss of probe packets       ms experienced by the BE probe, the amount of drift
was observed.                                                 is thus 0.060 × 0.1/1200 = 5µs. Thus the clock drift
   Figure 2 shows the empirical cumulative distribution       within a single probe period is small relative to the known
of the measured round-trip times when 1200 probes were        error of 100 µs. Nevertheless, it is not possible to ignore
sent uplink from wimax24 to wimaxgw. For each service         drift over a 1200 probe run of 20 minutes, and so clock
class, the distribution is approximately uniform with         skew must be recomputed every probe period. This was
a spread of 10 ms as expected. The mean round-trip            accomplished by running concurrent probe processes on
latencies of 36.69 ms (UGS), 56.74 ms (RTPS), and 76.77 ms   accomplished by running concurrent probe processes on
(BE) are successively offset by 20 ms.                        both paths, using a common pseudo-random number
   Being accustomed to seeing WiFi round-trip latencies       skew for every probe.
consistently less than 5 ms, we were surprised to observe        Using the above technique for computing outgoing
such large values on an otherwise idle WiMAX network.         latency, return path latency is computed by subtracting

outgoing from round-trip. Round-trip and one-way latencies for
each of the six experiments are shown in Table 3. The following
characteristic behaviors may be noted in the table. Downlink
latency is independent of scheduling type. Uplink latency
increased by 20 ms from UGS to RTPS and from RTPS to BE. Uplink
latency is longer than downlink latency for all scheduling types
regardless of whether the client was uplink or downlink.
Round-trip latency was increased by approximately 4 ms when the
client was sending downlink. The scheduling dynamics that produce
this behavior are described below.
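The arithmetic behind the one-way measurements can be sketched in a few lines of Python. This is only an illustrative sketch, not the measurement code used in the study: the function names are hypothetical, and t_eg and t_mg are assumed here to be the expected and measured timestamp differences on the short reference path, so that their difference acts as the per-probe clock-skew correction. The example values are the drift figures quoted in the text and the UGS uplink means reported in Table 3.

```python
def one_way_delay(t_mw, t_eg, t_mg):
    """One-way delay on the WiMAX path: t_1w = t_mw + (t_eg - t_mg).

    t_mw is the raw timestamp difference seen on the WiMAX path.
    (t_eg - t_mg) is assumed to be the clock-skew correction taken
    from the reference path for the same probe.
    """
    return t_mw + (t_eg - t_mg)


def return_latency(round_trip, outgoing):
    """Return-path latency is whatever remains of the round trip."""
    return round_trip - outgoing


def drift_during_probe(latency_s, drift_s, interval_s):
    """Clock drift accumulated while a single probe is in flight.

    drift_s seconds of drift were observed over interval_s seconds,
    so a probe lasting latency_s seconds sees a proportional share.
    """
    return latency_s * drift_s / interval_s


if __name__ == "__main__":
    # 0.1 s of drift in 1200 s, 60 ms one-way latency for the BE probe.
    drift = drift_during_probe(0.060, 0.1, 1200.0)
    print(f"drift per probe: {drift * 1e6:.1f} microseconds")

    # UGS uplink means in ms: round trip 36.69, outgoing 19.86.
    # The result is close to the 16.82 ms return mean in Table 3;
    # the small difference comes from rounding of the means.
    print(f"UGS return latency: {return_latency(36.69, 19.86):.2f} ms")
```

Because the drift within one probe is only a few microseconds while the drift over a full run is about 0.1 s, the skew correction has to be recomputed for every probe rather than once per run.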
                              TABLE 3
                    Observed Probe Latency (ms)

                     UGS             RTPS              BE
   Uplink          µ      σ        µ      σ        µ      σ
   Outgoing      19.86   2.91    39.81   2.85    59.87   2.88
   Return        16.82   0.24    16.92   0.25    16.90   0.29
   Round-trip    36.69   2.93    56.74   2.85    76.77   2.89
   Downlink        µ      σ        µ      σ        µ      σ
   Outgoing      17.20   2.88    17.30   2.89    17.28   2.95
   Return        23.04   0.23    43.04   0.32    63.01   0.29
   Round-trip    40.24   2.88    60.34   2.95    80.29   2.93

   The empirical distributions of the one-way and round-trip
times provide insight. The one-way distributions of the outgoing
uplink probe times are shown in Fig. 3. Note that the
distributions have the expected uniform shape over a 10 ms
interval with means of 19.86, 39.81, and 59.87 ms. Sample
standard deviations of 2.91, 2.85, and 2.88 are consistent with
the 2.89 standard deviation of the uniform distribution with
spread 10.

Fig. 3. Uplink probe: Observed outgoing latency

   Unlike the uplink and round-trip distributions, one would not
expect to see a uniform distribution for the return latency of
uplink probes. The packets arrive at the uplink host synchronized
with the 10 ms frame time of the WiMAX system, and thus the
strongly modal downlink distribution shown in Fig. 4 is to be
expected. Here the observed means are 16.82 (UGS), 16.92 (RTPS),
and 16.90 ms (BE).
   To confirm that the lengthy delays were indeed occurring
within the WiMAX network, we ran ICMP pings between wimaxgw and
the base station and between wimax24 and the subscriber station.
We obtained mean round-trip times of 322 µs and 775 µs
respectively.

Fig. 4. Uplink probe: Observed return latency

3.2 WiMAX scheduling
Having obtained repeatable measurements of outgoing, return, and
round-trip latency we sought to use them to infer the scheduling
dynamics of the WiMAX network. We believe we were successful in
doing so for both UGS and RTPS scheduling types. Nevertheless,
some details of BE scheduling remain unclear. We first analyze
the behavior of the uplink probes in which the client was located
on wimax24.

3.2.1 UGS probes
Recall that the uplink latency of the probes was uniformly
distributed between 15 and 25 ms. The underlying dynamics of this
process are explained by Fig. 5. Each row of the table represents
a single 10 ms frame that is equally apportioned between downlink
(left) and uplink (right) components. The numerals in parentheses
identify the approximate times that specific events associated
with the probe occur.
   The probe packet is generated at some random time in frame
0 ms with mean offset at 5 ms. It arrives for transmission at the
subscriber station in less than 1 ms at time (1).

   Time (ms)    Downstream     Upstream
       0                       (1)
      10        (2)
      20                       (3) (4) (5)
      40        (6) (7)

Fig. 5. Uplink probe: UGS timeline

   In the next frame at time (2), a grant is received in the
downlink MAP. Since there is no competing traffic, the grant is
located near the start of the next uplink subframe, and the probe
is transmitted uplink and received by the base station (3),
forwarded to wimaxgw (4), and the response is received by the
base station (5). Events (3)-(5) all occur in a timespan of less
than 1 ms. The reply

is finally received by the subscriber station at (6) and
forwarded to wimax24 at time (7) yielding the observed round-trip
time of approximately 36.7 ms.
   Two aspects of the scheduling algorithms can be inferred from
this data. The subscriber station is provisioned to receive a
grant in each UL-MAP. Thus, in frame 0 ms, it was granted the
right to send in frame 10 ms. Nevertheless, it is apparent from
the data that this grant can be used only for data that arrived
before the start of the frame containing the grant. An analogous
situation can be observed on the downlink side. Even though
measurements indicate that the probe reply arrives at the base
station in frame 20 ms, the response is not transmitted until
frame 40 ms.

3.2.2 RTPS probes
Extension to RTPS is straightforward and illustrated in Fig. 6.
Instead of receiving a grant at time (2), the subscriber station
receives a poll granting it the opportunity to make a bandwidth
request which occurs at time (3) in frame 20 ms. The grant is
received at time (4) and the remainder of the exchange occurs as
it does with UGS scheduling and yields the observed round-trip
time of 57 ms. As in the UGS case, the subscriber station also
received a poll in frame 0 ms, but the scheduling algorithm did
not permit its use in frame 10 ms.
   Also note that the bandwidth request arrives in frame 20 ms
and the grant immediately follows in frame 30 ms. Thus the
downlink scheduling of grants differs from the downlink
scheduling of data packets. In both the UGS and RTPS cases, a
data packet arriving at the base station in the same relative
location within the uplink frame time is forced to wait a full
frame time before being transmitted.

   Time (ms)    Downstream     Upstream
       0                       (1)
      10        (2)
      20                       (3)
      30        (4)
      40                       (5)(6)(7)
      60        (8) (9)

Fig. 6. Uplink probe: RTPS timeline

3.2.3 BE probes
Understanding of best effort dynamics is more difficult. In
Fig. 7 points (1), (5), (6), (7), (8), and (9) are based upon
observed one-way and round-trip latencies and represent the same
events as their counterparts in the RTPS and UGS diagrams.
However, the source of the extra 20 ms delay in the uplink as
compared to the RTPS scenario is not clear. In the best effort
timeline, point (2) represents receipt of an uplink MAP defining
a contention opportunity for making a bandwidth request. Point
(3) represents issuance of a contention request for bandwidth.
This will never experience a collision on this otherwise idle
network. Point (4) represents sending of a grant. Nevertheless,
it is not possible to precisely identify the frames in which
events (3) and (4) take place, and consequently the specific
cause of the additional 20 ms delay is not clear.

   Time (ms)    Downstream     Upstream
       0                       (1)
      10        (2)
      30                       (3)
      40        (4)
      60                       (5)(6)(7)
      80        (8)(9)

Fig. 7. Uplink probe: BE timeline

3.3 Downlink probes
The experiments were repeated with the client sending downlink
from wimaxgw to wimax24. The data obtained supports the
inferences derived from the analysis of the uplink probes. The
distribution of downlink component times is shown in Fig. 8. The
outgoing latencies are uniformly distributed and independent of
the scheduling type. Return latencies, shown in Fig. 9, are now
strongly modal and carry the characteristic 20 ms offsets that
propagate into the round-trip latencies shown in Fig. 10.

Fig. 8. Downlink probe: Observed outgoing latency

   One anomalous aspect of the data is the unexpectedly high mean
downlink latency which ranged from 17.2 ms to 17.3 ms. This
effect can also be observed to a

slightly lesser degree in the uplink probes. Given that the mean
offset within frame of the time the transmission was initiated is
5 ms, a value closer to 15.5 ms would have been expected.
Possible explanations include failure of the subscriber station
to forward the probe in a timely manner or that a substantial
amount of overhead preceded the probe in the downlink frame. The
throughput study presented in the next section confirms the
substantial overhead hypothesis.

Fig. 9. Downlink probe: Observed return latency

Fig. 10. Downlink probe: Observed round-trip latency

   In summary, it can be seen that any WiMAX system operating in
time division duplexing mode inherently provides round-trip
latency that is tightly coupled to frame time and is asymmetric
with respect to the location of the sender. Uplink initiated
round-trips are up to one-half of a frame time shorter than
downlink initiated round-trips.
   Furthermore, the actual round-trip latency obtained on a given
WiMAX system is strongly dependent on the scheduling algorithms
used. The M/A-COM WiMAX system produced round-trip latency that
is strongly differentiated by scheduling class and quite high for
a LAN or MAN environment. Different scheduling algorithms can
reduce round-trip latency and produce identical round-trip times
for RTPS and BE service on uncongested networks.

4 THROUGHPUT AND TCP DYNAMICS
In this section we report upon throughput and TCP dynamics. As
with the latency tests, these tests were conducted with three
active subscriber stations. One of these was provisioned with a
single RTPS flow. Thus the UL-MAP in each downlink subframe
described a reservation slot in the next uplink subframe.
However, no data was actually transferred on this flow. The
subscriber station that carried the throughput tests was
provisioned with a single best effort flow.
   Our objectives were to develop robust measures of the number
of symbols per uplink and downlink frame that were allocated to
system overhead and to identify any adverse impact on TCP
dynamics created by the inherently bursty TDD system. It was
found that the scheduling algorithms on both the downlink and
uplink properly constrained maximum throughput to precisely the
provisioned amount when that amount was less than the capacity of
the link. Therefore, the flows on which the throughput tests were
conducted were configured with best effort scheduling and
over-provisioned at 6 Mbps.

4.1 Throughput
All but two of the TCP throughput tests were conducted by
serially running eight iperf transfers of 8 MB each between Linux
hosts wimax24 and wimax01 shown in Fig. 1. During the throughput
tests, the tcpdump utility captured port-filtered raw packet
traces at both ends of the connection. The traces were
subsequently analyzed in a post-processing step.
   The Airspan subscriber station that services wimax24 is
located in a third floor office window. It has a clear line of
sight to the base station which is affixed to the roof of a nine
story building at a horizontal distance of approximately 200 m.
The office window is treated to resist thermal energy transfer,
and this treatment produces almost a 20 dB loss of signal power
when compared to subscriber stations positioned outside at
similar distances. The Airspan has a maximum power output of
20 dBm in comparison to 27 dBm at the base station. Because of
these factors, the most efficient modulation achievable was
16-QAM 3/4 on the downlink and QPSK 3/4 on the uplink.
Consequently, the 64-QAM 2/3 downlink and 16-QAM 1/2 uplink tests
were performed using an M/A-COM subscriber station with 27 dBm
power output that was mounted in an automobile. These consisted
of a single 8 MB transfer.
   The modulation modes in use between the base station and a
subscriber are not directly controllable by the system
administrator. We controlled them by repositioning the subscriber
station in the window sill and interposing publications of
various thickness into the line of sight while monitoring the Web
interface. It was observed that modulation schemes occasionally
changed dynamically during an experiment yielding throughput
numbers that were inconsistent. This situation was easily
detected, and when it occurred, the entire experiment was
repeated.

In the absence of modulation changes throughput was very stable
as shown in Table 4 which shows the elapsed time in seconds,
downlink and uplink throughput, and network layer packet counts
for the eight individual runs of the 16-QAM 3/4 downlink
throughput test. Note that in seven of the eight runs an
identical number of acknowledgments was transmitted.

                              TABLE 4
                  16-QAM 3/4 Downlink at Receiver

   Elapsed time    Tx Kbps     Rx Kbps    Tx Pkts    Rx Pkts
        17.670      68.841    3934.827       2924       5799
        17.671      68.840    3934.818       2924       5799
        17.662      68.875    3936.811       2924       5799
        17.682      68.797    3932.341       2924       5799
        17.662      68.899    3936.816       2925       5799
        17.682      68.797    3932.352       2924       5799
        17.671      68.840    3934.814       2924       5799
        17.671      68.840    3934.818       2924       5799

                              TABLE 5
                   Downlink IP Layer Throughput

   Modulation    BS Profile    Kbps    Bits/Sym    Sym/Frame
   QPSK 1/2      Default       1140         192         59.3
   QPSK 3/4      Default       1705         288         59.2
   16-QAM 1/2    Default       2268         384         59.1
   BPSK          Speedy         662          96         69.0
   QPSK 1/2      Speedy        1325         192         69.0
   QPSK 3/4      Speedy        1985         288         68.9
   16-QAM 1/2    Speedy        2641         384         68.7
   16-QAM 3/4    Speedy        3936         576         68.3
   64-QAM 2/3    Speedy        5256         768         68.4

                              TABLE 6
                    Uplink IP Layer Throughput

   Modulation    BS Profile    Kbps    Bits/Sym    Sym/Frame
   BPSK 1/2      Default        775          96         80.7
   QPSK 1/2      Default       1553         192         80.9
   BPSK 1/2      Speedy         848          96         88.3
   QPSK 1/2      Speedy        1692         192         88.1
   QPSK 3/4      Speedy        2540         288         88.2
   16-QAM 1/2    Speedy        3361         384         87.5

   The downlink throughput results are summarized in Table 5 and
the uplink results in Table 6. The left column of these tables
identifies the modulation being used. The next column shows the
base station profile that was in use when the data was collected.
The profile labeled Default is the profile that was supplied with
M/A-COM’s UAS system. The profile labeled Speedy contains changes
suggested by M/A-COM to improve throughput. Speedy specifies that
the DCD and UCD be included in the downlink every other frame
instead of every frame. On the uplink Speedy provides six-symbol
ranging opportunities every fifth frame instead of every frame
and allocates two symbols instead of five for contention in each
uplink subframe. The nominal savings is an average of 10 symbols
per frame on the downlink and 8.2 on the uplink.
   The column labeled Kbps is the measured throughput at the IP
layer in thousands of bits per second. It is followed by the
number of data bits carried by each symbol. The column labeled
Sym/Frame is the average number of physical layer symbols per
frame consumed by the transfer as measured at the network layer.
This value should be independent of the modulation and coding
rate but dependent upon the base station profile and can be used
to characterize the total PHY and MAC overhead.
   The table shows that the Default base station profile imposes
an overhead of approximately 40% on the downlink but only 20% on
the uplink. The Speedy profile recovers 9 to 10 symbols per
downlink subframe reducing the overhead to slightly more than
30%. However, even when all of the elements of system overhead
discussed in section 2.1 are accounted for, expected overhead is
no more than 20%. We could neither understand nor explain the 10%
discrepancy, but, after reading a draft of this paper, M/A-COM
engineers informed us that the extra 10% was due to a scheduling
issue that would be corrected in a subsequent firmware release.
Uplink overhead was reduced to approximately 12% using Speedy.
   In summary, WiMAX in general produces asymmetric overhead on
the uplink and downlink. The magnitude of the asymmetry can be
significantly affected by how the base station is configured.
This data shows that a nominal 50/50 split is strongly biased in
favor of the uplink. To achieve a target split at the application
level it is imperative that the average overhead in both
directions be known.

4.2 Packet Loss
By sending UDP bursts of increasing size at 100 Mbps to both the
base station and the subscriber station it was determined that
both could buffer over 200 NPDUs of 1500 bytes each. TCP sender
buffer size is limited to 128 KB in our Linux systems. Thus, a
single TCP connection on an otherwise idle network can never
overflow the buffer space of either the base or subscriber
station. Consequently, packets were never dropped for congestion
during these tests.
   The M/A-COM base station does not support ARQ. Therefore,
highly aggressive strategies for dynamic modulation adjustment
are not recommended and were not used. The results reported in
Table 5 and Table 6 represent 98 total long-running TCP
transfers, and the tcpdump traces showed that there was no packet
retransmission in these runs.
   Nevertheless, these results should be viewed as a best case
scenario. They are likely to be achieved only when a fixed or
nomadic subscriber station is stationary and has a direct line of
sight to the base station that is unobstructed by moving traffic
or other obstacles such as foliage. Even in a slowly moving
vehicle, shadowing and multipath effects produce significant
packet loss including complete loss of MAC layer connectivity.
These effects will be fully discussed in our subsequent paper on
coverage.
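The symbols-per-frame column of Tables 5 and 6 can be reproduced from the measured Kbps and the bits-per-symbol value of each modulation, given the 10 ms frame time. The following Python sketch is offered only as a consistency check under that assumption. The function and variable names are hypothetical, and QPSK 1/2 is taken as 192 bits per symbol in every row, matching the other QPSK 1/2 entries.

```python
FRAME_SECONDS = 0.010  # 10 ms WiMAX frame, as stated in the text


def symbols_per_frame(kbps, bits_per_symbol, frame_seconds=FRAME_SECONDS):
    """Average PHY symbols per frame consumed by an IP-layer transfer."""
    bits_per_frame = kbps * 1000.0 * frame_seconds
    return bits_per_frame / bits_per_symbol


# (modulation, profile, Kbps, bits/symbol, reported symbols per frame)
DOWNLINK = [
    ("QPSK 1/2", "Default", 1140, 192, 59.3),
    ("QPSK 3/4", "Default", 1705, 288, 59.2),
    ("16-QAM 1/2", "Default", 2268, 384, 59.1),
    ("BPSK", "Speedy", 662, 96, 69.0),
    ("QPSK 1/2", "Speedy", 1325, 192, 69.0),
    ("QPSK 3/4", "Speedy", 1985, 288, 68.9),
    ("16-QAM 1/2", "Speedy", 2641, 384, 68.7),
    ("16-QAM 3/4", "Speedy", 3936, 576, 68.3),
    ("64-QAM 2/3", "Speedy", 5256, 768, 68.4),
]
UPLINK = [
    ("BPSK 1/2", "Default", 775, 96, 80.7),
    ("QPSK 1/2", "Default", 1553, 192, 80.9),
    ("BPSK 1/2", "Speedy", 848, 96, 88.3),
    ("QPSK 1/2", "Speedy", 1692, 192, 88.1),
    ("QPSK 3/4", "Speedy", 2540, 288, 88.2),
    ("16-QAM 1/2", "Speedy", 3361, 384, 87.5),
]

if __name__ == "__main__":
    for name, rows in (("Downlink", DOWNLINK), ("Uplink", UPLINK)):
        print(name)
        for modulation, profile, kbps, bits, reported in rows:
            computed = symbols_per_frame(kbps, bits)
            print(f"  {modulation:<11}{profile:<9}"
                  f"computed {computed:6.2f}  reported {reported:5.1f}")
```

Within each profile the computed value stays nearly constant across modulations, which is the property that lets the column serve as a measure of total PHY and MAC overhead.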
4.3 TCP Dynamics
The frame structure of the WiMAX system ensures that TCP dynamics
will be somewhat bursty. In steady state flow, all packets that
arrive in a single subframe are typically delivered with an
interarrival time of approximately 180 µs. Then nearly 10 ms
elapses before the next arrival. However, the limited capacity of
the frame ensures that bursts never exceed 10 packets.
   Fig. 11 shows the empirical cumulative distribution functions
of the interarrival times for eight downstream runs in which the
sender was using 16-QAM 3/4 modulation. The resulting bit rate of
3936 Kbps, shown in Table 5, corresponds to 3.2 IP packets of
size 1500 bytes per WiMAX frame. Thus, if all frames are fully
packed, it is to be expected that 1/3.2 or 30% of the arrivals
will experience a full frame delay while 70% will immediately
follow their predecessor as shown in Fig. 11.

Fig. 11. Packet interarrival distribution

Fig. 12. Acknowledgment interarrival distribution

7. The first column is the number of acknowledgments per frame
that were observed. The second column has the number of frames
observed to carry that number of acknowledgments. The aggregate
column is the product of the first two columns. Thus 1641 frames
carried a total of 2731 acknowledgments, and 1639 of the 2731 or
60% of them were the first acknowledgment in a frame. The other
40% followed a predecessor very closely as shown in Fig. 12.
Analogous interarrival distributions of packets and
acknowledgments were observed in all throughput tests.

                              TABLE 7
                   Distribution of Acknowledgments

   # of ACKS    Frames Having    % Frames Having    Aggregate ACKS
       0               2               0.12                 0
       1             770              46.92               770
       2             648              39.49              1296
       3             219              13.35               657
       4               2               0.12                 8
     Total          1641                                 2731

4.3.1 ACK Compression
The interarrival distribution of the acknowledgment
                                                                                  4.4 Source bursting and TSO
stream at the sender is shown in Fig. 12. It can be
seen in Table 4 that an average of approximately 2924                             The packet and acknowledgment arrival streams, filtered
acknowledgments per run were transmitted. The aver-                               through the leaky buckets of the base and subscriber
age run time was 17.67 seconds, and thus the average                              stations, are not particularly bursty. However, this is
run comprised 1767 WiMAX frame times. Therefore,                                  not the case with the initial sending of packets and
in the most even possible distribution of the 2924 ac-                            consequently their arrival at the edge of the WiMAX
knowledgments, 1157 of the 1767 frames would carry                                network. Furthermore, the presence of TCP segmenta-
two acknowledgments and the other 610 would carry a                               tion offload (TSO) was seen to exacerbate this problem.
single acknowledgment. In this best case scenario 1157                            A TSO capable network interface controller (NIC) can be
/ 2924 = 40% of the acknowledgments would arrive in                               passed an IP packet containing a TCP segment of up to
the same frame with the preceding acknowledgment.                                 nearly 64 KB in size and will re-segment it into multiple
Fig. 12 shows that the actual arrival distribution closely                        IP packets of MTU size or smaller. TSO is not the same
approximates the best case with approximately 40% of                              as IP fragmentation. Its presence can be inferred when a
acknowledgments immediately following their predeces-                             tcpdump trace of the output stream shows packets whose
sor.                                                                              sizes exceed the MTU of the outgoing interface. The
   A more detailed study of the 1641 frames comprising                            17376 byte payload shown below constitutes 12 normal
the steady state operation of one of the eight TCP trans-                         size payloads of 1448 bytes each.
fers was performed. The actual distribution of acknowl-                           16:27:17.072890 IP >
edgments within these 1641 frames is shown in Table                         . 107985:125361(17376)

The objective of TSO is to reduce processor overhead         began with a short transmit burst consisting of one
at gigabit and higher speeds where 100,000 or more           or more TSO packets typically comprising a total of
segments per second may be processed. However, even          44 segments of 1448 bytes each followed by a 600 ms
in the absence of TSO, two aspects of the Linux imple-       period in which the 22 acknowledgments were more
mentation of TCP induce bursty behavior at the sender:       or less evenly distributed among the 60 frames. The
the implementation tries to do as much work as possible      resulting throughput was the maximum achievable on
for a specific TCP connection in the context of the           the otherwise idle channel.
process that created the connection; and it also tries         Although this bursty behavior has no particular ad-
to avoid block/unblock “flapping” in which a process          verse effect when the network is otherwise idle, it would
rapidly alternates between being blocked with full buffer    have strong negative consequences should a burst of 44
quota and being unblocked with two available segments        segments arrive at the subscriber station when buffer
when an acknowledgment is received. Therefore, when          availability was very limited.
a process becomes blocked due to full buffer quota, it
will not be unblocked, and no further segments will
be transmitted until a substantial fraction of the buffer    4.5 Round-trip times
space becomes free.                                          For TCP data packets, we use the term “round-trip time”
   The following data taken from a tcpdump of the 16-        to refer to the time that elapses between the transmission
QAM 3/4 downlink transfer at the sender is representa-       of the packet and the receipt of an acknowledgment
tive of steady state operation over the entire transfer.     for the data contained therein. Round-trip time was
The sender, wimax01, does not support TSO. The left          measured for each data packet whose transmission im-
column encodes the operation as transmit or receive. The     mediately followed the receipt of an acknowledgment
second column is the time since the start of the transfer    packet.
in seconds. The next two columns specify the number             To obtain maximum throughput, it is necessary to
of unacknowledged bytes and segments respectively.           maintain a backlog of data carrying packets at the base or
The last column is the usable window which is the            subscriber station sufficient to ensure the full payload of
offered window minus the number of unacknowledged            each subframe is populated by data packets or fragments
segments. We see that the receive at time 17.025 reduces     thereof. For uplink transfers, it is also necessary to ensure
the number of unacknowledged segments to 44 and              that the subscriber station has enough backlog to ensure
triggers a burst of 26 transmissions that all occur within   that uplink capacity can be continuously allocated via
the span of a millisecond. The next acknowledgment is        the piggybacking process.
received approximately 10 ms later at 17.035, and one or        Additional sender buffer capacity beyond that which
more acknowledgments continue to arrive every 10 ms          is necessary to support continuous transmission on the
until the number of unacknowledged segments drops            bottleneck air link cannot increase throughput. Beyond
below 45 again at time 17.105. This triggers another         this point the magnitude of the round-trip time grows
transmit burst of 26 segments.                               linearly according to Little’s Law as a function of increas-
    R     17.025 63712      44   46                          ing packet population in the network.
    T     17.025 65160      45   45                             Round-trip times experienced by the downlink 16-
    T     17.025 66608      46   44                          QAM 3/4 experiment are shown in Fig. 13. As previ-
    T     17.025 68056      47   43                          ously described, the packets for which RTT measure-
            :                                                ments were taken were typically transmitted with a
      --- 20 analogous records ---
            :                                                population of 45 unacknowledged packets in the system.
    T     17.025 98464      68   22                          Throughput is 328.37 packets per second, and so the
    T     17.025 99912      69   21                          approximate magnitude of the expected round-trip time
    T     17.025 101360     70   20                          is 46/328.37 = 0.140 s. Jitter in the system produces a
    R     17.035 98464      68   22                          second mode at 150 ms.
    R     17.045 95568      66   24
            :                                                   The uplink BPSK 1/2 transfer with SO_SNDBUF set
      --- 10 analogous records ---                           to 8644 produced round-trip times with modes of 70, 80,
            :                                                and 90 ms.
    R     17.105 63712      44   46
    T     17.105 65160      45   45
    T     17.105 66608      46   44                          4.6 Transport protocol and scheduling class effects
  The effect may be even more pronounced when TSO            UDP iperf tests that were run with a controlled bit
is enabled. On the uplink channel running in BPSK            rate showed that the maximum sustained throughput
mode with network layer throughput at 848 Kbps, single       at the IP layer was the same as was achieved with TCP.
segments as large as 65212 bytes (45 standard segments)      TCP tests were also run with the scheduling class set
were observed with tcpdump. With TSO enabled on              to RTPS, and no change in maximum throughput was
the BPSK uplink channel, steady state behavior was           observed. The UGS scheduling class is not appropriate
periodic with a period length of 600 ms. Each period         for unconstrained offered loads.
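
The round-trip estimate of Section 4.5 follows directly from Little's Law, and a minimal sketch makes the linear growth explicit. The function name is hypothetical; the inputs are the values quoted in the text (46 packets, 328.37 packets per second), and the sketch assumes the bottleneck stays saturated so that throughput is constant.

```python
import math


def littles_law_rtt(packets_in_flight, throughput_pps):
    """Mean time in system W = N / X (Little's Law).

    packets_in_flight: mean number of unacknowledged packets (N)
    throughput_pps:    sustained throughput in packets per second (X)
    """
    if throughput_pps <= 0:
        raise ValueError("throughput must be positive")
    return packets_in_flight / throughput_pps


if __name__ == "__main__":
    rtt = littles_law_rtt(46, 328.37)
    print(f"expected RTT: {rtt:.3f} s")

    # With throughput pinned at the air-link maximum, a larger sender
    # buffer only adds queued packets, so RTT grows in proportion.
    doubled = littles_law_rtt(92, 328.37)
    assert math.isclose(doubled, 2 * rtt)
```

The second call illustrates why extra sender buffer capacity cannot raise throughput once the air link is saturated: it only lengthens the round-trip time.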

                                                                                  network in which frequency division duplexing is used
                                                                                  by the base station. They formally prove properties of
                                                                                  the algorithm and then demonstrate via simulation its
                                                                                  effectiveness in carrying a mix of VoIP and Web traffic.
                                                                                     The use of rate control on the packet arrival process for
                                                                                  assuring the QoS guarantees for uplink RTPS or nRTPS
                                                                                  traffic is studied in [9]. An analytic model is developed
                                                                                  and validated via simulation. A RED-like mechanism is
                                                                                  proposed for controlling the arrival rate of each uplink
                                                                                  source that uses polled service. When the current polled
                                                                                  service queue length is less than τmin, the arrival rate is
                                                                                  unconstrained. When the polled service queue length is
                                                                                  greater than τmax, the arrival rate is constrained to some

[Figure 13: RTT in msec (vertical axis) versus transmit time in seconds (horizontal axis, 0 to 18), Runs 1 through 8.]
Fig. 13. Observed round-trip times
                                                                                  value λmin . As the queue length grows from τmin to
                                                                                  τmax , the maximum allowed arrival rate is continuously
4.7 Effects of over-provisioning                                                  throttled until it reaches λmin at τmax . The technique
                                                                                  is shown to stabilize delay in both steady state and
The standard does not specify required behavior when                              transient conditions.
the provisioned bandwidth of active flows exceeds the                                 In [10] an ad hoc simulation developed by the authors
carrying capacity of the network. We ran multiple tests                           is used to evaluate the capability of a simulated FDD net-
involving multiple over-provisioned flows. Observed                                work to provide differentiated services to video confer-
behavior would be classed as “reasonable and unsur-                               encing, VoIP, and data transfer workloads. Deficit round-
prising” in all cases, but analysis of flow traces at the                          robin (DRR) scheduling is used at the base station for
end systems provided little insight into the underlying                           downlink scheduling. Because DRR requires knowledge
dynamics of the scheduling system of the WiMAX base                               of the size of the head-of-line packet, weighted round-
station.                                                                          robin was used to schedule the allocation of uplink
   When two concurrent full rate best effort TCP transfers,
                                                                                  bandwidth. A no-loss channel was also assumed. A
each provisioned at 6 Mbps, competed, as                                          related study, [11], uses simulation to evaluate the impact
one would hope and expect, bandwidth was shared
                                                                                  of using different physical layer frame sizes on both data
approximately equally between them. This was true for
                                                                                  and multimedia workloads. Other papers that report on
both uplink and downlink flows regardless of whether
                                                                                  the simulation of scheduling algorithms include [12],
the flows passed through a single subscriber station or
                                                                                  [13], and [14].
multiple subscriber stations using the same modulation                               Evaluation of specific aspects of the MAC layer proto-
and coding rate. Because of piggybacking, no impact
                                                                                  col itself is the focus of other papers. In [15] simulation
from contention collisions was expected, and none was
                                                                                  is used in an investigation of the optimal number of
observed. Aggregate throughput was the same as ob-                                contention slots in a WiMAX network. An OPNET sim-
served with a single flow.
                                                                                  ulation of the effectiveness of piggybacking compared
   When equally over-provisioned RTPS and BE flows
                                                                                  to contention is presented in [16]. Simulation is used to
competed, the RTPS flow consistently obtained some-
                                                                                  demonstrate the impact of fragmentation and concate-
what higher throughput. However, we were unable to
                                                                                  nation in a TDD WiMAX network in [17].
derive a model of the underlying scheduling system that
                                                                                     Simulation of the performance of extensions to the
was capable of predicting the allocation.
                                                                                  802.16 standard is the theme of [18] in which it is
   When one of the flows that was provisioned at 6
                                                                                  proposed that dynamically varying priorities be associ-
Mbps was configured to generate traffic at less than
                                                                                  ated with service flows. Other works use simulation or
one-half the available carrying capacity of the network,
                                                                                  analysis to model characteristics of the physical layer.
it appeared that the scheduling algorithms enforced a
                                                                                  Included in these are [19] in which an OPNET simula-
max-min fair sharing. However, here again we were not
                                                                                  tion is used to demonstrate the importance of dynamic
able to derive precise details of the underlying queuing
                                                                                  modulation changes and [20] in which a physical layer
and scheduling dynamics.
                                                                                  model is used in a simulation study of coverage at 450
                                                                                  MHz and 3.5 GHz.
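
The max-min fair sharing that Section 4.7 reports as the apparent behavior can be illustrated with the standard progressive-filling procedure. This is a textbook sketch and not a model of the base station scheduler, whose internals could not be derived from the traces. The function name and the 4.0 Mbps capacity in the usage example are hypothetical values chosen only for illustration.

```python
def max_min_fair(capacity, demands):
    """Progressive-filling max-min fair allocation.

    capacity: total rate to share (any consistent unit, e.g. Mbps)
    demands:  offered load of each flow, in the same unit
    Returns the per-flow allocation in the original flow order.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Serve flows from smallest demand to largest.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        share = remaining / len(pending)
        i = pending[0]
        if demands[i] <= share:
            # The flow needs no more than an equal share: satisfy it fully
            # and return the unused part to the pool.
            allocation[i] = float(demands[i])
            remaining -= demands[i]
            pending.pop(0)
        else:
            # Every remaining flow wants more than an equal share.
            for j in pending:
                allocation[j] = share
            break
    return allocation


if __name__ == "__main__":
    # Hypothetical 4.0 Mbps channel: one flow offers 1.0 Mbps, the other
    # is greedy up to its 6 Mbps provisioning.
    print(max_min_fair(4.0, [1.0, 6.0]))
    # Two greedy flows split the channel equally.
    print(max_min_fair(4.0, [6.0, 6.0]))
```

Under this policy a flow that offers less than an equal share receives everything it asks for, and the remainder goes to the greedy flow, which is consistent with the qualitative behavior described in the text.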
5           RELATED WORK                                                             Aspects of both physical layer modeling and enhanc-
Although there exists a considerable body of published                            ing TCP performance are found in [21]. An OPNET
research in the WiMAX domain, most of it is not directly                          simulation is used to evaluate the impact of ARQ re-
related to our work. Development of specific scheduling                            transmission delay on TCP performance. Because traf-
algorithms and then using analytic or simulation models                           fic flow on a TCP connection is typically asymmetric
to evaluate their performance is the focus of several                             with full-sized segments flowing in one direction and
papers. The authors of [8] propose a scheduling algo-                             small acknowledgments flowing in the other, asymmet-
rithm for half-duplex subscriber stations operating in a                          ric adaptation of modulation is suggested. It is shown

that overall throughput is optimized when a more ro-          MAC protocol such as channel descriptors and ranging
bust but less spectrally efficient modulation technique is     opportunities are generated. Overhead imposed by the
used on the acknowledgment channel than on the data           PHY and MAC layers can also be strongly asymmetric
channel.                                                      depending on configuration parameters. Although 10%
   Of the few studies that involve measurements taken         of the 30% overhead we observed was found to be due
on an operational WiMAX network, the most extensive           to a scheduling anomaly, the 10% difference produced by
is [22]. It is complementary to our work in that it uses      a change of base station configuration parameters could
a commercial network in which the authors have no             be applicable to any WiMAX network.
control over the provisioning or the level of competing          Compared to WiFi technology, WiMAX provides sig-
traffic. The network in this study is characterized as         nificantly improved capability for provisioning QoS
“fixed” WiMAX and operated in Canada by two com-               guarantees. The contention-free UGS service provides
mercial service providers in the 2.496-2.699 GHz band.        a capability superior even to that of 802.11e for supporting
The authors’ equipment was attached to the network via        constant bit rate streams. The use of piggybacking signif-
Motorola Expedience RSU-2510F subscriber stations. The        icantly reduces the number of collisions experienced by
providers limit downlink rates to 1.5 Mbps and uplink         best effort traffic when compared to 802.11e. Absolutely
to 256 Kbps. The study compares the throughput and            enforced limits on throughput also ensure better fairness
RTT obtained using four TCP variants with transfers           than 802.11. Nevertheless, the capability is not without
in both downlink and uplink directions as a function          cost. Provisioning is a time consuming task. The system
of transmit buffer size. It is shown that a buffer size       administrator must have a fundamental understanding
of 64KB is necessary to obtain near link speed down-          of both PHY and MAC protocols to avoid unnecessary
link throughput with an RTT varying from 0.12 to 0.40         protocol overhead.
seconds. Throughput differences among the NewReno,               The long term future of WiMAX remains unclear. At
Cubic, Vegas, and Veno TCP implementations were not           present it occupies something of a niche role as an access
significant, but Cubic TCP was shown to produce ex-            network technology. We believe that its use is feasible
cessive retransmissions when using auto-tuned sender          in additional niches such as public safety and military
buffer space management.                                      applications. The key to its growth in the commercial
   A very brief paper [23] describes the performance of       sector is clearly its ability to support mobile clients and
synthetically generated VoIP traffic over a real WiMAX         the demand for this service.
network. The authors compare the performance of
G.723.1 and G.729.2 codecs as a function of the number
of concurrent calls using the E-model as a metric. It is      ACKNOWLEDGEMENTS
not clear if the network was carrying competing traffic        This project was supported by Grant No. 2006-IJ-CX-
at the time of the study.                                     K035 awarded by the National Institute of Justice, Of-
   Another study that employs an operational network          fice of Justice Programs, US Department of Justice. The
is [24]. The focus of this paper is the development           project would not have been possible without the interest
and evaluation of an end-to-end protocol for dynamic          and cooperation of the Police Departments of Clemson
addition of service flows in a hybrid network that used        University and the City of Clemson.
802.11e for access and WiMAX for backhaul. The focus            The authors also acknowledge the outstanding sup-
of this work is on the performance of dynamic service         port provided by M/A-COM personnel in procuring and
activation triggered by the subscriber stations. The au-      installing the equipment and in technical support. The
thors worked with the equipment vendors in the design         contributions to the project of M/A-COM engineers Tom
and implementation of the protocols.                          Brown and Mike Gaudette are especially noted.
                                                                The authors also thank the anonymous referees for
                                                              their comments and suggestions.
In this paper we have shown that, while WiMAX stan-             Points of view in this document are those of the
dards impose bounds on performance, choices made in           authors and do not necessarily represent the official
the selection of configurable parameters and scheduling        position or policies of the US Department of Justice, the
algorithms have significant impact on the ultimate per-        public safety agencies involved in the project, or M/A-
formance delivered to end systems.                            COM, Inc.
   Latency is strongly affected by the choice of frame size
and by scheduling algorithms. Mean round-trip latency
is inherently asymmetric with approximately a one-half        REFERENCES
frame time advantage in favor of probes initiated in
                                                              [1]   “IEEE standard for local and metropolitan area networks part 16:
the uplink direction. Furthermore, scheduling choices               Air interface for fixed broadband wireless access systems,” IEEE
can produce multiple frame-time differences in latency              Std 802.16-2004 (Revision of IEEE Std 802.16-2001), pp. 1–857, 2004.
among compliant WiMAX implementations.                        [2]   “IEEE standard for local and metropolitan area networks part
                                                                    16: Air interface for fixed and mobile broadband wireless access
   Application-level throughput is affected by frame size           systems,” IEEE Std 802.16e-2005 (Revision of IEEE Std 802.16-2004),
and the rate at which management elements of the                    2005.

[3]    J. Burbank and W. Kash, “IEEE 802.16 broadband wireless tech-           [23] N. Scalabrino, F. De Pellegrini, I. Chlamtac, A. Ghittino, and
       nology and its application to the military problem space,” Military          S. Pera, “Performance evaluation of a WiMAX testbed under VoIP
       Communications Conference, 2005. MILCOM 2005. IEEE, vol. 3, pp.              traffic,” in Proceedings of the 1st international workshop on Wireless
       1905–1911, 17-20 Oct. 2005.                                                  network testbeds, experimental evaluation & characterization. ACM
[4]    J. Andrews, A. Ghosh, and R. Muhamed, Fundamentals of WiMAX.                 Press New York, NY, USA, 2006, pp. 97–98.
       Upper Saddle River, NJ: Prentice-Hall, 2007.                            [24] P. Neves, S. Sargento, and R. L. Aguiar, “Support of real-time
[5]    C. Eklund, R. Marks, K. Stanwood, and S. Wang, “IEEE standard                services over integrated 802.16 metropolitan and local area net-
       802.16: a technical overview of the WirelessMAN air interface                works,” in 11th IEEE Symposium on Computers and Communications
       for broadband wireless access,” Communications Magazine, IEEE,               (ISCC’06), 2006. Los Alamitos, CA, USA: IEEE Computer Society,
       vol. 40, no. 6, pp. 98–107, Jun 2002.                                        2006, pp. 15–22.
[6]    A. Ghosh, D. Wolter, J. Andrews, and R. Chen, “Broadband wire-
       less access with WiMax/802.16: Current performance benchmarks
       and future potential,” IEEE Communications Magazine, vol. 43,
       no. 2, pp. 129–136, 2005.
[7]    M. S. Kuran and T. Tugcu, “A survey on emerging broadband
       wireless access technologies,” Computer Networks, vol. 51, no. 11,
       pp. 3013–3046, 2007.
[8]    A. Bacioccola, C. Cicconetti, A. Erta, L. Lenzini, and E. Mingozzi,
       “Bandwidth allocation with half-duplex stations in IEEE 802.16
       wireless networks,” IEEE Transactions on Mobile Computing, vol. 6,
       no. 12, pp. 1384–1397, 2007.
[9]    D. Niyato and E. Hossain, “Queue-Aware Uplink Bandwidth
       Allocation and Rate Control for Polling Service in IEEE 802.16
       Broadband Wireless Networks,” IEEE Transactions on Mobile Com-
       puting, pp. 668–697, 2006.
[10]   C. Cicconetti, L. Lenzini, E. Mingozzi, and C. Eklund, “Quality of
       service support in IEEE 802.16 networks,” IEEE Network, vol. 20,
       no. 2, pp. 50–55, 2006.
[11]   C. Cicconetti, A. Erta, L. Lenzini, and E. Mingozzi, “Performance
       evaluation of the IEEE 802.16 MAC for QoS support,” IEEE
       Transactions on Mobile Computing, vol. 6, no. 1, pp. 26–38, 2007.
[12]   R. Jayaparvathy, G. Sureshkumar, and P. Kanakasabapathy, “Per-
       formance evaluation of scheduling schemes for fixed broadband
       wireless access systems,” Networks, 2005. Jointly held with the 2005
       IEEE 7th Malaysia International Conference on Communication., 2005
       13th IEEE International Conference on, vol. 2, pp. 6 pp.–, 16-18 Nov.
[13]   A. Iera, A. Molinaro, and S. Pizzi, “Channel-aware scheduling for
       QoS and fairness provisioning in IEEE 802.16/WiMAX broadband
       wireless access systems,” Network, IEEE, vol. 21, no. 5, pp. 34–41,
       Sept.-Oct. 2007.
[14]   J.-C. Lin, C.-L. Chou, and C.-H. Liu, “Performance evaluation for
       scheduling algorithms in WiMAX network,” Advanced Information
       Networking and Applications - Workshops, 2008. AINAW 2008. 22nd
       International Conference on, pp. 68–74, 25-28 March 2008.
[15]   S.-M. Oh and J.-H. Kim, “The analysis of the optimal contention
       period for broadband wireless access network,” in Third IEEE
       International Conference on Pervasive Computing and Communications
       Workshops (PERCOMW’05). Los Alamitos, CA, USA: IEEE Com-
       puter Society, 2005, pp. 215–219.
[16]   R. Pries, D. Staehle, and D. Marsico, “Performance evaluation of
       piggyback requests in IEEE 802.16,” Vehicular Technology Confer-
       ence, 2007. VTC-2007 Fall. 2007 IEEE 66th, pp. 1892–1896, Sept. 30
       2007-Oct. 3 2007.
[17]   C. Hoymann, M. Putter, and I. Forkel, “Initial Performance Eval-
       uation and Analysis of the global OFDM Metropolitan Area
       Network Standard IEEE 802.16,” in Proc. of European Wireless
       conference, 2004.
[18]   L. de Moraes and P. Maciel, “A variable priorities MAC protocol
       for broadband wireless access with improved channel utilization
       among stations,” Telecommunications Symposium, 2006 International,
       pp. 398–403, 3-6 Sept. 2006.
[19]   S. Ramachandran, C. Bostian, and S. Midkiff, “Performance Eval-
       uation of IEEE 802.16 for Broadband Wireless Access,” in Proceed-
       ings of OPNETWORK, 2002.
[20]   T. Javornik, G. Kandus, A. Hrovat, and I. Ozimek, “Comparison of
       WiMAX coverage at 450MHz and 3.5GHz,” 2006 International Con-
       ference on Software in Telecommunications and Computer Networks,
       2006, pp. 71–75, 2006.
[21]   X. Yang, M. Venkatachalam, and S. Mohanty, “Exploiting the
       MAC layer flexibility of WiMAX to systematically enhance TCP
       performance,” Mobile WiMAX Symposium, 2007. IEEE, pp. 60–65,
       March 2007.
[22]   E. Halepovic, Q. Wu, C. Williamson, and M. Ghaderi, “TCP over
       WiMAX: A measurement study,” in Proceedings of MASCOTS 2008,
       To Appear, 2008.
