Conceptual System Overview

					        A Design Environment for
        Single Chip Digital Radio
                Systems
               Department of Electrical Engineering and Computer Science
                                          And
                           Electronics Research Laboratory
                           University of California, Berkeley
                               Berkeley, CA 94720-1774




                                     Final Project Report
                              September 9, 1997 - September 8, 2000

                                        Principal Investigator:
                                        Robert W. Brodersen
                                        rb@eecs.Berkeley.edu
                                            (510) 642-1779

                                   SPONSORED BY:
                     DEFENSE ADVANCED RESEARCH PROJECTS AGENCY


                                        MONITORED BY:
                                    UNITED STATES AIR FORCE
                               AIR FORCE RESEARCH LABORATORY




The views and conclusions contained in this document are those of the authors and should not be
interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of
the Defense Advanced Research Projects Agency (DARPA), the Air Force, or the U.S. Government.




               Project Title:
  A Design Environment for
  Single Chip Digital Radio
          Systems
Cooperative Agreement No: F30602-97-2-0346



    FINAL TECHNICAL REPORT


     Berkeley Wireless Research Center

 Principal Investigator: Professor Robert Brodersen




                       Revision 9




REPORT DOCUMENTATION PAGE                                                      Form Approved, OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including
the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and
completing and reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters
Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington,
VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington,
DC 20503.

    1.   AGENCY USE ONLY: (Leave blank)
    2.   REPORT DATE: 28-March-01
    3.   REPORT TYPE AND DATES COVERED: Final Report, 09-Sept-97 to 08-Sept-00
    4.   TITLE AND SUBTITLE: A Design Environment for Single Chip Digital Radio Systems
    5.   FUNDING NUMBERS: Agreement No: F30602-97-2-0346; PR No: N-7-5839;
         DARPA Order/Amendment No: E117/30; Catalog of Federal Domestic Assistance No: 12.910
    6.   AUTHORS: Principal Investigator: Robert W. Brodersen
    7.   PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): The Regents of the University of California,
         University of California, Berkeley
    8.   PERFORMING ORGANIZATION REPORT NUMBER:
    9.   SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
         Sponsoring agency: Defense Advanced Research Projects Agency (DARPA)
         Monitoring agency: Rome Laboratory, Air Force Materiel Command, USAF,
         26 Electronic Parkway, Rome, NY 13441-4514
    10.  SPONSORING/MONITORING AGENCY REPORT NUMBER:
    11.  SUPPLEMENTARY NOTES:
    12a. DISTRIBUTION/AVAILABILITY STATEMENT: Distribution Statement A. Approved for public release;
         distribution is unlimited.
    12b. DISTRIBUTION CODE:
    13.  ABSTRACT (Maximum 200 words):
    14.  SUBJECT TERMS: SOC (system-on-a-chip)
    15.  NUMBER OF PAGES: 31
    16.  PRICE CODE:
    17.  SECURITY CLASSIFICATION OF REPORT:
    18.  SECURITY CLASSIFICATION OF THIS PAGE:
    19.  SECURITY CLASSIFICATION OF ABSTRACT:
    20.  LIMITATION OF ABSTRACT:

NSN 7540-01-280-5500                                      Standard Form 298 (Rev. 2-89)
                                                          Prescribed by ANSI Std. Z39-18, 298-102




                                         Abstract


This report covers the final technical results of the research project to create a hierarchical
automated design environment for low-power signal processing integrated circuits for single chip
digital radios. The work was conducted at the Berkeley Wireless Research Center. A
methodology for rapid evaluation of candidate communications algorithms, architectures and
systems is presented. A modular design-flow framework based on a combined Simulink and floor-plan
description has been developed; it drives automatic layout generation for CMOS integrated
circuits. The design flow includes automatic characterization of the layout to improve
system-level estimates, as well as functional and timing verification. Comparisons to
industry-standard design flows demonstrate the benefits of this new design environment. The
design flow has been validated using the subsystems of CDMA and OFDM receivers and a
300k-transistor test chip.

A prototype wireless communication system was constructed to replace the wired intercom system
currently used by military personnel in the high-noise environments of motorized vehicles. The
system, which is integrated into a standard marine helmet, adds flexibility and improves ease of
use. The modular system contains radio, digital, power and handheld-controller subsystems.




                                     Table of Contents
  i.   List of Tables
 ii.   List of Figures
iii.   Acknowledgements
iv.    Summary

   Part I: Design Environment:
                Hierarchical Simulink-to-Silicon Automated Design Flow for Digital Radios

       1.0 Introduction
       2.0 The Simulink-to-Silicon Design Flow
           2.1 Automation Framework
               2.1.1 Automation Programmer’s Interface
               2.1.2 Simulink-EDIF Translator
               2.1.3 Stateflow-VHDL Translator
           2.2 Verification Methodology
               2.2.1 Function Verification
               2.2.2 Timing Verification
                   2.2.2.1 Cycle Time
                   2.2.2.2 Race Margin
           2.3 Comparison to Industry Standard Flow
       3.0 Test Chips

   Part II – Hardware:

       1.0   Project Goal
       2.0   Hardware System Description
       3.0   Intercom Hardware Design Overview
       4.0   Accomplishments

   Results and Discussion

   Conclusions

   References




                                       List of Tables

Table 1: Comparison of communication algorithms, 28 users
Table 2: Comparison of critical-path delays for the decimation filter test chip
Table 3: Race-margin statistics for the decimation filter test chip
Table 4: Complexity statistics for the timing-recovery test chip
Table 5: Complexity and area statistics for the decimation filter test chip




                                      List of Figures

Figure 1: Stages of the automated design flow
Figure 2: Levels of automation in the Design Flow
Figure 3: Illustration of a Simulink hierarchy (a), expansion of primitives (b), etc.
Figure 4: Example of a simple Stateflow chart
Figure 5: Illustration of the timing-closure problem: traditional flow vs. our flow
Figure 6: Stacking of delay profiles to estimate critical-path delay
Figure 7: Industry-Standard Digital Radio Design Flow
Figure 8: Stages of the design process with our methodology
Figure 9: Block diagram of the CDMA timing-recovery systems
Figure 10: Floor plan for the test chip
Figure 11: Wireless Intercom System Configuration
Figure 12: The Helmet with Custom Earpieces and the Custom Handheld Device
Figure 13: The Board Sets Assembled in the Earpieces
Figure 14: Digital Processing Board
Figure 15: The Power Board
Figure 16: The Radio Adapter Board and Radios
Figure 17: Single-System Test Setup
Figure 18: The Multiple-System Test Setup
Figure 19: The Electromechanical Design of the Earpiece and Hand Controller
Figure 20: Completed Intercom Marine System with Fused Deposition Machine




                                Acknowledgements

Effort sponsored by the Defense Advanced Research Projects Agency (DARPA) and Rome
Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-97-2-0346.
The U.S. Government is authorized to reproduce and distribute reprints for Governmental
purposes notwithstanding any copyright annotation thereon.

This project was also supported by the contributions of and technical discussions with Cadence
Design Systems, Ericsson Mobile Systems and Lucent Technologies. STMicroelectronics
provided the CMOS fabrication of the test chip. The following graduate students were involved:
Rhett Davis, Hayden So, Ben Coates and Dave Wang in developing the design environment, and
Paul Hustead in testing it. BWRC technical staff members Sue Mellers
and Fred Burghard developed the hardware prototype.




                                          Summary

Researchers at UC Berkeley’s Berkeley Wireless Research Center have created a design environment
for single chip digital radio systems. This design environment consists of a hierarchical automated
design flow for low-energy radio systems-on-a-chip and hardware prototyping tools to support
real-world evaluation of complete wireless systems. The environment accelerates the ASIC design
process by automating the flow from a single description to mask layout. The Simulink tool from
The MathWorks is used to capture functional, signal, circuit and floor-plan information, allowing
designers to make design trade-offs within a common behavioral description. Signal processing
architectures are typically data-path heavy: for baseband signal processing in wireless systems,
control logic accounts for less than one-tenth the power and area of the data-path logic. Simulink
offers a number of primitives, such as adders, multipliers and switches, that allow easy description
of a data path. Control logic is described as state machines using Stateflow, whose charts are
embedded in Simulink models as blocks.

To demonstrate the power of this ASIC design flow, five complex macros were designed, and a
test chip was designed and fabricated in 0.25-micron CMOS.

To validate the overall system approach, a complete wireless communication system prototype
was constructed. This system is functionally equivalent to the wired intercom system currently
used by the military for communication within personnel carriers, with added flexibility and
improved ease of use. It consists of digital, radio-transceiver, power-management and
handheld-controller subsystems in a four-board modular format. The first three boards are
mounted in the earpiece of a marine communications helmet. Six systems were built, and a small
intercom network consisting of remote units and a base station was tested and debugged. The
network supports full-duplex peer-to-peer voice communication and remote-to-base-station
control.
A radio adapter board was also built for the Ericsson Bluetooth radio modules as they became
available. This radio is better suited to the system, since the intercom application requires
lower power consumption and shorter range than the original Proxim radio provides.




         Part I – Single Chip Design Software Environment
1. Introduction

The emerging standard for system-on-a-chip (SOC) design flows is well suited to existing
portable radio system standards. A state-of-the-art mobile radio transceiver contains a digital
base-band chip described by a hardware description language (HDL) and utilizing a small number
of intellectual property (IP) blocks such as a digital signal processor, Viterbi decoders, and QPSK
demodulators. When we look beyond today’s radio system standards, however, we see that there
are still opportunities for vast improvement. Increased bandwidth efficiency comes at the cost of
much greater computational complexity, which must be delivered at low energy and in the
smallest possible area.

One study of indoor multi-user communication algorithms [1] shows that through the use of
multiple antennas and complex signal processing, we stand to gain as much as an order of
magnitude in bandwidth efficiency (see Table 1) over a commercially available radio.
Unfortunately, the complexity of this algorithm is nearly two orders of magnitude beyond the
capacity of an embedded DSP. Given the industry trend towards programmable “software
radios,” it would seem that practical implementation of these algorithms will have to wait many
years for embedded processor speeds to catch up. The study also determined, however, that a
direct mapping of the most complex algorithms to gates would be well within the capabilities of
today’s processing technology (10 mm², 19 mW). This work focuses on a design methodology for
these kinds of direct-mapped signal-processing algorithms.


                                    Commercial radio   CDMA w/ adaptive   FDMA w/ multiple
                                    (Proxim WLAN)      MMSE detection     antennae
Bandwidth efficiency                0.3 bps/Hz         1.8 bps/Hz         3.6 bps/Hz
No. of parallel processors (power)  -                  23 (303 mW)        87 (1149 mW)
Direct-mapped area (power)          -                  3 mm² (6.8 mW)     10 mm² (19 mW)
Direct-mapped gates                 -                  60K                200K

Table 1: Comparison of communication algorithms for 28 users, 0.8 Msym/s symbol rate, 15 dB
signal-to-noise ratio and 10⁻⁵ bit-error rate. The number of required processors is based on
estimates for a low-power 16-bit fixed-point DSP. Direct-mapped figures are based on unrouted
estimates for a 0.25 µm technology operating at 1 V and 25 MHz.
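The order-of-magnitude claims in the text can be checked directly against Table 1. The short Python sketch below recomputes them; the values are transcribed from the table, while the dictionary keys and helper names are our own illustrative labels, not names used in the study.

```python
import math

# Values transcribed from Table 1; the keys are illustrative labels only.
EFFICIENCY = {"proxim_wlan": 0.3, "cdma_mmse": 1.8, "fdma_multi_antenna": 3.6}  # bps/Hz
DSP_COUNT = {"cdma_mmse": 23, "fdma_multi_antenna": 87}
DSP_POWER_MW = {"cdma_mmse": 303.0, "fdma_multi_antenna": 1149.0}
DIRECT_POWER_MW = {"cdma_mmse": 6.8, "fdma_multi_antenna": 19.0}


def efficiency_gain(alg: str) -> float:
    """Bandwidth-efficiency improvement over the commercial Proxim radio."""
    return EFFICIENCY[alg] / EFFICIENCY["proxim_wlan"]


def decades(x: float) -> float:
    """How many orders of magnitude x represents (log base 10)."""
    return math.log10(x)


def direct_mapping_power_saving(alg: str) -> float:
    """Power of the parallel-DSP solution divided by the direct-mapped power."""
    return DSP_POWER_MW[alg] / DIRECT_POWER_MW[alg]


for alg in DSP_COUNT:
    print(f"{alg}: {efficiency_gain(alg):.1f}x bandwidth efficiency, "
          f"{DSP_COUNT[alg]} DSPs = {decades(DSP_COUNT[alg]):.2f} orders of magnitude, "
          f"{direct_mapping_power_saving(alg):.0f}x less power when direct-mapped")
```

For the multiple-antenna FDMA system this gives a 12x efficiency gain (about one order of magnitude), 87 DSPs (log10 of about 1.94, i.e. nearly two orders of magnitude) and roughly 60x lower power for the direct-mapped implementation.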




In order to determine which of the many candidate algorithms is the most practical, we need to
evaluate the power and area of their implementations. The complexity of these systems is so
great, however, that any method of evaluation other than prototyping tends to be inconclusive. It
is not sufficient to stop the design process after the RTL coding phase, because the inaccuracy of
wire-load models leaves too much uncertainty about meeting speed requirements. Therefore, we
must develop a design methodology that allows us to map all of the candidate algorithms to
silicon for evaluation.

By focusing on the mobile digital radio design problem, we greatly reduce the difficulty of the
design flow. Clock distribution is simplified because these chips seldom have clock rates above
100 MHz. Signal-integrity problems are minimal even in deep-submicron technologies because of
the low-power, low-voltage nature of the designs. Instead, the challenge for this flow is to reduce
the design time of a direct-mapped signal-processing algorithm. This part of the report presents
and discusses our design methodology and the first test chip fabricated with the flow.

2. The Simulink-to-Silicon Design Flow

The central goal of our design flow is to decrease design time. We find that designers spend most
of their time in trial-and-error with CAD tools, trying to get the desired output from the available
input. Writing scripts to constrain and guide designers’ use of these tools alleviates this problem
only to a point: over-constrained scripts prevent the designer from truly optimizing the system,
while under-constrained scripts are more likely to fail. A different approach to this fundamental
problem of CAD tool usage is needed.

If we change our focus from decreasing design time to decreasing re-design time, however, the
problem becomes much more straightforward. If a successful path through the CAD tools is
found for a certain design, it should be a simple task to capture the flow and re-run it at the push
of a button. This decreases the overall design time by accelerating design iterations. To achieve
this goal, CAD scripts must not only constrain and guide usage of the particular tool, but also
constrain and guide the script’s usage in a more complex flow. This was the organizing principle
for the creation of our Simulink-to-Silicon design flow.

Our flow is specified as follows: we want to be able to make changes to a system and observe the
effect of these changes on the system performance at the push of a button. For this we had to
provide the following:
    1. An initial description which captures the changes we would like to make
    2. A library of primitives which are already well understood and optimized
    3. Automation spanning several CAD frameworks to generate mask patterns from the initial
         description
    4. A verification methodology to guarantee that the chip will work as described

We furthermore decided that the flow should support arbitrary hierarchy depth. For example,
placement and routing can be performed flat or hierarchically as needed. Hierarchy matters
because it is easier to observe the effect of a change if the effect is not distributed throughout
the rest of the system. The rest of this section focuses on the problems that had to be solved in
order to provide a design flow that meets this specification.




We chose Simulink as the input description for our flow because it is inexpensive and often
familiar to both algorithm and circuit designers. It is relatively simple to express a design in
Simulink with all the detail of a VHDL or Verilog description. The Simulink Fixed-Point
Blockset allows the description of data-path logic with arbitrary word lengths. The Simulink
Enable block can be used to model a gated clock. State machines designed in the Stateflow
editor model the control logic. User libraries can be created and referenced easily in Simulink,
allowing us to describe systems hierarchically. On the whole, Simulink captures the
optimizations we wish to make and serves as a complete specification for the functionality of a
chip. It is well suited as an initial description for our design flow.
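To make the two modeling ideas above concrete, here is a minimal Python sketch of what an arbitrary-word-length data path and an enabled (gated-clock) register compute. It is not the Fixed-Point Blockset API; the function and class names are hypothetical, and it assumes signed two's-complement numbers that wrap on overflow, with round-to-nearest quantization.

```python
from dataclasses import dataclass


def quantize(value: float, word_len: int, frac_len: int) -> float:
    """Quantize to a signed two's-complement fixed-point number with
    `word_len` total bits and `frac_len` fractional bits; overflow wraps."""
    lsb = 2.0 ** -frac_len
    code = round(value / lsb)  # nearest LSB; Python rounds exact halves to even
    half = 1 << (word_len - 1)
    code = (code + half) % (1 << word_len) - half  # two's-complement wrap-around
    return code * lsb


@dataclass
class EnabledRegister:
    """Register that loads only when `enable` is true: a stand-in for a
    register inside a subsystem controlled by a Simulink Enable block."""
    word_len: int
    frac_len: int
    state: float = 0.0

    def clock(self, d: float, enable: bool) -> float:
        q = self.state  # output is the value stored before this clock edge
        if enable:
            self.state = quantize(d, self.word_len, self.frac_len)
        return q
```

With an 8-bit word and 4 fractional bits, 0.3 quantizes to 0.3125 and 8.0 wraps to -8.0, which is the behavior a designer must anticipate when choosing word lengths in the Simulink model.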

The primitives for our flow were chosen in the context of a base-band CDMA receiver, which
required the following primitives:

       Adder, Subtractor (ripple-carry)
       Selectable Adder/Subtractor (for CORDIC subsystems)
       Register
       Multiplexor
       Multiplier (Booth-encoded Wallace-tree array)
       Shifter (Barrel shift)
       Look-up table
       Bus ripper
       Constant value
       Less than zero

We find that most radio algorithms of interest can be expressed in terms of these primitives.
They represent only the finest granularity at which we want to design: the automation
framework and verification methodology should also make the design of more complex
primitives as easy as possible. Examples we have in mind are finite impulse response (FIR)
filters, fast Fourier transform (FFT) processors and Viterbi decoders. Another optimization that
we want our automation framework to recognize is that some primitives do not map to
transistors. These wiring primitives include bus rippers, constant values and less-than-zero
operators (sign bits).
We now turn our attention to the challenges in providing an automation framework and
verification methodology that creates a chip from a Simulink system designed with these
primitives.
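The reason some primitives cost no transistors can be shown in a few lines. In the Python sketch below (primitive names and helpers are hypothetical, chosen for illustration), a bus ripper only selects existing wires, and for a two's-complement bus the less-than-zero operator is simply the most significant bit, so both reduce to routing.

```python
# Hypothetical primitive names; only these three are treated as pure wiring.
WIRING_PRIMITIVES = {"bus_ripper", "constant", "less_than_zero"}


def maps_to_transistors(primitive: str) -> bool:
    """True if the primitive needs logic gates rather than just wires."""
    return primitive not in WIRING_PRIMITIVES


def bus_ripper(bus: int, hi: int, lo: int) -> int:
    """Select bits hi..lo of a bus: in hardware this is a set of wires."""
    width = hi - lo + 1
    return (bus >> lo) & ((1 << width) - 1)


def less_than_zero(bus: int, word_len: int) -> int:
    """For a two's-complement bus, x < 0 is exactly the sign bit (the MSB)."""
    return bus_ripper(bus, word_len - 1, word_len - 1)
```

For example, -3 on an 8-bit bus is 0b11111101, and its sign bit is 1 without any comparator logic.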

2.1 Automation Framework

Figure 1 shows the stages of our automated design flow. The elaboration step turns the Simulink
system into a transistor-level schematic and a collection of abstract views for routing. This step’s
name was chosen because of its similarity to the process of elaborating an RTL design from
VHDL code. The next step creates a new floor plan from the schematics and abstracts and
merges into it the information from the previous iteration’s floor plan. This merging is performed
by traversing each hierarchy from the top down, descending into views with matching instance
names, so the merge succeeds as long as the instance names in Simulink do not change between
iterations. The last step routes the design from the bottom level of the hierarchy to the top and
performs design-rule checks (DRC) and layout-versus-schematic (LVS) checks as needed.
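The floor-plan merge described above can be sketched as a recursive, top-down walk keyed on instance names. The Python below is a minimal illustration under an assumed data model (the `Instance` class and `merge_floorplan` function are hypothetical, not part of the Cadence tools used in the flow): placements are copied only where an instance with the same name exists at the same position in the previous hierarchy, and new or renamed instances are left unplaced.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Instance:
    """One node of a floor-plan hierarchy (hypothetical data model)."""
    name: str
    placement: Optional[Tuple[float, float]] = None  # (x, y); None = unplaced
    children: Dict[str, "Instance"] = field(default_factory=dict)


def merge_floorplan(new: Instance, old: Instance) -> int:
    """Copy placements from the previous iteration into the new floor plan,
    descending top-down only into instances whose names match.
    Returns the number of instances whose placement was reused."""
    reused = 0
    for name, child in new.children.items():
        prev = old.children.get(name)
        if prev is None:
            continue  # new or renamed instance: left for the designer to place
        if prev.placement is not None:
            child.placement = prev.placement
            reused += 1
        reused += merge_floorplan(child, prev)
    return reused
```

Renaming a block in Simulink changes its key, so its subtree falls through the `prev is None` branch; this is why the merge relies on instance names staying stable between iterations.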




[Figure 1 (flow diagram): Simulink system → Elaboration → schematics & abstracts → Floor-plan
Merge (combined with the last floor plan) → new floor plan → Route & Verify → layout (mask patterns)]

Figure 1: Stages of the automated design flow

The chip assembly flow is otherwise very similar to a standard industry ASIC design flow. The
greatest difficulties lay in the development of the scripting framework, the translation of the
Simulink design into an electronic design, and the generation of control logic from Stateflow.
The scripting framework was established with an automation programmer’s interface. The
electronic design was generated with a home-grown EDIF translator. Control logic was generated
with a Stateflow-VHDL translator. The following sections describe these solutions.


2.1.1 Automation Programmer’s Interface

The hardest part about creating a “push-button” design flow that spans several frameworks from
several companies is making sure that it does not break. This is because it is impossible for the
design flow architect to account for all possible designs. Unexpected input leads to unexpected
cases for the design flow. Some examples are the following:
     Names are used which are reserved by one of the tools in the flow
     Names generated by one tool contain characters not allowed by one of the tools in the
        flow
     Name mapping by one tool prevents a connection from being made by another
     One of the tools contains a bug for a particular case of input
     Floor-plan is too congested and cannot be routed

The price of an unbreakable design flow may be too high. The only way to avoid name conflicts
is to choose names that are complex and confusing. Omitting a tool from the flow simply
because it contains a bug deprives designers of the power of that tool. Congested floor plans are
desirable because they lead to efficient use of area.

If we cannot create an unbreakable design flow, we can at least create one that breaks with
maximum grace. Such a design flow would have the following properties:
     Well documented and easily modified
     Common log file
     Robust error-trapping mechanism

In order to achieve a design flow with these properties, we have created a cross-framework
Automation Programmer’s Interface (API). As illustrated in Figure 2, the API provides a base set
of functions on which a collection of design-flow scripts (called steps) that invoke commercial
tools can be written. The API has been implemented in Perl, SKILL (the scripting language for
Design Framework II from Cadence) and Common Lisp (the scripting language for the Pillar
framework from Cadence). The API is structured so that it has an identical look and feel in all
frameworks. Thus, even design flows that span several frameworks can be easily documented in
terms of their calls to this API.




[Figure 2 (layer diagram): Design Flow Steps (commercial tool invocation, file/database format
specs, directory structure) built on top of the Automation Programmer’s Interface (from top to
bottom: database access, run directory initialization, common tech-file, error trapping & control
flow, common log file)]

                       Figure 2: Levels of Automation in the Design Flow


The goal of the API is to take care of the ugliest details of design flow automation while
refraining from imposing any file or database formats or assumed directory structure. It is
designed in five layers, each layer calling functions implemented in the lower layers. The lowest
layer implements the common log file by offering functions to open and close the log-file and
print log messages. The functions allow messages to be selectively printed according to a desired
level of verbosity. Messages are also indented according to a provided parameter. The functions
come with a set of usage conventions that ensure log messages are printed to the log file in the
proper order.
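A minimal Python sketch of such a lowest layer is shown below. The class and method names are hypothetical (the actual API was written in Perl, SKILL and Common Lisp); it illustrates the three services described above: opening and closing one shared log file, filtering messages by verbosity, and indenting each message by a caller-supplied level.

```python
import sys
from typing import Optional, TextIO


class CommonLog:
    """Hypothetical common log file: one file shared by every step, with
    verbosity filtering and hierarchical indentation."""

    def __init__(self) -> None:
        self._fh: Optional[TextIO] = None
        self.verbosity = 1

    def open_log(self, path: str, verbosity: int = 1) -> None:
        # Append mode, so steps started later keep writing to the same file.
        self._fh = open(path, "a", encoding="utf-8")
        self.verbosity = verbosity

    def close_log(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def message(self, text: str, level: int = 1, indent: int = 0) -> None:
        """Write `text` only if `level` does not exceed the chosen verbosity."""
        if level > self.verbosity:
            return
        line = "  " * indent + text + "\n"
        (self._fh or sys.stderr).write(line)
        if self._fh is not None:
            self._fh.flush()  # flush immediately so messages land in call order
```

A step would open the log once, pass its verbosity and indentation on to any sub-steps, and close the log only when the whole flow ends.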

The next level is the error-trapping and control-flow level, which provides a set of functions to
begin a step and to end a step either successfully or unsuccessfully. A function is also provided to
invoke another step within a step. This sub-step can be a design-flow step implemented with the
API in any of the three supported platforms, or it can be a generic UNIX program, SKILL
procedure or Common Lisp procedure. For a generic program or procedure, functions are also
provided that scan log files for regular expressions indicating success or failure of the sub-step.
For a design-flow step, arguments are passed automatically that preserve the common log file
and verbosity level and increment the indentation level, so that the progress of the hierarchical
design flow can be tracked through the single, common log file.
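The sketch below illustrates this control-flow layer in Python. All names (`run_step`, `run_program`, `StepFailed`) are hypothetical, and printing to standard output stands in for the common log file. A step is wrapped so that a failure is trapped and reported instead of aborting the whole flow, and a generic program invoked as a sub-step is judged by scanning its output for a failure regular expression.

```python
import re
import subprocess
import sys
from typing import Callable, List


class StepFailed(Exception):
    """Raised when a step (or a sub-step it invoked) ends unsuccessfully."""


def run_step(name: str, body: Callable[[int], None], indent: int = 0) -> bool:
    """Begin a step, run it, and end it successfully or unsuccessfully.
    `body` receives the indentation level to pass on to its own sub-steps."""
    print("  " * indent + f"BEGIN {name}")
    try:
        body(indent + 1)
    except StepFailed as err:  # trap the failure instead of crashing the flow
        print("  " * indent + f"FAILED {name}: {err}")
        return False
    print("  " * indent + f"END {name}")
    return True


def run_program(cmd: List[str], fail_pattern: str, indent: int) -> None:
    """Invoke a generic program as a sub-step and decide success by scanning
    its output for a failure regular expression."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    if result.returncode != 0 or re.search(fail_pattern, output, re.MULTILINE):
        raise StepFailed(f"{cmd[0]} reported failure")
    print("  " * indent + f"ok: {cmd[0]}")


if __name__ == "__main__":
    # Usage: a sub-step whose output contains an error line is trapped, not fatal.
    demo = [sys.executable, "-c", "print('Error: 3 unrouted nets')"]
    ok = run_step("route", lambda i: run_program(demo, r"^Error", i))
    print("route succeeded" if ok else "route failed; flow continues")
```

Passing `indent + 1` into the body mirrors the report's convention of incrementing the indentation level for each sub-step, so nested steps appear nested in the log.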

The next two levels provide a common technology file mechanism and a run directory
initialization mechanism. The technology file format is very simple and restricted to string types.
This is sufficient, since all frameworks offer the means to convert strings to other types. This
technology file can be used, among other things, to provide paths to default tool initialization
files. Most tools rely on special initialization files in their run directories to set up the technology
and libraries. Since the API does not impose a directory structure, it provides the functions to set
up run directories for certain tools in arbitrary locations. The API currently supports the
following tools:
      Synopsys Design Compiler (VHDL synthesis)
      Synopsys Module Compiler (datapath synthesis)
      Synopsys EPIC Tools (power estimation, static timing verification)
      Synopsys Arcadia (parasitic extraction)
      Cadence Design Framework II (structure processing and netlist generation)
      Cadence Pillar/Design Planner (floor-planning)
      Cadence Qplace (placement)
      Cadence IC Craftsman (routing)
      Mentor Calibre (physical verification)
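The technology-file and run-directory layers can be sketched as follows. The report does not give the file syntax, so the one-entry-per-line `key = value` format with `#` comments, the `<tool>.init` key convention, and the function names are assumptions; the point illustrated is that every value stays a string and that run directories may be created anywhere.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def read_techfile(text: str) -> Dict[str, str]:
    """Parse a hypothetical 'key = value' tech file; every value stays a string."""
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed tech-file line: {raw!r}")
        entries[key.strip()] = value.strip()
    return entries


def init_run_dir(run_dir: Path, tech: Dict[str, str], tool: str) -> Path:
    """Create a run directory at an arbitrary location and copy in the tool's
    default initialization file, whose path is taken from the tech file."""
    run_dir.mkdir(parents=True, exist_ok=True)
    src = Path(tech[f"{tool}.init"])  # hypothetical key naming convention
    dst = run_dir / src.name
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        default = Path(tmp) / "default_dc.setup"
        default.write_text("# tool settings would go here\n")
        tech = read_techfile(f"dc.init = {default}\nvdd = 2.5  # still a string\n")
        print(init_run_dir(Path(tmp) / "runs" / "synth", tech, "dc"))
```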

The highest level of the API provides a standard means for modifying the design databases. This
is not a unified database, since different tools have different databases with different procedural
interfaces for modifying them. Instead, the API restricts the structure of code that modifies the
database. It creates a special kind of step (called a generator) that opens and modifies a single
cell-view in the design hierarchy. Functions for traversal of the entire hierarchy are also provided
which use the generator as their building block. The functions act on the database associated with
each framework; i.e., SKILL generators act on the Design Framework II database, and Common Lisp
generators act on the Pillar database. Perl generators also act on the Design Framework II
database through SWIG [1] wrappers for C-based access functions provided by Cadence.
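The generator discipline and the hierarchy traversal built on it can be sketched with a toy, in-memory hierarchy standing in for a real design database. The names `traverse` and `Hierarchy` are hypothetical, and the children-first (post-order) visiting order is an assumption made here so that each cell is processed after everything it instantiates; the report does not state the order.

```python
from typing import Callable, Dict, List, Set

# Toy stand-in for a design database: cell name -> cells instantiated inside it.
Hierarchy = Dict[str, List[str]]


def traverse(top: str, hierarchy: Hierarchy,
             generator: Callable[[str], None]) -> List[str]:
    """Apply `generator` once to every cell-view under (and including) `top`.

    A generator opens and modifies exactly one cell-view; the traversal visits
    children before parents and skips cells already processed, so a cell that is
    instantiated many times is modified once. Returns the visiting order.
    """
    done: Set[str] = set()
    order: List[str] = []

    def visit(cell: str) -> None:
        if cell in done:
            return
        done.add(cell)
        for child in hierarchy.get(cell, []):
            visit(child)
        generator(cell)
        order.append(cell)

    visit(top)
    return order


if __name__ == "__main__":
    # Hierarchy of figure 3(b): A holds two instances of B; B holds C1 and C2.
    fig3b = {"A": ["B", "B"], "B": ["C1", "C2"]}
    print(traverse("A", fig3b, lambda cell: print("modifying", cell)))
```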

By implementing our design flow in terms of this API, we have been able to create one 300k
transistor chip in 0.25 um technology (described later) and are maximally reusing the code as we
extend the flow to handle larger, more complex chips in 0.18 um and 0.12 um technologies. By
periodically re-running the flow on a set of reference designs, we have gained a few residual
benefits:
     Existence of a working example for every tool in the design flow. This helps to educate
         new team members about the usage of certain tools.
     Ability to exchange old tool versions with new tool versions in a matter of hours. The
         design is re-executed at the push of a button, and inconsistencies between versions are
         flushed out in the process.

2.1.2 Simulink-EDIF Translator

The first major task in the implementation of our design flow was the development of a way to
translate the Simulink design into an electronic design. We decided that a structural mapping was
preferable to a behavioral mapping because it is simpler and gives us more freedom when
optimizing our primitives. This is similar to the approach used by the Hardware Design System
(HDS) extension to Cadence’s SPW [1] which provides a library of special primitives which map
to VHDL or Verilog code. We therefore created a Simulink-to-EDIF translator, since EDIF is the
most common and generic format for representing structural design data. This translator relies on
the Simulink fixed-point block-set types to specify bus widths and the Simulink subsystem to
specify hierarchy.

In the context of our hierarchical design flow, the primary task of the translator is to map a
Simulink hierarchy such as the one shown in figure 3(a) to the EDIF hierarchy shown in 3(b).
Each primitive (C), though unique in Simulink, can map to multiple cells in the EDIF hierarchy
(C1,C2). Two adders with different word lengths, for example, remain a single primitive in
Simulink but must be two cells in EDIF. In contrast, the commercially available translator (the
Real-Time Workshop from MathWorks) loses the notion of a block reference above the level of
the primitives, resulting in the hierarchy shown in 3(c).

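[Figure 3 panels: (a) Simulink: A contains two instances of B, each containing two instances of primitive C; (b) EDIF with expanded primitives: A contains two instances of B, each containing C1 and C2; (c) RTW: A contains B1 and B2, each containing C1 and C2]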
      Figure 3: Illustration of a Simulink hierarchy (a), expansion of primitives (b), and loss of
                       reference information above the primitives with RTW (c)

From an automation standpoint, figure 3(c) is better. It can be difficult to define what it means for
two blocks to be functionally equivalent, so the more robust automation strategy is to make every
block unique. From an optimization standpoint, however, figure 3(b) is better. If we wish to
examine the performance trade-off for two implementations of block B, for example, it is much
easier to simply modify B than to modify every instance of B in the hierarchy. Our hierarchy
mapping was a compromise between the competing needs for automation and optimization. We
eventually settled on the following two rules:

         Rule 1: Two instances of a Simulink primitive become instances of the same
                 EDIF cell if
          all ports have the same fixed-point types
          all distinguishing parameters have equivalent values

         Rule 2: Two instances of a Simulink subsystem become instances of the same
                 EDIF cell if all of their contained primitives map to the same EDIF cells.

The first rule states that, aside from port types, it is the job of the creator of the primitives to
decide which parameters make blocks functionally different. For example, a parameter that sets
the number of cycles of latency for a multiplier distinguishes functionally different blocks. A
parameter that sets the cycle time for a register indicates a connection to a different clock but
does not make the blocks functionally different.
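Rules 1 and 2 above can be read as a keying scheme: instances whose keys are equal share one EDIF cell. The sketch below assumes a per-primitive table of distinguishing parameters supplied by the primitive's creator (the `DISTINGUISHING` entries are hypothetical) and assumes that instances of the same Simulink subsystem list their contained blocks in the same order, so a subsystem's key can be the tuple of its children's keys.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Rule 1 input: parameters the primitive's creator declares as distinguishing.
# Hypothetical entries: a multiplier's latency matters; a register's cycle
# time (i.e., which clock it connects to) does not.
DISTINGUISHING: Dict[str, Tuple[str, ...]] = {
    "add": (),
    "mult": ("latency",),
    "reg": (),
}


@dataclass
class Primitive:
    kind: str
    port_types: Tuple[str, ...]           # fixed-point types, e.g. "ufix12"
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class Subsystem:
    ref: str                              # name of the Simulink subsystem
    children: List[Union[Primitive, "Subsystem"]]


def cell_key(block: Union[Primitive, Subsystem]) -> tuple:
    """Instances with equal keys become instances of the same EDIF cell."""
    if isinstance(block, Primitive):
        wanted = DISTINGUISHING[block.kind]
        values = tuple(block.params.get(name) for name in wanted)
        return ("prim", block.kind, block.port_types, values)              # Rule 1
    return ("sub", block.ref, tuple(cell_key(c) for c in block.children))  # Rule 2


def assign_cells(blocks: List[Union[Primitive, Subsystem]]) -> List[str]:
    """Name one EDIF cell per distinct key (add1, add2, B1, ...)."""
    names: Dict[tuple, str] = {}
    counts: Dict[str, int] = {}
    result = []
    for block in blocks:
        key = cell_key(block)
        if key not in names:
            base = block.kind if isinstance(block, Primitive) else block.ref
            counts[base] = counts.get(base, 0) + 1
            names[key] = f"{base}{counts[base]}"
        result.append(names[key])
    return result


if __name__ == "__main__":
    a12 = Primitive("add", ("ufix12", "ufix12", "ufix13"))
    a16 = Primitive("add", ("ufix16", "ufix16", "ufix17"))
    print(assign_cells([a12, a16, a12]))  # ['add1', 'add2', 'add1']
```

Because register cycle time is not listed as distinguishing, two registers on different clocks still share a cell, matching the example in the paragraph above.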

One very important repercussion of these rules is that traditional methods of optimization, such as
driver-load balancing, no longer work above the level of the Simulink primitive. This is because
circuit parameters such as loading are not a factor in determining the equivalency of blocks. This
encourages us to use primitives that traditional synthesis tools optimize well and leave all other
optimization up to the chip designer.

2.1.3 Stateflow-VHDL Translator

Another major obstacle for our design flow was a strategy for handling control logic. Unlike
data-path logic, control logic tends to be quite unstructured and is seldom reused between chip
projects without modification. Given the difficulty of verifying functional equivalency between
Simulink primitives and their implementations, it is unreasonable to expect that a new primitive
will be designed for every new piece of control logic. We therefore created a primitive around
the Simulink Stateflow chart.

Simulink Stateflow is a graphical language for describing control flow with state machines, based
almost entirely on the Statecharts language developed by Harel [1].
The primary challenge for this translator is to distill a synthesizable VHDL hardware state
machine from a software state machine. The difficulty of translating a software state machine
into hardware can be better understood by examining the simple Stateflow chart shown in figure
4. This chart shows an output signal called count that is modified upon entering one state or
simply by remaining in another state (specified by the “during” keyword). In contrast, a
hardware state machine uses combinational logic to determine the next state and output from the
current state and input. This contradicts the Stateflow chart, in which the output changes even
though the current state and input do not.

Figure 4: An example of a simple Stateflow chart that counts from 0 to 7 and restarts from zero if
                the input signal sat_en is low or stays at 7 while sat_en is high.

Our goal was to generate a hardware state machine that was as efficient as possible while remaining
functionally equivalent to Stateflow. We expressed this goal as the following optimization rule:

        Optimization Rule: The hardware should contain a number of flip-flops equal to
                           Log2 of the number of states plus the minimum amount of storage
                           for variables.
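Read literally, the optimization rule gives the flip-flop count below, where S is the number of states and b_v is the minimum width of variable v. Rounding the logarithm up and assuming a binary (not one-hot) state encoding are our reading of the rule rather than something stated explicitly; the worked numbers use the smallest and largest charts of the timing-recovery test-chip (5 and 27 states).

```latex
N_{\mathrm{FF}} = \left\lceil \log_2 S \right\rceil + \sum_{v} b_v
\qquad\text{e.g.}\qquad
S = 5:\ \lceil \log_2 5 \rceil = 3, \qquad
S = 27:\ \lceil \log_2 27 \rceil = 5 \quad (2^4 = 16 < 27 \le 32 = 2^5)
```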

Once this optimization rule was chosen, the creation of the VHDL code was a matter of declaring
memory signals for the states and variables and reverse engineering the Stateflow execution
model in a single VHDL process. We rely on synthesis tools to map the VHDL process to
efficient combinational logic. The translator accepts a subset of the Stateflow syntax. Some
constructs not supported by the translator are the following:
          Transition actions (all data must be set within a state)
          State exit actions (all data must be set with an “entry” or “during” action)
          Some types of hierarchical state machines
          Variable shift operations on data

Although we have not yet found reliable mappings of these constructs to VHDL, we know of no
fundamental barrier to their translation.

One of the biggest difficulties for this methodology was the implementation of a reset condition.
In hardware, a reset is implemented by setting the values of all registers (i.e. the current state and
variables) to some known value. In Stateflow, however, the reset is indicated by a transition
rather than a state, as indicated by the large black dot in the upper left corner of figure 4. We
solved this problem by adding to the VHDL code a synchronous transition from every state to the
initial state, depending on the value of a global “Stateflow reset” signal.
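To make the translation concrete, the sketch below models in Python what a generated single-process description computes for a small hypothetical chart loosely patterned on figure 4 (it is not the chart in the figure): states ZERO (entry: count = 0), UP (during: count = count + 1), and SAT (entry: count = 7). The state and the variable count are both registers, so a “during” update that a pure next-state/output view cannot explain becomes ordinary next-value logic, and reset is the forced transition to the initial state. The evaluation order used (try outgoing transitions first, otherwise run the “during” action) is a simplified reading of Stateflow semantics, and all names are hypothetical.

```python
from typing import List, Tuple

# Binary-encoded state register: 3 states need ceil(log2 3) = 2 flip-flops,
# plus 3 flip-flops for the variable count (values 0..7).
ZERO, UP, SAT = 0, 1, 2


def next_values(state: int, count: int, sat_en: bool, reset: bool) -> Tuple[int, int]:
    """Combinational next-state and next-variable logic of the single process."""
    if reset:
        return ZERO, 0                  # forced transition to the initial state + its entry action
    if state == ZERO:
        return UP, count                # ZERO -> UP; UP has no entry action
    if state == UP:
        if count == 7:
            return (SAT, 7) if sat_en else (ZERO, 0)  # entry action of the target state
        return UP, count + 1            # no transition taken: run UP's "during" action
    return (SAT, count) if sat_en else (ZERO, 0)      # state == SAT


def simulate(inputs: List[Tuple[bool, bool]]) -> List[int]:
    """Clock the registers once per (sat_en, reset) pair; return count after each edge."""
    state, count = ZERO, 0              # arbitrary power-up value; apply reset first
    trace = []
    for sat_en, reset in inputs:
        state, count = next_values(state, count, sat_en, reset)
        trace.append(count)
    return trace


if __name__ == "__main__":
    print(simulate([(False, True)] + [(False, False)] * 10))
    # [0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0]
```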

The translator was verified by implementing four state machines in a timing-recovery test-chip.
The Stateflow charts ranged in complexity from 5 to 27 states, 4 to 35 ports, and 0 to 30 bits for
variables. Functionality was verified by cross-simulation of the entire system in Simulink and
EPIC TimeMill. Hardware efficiency was tested by implementing one of the smaller state machines
in hand-coded Verilog. The hand-coded version, though significantly less verbose, was 20% larger
in terms of cell area after synthesis. We have therefore come to rely on the Stateflow chart and
its VHDL translator as an efficient primitive for our automated design flow.

2.2 Verification Methodology

In order for our Simulink-to-Silicon methodology to be viable for system optimization, we need
to be able to provide guarantees that a certain change will result in functional hardware. The
automated flow uses flat DRC and LVS checks on the final mask patterns to ensure that
placement and routing were successful. A scan-chain is automatically inserted to allow for
testing of fabricated chips. Signal integrity problems are assumed to be negligible due to the low-
power, low-speed nature of the digital radio design space. The problems of function and timing
verification, however, can be serious stumbling blocks. In this section, we discuss our approach
to these problems.

2.2.1 Function Verification

Since the Simulink system is our complete specification for the behavior of a system, the resulting
hardware must be functionally equivalent to it. Anything less than equivalency would yield
optimizations that only “might” be feasible, which means that we could not reliably answer
questions about hardware efficiency. In the absence of formal methods to verify functional
equivalency between Simulink and transistor-level descriptions, we are left with simulation.

Assuming that the Simulink-EDIF translator is bug-free, the problem reduces to the matter of
proving functional equivalency of Simulink primitives to their hardware implementations. We
used EPIC TimeMill simulations of our timing-recovery test design to debug our primitives. Our
primary problem was that the Simulink Fixed-Point Block set’s method of truncating most- and
least-significant bits is not clearly defined. In the end, we were still unable to provide functional
equivalency for all combinations of fixed-point types. We therefore adopted the following rules
to guarantee functional equivalency:


        Rule 1: Each primitive has a set of rules which specify legal fixed-point types for
                ports. For example, an adder with a 12-bit unsigned integer input port
                must have the same type for the other input port and a 13-bit unsigned
                output port.
        Rule 2: Sign extension, zero filling, and MSB and LSB truncation are always
                performed by specially verified blocks, never in an arithmetic block
                (such as an adder or multiplier).

A simple tool was created for the purpose of scanning a Simulink design and providing fast feed-
back about the observance of these rules. For systems which use our primitives and follow these
rules, we have not yet observed the case where the Simulink and transistor-level descriptions are
functionally different. An alternative to these rules exists for primitives implemented in design
systems which provide automatic test-pattern generation (ATPG). For these primitives, ATPG can
produce a complete set of test patterns with which to verify the functional equivalency of a
Simulink primitive and its hardware implementation. Due to the overhead of supporting ATPG,
however, this approach has not yet been attempted.
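A minimal sketch of such a rule checker follows. The fixed-point type record, the block kinds, and the legal output types (full precision: one extra bit for an adder, summed widths and fraction lengths for a multiplier) are assumptions that generalize the adder example in Rule 1 of this section; any arithmetic block whose output type differs from full precision is flagged, since that would imply truncation or extension inside the block, which Rule 2 forbids.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FixType:
    signed: bool
    width: int      # total number of bits
    frac: int = 0   # number of fraction bits


@dataclass
class Block:
    name: str
    kind: str                # "add", "mult", or a conversion block such as "convert"
    inputs: List[FixType]
    output: FixType


ARITHMETIC = {"add", "mult"}


def full_precision(kind: str, a: FixType, b: FixType) -> FixType:
    """The only output type an arithmetic block may produce (hypothetical rule set)."""
    if kind == "add":
        return FixType(a.signed, a.width + 1, a.frac)   # one extra bit for the carry
    return FixType(a.signed, a.width + b.width, a.frac + b.frac)  # "mult"


def check(blocks: List[Block]) -> List[str]:
    """Report violations; conversion blocks are assumed to be separately verified."""
    problems = []
    for blk in blocks:
        if blk.kind not in ARITHMETIC:
            continue
        a, b = blk.inputs
        if blk.kind == "add" and a != b:
            problems.append(f"{blk.name}: adder inputs must have identical types")
        elif blk.kind == "mult" and a.signed != b.signed:
            problems.append(f"{blk.name}: multiplier inputs must agree in signedness")
        elif blk.output != full_precision(blk.kind, a, b):
            problems.append(f"{blk.name}: output type implies truncation or extension "
                            "inside an arithmetic block")
    return problems


if __name__ == "__main__":
    u12, u13 = FixType(False, 12), FixType(False, 13)
    print(check([Block("add1", "add", [u12, u12], u13),    # legal (Rule 1 example)
                 Block("add2", "add", [u12, u12], u12)]))  # drops the carry bit
```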

2.2.2 Timing Verification

In order to guarantee the functionality specified by the Simulink system, we must provide
assurance that the assumed cycle time can be met and that no races exist between registers. These
conditions are typically checked by generating a set of setup and hold time constraints for every
path between registers and using static timing models to verify that the constraints are met. The
process of adjusting a design to meet the constraints is commonly called timing closure.
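The setup and hold constraints can be written per register-to-register path. The Python sketch below assumes that path delays already include the launching flip-flop's clock-to-Q delay and defines skew as the capture clock's arrival minus the launch clock's arrival; the names `RegPath` and `path_slacks` are hypothetical. A negative hold slack marks a potential race, and a negative setup slack marks a path that misses the assumed cycle time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RegPath:
    launch: str    # launching register
    capture: str   # capturing register
    d_max: float   # longest delay, including clock-to-Q (ns)
    d_min: float   # shortest delay, including clock-to-Q (ns)


def path_slacks(paths: List[RegPath], cycle: float, t_setup: float, t_hold: float,
                clock_arrival: Dict[str, float]) -> List[Tuple[float, float]]:
    """Return (setup_slack, hold_slack) for each path; negative means violated.

    skew = arrival(capture clock) - arrival(launch clock)
    setup: d_max + t_setup <= cycle + skew
    hold:  d_min >= t_hold + skew
    """
    slacks = []
    for p in paths:
        skew = clock_arrival[p.capture] - clock_arrival[p.launch]
        setup_slack = (cycle + skew) - (p.d_max + t_setup)
        hold_slack = p.d_min - (t_hold + skew)
        slacks.append((setup_slack, hold_slack))
    return slacks


if __name__ == "__main__":
    p = RegPath("r1", "r2", d_max=30.0, d_min=2.0)
    print(path_slacks([p], cycle=40.0, t_setup=1.0, t_hold=0.25,
                      clock_arrival={"r1": 0.0, "r2": 0.5}))
    # [(9.5, 1.25)]
```

A negative hold time enlarges the hold slack directly, which is why flip-flops with large, negative hold times protect against races.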

The timing closure problem is very different in our design flow from the traditional ASIC flow.
The traditional flow expects that the creators of the source code for a design have little or no
interaction with those performing the physical design. As a result, the physical design industry
has come to rely heavily on tools which iteratively search for mask layout which meets the setup
and hold time constraints. Figure 5(a) illustrates the standard process of extracting wiring delay
or capacitance values from a routed design and using these values for re-routing, re-placing, or re-
synthesizing. The timing closure problem becomes a matter of finding a successful iterative path
through the tools. Modifying the source code is a last resort because it represents the largest
delay in the design process.

[Figure 5 panels: (a) traditional flow: Modify Source (VHDL, Verilog), Synthesize (Std. Cell Netlist), Modify Floor-plan (Floor-plan), Place (Placed Floor-plan), Route (Layout), Extract (Wire Delay/Capacitance); (b) our flow: Modify Source (VHDL/Verilog, Simulink), Synthesize (Std. Cell Netlist), Modify Floor-plan (Floor-plan), Place (Placed Floor-plan), Route (Layout), Extract/Analyze (Cycle Time, Race Margin)]

                             Figure 5: Illustration of the timing closure problem
                                in a traditional flow (a) and with our flow (b)

Because our design flow allows no human intervention aside from modification of the source
code and floor plan, the implementation of the kind of iteration in figure 5(a) is even more
difficult. Our approach has been instead to assume that the creators of the source code interact
closely with physical designers and to focus on techniques that report the minimum cycle-time
and race margin for a design. This simplified flow is illustrated in figure 5(b). The timing
closure problem becomes a matter of reporting these estimates quickly and accurately. The rest
of this section deals with our methods of estimating cycle time and race margin.

2.2.2.1 Cycle Time

We used EPIC PathMill to find the critical-path and cycle-time of our decimation filter test-chip.
The flow automatically creates the files necessary to run PathMill on both unrouted and extracted
netlists and will eventually invoke the tool as well. To provide faster feedback to the designer,
we created a Simulink timing estimator that reports a critical path based on maximum delay
values for each primitive. The delay values are generated with a SPICE-like simulator on netlists
extracted from routed versions of each primitive. The delay values are parameterized for
different word-lengths, supply voltages, and fan-out. These Simulink-level cycle time estimates
tend to be pessimistic because of the over-simplification of the timing paths. For example, the
critical path through two cascaded adders is much less than the sum of their individual critical
paths. The latest version of this estimator checks for the special case of cascaded adders and
increments the critical path by a single full-adder delay. A comparison of these four methods of
cycle time estimation is shown in table 2.
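The Simulink-level estimator can be sketched as a longest-path search over the primitives between registers. In this Python sketch, per-primitive maximum delays are plain numbers (the real estimator parameterizes them by word length, supply voltage, and fan-out), inputs driven by registers are taken to arrive at time zero, and an adder driven directly by another adder adds only one full-adder delay, as in the latest version of the estimator. The function name and data layout are hypothetical.

```python
from typing import Dict, List


def critical_path(delay: Dict[str, float], kind: Dict[str, str],
                  fanin: Dict[str, List[str]], t_fa: float,
                  adder_special_case: bool = True) -> float:
    """Longest register-to-register delay through an acyclic network of primitives.

    delay: primitive -> maximum delay; kind: primitive -> type ("add", ...);
    fanin: primitive -> primitives driving it (register-driven inputs omitted).
    """
    memo: Dict[str, float] = {}

    def arrival(block: str) -> float:
        if block not in memo:
            t = delay[block]  # as if all inputs came straight from registers (time 0)
            for src in fanin.get(block, []):
                if adder_special_case and kind[block] == "add" and kind[src] == "add":
                    step = t_fa        # cascaded adders: one extra full-adder delay
                else:
                    step = delay[block]
                t = max(t, arrival(src) + step)
            memo[block] = t
        return memo[block]

    return max(arrival(b) for b in delay)


if __name__ == "__main__":
    delay = {"a1": 8.0, "a2": 8.0, "m": 20.0}      # hypothetical delays (ns)
    kind = {"a1": "add", "a2": "add", "m": "mult"}
    fanin = {"a2": ["a1"], "m": ["a2"]}            # a1 -> a2 -> m
    print(critical_path(delay, kind, fanin, t_fa=1.0, adder_special_case=False))  # 36.0
    print(critical_path(delay, kind, fanin, t_fa=1.0))                            # 29.0
```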

         PathMill            PathMill            Simulink estimator        Simulink estimator
        (unrouted)          (extracted)                                   w/ adder special case
          13 ns                22 ns                    33 ns
                Table 2: Comparison of critical-path delays from various estimators
                                for the decimation filter test-chip

Future work will focus on using the floor plan to provide estimates of wire-load delays. We are
also developing a method for more general and more accurate Simulink-level cycle time
estimation. Figure 6 illustrates our approach. By assuming that all inputs to a Simulink primitive
arrive simultaneously, we can determine the worst-case output delay for each bit. We can then
work backwards to determine the latest possible arrival time for each input bit. These
arrival/delay profiles can then be stacked to estimate the critical path. This algorithm is described
more rigorously in [1], along with a method for generating these arrival/delay profiles
automatically with the Synopsys’ PrimeTime tool.


[Figure 6 axes: Delay vs. Bits; curves: Ripple-carry Adder Arrival/Delay Profile, Array Multiplier Arrival/Delay Profile]
                  Figure 6: Stacking of delay profiles to estimate critical-path delay.
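The profile-stacking idea can be illustrated with the ripple-carry adder, whose profile is easy to write down. The sketch assumes that, with all inputs arriving together, sum bit j is ready after (j+1) full-adder delays, and that input bit i may arrive as late as i full-adder delays without delaying any output; stacking then treats a block as starting at the latest value of (arrival minus allowed lateness) over its input bits. This is a simplified reading, not the algorithm of [1], and the function names are hypothetical. Stacking two 8-bit ripple-carry adders this way gives 9 full-adder delays rather than the 16 obtained by adding their critical paths, which is the one-extra-full-adder behavior of the cascaded-adder special case.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[List[float], List[float]]  # (output delay per bit, latest input arrival per bit)


def ripple_adder_profile(n: int, t_fa: float) -> Profile:
    """Arrival/delay profile of an n-bit ripple-carry adder (simple model)."""
    out_delay = [(j + 1) * t_fa for j in range(n)]  # sum bit j: j carry stages + 1
    latest = [i * t_fa for i in range(n)]           # the carry reaches bit i at i * t_fa
    return out_delay, latest


def stack(arrivals: Sequence[float], profile: Profile) -> List[float]:
    """Output-bit arrival times of a block whose input bits arrive at `arrivals`."""
    out_delay, latest = profile
    start = max(a - l for a, l in zip(arrivals, latest))  # effective start time
    return [start + d for d in out_delay]


if __name__ == "__main__":
    adder = ripple_adder_profile(8, t_fa=1.0)
    first = stack([0.0] * 8, adder)   # ready at 1.0 .. 8.0
    second = stack(first, adder)      # ready at 2.0 .. 9.0
    print(max(second))                # 9.0
```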

2.2.2.2 Race Margin

In every combinational-logic network between registers, there also exists a minimum delay path.
If this minimum delay is smaller than the clock skew between the registers, then the circuit is
vulnerable to a race condition and may not function properly. Our approach to this problem is to
estimate the clock skew and use flip-flops with large, negative hold times. For simplicity, we
define the race margin to be the system-wide minimum path delay minus the sum of the system-wide
maximum clock skew and the flip-flop hold time. A positive race margin indicates a chip immune to
race conditions.

The automatic flow generates the files to run Synopsys’ Arcadia tool to extract an RC model of
the clock-tree. This model is then simulated with a SPICE-like tool to exhaustively determine the
maximum clock skew. Table 3 shows the calculation of race margin for the decimation filter test-
chip. The system-wide minimum path delay was assumed to be zero for simplicity. The large,
positive race margin indicates that there was no need to determine path delays to guarantee race
immunity.

Maximum Clock Skew               Flip-flop Hold Time              Race Margin
                                                                  5 ns
                Table 3: Race Margin statistics for the decimation filter test-chip
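Writing the definition out, with d_min the system-wide minimum path delay, s_max the maximum clock skew, and t_hold the flip-flop hold time, and taking d_min = 0 as above, the 5 ns margin of Table 3 pins down the sum of skew and hold time: the flip-flops' negative hold time must exceed the maximum skew by 5 ns.

```latex
M_{\mathrm{race}} = d_{\min} - \left( s_{\max} + t_{\mathrm{hold}} \right),
\qquad
d_{\min} = 0,\; M_{\mathrm{race}} = 5\,\mathrm{ns}
\;\Rightarrow\; s_{\max} + t_{\mathrm{hold}} = -5\,\mathrm{ns}
```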

It remains to be seen how large the chips can be before this method breaks down. Future work
will explore the design of larger chips and a technique for early estimation of clock skew from the
Simulink design and floor plan.

2.3 Comparison to Industry Standard Flow

Let us compare our methodology to a more standard design flow. Consider the design of a digital
receiver for a third-generation indoor wireless network. The data rates will be high (2-3 Mbps) and
the power budget very low due to battery-life constraints. Extensive signal processing will be
required due to the unfriendly indoor wireless environment (multi-path echoes) and multi-access
interference from other users. The challenge of the design process is to create a chip that meets
the performance specifications while keeping the area and power at a minimum.


[Figure 7 stages: System Design (Matlab, C, SPW); ASIC/Front-End Design (VHDL, Verilog); Physical/Back-End Design (Layout, Mask Patterns)]

                     Figure 7: Industry Standard Digital Radio Design Flow

As shown in figure 7, common industry practice is to break the chip design into three phases
handled by three different engineering teams. The divisions between these phases are so well
understood that they can even be distributed between different companies. The chip vendor
performs the system design and delivers a specification to the ASIC design house in the form of a
simulation. Matlab code, C code, and Cadence SPW designs are some of the most common
formats for this simulation. The simulation contains a rough block-diagram of the system with
functional blocks such as Viterbi decoders, adaptive equalizers, multi-path echo cancellers, and
timing-recovery systems. These functional blocks are designed with primitives, which range in
complexity from adders, registers, and multipliers up to FIR filters and Fast Fourier Transform
(FFT) processors. This simulation can be used to generate a bit-error-rate vs. signal-to-noise ratio
(BER vs. SNR) characteristic that the system designer expects the ASIC designer to meet.

The ASIC design team uses the simulation to create a VHDL or Verilog design, which it delivers
to the physical design house. The ASIC designer is free to choose any implementation for the
primitives used in the simulation. Adders can be ripple-carry or carry-lookahead, and array or
serial multipliers can be used depending on the timing and latency requirements. Complex primitives
such as FFT’s offer the most opportunity for optimization. Optimization beyond the primitives is
discouraged, however, because it runs the risk of yielding a worse BER vs. SNR characteristic.

The physical design team synthesizes and maps the VHDL or Verilog code to a standard-cell
library and uses place-and-route tools to generate layout mask patterns for fabrication. The code
must meet all of the requirements for successful synthesis, clock-tree insertion, and timing
verification. More advanced ASIC design teams deliver chip floor plans, but in general it is the
physical designer’s job to create a floor plan. The physical designer must guarantee that the
fabricated chip will match the functionality of the VHDL or Verilog description.

Since time-to-market constraints are so critical to the success of a chip project, designers are
focused on the task of making sure that this pipeline does not stall. There is not enough time for
multiple passes through this flow in search of a design with less power consumption and area. As
a result, system designs are less aggressive to give flexibility to ASIC designers. ASIC design is
less imaginative because it needs to meet BER vs. SNR requirements as well as the physical
format requirements. The physical design remains as highly constrained as possible to give more
freedom to ASIC designers.

[Figure 8 stages: System Design (Behavioral Simulink System); Macro Design (New Simulink Primitives); ASIC Design (Structural Simulink System); Physical Design (Floor Plan); Automated Flow (Layout, Mask Patterns)]

Figure 8: Stages of the design process with our methodology

Figure 8 shows how the design process changes with the application of our methodology. The first notable
difference is that the automated flow must exist on one computer system and cannot be spread across three
companies as before. The second difference is that system, ASIC, and physical designers are forced to
work together to produce a design which will pass successfully through the automated design flow.

System designers now produce a Simulink simulation as a specification for the chip. ASIC
designers must map this system to a structural Simulink system in terms of the available
primitives. If new primitives are needed, they must be designed in VHDL or Verilog along with
a functionally equivalent Simulink block. Physical designers work with ASIC designers to
produce a floor plan at the same time that the structural Simulink design is being developed.

Once a single pass through the automated flow has been achieved, the structural Simulink system
and floor plan can be altered and the mask-patterns regenerated at the push of a button.
Alternative macro designs can be substituted and evaluated. The similarity between the
behavioral and structural Simulink systems makes it easier to explore modifications to the system
that achieve the same (or better) BER vs. SNR characteristic.


3. Test Chips

Our Simulink-to-Silicon flow was originally developed in the context of an indoor, wireless
CDMA system created in-house. The system supports 14 users at 3.2 Mbps/user with a code
length of 15 and chip rate of 25 MHz, described more fully in [1]. Our flow was used on two
test-chips, which are subsystems of the base-band receiver. A block diagram for the first test-
chip is shown in figure 9. This system is intended to provide coherent timing recovery and code
acquisition in a multi-user-detection receiver system and is described in more detail in [2]. The
system expects 8 parallel streams of data from the A/D converter, each representing a timing
offset of 1/8 of a chip. To generate these streams, an 8x parallel decimation filter for the sigma-
delta converter was also designed and included on the test-chip.

[Figure 9 blocks: Sigma-Delta A/D Converter; multiplexers (bus widths 8, 3, and 1 marked); Pilot Acquisition and Coarse Timing; Fine Timing and Carrier Offset Estimation; Digital Phase Locked Loop; (Adaptive) Data Correlator]

        Figure 9: Block diagram of the CDMA timing-recovery system for which the flow
            was originally developed (A/D converter was designed outside this flow)

Complexity statistics for the timing-recovery test-chip are shown in table 4. It was built
primarily with tiled, hard-macro primitives that we call modules. The module generators were
implemented in SKILL using custom leaf-cells. The Stateflow and look-up table primitives were
implemented as soft-macros.



               macro name          no. of     no. of      total       total cell
                                   versions   instances   xstrs       area (mm2)
               Register                 30        632     239,976     2.49
               Add                      49        721     181,748     0.98
               Multiply                  4         20      86,798     1.26
               Subtract                 16        226      64,070     0.34
               Add/Subtract             25         47      22,868     0.15
               MUX                       6         42       5,160     0.03
               Barrel Shift              4         12       2,680     0.02
               Other                     6         61       4,620     0.03
               all modules             140       1714     607,920     5.3

               lookup tables             3          3         403     0.002
               Stateflow                 6          7      17,725     0.11
               all soft macros           9         10      18,128     0.11

               clock buffers            24         24       2,320     0.09

                        Total          149       1724     628,368     5.5


                  Table 4: Complexity statistics for the timing recovery test-chip

As these statistics show, data-path elements are the most heavily used primitives. The small ratio
of versions to instances (140 module versions for 1714 module instances) indicates that the
Simulink-EDIF translator expands the primitives effectively, so that each generated module version
is reused across many instances. The complete flow for this design (not including routing and
verification) runs in about 30 minutes on a Sun Ultra-4 processor with 4 GB of swap space.
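
The reuse claim can be checked directly against table 4. The short Python sketch below (Python is
used only for illustration; the original flow was built with SKILL and Simulink) computes the
versions-per-instance ratio for each module type and for the reported module totals. The function
name `reuse_ratio` is ours and is not part of the original tools.

```python
# Rows copied from table 4: (macro name, no. of versions, no. of instances)
TABLE4_MODULES = [
    ("Register", 30, 632),
    ("Add", 49, 721),
    ("Multiply", 4, 20),
    ("Subtract", 16, 226),
    ("Add/Subtract", 25, 47),
    ("MUX", 6, 42),
    ("Barrel Shift", 4, 12),
    ("Other", 6, 61),
]
ALL_MODULES_VERSIONS = 140    # reported total for all modules
ALL_MODULES_INSTANCES = 1714  # reported total for all modules


def reuse_ratio(versions: int, instances: int) -> float:
    """Versions per instance; a small value means each version is reused often."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return versions / instances


if __name__ == "__main__":
    for name, versions, instances in TABLE4_MODULES:
        print(f"{name:<14}{reuse_ratio(versions, instances):6.3f}")
    overall = reuse_ratio(ALL_MODULES_VERSIONS, ALL_MODULES_INSTANCES)
    print(f"{'All modules':<14}{overall:6.3f}")
```

Overall the ratio is about 0.082, or roughly 12 instances per generated version. Registers are
shared most heavily (about 0.047), while Add/Subtract (about 0.53) and Barrel Shift (about 0.33)
show the least reuse.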

Because of the complexity of the timing-recovery design, only its primitives were carried through
the entire flow to mask layout. The complete system was verified functionally but never routed. To
test the automatic physical design flow, a second test-chip was built around the decimation
filter subsystem. The decimation filter is roughly the same size as the timing-recovery system but
uses only 5 primitives. This system was carried through the complete flow and fabricated in a
0.25 µm technology. Complexity statistics for this test-chip are shown in table 5, along with a
floor plan in figure 10. The complete flow for this chip runs in about 70 minutes, all but 10
minutes of which are spent on routing and verification.




                                            Transistors     Area (mm2)
                         All modules            307,320     2.2
                         All soft blocks          6,244     0.037
                         Total unrouted         306,582     2.3

                         Routed core            313,564     6.8
                         With pad ring          313,726     10.7

            Table 5: Complexity and area statistics for the decimation filter test-chip




                              Figure 10: Floor plan for the test chip

Examination of the complexity and area figures in table 5 and figure 10 shows that the core area
utilization (34%) is far below what one would expect in industry. Another limiting factor of this
flow is that the custom leaf-cells must be redesigned for every new technology. For these reasons,
we are redeveloping the physical flow to use soft-macro data-path elements that can be
synthesized and floor-planned more compactly. We are encouraged by the success of this chip
and are now focusing on demonstrating this methodology on a chip in the 1-10
million-transistor range.
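
The 34% figure is consistent with dividing the unrouted cell area by the routed core area in
table 5. The sketch below assumes that definition (the report does not state it explicitly) and
also shows the core size that a hypothetical 70% utilization target would imply for the same
2.3 mm2 of cells.

```python
def core_utilization(cell_area_mm2: float, core_area_mm2: float) -> float:
    """Fraction of the routed core area occupied by cells."""
    if core_area_mm2 <= 0:
        raise ValueError("core area must be positive")
    return cell_area_mm2 / core_area_mm2


def core_area_for(cell_area_mm2: float, target_utilization: float) -> float:
    """Core area needed to reach a given utilization with the same cells."""
    if not 0 < target_utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area_mm2 / target_utilization


# Areas from table 5, in mm^2
CELL_AREA = 2.3     # total unrouted
ROUTED_CORE = 6.8   # routed core

util = core_utilization(CELL_AREA, ROUTED_CORE)
TARGET = 0.70       # HYPOTHETICAL utilization target, not from the report

if __name__ == "__main__":
    print(f"core utilization: {util:.0%}")  # 2.3 / 6.8 = 0.338 -> 34%
    print(f"core at {TARGET:.0%}: {core_area_for(CELL_AREA, TARGET):.1f} mm^2")
```

At the hypothetical 70% target the same cells would fit in a core of roughly 3.3 mm2, about half
the routed core area actually obtained.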




                      Part II - Hardware Implementation

1. Project Goal

A wireless communication system prototype was constructed as a functional alternative to the
existing wired intercom system used by the Marines for communication within armored vehicles.
The Marine Intercom required a multi-station network with high-quality voice communication
that could be configured to meet soldier-to-soldier communication needs under field conditions
with high ambient noise. The system also adds flexibility and improves ease of use through an
embedded microcontroller. Figure 11 shows the system configuration.

2. Hardware System Description
The wireless intercom system is a single-cell system. Each mobile unit has a frequency-hopped
radio with a data rate of 1.6 Mbps. Channelization is provided through a time division multiple
access (TDMA) scheme. Voice communication is node-to-node and is not routed through the base
station. Each conversation is full duplex at 64 kbps, which, combined with control data, gives a
cell capacity of approximately 20 users. The base station maintains control of the system, for
example by setting up new conversations and allocating resources. It may be implemented by a
mobile unit switching into base station mode, or it can be a separate fixed unit.
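
The capacity figure follows from simple TDMA slot arithmetic. The sketch below assumes that each
user occupies one 64 kbps slot on the shared 1.6 Mbps channel and treats control traffic as a
fixed bandwidth reservation. The 320 kbps control figure is hypothetical, chosen only to show how
overhead brings the raw 25 slots down to about 20; guard times and frequency-hop dwell overhead
are ignored.

```python
RAW_RATE_BPS = 1_600_000     # radio data rate (from the text)
VOICE_RATE_BPS = 64_000      # per-user voice rate (from the text)
CONTROL_RATE_BPS = 320_000   # HYPOTHETICAL control/protocol overhead


def voice_slots(raw_bps: int, voice_bps: int, control_bps: int = 0) -> int:
    """Whole voice channels left after reserving bandwidth for control traffic."""
    if control_bps > raw_bps:
        raise ValueError("control overhead exceeds the raw rate")
    return (raw_bps - control_bps) // voice_bps


if __name__ == "__main__":
    print(voice_slots(RAW_RATE_BPS, VOICE_RATE_BPS))                    # 25
    print(voice_slots(RAW_RATE_BPS, VOICE_RATE_BPS, CONTROL_RATE_BPS))  # 20
```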

3. Intercom Hardware Design Overview
The hardware is relatively simple because it is built around two core components, the StrongARM
SA-1100 CPU and the Xilinx FPGA. The high level of integration in the StrongARM and the
flexibility of the Xilinx allow peripherals to be interfaced directly to both devices.

The StrongARM and the Xilinx form the heart of the system. With the exception of the analog
front-end (codec) chip and the radio, all input/output is tied directly to the StrongARM.

Audio input (microphone) and output (speakers) are connected directly to the multipurpose
codec chip, which communicates with the StrongARM through a synchronous serial Multimedia
Communications Port. When the codec receives data, it triggers an interrupt on the StrongARM
through a dedicated IRQ signal. The StrongARM then passes this data (after any necessary
processing) to the Xilinx, which is memory mapped into the StrongARM's address space. The
StrongARM reads and writes the Xilinx like a bank of registers. The data path between the Xilinx
and the StrongARM is 32 bits wide, with 8 bits of addressable space. The Xilinx, which has its own
memory for buffering, FIFOs, and look-up tables, packetizes the data and sends it to the radio in
accordance with its protocols. Finally, the radio transmits the data. Data received at the radio
follows the reverse of this path. The radio also provides an analog Received Signal Strength
Indicator (RSSI) signal that is routed to one of the A/D converters on the codec chip. This gives
the intercom an estimate of the channel state, which it can use to adapt its algorithms and
improve performance.
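
From the StrongARM's point of view, the Xilinx therefore looks like a small register file. The toy
Python model below assumes that "8 bits of addressable space" means 8 address bits (256 word
registers) and that writes are truncated to the 32-bit data path. The class name and the
`TX_VOICE_REG` address are hypothetical and are not taken from the actual FPGA design.

```python
class XilinxRegisterBank:
    """Toy model of a memory-mapped FPGA register file (8 address bits, 32-bit data)."""

    ADDR_BITS = 8
    DATA_BITS = 32

    def __init__(self) -> None:
        self._regs = [0] * (1 << self.ADDR_BITS)  # 256 registers, all cleared

    def _check_addr(self, addr: int) -> None:
        if not 0 <= addr < len(self._regs):
            raise IndexError(f"address {addr:#x} outside the 8-bit space")

    def write(self, addr: int, value: int) -> None:
        self._check_addr(addr)
        # Only the low 32 bits fit on the 32-bit data path.
        self._regs[addr] = value & ((1 << self.DATA_BITS) - 1)

    def read(self, addr: int) -> int:
        self._check_addr(addr)
        return self._regs[addr]


TX_VOICE_REG = 0x10  # HYPOTHETICAL register for outgoing voice samples

bank = XilinxRegisterBank()
bank.write(TX_VOICE_REG, 0x1_2345_6789)  # 33-bit value is truncated

if __name__ == "__main__":
    print(hex(bank.read(TX_VOICE_REG)))  # 0x23456789
```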
Not including the 1 MB of memory dedicated to the Xilinx, the intercom has a total of 8 MB of
memory: 4 MB of Flash memory, used to store boot code, programs, and other permanent
information, and 4 MB of DRAM.
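
These totals can be checked against the device organizations listed in figure 14, where the DRAM
and the Flash are each built from two 1M x 16 parts. The sketch assumes that "1M" means 2^20
words.

```python
MEG = 1 << 20  # 1M = 2**20 (assumed)


def capacity_bytes(words: int, width_bits: int, count: int = 1) -> int:
    """Total capacity in bytes of `count` devices organized as words x width_bits."""
    return words * width_bits * count // 8


dram = capacity_bytes(1 * MEG, 16, count=2)    # DRAM (1M x 16) x 2
flash = capacity_bytes(1 * MEG, 16, count=2)   # FLASH (1M x 16) x 2

if __name__ == "__main__":
    print(dram // MEG, flash // MEG, (dram + flash) // MEG)  # 4 4 8
```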




The intercom is also equipped with two external serial ports. The first is an RS-232-like UART,
integrated in the StrongARM, that can communicate at 230 kbps. It can be used to program the
Xilinx from a laptop, update the intercom's firmware, or run ARM's development software. The
second serial port is a 12 Mbps Universal Serial Bus (USB) port, also integrated in the
StrongARM. It is intended as a backbone connection between the base station and a PC or other
device.
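
As a rough illustration of what the UART rate means in practice, the sketch below estimates how
long it would take to send a full Xilinx configuration image over it. It assumes 8N1 framing (10
line bits per byte), a 230,400 baud line rate behind the 230 kbps figure, and an image as large as
the 512K x 1 serial-configuration PROM listed in figure 14; protocol and programming overhead are
ignored.

```python
def uart_transfer_seconds(payload_bytes: int, baud: int, bits_per_frame: int = 10) -> float:
    """Time to send a payload over an asynchronous serial link.

    8N1 framing sends 10 line bits per byte: start bit, 8 data bits, stop bit.
    """
    return payload_bytes * bits_per_frame / baud


PROM_BITS = 512 * 1024       # serial-configuration PROM, 512K x 1 (figure 14)
PAYLOAD = PROM_BITS // 8     # 65,536 bytes if the whole PROM is loaded
BAUD = 230_400               # ASSUMED standard rate behind "230 kbps"

t = uart_transfer_seconds(PAYLOAD, BAUD)

if __name__ == "__main__":
    print(f"{t:.2f} s")      # about 2.84 s, ignoring protocol overhead
```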

Four general-purpose input signals to the StrongARM (as well as power and ground) are brought
out to an external connector. This auxiliary input port is used for prototyping different input
devices, such as the Hand Held Device, which serves as the primary interface to the intercom in
the final version.

4. Accomplishments

A preliminary version of the Marine intercom, Phase 1, was demonstrated at the June 1998
DARPA PI meeting. It consisted of a Motorola two-way radio installed in a Marine helmet, which
validated the physical aspects of a single-station setup and first-order multiple-unit
communication via limited channel selection. The operator was able to communicate with a
second mobile unit while wearing the helmet.

The final Intercom system, Phase 2, consisted of two major parts:

   •  The physical aspects of a unit for a single station, i.e. custom-designed helmet earpiece
      enclosures with custom circuit boards installed in them, and a hand-held unit with
      push buttons for power, push-to-talk, and mode selection. The enclosures were designed and
      built by the Integrated Manufacturing Lab in Mechanical Engineering at U.C. Berkeley. The
      boards were designed by the Berkeley Wireless Research Center and subcontracted for
      production.

   •  Software and configurable logic were used to implement peer-to-peer voice channels and
      control protocols. The control protocols provided a high degree of flexibility for network
      configuration. Voice communication could be point-to-point between users, a conference
      between multiple and arbitrary sets of users, multicast, or broadcast from an arbitrary station.
      System configuration could be set by a privileged user and could be modified instantaneously
      from a control console. Conversations could be initiated or terminated by "no-look"
      combinations of the four buttons on the hand-held unit, subject to network restrictions
      imposed by the privileged user (e.g. a commander). For instance, broadcast could be a
      command-only capability, and the ability to listen in on or break into a conversation could be
      limited to certain individuals. The system supported eighteen users, each with a dedicated
      64 kbps of voice bandwidth. All software and FPGA designs were done by the Berkeley
      Wireless Research Center.
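
The "no-look" button chords and the privileged-user restrictions described above can be pictured
as a table lookup followed by a permission check. The Python sketch below is purely illustrative:
the report does not give the actual chord assignments or the rights model, so the `CHORDS` table,
the `Call` categories, and the two example user profiles are hypothetical.

```python
from enum import Enum
from typing import Optional, Set, Tuple


class Call(Enum):
    POINT_TO_POINT = "point-to-point"
    CONFERENCE = "conference"
    MULTICAST = "multicast"
    BROADCAST = "broadcast"
    LISTEN_IN = "listen-in"


# HYPOTHETICAL chord table: four buttons packed into a 4-bit mask (bit 0 = button 1).
CHORDS = {
    0b0001: Call.POINT_TO_POINT,
    0b0011: Call.CONFERENCE,
    0b0111: Call.MULTICAST,
    0b1111: Call.BROADCAST,
    0b1001: Call.LISTEN_IN,
}


def decode_chord(pressed: Tuple[bool, bool, bool, bool]) -> Optional[Call]:
    """Pack the four button states into a mask and look up the requested call."""
    mask = sum(1 << i for i, down in enumerate(pressed) if down)
    return CHORDS.get(mask)


def allowed(call: Call, rights: Set[Call]) -> bool:
    """A request is honored only if the privileged user granted that right."""
    return call in rights


# HYPOTHETICAL policy: broadcast and listen-in are command-only.
COMMANDER: Set[Call] = set(Call)
CREW: Set[Call] = {Call.POINT_TO_POINT, Call.CONFERENCE, Call.MULTICAST}

if __name__ == "__main__":
    request = decode_chord((True, True, True, True))
    print(request, allowed(request, COMMANDER), allowed(request, CREW))
```

Keeping the chord table separate from the rights check mirrors the split in the text: the buttons
only express a request, and the privileged user's configuration decides whether it is honored.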

All earpiece-mounted boards of the phase 2 Intercom system are complete and fully tested.
Figure 12 shows the helmet complete with the earpieces and hand held device. The boards
assembled into the earpieces are shown in figure 13. The board system includes a digital
subsystem board with an ARM processor and Xilinx FPGA, an adapter board for the Proxim
Rangelan II radio, and a power board which supplies the required regulated voltage levels for the




digital system and the radio. Figure 14 describes the digital processing board. The power board
is described in figure 15. The radio adapter board and radios are described in figure 16. There
are currently six units in operation, each composed of the three boards plus a radio. A small
Intercom network consisting of several remote units and a base station has been tested and
debugged. The network supports full-duplex peer-to-peer voice communication and
remote-to-base-station control.

We have also designed and built an adapter board for the Ericsson Bluetooth radio module. This
radio is better suited to our purposes, since the Intercom application requires lower power
consumption and shorter range than the Proxim offers.

Two Chip Intercom (TCI) continues the Intercom research and targets a two-chip
implementation of the Intercom architecture and protocols. The TCI system consists of a single
digital chip and a companion analog (RF) chip. The digital chip will contain a Tensilica Xtensa
processor, an embedded FPGA, and custom logic interconnected with a Sonics configurable
backplane.

To leverage the effort put forward for Intercom, a comprehensive development environment has
been created that allows a researcher to create and test generic applications for a low power
piconet using Intercom-derived hardware and software.

Tasks remaining for a fully functional system include design and implementation of the electrical
components for a manual control unit (the enclosure for this unit has been built), full integration
of the hardware into the helmet, and final test and debug of several advanced protocol features.

All major hardware elements of the Intercom prototype have been completed. All basic software
components have also been completed and tested. We have successfully demonstrated a
three-unit Intercom system on the bench, with peer-to-peer voice communication. Figure 17 shows
the setup for testing a single system. A multiple-system test setup is shown in figure 18. Figure 19
shows the development of the casing, which was fabricated with a fused deposition modeling
rapid prototyping system. Figure 20 shows a complete intercom system integrated into a Marine
helmet.




[Figure 11 diagram labels: digital processing board, power/I/O board, RangeLAN II Proxim radio,
speakers, microphone, cable/strap, and hand-held device with push-to-talk, mode switches, on/off
switch, volume control, and 2 AA batteries]

                      Figure 11: Wireless Intercom System Configuration




                           The System Boards
1. Digital Board
    •  StrongARM SA-1100/Xilinx/Memory Board
    •  Xilinx external RAM and ROM to support protocol research
    •  Complete CPU subsystem
2. Power/Codec/Connector Board
    •  Switchable power supplies for 3.3 V, 5 V, and 1.5 V
    •  External control interface; connectors to the Radio board and Hand Held Device
    •  Audio interface
3. Radio Adapter Board
    •  Connector adapter for the Proxim RangeLAN II radio or the Bluetooth single-chip radio
4. Hand Held Device Board
    •  Mode switches, volume adjust, on/off, push-to-talk, 2 AA batteries
Figure 12: The Helmet with Custom Earpieces and The Custom Hand Held Device




Figure 13: The Board Sets Assembled in the Earpieces




             Digital Processing Board Features
   •  Complete CPU subsystem and FPGA
   •  System reset
   •  DRAM (1M x 16) x 2
   •  FLASH memory (1M x 16) x 2
   •  Xilinx with SRAM (256K x 26), Flash memory (256K x 16), and PROM for serial
      configuration (512K x 1)
   •  Board-to-board connectors




                    Figure 14: Digital Processing Board (component side and solder side)




                    Power/Debug Board Features
   •  Maxim power level converters:
        o 3.3 volts dedicated to the StrongARM
        o General 3.3 volts
        o 1.5 volts
        o 5 volts
   •  Transceiver for serial connector
   •  Board-to-board connector
   •  Debugging connectors: serial, address, and data buses
   •  Codec interface
   •  Hand Held Device connector
   •  Radio Adapter Board connector




                             Figure 15: The Power Board (component side and solder side)




                             Radio Adapter Board
   •  Test connector
   •  Radio connector
   •  Board-to-board connectors
   •  Prototyping area

Figure 16: The Radio Adapter Board and Radios (panels: adapter board component side and solder
side, Proxim RangeLAN II radio, Bluetooth radio)




Figure 17 shows a complete single-system test setup, with a laptop running the ARM development
system. Software is downloaded to the StrongARM and the Xilinx through a serial port. Audio input
and output go through headsets connected to the digital board. A codec on the digital board
converts the audio signal to a digital signal for input to, and processing by, the StrongARM.




                               Figure 17: Single System Test Setup




Figure 18 shows a complete test setup of two Intercom systems talking to each other through
Proxim RangeLAN II radios. Each setup has its own headset for listening and voice input, as well
as a laptop for independently downloading custom software. The custom software handles system
control as well as running custom applications.




[Figure 18 photo labels: laptops, digital analysis system, oscilloscope, audio I/O]

                              Figure 18: The Multiple System Test Setup




       Fabrication Method using Fused Deposition Modeling
The process of fused deposition modeling involves decomposing a 3D object into horizontal
slices. A nozzle traverses the area of each slice, extruding strands of molten plastic, and the part
is built from the bottom up, one slice at a time.

Fused deposition modeling is a rapid prototyping method. Fabricating this intercom case took
approximately 10 hours. The method can use many different types of plastic resin. The result is
a tough, working prototype with 70% porosity.
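
The slicing step can be illustrated with a small sketch. It is not the software used on the FDM
machine; it only shows the core geometric operation a slicer performs, intersecting each triangle
of a surface mesh with a horizontal plane at the mid-height of each layer. The part height, layer
thickness, and test triangle are hypothetical, and a real slicer would also join the segments into
closed contours and generate the nozzle paths.

```python
import math
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def slice_heights(part_height: float, layer: float) -> List[float]:
    """Z level at the middle of each layer, from the bottom up."""
    # The small epsilon guards against float round-off (e.g. 0.3 / 0.1).
    n = math.floor(part_height / layer + 1e-9)
    return [layer * (k + 0.5) for k in range(n)]


def slice_triangle(tri: Tuple[Point3, Point3, Point3], z: float) -> Optional[List[Point2]]:
    """Intersect one mesh triangle with the horizontal plane at height z.

    Returns the two (x, y) endpoints of the contour segment, or None if the
    plane misses the triangle. Vertices lying exactly on the plane are
    skipped for brevity; a real slicer must handle them.
    """
    hits: List[Point2] = []
    edges = ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0]))
    for (x1, y1, z1), (x2, y2, z2) in edges:
        if (z1 - z) * (z2 - z) < 0:        # edge strictly crosses the plane
            t = (z - z1) / (z2 - z1)       # linear interpolation parameter
            hits.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return hits if len(hits) == 2 else None


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))
    for z in slice_heights(1.0, 0.25):     # HYPOTHETICAL height and layer thickness
        print(z, slice_triangle(tri, z))
```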




  Figure 19: The Electro Mechanical design of earpiece and hand controller for Marine Corps
                                 communications system




Figure 20: Completed Wireless Intercom System Integrated into a Marine Helmet next to the
             Fused Deposition Machine used for Prototype Case Fabrication




                                Results and Discussion

This project has demonstrated that it is possible to design and implement a wireless system that
meets the specifications of the Marine Intercom system. Prototype wireless systems were built
and tested in both peer-to-peer and networked Intercom modes.

An environment to support the rapid design of single-chip, low-power wireless systems was
developed, and a test-chip was produced to validate the basic approach. The success of the
test-chip and the ease of the macro hardening flow are encouraging. The next step is to apply the
flow to the design of systems in the 1M to 10M-transistor range. The most difficult aspect of this
flow is verifying the functional equivalence of the macro generators and their Simulink models. As
macros become more complex, there are more opportunities for discrepancies, which can cause
problems when macros are combined. Future work will focus on comparing the estimates obtained
with this approach to those made with other system-level design methods. Much more
investigation is also needed into the level of detail required during floor planning and into
which macro granularities scale best to future process generations.

All major hardware and software elements of the Intercom prototype have been completed and
tested. A three-unit Intercom system was successfully demonstrated for peer-to-peer voice
communication and for mobile-to-base-station networked communication. The remaining tasks
for the prototype are to design and implement the electrical components for a manual control
unit (the enclosure for this unit has been built) and to complete final test and debug of several
advanced protocol features.

This project has shown that digital radio chips can be designed with the tools of the single-chip
design environment that was created. Furthermore, a complete wireless Intercom system has been
demonstrated. The final single-chip digital radio was not implemented because funding for this
work was cancelled.




                                       Conclusions

A design environment for single-chip digital radio systems has been developed, and a prototype
system has been built and tested. The design environment was validated through the design and
fabrication of a CMOS decimation filter test-chip, confirming the foundation of the basic approach
to designing single-chip digital radios. A large research effort has gone into creating the
foundation design systems described in this report. However, much work remains to be done,
including demonstrating the design tools on chips in the 1-10 million-transistor range,
integrating a complete digital radio on one chip, and constructing a complete wireless
communications system using that chip. These tasks are being pursued at the Berkeley Wireless
Research Center, and further funding is being solicited to support extensions of this work.









				