
Embedded Systems


Buses are shared communication media used by devices to “talk to” each other both on-chip and off-
chip. The communication actions which take place can carry both data and control structures.

In this lecture we will point out the following distinctions:

       on-chip vs. off-chip buses
       serial vs. parallel buses
       wired vs. wireless buses

A transaction is a complete unit of communication. All transfers which take place across the bus
are divided into transactions.

There are three distinct phases of every transaction:

    1. Arbitration – it is decided which device will own the bus (drive the common medium) for the
       duration of the transaction, thus becoming the master. It is only applicable to multi-master buses.
    2. Addressing – the master activates another listening device by broadcasting (via the bus) its
       address (i.e. a device address or a control register address) for reception.
       It can be unicast (the message applies to a single device) or multicast (it applies to many). The
       addressed devices are traditionally called slaves.
    3. Actual data transfer.

The arbitration and addressing phases are considered overhead, as they carry no useful
information and just serve to establish a link. The actual goal of communication is to transfer the data.
There is a trade-off: increasing the size of the chunk of data transferred for each arbitration and
addressing reduces the need for frequent arbitration, which reduces the overhead and improves the
throughput. However, long transfers render the bus unusable for other devices which are possibly
waiting for it, increasing latency.
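
The trade-off above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that arbitration and addressing together cost a fixed number of bus cycles per transaction and that each data word takes one cycle. The function names and the cycle counts are hypothetical and are not taken from any bus standard.

```python
def bus_efficiency(words_per_transaction, overhead_cycles=2):
    """Fraction of bus cycles that carry data (assumes 1 cycle per word)."""
    total = overhead_cycles + words_per_transaction
    return words_per_transaction / total


def worst_case_wait(words_per_transaction, overhead_cycles=2):
    """Cycles another master may have to wait while one transaction holds the bus."""
    return overhead_cycles + words_per_transaction


# Longer transfers improve efficiency but lengthen the wait for other devices.
for n in (1, 4, 16):
    print(n, round(bus_efficiency(n), 2), worst_case_wait(n))
```

With these assumed numbers, moving from 1 to 16 words per transaction raises the share of useful cycles, while the time other devices may wait for the bus grows from 3 to 18 cycles.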

Synchronization methods are needed to ensure that the data presented in each of the
aforementioned phases is valid (not corrupt or changing) when being read. We can distinguish
three types of synchronization protocols across the bus:

       synchronous protocol – there is a clock signal, which indicates when all the data is stable and
        can be read safely.
        As the vast majority of logic circuits are synchronous, the idea of extending this to buses seems
        natural.




    All the data is read only on the CLK edge.

    However, different devices are often clocked by different clock signals, which can
    run at different frequencies (e.g. CPU and UART peripheral), or be completely independent of
    each other (e.g. two communicating UARTs).
    The question is how to implement such functionality, so that blocks (master and slave)
    running at different speeds can talk to each other.
   semi-synchronous protocol – there is a clock signal, synchronously to which a request from
    the master occurs. The response from the slave is indicated on a dedicated line (e.g. READY or WAIT),
    whose value is sampled on the clock edge.

    In the timing diagram (for a read transaction), the data item at a clock edge
    corresponds to the address at the previous clock edge. The slave selected in the first cycle
    drives READY low to notify the master that it is not yet ready to provide the data in the
    second cycle, but only in the third one, when the master can safely read the data.
   asynchronous protocol – there are no implicit timing constraints – the asynchronous events are
    issued by both master and slave.




    [Figure: timing diagram of an asynchronous transfer, annotated in order: there is a valid address
    on the bus; the slave acknowledges its address; the slave puts data; the master confirms reading
    the data; the slave has finished.]

        The protocol uses two lines dedicated to asynchronous messaging: ACK and REQ.
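
The semi-synchronous and asynchronous protocols can be sketched in a few lines each. This is a behavioural model only, not a description of any particular bus: the function names are hypothetical, READY is assumed to be active high, and the REQ/ACK levels shown in the trace are an assumed four-step handshake consistent with the annotations of the figure.

```python
def semi_synchronous_read(ready_samples, data):
    """Return (cycle, data) at which the master latches the read data.

    Cycle 1 carries the address. ready_samples holds the READY value the
    master samples at each following clock edge (True = data valid).
    """
    cycle = 1
    for ready in ready_samples:
        cycle += 1
        if ready:
            return cycle, data
    raise TimeoutError("slave never signalled READY")


def asynchronous_read(address, memory):
    """Return (data, trace) for one REQ/ACK handshake; no clock is involved."""
    trace = ["master: address valid, REQ=1"]
    data = memory[address]                    # slave responds in its own time
    trace.append("slave: data on bus, ACK=1")
    latched = data                            # master reads only after ACK
    trace.append("master: data read, REQ=0")
    trace.append("slave: finished, ACK=0")    # both lines are idle again
    return latched, trace
```

Calling `semi_synchronous_read([False, True], 0xAB)` models the case described for the read transaction: READY is low in the second cycle, so the data is taken in the third.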

Asynchronous buses in on-chip communication may gain popularity in the future for the following
reason. In synchronous transfer, the clock frequency must be such that the slowest signal can
propagate from source to destination. A signal may even take 4-5 clock cycles to propagate across the whole
chip! There are also many signals that can reach their destination much faster, but are forced to wait for
the longest, worst-case clock delay. The asynchronous protocol does not need such constraints.

Serial vs. parallel buses
When in need of large bandwidth, resorting to a parallel bus (where data is transmitted simultaneously
across multiple wires) is a natural solution. It can increase throughput (roughly) in proportion to
the number of wires.

However, if we have many wires, the propagation conditions for each may be slightly different (the
distributed reactance can differ from one to another), resulting in different travel times across
the whole line. It is difficult to balance many lines so that they have equal propagation times. The
resulting difference in arrival times is called skew.

The maximal clocking frequency for parallel buses is limited by the skew.

The more lines the parallel bus has, the larger the skew, but for short distances the effect is not
so large (it is proportional to distance). So parallel buses are the option of choice for on-chip buses, while
in off-chip communication one often resorts to serial protocols.
AMBA is a feature-rich example of a parallel on-chip bus standard. It is defined by ARM and
used in µCs with ARM cores, making it the most popular 32-bit on-chip bus. The standard defines
two buses serving different roles:

        AHB – which stands for Advanced High-performance Bus – it links fast peripherals, providing high
         clocking frequency and large throughput. It has very complex hardware and is expensive to
         implement.
        APB – a slow bus, very simple in comparison and cheap to implement in hardware

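        [Figure: example AMBA system. On the AHB, the masters are the ARM core (with separate master
        ports for the instruction cache I$ and the data cache D$) and the DMA; the slaves are RAM, ROM,
        the disk controller and the AHB/APB bridge. On the APB, the slaves are UART, TIMER, D/A and GPIO.]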

The bridge acts as a slave on the AHB and is the only master on the APB. It adapts the relatively slow APB
(which can be 16x slower) to the high-speed AHB, dealing with timing, split transactions, and packing and
unpacking the bytes of an AHB word.

AHB is a fast, parallel, multi-master, pipelined bus with support for burst and split transactions.

A pipelined bus can perform the three aforementioned transaction phases simultaneously, each for a
different transaction:

 Arbitration                 T1                 T2                  T3                 T4

 Addressing                                     T1                  T2                 T3

 Data transfer                                                      T1                 T2
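
The table above can be generated mechanically, which makes the overlap explicit. The sketch below assumes an idealized pipeline in which every phase takes exactly one cycle and a new transaction enters arbitration in every cycle; the function name is hypothetical.

```python
PHASES = ("Arbitration", "Addressing", "Data transfer")


def pipeline_schedule(num_transactions, num_cycles):
    """Map each phase to the transaction it handles in every cycle.

    Transaction k (1-based) enters arbitration in cycle k and moves one
    phase forward per cycle. None marks a phase that is idle in that cycle.
    """
    table = {}
    for offset, phase in enumerate(PHASES):
        row = []
        for cycle in range(1, num_cycles + 1):
            t = cycle - offset
            row.append(f"T{t}" if 1 <= t <= num_transactions else None)
        table[phase] = row
    return table


for phase, row in pipeline_schedule(4, 4).items():
    print(f"{phase:14}", row)
```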

Burst transaction – a special type of transaction in which the master specifies a burst request to the slave, which may
be able to increment (or decrement) the given address itself, thus potentially reducing the need to send
many sequential addresses through the bus. This reduces power, since computations at the slave consume less
power, and may increase performance, since address decoding is done at the destination.

Burst transactions are often used by cache controllers when fetching a block of data from memory.

Split transaction
Its primary goal is to improve performance in the following scenario:

A master initiates a transaction which can potentially take a long time (e.g. communicating with APB
peripherals). Without the possibility of split transactions it would hold the bus for many cycles, which
could not be utilized by other masters (e.g. DMA).

With a split transaction, the master issues a request, then releases the bus and waits for a notification,
which can come many cycles later.

APB stands for Advanced Peripheral Bus. It is a slow, parallel, single-master bus with no pipelining, burst or split
transactions. The peripherals connected to it would not utilize the high speed provided by
AHB, so it can be e.g. 12x slower than AHB. Otherwise it would be a pure waste of power and chip
area.

Typical buses used in 8-bit µCs are very similar to APB. This is because they are mostly
legacy devices fabricated in older technologies, in which simplicity is valued more highly.

Connectivity in AHB
Here is a simple diagram. More detailed information can be found in the AMBA specification from ARM.

[Figure: AHB connectivity between one MASTER, the BUS and one SLAVE, with “interested entities”
attached to the bus. Signals shown:]

        HBUSREQ – on the master side
        HSPLIT – on the slave side
        HADDR – 32 lines; HSEL on the slave side, annotated “lower part (register)”
        HTRANS – 2 lines, transfer type: not ready, burst mode request
        HWRITE – direction
        HSIZE – 3 lines, 8/16/32/…/1024 bits of data size
        HPROT – levels of protection
        HREADY – semi-synchronous feature
        HRESP – error condition


This represents the signal lines of the AHB bus (the H at the beginning stands for AHB). Note that only a single
master and a single slave side are drawn. In fact each slave and each master would have its own
implementation of the hardware needed, and in the places where the bus touches the master and slave sides,
multiplexers are found if necessary.

HMASTLOCK – signals a locked transaction

HLOCK can be used to perform atomic operations such as Test-And-Set, locking the bus against other
devices that could change sensitive data in the middle of a critical section.
A decoder is not strictly necessary, but it removes redundant decoding logic from every peripheral.

There are no 3-state or open-collector buses in AHB, as found in e.g. ISA or I2C, because it is optimized for
on-chip use. In every place where multiple sources could drive the same line, we find multiplexers.

Interesting diagrams can be found in AMBA specification from ARM.

Additional AMBA features
An implementor of the AMBA specification may implement only part of the functionality, as long as it supports
the basic transaction (i.e. off-chip). Implementations may also be as complex as having three levels of buses, split
transactions etc.

Burst transactions

Used to fill buffers of peripherals with sequential data.

The 1st part of the transaction is asynchronous, similarly to a normal transaction.

The master signals a request for reading/writing sequential data from/to a slave. It keeps SEQ on HTRANS
during the transfer. The processor often requests a lock on the bus, which makes it easier to keep track of
where you are. You can also have soft-buses, which can grant access to the other peripherals.
The data on HWDATA doesn’t change at the clock edge if HREADY indicates a wait state.

Wrapping burst

This type of transaction is well suited for filling the cache.

When a request for data from the cache ends in a miss, the cache fetches the desired
location first, then follows with fetching the subsequent addresses, wrapping around at the end of
the cache line, because this is how cache controllers work. This is the order:

Memory locations:      1st                   2nd               3rd                4th
Fetch order:           3                     4                 1                  2
                                                               ^ CPU request
The wrapping burst implements this kind of transfer in hardware. The wrapping burst size can be e.g. 4,
8 or 16 bytes, matching the cache line. This should be decided by the chip architects.
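
The address sequence of a wrapping burst is easy to compute. The sketch below is illustrative only: the function name, the parameter names and the default of 4 bytes per transfer are assumptions. It treats the burst as covering one aligned block and wraps back to the start of that block once the end is reached.

```python
def wrapping_burst(start_address, beats, beat_bytes=4):
    """Addresses of a wrapping burst that starts at start_address.

    The burst covers one aligned block of beats * beat_bytes bytes and
    wraps to the start of that block after reaching its end.
    """
    block = beats * beat_bytes
    base = (start_address // block) * block
    return [base + (start_address - base + i * beat_bytes) % block
            for i in range(beats)]


# The CPU asks for the 3rd word of a 4-word line starting at 0x30.
print([hex(a) for a in wrapping_burst(0x38, 4)])
```

The requested word is fetched first, then the 4th, and only then the 1st and 2nd, which is the fetch order shown in the table.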

AMBA doesn’t specify the order in which the arbiter grants access to the masters. It can be fixed priority,
round robin, priority queue, etc.
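
Two of the arbitration policies named above can be sketched as follows. Since AMBA leaves the policy open, this is only one possible reading: the function names are hypothetical, masters are identified by list index, and index 0 is assumed to have the highest fixed priority.

```python
def fixed_priority(requests):
    """Grant the lowest-numbered requesting master (index 0 = highest priority)."""
    for master, wants_bus in enumerate(requests):
        if wants_bus:
            return master
    return None


def round_robin(requests, last_granted):
    """Grant the next requesting master after last_granted, cyclically."""
    n = len(requests)
    for step in range(1, n + 1):
        master = (last_granted + step) % n
        if requests[master]:
            return master
    return None
```

Fixed priority is the simplest to build but can starve low-priority masters; round robin gives every requesting master a turn at the cost of keeping track of the last grant.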

Split transaction
Stages of split transaction:

    1. Master initiates transaction as usual.
    2. If the slave is not ready, it asserts split and remembers the active master (provided by arbiter
       to anyone interested).
    3. The arbiter grants the bus to other masters.
    4. Slave asserts HSPLIT line to the arbiter, telling which master can resume.
    5. Arbiter restores bus grant to interrupted master.

There is a diagram in AMBA documentation, but it does not show the whole complexity.


Write transfer
The following picture applies to APB. The bus does not implement pipelining.

The PSEL duplicates some of PADDR part to simplify decoding logic in peripherals, thus reducing

PENABLE high indicates data phase.

If PCLK = 10 MHz, the actual data rate is 5 million transfers per second, because each data item is held for 2 clock cycles.
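
The arithmetic behind this figure, as a minimal sketch. The function names are hypothetical, and the 32-bit data width used for the byte rate is an assumption for illustration, not something stated in the text.

```python
def apb_transfer_rate(pclk_hz, cycles_per_transfer=2):
    """Transfers per second when every transfer occupies a fixed number of cycles."""
    return pclk_hz / cycles_per_transfer


def apb_bandwidth_bytes(pclk_hz, data_width_bits=32, cycles_per_transfer=2):
    """Bytes per second for an assumed data width."""
    return apb_transfer_rate(pclk_hz, cycles_per_transfer) * data_width_bits / 8


print(apb_transfer_rate(10_000_000))      # transfers per second at PCLK = 10 MHz
print(apb_bandwidth_bytes(10_000_000))    # bytes per second for a 32-bit bus
```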
The buses of 8-bit µCs look very similar to APB. They may even share physical lines for address and
data; this is used e.g. when interfacing with off-chip devices.

The width of APB data does not match the data width of AHB. It is the bridge’s responsibility to split and merge
the data words accordingly.

Universal Serial Bus

USB was meant to replace the huge variety of serial protocols which existed in PCs and embedded systems
at the time of its design. It became quite a success on the PC; in the world of embedded systems it still
competes with others.

I2C – one of the competitors. Very simple, has only data and clock lines. Used e.g. for boot control, when
the CPU talks to an external ROM.

CAN, LIN, others – competitors in the automotive world. Provide e.g. guarantee of service, multiple

USB has one master (usually a PC). Some smartphones can act either as master or as slave.

It is tree-structured. The master is called a host. Slaves are called functions. There are also nodes
that only implement connectivity, called hubs.



                               [Figure: USB tree topology. Hubs connect functions and further hubs.]

Functions cannot request anything from Master. Every communication is initiated by Master. There
are no interrupts. Receiving data from functions is done by polling.

USB provides “hot plugging and unplugging” capability, which is fairly unique, but is dependent on
well-written software.

Up to 127 devices can be connected at the same time.

USB has (like AMBA) several levels of performance. They are all implemented on the same physical
medium (unlike AMBA). The levels differ e.g. in speed (dictated by frequency), from the order of 100 MB/s
(video, HDD) down to kB/s (e.g. mouse, keyboard).

USB carries both information and limited power supply to power simple functions. They can also be
connected to external power sources.
         USB Lines

         [Figure: USB lines: the power supply lines and the differential data lines D+ and D-.]

At transmission start the encoding used is binary; afterwards it is NRZI with bit stuffing (zeros stuffed into
long sequences of ones). A zero is encoded as a transition, a one as no change. In the USB specification the two
line states are called J and K. The transmission is differential, and when both lines are in the same state it signals
a special condition – synchronization.
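
The encoding rule (a zero is a transition, a one is no change) and bit stuffing can be sketched as below. The function names are hypothetical, line levels are modelled as plain 0/1 values rather than J and K states, and the idle level of 1 is an assumption. The stuffing threshold of six consecutive ones is the value used by USB.

```python
def bit_stuff(bits, max_ones=6):
    """Insert a 0 after every run of max_ones consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == max_ones:
            out.append(0)   # the stuffed bit forces a transition on the line
            run = 0
    return out


def nrzi_encode(bits, idle_level=1):
    """0 -> toggle the line level, 1 -> keep the line level."""
    level, out = idle_level, []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, idle_level=1):
    """Recover the bits: an unchanged level is a 1, a changed level is a 0."""
    prev, out = idle_level, []
    for level in levels:
        out.append(1 if level == prev else 0)
        prev = level
    return out
```

Without stuffing, a long run of ones would produce no transitions at all, and the receiver would have nothing to keep its clock aligned with.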

The lines are terminated by pull-up / pull-down resistors, which remedy noise problems on unterminated
lines.

Units of transfer.

Synchronization field
There is a sync field which synchronizes the receiver to the transmitter. There are 8-bit alternating

There are 3 types of packets:

         Token packets – control packets, e.g. requests
         Data packets – pure data
         Status packets – return various status information from functions

Sync                    PID (packet identifier)                               Payload
8 bits                  8 bits
                        4 bits of packet type      4 bits complement
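
The PID layout above (4 bits of packet type plus their 4-bit complement) gives the receiver a cheap integrity check. In the sketch below the function names are hypothetical, and placing the type in the low nibble of the byte is an assumption about how the field is stored.

```python
def make_pid(packet_type):
    """Build the 8-bit PID: low nibble = type, high nibble = its complement."""
    if not 0 <= packet_type <= 0xF:
        raise ValueError("packet type must fit in 4 bits")
    return ((~packet_type & 0xF) << 4) | packet_type


def check_pid(pid):
    """Return the 4-bit type, or None if the check nibble does not match."""
    packet_type = pid & 0xF
    check = (pid >> 4) & 0xF
    return packet_type if check == (~packet_type & 0xF) else None
```

A single flipped bit in either nibble makes the two halves inconsistent, so `check_pid` rejects the packet.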

A status packet plays a role similar to that of a status line. It is used to confirm that data has been accepted, or that no data is
ready to be transferred.

Split transaction
The transfer can also be split across many functions. The hubs support split transaction requests from

After 3 unsuccessful transactions, the host decides that the peripheral is no longer connected.
A Cyclic Redundancy Check is used to protect the payload.
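
As an illustration of how a CRC protects a payload, here is a minimal bitwise 16-bit CRC. The text does not state the CRC parameters; the ones used here (reflected polynomial 0xA001, initial value 0xFFFF, final inversion) are those commonly catalogued as CRC-16/USB, and should be treated as an assumption. The function name is hypothetical.

```python
def crc16_usb(data: bytes) -> int:
    """Bitwise CRC-16, least significant bit first."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift right; apply the polynomial when the dropped bit was 1
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


payload = b"\x00\x01\x02\x03"
sent_crc = crc16_usb(payload)
corrupted = b"\x00\x01\x06\x03"            # one bit flipped in transit
print(crc16_usb(corrupted) != sent_crc)    # the receiver detects the mismatch
```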

Erroneous transactions
USB standard documentation contains illustrative diagrams on various errors that may occur during

Most intelligence is concentrated in the host (PC). This allows function implementations to be fairly
simple.

Wireless Sensor Networks
WSNs are one of the popular embedded system applications.

What characterizes WSN:

       Ad-hoc wireless network
       Sensing
       Computation based on sensing
       Actuation

Bell’s law: there is a new computer class every 10 years, e.g. it is smaller, fewer people are required to
handle it, there are new modes of connectivity and interfacing, etc.

Exemplary application areas:

       Environment and agriculture
             o Used mostly for monitoring (e.g. fires) to alert interested parties
             o Can be static or mobile (carried by animals)
             o Has been a prime target of academic efforts
             o Example – Zebranet (Princeton) – tracking migration of zebras across Kenya
             o Example – control of irrigation in response to temperature and humidity across large
       Infrastructure Monitoring
             o Example: gathering information about pressure and temperature across bridge
                 structures, which reduces human labour
             o Predicting ground anomalies (landslides), investigation of causes – lots of wires would
                 be infeasible
       Indoor automation
             o Most probably the first successful commercial application
             o Example: lights turning on in response to human presence, door opening, etc.
             o Have long lifetime requirements
       Medical, health application
             o Body sensor network replacing wired sensors – human-in-the-loop – i.e. chair with
       Ambient Intelligence
             o Environment that is aware of objects or phenomena, adaptive and responsive
             o Does not require assistance
       Industrial Automation
             o Temperature across the plant
             o Positioning large parts in car manufacturing with submillimeter precision
             o Pattern recognition
             o Control system
             o Adaptive, robust, flexible
       Automotive Applications
           o Traffic assistance
           o Driver help

Inherent features:

       cannot be restarted in field – has to be reliable
       can be installed without effort of putting wires, rearranging buildings


       Monitoring vs. Control
       Statically placed or in move
       Optimization of power – primary goal.
        Application stops when battery runs out of charge and it can be hard or impossible to be
       Network density and size
            o Number of nodes
            o Number of neighbouring nodes – how many nodes are communicating to each other
       Central vs. distributed
       Hierarchy or uniformity

       Cost, size and power (interrelated features) – about $1 per node would be optimal
       Robustness
       The algorithm used should be as simple as possible to minimize power requirements
       Ease-of-development and management

Power/energy problem
A continuous power draw of 100 µW requires about 1 cm³ of lithium battery volume for 1 year of operation at
100% duty cycle.

Rechargeable batteries are half as efficient in terms of volume.

They need to be replaced every 9 months or recharged every 3-4 hours.
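
The rule of thumb above can be turned into a quick sizing estimate. The sketch below assumes that battery volume scales linearly with average power and with lifetime, and that a node alternates between an active and a sleep power level. The function name, the parameter names and the example figures are hypothetical.

```python
UW_PER_CM3_YEAR = 100.0  # rule of thumb from the text: 100 µW per cm³ for one year


def battery_volume_cm3(active_uw, sleep_uw, duty_cycle, years, rechargeable=False):
    """Estimate battery volume for a node that is active duty_cycle of the time."""
    average_uw = duty_cycle * active_uw + (1.0 - duty_cycle) * sleep_uw
    volume = average_uw / UW_PER_CM3_YEAR * years
    # rechargeable cells are taken to need twice the volume for the same energy
    return volume * 2 if rechargeable else volume


# A node drawing 10 mW when active and 10 µW asleep, awake 1% of the time, for 2 years.
print(round(battery_volume_cm3(10_000, 10, 0.01, 2), 2))
```

The example shows why duty cycling matters: the same node running continuously at 10 mW would need about 100 cm³ per year.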

Architecture of sensor network

E.g. ATmega on the Mica2Dot board

E.g. 512k external serial flash for the ATmega.

E.g. RFM or Chipcon 1000
Depends on the application.

Power supply
Either a battery or energy scavenging.

Exemplary pre-made boards
       Mica2 – big, small memory, based on ATmega
       Mica2Dot
       Tmote Sky – has a USB port and 5 integrated sensors, based on MSP430
       Imote – powerful and energy-hungry

Operating systems
If you don’t have an OS you have to write everything yourself, dealing with all hardware nuances and
task scheduling. This may be time consuming. A feasible solution would be an operating system that runs on a very
small platform and can be programmed simply.

Usual services implemented by OS
    Abstracting the system resources.
    Thread/task safety.
    Separation of system mode from user mode for hardware.
    Memory management.

Possible implementations of these goals
To imitate a regular operating system (e.g. providing POSIX compatibility)

Create a familiar programming interface (namely: processes). Sacrifice process separation to match the
restrictions of the platform.

TinyOS
A small operating system targeting popular WSN platforms.

       Takes about 400 bytes (which architecture?)
       Created at UC Berkeley.
       Requires programming in nesC – its own programming language, which is an extension of C
        influenced by VHDL
       Portable across several platforms.

Component-based system (similar to VHDL components). Program is created by wiring them
together. They communicate by exchanging asynchronous events (event-driven).

Most WSN operate in sleep mode, so the system should also be able to be put to sleep. Therefore it
should react to environment conditions asynchronously, and send messages to components

TinyOS components:
    Frame – state holder
    Task – normal execution program (thread?)
       Command handler
       Event handlers


                                                     [Figure annotation: a buffer which can be
                                                     accessed by both.]


There is no context switch – tasks share the same stack. Each task runs to completion (tasks
cannot sleep). The scheduler is FIFO-based. It is similar to single-threaded event queue dispatchers in
higher-level frameworks (Win32 or Java AWT event queue).

Interrupts that arrive don’t cause a full context switch, but simply enqueue another task to be run after
the currently queued tasks.
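
The run-to-completion model described above can be sketched with a plain FIFO queue. This is a conceptual model only: the class and method names are hypothetical and are not the TinyOS or nesC API.

```python
from collections import deque


class FifoScheduler:
    """Run-to-completion task queue (illustrative, not the TinyOS API)."""

    def __init__(self):
        self.queue = deque()

    def post(self, task):
        """Called by tasks or interrupt handlers to enqueue work."""
        self.queue.append(task)

    def run(self):
        """Run tasks one by one, in posting order, until the queue is empty."""
        while self.queue:
            task = self.queue.popleft()
            task()          # never preempted by another task
        # with an empty queue a real node could go to sleep


log = []
sched = FifoScheduler()


def sample_sensor():
    log.append("sample")
    sched.post(send_packet)     # continue the work in a later task


def send_packet():
    log.append("send")


sched.post(sample_sensor)
sched.post(lambda: log.append("blink"))
sched.run()
print(log)
```

Because a task posted during execution goes to the back of the queue, `send_packet` runs only after the already queued `blink` task has finished.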

Blocking on resources is not an option, as there is actually no concurrency. This implies “split-phase”
operations, in which a request and its response are separate (an asynchronous command followed by an event).

Interfaces define the functionality that should be implemented in a component. Components provide
implementations for common interfaces. (Composition rather than inheritance.)

Further reading

How to develop WSN application?
We need:

       Hardware platform
       Software Platform
       And development environment for each

Platform Construction – Software
Usual system construction:

[Figure: the OS comprises system services, the kernel and device drivers.]

Platform system for WSN

Cross-platform development support is an advantage, as it enables easy switching between different hardware
platforms. The code should compile well with compilers which produce binary images for different
platforms, and use an abstracted API instead of dealing with chip resources directly.

To ease the difficulties of porting to different platforms there is the possibility of on-chip debugging
using JTAG.

TinyOS

       Small footprint.
       Micro-threading.
       Application is ONE linked executable composed of OS and components.
       Event-driven (previous lecture).
       High concurrency using little space.
       Single shared stack across tasks.
       No distinction between kernel and user space; the memory is shared.
       Split phase of request and response - asynchronous command/event.
       FIFO based scheduler which queues tasks which run one-by-one until completion.

MANTIS

       An operating system also tailored to WSN.
       Developed by the University of Colorado.
       Offers preemptive multithreading.
       Often abbreviated to MOS.
       Uses < 0.5 kB of RAM (with network stack included).
       Coded in standard C.
       The kernel resembles the UNIX one.
       Implements a POSIX subsystem (mutexes and semaphores for synchronization).

MANTIS comprises the following elements:

   Network Stack           Command Stack           User level threads

                         MANTIS system API

 Kernel/scheduler               COMM                      DEV

Preemptive scheduling
         When using preemptive multithreading, a programmer does not need to take into
          consideration the possibility of one task blocking another, as CPU time is allocated by
          the scheduler.
         Preemption of running tasks consumes time, memory and energy, as each time the whole
          context is stored on a stack.
         Although the CPU is not blocked, other types of blocking may occur when concurrently using resources
          other than the CPU.

Preempting long-running tasks with short ones that deal with I/O operations reduces the need for
large buffers and reduces the possibility of buffer overflows.

[Figure: non-preemptive vs. preemptive multitasking. In both cases Task 1 is a producer and Task 2 a
consumer, each started by an event. Annotation: interleaving the running tasks makes it possible to have
much shorter buffers without an overflow.]
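
The effect of interleaving on buffer size can be reproduced with a toy model. The sketch below makes strong simplifying assumptions: there is one producer task and one consumer task, they take strict turns, and each handles at most a fixed number of items per turn. The function and parameter names are hypothetical.

```python
def peak_buffer(total_items, time_slice=None):
    """Peak buffer occupancy for a producer task followed by a consumer task.

    time_slice=None models run-to-completion: the producer emits all items
    before the consumer gets the CPU. Otherwise the two tasks alternate,
    each handling at most time_slice items per turn.
    """
    if time_slice is not None and time_slice <= 0:
        raise ValueError("time_slice must be positive")
    chunk = total_items if time_slice is None else time_slice
    produced = buffered = peak = 0
    while produced < total_items or buffered:
        n = min(chunk, total_items - produced)   # producer's turn
        produced += n
        buffered += n
        peak = max(peak, buffered)
        buffered -= min(chunk, buffered)         # consumer's turn
    return peak


print(peak_buffer(100))                 # run-to-completion
print(peak_buffer(100, time_slice=4))   # interleaved
```

In this model the buffer must hold the whole output of the producer when tasks run to completion, but only one time slice worth of items when the tasks are interleaved.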



Challenges of preemptive multitasking
         limited memory (e.g. 4kB on MICA)
         WSN node lifetime associated with energy – the scheduler should be able to save energy by
          entering sleep mode of a processor

MANTIS – further details
The tasks can have one of the following priorities:

    1.    KERNEL
    2.    SLEEP
    3.    HIGH
    4.    NORMAL
    5.    IDLE
The context switch may take approximately 10µs.

The tasks may be suspended while waiting for resources using simple API calls (such as
mos_task_suspend, mos_task_resume). This puts them on a sleep queue; they are woken up when
the resource is ready.

Thread table
The thread table is the main kernel data structure. It is statically allocated: the designer designates
the running tasks at compile time. It is implemented as a linked list, with an additional pointer to the currently
running task.

The kernel is only triggered by the timer interrupt, whose period is approx. 10 ms by default.

Idle mode
There is an idle thread of lowest priority, which implements power-aware scheduling.
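
How a priority-based scheduler might choose the next thread from such a table is sketched below. This is not the MANTIS implementation: the names are hypothetical, a Python list stands in for the statically allocated linked list, and it is assumed that the priorities are listed from most to least urgent, so a smaller number wins.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    KERNEL = 1
    SLEEP = 2
    HIGH = 3
    NORMAL = 4
    IDLE = 5


@dataclass
class Thread:
    name: str
    priority: Priority
    ready: bool = True


def pick_next(thread_table):
    """Return the ready thread with the most urgent (smallest) priority value."""
    ready = [t for t in thread_table if t.ready]
    return min(ready, key=lambda t: t.priority) if ready else None


table = [Thread("idle", Priority.IDLE),
         Thread("radio", Priority.HIGH),
         Thread("app", Priority.NORMAL)]
print(pick_next(table).name)
```

Once every other thread is suspended, only the idle thread remains ready, which is the natural place to put the processor into a low-power mode.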

COMM components in MANTIS
The communication API is accessed by the MAC (Medium Access Control) protocol. It abstracts the hardware,
giving the programmer the following facilities:

       unified interface for UART, USB and radio devices
       management of packet buffers, and synchronization functions
       operates mainly on four functions:
            o com_send
            o com_recv
            o com_mode
            o com_ioctl

com_send is a blocking call, which means, that the calling thread is suspended until completion.

com_recv is also blocking – it waits until a buffer is filled with valid data by underlying hardware and
operating system routines.

Device drivers (DEV)
Device drivers are implemented POSIX-style. They cover the different types of sensors found on target
boards (acceleration, temperature, light, humidity, etc.)
