					    International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
       Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 3, May – June 2013                                             ISSN 2278-6856


A Survey on Content Addressable Memory

Narendra Babu T¹, Dr. Fazal Noorbasha² and Bhavani Shankar P³

¹ Research Scholar, VLSI Systems Research Group, Electronics & Communication Engineering,
  KL University, Guntur, A.P, India
² VLSI Systems Research Group Head, Electronics & Communication Engineering,
  KL University, Guntur, A.P, India
³ M.Tech Student, Electronics & Computer Engineering,
  KL University, Guntur, A.P, India

Abstract: A content addressable memory (CAM) compares input search data against a table of
stored data and returns the address of the matching data. CAMs have single-clock-cycle
throughput, making them faster than other hardware- and software-based search systems. CAMs
are commonly used for packet forwarding in network routers. Because a CAM activates its
comparison circuitry in parallel, it consumes considerable power, and the main design challenge
is to reduce power consumption without reducing speed or memory density. This paper reviews
circuit-level CAM techniques, concentrating on low-power match line sensing techniques and
search line driving approaches.

Keywords: Content-addressable memory (CAM), match line sensing, review, search line power.

1. INTRODUCTION

Most memory devices store and retrieve data by addressing specific memory locations. This
access path becomes the limiting factor for systems that depend on fast memory access. The time
required to find data stored in memory can be reduced if the data can be identified by its
content rather than by its address. A memory used for this purpose is a Content Addressable
Memory (CAM). CAM is used in applications where search time is critical and must be very short.
It is well suited to functions such as Ethernet address lookup, data compression, and security
or encryption information on a packet-by-packet basis for high-performance data switches. It
can also be operated as a data-parallel or Single Instruction/Multiple Data (SIMD) processor.
Since CAM is an extension of RAM, we first need to know the features of RAM to understand CAM.
In general, RAM has two operations, read and write, i.e., the data stored in RAM can be read or
written, whereas CAM has three operations: read, write and compare [1]. The compare operation
makes CAM useful in a variety of applications such as network routers. A network router
forwards incoming packets from the sender port to the proper destination port by looking up its
routing table. CAMs are used in network routers for fast forwarding of packets.

1.1 Basics of CAM

We now take a more detailed look at CAM architecture. A small model is shown in figure 1. The
figure shows a CAM consisting of 4 words, with each word containing 3 bits arranged
horizontally (corresponding to 3 CAM cells). There is a match line corresponding to each word
(ML0, ML1, etc.) feeding into match line sense amplifiers (MLSAs), and there is a differential
search line pair corresponding to each bit of the search word (SL0, SL0̅, SL1, SL1̅, etc.). A CAM
search operation begins with loading the search-data word into the search-data registers,
followed by precharging all match lines high, putting them all temporarily in the match state.
Next, the search line drivers broadcast the search word onto the differential search lines, and
each CAM core cell compares its stored bit against the bit on its corresponding search lines.
Match lines on which all bits match remain in the precharged-high state. Match lines that have
at least one bit that misses discharge to ground. The MLSA then detects whether its match line
has a matching condition or a miss condition. Finally, the encoder maps the match line of the
matching location to its encoded address [1].

Figure 1 Simple schematic of a CAM

CAM core cells and match line structures are discussed in sections 2 and 3. Match line sensing
schemes and search line driving approaches are reviewed in sections 4 and 5. The conclusion is
given at the end.

2. CORE CELLS
A CAM can be implemented using two types of cell, namely the NOR cell and the NAND cell.


2.1 NOR Cell
Figure 2 shows a NOR type CAM cell. The NOR cell implements the comparison between the
complementary stored bit, D (and D̅), and the complementary search data on the complementary
search line, SL (and SL̅), using four comparison transistors, M1 through M4, which are all
typically minimum-size to maintain high cell density. These transistors implement the pull down
path of a dynamic XNOR logic gate with inputs SL and D. Each pair of transistors, M1/M3 and
M2/M4, forms a pull down path from the match line, ML, such that a mismatch of SL and D
activates at least one of the pull down paths, connecting ML to ground. A match of SL and D
disables both pull down paths, disconnecting ML from ground. The NOR nature of this cell becomes
clear when multiple cells are connected in parallel to form a CAM word by shorting the ML of
each cell to the ML of adjacent cells. The pull down paths connect in parallel, resembling the
pull down path of a CMOS NOR logic gate. There is a match condition on a given ML only if every
individual cell in the word has a match [1].

Figure 2 10-T NOR type CAM

2.2 NAND Cell
Figure 3 shows a NAND type CAM cell. The NAND cell implements the comparison between the stored
bit, D, and the corresponding search data on the corresponding search lines, (SL, SL̅), using
the three comparison transistors M1, MD and MD̅, which are all typically minimum-size to
maintain high cell density. We illustrate the bit-comparison operation of a NAND cell through
an example. Consider the case of a match when SL = 1 and D = 1. Pass transistor MD is ON and
passes the logic “1” on the SL to node B. Node B is the bit-match node, which is logic “1” if
there is a match in the cell. The logic “1” on node B turns ON transistor M1. Note that M1 is
also turned ON in the other match case, when SL = 0 and D = 0. In this case, the transistor MD̅
passes a logic high to raise node B. The remaining cases, where SL ≠ D, result in a miss
condition, and accordingly node B is logic “0” and the transistor M1 is OFF. Node B is a
pass-transistor implementation of the XNOR function. The NAND nature of this cell becomes clear
when multiple NAND cells are serially connected. In this case, the MLn and MLn+1 nodes are
joined to form a word. A serial nMOS chain of all the Mi transistors resembles the pull down
path of a CMOS NAND logic gate. A match condition for the entire word occurs only if every cell
in a word is in the match condition. An important property of the NOR cell is that it provides
a full rail voltage at the gates of all comparison transistors. On the other hand, a deficiency
of the NAND cell is that it provides only a reduced logic “1” voltage at node B, which can
reach only VDD-Vtn when the search lines are driven to VDD (where VDD is the supply voltage and
Vtn is the nMOS threshold voltage) [1].

Figure 3 9-T NAND Type CAM

3. MATCH LINE STRUCTURES
The match line is one of the key structures in CAMs. The NOR cell and the NAND cell are used to
construct a CAM match line.

3.1 NOR Match Line
The schematic of a NOR match line is shown in figure 4. The NOR cells are connected in parallel
to form a NOR match line. A typical NOR search cycle operates in three phases: search line
precharge, match line precharge, and match line evaluation. First, the search lines are
precharged low to disconnect the match lines from ground by disabling the pulldown paths in
each CAM cell. Second, with the pulldown paths disconnected, the Mpre transistor precharges the
match lines high. Finally, the search lines are driven to the search word values, triggering
the match line evaluation phase. In the case of a match, the ML voltage, VML, stays high as
there is no discharge path to ground. In the case of a miss, there is at least one path to
ground that discharges the match line. The match line sense amplifier (MLSA) senses the voltage
on ML and generates a corresponding full-rail output match result. The main feature of the NOR
match line is its high speed of operation. In the slowest case of a one-bit miss in a word, the
critical evaluation path is through the two series transistors in the cell that form the
pulldown path. Even in this worst case, NOR-cell evaluation is faster than the NAND case, where
between 8 and 16 transistors form the evaluation path [1].
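
The word-level behaviour of the two cell types can be compared with a small logical model. This Python sketch is an illustration under simplifying assumptions (ideal switches, no charge sharing, no timing); the function names are hypothetical and do not come from [1]. It shows that a NOR match line stays high only when no cell opens a pull-down path, while a NAND chain conducts only when every cell matches.

```python
def xnor(a: int, b: int) -> int:
    """Bit-match function computed by both cell types: 1 when the bits agree."""
    return 1 if a == b else 0


def nor_match_line(stored, search) -> int:
    """NOR word: pull-down paths in parallel.

    The precharged line stays high (1) only if no cell opens a path to ground.
    """
    any_pulldown_on = any(xnor(d, s) == 0 for d, s in zip(stored, search))
    return 0 if any_pulldown_on else 1


def nand_chain_conducts(stored, search) -> bool:
    """NAND word: one series transistor per cell.

    The chain conducts only if every cell's bit-match node B is 1.
    """
    return all(xnor(d, s) == 1 for d, s in zip(stored, search))


stored_word = [1, 0, 1, 1]
match_case = (nor_match_line(stored_word, [1, 0, 1, 1]),
              nand_chain_conducts(stored_word, [1, 0, 1, 1]))
one_bit_miss = (nor_match_line(stored_word, [1, 0, 0, 1]),
                nand_chain_conducts(stored_word, [1, 0, 0, 1]))
```

Both structures reach the same match decision for a word. They differ in how the result appears electrically: the NOR line remains charged on a match, whereas the NAND chain provides a conducting path on a match.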




Figure 4 Structure of a NOR match line with n cells.

3.2 NAND Match Line
Figure 5 shows the NAND match line. A number of cells, n, are cascaded to form the NAND match
line. The precharge pMOS transistor, Mpre, sets the initial voltage of the match line, ML, to
the supply voltage, VDD. Next, the evaluation nMOS transistor, Meval, turns ON. In the case of
a match, all nMOS transistors, M1 through Mn, are ON, effectively creating a path to ground
from the ML node, hence discharging ML to ground. In the case of a miss, at least one of the
series nMOS transistors, M1 through Mn, is OFF, leaving the ML voltage high. A sense amplifier,
MLSA, detects the difference between the match (low) voltage and the miss (high) voltage. The
NAND match line has an explicit evaluation transistor, Meval, unlike the NOR match line, where
the CAM cells themselves perform the evaluation. There is a potential charge-sharing problem in
the NAND match line. Charge sharing can occur between the ML node and the intermediate MLi
nodes. This charge sharing may cause the ML node voltage to drop sufficiently low that the MLSA
detects a false match. A technique that eliminates charge sharing is to precharge high, in
addition to ML, the intermediate match nodes ML1 through MLn-1. This procedure eliminates
charge sharing, since the intermediate match nodes and the ML node are initially shorted.
However, there is an increase in the power consumption due to the search line precharge. A
feature of the NAND match line is that a miss stops signal propagation, such that there is no
consumption of power past the final matching transistor in the serial nMOS chain. Typically,
only one match line is in the match state; consequently, most match lines have only a small
number of transistors in the chain that are ON, and thus only a small amount of power is
consumed. Two drawbacks of the NAND match line are a quadratic delay dependence on the number
of cells and a low noise margin. The quadratic delay dependence comes from the fact that adding
a NAND cell to a NAND match line adds both a series resistance due to the series nMOS
transistor and a capacitance to ground due to the nMOS diffusion capacitance. These elements
form an RC ladder structure whose overall time constant has a quadratic dependence on the
number of NAND cells. The low noise margin is caused by the use of nMOS pass transistors for
the comparison circuitry. NOR cells avoid this problem by applying maximum gate voltage to all
CAM cell transistors when conducting [1].

Figure 5 Structure of a NAND match line.

4. MATCH LINE SENSING SCHEMES
Different match line sensing schemes are used for generating the match result. In this section
we review sensing schemes that reduce the power consumption of CAMs.

4.1 Segmented Match Line Scheme
Figure 6 shows the segmented match line architecture (SMA) proposed in [6]. In SMA, match lines
in ternary CAM (TCAM) words are partitioned into four segments. For convenience, the segmented
match lines are numbered sequentially from the left of the figure. Segments 1 and 2 are
referred to as left segments and the other two segments are called right segments. The terms
segment and partition are used interchangeably. Each segment implements the traditional NOR
match line circuit with NI(k) inputs, where k is the segment number. The NOR inputs in each
segment are represented as D0, D1, ..., DNI(k)-1. Any NOR input can drain the charge stored in
a segmented match line when it turns on its associated nMOS transistor connected to ground. The
numbers of NOR inputs in the four segments are not necessarily the same and can be selected to
meet power reduction requirements. The four segments are classified from a pre-charging
perspective. One segment type is the pre-charged segment and the other type is the
charge-shared segment. The match lines in pre-charged segments are charged before match-line
evaluation. Match lines of charge-shared segments are never pre-charged but share the charge
with charged segments for match-line evaluation. In SMA, segments 1 and 4 are designated as
charged segments and the others are charge-shared segments. A global signal, segment pre-charge
(SP), is used to source current only to pre-charged segments for a match-line evaluation.
However, as in previous research on conditional pre-charging [2], [3], the SP signal in charged
segments can be generated from various sources such as partial comparison results [3]. The
circuit associated with the SP signals is referred to as the segment pre-charging circuit
(SPC). The two segments in the left or right half, one pre-charged and one charge-shared, are
electrically separated by a pass gate, which is referred to as the charge spread circuit (CSC).
The CSC is enabled using a global signal, charge spread (CS). When the CSC is enabled (CS = 1),
the charges in the left or right segments are shared as a part of match-line evaluation. The
match sensor block is located between the left and right segments. It
processes the partial (half) comparison results to produce the final comparison result for a
full TCAM word. The final results from all words are fed into the priority encoder block (not
shown) to produce a hit or miss result [6].

Figure 6 Segmented Match Line Architecture.

4.1.1 Pre-Charging
In the traditional NOR match-line structure, the entire match line is pre-charged during every
search operation whenever a word under comparison does not match the comparison signal. SMA,
however, segments the total match line capacitance and pre-charges only a subset of the match
line. As a result, SMA reduces the power consumption in match lines by reducing the total
capacitance seen by the power source. SMA is similar to the conditional pre-charging methods in
that only a subset of the match line is pre-charged in the first phase. The conditional
pre-charging scheme, however, needs to pre-charge the remaining match lines depending on the
first-stage results, whereas SMA performs charge sharing instead of the second pre-charging and
thus does not draw current from the power source. The pre-charging time is also reduced because
of the smaller RC time constant. The total charging time is further reduced because the two
charged segments, 1 and 4, are charged at the same time [6].

4.1.2 Charge Sharing
The charge stored in the nth charged segment originates from either pre-charging or charge
sharing. The charge is referred to as Qn, where n is the segment number. Q1 and Q4 reach their
maximum values when the respective match-line segments, 1 and 4, are pre-charged. The charge in
the charged segments is then shared with the other match line partitions when the CSC is
enabled. The static match voltage at the left segments, Vlf, is established after charge
sharing. The voltage is determined by the charge conservation rule as shown in (1). The
capacitance of the nth segment is represented as Cn.

    Vlf = (Q1 + Q2) / (C1 + C2)                                        (1)

The voltage at the right segments, Vrf, can be calculated similarly. The two voltages (Vlf,
Vrf) do not have to be the same. The static match voltages are not typically the rail voltage.
It is, therefore, important for the static voltages to meet the minimum sensing voltage of the
match sensor block. NI(1) and NI(4) determine the minimum static voltage and should be
carefully selected to meet the sensing voltage boundary. Note that Cn is proportional to NI(n).

The charge sharing method was previously applied to CAM to reduce the voltage swing of match
lines without an externally created pre-charge voltage [4], [5]. A low-swing match line scheme
from [5] is shown in figure 7. A tank capacitor is associated with each match line to produce a
lower voltage. The charge on the tank capacitor is used to charge the match line to a low
voltage by driving the signal “eval” to logic high. A similar approach was used in [5].

Figure 7 Low-Swing Scheme in [5].

The tank capacitor is pre-charged to VDD and shared with a match line to create a low voltage
swing at each match line by choosing the size of the tank capacitor. The technique requires an
additional tank capacitor. SMA achieves the same by pre-charging the pre-charged segments
without additional capacitors. SMA is not limited to one voltage swing and has the flexibility
to create an arbitrary voltage at each match line by choosing the number of NOR circuits in the
pre-charged segments, without externally generated reference voltages. The case in which Ctank
is selected to give a match line voltage swing of VDD/2 is referred to as low-swing VDD/2
(LS-VDD/2). Once the static voltages at each match line are sensed, the charge stored in all
segments is recycled for subsequent search operations if the word comparison result is a match.
The recycled charge is then accumulated with the shared charge from pre-charging. The
charge-shared voltages approach the rail voltage, VDD, when a word continuously produces the
match result. The charge-shared voltage has minimum and maximum values, which are referred to
as Vm-min and Vm-max, respectively. The voltage boundary is formulated as

                                                                       (2)

4.1.3 Evaluation

After the pre-charging operation, the partial evaluation results for the four segments are
merged to determine the final match result. The process is called merging segmented match lines
(MSM). The MSM phase can be broken down into three sub-operations. In the first operation, the
charge is shared through the CSC. The first
operation is performed simultaneously in the left and right segments. In the second operation,
the NOR operation is performed at each segment by enabling the NOR inputs, which are generated
from the TCAM cells. The first two operations do not have to be performed sequentially. The
charge-sharing operation is similar to the NAND operation in the CAM NAND match line structure.
Since each segment is a NOR-type match line structure, NAND and NOR operations are performed
simultaneously when the first two operations are enabled at the same time. Since MSM is
performed after pre-charging, there is no dc path from the power source to ground. Once the
partial comparison results from the left and right segments are generated in the first two
operations, the results are combined by the match sensor block in the third operation. The
third operation cannot be performed at the same time as the first two operations. The match
sensor circuit is physically located between the left and right segments. The main reason for
placing it in the center is to speed up the sensing operation by evaluating the left and right
segments in parallel. The match sensor block produces a logical value of 1 only when the
voltages Vlf and Vrf are above the minimum final voltages [6].

4.2 Self Disabled Sensing Technique
The self-disabled sensing technique can choke the charge current fed into the ML right after
the matching comparison result is generated. Figure 8 shows the CAM architecture, where block C
and block DMLSA denote the CAM cell and the differential MLSA (DMLSA), respectively. The
prototype CAM is 128 words × 32 bits. The Search Word Register loads the search key and feeds
it into all the CAM cells. Each DMLSA charges its ML and senses the voltage variation to
generate the match signal, which is sent to the Address Encoder. In general, at most one word
matches the search key, so the Address Encoder generates either the corresponding address code
or a no-match signal after the searching process [7].

Figure 8 Architecture of the Self disabled sensing CAM.

4.2.1 DMLSA
Figure 9 shows the DMLSA schematic diagram. The DMLSA senses the voltage on ML(i) and SML(i) to
tell if the word is a “match” or a “mismatch,” and then automatically disables the charge path
to save power. Notably, a reset signal, SEARCH_EN, sets the DMLSA into an initial state, where
ML(i) = SML(i) = 0 and SP = 0, before the searching process. The detailed operation of the
DMLSA in the searching process is described below.

Figure 9 Schematic of DMLSA

4.2.1.1 Mismatch
Before the searching process, SP = 0. SEARCH = SEARCH_EN is pulled high at the beginning of the
searching process. Then, MN1 is turned on to charge ML(i), such that KP is discharged but not
pulled all the way down to 0. If there is any “mismatch” CAM cell, MSi is turned on to make a
current path between ML(i) and SML(i), such that SML(i) is charged by ML(i). When the voltage
of SML(i) is high enough to turn off MP3, the voltage of KP is pulled down such that MATCHB
equals logic 1, indicating that the comparison result of the word is a “mismatch”. Through two
feedback paths, MATCHB turns MN3 on and MP1 off, respectively, such that the current path of
MP1 is shut off to choke the charge current of ML(i), and SP is discharged via MN3 to turn off
MN1. The former constitutes a positive feedback loop from MATCHB to KP through MN3 and MN2,
which pulls down KP more quickly. Therefore, the power consumption is reduced after the
searching process [7].

4.2.1.2 Match
If all of the CAM cells are “match,” ML(i) and SML(i) are isolated, with no current path
between them. The voltage difference between ML(i) and SML(i) creates an output current in the
differential pair (MP2 and MP3) that charges KP and SP. As soon as KP is charged high, MATCHB
becomes logic 0, indicating that the comparison is a “match”. After SP is raised high, SEARCH
becomes logic 0 and turns off MN1 to choke the charge current to ML(i) [7].

4.3 Parity Bit and Power-Gated Match Line Sensing

4.3.1 Parity Bit Based CAM
The parity bit based CAM design is shown in figure 10. It consists of the original data segment
and an extra one-bit segment derived from the actual data bits. We only obtain the parity bit,
i.e., odd or even number of “1”s. The
obtained parity bit is placed directly to the corresponding     EN. At this time, signal EN is set to low and the power
word and ML. Thus the new architecture has the same             transistor Px is turned OFF. This initializes the signals
interface as the conventional CAM with one extra bit.           ML and C1 to ground and VDD, respectively.
During the search operation, there is only one single stage     After that, signal EN turns HIGH and initiates the
as in conventional CAM. Hence, the use of these parity          COMPARE phase. If one or more mismatches happen in
bits does not improve the power performance. However,           the CAM cells, the ML will be charged up. Interestingly,
this additional parity bit, in theory, reduces the sensing      all the cells of a row will share the limited current offered
delay and boosts the driving strength of the 1-mismatch         by the transistor Px, regardless of the number of
case (which is the worst case) by half. In the case of a        mismatches. When the voltage of the ML reaches the
match in the data segment (e.g., ML3), the parity bits          threshold voltage of transistor M8 (i.e., Vth8), the voltage at
of the search and the stored word are the same, thus the        node C1 will be pulled down. After a certain but very
overall word returns a match. When 1 mismatch occurs in         minor delay, the NAND2 gate will be toggled and thus
the data segment (e.g., ML2), the numbers of “1”s in the        the power transistor Px is turned off again. As a result,
stored and search words must differ by 1. As a result,          the ML is not fully charged to VDD, but limited to some
the corresponding parity bits are different. Therefore, we now  voltage slightly above the threshold voltage of M8, Vth8
have two mismatches (one from the parity bit and one            [8].
from the data bits). If there are two mismatches in the
data segment (e.g., ML0, ML1 or ML4), the parity bits
are the same and overall we have two mismatches. With
more mismatches, we can ignore these cases as they are
not crucial cases. The sense amplifiers now only have to
identify between the 2-mismatch cases and the matched
cases. Since the driving capability of the 2-mismatch
word is twice as strong as that of the 1-mismatch word,
the proposed design greatly improves the search speed
and the Ion/Ioff ratio of the design [8].




         Figure 10 Parity bit based CAM.

4.3.2 Gated Power Match Line Sense Amplifier
The CAM architecture is shown in figure 11. The CAM
cells are organized into rows (word) and columns (bit).          Figure 11 (a) CAM Architecture. (b) Each cell powered
Each cell has the same number of transistors as the                              by two different rails.
conventional P-type NOR CAM and use a similar ML
structure. However, the “COMPARISON” unit, i.e.,
transistors M1-M4, and the “SRAM” unit, i.e., the cross-
coupled inverters, are powered by two separate metal            5. SEARCH LINE DRIVING APPROACHES
rails, namely VDDMLand the VDD, respectively. The               The power consumption of search line mainly depends on
VDDML is independently controlled by a power                    Match line.Some of the search line approaches are
transistor (Px) and a feedback loop that can auto turn-off      reviewed in this section.
the ML current to save power. The purpose of having two
separate power rails of (VDD and VDDML) is to                   5.1 Pipelined Search Line Driving
completely isolate the SRAM cell from any possibility of        To distribute the incoming search word to all the CAM
power disturbances during “COMPARE” cycle. The                  cells at the same time, the SLs span across the entire
gated-power transistor Px, is controlled by a feedback          memory array. Inside the core cell, the SLs drive the bit
loop, denoted as “Power Control” which will                     comparison network to compare the incoming and the
automatically turn off Px once the voltage on the ML            stored bits. Hence, the SL capacitance consists of the
reaches a certain threshold. At the beginning of each           metal-line parasitic capacitance and the parasitic
cycle, the ML is first initialized by a global control signal   capacitance due to the cells, amounting to high values in
large CAM arrays. The SLs broadcast a new search word every cycle. Assuming random
search data, 50% of the SLs switch state from cycle to cycle. Therefore, the high
switching activity of the highly capacitive SLs causes high energy consumption. One way
to reduce the SL power is to break the SLs into smaller segments and deactivate as many
segments as possible each clock cycle [6], [9]. The pipelined SL driving technique
achieves this effect by dividing the SL into pipeline stages. Figure 12 presents the
concept of the pipelined SL architecture. The SL registers divide the memory array and
the SLs into the pipeline stages in the direction of the search word broadcast. For
simplicity, the figure illustrates only two pipeline stages, while in general the
number of stages can be larger. The SL register holds the search word being broadcast
into the pipeline stage in the current clock cycle. A dynamic NOR gate monitors the
outcome of the word matching in the entire stage and generates the en_i signal for the
following stage. If none of the stored words matches the search word in stage i, then
enable i+1 is high, and the SL register of stage i+1 latches this search word in the
next clock cycle. If there is a match in stage i, then enable i+1 is low, and hence
stage i+1 remains idle in the next clock cycle. For simplicity, we assume here that a
search results in a single match in the entire CAM.

We now describe the operation of the pipelined SL structure through an example. Figure
13 illustrates the timing diagram of a CAM with four pipeline stages. In four
consecutive cycles, cycle 1 through cycle 4, the search initiates for four different
words: A, B, C, and D. The search starts from the first pipeline stage, i.e., stage 0,
and thus this stage is active every clock cycle. The search for word A in cycle 1
results in a miss in stage 0, and hence the search for the same word continues in the
next clock cycle in stage 1. A match in this stage stops further search for word A, and
thus idles stage 2 in the subsequent cycle, saving power. The idle phase (shaded in the
figure) then propagates down the pipeline. For higher power savings, it is desirable
for the match to take place as close as possible to the start of the pipeline, as is
the case with word B. In the worst case (word D), it takes four clock cycles to
complete a search. As a result, pipelining introduces latency into the system, which is
equal to the number of pipeline stages. The throughput of the system remains unchanged,
and the search results are available every clock cycle: cycles 4 through 7 in this
example. While the first pipeline stage is active every clock cycle, the activity of
the remaining stages is data dependent. With a uniform distribution of the matches
across the pipeline stages, half of the last N-1 stages are inactive on average, where
N is the total number of stages. Therefore, on average, a fraction (f_IDLE) of the
pipeline stages remains idle:

$$f_{IDLE} = \frac{N-1}{2N} \qquad (3)$$

If N>>1, then f_IDLE approaches 50%. However, any power saving comes at the price of
search latency, which is equal to N clock cycles. In addition to the latency, the
pipelined architecture has some area penalty associated with it. The SL registers
between the pipeline stages require extra chip area, compared to the non-pipelined CAM.
However, if the size of the pipeline stage is sufficiently large with respect to the
area occupied by the pipeline registers, then the area overhead due to the SL registers
becomes insignificant. Reducing the number of active stages in the pipelined SL
architecture effectively decreases the average SL capacitance that is switched every
clock cycle. This, in turn, reduces the SL power dissipation by the fraction of the
idle stages. Moreover, the proposed CAM structure also reduces the ML energy, since the
match sensing is disabled in the idle pipeline stages. The actual savings depend on the
distribution of the matches across the pipeline stages. Partitioning the memory array
into N equally sized pipeline stages results in an average activity of only a fraction
(N+1)/(2N) of the stages. Hence, the total energy consumption of a CAM with W words and
M cells per word can be expressed as

                                                                        (4)

where E_REG is the energy dissipation due to the pipeline register (one flip-flop), and
E_CELL is the energy consumption per CAM cell. To find the optimal number of stages
that minimizes the total energy, we differentiate this equation with respect to the
number of stages, N, and equate the derivative to zero.

                                                                        (5)

                                                                        (6)

          Figure 12 Pipelined CAM Architecture.

          Figure 13 Timing Diagram of Pipelined CAM.
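
The stage-activity argument of Section 5.1 can be checked numerically. The sketch below is illustrative only: it compares the closed-form idle fraction (N-1)/(2N) with an explicit average over all equally likely match positions, and then searches for an energy-optimal stage count. The energy model in `total_energy` is an assumption of this sketch (each active stage is charged for one M-bit SL register plus its W/N words of M cells); it is not necessarily the expression printed as equation (4), and all function names and parameter values are hypothetical.

```python
from fractions import Fraction


def idle_fraction(n_stages: int) -> Fraction:
    """Closed-form average idle fraction of the pipeline: (N - 1) / (2N)."""
    return Fraction(n_stages - 1, 2 * n_stages)


def active_fraction(n_stages: int) -> Fraction:
    """Closed-form average active fraction of the pipeline: (N + 1) / (2N)."""
    return Fraction(n_stages + 1, 2 * n_stages)


def enumerated_active_fraction(n_stages: int) -> Fraction:
    """Average the active fraction over every equally likely match position.

    A match in stage k (0-based) keeps stages 0..k active, i.e. k + 1 stages.
    """
    total_active = sum(k + 1 for k in range(n_stages))
    return Fraction(total_active, n_stages * n_stages)


def total_energy(n_stages, words, cells_per_word, e_reg, e_cell):
    """ASSUMED per-search energy model, not taken from the paper.

    Each active stage spends energy in one SL register of cells_per_word
    flip-flops and in its share (words / n_stages) of the CAM words.
    """
    active_stages = (n_stages + 1) / 2
    register_energy = cells_per_word * e_reg
    cell_energy = (words / n_stages) * cells_per_word * e_cell
    return active_stages * (register_energy + cell_energy)


def best_stage_count(words, cells_per_word, e_reg, e_cell, max_stages):
    """Brute-force search for the stage count that minimizes total_energy."""
    return min(
        range(1, max_stages + 1),
        key=lambda n: total_energy(n, words, cells_per_word, e_reg, e_cell),
    )


if __name__ == "__main__":
    for n in (2, 4, 8, 64):
        print(n, idle_fraction(n), float(idle_fraction(n)))
    # Hypothetical energies in arbitrary units.
    print(best_stage_count(words=1024, cells_per_word=32,
                           e_reg=4.0, e_cell=1.0, max_stages=64))
```

Under this assumed model, setting the derivative with respect to N to zero gives an optimum near the square root of W·E_CELL/E_REG, so a larger register energy pushes the design toward fewer, larger stages.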

5.2 Hierarchical Search Lines
Another method of saving searchline power is to shut off some searchlines by using the
hierarchical searchline scheme [9], [10], [11], [12]. The basic idea of hierarchical
searchlines is to exploit the fact that few match lines survive the first segment of
the pipelined matchlines. With the conventional search line approach, even though only
a small number of match lines survive the first segment, all search lines are still
driven. Instead, the hierarchical search line scheme divides the search lines into a
two-level hierarchy of global searchlines (GSLs) and local searchlines (LSLs). Figure
14 shows a simplified hierarchical search line scheme, where the match lines are
pipelined into two segments, and the search lines are divided into four LSLs per GSL.
In figure 14, each LSL feeds only a single match line (for simplicity), but the number
of match lines per LSL can be 64 to 256. The GSLs are active every cycle, but the LSLs
are active only when necessary. Activating an LSL is necessary when at least one of the
match lines fed by the LSL is active. In many cases, an LSL will have no active match
lines in a given cycle; hence there is no need to activate the LSL, saving power. Thus,
the overall power consumption on the searchlines is:

$$P_{SL} = (C_{GSL} + \alpha C_{LSL}) V_{DD}^{2} f \qquad (7)$$

where C_GSL is the GSL capacitance, C_LSL is the LSL capacitance (of all LSLs connected
to a GSL), α is the activity rate of the LSLs, and f is the search frequency. C_GSL
primarily consists of wiring capacitance, whereas C_LSL consists of wiring capacitance
and the gate capacitance of the SL inputs of the CAM cells. The factor α, which can be
as low as 25% in some cases, is determined by the search data and the data stored in
the CAM. The above equation determines how much power is saved on the LSLs, but the
cost of this saving is the power dissipated by the GSLs. Thus, the power dissipated by
the GSLs must be sufficiently small so that the overall searchline power is lower than
that of the conventional approach. If the wiring capacitance is small compared to the
parasitic transistor capacitance [12], then the scheme saves power. However, as
transistor dimensions scale down, it is expected that wiring capacitance will increase
relative to transistor parasitic capacitance. In the situation where the wiring
capacitance is comparable to or larger than the parasitic transistor capacitance, C_GSL
and C_LSL will be similar in size, resulting in no power savings. In this case,
small-swing signaling on the GSLs can reduce the power of the GSLs compared to that of
the full-swing LSLs [10], [11]. This results in the modified searchline power

$$P_{SL} = (C_{GSL} V_{LOW}^{2} + \alpha C_{LSL} V_{DD}^{2}) f \qquad (8)$$

where V_LOW is the low-swing voltage on the GSLs (assuming an externally available
power supply V_LOW). This scheme requires an amplifier to convert the low-swing GSL
signal to the full-swing signals on the LSLs. Fortunately, there is only a small number
of these amplifiers per searchline, so that the area and power overhead of this extra
circuitry is small [11].

          Figure 14 Hierarchical Search Line Structure [10], [11].

6. CONCLUSION
In this paper, the CAM, its applications, and its basic operation were introduced. The
main CAM cells, namely the NOR cell and the NAND cell, and their operation were
discussed. The discussion was then extended to how these cells are used to build the
match lines of a CAM, and to the match line sensing techniques and search line driving
approaches that are used to reduce the power consumption of the CAM. In the future,
many further techniques can be used to design low-power CAMs.

References
[1] Kostas Pagiamtzis and Ali Sheikholeslami, "Content-Addressable Memory (CAM)
    Circuits and Architectures: A Tutorial and Survey", IEEE Journal of Solid-State
    Circuits, Vol. 41, No. 3, March 2006.
[2] C. Zukowski and S. Wang, "Use of Selective Precharge for Low-Power
    Content-Addressable Memories", in Proc. IEEE Int. Symp. Circuits Syst.,
    Jun. 9–12, 1997, pp. 1788–1791.
[3] C. Lin, J. Chang, and B. Liu, "A Low-Power Precomputation-Based Fully Parallel
    Content-Addressable Memory", IEEE J. Solid-State Circuits, Vol. 38, No. 4,
    pp. 654–662, Apr. 2003.
[4] Sanghyeon Baeg, "Low-Power Ternary Content-Addressable Memory Design Using a
    Segmented Match Line", IEEE Transactions on Circuits and Systems I: Regular Papers,
    Vol. 55, No. 6, July 2008.
[5] M. Khellah and M. Elmasry, "Use of Charge Sharing to Reduce Energy Consumption in
    Wide Fan-In Gates," in Proc. IEEE Int. Symp. Circuits Syst., 1998, pp. 9–12.
[6] G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, "200-MHz/200-MSPS 3.2W at 1.5V
    Vdd, 9.4Mbits
    Ternary CAM with New Charge Injection Match Detect Circuits and Bank Selection
    Scheme," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), 2003, pp. 387–390.
[7] Chua-Chin Wang, Chia-Hao Hsu, Chi-Chun Huang, and Jun-Han Wu, "A Self-Disabled
    Sensing Technique for Content-Addressable Memories", IEEE Transactions on Circuits
    and Systems II: Express Briefs, Vol. 57, No. 1, January 2010.
[8] Anh-Tuan Do, Shoushun Chen, Zhi-Hui Kong, and Kiat Seng Yeo, "A High Speed Low
    Power CAM with a Parity Bit and Power-Gated ML Sensing", IEEE Transactions on Very
    Large Scale Integration (VLSI) Systems, Vol. 21, No. 1, January 2013.
[9] K. Pagiamtzis and A. Sheikholeslami, "A Low-Power Content-Addressable Memory (CAM)
    Using Pipelined Hierarchical Search Scheme," IEEE J. Solid-State Circuits, Vol. 39,
    No. 9, pp. 1512–1519, Sep. 2004.
[10] H. Noda, K. Inoue, M. Kuroiwa, F. Igaue, K. Yamamoto, H. J. Mattausch, T. Koide,
    A. Amo, A. Hachisuka, S. Soeda, I. Hayashi, F. Morishita, K. Dosaka, K. Arimoto,
    K. Fujishima, K. Anami, and T. Yoshihara, "A Cost-Efficient High-Performance
    Dynamic TCAM with Pipelined Hierarchical Search and Shift Redundancy Architecture,"
    IEEE J. Solid-State Circuits, Vol. 40, No. 1, pp. 245–253, Jan. 2005.
[11] K. Pagiamtzis and A. Sheikholeslami, "Pipelined Match-Lines and Hierarchical
    Search-Lines for Low-Power Content-Addressable Memories," in Proc. IEEE Custom
    Integrated Circuits Conf. (CICC), 2003, pp. 383–386.
[12] H. Noda, K. Inoue, M. Kuroiwa, A. Amo, A. Hachisuka, H. J. Mattausch, T. Koide,
    S. Soeda, K. Dosaka, and K. Arimoto, "A 143 MHz 1.1W 4.5 Mb Dynamic TCAM with
    Hierarchical Searching and Shift Redundancy Architecture," in IEEE Int. Solid-State
    Circuits Conf. (ISSCC) Dig. Tech. Papers, 2004, pp. 208–209.
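
As a numerical illustration of the hierarchical searchline scheme of Section 5.2, the sketch below evaluates the searchline power for full-swing and low-swing GSLs. It assumes the power takes the form P_SL = (C_GSL·V_GSL² + α·C_LSL·V_DD²)·f, with V_GSL = V_DD for full-swing GSLs and V_GSL = V_LOW for low-swing GSLs, where f is the search frequency. The function name and all component values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def searchline_power(c_gsl, c_lsl, alpha, vdd, freq, v_gsl=None):
    """Searchline power for a two-level GSL/LSL hierarchy (assumed form).

    c_gsl : GSL capacitance in farads
    c_lsl : total capacitance of all LSLs on one GSL, in farads
    alpha : activity rate of the LSLs, between 0 and 1
    vdd   : full-swing supply voltage in volts
    freq  : search frequency in hertz
    v_gsl : GSL swing in volts; defaults to vdd (full-swing GSLs)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if v_gsl is None:
        v_gsl = vdd
    return (c_gsl * v_gsl ** 2 + alpha * c_lsl * vdd ** 2) * freq


if __name__ == "__main__":
    # Hypothetical values: 1 pF GSL, 4 pF of LSLs, 25% LSL activity, 1 V, 100 MHz.
    full_swing = searchline_power(1e-12, 4e-12, 0.25, 1.0, 1e8)
    low_swing = searchline_power(1e-12, 4e-12, 0.25, 1.0, 1e8, v_gsl=0.5)
    print(f"full-swing GSL: {full_swing:.3e} W")
    print(f"low-swing GSL:  {low_swing:.3e} W")
    print(f"saving from low swing: {100 * (1 - low_swing / full_swing):.1f} %")
```

Because the GSL term scales with the square of its swing, halving the GSL voltage cuts that term to a quarter, which matters most when C_GSL is comparable to α·C_LSL.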



