A guided wave approach to plane-to-plane optical interconnects for multistage networks and multiprocessor computers


[Figure: 2D folded perfect shuffle permutation (2) implemented as a fiber module (input, output); multistage hypercube computer built from processor arrays]




          Alvaro Cassinelli*, Makoto Naruse*,** and Masatoshi Ishikawa*
                               Univ. of Tokyo*, CRL**
Plan of the presentation

    I. Multistage architecture for optical parallel computers
        Reconfigurable multi-stage architecture
        Hypercube and omega network examples

    II. Optical fiber-based interconnection module
        Why guided optics?
        Module decomposition

    III. Prototype fabrication and test
        4x4 exchange prototype
        Transmittance, alignment tolerances

    IV. Conclusion
        Present and future research directions

    I. Multistage architectures for optical parallel computers

Hybrid optoelectronic

[Figure: data flow through a chain of stages; each stage is a photo-detector array, an elementary processor array and a VCSEL array, and consecutive stages are linked by an interconnection module]

All optical

[Figure: data flow through a chain of interconnection/switch modules]

I.1 Reconfigurable multi-stage architecture: principle
AC: We will concentrate on network-based parallel computers (or “direct-connection machines”) rather than on the shared-memory model (PRAM) as an efficient way to implement parallel computer architectures. Dealing with read/write conflicts in PRAM machines is mostly a matter of control and routing, whereas our research is primarily interested in topology and communication primitives from a hardware point of view: enhanced communication primitives are what optical technology offers.

    Optical technology offers enhanced parallel communication primitives…
    …of great benefit for network-based parallel computers (distributed memory, as opposed to shared memory)

[Figure: two ways to connect processors P1 … Pn]
    Static: fixed interconnection (X, Y, and Z)
        …switches inside processors (local control); each processor holds ULA, Mem, mux and control
    Dynamic: reconfigurable interconnection (X, Y or Z), set by a controller
        …switches outside processors (local or global/external control possible)

I.2 Dynamic architecture vs. static
AC: Actually, in our former research, we studied single-stage dynamic interconnections with global control of the switch, using spatial light modulators (OCULAR II).

    In an n-degree static topology, each processor should have n distinct optoelectronic I/O ports…
        • Technologically challenging
        • Non-reusable architecture
        • Bad scalability

    Static networks can be redesigned as single-stage dynamic networks…

[Figure: processors P1 … Pn, switches and interconnections drawn as three separate blocks, closed by a feed-back loop]

    …processors, switches and interconnections located in distinct modules
        • Optimal use of electronics, optoelectronics and optics
        • Scalability, hardware reusability in other topologies
        • Possible introduction of multiple stages…

I.3 The multi-stage paradigm
    A single-stage architecture can be “spanned” into multiple stages.

[Figure: stages 1 to m, each with processors P1 … Pn, connected through blocks S&I-1, S&I-2, …, S&I-m]

    Candidate topologies:
        [computing] Hypercube, Cube Cycle, Tree, Mesh, De Bruijn, Pyramid
        [computing & networking] Delta, Omega, Shuffle/exchange, Benes, Clos, Banyan

    The cost of multiplying the processors is paid back as:
        • Simplicity (switches can be elemental 2x2 cross-bars)
        • Scalability / reconfigurability for different topologies
        • Possibility of pipelining
        • Theoretical background: multi-stage architectures have been studied for decades in networking applications

I.2 The theoretically best optical architecture (connectivity)
AC: The linear architecture may be “sub-optimal” (Ozaktas) when addressing thermal dissipation issues, but offers much easier “scalability”. Also, the “flat” optimal architecture will work well with reflective holograms, but would be much more difficult to build using fiber arrays.

[Figure: 2D optical data flow X through a chain of photo-detector (PD) array, Processor Elements (PE) array and VCSEL array, with connection modules between stages; the PD and VCSEL arrays are flip-chip bonded to the processor array]

       MOAn(X) = A(n).I(n) … A(k).I(k) … A(1).I(1) (X)

    MOAn: matrix representation of computations on the Multistage Optical Architecture
    A(k): computation made on the PE array
    I(k): optical shuffle of data between PE arrays

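The product form above can be read as a simple loop over stages. The following is a minimal sketch, assuming that each I(k) is a permutation of the data positions and each A(k) is an element-wise computation on the PE array; the function names and the two example stages are illustrative assumptions, not taken from the presentation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[int]
Stage = Tuple[Sequence[int], Callable[[Vector], Vector]]

def apply_interconnect(perm: Sequence[int], x: Vector) -> Vector:
    """I(k): move the datum at input position i to output position perm[i]."""
    out = [0] * len(x)
    for i, value in enumerate(x):
        out[perm[i]] = value
    return out

def moa(stages: Sequence[Stage], x: Vector) -> Vector:
    """Apply I(1), A(1), ..., I(n), A(n) in data-flow order.

    This is MOAn(X) = A(n).I(n) ... A(1).I(1) (X), read from right to left.
    """
    for perm, compute in stages:
        x = compute(apply_interconnect(perm, x))
    return x

# Hypothetical two-stage example on four data items.
swap_pairs = [1, 0, 3, 2]                    # I(1): exchange neighbours
mirror = [3, 2, 1, 0]                        # I(2): reverse the array
add_one = lambda v: [e + 1 for e in v]       # A(1)
double = lambda v: [2 * e for e in v]        # A(2)

result = moa([(swap_pairs, add_one), (mirror, double)], [10, 20, 30, 40])
print(result)
```

Keeping the interconnect and the computation as separate arguments mirrors the separation between fixed optical modules and PE arrays in the architecture.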
a) Free-Space reconfigurable interconnections


[Figure: OCULAR-II. Optoelectronic processing modules (photo-detector array, elementary processor array, VCSEL array) alternate with SLM-based reconfigurable interconnection modules]

    • Space-invariant interconnections: good/bad?
    • Free-space: alignment issues?
    • Multi-level CGH: good diffraction efficiency
    • Reconfiguration (“switch”) frequency: 100 Hz…

b) Fixed interconnections (hybrid opto-electronic)

   OCULAR-III
   Fixed interconnection modules...
   Processor array in charge of the switching function…

[Figure: data flow through stages made of a photo-detector array, an elementary processor array and a VCSEL array, linked by fixed interconnection modules]

      No loss of interconnection capacity if things are designed properly
      Some examples: shuffle/exchange networks, Clos and Benes crossbars, etc…

I.3 Two well known examples:
AC:
- Ring, Mesh, and Hypercube are all classes of k-dimensional nearest-neighbor networks.
- In the indirect binary hypercube network (as well as in the Generalized Cube), we CANNOT find the exchange permutation E(k) at the end of stage k; we have to “wait” till the end (the unshuffle is necessary…).
- The FFT is an algorithm that is easily embedded in a hypercube topology. Moreover, the shuffle-exchange “direct” binary hypercube architecture can be used to demonstrate a “pipelined” FFT algorithm very easily, because FFT = (IBnC)-1, where E(1) has been replaced by W(1). Of course, (IBnC)-1 = “Direct” Binary n-Cube or “generalized Cube network”.
- The Omega network is also very useful for primitives of parallel computing, like FFT algorithms. The Omega network is NOT full connection: it is full access BUT blocking. A non-blocking network (rearrangeable, neither “strict-sense non-blocking” nor “wide-sense non-blocking”) is the BENES network. CLOS is another, which is strict-sense non-blocking (but the network is not constructed using 2x2 cross-bar switches).

[ computing ] Indirect Binary Cube (“multistage hypercube”)
    [Figure: binary hypercube with axes X, Y, Z, W, unfolded into the stages P0 = E(1), β(2), E(1), β(3), E(1), β(4), E(1), σ-1(4), with a feed-back path; β(k) is the butterfly and σ(k) the perfect shuffle]

[ networking ] (Omega) network
    [Figure: stages σ(4), E(1), σ(4), E(1), σ(4), E(1), σ(4), E(1) between inputs 0000 … 1111 and outputs 0000 … 1111]
    • Self routing: “switches” are set locally by packet address (destination – input)
    • It is full access, but not full connection.
    • Also useful on computing (FFT)…

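The “full access but blocking” property of the Omega network can be checked on a small model. The sketch below assumes the common destination-tag scheme (each 2x2 switch reads one destination bit, most significant bit first) and models every stage as a perfect shuffle followed by an exchange switch; the function names are illustrative, and the routing rule is an assumption rather than the exact rule shown on the slide.

```python
N_BITS = 4  # 16 inputs and 16 outputs, as in the figure

def perfect_shuffle(addr, n_bits=N_BITS):
    """Cyclic left rotation of the n-bit line address."""
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & ((1 << n_bits) - 1)) | msb

def route(src, dst, n_bits=N_BITS):
    """Line occupied after each stage (shuffle, then 2x2 switch)."""
    pos, path = src, []
    for stage in range(n_bits):
        pos = perfect_shuffle(pos, n_bits)
        want = (dst >> (n_bits - 1 - stage)) & 1  # destination bit, MSB first
        pos = (pos & ~1) | want                   # switch goes straight or exchanges
        path.append(pos)
    return path

def is_blocked(pairs, n_bits=N_BITS):
    """True if two of the requested (src, dst) paths need the same line."""
    paths = [route(s, d, n_bits) for s, d in pairs]
    for stage in range(n_bits):
        lines = [p[stage] for p in paths]
        if len(set(lines)) < len(lines):
            return True
    return False

# Full access: every input reaches every output.
full_access = all(route(s, d)[-1] == d for s in range(16) for d in range(16))
# Blocking: inputs 0 and 8 share their first switch and both ask for its upper output.
conflict = is_blocked([(0, 0), (8, 1)])
print(full_access, conflict)
```

A rearrangeable network such as the Benes network avoids this kind of conflict by offering several paths per input/output pair, at the price of more stages.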
II. Optical fiber-based plane-to-plane interconnection modules


               (2)
                                                   Fiber module


                                                input             output




                      …an optical “3D optical wiring” module
                            between 2D VLSI arrays.
II.1 Fiber-based interconnection blocks for multistage architectures.

  • Inter-stage connections are fixed and point-to-point: channels can be fibers.
  • Fibers have better efficiency and, just like free-space optics, no cross-talk in 3D.
  • No space-invariance required.
  • Precise and robust alignment possible.
  • Theoretically more volume-efficient than the free-space equivalent!
    (Y. Li and J. Popelek, “Volume-consumption comparisons of free-space and guided-wave optical interconnections”, Appl. Opt., vol. 39, no. 11, pp. 1815-1825, April 2000.)

[Figure: prototype fiber module (fibers and holders), input and output; “integrated” 2D folded perfect shuffle permutation module (2)]

  • Maybe “hard” to build? Tedious, but not fundamentally difficult: it can be automated…
  • Alignment of both output and input needed…
  • Power dissipation may be a fundamental limitation, but we are far from these limits…

  …wave-guide arrays for fixed, point-to-point and space-variant interconnections are an interesting alternative to free-space optics
II.2 “Decomposition” of the interconnect into modules
AC: In group-theoretic constructions of MINs (giving symmetric networks), the most useful permutations are the exchange, the shuffle, the butterfly, the bit reversal and the shift permutation.

The decomposition of the interconnect into EITHER the column, row or diagonal dimensions may also reintroduce the use of light-efficient, one-dimensional, non-pixelated, rapidly reconfigurable diffractive elements (such as acousto-optics).

    [ Problem ]
       Many multistage networks use simple, regular interconnections…
       However, when folded in a plane, these may materialize as non-regular, non-scalable and non-reusable interconnection modules!

    Numbering of a 4x4 array (rows down, columns across):

       Scan map            Fractal map
       0   4   8    12     0   2   8    10
       1   5   9    13     1   3   9    11
       2   6   10   14     4   6   12   14
       3   7   11   15     5   7   13   15
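The two numberings of the 4x4 array can be generated from the row and column indices. This is a small sketch under the assumption that the scan map is a column-major count and that the fractal map interleaves the row and column bits, with the row bit in the lower position of each pair; both rules were inferred from the tables, and the function names are illustrative.

```python
def scan_map(rows=4, cols=4):
    """Column-major numbering: index = col * rows + row."""
    return [[c * rows + r for c in range(cols)] for r in range(rows)]

def fractal_map(bits_per_axis=2):
    """Bit-interleaved numbering: row bit b goes to bit 2b, column bit b to bit 2b+1."""
    size = 1 << bits_per_axis
    grid = []
    for r in range(size):
        line = []
        for c in range(size):
            index = 0
            for b in range(bits_per_axis):
                index |= ((r >> b) & 1) << (2 * b)
                index |= ((c >> b) & 1) << (2 * b + 1)
            line.append(index)
        grid.append(line)
    return grid

for line in scan_map():
    print(line)
for line in fractal_map():
    print(line)
```

With the scan map, the low address bits select the row and the high bits select the column, which is what makes row-only and column-only permutations easy to identify.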
[ Solution ]

     Because it may be possible to cascade fiber-based modules without too much loss of light power, let’s “break” these into simple-to-fold modules.

  “Simple to fold” means:
     1) Simple to implement by stacking planar wave-guide structures (vertical, horizontal, diagonal).
        Permutations are decomposed “ad-hoc” into their “row”-exclusive and “column”-exclusive permutation parts, plus some simple-to-fold “link” permutation…
     2) Or simple to implement using previously built modules (scalability).
        Permutations are decomposed “recursively”.

The idea is to define permutation “constructors” that correspond to basic building steps using PLC circuits (stacking, grouping modules).

     Permutation layer Pn/2:
        Ln/2 Pn/2    vertical replicator
        Rn/2 Pn/2    horizontal replicator
     Permutation module Pn:
        Z Pn         “zoom” constructor
        Q Pn         “quadrant” constructor

This decomposition methodology also applies to the switching stages (which are nothing other than a set of possible permutations).
Let’s try that on the previous examples:

Indirect Binary n-Cube Network
    …uses the butterfly β(k) and perfect shuffle σ(k) permutations
    [Figure: stages P0 = E(1), β(2), E(1), β(3), E(1), β(4), E(1), σ-1(4), with feed-back]

(Omega) network
    …uses only the perfect shuffle σ(k) permutation
    [Figure: stages σ(4), E(1), σ(4), E(1), σ(4), E(1), σ(4), E(1) between inputs 0000 … 1111 and outputs 0000 … 1111]
Example: shuffle and butterfly decomposition

 shuffle σn(k):
    {bn, … bk+1, bk, bk-1, … b2, b1}  →  {bn, … bk+1, bk-1, bk-2, … b1, bk}

 butterfly βn(k):
    {bn, … bk+1, bk, bk-1, … b2, b1}  →  {bn, … bk+1, b1, bk-1, … b2, bk}

 Decomposition using constructors:
    shuffle, “ad-hoc”:       n(k) = Ln/2 n/2; Rn/2 n/2 ; L
    butterfly, “ad-hoc”:     n(k) = Rn/2 n Ln/2 n/2; n/2 ; L
    butterfly, “recursive”:  2p(2p) = Qp-1 T2 (1,2)

 …It is easy to see that the ad-hoc folding of a “regular” permutation needs a maximum of three concatenated “stacked” modules.
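The two bit-level definitions can be written directly as operations on a line address. This is a minimal sketch, assuming addresses are plain integers with b1 as the least significant bit; the function names `shuffle` and `butterfly` are illustrative.

```python
def shuffle(addr, k):
    """k-th order shuffle: rotate the k least significant bits left by one."""
    if k < 2:
        return addr
    mask = (1 << k) - 1
    low = addr & mask
    rotated = ((low << 1) & mask) | (low >> (k - 1))
    return (addr & ~mask) | rotated

def butterfly(addr, k):
    """k-th order butterfly: swap bit b1 (LSB) with bit bk."""
    b1 = addr & 1
    bk = (addr >> (k - 1)) & 1
    if b1 != bk:
        addr ^= 1 | (1 << (k - 1))
    return addr

# Perfect shuffle and butterfly of order 4 on a 16-line array.
print([shuffle(a, 4) for a in range(16)])
print([butterfly(a, 4) for a in range(16)])
```

Both functions are permutations of the address space for any k up to the address width, and the butterfly is its own inverse.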
Folding the shuffle permutation

  If k ≤ n/2, the shuffle “acts” over rows:

       σ(k) = σrow(k)                      (can be built by stacking “slices”)

  If k > n/2, the shuffle can be written as:

       σ(k) = σrow(n/2) . σcol(k-n/2) . L

   - where σcol is a column shuffle,
   - and L is the “link” permutation.

[Figure: 4x4 example with the σrow(k), σcol(2) and Link modules; columns labeled 12, 8, 4 (=100), 0 and rows labeled 0, 1 (=001), 2, 3]
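The folded form of the shuffle can be checked by brute force on the 4x4 case. The sketch below rests on three assumptions that are not stated on the slide: the row index sits in the low n/2 address bits and the column index in the high n/2 bits (scan map), the factors are applied from left to right in data-flow order, and the link L swaps the row LSB with the column LSB. All names are illustrative.

```python
N = 4            # address bits of a 4x4 array
HALF = N // 2    # bits per axis

def rotl(value, width):
    """Rotate the `width` low bits of value left by one (identity if width < 2)."""
    if width < 2:
        return value
    mask = (1 << width) - 1
    return ((value << 1) & mask) | (value >> (width - 1))

def shuffle(addr, k):
    """Reference shuffle of order k on the full address."""
    mask = (1 << k) - 1
    return (addr & ~mask) | rotl(addr & mask, k)

def folded_shuffle(addr, k):
    """Same permutation built from a row part, a column part and a link."""
    col, row = addr >> HALF, addr & ((1 << HALF) - 1)
    if k <= HALF:
        row = rotl(row, k)                    # row shuffle only
    else:
        row = rotl(row, HALF)                 # row shuffle of order n/2
        col = rotl(col, k - HALF)             # column shuffle of order k - n/2
        row, col = (row & ~1) | (col & 1), (col & ~1) | (row & 1)  # link L
    return (col << HALF) | row

matches = all(folded_shuffle(a, k) == shuffle(a, k)
              for a in range(1 << N) for k in range(1, N + 1))
print(matches)
```

Under these assumptions the row and column parts act on a single axis each, so they can be built by stacking identical slices, and only the link mixes the two axes.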
Folding a butterfly into a 4x4 array
AC: REM: the modules can be built by stacking layers, with planar-optics technology. In particular, we can think again about fan-in and fan-out channels… (cf. NHK company).

   If k ≤ n/2, the butterfly β(k) “acts” over rows:

        β(2) = βrow(2)

   If k > n/2, the butterfly β(k) can be written as:

        β(k) = βcol(k-n/2) . L . βcol(k-n/2)

    - where βcol is a column butterfly,
    - and L exchanges row and column LSB.

[Figure: 4x4 example with the βrow(2), βcol(2) and Link modules; columns labeled 12, 8, 4 (=100), 0 and rows labeled 0, 1 (=001), 2, 3]
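The three-module form of the butterfly can also be verified exhaustively for the 4x4 array. As a sketch, it assumes that the row index occupies the low n/2 address bits and the column index the high n/2 bits, and that the factors act in data-flow order; the helper names are illustrative.

```python
N = 4            # address bits of a 4x4 array
HALF = N // 2    # bits per axis

def swap_lsb_with(value, k):
    """Butterfly of order k: swap bit 1 (LSB) and bit k of value."""
    lo, hi = value & 1, (value >> (k - 1)) & 1
    if lo != hi:
        value ^= 1 | (1 << (k - 1))
    return value

def folded_butterfly(addr, k):
    """Butterfly of order k built from row/column butterflies and the link L."""
    col, row = addr >> HALF, addr & ((1 << HALF) - 1)
    if k <= HALF:
        row = swap_lsb_with(row, k)               # acts over rows only
    else:
        col = swap_lsb_with(col, k - HALF)        # column butterfly
        # Link L: exchange the row LSB and the column LSB.
        row, col = (row & ~1) | (col & 1), (col & ~1) | (row & 1)
        col = swap_lsb_with(col, k - HALF)        # same column butterfly again
    return (col << HALF) | row

matches = all(folded_butterfly(a, k) == swap_lsb_with(a, k)
              for a in range(1 << N) for k in range(1, N + 1))
print(matches)
```

The column butterfly first brings the target column bit to the column LSB, the link swaps it with the row LSB, and the second column butterfly puts it back in place.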
…back to examples: Omega network

[Figure: processor arrays (exchange switches and more) alternating with four shuffle modules; module labels: σrow(2), σcol(2), L, 90º; a pair of PEs implements an elemental exchange switch]
I.3 Indirect Binary 4-Cube

[Figure: PE array 1 (exchange), β(2), PE array 2 (exchange), β(3), PE array 3 (exchange), β(4), PE array 4 (exchange), σ-1(4); the processor arrays act as exchange switches and more]
III. 4x4 prototype fiber module. Preliminary tests




            Two holder prototypes: Zirconium, SiO2
            Pitch: 250 ± 5 µm
            Multimode graded-index fibers: NA = 0.21 (core 50 µm, cladding 126 µm)
            Transmission loss: 3 dB/km
            Length: 30 cm
III.1: Preliminary tests on a 4x4 prototype module
AC: REM: the light coming from the non-addressed channels is mainly due to faulty operation of the neighboring VCSELs, which emit LED light even though they are OFF!

  [ Interconnection pattern ]
     [Figure: (2) module between the input (VCSEL, 854 ± 4 nm) and the output (CCD)]

  [ Transmittance (one channel) ]
     [Plot: transmittance (%) versus VCSEL driving current (mA), from 6 to 13 mA; LED regime at low current, LASER regime at high current]

     Max. transmittance: 38.45% for I = 9.5 mA
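The measured transmittance can be converted to an insertion loss and compared with the propagation loss of the fiber itself. This is a small worked calculation using only the figures quoted in this section (38.45% peak transmittance, 3 dB/km, 30 cm); the cascade function is a hypothetical extension that assumes identical passive modules whose losses simply multiply.

```python
import math

def transmittance_to_db(t):
    """Insertion loss in dB for a power transmittance t (0 < t <= 1)."""
    return -10.0 * math.log10(t)

# Loss of the whole module at the measured peak transmittance.
module_loss_db = transmittance_to_db(0.3845)

# Propagation loss of the fiber alone: 3 dB/km over 30 cm.
fiber_loss_db = 3.0 * (0.30 / 1000.0)

def cascaded_transmittance(t, stages):
    """Transmittance of `stages` identical passive modules in series (assumption)."""
    return t ** stages

print(round(module_loss_db, 2), fiber_loss_db, cascaded_transmittance(0.3845, 4))
```

The propagation loss is several orders of magnitude below the measured module loss, which suggests that the loss is dominated by coupling at the input and output rather than by the fiber length.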
III.2 Alignment tolerances (test performed on a single channel)
AC: The differences in alignment tolerances are probably due to the non-circular shape of the VCSEL mode.

[Figure: test setup with the VCSEL array (one VCSEL ON), an X, Y and Z translation stage, the (2) fiber module (input, output) and a power meter]

   No relay optics between VCSEL array and fiber module input

[Plot: horizontal excursion; exit power (mW, 0 to 0.25) versus X (microns, -105 to 75)]

   Alignment tolerances (half peak power):
      x: 50 µm
      y: 70 µm
IV. Conclusion
AC: Multi-function modules: the use of optical fiber modules fits well with the all-optical approach; for instance, one can imagine a module with several different interconnection patterns, but also other “optical functions” like optical delay lines. However, in all-optical networks the “switches” may need to be very fast (electro-optical devices, not MEMS), because the delay time for avoiding the drop of ATM cells is ?? for a typical Gigabit network!

  [ Present research ]

    Input/output alignment of modules:
                • Microlenses, fibers with round ends.
                • Modules built from fiber bundles.
                • Active alignment.

    Demonstrator architectures using smart pixel arrays (2x2 or 4x4 electronic switches)

    [Figure: switch with ports 0, 1, 2, 3 attached to an optical interconnection]
[ Future research directions ]



         Guided-wave interconnects can be “modulated” and integrated !




  Multi-interconnection modules
           • “Mixed” interconnections, and other optical functions
           • Circuit switching for all optical networks
           • Packet switching in a buffered architecture with globally controlled stages


  Integrated plane-to-plane multistage paradigm
           • using permutation “slices” for intra-chip massive, regular interconnections.
Multi-permutation module

     Permutations interleaved into the same module: multi-permutation/switch module

     A small controlled mechanical or optical perturbation can produce a drastic change of the interconnection pattern from input to output.
     (…optical switches do not need to be “local”, i.e., 2x2)

AC:
Rem: Dynamic alignment is tightly coupled with dynamic reconfiguration of the interconnect. Cf. Naruse’s presentation.




      [Figure: module with inputs, actuators and outputs]

      Use of MEMS technology?




      “Normal” directional coupling between waveguides?
Transparent circuit switching by TDM interconnections

  [Figure: “all-optical” multistage architecture; a control line drives three bi-modules, labeled { (1), i}, { (2), i} and { (3), i}, connected to a PE array.]

  …optical switches do not need to be “local” (2x2)



    We are now building a demonstrator using mechanical displacement of modules
           containing a by-pass interconnection and cube interconnections




  [Figure: “spanned” hypercube with weak-communication]
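A minimal sketch of how globally controlled by-pass/cube stages could provide TDM circuit switching. It assumes that stage k of the network is a bi-module which, for all PEs at once, either passes data straight through (by-pass) or exchanges it along hypercube dimension k (cube). The function names and the encoding of the control word as an integer are hypothetical, chosen for illustration only.

```python
def stage_permutation(n_bits, k, exchange):
    """Permutation applied by stage k: cube exchange along dimension k, or by-pass."""
    size = 1 << n_bits
    if exchange:
        return [i ^ (1 << k) for i in range(size)]  # flip address bit k
    return list(range(size))                        # identity (by-pass)


def network_permutation(n_bits, control):
    """Compose all stages; bit k of `control` selects cube (1) or by-pass (0) at stage k."""
    perm = list(range(1 << n_bits))
    for k in range(n_bits):
        stage = stage_permutation(n_bits, k, (control >> k) & 1)
        perm = [stage[p] for p in perm]
    return perm


def tdm_frame(n_bits):
    """One TDM frame: one time slot per global control word."""
    return [network_permutation(n_bits, c) for c in range(1 << n_bits)]


if __name__ == "__main__":
    for slot, perm in enumerate(tdm_frame(3)):
        print(slot, perm)
```

Under these assumptions the control word c connects PE i to PE i XOR c, so cycling c through all values links every source to every destination exactly once per frame, with no local (2x2) switching decisions.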
A new paradigm for packet switching in multistage networks

  [Figure: input and output linked through three bi-modules, labeled { (1), i}, { (2), i} and { (3), i}, each with its own module control, and three PE arrays.]


           …globally controlled exchange stages + Intermediate buffers

  [Plot: normalized throughput (bandwidth), 0 to 1, versus input request probability (per unit time), 0 to 1. Legend: 64x64 Crossbar, 64x64 MIN, 64x64 GS-MIN. Curve labels: length of buffers, 0 to 4. Settings: selection method: alternate / backpressure: on / mode: disablehop.]
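As a rough point of comparison for the crossbar and MIN curves, the classical unbuffered analysis under uniform random traffic can be sketched as follows. This is not the buffered, backpressured simulation behind the plot, and it does not model the GS-MIN; it assumes independent requests, uniformly distributed destinations, 2x2 switches in the MIN, and that blocked requests are dropped. The function names are hypothetical.

```python
def crossbar_throughput(p, n):
    """Probability that a given output of an n x n crossbar is busy,
    when each input issues a request with probability p to a uniformly
    chosen output (unbuffered, conflicts dropped)."""
    return 1.0 - (1.0 - p / n) ** n


def min_throughput(p, n_stages):
    """Per-port output rate of an unbuffered multistage network of 2x2 switches.

    Each stage maps its input rate r to 1 - (1 - r/2)^2, the probability
    that at least one of the two switch inputs requests a given switch output.
    """
    rate = p
    for _ in range(n_stages):
        rate = 1.0 - (1.0 - rate / 2.0) ** 2
    return rate


if __name__ == "__main__":
    # 64 ports correspond to log2(64) = 6 stages of 2x2 switches.
    for tenth in range(0, 11):
        p = tenth / 10.0
        print(p, round(crossbar_throughput(p, 64), 3), round(min_throughput(p, 6), 3))
```

At full load (p = 1) this model gives roughly 0.635 for the 64x64 crossbar and roughly 0.359 for the six-stage MIN, which illustrates why intermediate buffers are of interest.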
Integrated multistage architecture?


  [Figure: waveguide (WG) “permutation slices” joined by a normal-coupling photonic structure]



    - 3D IC integration of regularly interconnected circuits

    - a nice application for photonic bandgap coupling structures