An Introduction to Parrot

     Dan Sugalski
    dan@sidhe.org




                    January 28, 2004
     Overview




What's it all about
               Purpose

• Optimized for Dynamic Languages
• Perl 5, Python, Ruby specifically
• Run really, really fast
• Or at least as fast as reasonable under
  the circumstances
• Easily extendable
• Easily embeddable
• Play Zork
        History




How we got where we are
           OSCON 2000

• Infamous mug pitching incident
• Perl 6 started
• Language and software developed
  separately
    Perl 6 -- not too much bigger

•   That hasn't lasted
•   Allison's talking about that one
•   The start was smallish, though
•   Fix the annoyances
•   Amazing how many things turned out to
    be annoying
      Big language umbrella

• Not much semantic difference between
  Perl 5, Python, and Ruby
• Perl 6 was obviously going to borg them
  and a bit more
• Even ML and Haskell haven't been safe
• More concepts have gone in as time
  has progressed
        Parrot went for them all

•   Yeah, we were getting bored
•   Had to do something
•   We liked Ruby and even Python
•   We hated having multiple interpreters
    around
     Parrot and the Parrot Prank

•   2001 April Fools Joke
•   Perpetrated by Simon Cozens
•   Parrot -- New language
•   Perl & Python Amalgam
•   Pretty funny as these things go
               Timeline

• The project came first
• Then, the Parrot Joke
• We grabbed the name
            Non-Purpose

• Don't care about non-dynamic
  languages
• Not much, at least
• Other people can worry
• Engineering tradeoffs favor dynamic
  languages
      True language neutrality is
              impossible

•   Vicious sham
•   All engines have a bias
•   Even the hardware ones
•   Processors these days really like C
     Architecture




How it's supposed to
         look
             Buzzwords

• Register based, object-oriented,
  language agnostic, threaded, event-
  driven, async I/O capable virtual
  machine
• No, really
             Software goals

•   Fast
•   Safe
•   Extendable
•   Embeddable
•   Maintainable
       Administrative goals

• Resource Efficient
• Controllable
• Not suck when used as an Apache
  module
• Cautious about whole-system impact
          Driving assumptions

•   C function calls are inexpensive
•   L1 & L2 caches are large
•   Memory bandwidth is limited
•   CPU pipeline flushes are expensive
•   Interpreter must be fast
•   JIT a bonus, not a given
       Interpreter Core in Pictures

[Diagram: the Interpreter Core at the center, surrounded by the Integer,
 Float, String, and PMC register sets (each backed by its own frame
 stack), plus the user stack, the control stack, lexicals, and globals]
                  Parser

• Source goes in, AST comes out
• Built in part on the Perl 6 rules engine
• Pluggable parser architecture
  Compile and optimize (IMCC)

• Turns the output of the parser into
  executable code
• Optional optimizing step
• Register coloring algorithms provided
  here
                Execution

•   Interpreter
•   JIT
•   C code
•   Native executables
               Base Engine

•   Bytecode driven
•   Platform-neutral bytecode
•   Register-based system
•   Stacks
•   Continuation-passing style
                Bytecode

• Directly executable
• Resembles native executable format
  •   Code
  •   Constants
  •   Metadata
  •   No BSS, though
      Designed for efficiency

• Directly executable
• mmap()ped in
• Only complex constants (strings, PMCs)
  need fixup
• Converts on size and/or endian
  mismatch
           Platform Neutrality

•   If native format, used directly
•   Otherwise endian-swapped
•   Off-line utility to convert
•   Only difference is speed hit on startup
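
The "use directly if native, otherwise endian-swap" step might look roughly like the sketch below. The marker value and the `demo_` function names are assumptions made for illustration; they are not Parrot's actual bytecode header or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker word that a writer stores in its own byte order. */
#define DEMO_MAGIC 0x01020304u

/* Reverse the byte order of one 32-bit word. */
static uint32_t demo_swap32(uint32_t w) {
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Returns 0 if the words were already native (used directly),
   1 if they were byte-swapped in place, -1 if the marker is unknown. */
static int demo_fixup(uint32_t *words, size_t n) {
    if (n == 0) return -1;
    if (words[0] == DEMO_MAGIC) return 0;
    if (demo_swap32(words[0]) != DEMO_MAGIC) return -1;
    for (size_t i = 0; i < n; i++)
        words[i] = demo_swap32(words[i]);
    return 1;
}
```

The native case does no work at all, which is why the only cost of a foreign file is the one-time swap at startup.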
                Registers

• All operations revolve around VM
  registers
• Essentially CPU registers
• Four types
  •   Integer
  •   Float
  •   String
  •   PMC
• 32 of each
              Registers

• Parrot's one RISC concession
• Non-load/store must operate on
  registers or constants
• JIT maps VM registers to platform
  registers if there are some
• Otherwise pure (and absolute) memory
  addressing to VM registers
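
As a rough picture of "four types, 32 of each", the register file can be sketched as a plain C struct. The type and field names here are hypothetical stand-ins, not Parrot's real interpreter structure.

```c
#define DEMO_NUM_REGS 32   /* "32 of each", per the slide */

/* Hypothetical opaque stand-ins for Parrot's string and PMC types. */
typedef struct DemoString DemoString;
typedef struct DemoPMC DemoPMC;

typedef struct {
    long        int_reg[DEMO_NUM_REGS];   /* I registers */
    double      num_reg[DEMO_NUM_REGS];   /* N registers */
    DemoString *str_reg[DEMO_NUM_REGS];   /* S registers */
    DemoPMC    *pmc_reg[DEMO_NUM_REGS];   /* P registers */
} DemoRegisters;

/* A non-load/store op touches only registers or constants:
   here, dest = src + constant on the integer registers. */
static void demo_add_i_i_ic(DemoRegisters *r, int dest, int src, long constant) {
    r->int_reg[dest] = r->int_reg[src] + constant;
}
```

Without a JIT, every register access in this sketch is just an indexed memory access into the struct, which matches the "pure memory addressing" fallback described above.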
                  Stacks

• Six stacks
• One general purpose typed stack
• Four register backing stacks
  • Push/pop half register frames in one go
  • Faster than push/pop of frames to general
    stack
• One control stack
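
One way to picture "push/pop half register frames in one go" is a backing stack whose entries are 16-register blocks copied with a single `memcpy`. This is a minimal sketch under that assumption; the fixed depth and the `demo_` names are invented for the example.

```c
#include <string.h>

#define DEMO_NUM_REGS   32
#define DEMO_HALF       (DEMO_NUM_REGS / 2)
#define DEMO_MAX_FRAMES 8   /* hypothetical fixed depth for the sketch */

typedef struct {
    long frames[DEMO_MAX_FRAMES][DEMO_HALF];
    int  depth;
} DemoIntStack;

/* Save the top or bottom half of the integer registers in one copy.
   Returns 0 on success, -1 if the stack is full. */
static int demo_push_half(DemoIntStack *s, const long *regs, int top_half) {
    if (s->depth == DEMO_MAX_FRAMES) return -1;
    memcpy(s->frames[s->depth++], regs + (top_half ? DEMO_HALF : 0),
           sizeof s->frames[0]);
    return 0;
}

/* Restore the most recently saved half. Returns -1 if the stack is empty. */
static int demo_pop_half(DemoIntStack *s, long *regs, int top_half) {
    if (s->depth == 0) return -1;
    memcpy(regs + (top_half ? DEMO_HALF : 0), s->frames[--s->depth],
           sizeof s->frames[0]);
    return 0;
}
```

A block copy like this avoids sixteen separate typed pushes onto the general stack, which is the speed argument the slide makes.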
                 Stacks

• Bit of a misnomer
• Really tree of stack frames
• Confusing, though
    Continuation Passing Style

• Used for calling conventions
• Parrot makes heavy use of
  continuations
• If you don't know they're there you'll not
  care
• All Ruby's fault, really
• Hidden from HLL code
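
For readers who have not met the idea, continuation-passing style means a callee hands its result to "what happens next" instead of returning it. The C sketch below only illustrates that shape; the `DemoCont` struct and function names are assumptions, and Parrot's real continuations capture far more state.

```c
#include <stddef.h>

/* A continuation: the code to resume plus the state it needs. */
typedef struct DemoCont DemoCont;
struct DemoCont {
    void (*resume)(DemoCont *self, long result);
    DemoCont *next;   /* continuation to invoke afterwards, if any */
    long     *out;    /* where the final result is stored, if any */
};

/* Final step: store the result. */
static void demo_store(DemoCont *k, long result) { *k->out = result; }

/* Intermediate step: add one, then pass the value along. */
static void demo_add_one(DemoCont *k, long result) {
    k->next->resume(k->next, result + 1);
}

/* The callee never returns a value; it invokes its caller's continuation. */
static void demo_square(long x, DemoCont *k) { k->resume(k, x * x); }
```

Calling `demo_square(6, &inc)` with `inc` chained to a storing continuation leaves 37 in the output slot; the caller decides what "return" means by choosing the continuation it passes.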
     Parrot's data




Where the magic lives
          Data isn't passive

• Lots of functionality hidden in data
• Partly OO
• Or as OO as you get in C
               Strings

• Language neutral
• Encapsulate language behavior,
  encoding, and character set
• Annoyingly complex
Basic String Diagram

       Buffer Info

       Encoding

        Charset

       Language

         Flags
              Encoding

• Represents how the bits are turned into
  'characters'
• Code points, really
• Even for non-Unicode encodings
• Handles transformations from/to storage
            Character Set

• Which characters the code points
  represent
• Basic character manipulation happens
  here
• Case mangling, substrings
• Transformations to other character sets
              Language

• Nuances of sorting and case mangling
• Interpretation of most Asian text when
  using Unicode
• Ignorable if you don't care
                  Unicode

•   Parrot does Unicode
•   Used as pivot encoding/charset
•   IBM's ICU library
•   Didn't want to write another badly done
    Unicode library
        Efficiency concerns

• Multiple encodings/charsets means less
  conversion
• Transform data only when needed
• Strings are mutable
• COW system for space/speed efficiency
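
Copy-on-write for mutable strings can be sketched with a shared, reference-counted buffer: copies share bytes until one of them is written. The structures below are hypothetical and skip allocation-failure and bounds checks for brevity; they are not Parrot's string header.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer with a reference count. */
typedef struct { int refs; size_t len; char *bytes; } DemoBuf;
typedef struct { DemoBuf *buf; } DemoStr;

/* Create a string that owns a fresh buffer (no malloc error handling). */
static DemoStr demo_new(const char *text) {
    DemoBuf *b = malloc(sizeof *b);
    b->refs = 1;
    b->len = strlen(text);
    b->bytes = malloc(b->len + 1);
    memcpy(b->bytes, text, b->len + 1);
    return (DemoStr){ b };
}

/* Copying only bumps the count; no bytes move. */
static DemoStr demo_copy(DemoStr s) { s.buf->refs++; return s; }

/* Before a write, take a private copy if the buffer is shared.
   The caller must keep i within the string's length. */
static void demo_set_char(DemoStr *s, size_t i, char c) {
    if (s->buf->refs > 1) {
        DemoBuf *old = s->buf;
        *s = demo_new(old->bytes);
        old->refs--;
    }
    s->buf->bytes[i] = c;
}
```

The space saving comes from `demo_copy`, and the speed saving from deferring the real copy until (and unless) a write happens.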
              The PMC

• Represents a HLL variable
• Language agnostic
• Everything pivots off PMCs
PMC diagram

      Vtable

      Flags

      Cache

   Data Pointer

     Metadata

    GC handle

  Synchronization
                The Vtable

•   How all the functionality is implemented
•   Almost everything defers to PMCs
•   Large part of interpreter logic in PMCs
•   Allows fast operator overloading and
    tying
        Some vtable operations

•   Addition
•   Subtraction
•   Multiplication
•   Division
•   Bitwise operations
•   Loading
•   Storing
•   Comparison
•   Truth
•   Type conversion
•   Logical operations
•   Finalization
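
The vtable idea can be sketched in a few lines of C: each PMC carries a pointer to a table of functions, and the interpreter's ops simply defer to it. The two-slot table and all `Demo`/`demo_` names are assumptions for illustration; the real vtable has many more entries.

```c
typedef struct DemoPMC DemoPMC;

/* Hypothetical two-slot vtable. */
typedef struct {
    long (*get_integer)(DemoPMC *self);
    void (*add_int)(DemoPMC *self, long value);
} DemoVtable;

struct DemoPMC {
    const DemoVtable *vtable;
    long              cache;   /* the value itself, for simple types */
};

static long int_get(DemoPMC *p)         { return p->cache; }
static void int_add(DemoPMC *p, long v) { p->cache += v; }

/* An "overloaded" type: addition saturates at 100 instead of growing. */
static void capped_add(DemoPMC *p, long v) {
    p->cache = (p->cache + v > 100) ? 100 : p->cache + v;
}

static const DemoVtable demo_int_vtable    = { int_get, int_add };
static const DemoVtable demo_capped_vtable = { int_get, capped_add };

/* The interpreter's add never inspects the type; it just defers. */
static void demo_op_add(DemoPMC *p, long v) { p->vtable->add_int(p, v); }
```

Swapping the `vtable` pointer changes a value's behaviour without touching `demo_op_add`, which is roughly how overloading and tying stay cheap.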
 Vtable functions may be Parrot code

• How languages implement user
  operator overloading
• Used for Perl-style tying
• Usable for operator wrapping
          PMCs are typed

• Types can change
• Allows customized behavior
• Cuts out some overhead
          All PMCs indexable

•   As array or hash
•   Operations may be delegated
•   PMC may be both hash and array
•   Scalar as well
         Multimethod dispatch

•   Core interpreter functionality
•   Used for many PMC operations
•   Beats hand-rolling it
•   Dispatch surprisingly fast
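
One common way to make multimethod dispatch fast is a table indexed by the types of both operands. The sketch below assumes that approach with two made-up types; it is not a description of Parrot's actual dispatch tables.

```c
typedef enum { DEMO_INT, DEMO_FLOAT, DEMO_NUM_TYPES } DemoType;

typedef struct {
    DemoType type;
    union { long i; double f; } v;
} DemoValue;

typedef double (*DemoAddFn)(const DemoValue *, const DemoValue *);

/* One implementation per (left type, right type) pair. */
static double add_ii(const DemoValue *a, const DemoValue *b) { return (double)(a->v.i + b->v.i); }
static double add_if(const DemoValue *a, const DemoValue *b) { return (double)a->v.i + b->v.f; }
static double add_fi(const DemoValue *a, const DemoValue *b) { return a->v.f + (double)b->v.i; }
static double add_ff(const DemoValue *a, const DemoValue *b) { return a->v.f + b->v.f; }

/* Rows are the left operand's type, columns the right operand's. */
static const DemoAddFn demo_add_table[DEMO_NUM_TYPES][DEMO_NUM_TYPES] = {
    { add_ii, add_if },
    { add_fi, add_ff },
};

/* Dispatch is two array indexes and one indirect call. */
static double demo_add(const DemoValue *a, const DemoValue *b) {
    return demo_add_table[a->type][b->type](a, b);
}
```

Keeping the table in the core means each PMC type does not have to hand-roll its own "what is my right-hand operand?" checks.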
          Magic all hidden

• User code never knows about magic
• Allows transparent behaviour changes
• One big pivot point for dispatch
               Objects

• Standard but optional object system
• Standard object protocols
• Standard object opcodes
   Everything can be an object

• Objects have attributes
• Objects can have methods called on them
• All PMCs have get/set attribute vtable
  entries
• All PMCs have a method call entry
• Therefore, all PMCs are objects
   Objects are cross-language

• Obey the protocols and use the facilities
  and you're fine
• Can even inherit across object systems
• Parrot will enforce some invariance
      Object system optional

• Okay to roll your own
• Don't have to interoperate
• Load up your own ops and go for it
         Base support for objects

•   Scoped method caches
•   Selective cache invalidation
•   Signature based dispatch in core
•   Op support
    •   Property and attribute access
    •   Method call
    •   Subclassing
    •   can, is, and does
   Assembly Language




Because hand-generating
  bytecode is annoying
               Sample

  set N0, 10
  set N1, 0
loop:
  print "Hello, world!\n"
  add N1, N1, 1 # Could be "inc N1"
  ne N0, N1, loop
  end
           Straightforward

• Destination, source, source
  add DEST, SOURCE1, SOURCE2
• VAX is not dead
• Some magic during assembly
           Ops pre-exploded

•   No actual add op
•   add_i_i_ic, add_i_i_i, add_p_i_i
•   Etc…
•   Assembler chooses right op
•   No runtime type checking needed
•   No runtime JIT code analysis needed
        Ops pre-exploded

• Little extra code needed
• Ops source has custom macro
  preprocessor
• Reduces maintenance load
             Add example

inline op add(out INT, in INT, in INT) {
  $1 = $2 + $3;
  goto NEXT();
}
              Add example

opcode_t *
Parrot_add_i_i_i (opcode_t *cur_opcode, struct Parrot_Interp *interpreter) {
  /* Both sources are integer registers */
  IREG(1) = IREG(2) + IREG(3);
  return (opcode_t *)cur_opcode + 4;
}

opcode_t *
Parrot_add_i_ic_i (opcode_t *cur_opcode, struct Parrot_Interp *interpreter) {
  /* First source is an integer constant stored inline in the bytecode */
  IREG(1) = cur_opcode[2] + IREG(3);
  return (opcode_t *)cur_opcode + 4;
}
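
Because each generated op function returns the address of the next opcode, the simplest interpreter core is a loop over a function table. The following is a self-contained sketch of that shape; the `Demo` types, the op numbering, and the "NULL means stop" convention are assumptions for the example, not Parrot's real run loop.

```c
#include <stddef.h>

typedef long opcode_t;

typedef struct DemoInterp { long int_reg[32]; } DemoInterp;
typedef opcode_t *(*DemoOpFn)(opcode_t *cur_opcode, DemoInterp *interp);

/* add_i_i_i: the three operands are register numbers. */
static opcode_t *demo_add_i_i_i(opcode_t *pc, DemoInterp *in) {
    in->int_reg[pc[1]] = in->int_reg[pc[2]] + in->int_reg[pc[3]];
    return pc + 4;
}

/* add_i_i_ic: the last operand is an inline integer constant. */
static opcode_t *demo_add_i_i_ic(opcode_t *pc, DemoInterp *in) {
    in->int_reg[pc[1]] = in->int_reg[pc[2]] + pc[3];
    return pc + 4;
}

/* end: returning NULL stops the loop. */
static opcode_t *demo_end(opcode_t *pc, DemoInterp *in) {
    (void)pc; (void)in;
    return NULL;
}

enum { DEMO_OP_END, DEMO_OP_ADD_I_I_I, DEMO_OP_ADD_I_I_IC };
static const DemoOpFn demo_op_table[] = {
    demo_end, demo_add_i_i_i, demo_add_i_i_ic
};

/* The whole core: look up the op number, call it, move to where it says. */
static void demo_run(opcode_t *pc, DemoInterp *in) {
    while (pc != NULL)
        pc = demo_op_table[*pc](pc, in);
}
```

Since the assembler has already picked the exact variant, the loop does no type checks at run time; it only indexes the table and calls.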
               Very CISCy

•   I like assembly
•   Wanted it to be easily targeted
•   Wanted to be easy to hand-write
•   Good fit to compiler output
•   CISC fits interpreters better
           Rich instruction set

•   Side-effect of interoperability
•   Nifty side effects
•   Very fast dispatch
•   Much lower JIT overhead
       Extensible instruction set

•   Loadable on demand
•   Provides fast access to code
•   Allows language-specific opcodes
•   Even writable in Parrot bytecode
•   Blurs opcode/function/method lines
                  PIR

• Parrot Intermediate Representation
• Slightly higher level than assembly
• Runs through the optimizer
                 PIR

• Assembly without the annoyances
• Infinite number of registers
• Function header and parameter setup
   Sample (Same as assembly)

$N0 = 10
$N1 = 0
Loop:
 print "Hello, world!\n"
 $N1 = $N1 + 1
 ne $N0, $N1, Loop
 end
               Assembly++

•   Locals
•   Register allocation and coloring
•   Automatic sub creation
•   Simple expressions
•   Calling-convention aware
                PIR Example 2

.sub _MAIN prototyped
  .param pmc argv
  .local int count
  count = argv[0]
  _printme(count)
  end
.end

.sub _printme prototyped
  .param int Max
  .local int Current
  Current = 0
Loop:
  print "Hello, world\n"
  inc Current
  ne Current, Max, Loop
  .pcc_begin_return
  .return 1
  .pcc_end_return
.end
           Register allocation

•   Infinite number of temps
•   Lifetimes are traced and managed
•   Automatic spilling
•   Single nastiest register task
Toys and Tools




  It's Alive!
                 Demos

•   Ncurses demo
•   Parrot Basic demo
•   Parrot CGI demo
•   Real Work demo
     Functioning Languages

• The gag languages
  • Befunge
  • BF
  • Ook!
        Functioning Languages

• The real languages
  •   Forth
  •   BASIC
  •   Scheme
  •   DecisionPlus
        Functioning Languages

• The unfinished languages
  •   Perl 5
  •   Perl 6
  •   Python
  •   Ruby
         Security




Because sometimes people
        just suck
 Security requirements: Handle

• Untrustworthy code
• Malicious code
• Badly written code
         Protection Categories

•   Resource usage
•   Access
•   Mistrusted bytecode
•   Isolated Interpreters
          Resource usage

• Memory, CPU, IO, and time quotas
• Individually settable
• May be enabled and disabled on the fly
  with sufficient privilege
                Access

• Restrictions on what code can do
• Introduces a VMS-style privilege system
• Areas of higher and lower privilege
         Mistrusted bytecode

•   Assumes malformed bytecode
•   Verifies all arguments
•   Verifies jump destinations
•   Much slower
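
A verifier for mistrusted bytecode has to make a full pass before anything runs. The sketch below assumes a made-up three-op instruction set with absolute jump targets, and checks op numbers, truncation, register operands, and that every jump lands on the start of an op. None of these names or encodings come from Parrot itself.

```c
#include <stddef.h>

typedef long opcode_t;

/* Hypothetical ops: end, add (3 register operands), jump (1 absolute target). */
enum { DEMO_END, DEMO_ADD, DEMO_JUMP, DEMO_NUM_OPS };
static const size_t demo_op_len[DEMO_NUM_OPS] = { 1, 4, 2 };

#define DEMO_MAX_CODE 64   /* hypothetical size limit for the sketch */

/* Returns 1 if the code passes every check, 0 otherwise. */
static int demo_verify(const opcode_t *code, size_t n) {
    unsigned char is_start[DEMO_MAX_CODE] = {0};
    size_t pc;
    if (n > DEMO_MAX_CODE) return 0;
    /* Pass 1: validate each op and record where ops begin. */
    for (pc = 0; pc < n; pc += demo_op_len[code[pc]]) {
        if (code[pc] < 0 || code[pc] >= DEMO_NUM_OPS) return 0;
        if (demo_op_len[code[pc]] > n - pc) return 0;   /* runs off the end */
        is_start[pc] = 1;
        if (code[pc] == DEMO_ADD)
            for (size_t k = 1; k <= 3; k++)
                if (code[pc + k] < 0 || code[pc + k] > 31) return 0;
    }
    /* Pass 2: every jump must land on the first word of some op. */
    for (pc = 0; pc < n; pc += demo_op_len[code[pc]])
        if (code[pc] == DEMO_JUMP) {
            opcode_t target = code[pc + 1];
            if (target < 0 || (size_t)target >= n || !is_start[target])
                return 0;
        }
    return 1;
}
```

The two extra passes over the code are one reason this mode is "much slower" than trusting the bytecode, even before any per-op checks at run time.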
        Isolated interpreters

• Can run code in a separate interpreter
• Controlled environment
         Quickies




Putting a limit on boredom
                Events

• Async event system built in
• One shared, integrated event loop
• Everything can use it
                   IO

•   All IO asynchronous
•   Synchronous wrappers provided
•   Integrated with event system
•   Under-the-hood thread games where
    needed
               Threads

• Designed to be threaded from the
  ground up
• Not the POSIX thread model, alas
• Interpreters too heavy-weight
• No guarantees of user safety, just
  interpreter safety
Parrot Development




Always ongoing
   Getting and installing Parrot

• Point releases
  • Whenever “Big Things” get done
  • Get a good workout before release
• Snapshots
  • Three times a day
  • For folks without easy CVS access
  • http://cvs.perl.org/snapshots/parrot/
      Getting and installing Parrot

• CVS
  • Full anon access
  •   :pserver:anonymous@cvs.perl.org:/cvs/public


• Rsync
  • From latest CVS tree
  •   rsync -av --delete cvs.perl.org::parrot-HEAD parrot
                 Builds on

• Many Unices
  •   Linux
  •   Mac OS X
  •   *BSD
  •   Solaris
  •   AIX
• WinXP
  • Visual Studio
  • Cygwin
    Regular automated testing

• Tinderbox system
• Regular checkout, build, and testing
• http://tinderbox.perl.org/tinderbox/bdshowbuild.cgi?tree=parrot
         Parrot Mailing lists

• Parrot-internals@perl.org
  • Was perl6-internals
  • Most of the action
• Parrot-compilers@perl.org
• @parrotcode.org soon, hopefully
Questions?




  ?