

Perl 6 Internals

Dan Sugalski
TPC 5.0

“Here there be dragons”
    The big goals of perl 6's internals
 Speed
 Extensibility
 Cleanliness
 Compatibility
 Modularity
 Thread safety
 Flexibility
          Some global decisions
 The core will be in C. (Like it or not, it's appropriate for code at this level.)
 The core must be modular, so pieces can be swapped out without rebuilding
 It must be fast
 Long-term binary compatibility is a must
 Your average perl coder or extension writer shouldn't need any info about the guts
 Things should generally be thought out, documented, and engineered
          The quick overview
 [Diagram: Parser -> Compiler -> Optimizer -> Interpreter. Labels: runtime engine, tree, unoptimized bytecode, precompiled bytecode.]
                  The parser
 Where the whole thing starts
 Generally takes source of some sort and turns it into a syntax tree
        The Bytecode Compiler
 Turns a syntax tree into bytecode
 Performs some simple optimization
               The optimizer
 Takes the plain bytecode from the compiler and abuses it heavily
 An optional step, generally skipped for compile-and-go execution
 Should be able to work on small parts of a program for JIT optimization
                 The Interpreter
 Takes compiled (and possibly optimized) bytecode and does something with it
 Generally that something is executing it, but it might also be:
     Saving it to disk
     Translating it to another format (.NET, Java bytecode)
     Compiling it to machine code
         The Parser

“Double, double, toil and trouble
Fire burn, and cauldron bubble”
               Parser goals
 Extensible in perl
 More powerful than what we have now
 Retargetable
 Self-contained and removable
           Parsing perl isn't easy
 Perl may well be one of the toughest languages to parse properly
 If we get perl right, other languages are easy. Or at least easier
 We have the full power of perl to draw on to do the parsing (including the regex engine and Damian's Bizarre Idea du Jour)
The Compiler

“Mmmmm, tasty!”
     From syntax tree to bytecode
 The compiler takes a syntax tree and turns it into bytecode
 Very little optimization is done here
 Optimization is expensive and optional
 Pretty straightforward; this isn't rocket science
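To make the tree-to-bytecode step concrete, here is a minimal sketch. It assumes a three-node expression tree and a made-up three-address register bytecode. `Node`, `Compiler`, `OP_LOAD`, `OP_ADD`, and `OP_MUL` are all hypothetical names, not perl 6's actual opcodes. The compiler walks the tree children-first and hands each result a fresh register, with no optimization at all, matching the slide.

```c
#include <stddef.h>

/* Hypothetical node kinds for a tiny expression tree. */
typedef enum { N_NUM, N_ADD, N_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                  /* used by N_NUM */
    struct Node *left, *right;   /* used by N_ADD and N_MUL */
} Node;

/* Hypothetical opcodes; every instruction is four longs: op, dst, a, b. */
enum { OP_LOAD = 1, OP_ADD, OP_MUL };

typedef struct {
    long code[256];   /* flat bytecode stream (no bounds checks in this sketch) */
    size_t len;       /* number of longs emitted so far */
    int next_reg;     /* next free integer register */
} Compiler;

static void emit(Compiler *c, long op, long dst, long a, long b) {
    c->code[c->len++] = op;
    c->code[c->len++] = dst;
    c->code[c->len++] = a;
    c->code[c->len++] = b;
}

/* Post-order walk: compile the children first, then the node itself.
   Returns the register that holds the node's value. */
static int compile_node(Compiler *c, const Node *n) {
    if (n->kind == N_NUM) {
        int dst = c->next_reg++;
        emit(c, OP_LOAD, dst, n->value, 0);                   /* I[dst] = constant */
        return dst;
    }
    int l = compile_node(c, n->left);
    int r = compile_node(c, n->right);
    int dst = c->next_reg++;
    emit(c, n->kind == N_ADD ? OP_ADD : OP_MUL, dst, l, r);   /* I[dst] = I[l] op I[r] */
    return dst;
}
```

For `1 + 2 * 3`, this emits three loads, then a multiply into register 3, then an add into register 4. Redundant loads like these are what the optimizer is there to clean up.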
     The Optimizer

     “We can rebuild it.
Make it better, faster, stronger”
               The Optimizer
 Takes plain bytecode and makes it faster
 Does all the sorts of things you expect an optimizer to do: code motion, loop unrolling, common subexpression elimination, etc.
 Will be an iterative process
 This will be interesting, as perl's a pain to optimize
 An optional step, of course
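As one small example of a bytecode-to-bytecode pass, here is constant folding. It is a sketch over the same kind of hypothetical three-address instructions (`Insn`, `OP_LOAD`, `OP_ADD`, `OP_MUL` are illustrative names). It assumes that registers loaded from constants hold plain values. In real perl that assumption can fail because of active data and runtime redefinition, which is exactly why the next slide calls perl hard to optimize.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-address instruction: dst = a op b, or dst = a for OP_LOAD. */
enum { OP_LOAD = 1, OP_ADD, OP_MUL };
typedef struct { int op, dst; long a, b; } Insn;

#define NREGS 64

/* Constant folding: when both inputs of an ADD or MUL are registers known
   to hold constants, replace the instruction with a single OP_LOAD.
   Returns the number of instructions rewritten. */
static size_t fold_constants(Insn *code, size_t n) {
    bool known[NREGS] = { false };
    long value[NREGS] = { 0 };
    size_t folded = 0;

    for (size_t i = 0; i < n; i++) {
        Insn *in = &code[i];
        if (in->op == OP_LOAD) {
            known[in->dst] = true;
            value[in->dst] = in->a;
        } else if ((in->op == OP_ADD || in->op == OP_MUL) &&
                   known[in->a] && known[in->b]) {
            long r = in->op == OP_ADD ? value[in->a] + value[in->b]
                                      : value[in->a] * value[in->b];
            *in = (Insn){ OP_LOAD, in->dst, r, 0 };
            known[in->dst] = true;
            value[in->dst] = r;
            folded++;
        } else {
            known[in->dst] = false;   /* result is not known at compile time */
        }
    }
    return folded;
}
```

The loads this leaves behind would be removed by a separate dead-code pass. Chaining small passes like this, and rerunning them until nothing changes, is one way to make the process iterative.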
    Things that make optimizing perl hard
 Active data
 Runtime redefinitions of everything
 Really, really late binding (Waiting for Godot…)
 Perl programmers are used to more predictable runtime characteristics than, say, C programmers
 The Interpreter

“Polly want a cracker?”
              Interpreter goals
 Fast
 Tuned for perl
 Language neutral where possible
 Event capable
 Sandboxable
 Asynchronous I/O built in
 Built with an eye towards TIL (threaded interpretive language) and/or native code
 Better debugging support than perl 5
The perl 6 interpreter is a software CPU
 Complete with registers and an assembly language
 This can make translating perl 6 bytecode into native machine code easier
 There's a lot of literature on building optimizing compilers that can be leveraged
 While more complex than a pure stack-based machine, it's also faster
 Opcode dispatch needs to be faster than in perl 5
 Opcode functions can be written in perl
                   CPU specs
 64 int, float, string, and PMC registers
 A segmented multiple stack architecture
 Interrupt-capable (for events)
 Pretty much completely position independent: everything is referenced via register, pad entry, or …
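A software CPU like this can be sketched as register files plus a dispatch loop. The sketch reads "64 int, float, string, and PMC registers" as 64 of each type. It assumes each opcode is a C function that does its work and returns the next program counter. `Interp`, `op_set_i`, `op_add_i`, and `run` are hypothetical names, and the design is illustrative only.

```c
#include <stddef.h>

#define NUM_REGS 64   /* per register type (reading of the slide) */

/* Hypothetical register files for the software CPU. */
typedef struct {
    long   int_reg[NUM_REGS];
    double num_reg[NUM_REGS];
    char  *str_reg[NUM_REGS];
    void  *pmc_reg[NUM_REGS];   /* stand-in for PMC pointers */
} Interp;

typedef long opcode_t;

/* Each opcode function returns the next program counter, or NULL to stop. */
typedef opcode_t *(*op_func_t)(opcode_t *pc, Interp *interp);

enum { OP_END, OP_SET_I, OP_ADD_I };

static opcode_t *op_end(opcode_t *pc, Interp *interp) {
    (void)pc; (void)interp;
    return NULL;
}

/* set_i Ix, constant */
static opcode_t *op_set_i(opcode_t *pc, Interp *interp) {
    interp->int_reg[pc[1]] = pc[2];
    return pc + 3;
}

/* add_i Ix, Iy, Iz  (Ix = Iy + Iz) */
static opcode_t *op_add_i(opcode_t *pc, Interp *interp) {
    interp->int_reg[pc[1]] = interp->int_reg[pc[2]] + interp->int_reg[pc[3]];
    return pc + 4;
}

static op_func_t op_table[] = { op_end, op_set_i, op_add_i };

/* The dispatch loop: look up the opcode's function, call it, repeat. */
static void run(opcode_t *pc, Interp *interp) {
    while (pc)
        pc = op_table[*pc](pc, interp);
}
```

Because operands name registers rather than stack slots, `add_i I0, I1, I2` is a single dispatch. A stack machine would typically need separate pushes and a pop for the same work. Keeping this loop tight is where the "dispatch must be faster than perl 5" goal bites.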
              The regex engine
 The regex engine is going to be part of the perl 6 CPU, not separate as it is in perl 5
 A good incentive to get opcode dispatch fast
 Makes expanding the regex engine a bit easier
 Details will be hidden as a set of regex opcodes
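What "regex as opcodes" means can be illustrated with a pattern compiled into a tiny instruction stream and run by a loop. This is a deliberately minimal sketch. `RX_CHAR`, `RX_ANY`, and `RX_SUCCEED` are hypothetical opcodes, and a real engine would also need alternation, repetition, backtracking, and captures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical regex opcodes. */
enum { RX_CHAR, RX_ANY, RX_SUCCEED };

/* Runs the compiled pattern at one starting position. */
static bool rx_match_here(const int *prog, const char *s) {
    for (;;) {
        switch (*prog) {
        case RX_CHAR:                  /* RX_CHAR c: match the literal c */
            if (*s != prog[1]) return false;
            s++; prog += 2;
            break;
        case RX_ANY:                   /* RX_ANY: match any one character */
            if (*s == '\0') return false;
            s++; prog += 1;
            break;
        case RX_SUCCEED:
            return true;
        default:
            return false;              /* unknown opcode */
        }
    }
}

/* Unanchored search: try the pattern at every position, like /pattern/. */
static bool rx_search(const int *prog, const char *s) {
    for (;; s++) {
        if (rx_match_here(prog, s)) return true;
        if (*s == '\0') return false;
    }
}
```

The pattern `/c.t/` becomes `{ RX_CHAR, 'c', RX_ANY, RX_CHAR, 't', RX_SUCCEED }`. Once such ops live in the main opcode table, extending the regex engine means adding opcodes rather than patching a separate engine.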
    A few words on the stack system
 Each register file has an associated stack
 All registers of a particular type can be pushed onto or popped off the stack in one go
 Individual registers or groups of registers can be pushed or popped
 The stacks are all segmented, so we're not relying on finding contiguous chunks of memory
 There's also a set of call and scratch stacks
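A segmented stack can be built from fixed-size chunks chained together, so that growing the stack never requires one large contiguous block. The sketch below pushes and pops a whole integer register file in one go. `IntFrame`, `Segment`, `FRAMES_PER_SEGMENT`, `push_frame`, and `pop_frame` are hypothetical names, and the segment size is an arbitrary assumption.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 64
#define FRAMES_PER_SEGMENT 16

typedef struct { long reg[NUM_REGS]; } IntFrame;

/* One chunk of the stack. Chunks are chained, so the stack never needs
   a single large contiguous block of memory. */
typedef struct Segment {
    struct Segment *prev;
    size_t used;
    IntFrame frames[FRAMES_PER_SEGMENT];
} Segment;

typedef struct { Segment *top; } IntStack;

/* Push a copy of the whole register file. Returns 0 on success. */
static int push_frame(IntStack *st, const long regs[NUM_REGS]) {
    if (st->top == NULL || st->top->used == FRAMES_PER_SEGMENT) {
        Segment *seg = malloc(sizeof *seg);
        if (seg == NULL) return -1;
        seg->prev = st->top;
        seg->used = 0;
        st->top = seg;
    }
    memcpy(st->top->frames[st->top->used++].reg, regs, sizeof(long) * NUM_REGS);
    return 0;
}

/* Pop the most recent frame back into the register file. Returns 0 on success. */
static int pop_frame(IntStack *st, long regs[NUM_REGS]) {
    if (st->top == NULL) return -1;            /* stack is empty */
    memcpy(regs, st->top->frames[--st->top->used].reg, sizeof(long) * NUM_REGS);
    if (st->top->used == 0) {                  /* release the emptied segment */
        Segment *empty = st->top;
        st->top = empty->prev;
        free(empty);
    }
    return 0;
}
```

Pushing 40 frames here allocates three separate 16-frame segments, none of which needs to sit next to the others in memory.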

“Could you say that a little differently?”
             What is bytecode?
 A distilled version of a program
 Machine language for the PVM
 Can contain a lot of 'extra' information, including full source
 Designed to be platform independent
 Should be mostly mappable as shared data (modulo the fixup sections)
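One common way to make bytecode platform independent is to fix its byte order on disk, so the same file reads identically on big- and little-endian machines. The talk doesn't say how perl 6 does this. The sketch assumes little-endian 32-bit opcodes, and the helper names are hypothetical. Another option would be to record the writer's byte order in a header and convert on load.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as 4 little-endian bytes, whatever the host byte order. */
static void put_u32le(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read 4 little-endian bytes back into a host-order 32-bit value. */
static uint32_t get_u32le(const unsigned char *in) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Serialize n opcodes into buf, which must hold 4 * n bytes. */
static void write_ops(unsigned char *buf, const uint32_t *ops, size_t n) {
    for (size_t i = 0; i < n; i++) put_u32le(buf + 4 * i, ops[i]);
}

/* Deserialize n opcodes from buf. */
static void read_ops(uint32_t *ops, const unsigned char *buf, size_t n) {
    for (size_t i = 0; i < n; i++) ops[i] = get_u32le(buf + 4 * i);
}
```

On a host that already uses the on-disk byte order, the file could be mapped and used in place. That is the appeal of keeping everything except the fixup sections position independent and shareable.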
         Data Structures

“Vtables and strings and floats, oh my!”
 Generically called a PMC
 PMC fields:
     Vtable Pointer
     Flags
     Data Pointer
     Integer Value
     Float Value
     GC Data
 Bigger than Perl 5's base data structure
 Synchronization data built-in
 Same for all variable types
 GC data is not part of the base structure
                 Scalars
 Built off the base PMC structure
 Use the integer and float areas as caches
 Data pointer points off to string, large int, or large float data
 Vtable functions determine how it all works
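Here is one way the PMC layout and the "integer and float areas as caches" idea could fit together. It is a sketch under assumptions. The field names follow the slide, the `PMC_INT_OK` and `PMC_NUM_OK` flag bits are hypothetical, synchronization and GC data are left out, and the data pointer is assumed to hold a C string. The string is converted at most once per assignment, and later reads hit the cache.

```c
#include <stdlib.h>

/* Hypothetical flag bits saying which cached values are current. */
enum { PMC_INT_OK = 1u << 0, PMC_NUM_OK = 1u << 1 };

/* Sketch of the base PMC layout from the slide (sync and GC data omitted). */
typedef struct PMC {
    const void *vtable;    /* table of functions; see the vtable slides */
    unsigned    flags;
    void       *data;      /* here: a NUL-terminated string */
    long        int_value;     /* cache */
    double      float_value;   /* cache */
} PMC;

/* Integer fetch for a string scalar: convert once, then reuse the cache. */
static long scalar_get_integer(PMC *p) {
    if (!(p->flags & PMC_INT_OK)) {
        p->int_value = strtol((const char *)p->data, NULL, 10);
        p->flags |= PMC_INT_OK;
    }
    return p->int_value;
}

static double scalar_get_number(PMC *p) {
    if (!(p->flags & PMC_NUM_OK)) {
        p->float_value = strtod((const char *)p->data, NULL);
        p->flags |= PMC_NUM_OK;
    }
    return p->float_value;
}

/* Assigning a new string invalidates both caches. */
static void scalar_set_string(PMC *p, char *s) {
    p->data = s;
    p->flags &= ~(unsigned)(PMC_INT_OK | PMC_NUM_OK);
}
```

In a full design these functions would be reached through the vtable pointer rather than called directly. The flag tests shown here are one of the conditionals that per-type vtables can trim away.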
                 Arrays
 Built off the base PMC structure
 Data pointer points to array data
 All perl 6 arrays are typed
 May have an array of scalars, strings, integers, or floats
 An array only takes up enough memory to hold elements of its type
                 Hashes
 Built off the base PMC structure
 Data pointer points to array data
 All perl 6 hashes are typed
 May have a hash of scalars, strings, integers, or floats
 A hash only takes up enough memory to hold values of its type
 Hashing function is overridable
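An overridable hashing function can be as simple as a function pointer stored in each hash. The sketch keeps a matching key-equality function alongside it, because overriding one without the other breaks lookups. The default is a djb2-style string hash, and the override shown makes keys case-insensitive. `Hash`, `hash_store`, `hash_find`, and the rest are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long (*hash_fn)(const char *key);
typedef int (*eq_fn)(const char *a, const char *b);

#define NBUCKETS 64

typedef struct Entry { const char *key; long value; struct Entry *next; } Entry;

/* A chained hash whose hashing and key comparison are chosen per hash. */
typedef struct {
    hash_fn hash;
    eq_fn   eq;
    Entry  *buckets[NBUCKETS];
} Hash;

/* Default: djb2-style string hash. */
static unsigned long default_hash(const char *k) {
    unsigned long h = 5381;
    while (*k) { h = h * 33 + (unsigned char)*k; k++; }
    return h;
}

static int exact_eq(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* Example override: case-insensitive keys (hash and equality must agree). */
static unsigned long nocase_hash(const char *k) {
    unsigned long h = 5381;
    while (*k) { h = h * 33 + (unsigned char)tolower((unsigned char)*k); k++; }
    return h;
}

static int nocase_eq(const char *a, const char *b) {
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) { a++; b++; }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

static Entry *hash_find(Hash *h, const char *key) {
    for (Entry *e = h->buckets[h->hash(key) % NBUCKETS]; e; e = e->next)
        if (h->eq(e->key, key)) return e;
    return NULL;
}

/* Stores key -> value (the key string is not copied). Returns 0 on success. */
static int hash_store(Hash *h, const char *key, long value) {
    Entry *e = hash_find(h, key);
    if (e) { e->value = value; return 0; }
    e = malloc(sizeof *e);
    if (!e) return -1;
    unsigned long b = h->hash(key) % NBUCKETS;
    e->key = key;
    e->value = value;
    e->next = h->buckets[b];
    h->buckets[b] = e;
    return 0;
}
```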

                 Strings
 String header fields:
     Buffer Start
     Buffer Length
     String Length
     String Size
     Type
 Strings are sort of …
 Perl 6 can mix and match string data (Unicode, ASCII, EBCDIC, etc.)
 New string types can be loaded on the fly
               String handling
 Perl 6 has no 'built-in' string support; all string support is via loadable libraries
 There'll be Unicode, ASCII, and EBCDIC support provided (at least) to start
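One way to get "no built-in string support" is to have the string header's Type field point to a table of functions supplied by a loadable encoding library. The sketch makes several assumptions. It uses UTF-8 as the Unicode encoding and a single `char_count` entry in the table. It also interprets the slide's fields as buffer length = bytes allocated, string size = bytes in use, and string length = characters. `Encoding`, `string_wrap`, and the table names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-encoding function table; a loadable string library
   would supply one of these. */
typedef struct {
    const char *name;
    size_t (*char_count)(const unsigned char *bytes, size_t size);
} Encoding;

/* String header following the slide's fields (interpretation assumed). */
typedef struct {
    unsigned char  *buffer;          /* Buffer Start */
    size_t          buffer_length;   /* bytes allocated */
    size_t          length;          /* characters */
    size_t          size;            /* bytes in use */
    const Encoding *type;            /* Type */
} String;

static size_t ascii_count(const unsigned char *b, size_t size) {
    (void)b;
    return size;                        /* one byte per character */
}

static size_t utf8_count(const unsigned char *b, size_t size) {
    size_t chars = 0;
    for (size_t i = 0; i < size; i++)
        if ((b[i] & 0xC0) != 0x80)      /* don't count continuation bytes */
            chars++;
    return chars;
}

static const Encoding ascii_encoding = { "ascii", ascii_count };
static const Encoding utf8_encoding  = { "utf8",  utf8_count };

/* Wrap an existing buffer in a header tagged with the given encoding. */
static String string_wrap(unsigned char *buf, size_t cap, size_t used,
                          const Encoding *enc) {
    String s = { buf, cap, enc->char_count(buf, used), used, enc };
    return s;
}
```

With this split, the same five bytes `caf\xC3\xA9` are four characters when tagged UTF-8 and five when tagged ASCII. Only the encoding table differs, so loading a new string type means supplying another table.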

             Bigints and bigfloats
 Header fields:
     Buffer Pointer
     Length
 Bigints and bigfloats share the same header
 Perl 6 supports arbitrary-length floating point and integer numbers
 Perl automagically upgrades ints and floats when needed
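Automatic upgrading hinges on noticing, before it happens, that a native operation would overflow. The sketch shows only that detection step for addition. `Num` and `NUM_NEEDS_BIGINT` are hypothetical names, and building the actual bigint is not shown. The check is arranged so the test itself can never overflow, since signed overflow is undefined behaviour in C.

```c
#include <limits.h>

typedef enum { NUM_INT, NUM_NEEDS_BIGINT } NumKind;
typedef struct { NumKind kind; long i; } Num;

/* Adds two native integers, or reports that the result needs a bigint. */
static Num num_add(long a, long b) {
    Num r = { NUM_INT, 0 };
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        r.kind = NUM_NEEDS_BIGINT;   /* a real runtime would build a bigint here */
    else
        r.i = a + b;
    return r;
}
```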
                    Vtables
 All variable data access is done through a table of functions that the variable carries around with it
 This allows faster access, since each code path is specialized for just the job it needs to do
 Isolates us from the implementation of variables
 Allows special-purpose behaviour (like perl 5's magic) to be attached to a variable without costing anything for variables that don't use it
               Vtables (cont'd)
 Makes thread safety easier
 A little bit more overhead because of the extra level of indirection, but the smaller functions make up for that
 Vtable functions can be written in perl. (Each class with objects blessed into it will have at least one vtable)
 There may be more than one vtable per package
    Vtables hide data manipulation
 Pretty much all data manipulation will be done via variable vtables
 This allows the variable implementation to change without perl needing to know
 Allows far more flexibility in what you can make a variable do
 Shortens the code path for data functions and trims out extraneous conditionals
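            For example:
 Fetching the string value of a scalar

 For scalars with strings:

   String *get_str(PMC *my_PMC) {
     /* The string already exists; just hand it back */
     return my_PMC->data_pointer;
   }

 For int-only scalars:

   String *get_str(PMC *my_PMC) {
     /* Build the string once and cache it in the PMC */
     my_PMC->data_pointer = make_string(my_PMC->integer);
     /* Switch to a vtable that knows the string exists
        (the target was cut off on the original slide) */
     my_PMC->vtable = …;
     return my_PMC->data_pointer;
   }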
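A self-contained, runnable version of the vtable-swapping idea might look like the following. The names `VTable`, `string_scalar_vtable`, and `int_scalar_vtable` are hypothetical. It uses `const char *` instead of the slide's `String *` type and `snprintf` in place of the slide's `make_string`. The integer version builds the string once, then repoints the PMC at a vtable whose `get_str` simply returns the cached string, so no "is it cached yet?" test ever runs again.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct PMC PMC;

/* A vtable: the functions a variable carries around with it. */
typedef struct {
    const char *(*get_str)(PMC *self);
} VTable;

struct PMC {
    const VTable *vtable;
    void *data_pointer;   /* string data, once there is any */
    long integer;         /* integer value */
};

static const char *str_get_str(PMC *self);
static const char *int_get_str(PMC *self);

/* Scalar that already has its string form. */
static const VTable string_scalar_vtable = { str_get_str };
/* Integer-only scalar: has to build the string the first time. */
static const VTable int_scalar_vtable = { int_get_str };

static const char *str_get_str(PMC *self) {
    return self->data_pointer;
}

static const char *int_get_str(PMC *self) {
    char *buf = malloc(24);                 /* room for any 64-bit long */
    if (buf == NULL) return NULL;
    snprintf(buf, 24, "%ld", self->integer);
    self->data_pointer = buf;
    /* Swap vtables: later calls take the short path with no flag check. */
    self->vtable = &string_scalar_vtable;
    return buf;
}
```

A fuller design would swap to a combined "int and string" vtable so integer fetches stay cheap too. Either way, callers always go through `p->vtable->get_str(p)` and never see the change.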
Memory Management

“Now where did I put that?”
              Getting headers
 All the fixed-size things (PMCs, string/number headers) get allocated from arenas
 All headers, with the exception of PMCs (maybe), are movable by the garbage collector
 Non-PMC header allocation is very fast
 PMC allocation is only mostly fast
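Arena allocation for fixed-size headers usually means grabbing memory for many headers at once and threading the unused slots onto a free list. Most allocations are then a single pointer pop. This is a generic sketch of that technique with hypothetical names (`Header`, `Arena`, `HeaderPool`). It leaves out the garbage collector moving headers around.

```c
#include <stdlib.h>

#define ARENA_OBJECTS 256

/* Hypothetical fixed-size header; every slot in an arena has this size. */
typedef union Header {
    union Header *next_free;                /* while the slot is free */
    struct { void *data; long flags; } live;
} Header;

typedef struct Arena { struct Arena *prev; Header slots[ARENA_OBJECTS]; } Arena;

typedef struct { Arena *arenas; Header *free_list; } HeaderPool;

/* Get a whole arena at once and put all its slots on the free list. */
static int pool_grow(HeaderPool *p) {
    Arena *a = malloc(sizeof *a);
    if (!a) return -1;
    a->prev = p->arenas;
    p->arenas = a;
    for (size_t i = 0; i < ARENA_OBJECTS; i++) {
        a->slots[i].next_free = p->free_list;
        p->free_list = &a->slots[i];
    }
    return 0;
}

/* Allocation is usually just a pointer pop. */
static Header *header_alloc(HeaderPool *p) {
    if (!p->free_list && pool_grow(p) != 0) return NULL;
    Header *h = p->free_list;
    p->free_list = h->next_free;
    return h;
}

static void header_free(HeaderPool *p, Header *h) {
    h->next_free = p->free_list;
    p->free_list = h;
}
```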
            Buffer Management
 Anything that isn't a fixed size gets allocated from the buffer pools
 All buffered data, with the exception of data allocated in special pools, is movable by the garbage collector
 Because the copying GC keeps free memory in one contiguous block, allocation is very quick
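When free memory is one contiguous block, allocating a variable-size buffer is just "bump a pointer and check the limit". The sketch shows that bump allocation. It assumes a 16-byte alignment, and `BufferPool`, `pool_init`, and `pool_alloc` are hypothetical names. Individual buffers are never freed here. In the scheme the slides describe, reclaiming space is the copying collector's job.

```c
#include <stddef.h>
#include <stdlib.h>

/* A pool that hands out variable-size buffers by bumping a pointer. */
typedef struct {
    char  *start;
    size_t size;
    size_t used;
} BufferPool;

static int pool_init(BufferPool *p, size_t size) {
    p->start = malloc(size);
    p->size = size;
    p->used = 0;
    return p->start ? 0 : -1;
}

/* Round each request up to 16 bytes (alignment assumed) and bump. */
static void *pool_alloc(BufferPool *p, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned > p->size - p->used)
        return NULL;            /* pool is full: time to collect */
    void *mem = p->start + p->used;
    p->used += aligned;
    return mem;
}
```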
Garbage Collection

 “Bring out yer dead!”
The perl 6 GC is a copying collector
 Everything except PMCs is movable in Perl 6
 PMCs might be movable too
 We get a compact memory heap out of this, which allows for fast allocation
 Perl 6 will release empty memory back to the system when it can
 Refcounts are used only to note object lifetimes, not for GC
 Refcounts, for the most part, are dead
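The classic way to build a copying collector is Cheney's algorithm, shown here as a generic illustration rather than perl 6's actual collector. It copies everything reachable from the roots into a second space and leaves forwarding pointers behind so shared and cyclic references are copied once. The live data comes out packed at the start of the new space. The sketch assumes fixed-size objects with two reference slots. `Obj`, `gc_alloc`, and `gc_collect` are hypothetical names, and it moves every object, whereas the slides say PMCs may need to stay put.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_CELLS 128

/* Hypothetical fixed-size object with up to two references. */
typedef struct Obj {
    struct Obj *forward;     /* set once the object has been copied */
    struct Obj *child[2];
    long value;
} Obj;

static Obj space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static Obj *from_space = space_a, *to_space = space_b;
static size_t alloc_next = 0;      /* bump index into from_space */

static Obj *gc_alloc(long value) {
    if (alloc_next == HEAP_CELLS) return NULL;   /* caller should collect */
    Obj *o = &from_space[alloc_next++];
    memset(o, 0, sizeof *o);
    o->value = value;
    return o;
}

/* Copy one object into to-space (only once) and return its new address. */
static Obj *evacuate(Obj *o, size_t *free_idx) {
    if (o == NULL) return NULL;
    if (o->forward) return o->forward;
    Obj *copy = &to_space[(*free_idx)++];
    *copy = *o;
    copy->forward = NULL;
    o->forward = copy;
    return copy;
}

/* Copy the roots, then scan to-space breadth-first, copying whatever the
   scanned objects point to. Unreachable objects are simply left behind. */
static void gc_collect(Obj **roots, size_t nroots) {
    size_t free_idx = 0, scan = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = evacuate(roots[i], &free_idx);
    while (scan < free_idx) {
        Obj *o = &to_space[scan++];
        o->child[0] = evacuate(o->child[0], &free_idx);
        o->child[1] = evacuate(o->child[1], &free_idx);
    }
    Obj *tmp = from_space; from_space = to_space; to_space = tmp;
    alloc_next = free_idx;         /* allocation resumes right after live data */
}
```

Reference counts play no part in finding garbage here. A cycle with no outside references is dropped simply because nothing copies it.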
     GC considerations for Objects
 Garbage collection and object death are now separate things
 Perl's guarantee of timely object death is stronger
 We still don't guarantee perfect collection (but it sucks less)
 We still refcount for real perl references, but only 2 bits are used
 Objects with more than two simultaneous references won't get collected until a full dead-variable scan is made
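A 2-bit reference count can be kept as a saturating counter. The values 0 to 2 are exact, and 3 means "more than two" and never comes back down. That is exactly why such objects have to wait for a full dead-variable scan. The sketch assumes the count lives in the low two bits of a flag word. The layout and the names `rc_inc` and `rc_dec` are hypothetical.

```c
#include <stdbool.h>

/* Two-bit reference count in the low bits of a flag word (layout hypothetical). */
#define RC_MASK   0x3u
#define RC_STUCK  0x3u   /* "more than two": sticky */

static void rc_inc(unsigned *flags) {
    unsigned rc = *flags & RC_MASK;
    if (rc != RC_STUCK)
        *flags = (*flags & ~RC_MASK) | (rc + 1);
}

/* Returns true when the count is known to have reached zero, meaning the
   object can be destroyed right away. */
static bool rc_dec(unsigned *flags) {
    unsigned rc = *flags & RC_MASK;
    if (rc == RC_STUCK || rc == 0)
        return false;    /* saturated: only a full scan can tell */
    *flags = (*flags & ~RC_MASK) | (rc - 1);
    return rc - 1 == 0;
}
```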
            Extensions beware!
 Since we have no refcounts, extensions must tell perl when they hold on to PMCs
 Not a huge deal, as we piggyback on the cross-interpreter PMC tracking we use for threads
 No more struct PMC; in extensions...
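"Telling perl" that an extension holds a PMC amounts to adding it to a root set the collector consults. The sketch below is a hypothetical registration table with `ext_register_pmc`, `ext_unregister_pmc`, and `ext_is_rooted`. It is not the actual perl 6 API, and it ignores the cross-interpreter and thread-safety side that the slide says this piggybacks on. Nested holds on the same PMC are counted so that each register call is balanced by one unregister call.

```c
#include <stddef.h>

#define MAX_EXT_ROOTS 256

/* Hypothetical root table: PMCs an extension keeps in its own C data.
   The collector would treat every entry as live. */
typedef struct { void *pmc; unsigned holds; } ExtRoot;

static ExtRoot ext_roots[MAX_EXT_ROOTS];
static size_t  ext_root_count;

/* "I am keeping a pointer to this PMC." Returns 0 on success. */
static int ext_register_pmc(void *pmc) {
    for (size_t i = 0; i < ext_root_count; i++)
        if (ext_roots[i].pmc == pmc) { ext_roots[i].holds++; return 0; }
    if (ext_root_count == MAX_EXT_ROOTS) return -1;
    ext_roots[ext_root_count].pmc = pmc;
    ext_roots[ext_root_count].holds = 1;
    ext_root_count++;
    return 0;
}

/* "I no longer hold this PMC." */
static void ext_unregister_pmc(void *pmc) {
    for (size_t i = 0; i < ext_root_count; i++) {
        if (ext_roots[i].pmc == pmc) {
            if (--ext_roots[i].holds == 0)
                ext_roots[i] = ext_roots[--ext_root_count];   /* swap-remove */
            return;
        }
    }
}

/* What the collector would ask while marking live objects. */
static int ext_is_rooted(const void *pmc) {
    for (size_t i = 0; i < ext_root_count; i++)
        if (ext_roots[i].pmc == pmc) return 1;
    return 0;
}
```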
Extending Perl 6
           Extensions Made Easier
 Perl 6 will have a real API
 The API is multilevel
     Simple for embedders
     More complex for extension authors
     Pretty messy for vtable or opcode writers
 Binary compatibility is a very strong goal
              The embedding API
 Guaranteed stable and binary compatible for the life of perl 6
 Very simple API
     Create interpreter
     Destroy interpreter
     Parse source
     Run code
     Register native functions
              The extension API
 Much simpler interface to perl's internals
 The gory details are hidden
 Stable binary compatibility is a very strong goal
     We may add functions or options, but we won't take them away
     Extensions built for perl 6.0.1 should still run with perl 6.8.12 without rebuilding
 Manipulating perl data should be much easier
 If you have to resort to Inline to wrap a library, then it means we've not got it right
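"We may add functions, but we won't take them away" is often implemented as an append-only function table that carries its own size. The talk doesn't specify a mechanism, so this is a generic sketch with hypothetical names (`PerlApiV1`, `PerlApiV2`, `runtime_api`). Version 2 embeds version 1 as its first member. Old extensions see the prefix they were compiled against, and newer extensions check the size before touching newer entries.

```c
#include <stddef.h>

/* Version 1 of a hypothetical API table handed to extensions. */
typedef struct {
    size_t size;                        /* sizeof the table the runtime provides */
    int  (*api_version)(void);
    long (*add)(long a, long b);
} PerlApiV1;

/* Version 2 only appends; the V1 prefix is unchanged. */
typedef struct {
    PerlApiV1 v1;
    long (*mul)(long a, long b);        /* new in V2 */
} PerlApiV2;

static int  v2_version(void)       { return 2; }
static long v2_add(long a, long b) { return a + b; }
static long v2_mul(long a, long b) { return a * b; }

/* The table a newer runtime would pass to every extension. */
static const PerlApiV2 runtime_api = {
    { sizeof(PerlApiV2), v2_version, v2_add }, v2_mul
};

/* An extension built against V1 uses only what it knows about. */
static long old_extension(const PerlApiV1 *api) {
    return api->add(40, 2);
}

/* A newer extension checks the size before using a V2-only entry. */
static long new_extension(const PerlApiV1 *api) {
    if (api->size >= sizeof(PerlApiV2))
        return ((const PerlApiV2 *)api)->mul(6, 7);
    return -1;                          /* running on an older runtime */
}
```

Both kinds of extension receive `&runtime_api.v1`. The binary built against the old table keeps working without a rebuild because nothing it relies on has moved.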
              Extensions (cont)
 Inline, or something like it, is probably going to be the standard for extending perl
 XS, when you have to resort to it, will be far less nasty than it is now
    Homegrown Opcodes and Vtables
 This is part of the grubby inside of perl 6
 You can use any of the internal routines of perl
 If you do, though, you may run into backward-compatibility issues at some point. (If it's not part of the embedding, utility, or extension API, we make no promises)
 There's no guarantee that calling conventions won't change
 No guarantees that perl 6.4 will even use vtables or opcodes
                   Utility library
   Perl 6 will provide a set of utility routines to handle common tasks
     String manipulation
     Encoding changes (Shift-JIS to Unicode, EBCDIC to …)
     Conversion routines (string to int or float)
     Extended precision math (int and float)
   These will be stable, like the rest of the API
     Variations on a Theme

“Toccata and Fugue in perl minor by Wall”
    The source doesn't have to be perl
 The parser isn't obligated to be parsing perl
 Input source could be Python, Ruby, Java, or …
 The full perl parser is optional
     The interpreter doesn't have to interpret
 The interpreter is the destination for bytecode, but it doesn't have to interpret it
 It might save directly to disk
 It might translate the bytecode into an alternate form: Java bytecode, .NET code, or executable code, for example
 The interpreter might translate to machine code on the fly, as a sort of JIT compiler. (Well, really a TIL, but...)
