F#: Another ML compiler for .NET
Don Syme, 24/4/2002
Overview
Why?
F# = Caml.NET
Language Choices (Basic)
Language Choices (Interop)
Usability (Optimization, Packaging)
Part 1: Motivation & A Taste
Why F#? Background
Generics
Type parameters for MS-IL, C# etc.
•Is it efficient? Is it suitable for cross-language interop? Need a compiler to find out. SML.NET not quite suitable.
ILX
“Standardize” encodings for closures and data by “adding” them to MS-IL
•Need a quality compiler to set an example.
Abstract IL
Toolkit for manipulating IL
•Written in Caml, can’t use this from C#, or anywhere else. Need a compiler to make this accessible.
ILVerify
Verifier/validator for IL (written in Caml)
•Couldn’t integrate this into the CLR very easily even if we wanted to. An ML compiler would have been useful.
Why F#? Background
AbsIL useful for all of these.
Future?
IL analysis? (security? optimization?)
IL transformation?
Language design? (systems programming? XML?
concurrency?)
Hence Caml.NET, now F#
Why F#?
F# = Caml.NET
Why are ML languages so great?
Type checking
Simple bag of constructs
Type inference lets you hack out correct code quickly and still
maintain it
What’s often wrong?
Traditionally hung up on performance
Slow, strange compilers, no debuggers, no profilers etc.
No libraries, no interop
Why is Caml so great?
Simple, easy-to-understand compiler
Top-to-bottom design choices that make everything “fit”
My hope is to repeat this, assuming a .NET world
underneath
Unfortunately it won’t be quite that easy…
F# Overview
The aim:
Not for windows or ASP.NET programming
But for primarily ML programming, with .NET in mind
All ML code immediately accessible from .NET
Semantics and operational behaviour of ML code/objects must be easily
understood by C# programmers
Language:
Simple, extensible core language
Leave room for extension
Interop:
Not .NET Consumer
•can’t easily import everything, may need to write a little C#
Not .NET Producer
Tools:
Fast separate compilation
Simple compiler
let iter (f: 'a -> unit) (arr: 'a array) =
let len = length arr in
for i = 0 to len - 1 do
f arr.(i)
done
let rec map f x =
match x with
| [] -> []
| (h::t) -> f h :: map f t
type ('a,'b) tree =
  | Empty
  | Node of ('a,'b) node
and ('a,'b) node =
  { key: 'a;
    val: 'b;
    left: ('a, 'b) tree;
    right: ('a, 'b) tree;
    height: int }

type ('a,'b) mtree =
  | Empty
  | Node of ('a,'b) mnode
and ('a,'b) mnode =
  { key: 'a;
    mutable val: 'b;
    left: ('a, 'b) tree;
    right: ('a, 'b) tree;
    height: int }
F# = Caml.NET
Polymorphic hashing, equality, comparison
val (=) : 'a -> 'a -> bool
val (==) : 'a -> 'a -> bool
val compare : 'a -> 'a -> int
val (<) : 'a -> 'a -> bool
val hash : 'a -> int
i.e. hash and equality “automatically available” for non-cyclic Caml types
Polymorphic binary I/O
val output_val : out_channel -> 'a -> unit
val input_val : in_channel -> 'a
i.e. blast the object graph to disk
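The polymorphic operations listed on this slide can be tried out in standard OCaml. The sketch below is only an illustration: the `shape` type and all value names are made up, and it uses OCaml's `Hashtbl.hash` and `Marshal` module rather than the `hash`, `output_val` and `input_val` names shown on the slide.

```ocaml
(* A hypothetical user type; no equality, ordering or hashing code is written for it. *)
type shape = Circle of float | Rect of float * float

let a = [Circle 1.0; Rect (2.0, 3.0)]
let b = [Circle 1.0; Rect (2.0, 3.0)]

(* Structural equality compares contents. *)
let same_contents = (a = b)

(* Physical equality compares identity; two separately allocated refs differ. *)
let r1 = ref 0
let r2 = ref 0
let refs_equal = (r1 = r2)
let refs_identical = (r1 == r2)

(* Polymorphic comparison orders any non-functional, non-cyclic values. *)
let ordering = compare (1, "a") (1, "b")

(* Structurally equal values hash to the same number. *)
let same_hash = (Hashtbl.hash a = Hashtbl.hash b)

(* Polymorphic binary I/O: serialize the object graph and read it back.
   The type annotation is required because unmarshalling is not type-checked. *)
let round_trip : shape list = Marshal.from_string (Marshal.to_string a []) 0
```

Physical equality is only demonstrated on mutable values here, because for immutable values its result depends on how the compiler shares constants.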
Multi-pattern matching
let find x =
  match x with
  | A | B | C _ -> x
  | D(_,ty1,_)
  | E(ty1) -> combine ty1 x
Signature files
pervasives.mli:
val abs : int -> int
val string_of_float : float -> string
...
type out_channel
val open_out_bin: string -> out_channel
val output_value: 'a -> out_channel -> unit
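A self-contained sketch of multi-pattern ("or-pattern") matching in standard OCaml follows. The constructors and the `describe` function are hypothetical, chosen only to mirror the shape of the slide's example: several alternatives share one right-hand side, and alternatives that bind a variable must all bind it at the same type.

```ocaml
(* Hypothetical types, loosely modelled on the constructors A..E on the slide. *)
type ty = Int | Arrow of ty * ty

type node =
  | A
  | B
  | C of int
  | D of string * ty * int
  | E of ty

let describe (x : node) : string =
  match x with
  (* Three alternatives, one result; no variables are bound. *)
  | A | B | C _ -> "leaf"
  (* Both alternatives bind t at type ty, so they can share a body. *)
  | D (_, t, _) | E t ->
      (match t with
       | Int -> "int"
       | Arrow _ -> "arrow")
```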
F# ≠ OCaml.NET
No objects (better to fit with .NET)
No labels/defaults (there must be better
solutions…)
No functors (much complexity, little added
value)
Also
No ocamllex/ocamlyacc
No camlp4 (macro processor)
No ocamldep (dependency analyzer)
Using the compiler
bin\fsc.exe foo.ml
-c              Compile only (produce .cno/.cni)
-a              Build a DLL
-g              Debug. Can run against same library (unless you want to debug the library)
-O              Enable cross-module optimization
--unverifiable  Faster closures, no stupid casts, different library needed
High fidelity, Binary compatibility & Versioning
Most ML compilers cannot create DLLs at all
Or can only create DLLs whose interface is C or COM
High Fidelity:
Can I access an ML DLL from an ML EXE in a completely transparent
way?
Binary Compatibility: Can I change the internals of a DLL and use it in
place of existing DLLs?
OCaml: No.
F#: Yes. MS-IL compiled interfaces are stable
Caveat: cross-module inlining must not be used by client DLLs (i.e. do not
ship .cnx files to clients)
Versioning: Can you add functionality to a DLL and use it in place of
existing DLLs?
OCaml: No.
F#: Some, e.g. can add (visible) bindings, can add (visible) types.
Even with cross-module inlining.
Part 2: Interop, Language Design
Language Design Choices
Immutable Unicode strings and wchars
Signatures are compilation unit boundaries NOT module-value constraints
Can hide “generated” ML types
Can’t constrain polymorphism
Can and should reveal arities
•Arities specified by parentheses
•Not pretty, but efficient
•Can be helpful documentation
•Gives mutually recursive modules cheaply
Value x2 is more polymorphic in the module than the signature
.mli                            .ml
type mystring                   type mystring = MyString of string list
type myrecd                     type myrecd = { a: int; b: string }
type data                       type data = OtherData.data
type csdata                     type csdata = (# "CSharpProgram.data")
type csdata2                    type csdata2 = CSData of (# "CSharpProgram.data")
val x : int list                let x = ([] : int list)
val x2 : int list               let x2 = ([] : 'a list)
val f1 : int -> int -> int      let f1 x y = x + y
val f2 : int -> (int -> int)    let f2 x y = x + y
val f3 : int -> int -> int      let f3 x = print "hello"; (fun y -> x + y)
val f4 : int -> (int -> int)    let f4 x = print "hello"; (fun y -> x + y)
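Why arities matter can be seen in plain OCaml, where f1 and f3 have the same type but behave differently under partial application. This is a minimal sketch, assuming a `Buffer` named `trace` in place of the slide's `print` so the effect can be observed; all names other than f1 and f3 are made up.

```ocaml
(* Records side effects so they can be inspected afterwards. *)
let trace = Buffer.create 16

(* Takes both arguments at once: nothing happens until both are supplied. *)
let f1 x y = x + y

(* Takes one argument, performs an effect, then returns a function. *)
let f3 x = Buffer.add_string trace "hello;"; (fun y -> x + y)

(* Partial application: f1 does nothing yet, f3 runs its effect now. *)
let g1 = f1 1
let g3 = f3 1
let after_partial = Buffer.contents trace

(* Calling the returned closures does not repeat the effect of f3. *)
let sum = g1 10 + g3 10 + g3 20
let after_calls = Buffer.contents trace
```

Both functions have type int -> int -> int in OCaml, so a signature that distinguishes `int -> int -> int` from `int -> (int -> int)` carries information the OCaml type alone does not.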
Interop
F# from C#
C# from F# -- Not yet done
MS-IL from F# -- Gives baseline interop
Also to consider:
F# from F# (done: full fidelity)
Any ILX language from F# (not done: good fidelity
possible)
Interop (The SML.NET approach)
SML Types        C#/.NET Types
int              int
string option    string
Foo option       class Foo
Interop (The F# Approach)
[Diagram relating F# types to C#/.NET types]
C#/.NET Types: class Foo, int, byte, char, single, double (= float)
F# Types: Foo, ’a list (= class FSharp.list), ’a tree (= class FSharp.tree)
F# from C#
Calling code and opaque types
•Every ML type is a C# type
•Every ML top-binding is accessible
•No signature file needed to access
F# module Pervasives (pervasives.mli)
val abs : int -> int
val string_of_float : float -> string
...
type out_channel
val open_out_bin: string -> out_channel
val output_value: 'a -> out_channel -> unit
test.ml:
let n = abs(-3);;
let s = string_of_float(3.1415);
let out = open_out_bin("out");
output_value(3.1415);
C# module Test (test.cs)
using Pervasives;
class Test {
  static void Main() {
    int n = Pervasives.abs(-3);
    string s = Pervasives.string_of_float(3.1415);
    out_channel out = Pervasives.open_out_bin("out");
    Pervasives.output_value(out,3.1415);
  }
}
“Curried” function values take tuples
F# from C#
Accessing data (records and unions)
F# records compile to classes and can be accessed immediately
F# datatypes are C# types
•They conform to ILX standard
•Accessed via helper functions (IsCLASS, GetCLASS) etc.
•Independent of ILX representation
module Il (il.mli)
type assembly =
  { assemMainModule: modul;
    assemAuxModules: modules }
type types
type type_def =
  | CLASS of class_def
  | INTERFACE of interface_def
  | VALUETYPE of valuetype_def
  | ...
test.ml:
Printf.printf "Num types = %d\n"
  (List.length
    (dest_tdefs
      assem.assemMainModule.modulTypeDefs));
Here is an example
static bool isClass(Il.type_def t) { return t.IsCLASS(); }
Here is an example of a polymorphic type
class Test {
  static void Main() {
    Il.types types = assem.assemMainModule.modulTypeDefs;
    FSharp.list tys = Il.dest_tdefs(types);
    Console.WriteLine
      ("Num types = {0}",List.length(tys));
  }
}
Nb. ML types not very OO. May work on th
F# from C#
Passing Function Values
F# function types become System.ILX.Func1
ILX chooses the representation
•Not yet invisible to C# code
module List (list.mli)
val filter: ('a -> bool) -> 'a list -> 'a list
No generics, function values pass “object”
But there is an implicit conversion from a delegate
type to the type used by ILX for function values
System.Func is a delegate type
static object isClass(object t) { return (object) ((Il.type_def) t).IsCLASS(); }
Console.WriteLine
  ("Num classes = {0}",
   (List.length
     (List.filter
       (new System.Func(isClass),
        (Il.dest_tdefs(assem.assemMainModule.modulTypeDefs))))));
F# from C#
Passing Function Values
module List (list.mli)
val filter: ('a -> bool) -> 'a list -> 'a list
With generics
Parameters can be inferred
static bool isClass(Il.type_def t) { return t.IsCLASS(); }
Console.WriteLine
  ("Num classes = {0}",
   (List.length
     (List.filter
       (new System.Func(isClass),
        (types)))));
C# from F#
Calling static members is easy, just quote the .NET type name and member using the
“.” notation:
let findDLLs dir =
(* call a static member in the System.IO.Directory class *)
if (Directory.Exists dir) then
let files = Directory.GetFiles(dir, "*.dll") in
Arr.to_list files
else []
Instance members are accessed using an extended “.” notation. Sometimes type
annotations are needed to resolve the type used for the “.” notation. These type
annotations are propagated left-to-right, outside-in.
let searchFile (pat:string) file =
  match (try Some (Assembly.LoadFrom(file)) with _ -> None) with
  | Some a ->
    let modules = a.GetModules() in
    let pat = pat.ToUpper() in
    ...
Without the type annotation on pat you get a “please supply a type annotation” error at pat.ToUpper()
Sometimes casts are needed to resolve overloading. Currently use “(cast expr : type)”
C# from F#
Can create objects (“new Type(…)” or just
“Type(…)”).
Can create delegates (“new EventHandler(…)”).
When creating delegates provide an ML function of
the right curried type
Can create value types and use .NET properties.
Cannot mutate value types.
MS-IL from F#
Embedded MS-IL
Cheap-shot way of implementing primitives
Parsed and included as part of the IL stream
Can be inlined, optimized, even instantiated
Also type & exception representations
let (+) (x:int) (y:int) = (# "add" x y : int)
let sin (x:float) =
(# "call default float64 [mscorlib]System.Math::Sin(float64)" x : float)
type obj = (# "class [mscorlib]System.Object" )
exception Not_found = (# "class [mscorlib]System.NotFoundException" )
Part 3: Compiler, Perf etc.
Optimizations & Perf
No optimizations
Top level function bindings still become methods
No inlining at all
Data layout unchanged
Local optimizations
Eliminate unused bindings
Inline a little
Remove tuples when immediately destroyed
Cross-module optimizations
Same as local except propagated across modules
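The combined effect of “inline a little” and “remove tuples when immediately destroyed” can be pictured with a hand-written before/after pair in OCaml. This is an illustration of the idea only, not output of the F# optimizer; `divmod`, `before` and `after` are made-up names.

```ocaml
(* A small helper that returns its two results as a tuple. *)
let divmod a b = (a / b, a mod b)

(* As written by the programmer: the tuple is built by divmod and
   taken apart again straight away by the let-pattern. *)
let before a b =
  let (q, r) = divmod a b in
  q * b + r

(* What the code could look like once divmod is inlined and the
   tuple, which never escapes, is replaced by two plain bindings. *)
let after a b =
  let q = a / b in
  let r = a mod b in
  q * b + r
```

The two functions compute the same result; the second simply avoids allocating the intermediate pair.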
F# Compiler Architecture
Parser
Typechecker
Optimizer
ILXGEN
ILX -> Generic IL
Generic IL -> IL
CLR V1.1 + ilxlib.dll + fslib.dll
CLR V1.0 + ilxlib.dll + fslib.dll
Some interesting bits
Polymorphic comparison: IComparable, IStructured
Polymorphic equality: Object::Equals
Polymorphic hashing: IStructured::GetHashCode, Object::GetHashCode
Must generate new virtuals for each new type
Problem: Good SExpr hash algorithms don’t traverse whole term...
Problem: value types get boxed
ILX Choices:
Datatype representations (many possibilities)
Closure representations (three choices)
Some interesting bits
Embedded MS-IL + inlining is very useful
Almost no primitives in compiler
Nb. inlining generic IL takes some care...
module Pervasives (pervasives.ml)
let (+) (x:int) (y:int) = (# "add" x y : int)
let (-) (x:int) (y:int) = (# "sub" x y : int)
let ( * ) (x:int) (y:int) = (# "mul" x y : int)
let (/) (x:int) (y:int) = (# "div" x y : int)
module Array (array.ml)
let length (arr: 'a array) = (# "ldlen" arr : int)
let get (arr: 'a array) (n:int) =
(# "ldelem.any !0" type ('a) arr n : 'a)
let set (arr: 'a array) (n:int) (x:'a) =
(# "stelem.any !0" type ('a) arr n x)
let zero_create (n:int) = (# "newarr !0" type ('a) n : 'a array)
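Standard OCaml has a loosely comparable mechanism: `external` declarations whose names start with `%` refer to compiler primitives, which is how OCaml's own standard library defines integer arithmetic and array length. The sketch below uses that OCaml mechanism, not the F# `(# ... )` embedded-IL syntax shown on the slide; the function names are arbitrary.

```ocaml
(* Each declaration binds a name directly to a compiler primitive;
   the compiler expands calls in place instead of emitting a function call. *)
external add : int -> int -> int = "%addint"
external sub : int -> int -> int = "%subint"
external mul : int -> int -> int = "%mulint"

(* A polymorphic primitive, as used for Array.length in the standard library. *)
external length : 'a array -> int = "%array_length"

let example = mul (sub 7 2) (add 1 2)
let n = length [| "a"; "b"; "c" |]
```

As with embedded MS-IL, the library author, not the compiler's front end, decides which names map to which low-level operations.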
Some interesting bits
Type based conditional pragmas
Type based conditional pragmas give a cheap way of generating
good code
Optimizer chooses right one when possible
For library use only
let (=) (x : 'a) (y : 'a) = (inbuilt_poly_equality x y)
when 'a = int = (# "ceq" x y : bool )
when 'a = sbyte = (# "ceq" x y : bool )
when 'a = int16 = (# "ceq" x y : bool )
when 'a = int32 = (# "ceq" x y : bool )
when 'a = int64 = (# "ceq" x y : bool )
when 'a = byte = (# "ceq" x y : bool )
when 'a = uint16 = (# "ceq" x y : bool )
when 'a = uint32 = (# "ceq" x y : bool )
when 'a = uint64 = (# "ceq" x y : bool )
when 'a = float = (# "ceq" x y : bool )
when 'a = char = (# "ceq" x y : bool )
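The dispatch idea (use a cheap primitive when the type is known, fall back to the generic routine otherwise) can be imitated in standard OCaml by passing the type explicitly. This is only a sketch under that assumption: in F# the optimizer picks the case from the inferred type, whereas here the caller supplies a hypothetical `witness` value. It needs OCaml 4.08 or later for `Int.equal`.

```ocaml
(* A value describing what is known about the type parameter. *)
type _ witness =
  | Int : int witness
  | Char : char witness
  | Any : 'a witness   (* nothing known: use the generic path *)

let equal : type a. a witness -> a -> a -> bool =
 fun w x y ->
  match w with
  | Int -> Int.equal x y     (* specialised integer comparison *)
  | Char -> Char.equal x y   (* specialised character comparison *)
  | Any -> x = y             (* generic structural equality *)
```

For example, `equal Int 3 3` takes the integer path, while `equal Any [1; 2] [1; 2]` falls back to structural equality.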
Some interesting bits
Mutable locals for library procedures
Why waste weeks implementing optimizations when you can get
80% of the effect in 1 hour?
Obvious restrictions
let rev l =
let mutable res = [] in
let mutable curr = l in
while nonnull curr do
let h::t = curr in
res <- h :: res;
curr <- t;
done;
res
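For comparison, the same loop can be written in standard OCaml, which has no `let mutable` and uses heap-allocated `ref` cells instead. This is a sketch of the equivalent program, not of how F# compiles mutable locals.

```ocaml
let rev (l : 'a list) : 'a list =
  (* ref cells stand in for the two mutable locals. *)
  let res = ref [] in
  let curr = ref l in
  while !curr <> [] do
    (match !curr with
     | h :: t ->
         res := h :: !res;
         curr := t
     | [] -> ())   (* unreachable: the loop condition excludes [] *)
  done;
  !res
```

The explicit match replaces the slide's partial pattern `let h::t = curr`, which OCaml would accept only with a non-exhaustiveness warning.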
Perf: A large symbolic processing app (large input)
Managed F# compiler fscmanaged.exe
[Bar chart, larger is better. Y axis: 1000/Time (ops/s), 0 to 14. X axis: Flavour.]
Flavours:
Object, Debug, Verifiable, Nonopt
Object, Debug, Verifiable, Localopt
Object, Debug, Verifiable, Crossopt
Generic, Debug, Verifiable, Crossopt
Generic, Verifiable, Crossopt
Generic, Unverifiable+FastClo, Crossopt
Bar labels as extracted (pairing with flavours not recoverable): 11.9, 12.8, 7.4, 7.1, 4, 2.8
Perf: A large symbolic processing app (small input, no install-time compilation)
Managed F# compiler fscmanaged.exe
[Bar chart, larger is better. Y axis: 1000/Time (ops/s), 0 to 200. X axis: Flavour.]
Flavours:
Object, Debug, Verifiable, Nonopt
Object, Debug, Verifiable, Localopt
Object, Debug, Verifiable, Crossopt
Generic, Debug, Verifiable, Crossopt
Generic, Verifiable, Crossopt
Generic, Unverifiable+FastClo, Crossopt
Bar labels as extracted (pairing with flavours not recoverable): 190, 187, 143, 137, 125, 125
Perf: Tailcall or not
[Bar chart. Y axis: Time (s), 0 to 100. X axis: Object, Generic. Series: with, without.]
Summary
Essentially reached my aims
Usable compiler for writing accessible ML libraries
Performance not brilliant but good enough
Accessing .NET from ML now done, by extending the “.”
syntax and using the simple “add type annotations until
overloading is resolved” rule
Results
.NET compilers can be simple and (I think) useful
Proof of .NET generics interop
Performance testing for generics
First ML compiler with high fidelity across DLLs, and good
versioning/binary compat. properties?