The Promise of Statically Typed Functional Programming: Broader Context and Language Comparison
Konstantin Läufer (co-presenter)
George K. Thiruvathukal (co-presenter)
Joe Kaylor
Emerging Technologies Lab
Loyola University Chicago
Tuesday, 24 April 2012
Applied Mathematics Colloquium
Loyola University Chicago
Friday, 20 April 2012
Workshop on Functional Programming in Quantitative Finance
The Stevanovich Center for Financial Mathematics
The University of Chicago
Overview
Languages and paradigms
Computational models
Type systems
Specific functional languages
Conclusion
Early History of Programming Languages
1940s: machine languages -> binary code
1950s: assembly languages -> mnemonics and symbols
late 1950s: 3GLs -> abstraction and portability
Some Paradigm-Defining Programming Languages
1957: FORTRAN -> imperative/scientific programming (Backus)
1958: ALGOL -> imperative style with support for recursive functions (Backus et al)
1964: APL -> array programming (Iverson)
1967: Simula -> Algol-style object-oriented programming (Dahl and Nygaard)
1972: C -> imperative style/system programming (Kernighan and Ritchie)
1972: Prolog -> logic programming (Colmerauer)
1972: Smalltalk -> dynamic OO programming (Kay et al)
1973: Actors -> concurrency/task parallelism (Hewitt, Agha, Hoare et al)
1978: ML -> statically typed functional programming (Milner et al)
1983: C++ -> C-style OO programming (Stroustrup)
How to choose from this Tower of Babel?
Dominant Programming Paradigms
imperative: explicit mutable state
functional: (closed or recursive) functions over (usually) immutable values
object-oriented: (public) behavior with (private) state
actors: run to completion, concurrent behavior with local/transient state
logic: queries over facts and rules
So why are things so imperative?
Computer architecture (even now with multicore) tends to emphasize stored-program computing with global shared memory.
The first theoretical models of computing, largely attributable to Turing and von Neumann, form the basis for this view.
Each deserves a look.
Turing Machine, 1936
Alan Turing’s “abstract machine” that predates modern programmable computers.
- A tape which is divided into cells, one next to the other.
- A head that can read and write symbols on the tape and move left or right one (and only one) cell at a time.
- A table of instructions: (si, symbol at current position, sj, replacement symbol, L or R)
- A state register that stores the current state of the machine.
- All programming languages today are said to be Turing-equivalent, because they can all be reduced to the Turing machine as defined here.
- Compare these concepts to the ideas of von Neumann.
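The tape/head/instruction-table model above can be sketched directly in code. This is a minimal, illustrative simulator, not from the talk: the zipper tape representation, the state names, and the unary-increment program are all invented for the example.

```haskell
import qualified Data.Map as Map

-- A rule: given (state, symbol under the head), produce
-- (new state, symbol to write, head movement).
data Move = L | R deriving (Eq, Show)
type Rule = ((String, Char), (String, Char, Move))

-- Tape as a zipper: cells left of the head (reversed), current cell, cells right.
data Tape = Tape [Char] Char [Char]

step :: Map.Map (String, Char) (String, Char, Move) -> String -> Tape -> Maybe (String, Tape)
step rules st (Tape ls c rs) = do
  (st', c', mv) <- Map.lookup (st, c) rules
  pure $ case mv of
    L -> let (l:ls') = pad ls in (st', Tape ls' l (c' : rs))
    R -> let (r:rs') = pad rs in (st', Tape (c' : ls) r rs')
  where
    pad [] = "_"   -- the tape is unbounded; '_' is the blank symbol
    pad xs = xs

-- Run until no rule applies, then read back the non-blank tape contents.
run :: [Rule] -> String -> String -> String
run rules start input = go start (Tape [] c cs)
  where
    (c:cs) = input ++ "_"
    table  = Map.fromList rules
    go st tape = case step table st tape of
      Nothing           -> render tape
      Just (st', tape') -> go st' tape'
    render (Tape ls x rs) = filter (/= '_') (reverse ls ++ [x] ++ rs)

-- Hypothetical program: append one '1' to a unary number (increment).
incRules :: [Rule]
incRules =
  [ (("scan", '1'), ("scan", '1', R))  -- move right over the 1s
  , (("scan", '_'), ("done", '1', R))  -- write a 1 on the first blank, then halt
  ]

main :: IO ()
main = putStrLn (run incRules "scan" "111")  -- unary 3 becomes unary 4
```

Everything a modern language does can, in principle, be reduced to a table of rules like `incRules`; that is what Turing-equivalence means.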
von Neumann, EDVAC Report, 1945
Von Neumann was influential in many endeavors (mathematics, physics, game theory, and computing) and was involved in the Manhattan Project and the EDVAC computer (successor to the ENIAC of Eckert and Mauchly).
Three ideas that have shaped modern computer engineering:
- “First: since the device is a computor (sic), it will have to perform the elementary operations of arithmetics….”
- “Second: the logical control of the device is the proper sequencing of its operations (by…a control organ).”
- “Third: Any device which is to carry out long and complicated sequences of operations…must have a considerable memory.”
Strikingly, essentially the same ideas had been conceived independently by Babbage a century earlier in his design of the Analytical Engine.
How to Choose?!?
Choose by trying to match the paradigm to the problem.
Some solutions benefit from multiple paradigms and/or languages.
Avoid the hammer/nail syndrome of the late 1990s. More soon.
Zoom Out: Fundamental Computational Models
Turing machine/von Neumann architecture: model of the hardware; low-level, unnatural
lambda calculus: model of the computational behavior; simple, mathematical, natural
recursion: direct model of underlying mathematical theory
These three models are equivalent in computational power.
Church-Turing thesis: these three models express anything that is algorithmically computable.
Lambda Calculus
<Exp> ::= <ident>
| lambda <ident> . <Exp> -- function abstraction
| <Exp> <Exp> -- function application
| <constant> -- needed "only in practice"
(\f (x, y) -> (f x, f y)) (*2) (2, 3) -> (4, 6)
Sufficiently powerful to define everything within.
By encoding.
Yes, including recursion.
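As a taste of "everything by encoding": Church numerals represent the number n as the function that applies its argument n times, and arithmetic becomes pure function composition. This sketch is illustrative (the names zero, one, two, suc, add, toInt are not from the talk), written in Haskell rather than raw lambda calculus.

```haskell
-- A Church numeral applies a function n times: n f x = f (f (... (f x))).
zero, one, two :: (a -> a) -> a -> a
zero _ x = x
one  f x = f x
two  f x = f (f x)

-- Successor: apply f one more time than n does.
suc :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
suc n f x = f (n f x)

-- Addition: apply f m times on top of n applications.
add :: ((a -> a) -> a -> a) -> ((a -> a) -> a -> a) -> (a -> a) -> a -> a
add m n f x = m f (n f x)

-- Decode back to an ordinary Int by counting applications of (+ 1).
toInt :: ((Int -> Int) -> Int -> Int) -> Int
toInt n = n (+ 1) 0

main :: IO ()
main = print (toInt (add two (suc one)))  -- 2 + 2
```

Booleans, pairs, lists, and even recursion (via a fixed-point combinator) can be encoded in the same spirit.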
Make it Safe: Type Systems in Programming Languages
A type system is a mechanism to classify program fragments according to the type of value they compute. Can be used to ensure values are used as intended.
A type error is a type-related erroneous program behavior.
Type safety is the use of a type system to prevent type errors.
2 + 3 -> 5
2 + "3" -> error (possible in C)
3[4] -> error (possible in C with cast)
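In a type-safe language, the erroneous expressions above are rejected before the program ever runs. A minimal Haskell sketch; the rejected lines are shown as comments because they would not compile:

```haskell
ok :: Int
ok = 2 + 3           -- well-typed: Int + Int

-- bad1 = 2 + "3"          -- rejected at compile time: no Num instance for String
-- bad2 = (3 :: Int) !! 4  -- rejected at compile time: an Int is not a list

main :: IO ()
main = print ok
```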
Timeline of the Language/Type System Pendulum
Since mid 1980s: statically typed OO languages: C++, Java, C#
Since late 1990s: dynamic languages: Perl, Python, Ruby
wanted productivity (expressiveness, flexibility, extensibility, etc.)
build confidence in correctness through extensive testing (TDD)
performance? avoid bottlenecks by using the dynamic language as glue for C/C++ libraries
Back since late 2000s: statically typed functional languages: ML and its descendants F#, Haskell, and OCaml; Scala
want type safety, performance, and agility
want better support for multi-core hardware
ThoughtWorks Technology Radar: Languages Section
Main Statically Typed Contenders: Haskell, F#, Scala
Haskell: ML descendant, but lazy and without mutable state, no OO
Scala: Java-based, mutable and immutable state, eager, functional on top of OO
F#: OCaml descendant, mutable and immutable state, eager, OO on top of functional
All have powerful type systems; Haskell's is the most powerful.
All have support for concurrency.
Interesting Dynamically Typed Contenders
Clojure: focus on concurrency and high performance
Erlang: focus on scalability and soft real-time support
JavaScript: new front ends (e.g., CoffeeScript), becoming a popular compilation target
Running Example: Process Tree
Transform this
laufer 20736 25090 0 Apr16 pts/5 00:00:00 /bin/bash
...
laufer 21274 20736 0 Apr16 pts/5 00:00:01 emacs -nw test2bb.txt
laufer 21281 21274 0 Apr16 ? 00:00:00 /usr/bin/aspell -a -m -B --encoding=utf-8
...
laufer 25090 1 0 Apr15 ? 00:01:18 guake -OO /usr/lib/guake/guake.py
...
laufer 28849 25090 0 11:24 pts/9 00:00:00 /bin/bash
to this
25090: guake -OO /usr/lib/guake/guake.py
20736: /bin/bash
21274: emacs -nw test2bb.txt
21281: /usr/bin/aspell -a -m -B --encoding=utf-8
28849: /bin/bash
Edge reversal!
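The edge reversal, from each child row knowing its parent to each parent knowing its children, is just a grouping operation. A minimal Haskell sketch over hypothetical (pid, ppid) pairs taken from the listing above:

```haskell
import qualified Data.IntMap as IntMap

-- Each process row carries its own pid and its parent's ppid.
procs :: [(Int, Int)]  -- (pid, ppid)
procs = [ (25090, 1), (20736, 25090), (21274, 20736)
        , (21281, 21274), (28849, 25090) ]

-- Reverse the edges: map each ppid to the list of its children.
childMap :: IntMap.IntMap [Int]
childMap = IntMap.fromListWith (++) [ (ppid, [pid]) | (pid, ppid) <- procs ]

main :: IO ()
main = print (IntMap.lookup 25090 childMap)  -- children of the guake process
</imports>
```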
In Java
private final Map<Integer, String> pMap = new HashMap<Integer, String>();
private final Map<Integer, List<Integer>> tMap = new HashMap<Integer, List<Integer>>();
String line = null;
while ((line = in.readLine()) != null) {
    final Process p = parser.parseString(line);
    pMap.put(p.pid, p.cmd);
    if (! tMap.containsKey(p.ppid))
        tMap.put(p.ppid, new ArrayList<Integer>(CHILD_LIST_SIZE));
    tMap.get(p.ppid).add(p.pid);
}
// now print tMap as a tree
Languages with name-based typing require a name for every type! Verbose, not agile.
In a Typical Dynamic Language
pMap = new HashMap
tMap = new HashMap
line = null
while ((line = in.readLine()) != null) {
    p = processParser.parseString(line)
    pMap.put(p.pid, p.cmd)
    if (! tMap.containsKey(p.ppid))
        tMap.put(p.ppid, new ArrayList(CHILD_LIST_SIZE))
    tMap.get(p.ppid).add(p.pid)
}
// now print tMap as a tree
Fictitious language. (I kind of leapfrogged over dynamic languages.)
Type erasure of Java example.
In F# Using .NET Collections
let cMap = new Dictionary<int, String>()
let tMap = new Dictionary<int, ICollection<int>>()
let bOut = new StreamWriter(new BufferedStream(Console.OpenStandardOutput()))
let mutable line = Console.ReadLine()
while line <> null do
    let (words, pid, ppid) = // ...parse from line...
    cMap.[pid] <- line.Substring(iCmd)
    if not (tMap.ContainsKey(ppid)) then
        tMap.[ppid] <- new List<int>(8)
    tMap.[ppid].Add(pid)
    line <- Console.ReadLine()
// now print tMap as a tree
Looks like a C# program.
Performs like a C# program!
In F# Using F#'s Built-In Collections
let rec lines = seq {
    let line = Console.ReadLine()
    if line <> null then yield line ; yield! lines
}
let parse (line: string) = // ...
let ps = lines |> Seq.map parse |> List.ofSeq
let cMap = ps |> List.map (fun (pid, _, cmd) -> (pid, cmd)) |> Map.ofList
let tMap = ps |> Seq.ofList |> Seq.groupBy (fun(_, ppid, _) -> ppid) |> Map.ofSeq
// now print tMap as a tree
Compose computations at the collection level.
The only mutable state is I/O.
About 50% performance overhead over previous version. In part because these maps are tree-based.
In Scala
val lines = scala.io.Source.fromInputStream(System.in).getLines
val out = new BufferedWriter(new OutputStreamWriter(System.out), IO_BUF_SIZE);
val parse = parseLine(lines.next())
val ps = lines map parse toList
val pmap = ps map { case (pid, _, cmd) => (pid, cmd) } toMap
val tmap = ps groupBy (_._2)
// now print tMap as a tree
Compose computations at the collection level.
The only mutable state is I/O.
In Haskell
printProcessTree :: [String] -> [String]
printProcessTree (header : lines) =
  let
    ps = map (parser header) lines
    pmap = IntMap.fromList ps
    tmap = IntMap.fromListWith (++) $ map (\(_, (pid, ppid, _)) -> (ppid, [pid])) ps
    showTrees l i = -- ...convert tmap to strings...
  in
    showTrees 0 0
main :: IO ()
main = interact $ unlines . printProcessTree . lines
Compose computations at the collection level.
The only mutable state is I/O.
But even this is hidden in interact
using dependency injection.
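The point about interact is that the whole transformation is a pure function from input lines to output lines, testable with no I/O at all; interact merely "injects" stdin and stdout around it. A minimal sketch with a hypothetical transformation (numbering lines):

```haskell
-- The pure core: number each input line. No I/O anywhere in here.
numberLines :: [String] -> [String]
numberLines = zipWith (\n s -> show n ++ ": " ++ s) [1 :: Int ..]

-- interact supplies stdin and stdout around the pure core.
main :: IO ()
main = interact $ unlines . numberLines . lines
```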
Monadic Style in Haskell
Different example: echo back input lines and print how many were there.
mainLoop :: Int -> IO ()
mainLoop count = count `seq` do
  eof <- isEOF
  if eof
    then
      putStrLn $ show count
    else do
      line <- getLine
      putStrLn line
      mainLoop $ count + 1
main :: IO ()
main = mainLoop 0
Looks like imperative code.
Mutable state of I/O is safely partitioned into I/O monad.
Higher-Kinded Typing in Haskell (and Scala)
Haskell has no subtype polymorphism, duck typing, or the like.
But...
fmap (*2) (Just 3) -> Just 6
let t = Node 1 [Node 2 [], Node 3 [Node 4 [], Node 5 []]]
fmap (*2) t -> Node 2 [Node 4 [], Node 6 [Node 8 [], Node 10 []]]
...it has typeclasses for systematic function overloading.
class Functor f where
  fmap :: (a -> b) -> f a -> f b
Example of an abstraction over type constructors (generic types): f is Maybe a, Tree a, etc.
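A minimal sketch of such an abstraction: a standalone rose-tree type made an instance of Functor. (The Node examples above use Data.Tree from the standard library; this self-contained version is illustrative.)

```haskell
-- A rose tree: a value plus a list of subtrees.
data Tree a = Node a [Tree a] deriving (Eq, Show)

-- fmap works over the *shape* Tree, whatever the element type is:
-- here f abstracts over the type constructor Tree, not a concrete type.
instance Functor Tree where
  fmap f (Node x ts) = Node (f x) (map (fmap f) ts)

main :: IO ()
main = print (fmap (*2) (Node 1 [Node 2 [], Node 3 [Node 4 [], Node 5 []]] :: Tree Int))
```

The same `fmap` call works unchanged on Maybe, lists, trees, and any other Functor instance; that is overloading over type constructors.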
Scala: scalaz
F#: upstream in F* (experimental research language)
So How to Choose?
Basic Intrinsic Criteria
Foundational soundness
Adherence to design principles (Maclennan 1986)
Readability
Writability/Productivity
Reliability
Cost
Portability
Generality
REPL for exploratory programming
Support for internal and/or external domain-specific languages
All three languages score very similarly on all of these!
Matt's Extrinsic Criteria from Consulting Firm
Current expertise in workforce
Level of local area expertise (Can we hire around here?)
Cost of productive developer ("We don't want the best if we can get good enough for half the price")
Cost of training current workforce
Cost of migrating infrastructure
Cost of developer tools
Performance and scalability
Ability to offshore
Community strength (blogs, meetups, conventions)
Visibility of language (CIOs like saying they are part of up-and-coming languages)
Support (Documentation, Consultants, paid ticket support)
Other Interesting Criteria
Performance
Support for concurrent and parallel programming
Ecosystem
Major projects
Community including books, industry backing, etc.
Other tools
"Use what your friends use"
Popularity of Languages
- based on site statistics; Haskell 16, Scala 17, F# 41
TIOBE index at www.tiobe.com
- based on queries of eight search engines; Haskell 37, F# 39, Scala 45
Assessment of Main Contenders: Haskell
very powerful type system
extensive standard library
parser combinators for external DSLs
cohesive, active community
development tools not very mature
Assessment of Main Contenders: Scala
parser combinators for external DSLs
actors
typesafe stack: akka middleware, play web app framework, spray web services toolkit
high project activity
Java interoperability
good development tools
active community
Assessment of Main Contenders: F#
CLR performance advantages over JVM for scientific/financial computing
numerical computing, units of measure
fslex/fsyacc lexer/parser generators for external DSLs
asynchronous computing (actors), data and task parallelism
supported product status
.NET interoperability
excellent development tools
cross-platform support getting there
growing community