The Promise of Statically Typed Functional Programming: Broader Context and Language Comparison
Konstantin Läufer (co-presenter)
George K. Thiruvathukal (co-presenter)
Joe Kaylor
Emerging Technologies Lab
Loyola University Chicago
Tuesday, 24 April 2012
Applied Mathematics Colloquium
Loyola University Chicago
Friday, 20 April 2012
Workshop on Functional Programming in Quantitative Finance
The Stevanovich Center for Financial Mathematics
The University of Chicago
Overview
Languages and paradigms
Computational models
Type systems
Specific functional languages
Conclusion
Early History of Programming Languages
1940s: machine languages -> binary code
1950s: assembly languages -> mnemonics and symbols
late 1950s: 3GLs -> abstraction and portability
Some Paradigm-Defining Programming Languages
1957: FORTRAN -> imperative/scientific programming (Backus)
1958: ALGOL -> imperative style with support for recursive functions (Backus et al)
1964: APL -> array programming (Iverson)
1967: Simula -> Algol-style object-oriented programming (Dahl and Nygaard)
1972: C -> imperative style/system programming (Kernighan and Ritchie)
1972: Prolog -> logic programming (Colmerauer)
1972: Smalltalk -> dynamic OO programming (Kay et al)
1973: Actors -> concurrency/task parallelism (Hewitt, Agha, Hoare et al)
1978: ML -> statically typed functional programming (Milner et al)
1983: C++ -> C-style OO programming (Stroustrup)
How to choose from this Tower of Babel?
Dominant Programming Paradigms
imperative: explicit mutable state
functional: (closed or recursive) functions over (usually) immutable values
object-oriented: (public) behavior with (private) state
actors: run to completion, concurrent behavior with local/transient state
logic: queries over facts and rules
So why are things so imperative?
Computer architecture (even now with multicore) tends to emphasize stored-program computing with global shared memory.
The first theoretical models of computing, largely attributable to Turing and von Neumann, form the basis for this view.
Each deserves a look.
Turing Machine, 1936
Alan Turing’s “abstract machine” that predates modern programmable computers.
- A tape which is divided into cells, one next to the other.
- A head that can read and write symbols on the tape and move left or right one (and only one) cell at a time.
- A table of instructions: (si, symbol at current position, sj, replacement symbol, L or R)
- A state register that stores the current state of the machine.
- All programming languages today are said to be Turing-equivalent, because they can all be reduced to the Turing machine as defined here.
- Compare these concepts to the ideas of von Neumann.
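The tape/head/instruction-table model above can be sketched directly in code. This is a minimal, illustrative simulator, not from the talk: the zipper tape representation, the state names, and the unary-increment program are all invented for the example.

```haskell
import qualified Data.Map as Map

-- A rule: given (state, symbol under the head), produce
-- (new state, symbol to write, head movement).
data Move = L | R deriving (Eq, Show)
type Rule = ((String, Char), (String, Char, Move))

-- Tape as a zipper: cells left of the head (reversed), current cell, cells right.
data Tape = Tape [Char] Char [Char]

step :: Map.Map (String, Char) (String, Char, Move) -> String -> Tape -> Maybe (String, Tape)
step rules st (Tape ls c rs) = do
  (st', c', mv) <- Map.lookup (st, c) rules
  pure $ case mv of
    L -> let (l:ls') = pad ls in (st', Tape ls' l (c' : rs))
    R -> let (r:rs') = pad rs in (st', Tape (c' : ls) r rs')
  where
    pad [] = "_"   -- the tape is unbounded; '_' is the blank symbol
    pad xs = xs

-- Run until no rule applies, then read back the non-blank tape contents.
run :: [Rule] -> String -> String -> String
run rules start input = go start (Tape [] c cs)
  where
    (c:cs) = input ++ "_"
    table  = Map.fromList rules
    go st tape = case step table st tape of
      Nothing           -> render tape
      Just (st', tape') -> go st' tape'
    render (Tape ls x rs) = filter (/= '_') (reverse ls ++ [x] ++ rs)

-- Hypothetical program: append one '1' to a unary number (increment).
incRules :: [Rule]
incRules =
  [ (("scan", '1'), ("scan", '1', R))  -- move right over the 1s
  , (("scan", '_'), ("done", '1', R))  -- write a 1 on the first blank, then halt
  ]

main :: IO ()
main = putStrLn (run incRules "scan" "111")  -- unary 3 becomes unary 4
```

Everything a modern language does can, in principle, be reduced to a table of rules like `incRules`; that is what Turing-equivalence means.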
von Neumann, EDVAC Report, 1945
Von Neumann was influential in many endeavors (mathematics, physics, game theory, and computing) and was involved in the Manhattan Project and the EDVAC computer (successor to the ENIAC of Eckert and Mauchly).
Three ideas that have shaped modern computer engineering:
- “First: since the device is a computor (sic), it will have to perform the elementary operations of arithmetics….”
- “Second: the logical control of the device is the proper sequencing of its operations (by…a control organ).”
- “Third: Any device which is to carry out long and complicated sequences of operations…must have a considerable memory.”
Strikingly, essentially the same ideas had been conceived independently by Babbage a century earlier in his design of the Analytical Engine.
How to Choose?!?
Choose by trying to match the paradigm to the problem.
Some solutions benefit from multiple paradigms and/or languages.
Avoid the hammer/nail syndrome of the late 1990s. More soon.
Zoom Out: Fundamental Computational Models
Turing machine/von Neumann architecture: model of the hardware; low-level, unnatural
lambda calculus: model of the computational behavior; simple, mathematical, natural
recursion: direct model of underlying mathematical theory
These three models are equivalent in computational power.
Church-Turing thesis: these three models express anything that is algorithmically computable.
Lambda Calculus
<Exp> ::= <ident>
| lambda <ident> . <Exp> -- function abstraction
| <Exp> <Exp> -- function application
| <constant> -- needed "only in practice"
(\f (x, y) -> (f x, f y)) (*2) (2, 3) -> (4, 6)
Sufficiently powerful to define everything within.
By encoding.
Yes, including recursion.
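As a taste of "everything by encoding": Church numerals represent the number n as the function that applies its argument n times, and arithmetic becomes pure function composition. This sketch is illustrative (the names zero, one, two, suc, add, toInt are not from the talk), written in Haskell rather than raw lambda calculus.

```haskell
-- A Church numeral applies a function n times: n f x = f (f (... (f x))).
zero, one, two :: (a -> a) -> a -> a
zero _ x = x
one  f x = f x
two  f x = f (f x)

-- Successor: apply f one more time than n does.
suc :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
suc n f x = f (n f x)

-- Addition: apply f m times on top of n applications.
add :: ((a -> a) -> a -> a) -> ((a -> a) -> a -> a) -> (a -> a) -> a -> a
add m n f x = m f (n f x)

-- Decode back to an ordinary Int by counting applications of (+ 1).
toInt :: ((Int -> Int) -> Int -> Int) -> Int
toInt n = n (+ 1) 0

main :: IO ()
main = print (toInt (add two (suc one)))  -- 2 + 2
```

Booleans, pairs, lists, and even recursion (via a fixed-point combinator) can be encoded in the same spirit.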
Make it Safe: Type Systems in Programming Languages
A type system is a mechanism to classify program fragments according to the type of value they compute. Can be used to ensure values are used as intended.
A type error is a type-related erroneous program behavior.
Type safety is the use of a type system to prevent type errors.
2 + 3 -> 5
2 + "3" -> error (possible in C)
3[4] -> error (possible in C with cast)
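In a type-safe language, the erroneous expressions above are rejected before the program ever runs. A minimal Haskell sketch; the rejected lines are shown as comments because they would not compile:

```haskell
ok :: Int
ok = 2 + 3           -- well-typed: Int + Int

-- bad1 = 2 + "3"          -- rejected at compile time: no Num instance for String
-- bad2 = (3 :: Int) !! 4  -- rejected at compile time: an Int is not a list

main :: IO ()
main = print ok
```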
Timeline of the Language/Type System Pendulum
Since mid 1980s: statically typed OO languages: C++, Java, C#
Since late 1990s: dynamic languages: Perl, Python, Ruby
wanted productivity (expressiveness, flexibility, extensibility, etc.)
build confidence in correctness through extensive testing (TDD)
performance? avoid bottlenecks by using the dynamic language as glue for C/C++ libraries
Back since late 2000s: statically typed functional languages: ML and its descendants F#, Haskell, and OCaml; Scala
want type safety, performance, and agility
want better support for multi-core hardware
ThoughtWorks Technology Radar: Languages Section
Main Statically Typed Contenders: Haskell, F#, Scala
Haskell: ML descendant, but lazy and without mutable state, no OO
Scala: Java-based, mutable and immutable state, eager, functional on top of OO
F#: OCaml descendant, mutable and immutable state, eager, OO on top of functional
All have powerful type systems; Haskell's is the most powerful.
All have support for concurrency.
Interesting Dynamically Typed Contenders
Clojure: focus on concurrency and high performance
Erlang: focus on scalability and soft real-time support
JavaScript: new front ends (e.g., CoffeeScript), becoming a popular compilation target
Running Example: Process Tree
Transform this
laufer 20736 25090 0 Apr16 pts/5 00:00:00 /bin/bash
...
laufer 21274 20736 0 Apr16 pts/5 00:00:01 emacs -nw test2bb.txt
laufer 21281 21274 0 Apr16 ? 00:00:00 /usr/bin/aspell -a -m -B --encoding=utf-8
...
laufer 25090 1 0 Apr15 ? 00:01:18 guake -OO /usr/lib/guake/guake.py
...
laufer 28849 25090 0 11:24 pts/9 00:00:00 /bin/bash
to this
25090: guake -OO /usr/lib/guake/guake.py
20736: /bin/bash
21274: emacs -nw test2bb.txt
21281: /usr/bin/aspell -a -m -B --encoding=utf-8
28849: /bin/bash
Edge reversal!
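The edge reversal, from each child row knowing its parent to each parent knowing its children, is just a grouping operation. A minimal Haskell sketch over hypothetical (pid, ppid) pairs taken from the listing above:

```haskell
import qualified Data.IntMap as IntMap

-- Each process row carries its own pid and its parent's ppid.
procs :: [(Int, Int)]  -- (pid, ppid)
procs = [ (25090, 1), (20736, 25090), (21274, 20736)
        , (21281, 21274), (28849, 25090) ]

-- Reverse the edges: map each ppid to the list of its children.
childMap :: IntMap.IntMap [Int]
childMap = IntMap.fromListWith (++) [ (ppid, [pid]) | (pid, ppid) <- procs ]

main :: IO ()
main = print (IntMap.lookup 25090 childMap)  -- children of the guake process
</imports>
```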
In Java
private final Map<Integer, String> pMap = new HashMap<Integer, String>();
private final Map<Integer, List<Integer>> tMap = new HashMap<Integer, List<Integer>>();
String line = null;
while ((line = in.readLine()) != null) {
    final Process p = parser.parseString(line);
    pMap.put(p.pid, p.cmd);
    if (! tMap.containsKey(p.ppid))
        tMap.put(p.ppid, new ArrayList<Integer>(CHILD_LIST_SIZE));
    tMap.get(p.ppid).add(p.pid);
}
// now print tMap as a tree
Languages with name-based typing require a name for every type! Verbose, not agile.
In a Typical Dynamic Language
pMap = new HashMap
tMap = new HashMap
line = null
while ((line = in.readLine()) != null) {
    p = processParser.parseString(line)
    pMap.put(p.pid, p.cmd)
    if (! tMap.containsKey(p.ppid))
        tMap.put(p.ppid, new ArrayList(CHILD_LIST_SIZE))
    tMap.get(p.ppid).add(p.pid)
}
// now print tMap as a tree
Fictitious language. (I kind of leapfrogged over dynamic languages.)
Type erasure of Java example.
In F# Using .NET Collections
let cMap = new Dictionary<int, String>()
let tMap = new Dictionary<int, ICollection<int>>()
let bOut = new StreamWriter(new BufferedStream(Console.OpenStandardOutput()))
let mutable line = Console.ReadLine()
while line <> null do
    let (words, pid, ppid) = // ...parse from line...
    cMap.[pid] <- line.Substring(iCmd)
    if not (tMap.ContainsKey(ppid)) then
        tMap.[ppid] <- new List<int>(8)
    tMap.[ppid].Add(pid)
    line <- Console.ReadLine()
// now print tMap as a tree
Looks like a C# program.
Performs like a C# program!
In F# Using F#'s Built-In Collections
let rec lines = seq {
    let line = Console.ReadLine()
    if line <> null then yield line ; yield! lines
}
let parse (line: string) = // ...
let ps = lines |> Seq.map parse |> List.ofSeq
let cMap = ps |> List.map (fun (pid, _, cmd) -> (pid, cmd)) |> Map.ofList
let tMap = ps |> Seq.ofList |> Seq.groupBy (fun(_, ppid, _) -> ppid) |> Map.ofSeq
// now print tMap as a tree
Compose computations at the collection level.
The only mutable state is I/O.
About 50% performance overhead over previous version. In part because these maps are tree-based.
In Scala
val lines = scala.io.Source.fromInputStream(System.in).getLines
val out = new BufferedWriter(new OutputStreamWriter(System.out), IO_BUF_SIZE);
val parse = parseLine(lines.next())
val ps = lines map parse toList
val pmap = ps map { case (pid, _, cmd) => (pid, cmd) } toMap
val tmap = ps groupBy (_._2)
// now print tMap as a tree
Compose computations at the collection level.
The only mutable state is I/O.
In Haskell
printProcessTree :: [String] -> [String]
printProcessTree (header : lines) =
  let
    ps = map (parser header) lines
    pmap = IntMap.fromList ps
    tmap = IntMap.fromListWith (++) $ map (\(_, (pid, ppid, _)) -> (ppid, [pid])) ps
    showTrees l i = -- ...convert tmap to strings...
  in
    showTrees 0 0
main :: IO ()
main = interact $ unlines . printProcessTree . lines
Compose computations at the collection level.
The only mutable state is I/O.
But even this is hidden in interact
using dependency injection.
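The point about interact is that the whole transformation is a pure function from input lines to output lines, testable with no I/O at all; interact merely "injects" stdin and stdout around it. A minimal sketch with a hypothetical transformation (numbering lines):

```haskell
-- The pure core: number each input line. No I/O anywhere in here.
numberLines :: [String] -> [String]
numberLines = zipWith (\n s -> show n ++ ": " ++ s) [1 :: Int ..]

-- interact supplies stdin and stdout around the pure core.
main :: IO ()
main = interact $ unlines . numberLines . lines
```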
Monadic Style in Haskell
Different example: echo back input lines and print how many were there.
mainLoop :: Int -> IO ()
mainLoop count = count `seq` do
  eof <- isEOF
  if eof
    then
      putStrLn $ show count
    else do
      line <- getLine
      putStrLn line
      mainLoop $ count + 1
main :: IO ()
main = mainLoop 0
Looks like imperative code.
Mutable state of I/O is safely partitioned into I/O monad.
Higher-Kinded Typing in Haskell (and Scala)
Haskell has no subtype polymorphism, duck typing, or the like.
But...
fmap (*2) (Just 3) -> Just 6
let t = Node 1 [Node 2 [], Node 3 [Node 4 [], Node 5 []]]
fmap (*2) t -> Node 2 [Node 4 [], Node 6 [Node 8 [], Node 10 []]]
...it has typeclasses for systematic function overloading.
class Functor f where
  fmap :: (a -> b) -> f a -> f b
Example of an abstraction over type constructors (generic types): f is Maybe a, Tree a, etc.
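A minimal sketch of such an abstraction: a standalone rose-tree type made an instance of Functor. (The Node examples above use Data.Tree from the standard library; this self-contained version is illustrative.)

```haskell
-- A rose tree: a value plus a list of subtrees.
data Tree a = Node a [Tree a] deriving (Eq, Show)

-- fmap works over the *shape* Tree, whatever the element type is:
-- here f abstracts over the type constructor Tree, not a concrete type.
instance Functor Tree where
  fmap f (Node x ts) = Node (f x) (map (fmap f) ts)

main :: IO ()
main = print (fmap (*2) (Node 1 [Node 2 [], Node 3 [Node 4 [], Node 5 []]] :: Tree Int))
```

The same `fmap` call works unchanged on Maybe, lists, trees, and any other Functor instance; that is overloading over type constructors.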
Scala: scalaz
F#: upstream in F* (experimental research language)
So How to Choose?
Basic Intrinsic Criteria
Foundational soundness
Adherence to design principles (Maclennan 1986)
Readability
Writability/Productivity
Reliability
Cost
Portability
Generality
REPL for exploratory programming
Support for internal and/or external domain-specific languages
All three languages score very similarly on all of these!
Matt's Extrinsic Criteria from Consulting Firm
Current expertise in workforce
Level of local area expertise (Can we hire around here?)
Cost of productive developer ("We don't want the best if we can get good enough for half the price")
Cost of training current workforce
Cost of migrating infrastructure
Cost of developer tools
Performance and scalability
Ability to offshore
Community strength (blogs, meetups, conventions)
Visibility of language (CIOs like saying they are part of up-and-coming languages)
Support (Documentation, Consultants, paid ticket support)
Other Interesting Criteria
Performance
Support for concurrent and parallel programming
Ecosystem
Major projects
Community including books, industry backing, etc.
Other tools
"Use what your friends use"
Popularity of Languages
- based on site statistics; Haskell 16, Scala 17, F# 41
TIOBE index at www.tiobe.com
- based on queries of eight search engines; Haskell 37, F# 39, Scala 45
Assessment of Main Contenders: Haskell
very powerful type system
extensive standard library
parser combinators for external DSLs
cohesive, active community
development tools not very mature
Assessment of Main Contenders: Scala
parser combinators for external DSLs
actors
typesafe stack: akka middleware, play web app framework, spray web services toolkit
high project activity
Java interoperability
good development tools
active community
Assessment of Main Contenders: F#
CLR performance advantages over JVM for scientific/financial computing
numerical computing, units of measure
fslex/fsyacc lexer/parser generators for external DSLs
asynchronous computing (actors), data and task parallelism
supported product status
.NET interoperability
excellent development tools
cross-platform support getting there
growing community