3. The Imperative and Object-Oriented Paradigms in Scala

In this chapter, we discuss the imperative and object-oriented programming paradigms with examples in Scala.

3.1. Options for running Scala code

In this section, we discuss the different options for running Scala code, including applications and tests.

  • The simplest way to run Scala code fragments is through the Scala REPL (read-eval-print loop). We can launch the Scala REPL and then evaluate definitions and expressions:

    $ scala
    Welcome to Scala 3.2.0 (17.0.4.1, Java OpenJDK 64-Bit Server VM).
    Type in expressions for evaluation. Or try :help.
    
    scala> 3 + 4
    val res0: Int = 7

    scala> def f(x: Int) = x + 2
    def f(x: Int): Int

    scala> f(3)
    val res1: Int = 5

    scala> val z = f(4)
    val z: Int = 6

    scala> Seq(1, 2, 3).map(f)
    val res2: Seq[Int] = List(3, 4, 5)
    

    This is a very effective, painless way to conduct initial explorations. The drawback of this approach is a lack of support for managed dependencies, which are required for more advanced work. In that case, starting the Scala REPL through sbt as discussed below is a much better choice. Managing the Scala/Java classpath manually is discouraged.

    You can also run simple scripts (with optional command-line arguments) directly through the scala interpreter. A main method or @main annotation is required, e.g.:

    $ cat > blah.scala
    @main def blah(args: String*) = println(args.toList)
    $ scala blah.scala 1 2 3
    List(1, 2, 3)
    
  • In a Scala IDE such as IntelliJ IDEA, we can run Scala applications (classes/objects with a main method) and Scala tests from within the IDE. To pass command-line arguments to an application, we have to create a suitable run configuration.

  • It is best to use sbt (the Scala Build Tool) for projects with one or more external dependencies because of sbt’s (and similar build tools’) ability to manage these dependencies in a declarative way:

    $ sbt test
    $ sbt run
    $ sbt "run arg1 arg2 ..."
    $ sbt "runMain my.pkg.Main arg1 arg2 ..."
    $ sbt Test/run
    

    In addition, sbt allows you to start a REPL that exposes the code in your project and its managed dependencies. This is the preferred way to explore existing libraries:

    $ sbt console
    

    You can also pull in the additional dependencies from the test scope:

    $ sbt Test/console
    

    If you want to bypass your own code in case of, say, compile-time errors, you can use one of these tasks:

    $ sbt consoleQuick
    $ sbt Test/consoleQuick
    

    In conjunction with a text editor, sbt’s triggered execution for testing will significantly shorten the edit-compile-run/test cycle, for example:

    $ sbt
    ...
    > ~ test
    
  • In general, irrespective of your choice of development environment, a convenient way to do exploratory programming beyond the basic REPL is to start with a single test. There, you can develop your ideas and interact with the library APIs you want to explore. For simple testing, you can intersperse assertions within your code or use the testing support provided by the chosen testing framework, e.g., JUnit or ScalaTest. So you can start exploring something in a test and then move it into your production code (main folder) when appropriate, as the sketch below shows. The list performance example illustrates this approach.
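
    For instance, here is a minimal sketch of such an exploratory test in ScalaTest (the suite and test names are illustrative; a ScalaTest dependency in build.sbt is assumed):

    import org.scalatest.funsuite.AnyFunSuite

    // exploratory test: a scratchpad for interacting with the API under study
    class ExplorationSuite extends AnyFunSuite:
      test("map applies a function to each element") {
        val f = (x: Int) => x + 2
        assert(Seq(1, 2, 3).map(f) == Seq(3, 4, 5))
      }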

  • Finally, to turn an sbt-based Scala application into a script (console application) you can run outside sbt, you can use the sbt-native-packager plugin. To use this plugin, add this line to the end of build.sbt:

    enablePlugins(JavaAppPackaging)
    

    and this one to project/plugins.sbt:

    addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.7.5")
    

    Then, after any change to your sources, you can create/update the script and run it from the command line like so:

    $ sbt stage
    ...
    $ ./target/universal/stage/bin/myapp-scala arg1 arg2 ...
    

3.2. The role of console applications

Console applications have always been an important part of the UNIX command-line environment. The typical console application interacts with its environment in the following ways:

  • zero or more application-specific command-line arguments for passing options to the application: app arg1 arg2 ...

  • standard input (stdin) for reading the input data

  • standard output (stdout) for writing the output data

  • standard error (stderr) for displaying error messages separately from the output data

Applications written in this way can function as composable building blocks using UNIX pipes. Using these standard I/O mechanisms is much more flexible than reading from or writing to files whose names are hardcoded in the program.

E.g., the yes command outputs its arguments forever on consecutive output lines, the head command outputs a finite prefix of its input, and the wc command counts the number of characters, words, or lines:

yes hello | head -n 10 | wc -l

You may wonder how the upstream (left) stages in the pipeline know when to terminate. Concretely, how does the yes command know to terminate after head has read the first ten lines? When head is done reading and passing through the specified number of lines, it terminates, thereby closing the read end of the pipe. When yes subsequently tries to write further data to that pipe, it receives the SIGPIPE signal, whose default handling is to terminate the process. For more details on SIGPIPE, see this StackExchange response.

We can also use the control structures built into the shell. E.g., the following loop prints an infinite sequence of consecutive integers starting from 0:

n=0 ; while :; do echo $n ; ((n=n+1)) ; done

These techniques are useful for producing test data for our own applications. To this end, we can redirect output to a newly created file using this syntax:

n=0 ; while :; do echo $n ; ((n=n+1)) ; done > testdata.txt

If testdata.txt already exists, it will be overwritten when using this syntax. We can also append to an existing file:

... >> testdata.txt

Similarly, we can redirect input from a file using this notation:

wc -l < testdata.txt

There is a close relationship between UNIX pipes and functional programming: when we view a console application as a function that transforms its input to its output, UNIX pipes correspond to function composition. The pipeline p | q corresponds to the function composition q ∘ p.
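
To make this analogy concrete, here is a minimal Scala sketch (the names p and q are illustrative) that treats each pipeline stage as a function:

// each pipeline stage as a function from its input to its output
val p: Iterator[String] => Iterator[String] = _.take(10) // analogous to head -n 10
val q: Iterator[String] => Int = _.size                  // analogous to wc -l

// the pipeline p | q corresponds to the composition q ∘ p
val pipeline = q compose p
pipeline(Iterator.continually("hello")) // 10, like yes hello | head -n 10 | wc -l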

3.2.1. Console applications in Scala

The following techniques are useful for creating console applications in Scala. As in Java, command-line arguments are available to a Scala application as args of type Array[String].

We can read the standard input as lines using this iterator:

val lines = scala.io.Source.stdin.getLines()

This gives you an iterator of strings with each item representing one line. When the iterator has no more items, you are done reading all the input. (See also this concise reference.)

To break the standard input down further into words, we can use this recipe:

val words = {
  import scala.language.unsafeNulls
  lines.flatMap(l => l.split("(?U)[^\\p{Alpha}0-9']+"))
}

The result of l.split(regex) is an array of strings, where some of the strings or the entire array could possibly be null. While flatMap is supposed to preserve the element type of the transformed iterator, splitting the lines in this way could introduce null references. Because we require explicit typing of null references (by adding "-Yexplicit-nulls" to the compiler options in build.sbt), the Scala compiler considers this code incorrect and indicates an error unless we locally enable this potentially unsafe use of implicit null references.
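
For reference, enabling this check in build.sbt looks as follows (a minimal sketch):

// build.sbt: require explicit handling of possible null references
scalacOptions += "-Yexplicit-nulls"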

By default, the Java Virtual Machine converts the SIGPIPE signal to an IOException. In Scala, print and println print to stdout, which is an instance of PrintStream. This class converts any IOException to a boolean flag accessible through its checkError() method. (See also this discussion for more details.)

Therefore, to use a Scala (or Java) console application in a UNIX pipeline as an upstream component that produces an unbounded (potentially infinite) output sequence, we have to monitor this flag when printing to stdout and, if necessary, terminate execution.

For example, this program reads one line at a time and prints the line count along with the line read. After printing, it checks whether an error occurred and, if necessary, terminates execution by exiting the program:

var count = 0
for line <- lines do
  count += 1
  println((count, line))
  if scala.sys.process.stdout.checkError() then sys.exit(1)

3.2.2. The importance of constant-space complexity

Common application scenarios involve large volumes of input data or infinite input streams, e.g., sensor data from an internet-of-things device. To achieve the nonfunctional requirements of reliability/availability and scalability for such applications, it is critical to ensure that the application does not exceed a constant memory footprint during its execution.

Concretely, whenever possible, this means processing one input item at a time and then forgetting about it, rather than storing the entire input in memory. This version of a program that echoes back and counts its input lines has constant-space complexity:

var count = 0
for line <- lines do
  count += 1
  println(line)
  if scala.sys.process.stdout.checkError() then sys.exit(1)
println(s"$count lines counted")

By contrast, this version has linear-space complexity and may run out of space on a large volume of input data:

var count = 0
val listOfLines = lines.toList
for line <- listOfLines do
  count += 1
  println(line)
  if scala.sys.process.stdout.checkError() then sys.exit(1)
println(s"$count lines counted")

In sum, to achieve constant-space complexity, it is usually best to represent the input data as an iterator instead of converting it to an in-memory collection such as a list. Iterators support most of the same behaviors as in-memory collections.
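
For example, the following sketch transforms an unbounded stream lazily and materializes only the ten requested results:

// builds only what take(10) demands; the memory footprint stays constant
val firstTen = Iterator.from(0).map(_ * 2).filter(_ % 3 == 0).take(10).toList
// firstTen: List(0, 6, 12, 18, 24, 30, 36, 42, 48, 54)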

To observe a program’s memory footprint over time, we would typically use a heap profiler. For programs running in the Java Virtual Machine (JVM), we can use the standalone version of VisualVM.

For example, the following heap profile (upper right section of the screenshot) shows a flat sawtooth pattern, suggesting constant-space complexity even as we process more and more input items. By contrast, if the sawtooth pattern were sloping upward over time, the memory footprint would be growing as we process the input, suggesting space complexity that grows with the input size n.

_images/heapprofile.png

3.3. Choices for testing Scala code

There are various basic techniques and libraries/frameworks for testing Scala code.

The simplest way is to intersperse assertions within your code. This is particularly effective for scripts and worksheets:

val l = List(1, 2, 3)
assert { l.contains(2) }

The following testing libraries/frameworks work well with Scala.

  • The familiar JUnit can be used directly.

  • ScalaCheck is a testing framework for Scala that emphasizes property-based testing, including universally quantified properties such as “for all lists x and y, the value of (x ++ y).length is equal to x.length + y.length” (see the sketch after this list).

  • ScalaTest is a testing framework for Scala that supports a broad range of test styles, including behavior-driven development, and integrates with ScalaCheck.

  • specs2 is a specification-based testing library that also supports integration with ScalaCheck.

  • MUnit is a newer testing library for Scala.
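
As an illustration of the ScalaCheck style mentioned above, here is a minimal sketch of the list-length property (the object and property names are illustrative):

import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

object ListProps extends Properties("List"):
  // universally quantified: checked against many generated pairs of lists
  property("lengthOfConcat") = forAll { (x: List[Int], y: List[Int]) =>
    (x ++ y).length == x.length + y.length
  }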

The echotest example shows some of these libraries in action.

For faster turnaround during development, we can combine these techniques with triggered execution.

3.4. The role of logging

Logging is a common dynamic nonfunctional requirement that is useful throughout the lifecycle of a system. Logging can be challenging because it is a cross-cutting concern that arises throughout the codebase.

In its simplest form, logging can consist of ordinary print statements, preferably to the standard error stream (stderr):

System.err.println("something went wrong: " + anObject)

This allows displaying (or redirecting) error messages separately from output data.

For more complex projects, it is advantageous to be able to configure logging centrally, such as suppressing log messages below a certain log level indicating the severity of the message, configuring the destination of the log messages, or disabling logging altogether.

Logging frameworks have arisen to address this need. Modern logging frameworks have very low performance overhead and are a convenient and effective way to achieve professional-grade separation of concerns with respect to logging.

3.4.1. Logging in Scala

For example, the log4s wrapper provides a convenient logging mechanism for Scala. To use log4s minimally, the following steps are required:

  • Add external dependencies for log4s and a simple slf4j backend implementation:

    "org.log4s" %% "log4s" % "1.8.2",
    "org.slf4j" % "slf4j-simple" % "1.7.30"
    
  • If you require a more verbose (lower severity) log level than the default of INFO, such as DEBUG, add a configuration file src/main/resources/simplelogger.properties with contents:

    org.slf4j.simpleLogger.defaultLogLevel = debug
    
  • Now you are ready to access and use your logger:

    private val logger = org.log4s.getLogger
    logger.debug(f"howMany = $howMany minLength = $minLength lastNWords = $lastNWords")
    

    This produces informative debugging output such as:

    [main] DEBUG edu.luc.cs.cs371.topwords.TopWords - howMany = 10 minLength = 6 lastNWords = 1000
    
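
Putting these pieces together, a minimal runnable sketch might look as follows (the object and method names are illustrative):

object LogDemo:
  private val logger = org.log4s.getLogger

  def demo(): Unit =
    logger.info("starting up")
    // printed only if defaultLogLevel is set to debug as shown above
    logger.debug("some internal detail")

@main def runLogDemo(): Unit = LogDemo.demo()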

3.5. Defining domain models in imperative and object-oriented languages

In imperative and object-oriented languages, the basic type abstractions are

  • addressing: pointers, references

  • aggregation: structs/records, arrays

    • example: node in a linked list

  • variation: tagged unions, multiple implementations of an interface

    • example: mutable set abstraction

      • add element

      • remove element

      • check whether an element is present

      • check if empty

      • how many elements

    • several possible implementations

  • (structural) recursion: defining a type in terms of itself, usually involves aggregation and variation

    • example: a tree interface with implementation classes for leaves and interior nodes (see the sketch after this list)

  • genericity (type parameterization): when a type is parametric in terms of one or more type parameters

    • example: collections parametric in their element type
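
The following minimal sketch (with illustrative names) combines several of these abstractions: a generic tree contract with implementation classes for leaves and interior nodes:

// genericity: the tree is parametric in its element type A
trait Tree[A]:
  def size: Int

// variation: one implementation class per shape; a leaf aggregates a value
class Leaf[A](val value: A) extends Tree[A]:
  def size: Int = 1

// recursion: an interior node aggregates subtrees of the same type
class Node[A](val children: Seq[Tree[A]]) extends Tree[A]:
  def size: Int = 1 + children.map(_.size).sum

// usage: Node(Seq(Leaf(1), Leaf(2))).size evaluates to 3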

In an object-oriented language, we commonly use a combination of design patterns (based on these basic abstractions) to represent domain model structures and associated behaviors.

3.5.1. Object-oriented Scala as a “better Java”

Scala offers various improvements over Java.

More recent versions of Java, however, have started to echo some of these advances:

  • lambda expressions

  • default methods in interfaces

  • local type inference

  • streams

We will study these features as we encounter them.

The following examples illustrate the use of Scala as a “better Java” and the transition to some of the above-mentioned improvements.

3.6. Modularity and dependency injection

Note

To wrap your head around this section, you may want to start by recalling/reviewing the stopwatch example from COMP 313/413 (intermediate object-oriented programming). In that app, the model is rather complex and has three or four components that depend on each other. After creating the instances of those components, you had to connect them to each other using setters. Does that ring a bell? In this section and the pertinent examples, we are achieving basically the same goal by plugging two or more Scala traits together declaratively.

3.6.1. Design goals

We pursue the following design goals, tied to nonfunctional code quality requirements:

  • testability

  • modularity for separation of concerns

  • reusability for avoidance of code duplication (“DRY”)

In particular, to manage the growing complexity of a system, we usually try to decompose it into its design dimensions, e.g.,

  • mixing and matching interfaces with multiple implementations

  • running code in production versus testing

We can recognize these in many common situations, including the examples listed below.

In object-oriented languages, we often use classes (and interfaces) as the main mechanism for achieving these design goals.

3.6.2. Scala traits

Scala traits are abstract types that can serve as fully abstract interfaces as well as partially implemented, composable building blocks (mixins). Unlike Java interfaces (prior to Java 8), Scala traits can have method implementations (and state). The Thin Cake idiom shows how traits can help us achieve our design goals.
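
As a small illustration, the following sketch (with illustrative names) shows a trait carrying both state and a method implementation, mixed into a class:

trait Counter:
  private var count = 0        // state in a trait
  def increment(): Int =       // implemented behavior
    count += 1
    count

class ClickHandler extends Counter

// usage: ClickHandler().increment() returns 1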

Note

We deliberately call Thin Cake an idiom as opposed to a pattern because it is language-specific.

We will rely on the process tree and iterators examples in this section.

First, to achieve testability, we can define the desired functionality, such as common.IO, as its own trait instead of making it a concrete class or part of some other trait such as common.Main. Such traits are providers of some functionality, while the building blocks that use this functionality are clients, such as common.Main (on the production side) and PrintSpec (on the testing side). Specifically, in the process tree example, we use PrintSpec to test common.IO in isolation, independently of common.Main.

To avoid code duplication in the presence of the design dimensions mentioned above, we can again leverage Scala traits as building blocks. Along some of the dimensions, there are three possible roles:

  • provider, e.g., the specific implementations MutableTreeBuilder, FoldTreeBuilder, etc.

  • client, e.g., the various main objects on the production side, and the TreeBuilderSpec on the testing side

  • contract, the common abstraction between provider and client, e.g., TreeBuilder

Usually, when there is a common contract, a provider overrides some or all of the abstract behaviors declared in the contract. Some building blocks have more than one role. E.g., common.Main is a client of (depends on) TreeBuilder but provides the main application behavior that the concrete main objects need. Similarly, TreeBuilderSpec also depends on TreeBuilder but provides the test code that the concrete test classes (Spec) need. This arrangement enables us to mix and match the desired TreeBuilder implementation with either common.Main for production or TreeBuilderSpec for testing.

The following figure shows the roles of and relationships among the various building blocks of the process tree example.

_images/ProcessTreeTypeHierarchy.png

The iterators example includes additional instances of trait-based modularity in its imperative/modular package.

Note

For pedagogical reasons, the process tree and iterators examples are overengineered relative to their simple functionality: To increase confidence in the functional correctness of our code, we should test it; this requires testability, which drives the modularity we are seeing in these examples. In other words, the resulting design complexity is the cost of testability. On the other hand, a more realistic system would likely already have substantial design complexity in its core functionality for separation of concerns, maintainability, and other nonfunctional quality reasons; in this case, the additional complexity introduced to achieve testability would be comparatively small.

3.6.3. Trait-based dependency injection

In the presence of modularity, dependency injection (DI) is a technique for supplying a dependency to a client from outside, thereby relieving the client of the responsibility of “finding” its dependency, i.e., performing dependency lookup. In response to the popularity of dependency injection, numerous DI frameworks, such as Spring and Guice, have arisen.

The Thin Cake idiom provides basic DI in Scala without the need for a DI framework. To recap, common.Main cannot run on its own but declares, by extending TreeBuilder, that it requires an implementation of the buildTree method. One of the TreeBuilder implementation traits, such as FoldTreeBuilder, can satisfy this dependency. The actual “injection” takes place when we mix, say, FoldTreeBuilder into common.Main in the definition of the concrete main object fold.Main.
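
The following minimal sketch illustrates this mechanism; the signatures are illustrative assumptions, not the actual API of the process tree example:

trait TreeBuilder:                                // contract
  def buildTree(lines: Iterator[String]): Unit    // required behavior

trait FoldTreeBuilder extends TreeBuilder:        // provider
  override def buildTree(lines: Iterator[String]): Unit =
    println(s"${lines.size} lines processed")     // placeholder implementation

trait Main extends TreeBuilder:                   // client: requires buildTree
  def run(): Unit = buildTree(scala.io.Source.stdin.getLines())

object FoldMain extends Main with FoldTreeBuilder // the actual injection

@main def fold(): Unit = FoldMain.run()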