Better types for a better software

Developing software with complex business logic and many requirements can be challenging. In this article, we will explore different Scala programming techniques that help us manage complexity on a type level and protect us from making specific coding mistakes.

Publish date

18 Aug, 2022

Author

Ventsislav Buhlev

Founder & Software Engineer

Motivation

In software engineering, complexity can come from different places. On the one hand, we have technical challenges related to programming languages, frameworks, libraries, communication protocols, hardware limitations, etc.

But on the other hand, businesses often have complex workflows with many rules, policies, and exceptions. Sometimes, these business processes are so complex that even an army of developers can have difficulty creating and maintaining a working system.

In such complex problem domains, using a statically typed programming language to model high-level business concepts is a great way to ensure code quality and improve developers' productivity in the long run.

Scala is a statically typed language with a powerful type system that makes it easy to define new concepts without much boilerplate code. It allows you to code in both object-oriented and functional programming paradigms. Another great advantage of Scala is that it runs on the JVM, which makes it possible to reuse existing Java libraries.

In this article, we will start with a simple code example, and we will outline the problems related to it. Then step by step, we will improve the code to make it better.

Our main goal is to employ Scala's type system and compiler to do more work for us while keeping our code concise and readable. We want to make it impossible to make certain types of programming mistakes.

An everyday example

To provide a more personalized experience or for other reasons, most web applications have a concept of a user and a registration flow. Usually, this is a two-phase process.

First, users provide their name, email address, and other personal data. Then, as a second step, they must verify their email by clicking on a link automatically sent to them. Until they open the link, their account is considered unverified, and they are not allowed to access some or all of the application's functionality.

We can implement this flow using the following data model and function signatures.

case class User(
    id: Long,
    firstName: String,
    lastName: String,
    email: String,
    isEmailVerified: Boolean,
)

def register(firstName: String, lastName: String, email: String): User = ???
def sendVerificationEmail(user: User): Unit = ???
def verify(user: User): Unit = ???

At first glance, we have all the necessary data and functions to support the business requirements. But when we think about it, it is easy to make programming mistakes by passing invalid data.

For example, nothing prevents us from providing an invalid email address string or a first name two thousand characters long with no last name. It is also possible to have a user with an empty email that is marked as verified!

One way we can solve this is by applying defensive programming techniques. First, before we do any work, we validate the input, and if something is wrong, we throw an exception. If the provided data is correct, we execute the actual behavior.
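As a rough sketch of this defensive style, the validation could sit at the top of register as shown here. The wrapper object, the length limit, and the deliberately naive email check are assumptions made only for illustration, not rules from a real specification.

```scala
object DefensiveExample {
  final case class User(
      id: Long,
      firstName: String,
      lastName: String,
      email: String,
      isEmailVerified: Boolean
  )

  // Hypothetical limit, chosen only for illustration.
  val MaxNameLength = 100

  // Deliberately naive check; real email validation is more involved.
  def looksLikeEmail(s: String): Boolean = {
    val at = s.indexOf('@')
    at > 0 && at < s.length - 1
  }

  def register(firstName: String, lastName: String, email: String): User = {
    // Validate first; require throws IllegalArgumentException on failure.
    require(firstName.nonEmpty && firstName.length <= MaxNameLength, "invalid first name")
    require(lastName.nonEmpty && lastName.length <= MaxNameLength, "invalid last name")
    require(looksLikeEmail(email), "invalid email")
    // Only then run the actual behavior (the id is a placeholder here).
    User(1L, firstName, lastName, email, isEmailVerified = false)
  }
}
```

Every function that accepts these raw strings would need to repeat similar checks, which is the manual labor the following paragraphs discuss.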

In addition, we can write documentation, create automated tests to verify edge cases and rely on code reviews for additional security. Although this is a lot of manual labor and requires considerable discipline, we can achieve great results with these practices.

Automated tests are a great tool to validate code's correctness and prevent regressions, but they have limitations. Most notably, tests check code only against programmed scenarios. What about other possibilities we did not think of during implementation?

People are responsible for creating and maintaining test code, which sometimes is more complicated than implementing the actual feature. In that sense, what guarantees that our tests are bug-free and will remain without defects during the project's lifetime? Have you seen colleagues fix broken unit tests by deleting or disabling them?

Imagine it is Friday, and you are heading on a long vacation with your family. At the same time, you must do a code review on a boring feature with many trivial lines of code. In such circumstances, it is hard not to fall asleep, let alone spot a problem with the code.

Another typical situation is when you are behind schedule, and your management does not want to move the release date. You do not want to be responsible for the missed deadline and will likely cut some corners. Even if you do not, such situations are very stressful, and it is easy to forget something.

Despite everyone's best efforts, manual approaches that rely on discipline are prone to errors, and eventually, mistakes will happen.

On the other hand, the compiler already checks our code for syntax errors and certain logical mistakes. For example, your code will not compile if you pass an object of the wrong type, invoke a method that does not exist, modify immutable data, etc. The compiler will not fall asleep or forget a rule while checking your code, which makes it a very reliable tool.

Make illegal states impossible to represent

The first thing we will do is replace primitive types with higher-level definitions that represent concepts from our business domain. Our implementation will rely on two simple ideas.

  • First, we will not allow constructing an object with invalid data.
  • Second, we will make the object's state immutable.

Put otherwise, if we can create objects only with correct data, and if their fields do not change, they will always remain valid. We can be sure of an object's correctness, and whenever we use it, we do not have to write any code to verify its state.

Consider this example:

sealed abstract case class EmailAddress(email: String)

object EmailAddress {
  def create(email: String): EmailAddress = {
    require(isValid(email))
    new EmailAddress(email) {}
  }

  private def isValid(email: String): Boolean = ???
}

Let's break it down:

  • sealed makes the class extendable only in this file.
  • case class makes its fields immutable and tells the compiler to generate additional methods like hashCode, equals, and copy, plus an apply method in the companion object.
  • Combining abstract with case class instructs the compiler not to generate the copy method or the companion object's apply method.
  • require() throws an IllegalArgumentException if the provided argument evaluates to false.

Effectively, this means that the only way we can create an instance of this class is through the EmailAddress.create factory method, which throws an exception if we provide an invalid email string. This pattern is called a smart constructor [1] and has additional benefits [2].
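To see the smart constructor in use, here is a small sketch. The wrapper object and the body of isValid are placeholders assumed for this example; the article leaves the real validation rule open.

```scala
import scala.util.Try

object SmartConstructorExample {
  sealed abstract case class EmailAddress(email: String)

  object EmailAddress {
    def create(email: String): EmailAddress = {
      require(isValid(email), s"invalid email: $email")
      new EmailAddress(email) {}
    }

    // Placeholder check, assumed only for this sketch.
    private def isValid(email: String): Boolean = {
      val at = email.indexOf('@')
      at > 0 && at < email.length - 1
    }
  }

  val ok: EmailAddress = EmailAddress.create("jane@example.com")
  val failed: Try[EmailAddress] = Try(EmailAddress.create("not-an-email"))

  // These lines should be rejected by the compiler, so they stay commented out:
  // EmailAddress("jane@example.com")      // no apply method is generated
  // new EmailAddress("jane@example.com")  // the class is abstract
  // ok.copy(email = "")                   // no copy method is generated
}
```

Wrapping the call in Try is only a convenient way to observe the exception here; the factory itself still throws.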

Similarly, we can define more types and replace the primitives with them.

sealed abstract case class UserId(value: Long)
sealed abstract case class UserName(first: String, last: String)

case class User(
    id: UserId,
    name: UserName,
    email: EmailAddress,
    isEmailVerified: Boolean,
)

With this definition, we do not have to worry about instances with invalid names or emails. Also, we do not have to write additional code to protect us from such errors. The only place we do this is in the factory methods.
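The factory methods for these types are not shown in the article. A minimal sketch of one for UserName could look like this, assuming a made-up rule that both parts are non-blank and at most 100 characters long. The wrapper object and the limit are hypothetical.

```scala
object UserNameExample {
  sealed abstract case class UserName(first: String, last: String)

  object UserName {
    // Hypothetical business rule, assumed for this sketch.
    private val MaxLength = 100

    def create(first: String, last: String): UserName = {
      require(isValid(first), "invalid first name")
      require(isValid(last), "invalid last name")
      new UserName(first.trim, last.trim) {}
    }

    private def isValid(part: String): Boolean =
      part.trim.nonEmpty && part.trim.length <= MaxLength
  }
}
```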

Notice that we also use a custom type to represent the user's identifier. You might think this is too much, but consider this example.

case class CreditAccount(id: Long, userId: Long)
def loadUser(id: Long) = ???

val account: CreditAccount = CreditAccount(1, 2)
loadUser(account.id) // error, it should be userId
val combinedId = account.userId * account.id // what, why ???

The first error, an obvious typo, happens more often than you think. I have seen instances of this problem pass unit tests and code reviews!

Next, for the second error, you might wonder who would ever try to do arithmetic with identifiers. Unfortunately, it happens sometimes, and there is nothing to prevent people from shooting themselves in the foot. If something does not make sense, it is better to make it impossible to represent it in the code.

Actually, in the user id case, we might want only to "wrap" the primitive type without any validation during construction. Scala's value classes [4] allow us to define such wrappers, which exist mainly at compile time. At runtime, in most cases, we have only the wrapped value, although some uses, such as storing the wrapper in a collection, still allocate an instance. To use this feature, we can declare the UserId type like this:

case class UserId(value: Long) extends AnyVal
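Revisiting the CreditAccount snippet with wrapped identifiers shows what we gain. In this sketch, AccountId, the wrapper object, and the stubbed loadUser are assumptions added for illustration.

```scala
object TypedIdExample {
  final case class UserId(value: Long) extends AnyVal
  final case class AccountId(value: Long) extends AnyVal // hypothetical second id type

  final case class CreditAccount(id: AccountId, userId: UserId)

  // Stub lookup, assumed for illustration only.
  def loadUser(id: UserId): String = s"user-${id.value}"

  val account: CreditAccount = CreditAccount(AccountId(1L), UserId(2L))
  val loaded: String = loadUser(account.userId)

  // Both mistakes from the earlier snippet now fail to compile:
  // loadUser(account.id)          // type mismatch: AccountId is not a UserId
  // account.userId * account.id   // UserId has no * method
}
```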

With our new definition, when implementing sendVerificationEmail(user: User), we no longer have to worry about whether the user's email is valid. However, we still have to verify that the field isEmailVerified is false. Sending a verification email to a verified user is a logical mistake.

Before we see how to solve this on a type level, let us first introduce Algebraic Data Types.

Algebraic Data Types or ADTs

Sometimes, it is more convenient to think about types as sets. For example, the string type defines the set of all possible strings. Our EmailAddress forms a subset containing only valid email addresses.

We can define new types by taking the Cartesian product of existing sets. The product UserId × UserName × EmailAddress × Boolean gives us our User type. In Scala, we can represent product types with case classes or tuples. Usually, a case class is more convenient to work with, but we can define the same thing with a tuple.
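The two representations of a product type can be compared side by side. This sketch uses simplified stand-ins for the article's types (plain case classes without smart constructors) so that it stays self-contained; the wrapper object and the UserTuple alias are hypothetical.

```scala
object ProductTypeExample {
  // Simplified stand-ins for the article's validated types.
  final case class UserId(value: Long)
  final case class UserName(first: String, last: String)
  final case class EmailAddress(email: String)

  // Product type as a case class: every field has a name.
  final case class User(id: UserId, name: UserName, email: EmailAddress, isEmailVerified: Boolean)

  // The same product as a tuple: fields are addressed by position.
  type UserTuple = (UserId, UserName, EmailAddress, Boolean)

  val user: User =
    User(UserId(1L), UserName("Ada", "Lovelace"), EmailAddress("ada@example.com"), isEmailVerified = false)
  val tuple: UserTuple = (user.id, user.name, user.email, user.isEmailVerified)
}
```

Reading tuple._3 says much less than user.email, which is one reason the case class is usually the more convenient choice.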

Another way we can compose types is by making a union between sets. As an example, consider JSON values. The collection of all valid JSON values consists of all strings, numbers, booleans, arrays, JSON objects, and the null object. These are called sum or union types, and in Scala, we can define them with this syntax:

sealed trait JsValue
case object JsNull extends JsValue
final case class JsString(s: String) extends JsValue
final case class JsNumber(num: Double) extends JsValue
final case class JsBool(value: Boolean) extends JsValue
final case class JsObject(map: Map[String, JsValue]) extends JsValue
final case class JsArray(arr: List[JsValue]) extends JsValue

Let's break down this example:

  • First, we define the interface for our new type.
  • Then, we enumerate all possible cases of the union, which extend the base interface.
  • The final keyword forbids further extension for classes.
  • As usual, the sealed keyword makes this trait extensible only in this file.

Effectively this means that all JsValue variants exist only in this file. As a result, the compiler has all the necessary information to assist us when we want to handle all possibilities of this hierarchy.

In this example, JsValue is just a marker interface, but if we want, we can add methods to it. They can be either abstract or have an implementation.

Notice also that JsObject and JsArray are defined recursively and use JsValue in their definition, which follows the JSON format's specification. This feature allows us to represent other recursive data structures like Trees and Lists.
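As an illustration of another recursive structure, here is a sketch of a binary tree of integers defined the same way. The names IntTree, Leaf, Node, and size are invented for this example.

```scala
object TreeExample {
  sealed trait IntTree
  case object Leaf extends IntTree
  // A node refers back to IntTree, which makes the definition recursive.
  final case class Node(left: IntTree, value: Int, right: IntTree) extends IntTree

  // Counts the stored values by recursing over the structure.
  def size(tree: IntTree): Int = tree match {
    case Leaf          => 0
    case Node(l, _, r) => size(l) + 1 + size(r)
  }

  val sample: IntTree = Node(Node(Leaf, 1, Leaf), 2, Leaf)
}
```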

Scala has a switch-like construct called pattern matching, which is very useful when working with union types.

def serialize(v: JsValue): String = v match {
  case JsNull => ???
  case JsString(s) => ???
  case JsNumber(num) => ???
  case JsBool(value) => ???
  case JsObject(map) => ???
  case JsArray(arr) => ???
}

With this construct, we get a compiler warning if we do not handle all possible cases of a union type. We can also configure the compiler to treat these as errors to prevent compilation from succeeding.
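For completeness, here is one way the serialize function could be filled in. This is a simplified sketch rather than a complete JSON serializer: the quote helper is hypothetical and escapes only backslashes and double quotes, and numbers are printed with Double's default toString.

```scala
object JsonExample {
  sealed trait JsValue
  case object JsNull extends JsValue
  final case class JsString(s: String) extends JsValue
  final case class JsNumber(num: Double) extends JsValue
  final case class JsBool(value: Boolean) extends JsValue
  final case class JsObject(map: Map[String, JsValue]) extends JsValue
  final case class JsArray(arr: List[JsValue]) extends JsValue

  // Simplified quoting: escapes only backslashes and double quotes.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def serialize(v: JsValue): String = v match {
    case JsNull        => "null"
    case JsString(s)   => quote(s)
    case JsNumber(num) => num.toString
    case JsBool(value) => value.toString
    case JsObject(map) =>
      map.map { case (k, value) => quote(k) + ":" + serialize(value) }.mkString("{", ",", "}")
    case JsArray(arr)  => arr.map(serialize).mkString("[", ",", "]")
  }
}
```

Deleting any one of the cases should make the compiler report that the match may not be exhaustive. Compiler options such as -Xfatal-warnings in Scala 2 or -Werror in Scala 3 turn such warnings into errors.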

The set of all primitive types, together with the operations for composing them, forms an "algebra" in a mathematical sense. Hence, the types derived through union and product are called Algebraic Data Types.

Model simple state machines with ADTs

Now, let's get back to our business requirements, where we have the concept of verified and unverified users. We can think about this in terms of a simple state machine. When created, users start in an unverified state. We can only transition to a verified state by clicking on the link.

We can define User as a union between verified and unverified users.

sealed trait User
final case class VerifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User
final case class UnverifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User

Then we can use the more specific types to "embed" the business requirements directly into the function signatures.

def register(firstName: String, lastName: String, email: String): UnverifiedUser = ???
def sendVerificationEmail(user: UnverifiedUser): Unit = ???
def verify(user: UnverifiedUser): VerifiedUser = ???
def resetPassword(user: VerifiedUser): Unit = ???

Now, if our code compiles, we can feel more confident that things work as expected. As a bonus, our code has become more readable, and the types serve as documentation. That is cool, considering we have not written that much code.
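A short usage sketch shows the state machine at work. The field types here are simplified stand-ins without validation, and the bodies of verify and resetPassword, as well as the describe helper, are assumptions made to keep the example self-contained.

```scala
object RegistrationFlowExample {
  // Simplified stand-ins for the validated types.
  final case class UserId(value: Long)
  final case class UserName(first: String, last: String)
  final case class EmailAddress(email: String)

  sealed trait User
  final case class VerifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User
  final case class UnverifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User

  // The only transition in the state machine: unverified to verified.
  def verify(user: UnverifiedUser): VerifiedUser =
    VerifiedUser(user.id, user.name, user.email)

  // Stub, assumed for illustration; a real version would start the reset flow.
  def resetPassword(user: VerifiedUser): String =
    s"reset link sent to ${user.email.email}"

  // Code that accepts any User can still distinguish the states.
  def describe(user: User): String = user match {
    case _: VerifiedUser   => "verified"
    case _: UnverifiedUser => "unverified"
  }

  val fresh: UnverifiedUser =
    UnverifiedUser(UserId(1L), UserName("Ada", "Lovelace"), EmailAddress("ada@example.com"))
  val verified: VerifiedUser = verify(fresh)

  // resetPassword(fresh) // does not compile: UnverifiedUser is not a VerifiedUser
}
```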

Error handling and exceptions

So far, we have seen that by replacing primitive types with concepts from the business domain, we can skip some error handling code and rely on the language compiler to prevent undesired behavior.

Nevertheless, we cannot eliminate all error handling. At some point, we will have to deal with potentially invalid data from the outside world.

Let's get back to our factory method that creates email addresses. It looks like this:

def create(email: String): EmailAddress = {
    require(isValid(email)) // throws exception
    new EmailAddress(email) {}
}

The method throws an illegal argument exception when we pass an invalid email address string. It is a standard way of handling data validation, but as we shall see, there are other ways. First, let us discuss some problems related to exceptions.

The biggest downside is that exceptions are not type-safe! Nothing in the method's signature tells us which errors might get thrown. The only thing we can do is describe exceptions in the documentation. Also, we cannot force others to handle these exceptions. In any case, we will have to rely on the discipline of others to read the documentation and catch the exceptions because the compiler will not do anything about it.

Exceptions break the normal flow of control. They "bubble" up the call stack until a try-catch block catches them. Often this is far from where the exception originated, making it difficult to reason about the issue and recover from it.

In addition, exception handling can add a lot of boilerplate to our code. Imagine validating a web form. The standard behavior is to perform the validation when the form is submitted and then show all errors at once. With exceptions, we will have to use multiple try-catch blocks for each input field and then combine the errors into a list. In the end, if the list is empty, we want to execute the actual logic. Otherwise, we will have to throw another exception with the combined errors. Maybe the code will end up similar to this:

def registerUser(firstName: String, lastName: String, email: String): UnverifiedUser = {
  var errors: Seq[String] = Seq.empty

  var userEmail: EmailAddress = null
  try {
    userEmail = EmailAddress.create(email)
  } catch {
    // Match on the exception's type; the instance itself is not needed.
    case _: IllegalArgumentException => errors :+= "invalid.email"
  }

  var name: UserName = null
  try {
    name = UserName.create(firstName, lastName)
  } catch {
    case _: IllegalArgumentException => errors :+= "invalid.names"
  }

  if (errors.nonEmpty) {
    throw new InvalidFormError(errors)
  } else {
    // Pass the validated values, not the raw strings.
    val unverifiedUser = saveInDB(name, userEmail)
    sendVerificationEmail(unverifiedUser)
    unverifiedUser
  }
}

I have to admit that this is not the best boilerplate illustration simply because most modern frameworks have some way of dealing with data validation. Nevertheless, I am confident that everybody knows what I mean.

Despite their deficiencies, exceptions are a helpful tool when used appropriately. I am not against them in general. I only want to present other options to the reader.

Explicit errors

Conceptually, there is nothing "exceptional" when users enter invalid data. On the contrary, we expect such mistakes, and a good software application is supposed to show error feedback and assist the user. Since this is normal, why don't we model this with types and include it in our functions' signatures?

To keep it simple, we will represent errors as a simple case class that holds a string error key. Depending on your business requirements, there might be a more suitable representation. For example, sometimes it is convenient to have all your errors modeled as an ADT and to use pattern matching.

final case class Error(key: String)
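If we went the ADT route mentioned above instead, the errors might be modeled roughly like this. The error cases and the messageKey function are hypothetical and exist only to show the shape of such a model.

```scala
object ErrorAdtExample {
  // Hypothetical error cases, invented for this sketch.
  sealed trait RegistrationError
  case object InvalidEmail extends RegistrationError
  final case class NameTooLong(maxLength: Int) extends RegistrationError
  case object EmailAlreadyTaken extends RegistrationError

  // Pattern matching maps each case to a message key.
  def messageKey(error: RegistrationError): String = error match {
    case InvalidEmail      => "errors.invalid.email"
    case NameTooLong(_)    => "errors.name.too.long"
    case EmailAlreadyTaken => "errors.email.taken"
  }
}
```

The rest of this article sticks with the simpler string-keyed Error class.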

To include errors in our type signatures, we have two options. We can either add more arguments or change the return type.

Adding success and error callback arguments will look like this:

def createEmail(
  email: String,
  onSuccess: EmailAddress => Unit,
  onError: Error => Unit): Unit = ???

This way, the compiler will force us to provide success and failure callbacks whenever we create an email. In addition, we know the error type.

Unfortunately, we have paid a great price for that. Our API has become more cumbersome to use. Instead of one parameter, now we have three, and all functions return no results!

Imagine how registerUser will look with this callback style. We will have to change its signature similarly, and then the implementation will have to provide two callbacks to both EmailAddress.create and UserName.create. The code should end up very similar to our exceptions-based implementation.

Overall this programming style is not very convenient and can easily lead to something called callback hell, which does not sound very pleasant.
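To make the comparison concrete, here is a sketch of how registerUser might look in callback style. The validation rules, the createName helper, and the simplified EmailAddress and UserName classes are placeholders assumed for this example.

```scala
object CallbackExample {
  final case class Error(key: String)
  final case class EmailAddress(email: String)
  final case class UserName(first: String, last: String)

  // Simplified validators in callback style; the rules are placeholders.
  def createEmail(email: String, onSuccess: EmailAddress => Unit, onError: Error => Unit): Unit =
    if (email.contains("@")) onSuccess(EmailAddress(email)) else onError(Error("invalid.email"))

  def createName(first: String, last: String, onSuccess: UserName => Unit, onError: Error => Unit): Unit =
    if (first.nonEmpty && last.nonEmpty) onSuccess(UserName(first, last))
    else onError(Error("invalid.names"))

  def registerUser(
      firstName: String,
      lastName: String,
      email: String,
      onSuccess: (UserName, EmailAddress) => Unit,
      onError: Seq[Error] => Unit
  ): Unit = {
    // Mutable state is needed to collect what the callbacks report.
    var errors: Seq[Error] = Seq.empty
    var validEmail: Option[EmailAddress] = None
    var validName: Option[UserName] = None

    createEmail(email, e => { validEmail = Some(e) }, err => { errors :+= err })
    createName(firstName, lastName, n => { validName = Some(n) }, err => { errors :+= err })

    (validName, validEmail) match {
      case (Some(n), Some(e)) => onSuccess(n, e)
      case _                  => onError(errors)
    }
  }
}
```

The mutable variables and the final branching mirror the try-catch version almost line for line.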

Our other option is to return errors as part of the result. We can do this by defining an ADT that is a union between success and error cases.

sealed trait ErrorOrResult[+R]
final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]

If you are still following, you will notice some new syntax, which I will try to explain very briefly.

In Scala, we can define generic types that take types as parameters within square brackets. We also call them type-constructors because they are used to create specific types. For example, we can construct List[Int] and List[User] using the List generic.

By default, there is no relation between the constructed types, even when one exists between the type parameters. For instance, Int is a subtype of AnyVal, but for a generic declared as class Box[T], there is no relationship between Box[Int] and Box[AnyVal]. In Scala, we can change this simply by prepending a plus symbol to the type parameter in the declaration, as in class Box[+T]. That way, the subtype relationship carries over to the generic instances, and Box[Int] becomes a subtype of Box[AnyVal]. Scala's own List is declared this way, which is why List[Int] is a subtype of List[AnyVal]. In technical terms, we have used a variance annotation to make the type parameter covariant [5].

Notice that our failure case does not have a type parameter and extends the base trait with a fixed type. Nothing is a special type with no instances that is a subtype of every other type in Scala. Because of this and the covariant type parameter, we can pass ErrorResult instances everywhere we need an ErrorOrResult, no matter the actual type parameter.
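A small sketch of what the plus annotation and Nothing buy us together. The wrapper object and the value names are invented for this example; the type definitions are the ones from the article.

```scala
object VarianceExample {
  final case class Error(key: String)

  sealed trait ErrorOrResult[+R]
  final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
  final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]

  val failure: ErrorResult = ErrorResult(Seq(Error("errors.generic")))

  // The same failure value fits any result type, because
  // ErrorOrResult[Nothing] is a subtype of ErrorOrResult[R] for every R.
  val asInt: ErrorOrResult[Int] = failure
  val asString: ErrorOrResult[String] = failure

  // Without the + on the trait's type parameter, the two
  // assignments above would be rejected as type mismatches.
}
```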

Now we can modify createEmail to return an error or result of EmailAddress. The body is pretty much the same, except that the error is returned instead of thrown as an exception.

def createEmail(email: String): ErrorOrResult[EmailAddress] = {
  if (isValid(email))
    SuccessResult(new EmailAddress(email) {})
  else
    ErrorResult(Seq(Error("errors.invalid.email")))
}

At first glance, the type signature looks much better. Similar to the callback example, the compiler will force us to handle both cases when we deal with the result.
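Here is a sketch of what handling both cases looks like at the call site. The isValid rule and the describe function are placeholders assumed for this example.

```scala
object HandleResultExample {
  final case class Error(key: String)

  sealed trait ErrorOrResult[+R]
  final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
  final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]

  sealed abstract case class EmailAddress(email: String)

  // Placeholder rule, assumed only for this sketch.
  private def isValid(email: String): Boolean = email.contains("@")

  def createEmail(email: String): ErrorOrResult[EmailAddress] =
    if (isValid(email)) SuccessResult(new EmailAddress(email) {})
    else ErrorResult(Seq(Error("errors.invalid.email")))

  // The caller has to unpack the result, and the match covers both cases.
  def describe(input: String): String = createEmail(input) match {
    case SuccessResult(address) => s"valid: ${address.email}"
    case ErrorResult(errors)    => errors.map(_.key).mkString("invalid: ", ", ", "")
  }
}
```

There is no way to reach the EmailAddress without going through the match, so the error case cannot be silently ignored.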

However, the most significant difference is that composing functions with such signatures is much easier. In functional programming, many well-known patterns, such as Functors, Monads, and Applicatives, allow you to work with container types like ErrorOrResult.

We will not explain how this works because it is not the simplest of topics, and it is out of the scope of this blog post. Instead, we will only show an example of registerUser that uses the popular Scala Cats library. I hope this will spark your curiosity and make you explore this topic further.

def register(
    firstName: String,
    lastName: String,
    email: String): ErrorOrResult[UnverifiedUser] = {
  (UserName.create(firstName, lastName), EmailAddress.create(email))
    .mapN(saveUserInDB)
    .map(sendVerificationEmail)
}

private def saveUserInDB(
    userName: UserName,
    emailAddress: EmailAddress): UnverifiedUser = ???

private def sendVerificationEmail(user: UnverifiedUser): UnverifiedUser = ???

I am sure that readers who are not familiar with functional programming will appreciate the clarity of this implementation. Some might even find it hard to believe it works, but it does, and we have tests to prove it [6]!

You can look at the footnotes for further references [7].

It is important to note that Cats already provides a Validated type [8] and an Applicative for it. The ErrorOrResult type in this post is just for illustrative purposes.
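To give a rough idea of what the mapN call in register does, here is a hand-written combinator that needs no Cats at all. The map2 helper and the two validators are hypothetical. This sketch shows the error-accumulating behavior only; it is not how Cats implements mapN.

```scala
object CombineExample {
  final case class Error(key: String)

  sealed trait ErrorOrResult[+R]
  final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
  final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]

  // Hypothetical helper: combines two results and keeps the errors of both sides.
  def map2[A, B, C](a: ErrorOrResult[A], b: ErrorOrResult[B])(f: (A, B) => C): ErrorOrResult[C] =
    (a, b) match {
      case (SuccessResult(x), SuccessResult(y)) => SuccessResult(f(x, y))
      case (ErrorResult(e1), ErrorResult(e2))   => ErrorResult(e1 ++ e2)
      case (ErrorResult(e1), _)                 => ErrorResult(e1)
      case (_, ErrorResult(e2))                 => ErrorResult(e2)
    }

  // Placeholder validators, assumed for this sketch.
  def validateName(name: String): ErrorOrResult[String] =
    if (name.nonEmpty) SuccessResult(name) else ErrorResult(Seq(Error("invalid.names")))

  def validateEmail(email: String): ErrorOrResult[String] =
    if (email.contains("@")) SuccessResult(email) else ErrorResult(Seq(Error("invalid.email")))

  def greeting(name: String, email: String): ErrorOrResult[String] =
    map2(validateName(name), validateEmail(email))((n, e) => s"$n <$e>")
}
```

When both inputs are invalid, the result carries both error keys, which is the "show all errors at once" behavior the web form example asked for.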

Summary

This article showed how static typing helps you enforce business rules across code bases. Each demonstrated technique eliminates particular programming mistakes by making them impossible to write without a compile error.

We saw how replacing primitive types with more concrete concepts, controlling an object's construction and immutability, eliminates the need to verify its state.

Then, we introduced Algebraic Data Types and used them to model the states of our User entity. We used concrete state types everywhere we had such requirements: UnverifiedUser in our verify function and VerifiedUser when initiating a forgotten password flow.

Next, we discussed exceptions, their lack of type safety, and other inconveniences.

Finally, we showed how to include errors in function signatures, which forces developers to handle both success and error cases.

These techniques improved the type safety of our code base. They also reduced the amount of code necessary to defend from incorrect usage of our functions.

In addition, both our production and test code become more readable and self-documenting.

Note that these techniques do not substitute for automated tests, code reviews, or other good practices. Instead, they save us time and allow us to focus on the essence of the software and its business logic.

Other related topics

We have barely scratched the surface of this topic, but I hope you see how static typing can be valuable when modeling complex business logic. There is much more to it, but covering everything in a single article is impractical. Because of this, I will outline a few more subjects worth exploring.

ADTs are a powerful concept, and I suggest you look at more examples like Either, Option, Try, List, NonEmptyList, Tree, etc.
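As a taste of one of these, here is a sketch using Either from the standard library, which plays a role similar to ErrorOrResult but holds a single error. The validators and their rules are placeholders assumed for this example.

```scala
object EitherExample {
  final case class Error(key: String)

  // Placeholder validators, assumed only for this sketch.
  def validateName(name: String): Either[Error, String] =
    if (name.trim.nonEmpty) Right(name.trim) else Left(Error("invalid.names"))

  def validateEmail(email: String): Either[Error, String] =
    if (email.contains("@")) Right(email) else Left(Error("invalid.email"))

  // A for-comprehension over Either stops at the first Left.
  def contact(name: String, email: String): Either[Error, String] =
    for {
      n <- validateName(name)
      e <- validateEmail(email)
    } yield s"$n <$e>"
}
```

Unlike the error-accumulating approach shown earlier, this style reports only the first failure, which suits steps that depend on each other.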

We can think about the ErrorOrResult type as something that enhances our function by adding the ability to return errors or results. These "abilities" are often called "effects" in FP jargon. Similarly, we can add more effects, like the effect of asynchronicity or the effect of executing code in another thread. Different patterns in FP help us deal with effects and functions defined in terms of those effects. FP is a vast topic, but it is definitely worth the time to learn [9].

Custom types are great as long as they represent valuable concepts in your problem domain. Understanding requirements and then deciding what objects to have is a challenging endeavor. Domain-Driven Design (DDD) is a collection of principles, practices, and patterns that help developers craft an elegant system design closely following business requirements [10].

Dependency Injection is a handy technique to decouple code components. The most popular DI frameworks for Java are Spring and Guice. They create objects and wire dependencies at runtime. This approach minimizes boilerplate but comes at the cost of losing compile-time safety. It is also possible to do dependency injection during compilation [11] [12].

Footnotes

  1. Creating smart constructors in Scala has some caveats and depends on the language version. Check this for more details.

  2. The book Effective Java by Joshua Bloch devotes a chapter to object creation and destruction. It starts by describing the pros and cons of using static factory methods instead of object constructors.

  3. You can check the cats Eq type class.

  4. More information about value classes is available in the Scala documentation.

  5. Variance lets you control how type parameters behave with regards to subtyping.

  6. Here is the repository with our code examples.

  7. A type class is an abstract, parameterized type that lets you add new behavior to any closed type without sub-typing. It is a general and powerful pattern. This is how the mapN method is added to Scala tuples. The implementation for mapN comes from an Applicative instance that we have programmed to combine errors when multiple things fail. Check the repository to see the full examples.

  8. You can check the Cats documentation and this video.

  9. Functional Programming in Scala is a great introductory book. Scala with Cats is great for learning basic FP patterns.

  10. The book Domain Modeling Made Functional by Scott Wlaschin is an excellent introduction to DDD and how to apply it with FP.

  11. Here is a good article about DI in Scala.

  12. Dagger is a fully static, compile-time dependency injection framework for Java, Kotlin, and Android.