Developing software with complex business logic and many requirements can be challenging. In this article, we will explore different Scala programming techniques that help us manage complexity at the type level and protect us from specific coding mistakes.
Publish date: 18 Aug, 2022
Tags: scala, static typing, data modelling, error handling, validation
Author: Ventsislav Buhlev
In software engineering, complexity can come from different places. On the one hand, we have technical challenges related to programming languages, frameworks, libraries, communication protocols, hardware limitations, etc.
But on the other hand, businesses often have complex workflows with many rules, policies, and exceptions. Sometimes, these business processes are so complex that even an army of developers can have difficulty creating and maintaining a working system.
In such complex problem domains, using a statically typed programming language to model high-level business concepts is a great way to ensure code quality and improve developers' productivity in the long run.
Scala is a statically typed language with a powerful type system that makes it easy to define new concepts without much boilerplate code. It allows you to code in both object-oriented and functional programming paradigms. Another great advantage of Scala is that it runs on the JVM, which makes it possible to reuse existing Java libraries.
In this article, we will start with a simple code example and outline the problems related to it. Then, step by step, we will improve the code.
Our main goal is to employ Scala's type system and compiler to do more work for us while keeping our code concise and readable. We want to make it impossible to make certain types of programming mistakes.
To provide a more personalized experience or for other reasons, most web applications have a concept of a user and a registration flow. Usually, this is a two-phase process.
First, users provide their name, email address, and other personal data. Then, as a second step, they must verify their email by clicking on a link automatically sent to them. Until the user opens the link, the account is considered unverified, and the user is not allowed to access some or all of the application's functionality.
We can implement this flow using the following data model and function signatures.
case class User(
  id: Long,
  firstName: String,
  lastName: String,
  email: String,
  isEmailVerified: Boolean,
)
def register(firstName: String, lastName: String, email: String): User = ???
def sendVerificationEmail(user: User): Unit = ???
def verify(user: User): Unit = ???
At first glance, we have all the necessary data and functions to support the business requirements. But when we think about it, it is easy to make programming mistakes by passing invalid data.
For example, nothing prevents us from providing an invalid email address string or a first name that is two thousand characters long with no last name. It is also possible to have a user with an empty email that is marked as verified!
One way we can solve this is by applying defensive programming techniques. First, before we do any work, we validate the input, and if something is wrong, we throw an exception. If the provided data is correct, we execute the actual behavior.
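A minimal sketch of this defensive style is shown below. The email check and the 100-character name limit are assumptions made up for illustration; they are not rules taken from this article.

```scala
object DefensiveRegistration {
  final case class User(
    id: Long,
    firstName: String,
    lastName: String,
    email: String,
    isEmailVerified: Boolean
  )

  // Assumption: a naive check used only for illustration.
  private def looksLikeEmail(s: String): Boolean =
    s.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")

  // Assumption: 100 is an arbitrary limit chosen for this sketch.
  private val MaxNameLength = 100

  def register(firstName: String, lastName: String, email: String): User = {
    // Validate first; require throws IllegalArgumentException on failure.
    require(firstName.nonEmpty && firstName.length <= MaxNameLength, "invalid first name")
    require(lastName.nonEmpty && lastName.length <= MaxNameLength, "invalid last name")
    require(looksLikeEmail(email), "invalid email")
    // Only then execute the actual behavior (here: build the user with a fixed id).
    User(id = 1L, firstName = firstName, lastName = lastName, email = email, isEmailVerified = false)
  }
}
```

Every function that accepts these strings would need to repeat similar checks, which is the discipline problem discussed next.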
In addition, we can write documentation, create automated tests to verify edge cases and rely on code reviews for additional security. Although this is a lot of manual labor and requires considerable discipline, we can achieve great results with these practices.
Automated tests are a great tool to validate code's correctness and prevent regressions, but they have limitations. Most notably, tests check code only against programmed scenarios. What about other possibilities we could not think of during implementation?
People are responsible for creating and maintaining test code, which sometimes is more complicated than implementing the actual feature. In that sense, what guarantees that our tests are bug-free and will remain without defects during the project's lifetime? Have you seen colleagues fix broken unit tests by deleting or disabling them?
Imagine it is Friday, and you are heading on a long vacation with your family. At the same time, you must do a code review on a boring feature with many trivial lines of code. In such circumstances, it is hard not to fall asleep, let alone spot a problem with the code.
Another typical situation is when you are behind schedule, and your management does not want to move the release date. You do not want to be responsible for the missed deadline and will likely cut some corners. Even if you do not, such situations are very stressful, and it is easy to forget something.
Despite everyone's best efforts, manual approaches that rely on discipline are prone to errors, and eventually, mistakes will happen.
On the other hand, the compiler already checks our code for syntax errors and certain logical mistakes. For example, your code will not compile if you pass an object of the wrong type, invoke a method that does not exist, modify immutable data, etc. The compiler will not fall asleep or forget a rule while checking your code, which makes it a very reliable tool.
The first thing we will do is replace primitive types with higher-level definitions that represent concepts from our business domain. Our implementation will rely on two simple ideas: validating data at construction time and keeping objects immutable.
Put otherwise, if we can create objects only with correct data, and if their fields do not change, they will always remain valid. We can be sure of an object's correctness, and whenever we use it, we do not have to write any code to verify its state.
Consider this example:
sealed abstract case class EmailAddress(email: String)
object EmailAddress {
  def create(email: String): EmailAddress = {
    require(isValid(email))
    new EmailAddress(email) {}
  }
  private def isValid(email: String): Boolean = ???
}
Let's break it down:
- sealed makes the class extendable only in this file.
- case class makes its fields immutable and tells the compiler to generate additional methods such as hashCode, equals, copy, and the apply constructor in the companion object.
- Combining abstract with case class instructs the compiler not to generate the copy method or the apply method in the companion object.
- require() throws an IllegalArgumentException if the provided argument evaluates to false.

Effectively, this means that the only way we can create an instance of this class is through the EmailAddress.create factory method, which throws an exception if we provide an invalid email string. This pattern is called a smart constructor [1] and has additional benefits [2].
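To see the smart constructor in action, here is a small self-contained sketch. The body of isValid is an assumption: a deliberately naive pattern that stands in for whatever validation rule a real project would use.

```scala
object SmartConstructorDemo {
  sealed abstract case class EmailAddress(email: String)

  object EmailAddress {
    def create(email: String): EmailAddress = {
      require(isValid(email), s"invalid email: $email")
      new EmailAddress(email) {}
    }

    // Assumption: a deliberately naive check, standing in for real validation.
    private def isValid(email: String): Boolean =
      email.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")
  }

  // The factory method is the supported way to obtain an instance.
  val ok: EmailAddress = EmailAddress.create("jane@example.com")

  // These lines would not compile:
  // EmailAddress("oops")        // no apply method is generated
  // new EmailAddress("oops")    // the class is abstract
  // ok.copy(email = "oops")     // no copy method is generated

  // Invalid input is rejected at runtime with an IllegalArgumentException.
  val rejected: Boolean =
    scala.util.Try(EmailAddress.create("oops")).isFailure
}
```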
Similarly, we can define more types and replace the primitives with them.
sealed abstract case class UserId(value: Long)
sealed abstract case class UserName(first: String, last: String)
case class User(
  id: UserId,
  name: UserName,
  email: EmailAddress,
  isEmailVerified: Boolean,
)
With this definition, we do not have to worry about instances with invalid names or emails. Also, we do not have to write additional code to protect us from such errors. The only place we do this is in the factory methods.
Notice that we also use a custom type to represent the user's identifier. You might think this is too much, but consider this example.
case class CreditAccount(id: Long, userId: Long)
def loadUser(id: Long) = ???
val account: CreditAccount = CreditAccount(1, 2)
loadUser(account.id) // bug: compiles, but it should be account.userId
val combinedId = account.userId * account.id // compiles, but makes no sense
The first error, an obvious typo, happens more often than you think. I have seen instances of this problem pass unit tests and code reviews!
Next, for the second error, you might wonder who would ever try to do arithmetic with identifiers. Unfortunately, it happens sometimes, and there is nothing to prevent people from shooting themselves in the foot. If something does not make sense, it is better to make it impossible to represent it in the code.
Actually, in the user id case, we might want only to "wrap" the primitive type without any validation during construction. Scala's value classes [4] allow us to define such wrappers, which exist mostly at compile time. At runtime, in most cases, we have only the wrapped value. To use this feature, we can declare the UserId type like this:
case class UserId(value: Long) extends AnyVal
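The sketch below revisits the CreditAccount example with such wrappers. AccountId is a hypothetical second identifier type introduced only for contrast, and loadUser is a stub.

```scala
object ValueClassDemo {
  final case class UserId(value: Long) extends AnyVal
  // AccountId is a hypothetical second identifier, added for contrast.
  final case class AccountId(value: Long) extends AnyVal

  final case class CreditAccount(id: AccountId, userId: UserId)

  // Stub: returns a label instead of loading anything.
  def loadUser(id: UserId): String = s"user-${id.value}"

  val account: CreditAccount = CreditAccount(AccountId(1L), UserId(2L))

  val loaded: String = loadUser(account.userId)

  // These lines would not compile:
  // loadUser(account.id)          // type mismatch: AccountId is not a UserId
  // account.userId * account.id   // UserId has no * method
}
```

Both mistakes from the earlier example now fail at compile time instead of slipping through tests and reviews.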
With our new definition, when implementing sendVerificationEmail(user: User), we no longer have to worry about whether the user's email is valid. However, we still have to verify that the field isEmailVerified is false. Sending a verification email to a verified user is a logical mistake.
Before we see how to solve this on a type level, let us first introduce Algebraic Data Types.
Sometimes, it is more convenient to think about types as sets. For example, the string type defines the set of all possible strings. Our EmailAddress forms a subset containing only valid email addresses.
We can define new types by applying the Cartesian product to existing sets. The product UserId x UserName x EmailAddress x Boolean gives us our User type. In Scala, we can represent product types with case classes or tuples. Usually, a case class is more convenient to work with, but we can define the same thing with a tuple.
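As a rough illustration, the same product can be written both ways. The component types here are simplified stand-ins without validation, used only to keep the sketch self-contained.

```scala
object ProductTypeDemo {
  // Simplified stand-ins for the article's types (no validation here).
  final case class UserId(value: Long)
  final case class UserName(first: String, last: String)
  final case class EmailAddress(email: String)

  // The product as a case class: fields have names.
  final case class User(id: UserId, name: UserName, email: EmailAddress, isEmailVerified: Boolean)

  // The same product as a tuple: fields are accessed by position.
  type UserTuple = (UserId, UserName, EmailAddress, Boolean)

  val asClass: User =
    User(UserId(1L), UserName("Jane", "Doe"), EmailAddress("jane@example.com"), isEmailVerified = false)

  val asTuple: UserTuple =
    (asClass.id, asClass.name, asClass.email, asClass.isEmailVerified)
}
```

Both values carry the same information; the case class simply gives each component a meaningful name.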
Another way we can compose types is by making a union between sets. As an example, consider JSON values. The collection of all valid JSON values consists of all strings, numbers, booleans, arrays, JSON objects, and the null object. These are called sum or union types, and in Scala, we can define them with this syntax:
sealed trait JsValue
case object JsNull extends JsValue
final case class JsString(s: String) extends JsValue
final case class JsNumber(num: Double) extends JsValue
final case class JsBool(value: Boolean) extends JsValue
final case class JsObject(map: Map[String, JsValue]) extends JsValue
final case class JsArray(arr: List[JsValue]) extends JsValue
Let's break down this example:
- The final keyword forbids further extension of the case classes.
- The sealed keyword makes this trait extensible only in this file.

Effectively, this means that all JsValue variants exist only in this file. As a result, the compiler has all the necessary information to assist us when we want to handle all possibilities of this hierarchy.
In this example, JsValue is just a marker interface, but if we want, we can add methods to it. They can be either abstract or have an implementation.
Notice also that JsObject and JsArray are defined recursively and use JsValue in their definition, which follows the JSON format's specification. This feature allows us to represent other recursive data structures like Trees and Lists.
Scala has a switch-like construct called pattern matching, which is very useful when working with union types.
def serialize(v: JsValue): String = v match {
  case JsNull => ???
  case JsString(s) => ???
  case JsNumber(num) => ???
  case JsBool(value) => ???
  case JsObject(map) => ???
  case JsArray(arr) => ???
}
With this construct, we get a compiler warning if we do not handle all possible cases of a union type. We can also configure the compiler to treat these as errors to prevent compilation from succeeding.
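For completeness, here is one possible way to fill in the placeholders. It is only a sketch: the quote helper is hypothetical and escapes nothing but backslashes and double quotes, so it is not a complete JSON serializer.

```scala
object JsonDemo {
  sealed trait JsValue
  case object JsNull extends JsValue
  final case class JsString(s: String) extends JsValue
  final case class JsNumber(num: Double) extends JsValue
  final case class JsBool(value: Boolean) extends JsValue
  final case class JsObject(map: Map[String, JsValue]) extends JsValue
  final case class JsArray(arr: List[JsValue]) extends JsValue

  // Simplification: only backslashes and double quotes are escaped.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def serialize(v: JsValue): String = v match {
    case JsNull        => "null"
    case JsString(s)   => quote(s)
    case JsNumber(num) => num.toString
    case JsBool(value) => value.toString
    case JsObject(map) =>
      map.map { case (k, value) => quote(k) + ":" + serialize(value) }.mkString("{", ",", "}")
    case JsArray(arr)  => arr.map(serialize).mkString("[", ",", "]")
  }
}
```

Deleting any one of the cases from the match should make the compiler report that the match may not be exhaustive.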
The set of all primitive types, together with the operations for composing them, forms an "algebra" in a mathematical sense. Hence, the types derived through union and product are called Algebraic Data Types.
Now, let's get back to our business requirements, where we have the concept of verified and unverified users. We can think about this in terms of a simple state machine. When created, users start in an unverified state. A user can transition to the verified state only by clicking on the link.
We can define User as a union between verified and unverified users.
sealed trait User
final case class VerifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User
final case class UnverifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User
Then we can use the more specific types to "embed" the business requirements directly into the function signatures.
def register(firstName: String, lastName: String, email: String): UnverifiedUser = ???
def sendVerificationEmail(user: UnverifiedUser): Unit = ???
def verify(user: UnverifiedUser): VerifiedUser = ???
def resetPassword(user: VerifiedUser): Unit = ???
Now, if our code compiles, we can feel more confident that things work as expected. As a bonus, our code has become more readable, and the types serve as documentation. That is cool, considering we have not written that much code.
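The following sketch shows how these signatures guide the caller. The component types are simplified stand-ins without validation, and resetPassword is a stub that returns a description instead of sending anything; both are assumptions made to keep the example runnable.

```scala
object UserStateDemo {
  // Simplified stand-ins: plain wrappers without validation.
  final case class UserId(value: Long)
  final case class UserName(first: String, last: String)
  final case class EmailAddress(email: String)

  sealed trait User
  final case class UnverifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User
  final case class VerifiedUser(id: UserId, name: UserName, email: EmailAddress) extends User

  // The transition from the unverified to the verified state.
  def verify(user: UnverifiedUser): VerifiedUser =
    VerifiedUser(user.id, user.name, user.email)

  // Stub: returns a description instead of sending anything.
  def resetPassword(user: VerifiedUser): String =
    s"reset link sent to ${user.email.email}"

  val pending: UnverifiedUser =
    UnverifiedUser(UserId(1L), UserName("Jane", "Doe"), EmailAddress("jane@example.com"))

  val message: String = resetPassword(verify(pending))

  // This line would not compile:
  // resetPassword(pending)   // type mismatch: UnverifiedUser is not a VerifiedUser
}
```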
So far, we have seen that by replacing primitive types with concepts from the business domain, we can skip some error handling code and rely on the language compiler to prevent undesired behavior.
Nevertheless, we cannot eliminate all error handling. At some point, we will have to deal with potentially invalid data from the outside world.
Let's get back to our factory method that creates email addresses. It looks like this:
def create(email: String): EmailAddress = {
  require(isValid(email)) // throws exception
  new EmailAddress(email) {}
}
The method throws an illegal argument exception when we pass an invalid email address string. It is a standard way of handling data validation, but as we shall see, there are other ways. First, let us discuss some problems related to exceptions.
The biggest downside is that exceptions are not type-safe! Nothing in the method's signature tells us which errors might get thrown. The only thing we can do is describe exceptions in the documentation. Also, we cannot force others to handle these exceptions. We have to rely on their discipline to read the documentation and catch the exceptions because the compiler will not do anything about it.
Exceptions break the normal flow of control. They "bubble" up the call stack until a try-catch block catches them. Often this is far from where the exception originated, making it difficult to reason about the issue and recover from it.
In addition, exception handling can add a lot of boilerplate to our code. Imagine validating a web form. The standard behavior is to perform the validation when the form is submitted and then show all errors at once. With exceptions, we will have to use multiple try-catch blocks for each input field and then combine the errors into a list. In the end, if the list is empty, we want to execute the actual logic. Otherwise, we will have to throw another exception with the combined errors. Maybe the code will end up similar to this:
def registerUser(firstName: String, lastName: String, email: String): UnverifiedUser = {
  // Collect one error key per failed field.
  var errors: Seq[String] = Seq.empty
  var userEmail: EmailAddress = null
  try {
    userEmail = EmailAddress.create(email)
  } catch {
    case _: IllegalArgumentException => errors :+= "invalid.email"
  }
  var name: UserName = null
  try {
    name = UserName.create(firstName, lastName)
  } catch {
    case _: IllegalArgumentException => errors :+= "invalid.names"
  }
  if (errors.nonEmpty) {
    // Report all collected errors at once.
    throw new InvalidFormError(errors)
  } else {
    val unverifiedUser = saveInDB(name, userEmail)
    sendVerificationEmail(unverifiedUser)
    unverifiedUser
  }
}
I have to admit that this is not the best boilerplate illustration simply because most modern frameworks have some way of dealing with data validation. Nevertheless, I am confident that everybody knows what I mean.
Despite their deficiencies, exceptions are a helpful tool when used appropriately. I am not against them in general. I only want to present other options to the reader.
Conceptually, there is nothing "exceptional" when users enter invalid data. On the contrary, we expect such mistakes, and a good software application is supposed to show error feedback and assist the user. Since this is normal, why don't we model this with types and include it in our functions' signatures?
To keep it simple, we will represent errors as a simple case class that holds a string error key. Depending on your business requirements, there might be a more suitable representation. For example, sometimes it is convenient to have all your errors modeled as an ADT and to use pattern matching.
final case class Error(key: String)
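If you prefer the ADT representation mentioned above, it could look roughly like this. The error cases and messages are hypothetical, invented for this sketch.

```scala
object ErrorAdtDemo {
  // Hypothetical error cases, invented for this sketch.
  sealed trait RegistrationError
  case object InvalidEmail extends RegistrationError
  final case class NameTooLong(maxLength: Int) extends RegistrationError
  case object MissingLastName extends RegistrationError

  // Because the trait is sealed, the compiler can check this match for exhaustiveness.
  def message(error: RegistrationError): String = error match {
    case InvalidEmail           => "Please enter a valid email address."
    case NameTooLong(maxLength) => s"Names may have at most $maxLength characters."
    case MissingLastName        => "Please enter your last name."
  }
}
```

Compared to a string key, each case can carry its own data, such as the maximum length here. The rest of this article continues with the simpler Error(key) representation.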
To include errors in our type signatures, we have two options. We can either add more arguments or change the return type.
Adding success and error callback arguments will look like this:
def createEmail(
  email: String,
  onSuccess: EmailAddress => Unit,
  onError: Error => Unit): Unit = ???
This way, the compiler will force us to provide success and failure callbacks whenever we create an email. In addition, we know the error type.
Unfortunately, we have paid a great price for that. Our API has become more cumbersome to use. Instead of one parameter, now we have three, and all functions return no results!
Imagine how registerUser will look with this callback style. We will have to change its signature similarly, and then the implementation will have to provide two callbacks to both EmailAddress.create and UserName.create. The code should end up very similar to our exceptions-based implementation.
Overall this programming style is not very convenient and can easily lead to something called callback hell, which does not sound very pleasant.
Our other option is to return errors as part of the result. We can do this by defining an ADT that is a union between success and error cases.
sealed trait ErrorOrResult[+R]
final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]
If you are still following, you will notice some new syntax, which I will try to explain very briefly.
In Scala, we can define generic types that take types as parameters within square brackets. We also call them type constructors because they are used to create specific types. For example, we can construct List[Int] and List[User] using the List generic.
By default, there is no relation between the constructed types, although one might exist between the type parameters. For instance, Int is a subtype of AnyVal, but with a plain declaration such as class List[T], there would be no relationship between List[Int] and List[AnyVal]. In Scala, we can change this simply by prepending a plus symbol to the type parameter in the declaration: class List[+T]. That way, we preserve the subtype relationship for the generic instances, and now List[Int] is a subtype of List[AnyVal].
In technical terms, we have used a variance annotation to make the type parameter covariant [5].
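A short sketch can make the effect of the plus symbol concrete. InvariantBox and CovariantBox are hypothetical containers defined only to contrast the two declarations; with the plus symbol the type parameter is covariant, so subtyping carries over to the container.

```scala
object VarianceDemo {
  // Hypothetical containers, defined only to contrast the two declarations.
  final case class InvariantBox[T](value: T)
  final case class CovariantBox[+T](value: T)

  val intBox: CovariantBox[Int] = CovariantBox(42)

  // Compiles because Int is a subtype of AnyVal and T is covariant.
  val anyValBox: CovariantBox[AnyVal] = intBox

  val invariantIntBox: InvariantBox[Int] = InvariantBox(42)

  // This line would not compile, because InvariantBox is invariant in T:
  // val broken: InvariantBox[AnyVal] = invariantIntBox
}
```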
Notice that our failure case does not have a type parameter and extends the base trait with a fixed type. Nothing is a special type that has no instances and is a subtype of every other type in Scala. Because of this and the covariant type parameter, we can pass ErrorResult instances everywhere we need an ErrorOrResult, no matter the actual type parameter.
Now we can modify createEmail to return an error or result of EmailAddress. The body is pretty much the same, except that the error is returned instead of thrown as an exception.
def createEmail(email: String): ErrorOrResult[EmailAddress] = {
  if (isValid(email))
    SuccessResult(new EmailAddress(email) {})
  else
    ErrorResult(Seq(Error("errors.invalid.email")))
}
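Here is a self-contained sketch of how a caller might consume such a result. The isValid body is a deliberately naive assumption, and describe is a hypothetical helper added only to show the pattern match.

```scala
object ErrorOrResultDemo {
  final case class Error(key: String)

  sealed trait ErrorOrResult[+R]
  final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
  final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]

  sealed abstract case class EmailAddress(email: String)

  // Assumption: a deliberately naive validity check.
  private def isValid(email: String): Boolean = email.contains("@")

  def createEmail(email: String): ErrorOrResult[EmailAddress] =
    if (isValid(email)) SuccessResult(new EmailAddress(email) {})
    else ErrorResult(Seq(Error("errors.invalid.email")))

  // Hypothetical helper: the caller must handle both cases to get a value out.
  def describe(input: String): String = createEmail(input) match {
    case SuccessResult(address) => s"valid: ${address.email}"
    case ErrorResult(errors)    => errors.map(_.key).mkString("invalid: ", ", ", "")
  }
}
```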
At first glance, the type signature looks much better. Similar to the callback example, the compiler will force us to handle both cases when we deal with the result.
However, the most significant difference is that composing functions with such signatures is much easier. In functional programming, many well-known patterns allow you to work with container types like ErrorOrResult. Functors, Monads, and Applicatives, to name a few.
We will not explain how this works because it is not the simplest of topics, and it is out of the scope of this blog post. Instead, we will only show an example of registerUser that uses the popular Scala Cats library. I hope this will spark your curiosity and make you explore this topic further.
import cats.syntax.all._ // provides mapN and map; an Applicative for ErrorOrResult must be in implicit scope

def register(
  firstName: String,
  lastName: String,
  email: String): ErrorOrResult[UnverifiedUser] = {
  // Here, both create methods are assumed to return ErrorOrResult values.
  (UserName.create(firstName, lastName), EmailAddress.create(email))
    .mapN(saveUserInDB)
    .map(sendVerificationEmail)
}
private def saveUserInDB(
  userName: UserName,
  emailAddress: EmailAddress): UnverifiedUser = ???
private def sendVerificationEmail(user: UnverifiedUser): UnverifiedUser = ???
I am sure that readers who are not familiar with functional programming will appreciate the clarity of this implementation. Some might even find it hard to believe it works, but it does, and we have tests to prove it [6]!
You can look at the footnotes for further references [7].
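To give a feeling for what happens behind the scenes, here is a library-free sketch of error accumulation. The map2 helper is hypothetical and written by hand for this illustration; it is not how Cats implements mapN, and the two validation functions are naive stand-ins for the article's factories.

```scala
object AccumulateDemo {
  final case class Error(key: String)

  sealed trait ErrorOrResult[+R]
  final case class ErrorResult(errors: Seq[Error]) extends ErrorOrResult[Nothing]
  final case class SuccessResult[+R](result: R) extends ErrorOrResult[R]

  // Hypothetical helper: combines two results and keeps the errors from both.
  def map2[A, B, C](a: ErrorOrResult[A], b: ErrorOrResult[B])(f: (A, B) => C): ErrorOrResult[C] =
    (a, b) match {
      case (SuccessResult(x), SuccessResult(y)) => SuccessResult(f(x, y))
      case (ErrorResult(e1), ErrorResult(e2))   => ErrorResult(e1 ++ e2)
      case (ErrorResult(e), _)                  => ErrorResult(e)
      case (_, ErrorResult(e))                  => ErrorResult(e)
    }

  // Naive stand-ins for the article's validating factories.
  def validateName(first: String, last: String): ErrorOrResult[String] =
    if (first.nonEmpty && last.nonEmpty) SuccessResult(s"$first $last")
    else ErrorResult(Seq(Error("invalid.names")))

  def validateEmail(email: String): ErrorOrResult[String] =
    if (email.contains("@")) SuccessResult(email)
    else ErrorResult(Seq(Error("invalid.email")))

  def register(first: String, last: String, email: String): ErrorOrResult[String] =
    map2(validateName(first, last), validateEmail(email))((name, mail) => s"$name <$mail>")
}
```

When both validations fail, the result contains both errors, which is the behavior we wanted for form validation, with no try-catch blocks or mutable variables.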
It is important to note that Cats already provides a Validated type [8] and an Applicative for it. The ErrorOrResult type in this post is just for illustrative purposes.
This article showed how static typing helps you enforce business rules across code bases. Each demonstrated technique eliminates particular programming mistakes by making them impossible to write without a compile error.
We saw how replacing primitive types with more concrete concepts and controlling an object's construction and immutability eliminate the need to verify its state.
Then, we introduced Algebraic Data Types and used them to model the states of our User entity. We used concrete state types everywhere we had such requirements: UnverifiedUser in our verify function and VerifiedUser when initiating a forgotten password flow.
Next, we discussed exceptions, their lack of type safety, and other inconveniences.
Finally, we showed how to include errors in function signatures, which forces developers to handle both success and error cases.
These techniques improved the type safety of our code base. They also reduced the amount of code necessary to defend from incorrect usage of our functions.
In addition, both our production and test code become more readable and self-documenting.
Note that these techniques do not substitute for automated tests, code reviews, or other good practices. Instead, they save us time and allow us to focus on the essence of the software and its business logic.
We have barely scratched the surface of this topic, but I hope you see how static typing can be valuable when modeling complex business logic. There is much more to it, but covering everything in a single article is not feasible. Because of this, I will outline a few more subjects worth exploring.
ADTs are a powerful concept, and I suggest you look at more examples like Either, Option, Try, List, NonEmptyList, Tree, etc.
We can think about the ErrorOrResult type as something that enhances our function by adding the ability to return errors or results. These "abilities" are often called "effects" in FP jargon. Similarly, we can add more effects, like the effect of asynchronicity or the effect of executing code in another thread. Different patterns in FP help us deal with effects and functions defined in terms of those effects. FP is a vast topic, but it is definitely worth the time to learn [9].
Custom types are great as long as they represent valuable concepts in your problem domain. Understanding requirements and then deciding what objects to have is a challenging endeavor. Domain-Driven Design (DDD) is a collection of principles, practices, and patterns that help developers craft an elegant system design closely following business requirements [10].
Dependency Injection is a handy technique to decouple code components. The most popular DI frameworks for Java are Spring and Guice. They create objects and wire dependencies at runtime. This approach minimizes boilerplate but comes at the cost of losing compile time safety. It is also possible to do dependency injection during compilation [11] [12].
1. Creating smart constructors in Scala has some caveats and depends on the language version. Check this for more details.
2. Chapter 2 of the book Effective Java by Joshua Bloch is devoted to object creation and destruction. It starts by describing the pros and cons of using static factory methods instead of object constructors.
3. You can check the Cats Eq type class.
4. More information about value classes is available in Scala's documentation.
5. Variance lets you control how type parameters behave with regard to subtyping.
6. Here is the repository with our code examples.
7. A type class is an abstract, parameterized type that lets you add new behavior to any closed type without sub-typing. It is a general and powerful pattern. This is how the mapN method is added to Scala tuples. The implementation for mapN comes from an Applicative instance that we have programmed to combine errors when multiple things fail. Check the repository to see the full examples.
8. You can check the Cats documentation and this video.
9. Functional Programming in Scala is a great introductory book. Scala with Cats is great for learning basic FP patterns.
10. The book Domain Modeling Made Functional by Scott Wlaschin is an excellent introduction to DDD and how to apply it with FP.
11. Here is a good article about DI in Scala.
12. Dagger is a fully static, compile-time dependency injection framework for Java, Kotlin, and Android.