Self-validating Domain Model

If you have ever faced the dilemma of where to put validation logic, when to validate, and what should be validated, the tips and practical experience I share here can help you establish a viable validation strategy for a system of any scale.

This post primarily deals with the validation of business rules that target individual values, for example: "Username must be between 3 and 20 characters long". The common term for this type of pure, stateless validation is input validation. Other business rules define whether a value is acceptable in the broader context in which it is used. "Username must be unique" is an example of such a rule, but validating rules of that kind is not the focus of this post.
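To make the distinction concrete, here is a minimal sketch. The function names and the `$takenUsernames` array are illustrative assumptions, not part of any library: the first check depends only on the value itself, while the second needs external state.

```php
<?php

declare(strict_types=1);

// Input validation: pure and stateless, depends only on the value itself.
// strlen() counts bytes, which is enough for this ASCII-only illustration.
function hasValidUsernameLength(string $username): bool
{
    $length = strlen($username);

    return $length >= 3 && $length <= 20;
}

// Contextual business rule: needs external state. A plain array stands in
// for what would normally be a database lookup.
function isUsernameAvailable(string $username, array $takenUsernames): bool
{
    return !in_array($username, $takenUsernames, true);
}
```

The first function can be called anywhere without any setup, which is what makes this kind of rule a good fit for the value objects discussed in this post.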

Client-side or server-side validation?

You have probably encountered this question many times, but it is a false choice: you should do both. These two types of validation are neither in conflict nor mutually exclusive; they are complementary.

Validation in server code is mandatory, because we should never trust the UI. Attackers can easily bypass JavaScript or the entire UI and submit malicious data directly to the server.

But client-side validation is equally important and useful. It is a great convenience for users because it gives instant feedback on the data entered, and it avoids unnecessary round trips to the server by stopping invalid data from being submitted in the first place.

Form validation is UI validation

Many proponents of server-side-only validation use the DRY (Don't Repeat Yourself) principle as their main argument. That may be good reasoning, but only if some important considerations are kept in mind.

The browser is just one of the interfaces, or ports, through which your system can be used. The same input validation rules must apply to all the others as well. For example, the registration form on a website may be only one of several ways to create users, alongside an API, a command-line tool, or other tools for creating and importing users.

In the web context, form validation is often considered the ultimate validation, on both ends. But a form is a UI concern, so form validation should not be regarded as server-side validation at all. Also, forms are a feature of CRUD-style user interfaces, while modern web and mobile UIs have been trending toward task-based designs that let the user perform a specific action, such as "Mark Todo as Done", "Set a Reminder", or "Approve Friend Request". Forms are not as common an interface element as they may seem.

Most importantly, domain models are typically much more complex than view models and rather different from them, so trying to bind a form directly to the domain model makes little sense.

Some people will find this difficult to hear, but unless you are building a traditional server-side rendered application, components such as Symfony Form or Laminas (previously Zend) Form have no place on the back-end, especially not for validation purposes.

So how do we validate input on the server side?

Validation at the domain model level

The systems we build can have complex business logic, so we usually resort to domain modeling to abstract and organize that complexity, creating a web of interconnected objects where each object represents some meaningful unit. But along the way, we somehow forget to appropriately model simple values that have special meaning, such as an email address, a username, money, or a postal address. Consequently, the logic for validating these values is typically scattered and duplicated throughout the code base, wherever such a value is handled.

This phenomenon has inspired some critics to characterize it as a code smell called Primitive Obsession. Although I think the criticism is justified, I find the name itself too harsh. We are not obsessed with primitives; we simply neglect the use of objects by taking the shortcuts that are available.

Just because we can represent a certain concept as a primitive type does not mean that we always should. An email address is not a string. Its textual representation is string-like and can be cast to a string, but it is a well-defined structure made up of a mailbox name, an @ symbol, and a case-insensitive domain. A username is not a string either: it is not just any text, but is usually defined by business rules for length and allowed characters. All these concepts carry a special meaning of their own, so they should be modeled accordingly.

Instead of representing some piece of domain knowledge as primitive type, make it a custom type, or in Domain-Driven Design (DDD) terminology, turn it into a self-validating Value Object that encapsulates all the business rules in a single place.

Here's what the definition of a Username value object might look like:

final class Username
{
    private string $username;

    private function __construct(string $username)
    {
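        // \A and \z anchor the whole string; unlike $, \z does not allow a trailing newline.
        if (!preg_match('/\A[a-z0-9_-]{3,20}\z/', $username)) {
            throw new \InvalidArgumentException('Username must be a lowercase alphanumeric string that may include "_" and "-", having a length of 3 to 20 characters');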
        }

        $this->username = $username;
    }

    public static function fromString(string $username): self
    {
        return new self($username);
    }

    public function toString(): string
    {
        return $this->username;
    }
}

Validation happens at construction time through guard clauses that immediately raise an exception if the passed value fails one or more criteria. This, in a way, answers the question of where validation should live.

Self-validation, together with another important characteristic of value objects, immutability, guarantees that a value object is valid for its entire lifetime. We no longer have to worry about whether a parameter still needs to be validated or has already been validated, or about where to validate it without causing duplication.

In terms of effort and time required, creating value objects for all primitive types may seem like over-engineering. But don't forget that with a primitive string, you still need to write validation logic and apply it consistently to all necessary places in the code.
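The email address mentioned earlier could be modeled the same way. What follows is only a sketch under a few assumptions of mine: it relies on PHP's built-in `FILTER_VALIDATE_EMAIL` filter for the guard clause, normalizes the domain to lowercase, and keeps the mailbox part case-sensitive. The `equals()` method is an illustrative addition, not something the examples in this post require.

```php
<?php

declare(strict_types=1);

final class EmailAddress
{
    private string $mailbox;
    private string $domain;

    private function __construct(string $email)
    {
        // Guard clause: reject anything PHP's built-in filter does not accept.
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new \InvalidArgumentException('Invalid email address');
        }

        // Split on the last "@"; the domain is case-insensitive, so normalize it.
        $atPosition = strrpos($email, '@');
        $this->mailbox = substr($email, 0, $atPosition);
        $this->domain = strtolower(substr($email, $atPosition + 1));
    }

    public static function fromString(string $email): self
    {
        return new self($email);
    }

    // Two addresses are equal when mailbox and normalized domain both match.
    public function equals(self $other): bool
    {
        return $this->mailbox === $other->mailbox && $this->domain === $other->domain;
    }

    public function toString(): string
    {
        return $this->mailbox . '@' . $this->domain;
    }
}
```

With this in place, the comparison and normalization rules live in one class instead of being repeated wherever two email strings are compared.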

Assertions

When they hear the word "assert" or "assertion", most developers think of testing, specifically the operation represented in xUnit testing frameworks such as PHPUnit. However, the same term exists in the context of validation, where assertions are a more convenient way to implement guard clauses for input validation, by writing expressive statements instead of if/throw structures.

The PHP ecosystem is known for having multiple libraries that solve the same problem. Assertions are no exception, and there are two libraries that I know of:

beberlei/assert
webmozart/assert

The second one seems to have been born in response to some shortcomings of the original library, but I personally favor Beberlei's Assert and I think it's quite solid.

Here is a refined version of the Username constructor:

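use Assert\Assertion;

final class Username
{
    private string $username;

    private function __construct(string $username)
    {
        // A single assertion replaces the if/throw guard clause.
        Assertion::regex($username, '/\A[a-z0-9_-]{3,20}\z/', 'Username must be a lowercase alphanumeric string that may include "_" and "-", having a length of 3 to 20 characters');

        $this->username = $username;
    }

    // fromString() and toString() are unchanged from the previous version.
}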

The Username example may not be the most convincing demonstration because it has only one guard clause, but the main benefit of assertions is that they significantly reduce the amount of code needed to implement input validation in your models. The list of built-in assertions is also extensive, and it can be extended.

Custom Assertion class

Although assertions are pure, stateless, general-purpose functions, I prefer to create my own Assertion class for these reasons:

more control over the exception type that gets raised,
a "pure" domain, free of direct coupling to the library,
the ability to write domain-specific assertions,
protection from potential BC breaks in the library.

Beberlei's Assert supports this through subclassing, which also lets me override the exception that is thrown:

namespace App\User;

use Assert\Assertion;
use App\User\Exception\InvalidUserInput;

class UserAssertion extends Assertion
{
    protected static $exceptionClass = InvalidUserInput::class;
}

Custom exception type:

namespace App\User\Exception;

use Assert\InvalidArgumentException;

final class InvalidUserInput extends InvalidArgumentException implements UserException
{
}

The decision whether to create one generic custom Assertion class or one per domain concept is always context-dependent, and is influenced by how much customization you need and by the granularity of exception types you want to have.
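As for domain-specific assertions, the idea can be illustrated without the library at all. The following is a library-free sketch, not Beberlei's API: the hand-rolled class stands in for a library-backed one, and `keysExists()` is a hypothetical check that mirrors the call used in the RegisterUser command later in this post. Whether my own version is implemented like this is beside the point; the shape is what matters.

```php
<?php

declare(strict_types=1);

// Hypothetical domain exception, standing in for a library-backed one.
final class InvalidUserInput extends \InvalidArgumentException
{
}

// Hypothetical hand-rolled assertion class with one domain-specific check.
final class UserAssertion
{
    /**
     * Asserts that every required key is present in the payload.
     *
     * @param array<string, mixed> $payload
     * @param string[]             $requiredKeys
     */
    public static function keysExists(array $payload, array $requiredKeys): void
    {
        // Collect the required keys that the payload does not contain.
        $missing = array_diff($requiredKeys, array_keys($payload));

        if ($missing !== []) {
            throw new InvalidUserInput('Missing required keys: ' . implode(', ', $missing));
        }
    }
}
```

Because the check raises the domain's own exception type, calling code can catch InvalidUserInput without knowing which assertion produced it.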

Value Objects Obsession

To fully switch to this new approach of representing simple domain concepts, the key is to strictly adhere to the use of value objects for properties, constructor parameters, method parameters, and so on. Entities are typically composed of value objects, which makes them a great example of fully embracing type safety:

class User
{
    protected UserId $id;
    protected Username $username;
    protected EmailAddress $emailAddress;
    protected DateTimeImmutable $createdAt;

    final protected function __construct(UserId $id, Username $username, EmailAddress $emailAddress, DateTimeImmutable $createdAt)
    {
        $this->id = $id;
        $this->username = $username;
        $this->emailAddress = $emailAddress;
        $this->createdAt = $createdAt;
    }

    public static function new(Username $username, EmailAddress $emailAddress): self
    {
        return new static(UserId::generate(), $username, $emailAddress, new DateTimeImmutable());
    }
}

Just as with value objects, this design ensures that the entity will always be created in a valid state, regardless of the context and the part of the system where it is used.

Gluing all the pieces

Despite this complete obsession with value objects, there must be a part of the code in which primitive values from the raw data submitted to the server are converted into value objects.

Commands are an ideal mechanism for abstracting use cases and the place to put the conversion logic:

class RegisterUser
{
    protected Username $username;
    protected EmailAddress $email;

    public function __construct(array $payload)
    {
        // keysExists() is assumed to be a custom assertion defined on UserAssertion.
        UserAssertion::keysExists($payload, [
            'username',
            'email',
        ]);
        $this->username = Username::fromString($payload['username']);
        $this->email = EmailAddress::fromString($payload['email']);
    }

    public function username(): Username
    {
        return $this->username;
    }

    public function email(): EmailAddress
    {
        return $this->email;
    }
}

You can handle this command directly in some action controller, or better yet, capture this procedure in a dedicated command handler which may also perform some additional business rule validation:

class RegisterUserHandler
{
    private UserRepository $userRepository;
    private UniqueUsernameChecker $uniqueUsernameChecker;

    public function __construct(UserRepository $userRepository, UniqueUsernameChecker $uniqueUsernameChecker)
    {
        $this->userRepository = $userRepository;
        $this->uniqueUsernameChecker = $uniqueUsernameChecker;
    }

    public function handle(RegisterUser $command): void
    {
        if ($this->uniqueUsernameChecker->exists($command->username())) {
            throw UsernameTaken::for($command->username());
        }

        $user = User::new($command->username(), $command->email());

        $this->userRepository->save($user);
    }
}
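The `UsernameTaken::for()` call above suggests an exception with a named constructor. Its definition is not shown in this post, so the following is only a guess at its shape. To keep the sketch self-contained it accepts a plain string, whereas the handler above passes a Username object; the base class and the message wording are assumptions as well.

```php
<?php

declare(strict_types=1);

// Hypothetical definition; the original class is not shown in this post.
final class UsernameTaken extends \DomainException
{
    // The named constructor keeps message formatting in one place.
    public static function for(string $username): self
    {
        return new self(sprintf('Username "%s" is already taken', $username));
    }
}
```

Unlike the guard clauses in value objects, this exception signals a violated contextual rule, so it is raised by the handler rather than by the Username itself.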

The command handler can then be used in different contexts, such as a web action:

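class RegisterAction implements RequestHandlerInterface
{
    private RegisterUserHandler $registerUserHandler;

    public function __construct(RegisterUserHandler $registerUserHandler)
    {
        $this->registerUserHandler = $registerUserHandler;
    }

    public function handle(ServerRequestInterface $request): ResponseInterface
    {
        // getParsedBody() may return null or an object, so normalize it to an array.
        $payload = (array) $request->getParsedBody();

        $this->registerUserHandler->handle(new RegisterUser($payload));

        return new JsonResponse(['success' => true]);
    }
}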

User-friendly validation messages

You are probably wondering how this domain model-level validation approach affects the user experience, in a web application for example. Here's the twist: the end user should not come into contact with input validation assertions at all, and therefore should never see the assertion messages. The way to ensure this is to have rich UI validation optimized for user experience. That's why client-side validation is crucial.
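For the rare request that slips past the UI validation, the server can still answer with a generic message instead of the assertion text. Here is a minimal sketch of that idea; the function name, the status codes, and the message wording are my assumptions for illustration, not a prescribed convention.

```php
<?php

declare(strict_types=1);

/**
 * Maps an exception raised while handling a request to an HTTP status code
 * and a generic, user-safe payload. The internal assertion message is
 * deliberately left out of the response; it belongs in the logs.
 *
 * @return array{status: int, body: array{error: string}}
 */
function toErrorResponse(\Throwable $exception): array
{
    // Guard clauses in value objects raise InvalidArgumentException (or a subclass).
    if ($exception instanceof \InvalidArgumentException) {
        return ['status' => 422, 'body' => ['error' => 'The submitted data is invalid.']];
    }

    // Anything else is treated as an unexpected server-side failure.
    return ['status' => 500, 'body' => ['error' => 'Something went wrong.']];
}
```

A function like this would typically sit in error-handling middleware, so that individual actions do not need their own try/catch blocks.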

If you are concerned about DRY, consider that in this case code reuse might couple the back-end to the front-end, which is a much greater problem than a deviation from the DRY principle.

Earlier, I pointed out that domain models and view models are different, and the same goes for their validation. The two may seem similar, but they serve different purposes and will change for different reasons, so it's completely fine to keep them separate.

Final thoughts

A shift in mindset toward consistently modeling simple concepts as value objects has a positive impact on all layers of the system. Code becomes more concise and clear, without noisy if checks in places where we do not expect them. Input validation is centralized within value objects, making it easier to find and change.

Guard clauses in value objects prevent invalid data from penetrating the domain layer. If someone tries to create an entity or aggregate composed of value objects anywhere in the application, the type-safe contract guarantees that the resulting object will be valid. Not only are value objects self-validating, they also spread this characteristic to the entire domain layer.

Validation at the domain model level is the ultimate solution for server-side validation; I see no real alternative to it. Decent UI validation, in turn, is the first line of defense, so your system should include both.

Nikola Poša