Doctrine Repositories should be collections without flush

More than a year ago I wrote the Repository Pattern article, which provides a good overview of the repository pattern. The follow-up article, Repository Pattern in Symfony, shows an implementation of this pattern. But that implementation is not perfect. There are still things we can improve.

The premise of this article is to think more in terms of collections. We will also take a look at the UnitOfWork and the flush method that Doctrine provides.

Think in collections

It is important to remember that repositories are nothing more than collections of things. Thus, we should think in collections. A Product repository is nothing more than a collection of products.

This allows us to abstract away the persistence details. Whenever we design or use our repository, we should think in collections, not in databases or Doctrine DQL.

We only have to think about these persistence details when creating the repository implementations. Outside of that, we should act as if we have no idea about these details. We only know that we have a repository and that it is a collection. We want to insert, remove or find data in it.

How do we do that? First define the interface to represent our repository as a collection. This is our contract.

interface ProductRepositoryInterface
{
    public function add(Product $product): void;

    public function remove(Product $product): void;

    public function findById(ProductId $productId): ?Product;

    public function size(): int;
}

As you can see, we have a collection of Products. We can add products to this collection or remove them from it. We can find a Product by its id and, last but not least, we can get the size of the collection.

This is the most basic interface you will ever need. We could add more specific find methods if we like, or use any other strategy to find data in our repository. Just keep in mind that the repository is a collection.

An in-memory repository

The first step you should take is to create an in-memory implementation of your repository. This allows you to see and test that the repository interface you created is actually correct. If you are able to construct a working in-memory repository, then your repository is a collection.

final class InMemoryProductRepository implements ProductRepositoryInterface
{
    private $products = [];

    public function add(Product $product): void
    {
        $this->products[$product->getId()->toString()] = $product;
    }

    public function remove(Product $product): void
    {
        unset($this->products[$product->getId()->toString()]);
    }

    public function findById(ProductId $productId): ?Product
    {
        if (isset($this->products[$productId->toString()])) {
            return $this->products[$productId->toString()];
        }

        return null;
    }

    public function size(): int
    {
        return count($this->products);
    }
}

If you understand the workings of this simple in-memory repository, then you are on the right track to implementing collection-like repositories in your projects.
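To make the "it is just a collection" idea concrete, here is a minimal sketch of a check that any implementation of the interface should pass. The Product and ProductId classes are hypothetical stand-ins (the article does not show the real ones), and behavesLikeCollection is a name made up for this example.

```php
<?php

declare(strict_types=1);

// Hypothetical minimal value object; the real ProductId is not shown in the article.
final class ProductId
{
    private $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function toString(): string
    {
        return $this->id;
    }
}

// Hypothetical minimal entity; the real Product would carry business data.
final class Product
{
    private $id;

    public function __construct(ProductId $id)
    {
        $this->id = $id;
    }

    public function getId(): ProductId
    {
        return $this->id;
    }
}

interface ProductRepositoryInterface
{
    public function add(Product $product): void;
    public function remove(Product $product): void;
    public function findById(ProductId $productId): ?Product;
    public function size(): int;
}

// Condensed copy of the in-memory repository shown above.
final class InMemoryProductRepository implements ProductRepositoryInterface
{
    private $products = [];

    public function add(Product $product): void
    {
        $this->products[$product->getId()->toString()] = $product;
    }

    public function remove(Product $product): void
    {
        unset($this->products[$product->getId()->toString()]);
    }

    public function findById(ProductId $productId): ?Product
    {
        return $this->products[$productId->toString()] ?? null;
    }

    public function size(): int
    {
        return count($this->products);
    }
}

// Returns true when an (initially empty) repository behaves like a collection.
function behavesLikeCollection(ProductRepositoryInterface $repository): bool
{
    $product = new Product(new ProductId('product-1'));

    $repository->add($product);
    $added = $repository->size() === 1
        && $repository->findById(new ProductId('product-1')) === $product;

    $repository->remove($product);
    $removed = $repository->size() === 0
        && $repository->findById(new ProductId('product-1')) === null;

    return $added && $removed;
}

var_dump(behavesLikeCollection(new InMemoryProductRepository())); // bool(true)
```

The check only talks to the interface, so it says nothing about where the products live. That is exactly the point of thinking in collections.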

A Doctrine Repository

Now the real deal. An in-memory repository might be nice and fast, but it does not persist its data. The next level is to create a Doctrine implementation.

Now that we understand that repositories should work as collections, this implementation will be fairly simple.

final class DoctrineProductRepository implements ProductRepositoryInterface
{
    /**
     * @var EntityManagerInterface
     */
    private $entityManager;

    public function __construct(EntityManagerInterface $entityManager)
    {
        $this->entityManager = $entityManager;
    }

    public function add(Product $product): void
    {
        $this->entityManager->persist($product);
    }

    public function remove(Product $product): void
    {
        $this->entityManager->remove($product);
    }

    public function findById(ProductId $productId): ?Product
    {
        // Return the result; without the return statement this method would always yield null.
        return $this->entityManager->find(Product::class, $productId->toString());
    }

    public function size(): int
    {
        // The scalar result may come back as a numeric string, so cast it to int.
        return (int) $this->entityManager->createQueryBuilder()
            ->select('count(product.id)')
            ->from(Product::class, 'product')
            ->getQuery()
            ->getSingleScalarResult();
    }
}

Another rule I like to apply is to inject only the EntityManagerInterface into your repositories. We could extend or inject the EntityRepository, but there is no good reason to do so other than laziness.

We do not want unnecessary dependencies, and we can solve every situation with the query builder. That is what happens inside the EntityRepository anyway. Most of the time it’s better to have more control, to know what you are doing and what happens.

This does not mean that you cannot inject the EntityRepository and use its methods. As long as you don’t extend from it, you are fine. Use composition over inheritance! An even better solution might be to create your own base repository. The EntityRepository was not really meant to be used by your own code anyway. It might expose methods and functionality you don’t need or want in your project.

Another thing you will notice is the lack of flush inside this repository implementation. Flushing inside the repository is a mistake many developers (including me in the past) make. An important rule is to not flush after every persist or remove in your repository.

The Unit Of Work

The UnitOfWork tracks object changes and commits them transactionally to the database. It also manages the loading of objects from the database.

The persist and remove methods let the UnitOfWork know that an entity needs to be persisted or removed. This should happen when we add our entity to, or remove it from, the ‘collection’. But at this point the change has not been written to the database yet; that only happens on flush. The collection is thus not in constant sync with our database. And this is actually a good thing.

Whenever you flush, the UnitOfWork commits all pending changes to the configured database. You could flush after every persist and remove, so that your collection and database are always in sync. But that comes at a price.

By flushing every time, you will not only write the changes you made in this repository, but all changes that are waiting in the UnitOfWork and have not been flushed yet. This can cause performance issues or, even worse, unexplainable side effects. Ever had those weird issues and bugs? By flushing inside your repositories you run an even higher risk of getting them.
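The side effect described above can be illustrated with a toy model. This is only a sketch: ToyUnitOfWork and FlushingProductRepository are made-up classes for this example, not Doctrine's actual UnitOfWork, and changes are represented as plain strings.

```php
<?php

declare(strict_types=1);

// Toy stand-in for a unit of work: it queues changes and writes them all on flush().
final class ToyUnitOfWork
{
    /** @var string[] changes that are scheduled but not written yet */
    private $pending = [];

    /** @var string[] changes that have reached the "database" */
    private $committed = [];

    public function schedule(string $change): void
    {
        $this->pending[] = $change;
    }

    public function flush(): void
    {
        // flush() does not know who scheduled what: everything pending is written.
        $this->committed = array_merge($this->committed, $this->pending);
        $this->pending = [];
    }

    /** @return string[] */
    public function committed(): array
    {
        return $this->committed;
    }
}

// A repository that flushes on every add, the habit this article advises against.
final class FlushingProductRepository
{
    private $unitOfWork;

    public function __construct(ToyUnitOfWork $unitOfWork)
    {
        $this->unitOfWork = $unitOfWork;
    }

    public function add(string $productName): void
    {
        $this->unitOfWork->schedule('insert product ' . $productName);
        $this->unitOfWork->flush();
    }
}

$unitOfWork = new ToyUnitOfWork();

// Another part of the use case has scheduled a change and is not finished yet.
$unitOfWork->schedule('update order 42');

$products = new FlushingProductRepository($unitOfWork);
$products->add('chair');

// The half-finished order change was written as well, although only a product was added.
print_r($unitOfWork->committed());
```

In this toy model, adding a chair also commits the unfinished order update. With a shared entity manager the same thing can happen in a real application, which is where the hard-to-explain bugs come from.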

When to flush?

You might then wonder: when should I flush?

You only want to commit changes after each business transaction, after each use case.

For example, after we have created our product and done all the necessary business logic. Only then should we flush and commit the changes to the database. Only when the transaction is done should everything be committed as one.

final class ProductService
{
    /**
     * @var ProductRepositoryInterface
     */
    private $productRepository;

    /**
     * @var EntityManagerInterface
     */
    private $entityManager;

    /**
     * ProductService constructor.
     * @param ProductRepositoryInterface $productRepository
     * @param EntityManagerInterface $entityManager
     */
    public function __construct(ProductRepositoryInterface $productRepository, EntityManagerInterface $entityManager)
    {
        $this->productRepository = $productRepository;
        $this->entityManager = $entityManager;
    }

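    public function createProduct(): Product
    {
        // Create the product; the constructor arguments depend on your domain model.
        $product = new Product(/* ... */);

        // Do the business logic for this use case, then add the product to the collection.
        $this->productRepository->add($product);
        // Do any remaining business logic for this use case.

        // Commit all changes of this business transaction at once.
        $this->entityManager->flush();

        return $product;
    }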
}

You might wonder: why? Now I need to inject the EntityManagerInterface and flush manually every time?

You need to think of every use case and business action as one transaction. Multiple saves and removes can happen in that one transaction. You may be dispatching events, and their handlers may add or remove data in other repositories as well.

As long as this transaction is not done, it should be possible to cancel it or let it fail at any point. When an error happens in the middle of a transaction, you definitely do not want your database to end up in an invalid state, or to need unnecessarily complex or bad code to work around this.

These issues might not appear that often in small or simple applications. But if you want to create complex, domain-driven applications that are easy to understand and reason about, this becomes very important.

Conclusion

If there is one thing you need to remember, it is to create your repositories as collections.

Also, remember that in the examples I used the add method to add data to the repository. You could still name this save if you prefer. It does not change the fact that repositories should be thought of as collections.

I also want to point out that you should not extend from the EntityRepository that Doctrine provides. Always use composition instead of inheritance. This gives you the most freedom and the least coupling.

Last but not least, please remember to not flush directly in your repositories. Instead, do this manually after each business transaction. You could create an interface whose only method is flush, so you don’t accidentally use the other methods provided by the entity manager outside of your repositories. It is your choice. Think about this.
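A flush-only interface could look roughly like the sketch below. The names FlusherInterface, DoctrineFlusher and CountingFlusher are assumptions made up for this example; the only Doctrine call used is EntityManagerInterface::flush().

```php
<?php

declare(strict_types=1);

use Doctrine\ORM\EntityManagerInterface;

// Hypothetical name: the only persistence operation a service is allowed to call.
interface FlusherInterface
{
    public function flush(): void;
}

// Adapter that forwards to Doctrine's entity manager and hides all its other methods.
final class DoctrineFlusher implements FlusherInterface
{
    /**
     * @var EntityManagerInterface
     */
    private $entityManager;

    public function __construct(EntityManagerInterface $entityManager)
    {
        $this->entityManager = $entityManager;
    }

    public function flush(): void
    {
        $this->entityManager->flush();
    }
}

// Test double to pair with in-memory repositories: it only counts the calls.
final class CountingFlusher implements FlusherInterface
{
    private $calls = 0;

    public function flush(): void
    {
        $this->calls++;
    }

    public function calls(): int
    {
        return $this->calls;
    }
}
```

A service such as the ProductService would then depend on FlusherInterface instead of the EntityManagerInterface. In a test you could hand it the in-memory repository and the counting double, and check that the use case flushes exactly once.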

As always, remember I only provide ideas and insights. You are the person to think about how to implement and use this knowledge in your situation. Think before you code!