From a Monolithic to a Highly Responsive Distributed System

The stand-off between monolithic and distributed architectures started years ago, almost immediately after microservices emerged as a new architectural approach in 2011. By now, it is generally accepted that both have benefits and drawbacks. A monolithic architecture usually has a single code base and is deployed as one application. In contrast, a microservices architecture lets us build the application from small services, each developed and deployed independently.

Under the right circumstances, either a monolith or microservices can be the ideal option. For a real-world system, however, you do not need to choose between the two extremes; you can adopt whatever point on the spectrum best suits your product.

In this blog post, we show how we evolved our architecture from a monolith to a distributed system in several steps, each time adding services to find the right balance between system complexity and responsiveness.

The system at a glance

The system we are working on is a Digital Rights and Rules Manager.

Generally speaking, Digital Rights Management (DRM) is an approach to copyright protection for digital media. A DRM system prevents unauthorized redistribution of digital media and restricts the ways consumers may copy content they’ve purchased.

In essence, our system generates an encryption key and stores the encrypted content in the operator's content delivery network. When users want to play this content, they need a license that contains the decryption key for the content along with a set of policies. These policies define the content quality (HD, UHD), set geo-restrictions and time limits, and so on. To obtain the license, users come back to our service.
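To make the license concept more concrete, here is a minimal Python sketch of a license carrying a decryption key plus quality, geo-restriction, and time-limit policies, and a check that every policy allows a playback request. The `License` fields, the `QUALITY_RANK` table, and `can_play` are hypothetical names for illustration only; the post does not describe the real license format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical license model: field names are illustrative, not the real system's schema.
@dataclass(frozen=True)
class License:
    content_id: str
    decryption_key: bytes
    max_quality: str              # quality policy, e.g. "HD" or "UHD"
    allowed_countries: frozenset  # geo-restriction policy
    expires_at: datetime          # time-limit policy

QUALITY_RANK = {"SD": 0, "HD": 1, "UHD": 2}

def can_play(lic: License, quality: str, country: str, now: datetime) -> bool:
    """Return True only if every policy in the license allows this playback request."""
    return (
        QUALITY_RANK[quality] <= QUALITY_RANK[lic.max_quality]
        and country in lic.allowed_countries
        and now < lic.expires_at
    )

lic = License(
    content_id="movie-42",
    decryption_key=bytes(16),  # placeholder 128-bit key
    max_quality="HD",
    allowed_countries=frozenset({"NL", "DE"}),
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(can_play(lic, "HD", "NL", now))   # True
print(can_play(lic, "UHD", "NL", now))  # False: quality above the licensed maximum
print(can_play(lic, "HD", "US", now))   # False: outside the allowed countries
```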

Stage 1: Monolithic Architecture

Initially, the system had a purely monolithic architecture and performed only two tasks: generating content encryption keys and issuing licenses.

Monolithic Application

This design was good enough to handle multiple tenants with the initial feature set:

  • The application was horizontally scalable, as deploying n copies of the whole application behind a load balancer is easy.
  • It was easy to develop and test: we could test the whole application end-to-end in one place and deploy a single package.
  • The latency was good, as in-process method calls and shared memory access are much faster than inter-process communication via messaging, RPC, etc.
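To illustrate why horizontal scaling of the monolith was straightforward, here is a minimal Python sketch of round-robin load balancing over n identical copies. `MonolithInstance` and `RoundRobinBalancer` are hypothetical names; a real deployment would rely on an actual load balancer rather than in-process code, and round-robin is just one possible distribution strategy.

```python
from itertools import cycle

class MonolithInstance:
    """Hypothetical stand-in for one deployed copy of the whole application."""

    def __init__(self, name: str):
        self.name = name
        self.handled = 0

    def handle(self, request: str) -> str:
        # Every copy contains the whole application, so any copy can serve any request.
        self.handled += 1
        return f"{self.name} served {request}"

class RoundRobinBalancer:
    """Distributes requests evenly over n identical copies."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = cycle(self.instances)

    def route(self, request: str) -> str:
        return next(self._next).handle(request)

lb = RoundRobinBalancer(MonolithInstance(f"app-{i}") for i in range(3))
for i in range(6):
    print(lb.route(f"license-request-{i}"))
# Each of the three copies handles two of the six requests.
```

Scaling out then simply means constructing the balancer with more instances; because the copies share nothing in process, no coordination between them is needed.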

Yet, as we added new functionality, we started to see bottlenecks in this design.

Stage 2: First Step towards Command Query Responsibility Segregation

The logical next step was to separate reads and writes into different services so they could be scaled independently, following the Command Query Responsibility Segregation (CQRS) pattern.

When we moved our infrastructure to AWS, we started using PostgreSQL's replication feature. The database could scale to up to five read replicas, allowing us to serve more license requests. However, this setup did not support multi-master replication, so there was only one master instance accepting writes, alongside up to five read replicas.

Command query separation using read replicas

This design worked quite well for small-to-medium loads and had its advantages:

  • Thanks to read replicas, the system could serve more licenses.
  • With read replicas in different regions, we could serve licenses from the data center nearest to each customer.
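The read/write split described above can be sketched as a small router: commands always go to the single primary, and queries go to a read replica in the client's region. This is a minimal Python sketch in which in-memory dicts stand in for the PostgreSQL primary and replicas; `Node`, `ReadWriteRouter`, and the region names are hypothetical, and `replicate()` is a synchronous stand-in for what is really asynchronous replication.

```python
MAX_READ_REPLICAS = 5  # the replica limit described above

class Node:
    """Hypothetical database node; a dict stands in for the real tables."""

    def __init__(self, name: str, region: str):
        self.name = name
        self.region = region
        self.data = {}

class ReadWriteRouter:
    """Routes commands to the single primary and queries to read replicas."""

    def __init__(self, primary: Node, replicas: list):
        if len(replicas) > MAX_READ_REPLICAS:
            raise ValueError(f"at most {MAX_READ_REPLICAS} read replicas are supported")
        self.primary = primary
        self.replicas = replicas

    def write(self, key: str, value: str) -> None:
        # Commands: only one master accepts writes (no multi-master).
        self.primary.data[key] = value

    def replicate(self) -> None:
        # Synchronous stand-in for asynchronous replication to every replica.
        for replica in self.replicas:
            replica.data.update(self.primary.data)

    def read(self, key: str, client_region: str):
        # Queries: prefer a replica in the client's region, then any replica,
        # and fall back to the primary only if there are no replicas at all.
        candidates = [r for r in self.replicas if r.region == client_region]
        node = (candidates or self.replicas or [self.primary])[0]
        return node.name, node.data.get(key)

primary = Node("primary", "eu-west-1")
router = ReadWriteRouter(primary, [Node("replica-eu", "eu-west-1"),
                                   Node("replica-us", "us-east-1")])
router.write("license:movie-42", "policy-v1")
router.replicate()
print(router.read("license:movie-42", "us-east-1"))  # ('replica-us', 'policy-v1')
```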

However, this felt like only a half measure, since the other bottlenecks remained unchanged.

Important Note:
By creating read replicas, we had already given up an important relational database property: strong consistency. All read replicas became eventually consistent. We cannot guarantee that a read replica always has the latest state, because synchronizing data from the master to the replica nodes takes time. From this point on, the system is eventually consistent!
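The following minimal Python sketch shows why a replica can return stale data. It assumes a simplified replica that applies the primary's ordered change log whenever `sync()` is called; this is loosely modeled on log-based replication and is not PostgreSQL's actual mechanism. `Primary`, `Replica`, and `lag()` are hypothetical names.

```python
class Primary:
    def __init__(self):
        self.log = []   # ordered list of (key, value) changes, similar in spirit to a write-ahead log
        self.data = {}

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # position of this write in the log

class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.applied = 0  # number of log entries this replica has applied
        self.data = {}

    def sync(self):
        # Apply all pending changes; in a real deployment this runs in the background.
        pending = self.primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self):
        # How many committed writes this replica has not applied yet.
        return len(self.primary.log) - self.applied

primary = Primary()
replica = Replica(primary)
primary.write("license:movie-42", "policy-v1")
replica.sync()
primary.write("license:movie-42", "policy-v2")  # replica has not caught up yet
print(replica.data["license:movie-42"], replica.lag())  # policy-v1 1  (stale read)
replica.sync()
print(replica.data["license:movie-42"], replica.lag())  # policy-v2 0  (converged)
```

Between the second write and the next `sync()`, any query served by the replica sees the old policy; that window is exactly what "eventually consistent" means here.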

Stage 3: Event-Driven (CQRS) Distributed System

As the load from clients grew, we adopted event sourcing to address the database issues.
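The core idea of event sourcing is that the system stores an append-only log of events and derives the current state by replaying them. Below is a minimal Python sketch of that idea with a command handler that validates against the replayed state before appending a new event. The event types `KeyCreated`/`KeyRevoked`, `EventStore`, and `revoke_key` are hypothetical; the post does not describe the real event schema, and in the described system the log lives in Kafka rather than in a Python list.

```python
from dataclasses import dataclass

# Hypothetical event types; the post does not describe the real event schema.
@dataclass(frozen=True)
class KeyCreated:
    content_id: str

@dataclass(frozen=True)
class KeyRevoked:
    content_id: str

def apply(state: dict, event) -> dict:
    """Return the next state for one event without mutating the previous state."""
    new_state = dict(state)
    if isinstance(event, KeyCreated):
        new_state[event.content_id] = "active"
    elif isinstance(event, KeyRevoked):
        new_state[event.content_id] = "revoked"
    return new_state

class EventStore:
    """Append-only log; the current state is always derived by replaying it."""

    def __init__(self):
        self.events = []

    def append(self, event) -> None:
        self.events.append(event)

    def current_state(self) -> dict:
        state = {}
        for event in self.events:
            state = apply(state, event)
        return state

def revoke_key(store: EventStore, content_id: str) -> None:
    # Command handler: validate against the replayed state, then record a new fact.
    if store.current_state().get(content_id) != "active":
        raise ValueError(f"no active key for {content_id}")
    store.append(KeyRevoked(content_id))

store = EventStore()
store.append(KeyCreated("movie-42"))
store.append(KeyCreated("movie-43"))
revoke_key(store, "movie-42")
print(store.current_state())  # {'movie-42': 'revoked', 'movie-43': 'active'}
```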

To improve resource utilization on the application side, we switched to the reactive toolkit Vert.x, combined with RxJava, to build the applications.
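Vert.x and RxJava are Java libraries; the sketch below is not their API. As a language-neutral illustration of the same non-blocking, event-loop idea, it uses Python's `asyncio`: while one simulated I/O call waits, the single-threaded event loop keeps serving other requests, so 100 requests take roughly as long as one. `fetch_license` and `serve` are hypothetical names, and `asyncio.sleep` stands in for a real database or cache call.

```python
import asyncio
import time

async def fetch_license(request_id: int) -> str:
    # Simulated non-blocking I/O (e.g. a database or cache call); the event loop
    # serves other requests while this one is waiting.
    await asyncio.sleep(0.05)
    return f"license-{request_id}"

async def serve(n: int) -> list:
    # All n requests are in flight at once on a single thread.
    return await asyncio.gather(*(fetch_license(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(100))
elapsed = time.perf_counter() - start
# Typically finishes in well under a second, versus about 5 s if the calls ran one after another.
print(len(results), f"{elapsed:.2f}s")
```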

Event-driven CQRS Architecture

As a result, we ended up with a truly distributed system, with separate data models for read and write operations and a separate database for each service. This design brought new advantages:

  • Highly Scalable: Redis is horizontally scalable and sits on the critical path that supplies data to the query services, so the system scales far better than before.
  • Flexibility: As is typical of microservices, each service can be built with a different technology stack.
  • Responsiveness: Because each service has its own optimized data model, the system became more responsive and performant.
  • Lower Latency: Since Redis serves as the persistent data store for the query services, the system gets a distributed cache out of the box.
  • Maintainability: Having separate microservices helped with the separation of concerns that made it easier to maintain the code.
  • Debugging: With all events stored in Kafka, we can now replay them in a development environment and debug issues easily.
  • Pluggable: A new service can be added to the system at any time and build up its own state from the events, e.g., a new reporting service in the data warehouse.
  • Replayable: Since Kafka stores events in the exact order in which they happened (within a partition), we can replay them to rebuild the state.
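The "Pluggable" and "Replayable" points can be sketched together: a query-side projection tracks its own offset in an ordered event log, so a brand-new service starts at offset 0, replays all history to build its read model, and then processes only new events. This is a minimal Python sketch in which a plain list stands in for a single Kafka topic partition (offsets are list indices) and a dict stands in for the service's own store; the `LicenseIssued` event and `LicenseReportProjection` are hypothetical.

```python
# A plain list stands in for one Kafka topic partition; offsets are list indices.
event_log = [
    {"type": "LicenseIssued", "content_id": "movie-42", "country": "NL"},
    {"type": "LicenseIssued", "content_id": "movie-42", "country": "DE"},
    {"type": "LicenseIssued", "content_id": "movie-43", "country": "NL"},
]

class LicenseReportProjection:
    """Hypothetical reporting service: counts licenses issued per content item."""

    def __init__(self):
        self.offset = 0   # next event to consume
        self.counts = {}  # this service's own read model

    def catch_up(self, log):
        # Consume every event after the last processed offset, in log order.
        for event in log[self.offset:]:
            if event["type"] == "LicenseIssued":
                cid = event["content_id"]
                self.counts[cid] = self.counts.get(cid, 0) + 1
        self.offset = len(log)

# A service plugged in today starts at offset 0 and replays the full history.
report = LicenseReportProjection()
report.catch_up(event_log)
print(report.counts)  # {'movie-42': 2, 'movie-43': 1}

event_log.append({"type": "LicenseIssued", "content_id": "movie-43", "country": "DE"})
report.catch_up(event_log)  # only the new event is processed
print(report.counts)  # {'movie-42': 2, 'movie-43': 2}
```

Because the read model is derived entirely from the log, it can be dropped and rebuilt at any time by creating a fresh projection and replaying from offset 0.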

Things to be taken into consideration

As you can see, each solution worked well for a specific task, and finding the optimal approach for a particular challenge was a matter of fine-tuning. When adding a new service to your system or splitting an existing one, always keep in mind what you want to achieve and at what cost.

For now, our distributed system works well, and we keep an eye on emerging technologies so that we are ready to take it to the next level as loads and performance requirements grow.

Written by: Iranna Patil and Barbaros Alp, Irdeto (digital security company)