The Downsides of Microservice Architecture
Why It’s Not Always the Best Option
The goal of Microservice Architecture (MSA) is to decompose a monolithic service into several stand-alone services along certain boundaries, reducing the complexity of each application and the coupling between them. Ultimately, this aims to make the system as a whole more productive to develop and operate.
As system requirements increase, applications tend to grow in size, eventually becoming a massive monolithic application. Such growth comes with disadvantages, hence the drive to break down services into smaller components.
Drawbacks of Monolithic Applications
Low Development Productivity
The more requirements a system has, the more the codebase grows. This makes it increasingly difficult for developers to grasp every feature, and therefore harder to add new features or modify existing code. Build times also increase. Consider a food delivery service: even a minor change to the food ordering logic could require rebuilding and redeploying almost the entire application, and slower builds lengthen the cycle from code change to redeployment, reducing development productivity. If the order service were separated, only the order service would need to be rebuilt and redeployed. Moreover, an excessively large codebase can slow down the IDE, which, depending on the development environment, can also hurt productivity.
Operational Difficulties with Teams
As the application grows, so does the number of people required to manage it. If a team becomes too large, it may encounter difficulties in consolidating opinions or making decisions. Even if the development and operation are divided among multiple teams, the lack of clear role boundaries can lead to confusion over responsibilities. But if a large service is split into several smaller ones, each team can independently develop and operate its service. There are also additional benefits, such as the freedom to adopt different programming languages or technologies within each team.
Challenges in Fault Isolation
Take a stock trading service, for example. In a monolithic application, if the API returning a user’s stock order history encounters a problem and the application crashes, users cannot place orders until it recovers, potentially causing significant financial loss for the company. If the order history service and order service instead run as independent applications, a failure in the order history service prevents users from viewing their order history but does not affect their ability to place orders, limiting the loss.
Difficulties in Scaling
For example, image processing is CPU-intensive and requires high CPU performance, whereas tasks that retain a lot of data in memory need large memory capacities. However, if an image processing module and a memory-intensive module are running within a single application, it’s impossible to scale each module independently, which could lead to less efficient use of resources.
Additional Considerations in MSA
While an MSA can address some of the issues inherent to monolithic applications, like all technologies, it has its trade-offs. In a nutshell, there’s more work to be done and more to consider, which means development and operational productivity doesn’t simply improve.
Inter-Process Communication (IPC)
In a monolithic system, domain services interact through method calls within a single process. In MSA, each service runs as a separate process, so these interactions become inter-process communication (IPC), and since the service processes typically run on separate networked machines, the communication must go over the network.

The most common approach is synchronous communication such as HTTP or gRPC. However, because both parties must be up and running at the same time, synchronous communication offers relatively low availability. Asynchronous messaging is another option. Actor-model tools such as Akka operate without a separate message broker, which means fewer components to manage, though at the cost of lower availability. Asynchronous messaging through a message broker like Kafka adds complexity, since the broker must be clustered to avoid becoming a single point of failure (SPOF), but it offers higher availability and several other benefits, which is why it is commonly used to implement event-driven microservice architectures. Note that with an at-least-once delivery broker like Kafka, the application’s message consumers must be idempotent. Each IPC method has its pros and cons, so choose the one that fits your needs.

Service discovery may also be necessary to determine the endpoints of other services. Tools like Netflix’s Eureka can be used, though with Kubernetes a separate service discovery tool is unnecessary. Whichever method you use, there will be additional code and components to manage, which raises development and operational costs compared to a monolithic application.
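To illustrate the idempotency point, here is a minimal Python sketch of an idempotent consumer that deduplicates by message id, so an at-least-once redelivery does not apply the same side effect twice. The message shape and in-memory stores are hypothetical; a real consumer would keep the processed ids in a durable store.

```python
# Idempotent consumer sketch: deduplicate by message id so that an
# at-least-once redelivery does not apply the same deposit twice.
# The message shape and in-memory stores are illustrative; a real
# consumer would keep processed ids in a durable store.
processed_ids = set()
balance = {"user-1": 0}

def handle_deposit(message: dict) -> None:
    if message["id"] in processed_ids:
        return  # duplicate delivery: already applied, skip
    balance[message["user"]] += message["amount"]
    processed_ids.add(message["id"])

msg = {"id": "m-42", "user": "user-1", "amount": 100}
handle_deposit(msg)
handle_deposit(msg)  # redelivered by the broker
assert balance["user-1"] == 100  # applied exactly once
```

The same idea applies regardless of broker: record which messages have been processed, and make the check-and-apply step atomic with the state change.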
Distributed Transactions
In a monolithic application, data across multiple subdomains can be changed in an ACID-compliant manner with a single local DB transaction. For example, when an order is filled in a stock trading app, the user’s balance can be reduced, the order status changed to ‘filled,’ and the stock holdings increased, all within one local transaction. In MSA, however, each service has its own DB, and data previously stored in one DB is now spread across several. If the stock trading service consists of order, stock balance, and accounting services, the local transactions executed in each service must be bundled into one global transaction to remain ACID. Without an understanding of ACID transactions, one might ask why not simply change data by making HTTP/gRPC requests to the other services each time. The problem is that such calls are not atomic: in the event of a server or infrastructure failure, data consistency can be compromised and the transaction context lost.

There are a few ways to run distributed transactions. The first is 2PC[1]. Its downside is lower availability: a distributed transaction must run successfully from start to finish, which requires every participating component to be up and to handle the commit request correctly. If even one component is down, the entire transaction fails, creating a strong dependency on every participant. Another method is the Saga pattern, which loosens the coupling between the participants, offers higher availability than 2PC[2], and is widely used today. In a Saga, each local transaction commits sequentially, driven by asynchronous messages; if a problem arises midway, compensating transactions are executed in the reverse order of the steps completed so far. Note, however, that a Saga is an ACD transaction, lacking the I (Isolation) in ACID.
This means that when multiple transactions run concurrently, they are not isolated and can affect each other. Typical anomalies include lost updates, where one transaction overwrites data changed by another, and dirty reads, where one transaction reads data that another has changed mid-execution. There are various countermeasures for this lack of isolation; you could also use 2PC only for the data requiring the strongest consistency and the Saga pattern for everything else. In any case, careful design is needed when using the Saga pattern to avoid significant issues.
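The compensation mechanism can be sketched in a few lines of Python. The step names are hypothetical: each step pairs a local action with its compensating action, and when a step fails, the completed steps are compensated in reverse order.

```python
# Saga sketch: run each local step in order; on failure, execute the
# compensating actions for the completed steps in reverse order.
# Step names are hypothetical. Note there is no isolation (ACD, not ACID).
def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        return False
    return True

def accounting_step():  # simulates a participant that is down
    raise RuntimeError("accounting service unavailable")

log = []
steps = [
    (lambda: log.append("reserve balance"), lambda: log.append("release balance")),
    (lambda: log.append("place order"),     lambda: log.append("cancel order")),
    (accounting_step,                       lambda: None),
]
assert run_saga(steps) is False
assert log == ["reserve balance", "place order", "cancel order", "release balance"]
```

In a real system each action and compensation would be a local transaction in a different service, triggered by asynchronous messages rather than direct calls.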
In addition, when using the Saga pattern or publishing domain events, it is critical that event publishing is never skipped, so Transactional Messaging must be considered: data changes and message publications have to occur atomically. Since a DB and a message broker have different transaction mechanisms, operations on both cannot be bundled into one transaction; if you write to each separately (so-called dual writes), some messages may be lost, leaving the data inconsistent. One option is to write only to the message broker: publish data change events to the broker and have servers subscribe to update the DB. However, there is a lag between publishing and consuming, so users may not immediately see the data they have just changed. Local caching can mask this, but it becomes complicated once multiple instances are running. It is therefore usually preferable to write only to the DB.

The Transactional Outbox pattern bundles the data change and the event you want to publish into one local DB transaction. CDC (Change Data Capture) is a typical way to implement the Outbox pattern, and it can be built on either polling or transaction log tailing. Polling, which queries the DB periodically, is simple to implement but puts unnecessary load on the DB and may capture changes with a slight delay. Transaction log tailing, as the name suggests, tails the DB transaction log; while more complex and operationally heavier, it does not burden the DB, delivers data change events to the message broker in near real time, and scales better. Debezium is a well-known transaction-log-tailing CDC tool built on Kafka Connect. Of course, CDC may require additional system components: if you use Debezium, for example, you will need to operate a Kafka Connect cluster.
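The core of the Outbox pattern can be sketched with SQLite standing in for the service’s local DB (the schema and payload are illustrative): the order update and the outbox row commit in the same local transaction, and a separate relay (a poller or a CDC tool) would later ship outbox rows to the broker.

```python
import json
import sqlite3

# Transactional Outbox sketch with SQLite standing in for the service's
# local DB (schema and payloads are illustrative). The state change and
# the outbox row commit in one transaction; a separate relay (poller or
# CDC) would later ship outbox rows to the message broker.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")

def fill_order(order_id: str) -> None:
    with db:  # one local transaction: both rows commit, or neither does
        db.execute("INSERT INTO orders VALUES (?, 'filled')", (order_id,))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order-events", json.dumps({"order_id": order_id, "status": "filled"})),
        )

fill_order("o-1")
assert db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0] == 1
```

Because both writes share one transaction, a crash can never leave a committed state change without its corresponding event row, which is exactly the guarantee dual writes lack.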
When using the Saga pattern, you will need to decide, depending on the system requirements, between an Orchestration Saga and a Choreography Saga. If you choose orchestration, you will also need to store and manage the saga’s state in the orchestrator. Alternatively, you could use Event Sourcing instead of the Outbox pattern. With event sourcing, events become the source of truth, so there is no separate local DB to update when data changes; you only append events to the Event Store. The Axon Framework is a notable example that takes this approach instead of the Outbox pattern.[3]
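The event-sourcing idea can be shown with a toy sketch of a hypothetical account domain: the event list is the source of truth, and the current state is derived by folding over it rather than read from a separately updated table.

```python
# Event-sourcing sketch for a hypothetical account: the event list is
# the source of truth, and current state is derived by folding over it.
event_store = []

def append_event(event: dict) -> None:
    event_store.append(event)  # stands in for appending to an Event Store

def current_balance() -> int:
    balance = 0
    for e in event_store:
        if e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    return balance

append_event({"type": "deposited", "amount": 100})
append_event({"type": "withdrawn", "amount": 30})
assert current_balance() == 70  # state rebuilt purely from events
```

Since publishing the event *is* the write, the dual-write problem disappears by construction, at the cost of rebuilding (or snapshotting) state on the read side.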
Querying
As mentioned earlier, a monolithic application usually stores all its data in one DB, so combining multiple pieces of data is a simple join query. In MSA, data lives in physically separate DBs, making join queries impossible. There are two main ways to solve this. The first is the API Composition pattern: fetch the data from each service and join it in memory. This has clear limitations. In-memory joins are naturally slower than database joins, so response times suffer with large volumes of data; each service must return only the data needed for the join, yet the required fields may not exist in some services, and fetching all the data is not feasible either.

The other method is CQRS: create a microservice dedicated to queries that subscribes to domain events and stores a replica of the needed data in a single view database. With all the data needed for joins in one database, join queries become possible again. To keep each replica up to date, every microservice must publish an event whenever its original data changes, and the view service must consume those events and update accordingly. The data change and the event publication must be atomic, so the Outbox pattern can be used here as well. However, if events are published at the application level, a bug in the code or a direct DML change to the database can cause an event to be skipped, desynchronizing the replica from the original. In that case, CDC, which guarantees event publication at the infrastructure level, can be the solution.
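A minimal sketch of API composition, with hypothetical stub functions standing in for requests to separate services; the composer performs the join in memory, which is precisely why the approach degrades with large data sets.

```python
# API Composition sketch: the stub functions are hypothetical stand-ins
# for requests to separate services; the composer joins the results in
# memory, which is what makes this approach slow for large data sets.
def get_orders(user_id: str) -> list:
    # stands in for a call to the order service
    return [{"order_id": "o-1", "symbol": "AAPL", "qty": 10}]

def get_prices(symbols: set) -> dict:
    # stands in for a call to the market-data service
    return {"AAPL": 190.0}

def order_view(user_id: str) -> list:
    orders = get_orders(user_id)
    prices = get_prices({o["symbol"] for o in orders})
    # the in-memory "join": attach each order's last price
    return [{**o, "last_price": prices[o["symbol"]]} for o in orders]

assert order_view("user-1")[0]["last_price"] == 190.0
```

A CQRS view service would instead keep both data sets replicated side by side in one database, so this join becomes a single query.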
Testing
As mentioned earlier, interactions between domain services in a monolithic application are simply method calls, but once domains become separate services, they interact via inter-process communication (IPC), whether synchronously through REST or gRPC or asynchronously through messaging with Kafka. Either way, because the network is involved, testing the same scope in MSA requires additional components such as a mock server or a library like Spring Cloud Contract, whereas simply mocking objects sufficed in a monolith. If different teams develop and operate each service, testing the APIs between them may call for more involved processes such as Consumer-Driven Contract Testing.
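To illustrate why a mock server becomes necessary, here is a sketch using only Python’s standard library: a throwaway HTTP server stands in for a hypothetical inventory service (the endpoint and payload are invented), so the client code can be exercised without the real dependency.

```python
import http.server
import json
import threading
import urllib.request

class MockInventoryHandler(http.server.BaseHTTPRequestHandler):
    """Stands in for a hypothetical inventory service during tests."""
    def do_GET(self):
        body = json.dumps({"sku": "A-1", "in_stock": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

def fetch_stock(base_url: str) -> int:
    # Client code under test: calls the (mocked) inventory service.
    with urllib.request.urlopen(f"{base_url}/inventory/A-1") as resp:
        return json.loads(resp.read())["in_stock"]

# Bind to an ephemeral port and serve from a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), MockInventoryHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
stock = fetch_stock(f"http://127.0.0.1:{server.server_address[1]}")
server.shutdown()
assert stock == 3
```

Contract-testing tools go a step further than this hand-rolled stub by verifying that the mock’s responses actually match what the provider service returns.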
Operations
One of the drawbacks of monolithic applications was the difficulty of development and operation, but MSA also raises development and operational costs, for different reasons. First, the number of components to manage grows: a single service may be split into many smaller ones, and extra components, such as those for query services, may be required.

Consider a client that needs several kinds of data. In a monolithic application, a single API call could return everything the client needs. With MSA, the data is spread across services, so the client would have to make multiple requests, which is hard to maintain and bad for performance. The fix is an API that aggregates the data. It could live in one of the existing services, but from a separation-of-concerns perspective it is better placed in a dedicated component, and an API Gateway can fill this role.

Another drawback is that log files are scattered across services. This can be addressed by building a log collection pipeline with, for example, logging libraries or the Elastic Stack. Troubleshooting and monitoring are also comparatively harder; a service mesh such as Istio[4] or Linkerd can mitigate this.
Security
While domain service interactions in a monolithic application were method calls within a single process, they are network communications in MSA, requiring more attention to encryption. Istio can be used to apply encryption relatively easily through mTLS without altering the existing code.
Conclusion
Having personally experienced the issues above, one might wonder whether adopting MSA is the best approach despite its many advantages. I believe that if the team or the system is small, it is better to start with a monolithic service and move to MSA when the need arises. The time to switch is roughly when the cost of developing and operating the monolith exceeds that of an MSA; this cost is hard to quantify, but identifying the right moment for migration is what matters.

Furthermore, applying Domain-Driven Design (DDD) and keeping modules well separated from the beginning makes it easier to split out services later. For example, if multiple entity classes reference each other directly, extracting each service will be quite a challenge, since an object cannot hold a direct reference to an object in another JVM. If a transition to MSA is on the horizon, design aggregate roots to reference each other indirectly through primary keys.

When decomposing a monolith into microservices, rewriting from scratch or migrating everything at once is time-consuming and unrealistic in a business context, because developing new features and modifying existing ones will take priority over refactoring. Therefore, using the Strangler Pattern[5], or introducing a routing layer in front, to handle new business requirements while progressively carving out services or APIs may be the best approach.
[1]: 2PC stands for Two-Phase Commit, a protocol in which a coordinator managing the distributed transaction first asks every participating component to prepare its local transaction (the voting phase) and then, if all participants vote to commit, instructs them to actually commit their local transactions (the commit phase).
[2]: 2PC requires that all components participating in a distributed transaction be operational simultaneously for the transaction to be executed; hence, the availability of 2PC is the product of the availability of each component. On the other hand, with Sagas, if any of the transaction participants goes down, they can continue from where they left off after being restarted, which means the transaction can proceed without all components being operational at the same time, resulting in relatively higher availability. Typically, components involved in Sagas communicate through a message broker.
[3]: https://discuss.axoniq.io/t/the-outbox-pattern/2031
[4]: Istio facilitates the use of various features such as circuit breaking, retries, and telemetry with minimal to no changes to existing code by injecting an Envoy sidecar proxy into each pod. Moreover, with add-ons like Jaeger, Kiali, Prometheus, and Grafana, it significantly simplifies tasks such as distributed tracing, traffic management, and system monitoring, providing a high-level view of communications between microservices.
[5]: The strangler fig is a plant commonly found in tropical rainforests, which entwines itself around trees as it grows, reaching for sunlight above the forest canopy. Eventually, it can completely envelop the tree, and when the host tree dies and decays, it leaves behind a hollow, tree-shaped lattice of vines. The Strangler Pattern in Microservice Architecture (MSA) draws an analogy from this plant, depicting how a strangler application gradually reduces the role of an existing monolith, eventually leading to its demise. As services are incrementally separated from the monolith, they collectively form what is known as the strangler application. This strangler application grows over time until the original monolith diminishes and either disappears or becomes just one of many microservices.
(This post was imported from the original, written on another blog of mine in October 2021.)