Event-Driven Microservices engineering.

Event-driven microservices is an architecture in which services communicate through event messages. This style of communication, together with a message broker, brings many benefits and is a strong alternative to synchronous communication between microservices. Unfortunately, no architecture fits every case, and this one has its own drawbacks and trade-offs, so it should be built (or even chosen) wisely. This post is a set of short tips and solutions for different kinds of problems in event-driven microservices. In my previous post [link] I made a comparison with the monolith and mentioned two types of microservices architectures.

Event types & size

An event-driven architecture operates on different types of events. Every type has its pros and cons, which are often connected to consistency, performance, or reliability, so choose wisely. Event message size is also important. It is common to distinguish small, medium, and large events. Small events are fast but sometimes do not contain the information needed to process some logic; local data storage on the consumer side might help in this case. Another solution is medium-sized events, which can carry all the information the consumer needs. Large events carry the whole state of the model.

Command

A command is a simple request to do something, which might be validated and then accepted or rejected. Commands often have a very specific purpose and are closely related to the domain they change. They are often used with the Send/Receive pattern.

Event

An event behaves like a notifier for changes in entities: it carries information about what has changed and the changed values. It is often used with the Publish/Subscribe pattern, which means that the originating service only guarantees that the message was published to the broker.

Document

A document also notifies about changes in entities but, unlike an event, it sends the full object instead of single values. It is often used with the Publish/Subscribe pattern.

Query

Messages for getting data.
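
As a rough illustration of how the first three message types differ, here is a minimal Python sketch. The order domain, the class names, and the handle function are all hypothetical and chosen only for this example; the point is that a command can be rejected, an event carries only the changed value, and a document carries the full state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ChangeAddressCommand:
    """Command: a request that the owning service may accept or reject."""
    order_id: str
    new_address: str


@dataclass(frozen=True)
class OrderAddressChanged:
    """Event: reports what changed and the new value only."""
    order_id: str
    new_address: str
    version: int


@dataclass(frozen=True)
class OrderDocument:
    """Document: carries the full state of the entity."""
    order_id: str
    address: str
    status: str
    items: Dict[str, int] = field(default_factory=dict)
    version: int = 1


def handle(
    command: ChangeAddressCommand, current: OrderDocument
) -> Optional[Tuple[OrderAddressChanged, OrderDocument]]:
    """Validate the command; return (event, document) or None if rejected."""
    if current.status == "shipped" or not command.new_address.strip():
        return None  # rejected: nothing is published
    updated = OrderDocument(
        order_id=current.order_id,
        address=command.new_address,
        status=current.status,
        items=dict(current.items),
        version=current.version + 1,
    )
    event = OrderAddressChanged(current.order_id, command.new_address, updated.version)
    return event, updated
```

A service would publish either the small event or the full document, depending on how much data its consumers need.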

Architecture organization

Organizing the architecture is a hard process, and your decisions will have a strong impact in the future. Remember that the chosen approach may differ between the internal boundaries of a single architecture. There are a few options for organizing it:

Organizational Composition
Likelihood of Changes
Depending on feature/process SLA
DDD
Strong consistency

Anti-patterns

Faking Synchronous Responses - occurs when an asynchronous request is wrapped in a synchronous request that waits for the response.
Command Publishing - occurs when a command is published to many services.
Passive-Aggressive Events - occurs when a system sends events to notify other components about changes, but the events are vague and do not provide enough information about what has changed.

Patterns

Designing or building an event-driven architecture can be supported by already proven patterns.

Eventual Consistency

Eventual consistency is a consistency model used in distributed computing systems in which updates made to data items are not immediately reflected in all replicas of the data. Instead, the updates are propagated to the replicas asynchronously, and eventually all replicas converge to a consistent state. It requires careful management of the conflicts that may arise when multiple updates are made to the same data item simultaneously. Eventual consistency in distributed systems is not that scary, and it is a good approach for avoiding distributed transactions. Strong (ACID) consistency generally requires keeping the data in one place or paying the cost of distributed transactions. To achieve eventual consistency, distributed systems typically use mechanisms such as versioning, conflict resolution, and anti-entropy protocols to ensure that updates are propagated and reconciled in a way that eventually leads to a consistent view of the data across all replicas.

Event schema

The event schema concept is based on sending more details in the event so that consumers do not have to fetch that data from other services.

Reorder services

Reorder services to achieve eventual consistency: instead of processing things in parallel, process them sequentially. Use this approach sparingly and inside a specific context. Using it often will rapidly increase the complexity of the overall architecture and of the flow of information, because in this case the read model sends events to the domain owner. The overall time to reflect a change throughout the system will also increase.

Versioning

Use versioning to indicate whether data is up to date. The version should be managed by the aggregate and should also be available in the API, so that other services know which version is the current one.
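
One way a consuming service might use such a version is to ignore updates that are older than what it already holds. The sketch below assumes that every update carries an integer version that only grows; the ReadModel alias and the apply_if_newer function are hypothetical names for illustration.

```python
from typing import Any, Dict, Tuple

# Local read model of a consuming service: entity id -> (version, data).
ReadModel = Dict[str, Tuple[int, Any]]


def apply_if_newer(model: ReadModel, entity_id: str, version: int, data: Any) -> bool:
    """Apply an update only if its version is newer than the stored one.

    Returns True when the update was applied, False when it was stale
    (or a duplicate) and therefore ignored.
    """
    stored = model.get(entity_id)
    if stored is not None and version <= stored[0]:
        return False
    model[entity_id] = (version, data)
    return True
```

With this check, updates that arrive out of order (for example version 3 before version 2) leave the newest data in place.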

Store state in service

Store the state in the service that needs it (in the service itself, not in a database). However, we must avoid the pitfall of applying this blindly in every use case. Otherwise, several copies of the service’s data will spread throughout the architecture, becoming hard to manage and maintain.

End-to-end

This approach requires a synchronous request to other boundaries to validate data. It introduces synchronous requests but helps with eventual consistency.

Concurrency

Concurrency in event-driven systems is not easy because it requires careful handling of events that can affect each other. A good example is two order events (each ordering one item) handled in parallel when only one item is available in stock.

Pessimistic

Pessimistic strategies request access to the resource and act only after receiving the response. A distributed lock, or an in-memory lock when there is only one service instance, might be used to lock access to items.
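
For the single-instance case, an in-memory lock could look like the sketch below. It replays the earlier example of two orders competing for one item. The Inventory class and the place_orders_concurrently helper are hypothetical; with several service instances a distributed lock would be needed instead, which this sketch does not cover.

```python
import threading
from typing import Dict, List


class Inventory:
    """In-memory stock guarded by a lock (single service instance only)."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, item: str, quantity: int) -> bool:
        # The check and the update happen while holding the lock, so two
        # orders can never both see the same available quantity.
        with self._lock:
            available = self._stock.get(item, 0)
            if available < quantity:
                return False
            self._stock[item] = available - quantity
            return True

    def available(self, item: str) -> int:
        with self._lock:
            return self._stock.get(item, 0)


def place_orders_concurrently(inventory: Inventory, item: str, orders: int) -> List[bool]:
    """Handle several one-item orders in parallel and collect the outcomes."""
    results: List[bool] = []
    results_lock = threading.Lock()

    def worker() -> None:
        ok = inventory.reserve(item, 1)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(orders)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Whichever thread wins the lock first gets the item; the other one sees an empty stock and is rejected.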

Optimistic

Optimistic strategies assume they have access to the resource, and when they don’t, they apologize by acting accordingly.
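
A common way to implement an optimistic strategy is a version check on write. The sketch below is an in-memory stand-in with hypothetical names (StockStore, VersionConflict, reserve_one); in a real store the compare-and-write step would have to be atomic, for example a conditional update in the database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StockRecord:
    quantity: int
    version: int


class VersionConflict(Exception):
    """Raised when someone else changed the record since it was read."""


class StockStore:
    def __init__(self) -> None:
        self._records: Dict[str, StockRecord] = {}

    def put(self, item: str, quantity: int) -> None:
        self._records[item] = StockRecord(quantity, 1)

    def read(self, item: str) -> StockRecord:
        return self._records[item]

    def write(self, item: str, quantity: int, expected_version: int) -> None:
        # Assumption: a real store performs this check and update atomically.
        current = self._records[item]
        if current.version != expected_version:
            raise VersionConflict(item)
        self._records[item] = StockRecord(quantity, current.version + 1)


def reserve_one(store: StockStore, item: str, max_attempts: int = 3) -> bool:
    """Try to reserve one unit; on conflict, re-read and try again."""
    for _ in range(max_attempts):
        record = store.read(item)
        if record.quantity < 1:
            return False  # the "apology": the order cannot be fulfilled
        try:
            store.write(item, record.quantity - 1, record.version)
            return True
        except VersionConflict:
            continue
    return False
```

No lock is held between the read and the write; a conflict is only detected at write time, and the handler then retries or rejects the order.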

Avoid by design (end-to-end partitioning)

Handling concurrency by design relies on designing the system in a way that makes concurrency impossible. In event-driven architectures, this is often based on event routing. To avoid concurrency we can use routing keys for events, or partitions (Kafka) with a proper message key. With a well-chosen key the concurrency problem disappears, because all messages with a given key always go to a single consumer instance: there is no concurrency for the same item, only between different items. With end-to-end partitioning, we can achieve greater performance by enabling parallelism inside and outside of the service and tune the parallelism to get the most out of the physical resources while completely avoiding concurrency and ordering issues. Solving concurrency by design isn’t always possible or practical, and hotspotting (a few keys receiving most of the traffic) can sometimes be an issue.
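
The routing idea can be sketched with a stable hash of the key. The functions partition_for and route are hypothetical, and CRC32 is only a stand-in chosen for this example; it is not necessarily the hash a real broker uses.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a deterministic function such as CRC32 is used instead.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route(
    events: Iterable[Tuple[str, str]], partition_count: int
) -> Dict[int, List[Tuple[str, str]]]:
    """Distribute (key, payload) pairs over partitions, keeping arrival order."""
    partitions: Dict[int, List[Tuple[str, str]]] = {
        p: [] for p in range(partition_count)
    }
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions
```

If each partition is read by exactly one consumer instance, all events for the same key are processed sequentially and in order, while different keys can still be processed in parallel.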

Different services

To manage concurrency across different services, we should use a higher-level approach, like a Saga.

Transactions

Transactions can be used for handling concurrency, but they are also limited: a transaction might lock data for an unfeasible amount of time.

Resilience and Reliability

Event-driven services are less likely to suffer from cascading failures due to their decoupled nature.

Load balancing

Create multiple instances of a service and balance the load between them.

Rate limiting

Limit the rate of requests to a service.

Event delivery semantics

Event delivery can have three different semantics: at-least-once, at-most-once, and exactly-once. Exactly-once is the most useful and the hardest to guarantee. At-most-once loses messages but typically enjoys higher performance. At-least-once might generate repeated work, but it’s straightforward to achieve and ensures no messages are lost.
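
With at-least-once delivery, the repeated work is usually handled by making the consumer idempotent. The sketch below assumes that every message carries a unique ID; the IdempotentConsumer class is hypothetical, and the in-memory set would have to be replaced by durable storage in a real service.

```python
from typing import Any, Callable, Set


class IdempotentConsumer:
    """Wraps a handler so that redelivered messages are processed only once."""

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._handler = handler
        # Assumption: kept in memory for brevity; a real service would persist
        # processed IDs, ideally together with the handler's side effects.
        self._processed_ids: Set[str] = set()

    def on_message(self, message_id: str, payload: Any) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._processed_ids:
            return False
        self._handler(payload)
        # The ID is recorded only after the handler succeeds, so a failed
        # message can be redelivered and processed later.
        self._processed_ids.add(message_id)
        return True
```

Combining at-least-once delivery with such a consumer gives an effect close to exactly-once processing without requiring it from the broker.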

Rewinding the event stream / State restore

It means reading data from the beginning of a stream, for example from a Kafka topic where messages are persisted.

Manual acknowledgments

Disable automatic message acknowledgment (auto-commit) in the message broker, so that a message is acknowledged only after it has been processed.

Outbox / Transactions / Compensation

The service’s state must reflect the event stream. We can achieve this by making the event stream the source of truth, with the outbox pattern, or by using transactions and compensating actions. In the outbox pattern, when a service needs to publish a message to a message broker, it first writes the message to an “outbox” table in its local database. This outbox table contains all the messages that need to be published to the message broker, along with metadata such as the message ID, timestamp, and destination topic. A separate process, called the “outbox processor,” is responsible for reading the messages from the outbox table and publishing them to the message broker.
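
The outbox flow described above can be sketched with SQLite from the Python standard library. The table layout, the place_order function, and the publish callback are assumptions made for this example; a real outbox processor would call the broker client in place of the callback.

```python
import json
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    """Write the state change and the outgoing message in ONE local transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "placed")
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"type": "OrderPlaced", "order_id": order_id})),
        )


def publish_pending(
    conn: sqlite3.Connection, publish: Callable[[str, dict], None]
) -> int:
    """Outbox processor: publish unpublished rows in order, then mark them."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the processor crashes after publishing but before marking the row, the message is published again on the next run, so this sketch gives at-least-once delivery and consumers should tolerate duplicates.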

Retries

Simple retries with proper parameters, such as the number of attempts and the delay between them.
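
As an illustration of what such parameters might be, here is a small retry helper with exponential backoff. The retry function and its defaults are hypothetical; the sleep function is a parameter only so that the helper can be exercised without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    The delay doubles after every failed attempt:
    base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In practice it is usually worth retrying only on errors that are likely to be transient, and adding some random jitter to the delay so that many clients do not retry at the same moment.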

Circuit breakers

The idea is to detect when a component is experiencing high latency, errors, or other issues, and temporarily stop sending requests to that component.
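
A very small circuit breaker could look like the sketch below. It counts only consecutive errors (not latency), and the class name, thresholds, and injectable clock are assumptions made for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            # Timeout elapsed: half-open state, one trial call is let through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures,
            # (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure counter.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling more requests onto a component that is already struggling.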

Bulkhead

The idea is to group related services into isolated compartments, or “bulkheads,” so that if one service fails, it does not affect the performance or availability of other services. The isolation can be implemented with separate processes, containers, or even servers for each service.

Processor service

Sometimes denormalized events are required, that is, events that contain information from multiple events. This can be solved by creating a processor service that joins those events and sends the result to the intended destination.
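
A processor service of this kind is essentially a small join over event streams. The sketch below joins two hypothetical event types by order ID; the class name, the event shapes, and the emit callback are all assumptions for illustration, and the buffers are kept in memory only for brevity.

```python
from typing import Callable, Dict


class OrderPaymentJoiner:
    """Joins OrderPlaced and PaymentReceived events into one denormalized event."""

    def __init__(self, emit: Callable[[dict], None]) -> None:
        self._emit = emit
        self._orders: Dict[str, dict] = {}
        self._payments: Dict[str, dict] = {}

    def on_order_placed(self, event: dict) -> None:
        self._orders[event["order_id"]] = event
        self._join(event["order_id"])

    def on_payment_received(self, event: dict) -> None:
        self._payments[event["order_id"]] = event
        self._join(event["order_id"])

    def _join(self, order_id: str) -> None:
        order = self._orders.get(order_id)
        payment = self._payments.get(order_id)
        if order is None or payment is None:
            return  # still waiting for the other event
        self._emit(
            {
                "type": "OrderPaid",
                "order_id": order_id,
                "items": order["items"],
                "amount": payment["amount"],
            }
        )
        # Both halves have been used; free the buffered events.
        del self._orders[order_id]
        del self._payments[order_id]
```

The joiner works regardless of which of the two events arrives first, which matters because ordering across different topics is usually not guaranteed.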

Event Versioning

Versioning the event schema is a good basis for backward and forward compatibility.

Downscaler/Upscaler

Intermediate services that transform consumed events to the proper version, so that the original consumer can consume them easily.
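
The transformation inside an upscaler can be as simple as a chain of functions, one per version step. The schema used below (a schema_version field, and a customer_name field that is split into first_name and last_name in version 2) is entirely hypothetical and exists only to make the example concrete.

```python
from typing import Callable, Dict


def _v1_to_v2(event: dict) -> dict:
    # Hypothetical change: v1 had one "customer_name", v2 splits it in two.
    first, _, last = event["customer_name"].partition(" ")
    upgraded = {k: v for k, v in event.items() if k != "customer_name"}
    upgraded.update({"schema_version": 2, "first_name": first, "last_name": last})
    return upgraded


# One upscaler per source version; newer steps can be added to the chain later.
UPSCALERS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}
LATEST_VERSION = 2


def upscale(event: dict) -> dict:
    """Bring an event of any known older version up to LATEST_VERSION."""
    current = dict(event)  # never mutate the consumed event
    while current["schema_version"] < LATEST_VERSION:
        current = UPSCALERS[current["schema_version"]](current)
    return current
```

A downscaler would do the opposite for consumers that still expect the old version, with the caveat that some conversions lose information.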

Sagas

Sagas are sequences of individual operations that manage a long-running process. We can use them to divide a long-running or traditional single-database transaction into smaller ones that are more suitable for a distributed environment. Choreography tends to be the typical pattern in event-driven architectures, and it fits well with their mindset. Orchestration might be used for simple cases.
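
The core mechanics of a saga, small steps plus compensating actions, can be sketched in a few lines. This example is orchestration-style and runs in one process purely for brevity; in a choreographed saga each step would be a separate service reacting to events. The run_saga function and the step names are hypothetical.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensation). The compensation undoes the action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True
```

Unlike a database rollback, a compensation is a new operation (for example, releasing reserved stock), so other services may briefly observe the intermediate state.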

CQRS

We can apply CQRS to segregate writes from reads. This segregation allows an optimized model and approach for each. For example, we can have two separate services: the first is responsible for managing the data (sending events when the data changes), and the second consumes events from the first and exposes an interface for reading the data.
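
The two-service example can be sketched as follows. Both classes and the PriceChanged event are hypothetical, and a plain Python list stands in for the broker in the usage shown afterwards.

```python
from typing import Callable, Dict, Optional


class ProductWriteService:
    """Owns the data: validates changes and publishes an event for each one."""

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._publish = publish
        self._prices: Dict[str, int] = {}

    def change_price(self, product_id: str, price: int) -> None:
        if price < 0:
            raise ValueError("price must not be negative")
        self._prices[product_id] = price
        self._publish(
            {"type": "PriceChanged", "product_id": product_id, "price": price}
        )


class ProductReadService:
    """Builds a read-optimized view from events and serves queries from it."""

    def __init__(self) -> None:
        self._view: Dict[str, int] = {}

    def on_event(self, event: dict) -> None:
        if event["type"] == "PriceChanged":
            self._view[event["product_id"]] = event["price"]

    def get_price(self, product_id: str) -> Optional[int]:
        return self._view.get(product_id)
```

Wiring the write service to a queue (for example `queue = []` and `ProductWriteService(queue.append)`) and delivering the queued events to `on_event` later shows the eventual consistency involved: until an event is delivered, the read side still returns the old value.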

Event sourcing

Event sourcing is a valuable pattern that synergizes well with event-driven architectures. It is used to persist and query the state of an application by storing a sequence of immutable events that have occurred in the system. Instead of storing the current state of an entity in a traditional database, event sourcing records the series of events that have affected the entity, which can be used to reconstruct its current state. It is important to understand both the concerns and the advantages of event sourcing, as it is useful in some use cases but also brings some complex limitations. A related pattern is command sourcing, which stores commands instead of events.
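
Reconstructing state from events amounts to folding the event list into a value. The sketch below uses a hypothetical bank account with two event types; the names Deposited, Withdrawn, apply_event, and replay are chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


AccountEvent = Union[Deposited, Withdrawn]


def apply_event(balance: int, event: AccountEvent) -> int:
    """Pure function: the state after the event, given the state before it."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event type: {type(event).__name__}")


def replay(events: Iterable[AccountEvent], initial: int = 0) -> int:
    """Rebuild the current balance by applying every stored event in order."""
    balance = initial
    for event in events:
        balance = apply_event(balance, event)
    return balance
```

Because the events are never modified, replaying a prefix of the list gives the state at any earlier point in time; for long histories, periodic snapshots can be passed as the initial value to shorten the replay.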

Usage of message brokers

Choosing the proper message broker might be crucial when architecting an event-driven service. Kafka, for example, with its ability to partition and persist messages, makes it easier to implement several of the patterns mentioned above. The main point is to wisely choose a message broker that is tailored to the architecture.

Tracing

Tracking messages between services. OpenTelemetry might be helpful.

UI connection

Querying data from an event-driven architecture might not be obvious, so there are patterns for handling it.

Aggregating layer

A service that queries multiple backend services and exposes a single interface.

BFFs

A BFF (backend for frontend) is also an aggregation layer, but for a single context. It can abstract the downstream platform functionalities and provide higher segregation of responsibilities, thus avoiding adding too much responsibility to a single aggregating layer.

Micro-frontends

We can decompose a UI application into smaller components to best adapt to the underlying microservice platform. Decomposition can be done at the application, page, and section level; we can adapt each pattern to the most adequate use case.

Event-driven APIs

To implement event-driven APIs, we can use WebSockets, Server-Sent Events, and WebHooks. WebSockets provide a way to implement two-way communication between the UI and the backend. Server-Sent Events and WebHooks are good options for one-way event delivery and push notifications.

Tips

If there is a use case for strong consistency, choosing a synchronous interaction in a request-driven service is often a better choice than using an event-driven one.
Event storming, a tool for organizing business flows, is really useful but hard to organize.

References:

Practical Event-Driven Microservices Architecture, Hugo Filipe Oliveira Rocha, Apress
Building Event-Driven Microservices, Adam Bellemare, O’Reilly Media, Inc.