Asynchronous communication in microservices is like sending a message in a bottle (Alexa, play "Message in a Bottle" by The Police)—only with guaranteed delivery, error handling, and slightly less ocean. In this post, we’ll explore how asynchronous communication patterns like message buses and distributed transactions can bring order to the chaos of microservices while avoiding the pitfalls of synchronous RPC.

Introducing Asynchronous Communications
Imagine a world where services don’t impatiently tap their feet while waiting for a reply. That’s asynchronous communication. Unlike the more traditional Remote Procedure Call (RPC), where the client waits (and waits) for a response, asynchronous communication allows messages to be sent to a message bus, where they’re picked up by one or more consumers at their leisure. This is the backbone of event-driven architecture, enabling services to communicate without demanding constant attention.
Key Differences from RPC:
RPC: Direct communication between two hosts, with the client waiting for a response.
Message Bus: Messages can have multiple consumers, be routed dynamically, and wait in queues that balance load across consumers.
Example:
An e-commerce system processes an order. Instead of the payment service waiting for inventory and shipping confirmations (RPC style), it sends a message to the bus. The inventory and shipping services pick it up asynchronously, ensuring the system continues to operate smoothly.
Asynchronous communication shines in scenarios where:
Messages need to be sent to multiple locations.
Tasks can be processed independently, avoiding bottlenecks.
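To make the contrast concrete, here is a minimal sketch in plain Python, using an in-memory queue as a stand-in for a real message bus (the service and topic names are illustrative):

```python
import queue
import threading
import time

# --- RPC style: the caller blocks until the callee replies ---
def check_inventory_rpc(order_id: str) -> bool:
    time.sleep(0.5)                            # simulated network and processing latency
    return True

def place_order_rpc(order_id: str) -> None:
    confirmed = check_inventory_rpc(order_id)  # the caller is stuck here for 0.5s
    print(f"RPC: order {order_id} confirmed={confirmed}")

# --- Asynchronous style: publish the message and move on ---
bus = queue.Queue()                            # stand-in for a message bus

def place_order_async(order_id: str) -> None:
    bus.put({"topic": "NewOrder", "order_id": order_id})
    print(f"Async: order {order_id} published; caller is free to continue")

def inventory_consumer() -> None:
    while True:
        message = bus.get()                    # consumers pick messages up at their leisure
        print(f"Inventory: handling order {message['order_id']}")
        bus.task_done()

threading.Thread(target=inventory_consumer, daemon=True).start()
place_order_rpc("A-1")
place_order_async("A-2")
bus.join()                                     # only so the demo output appears before exit
```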
Message Bus Fundamentals
A message bus is like the postal service of microservices, but with fewer lost packages (hopefully). Here’s how it works:
Anatomy of a Message:
Payload: The business-relevant data.
Headers: Metadata such as:
Timestamp
Sender information
Correlation ID (to track related messages)
Error handling details
Practical Tip:
Keep messages small—larger payloads harm throughput and increase failure risks. If a file is too big (e.g., an image or a report), offload it to blob storage and include a reference link in the message.
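As a sketch, such an envelope might look like this (a plain Python dataclass; the field names and blob URL are illustrative, not any particular broker's format):

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MessageHeaders:
    sender: str
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    retry_count: int = 0                      # error-handling detail

@dataclass
class Message:
    headers: MessageHeaders
    payload: dict                             # keep this small and business-relevant

# Instead of embedding a multi-megabyte invoice PDF, store it in blob storage
# and pass only a reference (the URL below is hypothetical).
message = Message(
    headers=MessageHeaders(sender="order-service"),
    payload={
        "order_id": "A-2",
        "invoice_ref": "https://blobstore.example/invoices/A-2.pdf",
    },
)
print(message.headers.correlation_id, message.payload["invoice_ref"])
```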
Topics and Queues:
Topics: Define the meaning of a message stream. Consumers subscribe to the topics they care about. For example, a "NewOrder" topic might notify the payment, inventory, and shipping services simultaneously.
Queues: Temporary storage locations where messages wait to be processed. They support:
Acknowledgement: Clears messages after successful processing.
Crash Recovery: Ensures messages aren’t lost during failures.
Example:
A queue can balance the load by distributing incoming "OrderPlaced" messages across multiple instances of the payment service.
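A toy illustration of both ideas, with in-memory queues standing in for broker-managed ones: the "NewOrder" topic fans the same message out to every subscriber, while two payment workers compete for messages on a single queue (the service names are made up):

```python
import queue

# One queue per subscribing service; the "NewOrder" topic fans out to all of them.
subscriber_queues = {
    "payment": queue.Queue(),
    "inventory": queue.Queue(),
    "shipping": queue.Queue(),
}

def publish(topic_queues: dict, message: dict) -> None:
    for q in topic_queues.values():
        q.put(message)                         # every subscriber gets its own copy

publish(subscriber_queues, {"type": "NewOrder", "order_id": "A-2"})

# Competing consumers: two payment workers share one queue,
# so each message is processed by exactly one of them.
payment_queue = subscriber_queues["payment"]
payment_queue.put({"type": "NewOrder", "order_id": "A-3"})

for worker in ("payment-1", "payment-2"):
    message = payment_queue.get()
    print(f"{worker} handled order {message['order_id']}")
    payment_queue.task_done()                  # acknowledge: mark the message as processed
```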
Error Handling:
Retry Policies: Implement delays and failure routing to handle transient issues.
Dead Letter Queues: Capture messages that repeatedly fail processing so they can be analysed and reprocessed later.
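Putting the two together, here is a minimal in-memory sketch (a real retry policy would also add delays or exponential backoff between attempts):

```python
import queue

work_queue: queue.Queue = queue.Queue()
dead_letter_queue: queue.Queue = queue.Queue()
MAX_ATTEMPTS = 3                                     # illustrative retry policy

def handle(message: dict) -> None:
    # Simulate a consumer that cannot process malformed payloads.
    if "order_id" not in message["payload"]:
        raise ValueError("missing order_id")
    print(f"processed order {message['payload']['order_id']}")

def consume_with_retry() -> None:
    while not work_queue.empty():
        message = work_queue.get()
        try:
            handle(message)                          # success: message is acknowledged
        except Exception as error:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] < MAX_ATTEMPTS:
                work_queue.put(message)              # transient failure: retry
            else:
                message["error"] = str(error)        # exhausted: park it for analysis
                dead_letter_queue.put(message)
        finally:
            work_queue.task_done()

work_queue.put({"payload": {"order_id": "A-2"}})
work_queue.put({"payload": {"customer": "no order id"}})   # will be dead-lettered
consume_with_retry()
print("dead letters:", dead_letter_queue.qsize())
```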
Practical Reminder:
Regularly monitor and clean dead letter queues to avoid backlogs that could mask underlying issues.
Designing Asynchronous Communication Flows
Asynchronous communication may sound simple, but designing effective flows requires careful consideration. Key challenges include:
Distributed Transactions:
Coordinating state changes across multiple services is tricky. Chaining events can:
Make routing complex.
Make it difficult to track which transactions are still pending.
Example:
An order service initiates a transaction involving payment, inventory reservation, and shipping label generation. If any step fails, the system needs a way to revert changes made by previous steps.
Primary Concerns:
Message Routing: Allows for conditional routing and dynamic flows.
Transaction State: No service knows when the transaction is truly complete without central coordination.
Failure Compensation: Automatically restores the system to a consistent state when things go sideways.
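To ground the failure-compensation concern, here is a bare-bones sketch using the order example above (the step and compensation functions are illustrative; in practice each would live in a different service and be triggered by messages):

```python
# Each forward step has a compensating operation that undoes its work.
def take_payment(order):      print(f"{order}: payment taken")
def refund_payment(order):    print(f"{order}: payment refunded")
def reserve_inventory(order): print(f"{order}: inventory reserved")
def release_inventory(order): print(f"{order}: inventory released")
def create_label(order):      raise RuntimeError("label printer offline")

STEPS = [
    (take_payment, refund_payment),
    (reserve_inventory, release_inventory),
    (create_label, None),                     # nothing to undo if labelling itself fails
]

def run_order_transaction(order: str) -> None:
    completed = []                            # compensations for steps that succeeded
    try:
        for action, compensation in STEPS:
            action(order)
            completed.append(compensation)
    except Exception as error:
        print(f"{order}: step failed ({error}); reverting previous steps")
        for compensation in reversed(completed):
            if compensation:
                compensation(order)

run_order_transaction("A-2")
```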
Saga Pattern vs. Routing Slip Pattern
Two popular approaches for managing asynchronous workflows are the Saga Pattern and the Routing Slip Pattern. Let’s compare:
Saga Pattern:
Born at Princeton University in 1987, the Saga Pattern is ideal for long-lived transactions (think hours or days). It involves:
Fine-Grained Transactions: The overall transaction is broken into small steps, each paired with a compensating operation that undoes it after a partial failure.
Centralised State Management: A state machine dictates the flow and enables rollback.
Failure Compensation: Automatically rolls back to a consistent state.
Practical Use Case:
An online travel agency booking flights, hotels, and cars. If the flight booking fails, it triggers compensating transactions to cancel the hotel and car reservations.
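A minimal sketch of such an orchestrator (the event and command names are made up; in a real system each returned command would be published to the bus, and each event would arrive as a reply message):

```python
# A toy saga orchestrator: it owns the saga's state centrally and decides,
# for each incoming event, which commands to publish next.
def orchestrator(saga: dict, event: str) -> list[str]:
    """Return the commands to publish next, given the saga's state and an event."""
    flow = ["BookHotel", "BookCar", "BookFlight"]
    compensations = {"BookHotel": "CancelHotel", "BookCar": "CancelCar"}

    if event.endswith("Failed"):
        saga["state"] = "compensating"
        # Undo every step that already succeeded, in reverse order.
        return [compensations[c] for c in reversed(saga["completed"]) if c in compensations]

    if event != "Start":
        saga["completed"].append("Book" + event.removesuffix("Booked"))

    next_index = len(saga["completed"])
    if next_index == len(flow):
        saga["state"] = "completed"
        return []
    return [flow[next_index]]

saga = {"state": "running", "completed": []}
print(orchestrator(saga, "Start"))          # ['BookHotel']
print(orchestrator(saga, "HotelBooked"))    # ['BookCar']
print(orchestrator(saga, "CarBooked"))      # ['BookFlight']
print(orchestrator(saga, "FlightFailed"))   # ['CancelCar', 'CancelHotel']
```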
Routing Slip Pattern:
Think of it as a manufacturing line for messages:
Precomputed Steps: The routing slip specifies what happens at each step.
Linear Processing: Each service processes the message and passes it along.
No Centralisation: Each service operates independently, improving performance but losing central control.
Practical Use Case:
A warehouse system where an order undergoes packing, quality checks, and labelling. Each step is handled sequentially without central coordination.
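A sketch of that warehouse example (the step functions are illustrative): the slip travels with the message, and each service simply does its work and forwards the message to whatever step comes next on the slip.

```python
def pack(order):           print(f"{order}: packed")
def quality_check(order):  print(f"{order}: quality checked")
def label(order):          print(f"{order}: labelled")

STEP_HANDLERS = {"pack": pack, "quality_check": quality_check, "label": label}

def process(message: dict) -> None:
    """Each 'service' pops the next step off the slip and forwards the message."""
    if not message["routing_slip"]:
        print(f"{message['order_id']}: done")
        return
    step = message["routing_slip"].pop(0)      # the slip was precomputed up front
    STEP_HANDLERS[step](message["order_id"])
    process(message)                           # stands in for publishing to the next queue

process({"order_id": "A-2", "routing_slip": ["pack", "quality_check", "label"]})
```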
Comparing the Two Patterns
Coordination: A Saga relies on centralised state management, so the orchestrator always knows how far the transaction has progressed; a Routing Slip has no central coordinator.
Flow: Sagas suit long-lived transactions that may need rollback; Routing Slips follow a precomputed, linear sequence of steps.
Failure Handling: Sagas compensate automatically to restore a consistent state; Routing Slips trade that safety net for simpler, faster independent processing.
Key Insight:
In practice, you might combine both patterns to address specific challenges. For example, a Saga could coordinate high-level workflows while Routing Slips manage granular tasks within each workflow.
Conclusion
Asynchronous communication is the lifeblood of scalable, resilient microservices. Whether you’re building a messaging system with a service bus or implementing complex distributed transactions, remember:
Keep messages small and efficient.
Use dead letter queues to handle errors effectively.
Choose the right pattern (Saga or Routing Slip) based on your system’s needs.
Final Tip:
Document your messaging flows thoroughly. Diagrams and clear annotations help teams understand the system and troubleshoot issues efficiently.
And most importantly, embrace the art of "fire-and-forget" while ensuring your systems don’t actually forget anything important. With robust asynchronous design, your microservices will be ready to tackle anything—even if they’re not doing it synchronously.
Stay tuned for the next instalment in the series, where we’ll dive into the joys and challenges of real-time event streaming!