Event-Driven Architecture vs Request-Response: Choosing the Right Communication Pattern
The choice between request-response and event-driven communication patterns is one of the most consequential architectural decisions in distributed system design. It determines how services couple to each other, how failures propagate, how the system scales under load, and how difficult it is to trace the flow of data through the system when things go wrong. Most teams treat it as a technology choice — Kafka versus REST — when it is primarily a design choice about how services should relate to each other.
Request-response communication — where a service calls another service and waits for a reply — is synchronous, directly coupled, and immediately consistent. The calling service knows whether the operation succeeded or failed before it proceeds. The called service knows who is calling and what they need. The call-and-response pattern maps naturally to how most engineers think about program flow, which is why it is the dominant communication pattern in most systems despite not being the best choice for many of the problems it is applied to.
When Request-Response Fails
The failure modes of synchronous request-response communication in distributed systems are well-documented and consistently underestimated until they occur in production. The calling service’s availability is coupled to the called service’s availability. When the payment service is slow, the checkout service is slow. When the user service is unavailable, every request that depends on a call to the user service fails or stalls until that call returns or times out.
Retry logic, circuit breakers, and timeouts address some of these failure modes. They do not change the fundamental property that a synchronous call creates a dependency chain whose reliability is bounded by that of its least reliable service, and is usually lower. Assuming independent failures, a five-service call chain where each service has 99.9% availability has a chain availability of about 99.5% (0.999^5 ≈ 0.995), not because any individual service is unreliable, but because the failure probabilities compound across dependencies.
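The arithmetic behind that figure can be checked with a few lines of Python. This is a minimal sketch that assumes failures are independent and that every call in the chain must succeed; the function name chain_availability is illustrative, not taken from any library.

```python
from math import prod


def chain_availability(availabilities):
    """Availability of a synchronous call chain.

    Assumes independent failures and that every service in the
    chain must respond for the overall request to succeed.
    """
    return prod(availabilities)


# Five services, each available 99.9% of the time.
five_services = [0.999] * 5
print(f"{chain_availability(five_services):.2%}")  # about 99.5%
```

Under the same assumptions, adding a sixth or seventh service to the chain lowers the figure further, which is why long synchronous chains tend to be less reliable than any of their parts.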
Event-Driven Communication and Its Trade-offs
Event-driven communication — where a service publishes an event describing something that happened, and other services react to the event — decouples services temporally and functionally. The order service publishes an OrderPlaced event. The inventory service consumes it to reserve stock. The email service consumes it to send a confirmation. The order service does not know which services consume its events. The consuming services are not blocked waiting for a synchronous response.
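The publish and subscribe relationship described above can be sketched in a few lines. The EventBus class here is a hypothetical in-process stand-in for a real broker, and the handler names are assumptions made for illustration. Unlike a real broker, this bus invokes handlers inline; it shows only the decoupling of names, not asynchronous delivery.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    sku: str
    quantity: int


class EventBus:
    """Hypothetical in-process stand-in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        # The publisher never names its consumers; it only hands
        # the event to the bus, which fans it out by event type.
        for handler in self._subscribers[type(event)]:
            handler(event)


reserved = {}      # inventory service state (illustrative)
sent_emails = []   # email service state (illustrative)


def reserve_stock(event):
    reserved[event.sku] = reserved.get(event.sku, 0) + event.quantity


def send_confirmation(event):
    sent_emails.append(f"Order {event.order_id} confirmed")


bus = EventBus()
bus.subscribe(OrderPlaced, reserve_stock)
bus.subscribe(OrderPlaced, send_confirmation)

# The order service publishes without knowing who is listening.
bus.publish(OrderPlaced(order_id="o-1", sku="widget", quantity=2))
```

Adding a third consumer, such as an analytics service, would require one more subscribe call and no change to the code that publishes the event.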
The operational advantages are real: services can be deployed, scaled, and restarted independently. A slow consumer does not slow the producer. A failing consumer can catch up from the event log after it recovers. The system degrades gracefully when individual components fail because the event log preserves the work that would otherwise be lost.
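The catch-up behaviour depends on the log retaining events and on each consumer tracking how far it has read. The following is a minimal sketch of that idea, assuming a durable append-only log; EventLog and Consumer are hypothetical classes, not the API of any particular broker.

```python
class EventLog:
    """Hypothetical append-only log shared by producer and consumers."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        # Return a copy of everything at or after the given offset.
        return self._events[offset:]


class Consumer:
    """Tracks its own read position, independent of the producer."""

    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.processed = []

    def poll(self):
        for event in self.log.read_from(self.offset):
            self.processed.append(event)
            self.offset += 1


log = EventLog()
consumer = Consumer(log)

log.append("OrderPlaced:o-1")
consumer.poll()

# The consumer is down; the producer keeps appending unhindered.
log.append("OrderPlaced:o-2")
log.append("OrderPlaced:o-3")

# After recovery, the consumer resumes from its last offset.
consumer.poll()
```

Because the offset belongs to the consumer, a second consumer could read the same log at its own pace without affecting the first.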
The trade-offs are equally real. Eventual consistency, where different services’ views of the same data converge over time rather than being immediately synchronized, is appropriate for some business requirements and inappropriate for others. A system that must verify inventory availability before confirming an order cannot rely on an eventually consistent view of inventory for that check. The check must be a synchronous call whose result is known before the order is confirmed.
Debugging event-driven systems requires tracing events across multiple services and potentially multiple event logs. The sequence of events that led to a specific outcome is distributed across the logs of every service that participated. Distributed tracing infrastructure — correlation IDs propagated through events, centralized log aggregation, event replay capabilities — is necessary for event-driven systems to be operationally manageable.
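Correlation ID propagation is the simplest of these pieces to illustrate. In the sketch below, every event derived from another event inherits its correlation ID, so a single lookup recovers the whole chain. The Event type, the derive and trace helpers, and the service names are all assumptions made for this example; trace_log stands in for centralized log aggregation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    name: str
    payload: dict
    # A fresh ID is generated only for events that start a new flow.
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def derive(parent, name, payload):
    """Create a follow-up event that carries the parent's correlation ID."""
    return Event(name=name, payload=payload, correlation_id=parent.correlation_id)


trace_log = []  # stand-in for a centralized log store


def record(service, event):
    trace_log.append(
        {"service": service, "event": event.name, "correlation_id": event.correlation_id}
    )


def trace(correlation_id):
    """Return every log entry that belongs to one logical flow."""
    return [entry for entry in trace_log if entry["correlation_id"] == correlation_id]


order_placed = Event("OrderPlaced", {"order_id": "o-1"})
record("order-service", order_placed)

stock_reserved = derive(order_placed, "StockReserved", {"order_id": "o-1"})
record("inventory-service", stock_reserved)

unrelated = Event("OrderPlaced", {"order_id": "o-2"})
record("order-service", unrelated)

for entry in trace(order_placed.correlation_id):
    print(entry["service"], entry["event"])
```

The lookup returns the two entries for order o-1 across both services and excludes the unrelated order, which is the property that makes cross-service debugging tractable.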
The Hybrid Reality
Most production systems use both patterns appropriately rather than committing to one paradigm universally. User-facing request paths that require immediate feedback — search queries, checkout flows, authentication — use request-response because the user is waiting and needs an answer. Background operations that do not require immediate consistency — sending confirmation emails, updating search indexes, generating reports, triggering webhooks — use event-driven communication because the temporal decoupling is valuable and the eventual consistency is acceptable.
The architectural question is which operations belong in each category. Operations where the user is blocked waiting for a result belong in the request-response category. Operations where the result is not needed immediately, and where the system benefits from decoupling the initiating service from the processing service, belong in the event-driven category.
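One way to picture the split is a checkout handler that makes the inventory check synchronously and hands everything else off as an event. This is a sketch under stated assumptions: the stock dictionary, the outbox queue (a stand-in for a broker topic), and the function names are all hypothetical.

```python
from collections import deque

stock = {"widget": 1}   # illustrative inventory state
outbox = deque()        # stand-in for a broker topic


class OutOfStock(Exception):
    pass


def reserve(sku, quantity):
    """Synchronous step: the caller gets a definite answer before proceeding."""
    if stock.get(sku, 0) < quantity:
        raise OutOfStock(sku)
    stock[sku] -= quantity


def checkout(order_id, sku, quantity):
    # Request-response: the user is waiting, so the result must be known now.
    try:
        reserve(sku, quantity)
    except OutOfStock:
        return "rejected"
    # Event-driven: confirmation email, search index updates, and similar
    # work can happen later, so the handler only records that the order exists.
    outbox.append({"type": "OrderPlaced", "order_id": order_id})
    return "confirmed"


first = checkout("o-1", "widget", 1)
second = checkout("o-2", "widget", 1)
print(first, second, len(outbox))
```

The second order is rejected immediately because the stock check is on the synchronous path, while only the confirmed order produces an event for downstream consumers.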
Applying request-response to operations that should be event-driven produces systems with unnecessarily tight coupling and cascading failure modes. Applying event-driven communication to operations that require immediate consistency produces systems with difficult-to-debug eventual consistency problems and incorrect business behavior. Getting the categorization right is the design work. The technology choices follow from the design.