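When transitioning from a single monolithic application to an integrated, distributed system (via APIs, ESB, or iPaaS), developers often make fatal assumptions. In 1994, L. Peter Deutsch and others at Sun Microsystems formulated the "8 Fallacies of Distributed Computing": the network is reliable; latency is zero; bandwidth is infinite; the network is secure; topology doesn't change; there is one administrator; transport cost is zero; and the network is homogeneous. Believing any of these fallacies leads to fragile integrations.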
Because the network is not reliable, our integration architectures must be designed for failure.
In a monolithic system, data is stored in a single database. We rely on ACID transactions (Atomicity, Consistency, Isolation, Durability). If a bank transfer fails halfway, the database rolls back every change in the transaction, so no partial transfer is ever left behind.
In a distributed integration, Service A (Order) and Service B (Billing) have separate databases. A single ACID transaction cannot practically span both of them across the network. Instead, we rely on BASE (Basically Available, Soft state, Eventual consistency). The system might be temporarily inconsistent (e.g., the order is placed, but billing hasn't deducted the funds yet), but it will *eventually* become consistent.
To handle distributed transactions safely without ACID, architects use the Saga Pattern (a sequence of local transactions where a failure triggers compensating transactions to undo the previous steps).
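The compensation idea can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production saga coordinator: `SagaStep`, `run_saga`, and the order and billing functions are hypothetical names invented for the example, and the billing step is hard-coded to fail so that the rollback path is visible.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Undo only the steps that actually committed, newest first.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Hypothetical in-memory stand-ins for the Order and Billing services.
log: List[str] = []


def create_order() -> None:
    log.append("order created")


def cancel_order() -> None:
    log.append("order cancelled")


def charge_customer() -> None:
    raise RuntimeError("billing declined")  # simulated failure in Service B


def refund_customer() -> None:
    log.append("customer refunded")


succeeded = run_saga([
    SagaStep("order", create_order, cancel_order),
    SagaStep("billing", charge_customer, refund_customer),
])
```

Here the order step commits, the billing step fails, and the saga cancels the order. The refund is never issued because the charge never went through. A real implementation would also need to persist saga state and make compensations safe to retry, which this sketch leaves out.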
If Service A calls Service B, and Service B is completely down, Service A will get a fast "Connection Refused" error. That is easy to handle.
However, what if Service B is not down, but is experiencing a CPU spike and taking 30 seconds to respond? Service A will wait. If 1,000 users make requests, 1,000 threads in Service A will be stuck waiting for Service B. Soon, Service A exhausts its threads or memory and fails too. This is called a Cascading Failure.
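The arithmetic behind this pile-up follows Little's law: the average number of requests in flight equals the arrival rate multiplied by the time each request spends waiting. The sketch below applies it with assumed figures. The rate of 100 requests per second and the healthy latency of 50 ms are illustrative values that do not come from any measurement.

```python
def threads_in_flight(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return requests_per_second * latency_seconds


# Assumed load: 100 requests per second arriving at Service A.
healthy = threads_in_flight(100, 0.05)  # Service B answers in 50 ms
degraded = threads_in_flight(100, 30)   # Service B takes 30 seconds
```

At the same traffic level, the number of blocked threads grows from about 5 to 3,000. Service A is overwhelmed even though its own load never changed.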
To prevent this, we use the Circuit Breaker Pattern.
Figure 1: The Circuit Breaker State Machine
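Modeled after the electrical circuit breaker in a house, this software pattern stops one failing dependency from taking its callers down with it. The breaker moves between three states:
Closed: calls pass through to Service B as normal while the breaker tracks the rate of failed and slow calls.
Open: once that rate crosses the configured threshold, the breaker trips and every call fails immediately with an error such as CircuitBreakerOpenException (recent Resilience4j versions name it CallNotPermittedException). This gives Service B time to recover instead of hammering it with traffic.
Half-Open: after a wait period, a limited number of trial calls are let through. If they succeed, the breaker closes again; otherwise it reopens.
With modern libraries like Resilience4j in microservices, developers do not code this logic from scratch. They apply a configuration wrapper around their API calls.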
# Resilience4j Circuit Breaker Configuration Example
resilience4j.circuitbreaker:
  instances:
    billingServiceBackend:
      registerHealthIndicator: true
      slidingWindowSize: 100                     # Evaluate the last 100 calls
      failureRateThreshold: 50                   # Trip if 50% of calls fail
      slowCallRateThreshold: 50                  # Trip if 50% of calls are too slow
      slowCallDurationThreshold: 2000ms          # "Slow" means taking longer than 2 seconds
      permittedNumberOfCallsInHalfOpenState: 10  # Let 10 requests through to test recovery
      waitDurationInOpenState: 30000ms           # Wait 30 seconds before testing (Half-Open)
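To show what such a configuration controls, here is a simplified Python sketch of the state machine. It is not how Resilience4j is implemented. The names `CircuitBreaker` and `CircuitOpenError` are hypothetical, the constructor parameters only loosely mirror the settings above, slow-call tracking is omitted, and the breaker waits for a full window of calls before judging the failure rate.

```python
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the backend."""


class CircuitBreaker:
    def __init__(self, window_size: int = 100, failure_rate_threshold: float = 50.0,
                 wait_open_seconds: float = 30.0, half_open_calls: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_size = window_size
        self.failure_rate_threshold = failure_rate_threshold
        self.wait_open_seconds = wait_open_seconds
        self.half_open_calls = half_open_calls
        self.clock = clock  # injectable so tests need not sleep
        self.state = "CLOSED"
        self.results: Deque[bool] = deque(maxlen=window_size)  # True = failed call
        self.opened_at = 0.0

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.wait_open_seconds:
                raise CircuitOpenError("circuit is open; failing fast")
            # Wait period is over: allow a limited number of trial calls.
            self.state = "HALF_OPEN"
            self.results = deque(maxlen=self.half_open_calls)
        try:
            result = func()
        except Exception:
            self._record(failed=True)
            raise
        self._record(failed=False)
        return result

    def _record(self, failed: bool) -> None:
        self.results.append(failed)
        window = self.half_open_calls if self.state == "HALF_OPEN" else self.window_size
        if len(self.results) < window:
            return  # not enough calls yet to judge
        failure_rate = 100.0 * sum(self.results) / len(self.results)
        if failure_rate >= self.failure_rate_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
            self.results = deque(maxlen=self.window_size)
        elif self.state == "HALF_OPEN":
            # Trial calls were healthy enough: resume normal operation.
            self.state = "CLOSED"
            self.results = deque(maxlen=self.window_size)


def failing_billing_call() -> str:
    raise ConnectionError("billing unavailable")  # simulated outage


def healthy_billing_call() -> str:
    return "ok"


# Small window and a fake clock keep the demonstration short.
now = [0.0]
breaker = CircuitBreaker(window_size=4, wait_open_seconds=30.0,
                         half_open_calls=2, clock=lambda: now[0])
for _ in range(4):
    try:
        breaker.call(failing_billing_call)
    except ConnectionError:
        pass
state_after_failures = breaker.state
```

After four consecutive failures fill the window, the breaker is open and later calls raise `CircuitOpenError` without touching the backend. Once the fake clock passes the 30-second wait, two successful trial calls move it through Half-Open and back to Closed.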