ADR-0034: Circuit Breaker Pattern for API Resilience
Status
Accepted - 2025-01-26
Context
TVL Platform integrates with external APIs (Hostaway, Airbnb, VRBO, Stripe) that may experience outages, slow responses, or rate limiting.
Decision
Adopt the Circuit Breaker pattern, implemented with the opossum library, to prevent cascading failures and enable graceful degradation.
Rationale
- Fail Fast: Stop calling failing API (prevent timeout pile-up)
- Auto Recovery: Automatically retry after cooldown period
- Observability: Metrics on open/closed/half-open states
- Graceful Degradation: Return cached data or default response
Alternatives Considered
Alternative 1: Retry with Exponential Backoff Only
Rejected - No failure detection, continues hitting broken API
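As a rough illustration of why backoff alone does not shield a failing upstream, the sketch below computes the delay schedule for one request and the total number of upstream calls when many requests retry independently. The helper names and the values used (200 ms base delay, 3 retries, 100 requests) are assumptions for illustration, not figures from this ADR.

```typescript
// Hypothetical helper: delay (ms) before each retry, doubling on every attempt.
function backoffDelays(baseMs: number, maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * 2 ** attempt);
}

// With retry-only, each incoming request makes its first call plus all of its
// retries against the broken API, because nothing remembers that it is down.
function upstreamCallsWithRetryOnly(incomingRequests: number, maxRetries: number): number {
  return incomingRequests * (1 + maxRetries);
}

const delays = backoffDelays(200, 3); // [200, 400, 800]
const totalCalls = upstreamCallsWithRetryOnly(100, 3); // 400 calls during the outage
```

Backoff spaces the calls out but does not reduce their number. A breaker that is open rejects calls locally instead.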
Alternative 2: Manual Circuit Breaker
Rejected - Error-prone, no observability, reinventing wheel
Alternative 3: Service Mesh (Istio)
Rejected - Overkill for MVP, complex infrastructure
Implementation
1. Install Opossum
pnpm add opossum
pnpm add -D @types/opossum
2. Create Circuit Breaker Wrapper
// src/integrations/circuitBreaker.ts
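import CircuitBreaker from 'opossum';
import { logger } from '../logger'; // Assumed location of the shared logger; adjust to the project's actual module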
export interface CircuitBreakerOptions {
  timeout: number; // Request timeout (ms)
  errorThresholdPercentage: number; // % failures before opening
  resetTimeout: number; // Time before half-open attempt (ms)
  rollingCountTimeout: number; // Rolling window for error calculation (ms)
  rollingCountBuckets: number; // Number of buckets in window
}
export const DEFAULT_CIRCUIT_BREAKER_OPTIONS: CircuitBreakerOptions = {
  timeout: 10000, // 10 seconds
  errorThresholdPercentage: 50, // Open if 50% failures
  resetTimeout: 30000, // Try again after 30s
  rollingCountTimeout: 10000, // 10s rolling window
  rollingCountBuckets: 10, // 1s buckets
};
export function createCircuitBreaker<T>(
  fn: (...args: any[]) => Promise<T>,
  options: Partial<CircuitBreakerOptions> = {}
): CircuitBreaker<any[], T> {
  const breaker = new CircuitBreaker(fn, {
    ...DEFAULT_CIRCUIT_BREAKER_OPTIONS,
    ...options,
  });
  // Event listeners for observability
  breaker.on('open', () => {
    logger.warn({ breakerName: fn.name }, 'Circuit breaker opened');
  });
  breaker.on('halfOpen', () => {
    logger.info({ breakerName: fn.name }, 'Circuit breaker half-open (testing)');
  });
  breaker.on('close', () => {
    logger.info({ breakerName: fn.name }, 'Circuit breaker closed (healthy)');
  });
  breaker.on('timeout', () => {
    logger.error({ breakerName: fn.name }, 'Circuit breaker timeout');
  });
  return breaker;
}
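To make the rolling-window options more concrete, here is a simplified model of how a bucketed window could produce the error percentage that is compared against errorThresholdPercentage. It is a sketch of the general idea, not opossum's internal implementation; the class and method names are hypothetical. With the defaults above, each bucket would cover rollingCountTimeout / rollingCountBuckets = 1000 ms.

```typescript
// One slot of the rolling window.
interface Bucket {
  fires: number;
  failures: number;
}

// Simplified bucketed rolling window (illustrative, not opossum's code).
class RollingWindow {
  private buckets: Bucket[];
  private current = 0;

  constructor(bucketCount: number) {
    this.buckets = Array.from({ length: bucketCount }, () => ({ fires: 0, failures: 0 }));
  }

  // Record one call outcome in the current bucket.
  record(failed: boolean): void {
    const bucket = this.buckets[this.current];
    bucket.fires += 1;
    if (failed) bucket.failures += 1;
  }

  // Called once per bucket interval: move to the next slot and
  // discard the oldest data that slot was holding.
  rotate(): void {
    this.current = (this.current + 1) % this.buckets.length;
    this.buckets[this.current] = { fires: 0, failures: 0 };
  }

  // Failure percentage across the whole window (0 when there were no calls).
  errorPercentage(): number {
    const fires = this.buckets.reduce((sum, b) => sum + b.fires, 0);
    const failures = this.buckets.reduce((sum, b) => sum + b.failures, 0);
    return fires === 0 ? 0 : (failures * 100) / fires;
  }
}

const rollingWindow = new RollingWindow(10);
for (let i = 0; i < 6; i++) rollingWindow.record(true); // 6 failed calls
for (let i = 0; i < 4; i++) rollingWindow.record(false); // 4 successful calls
const errorPct = rollingWindow.errorPercentage(); // 60, above the 50% default threshold
```

Because old buckets are dropped as the window rotates, a burst of failures stops counting once it is older than the window length.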
3. Wrap External API Calls
// src/integrations/hostaway/HostawayConnector.ts
import axios from 'axios';
import CircuitBreaker from 'opossum';
import { createCircuitBreaker } from '../circuitBreaker';
export class HostawayConnector implements ChannelConnector {
  private listPropertiesBreaker: CircuitBreaker<any[], Property[]>;
  private createBookingBreaker: CircuitBreaker<any[], Booking>;
  constructor(credentials: HostawayCredentials) {
    // Wrap each external API call
    this.listPropertiesBreaker = createCircuitBreaker(
      this.listPropertiesImpl.bind(this),
      { timeout: 5000 } // 5s timeout for list
    );
    this.createBookingBreaker = createCircuitBreaker(
      this.createBookingImpl.bind(this),
      { timeout: 10000 } // 10s timeout for create
    );
  }
  async listProperties(): Promise<Property[]> {
    try {
      return await this.listPropertiesBreaker.fire();
    } catch (error) {
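      // Caught values are typed `unknown` under strict TypeScript, so narrow before reading `code`
      if ((error as { code?: string }).code === 'EOPENBREAKER') {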
        // Circuit is open - return cached data
        logger.warn('Circuit open, returning cached properties');
        return await this.getCachedProperties();
      }
      throw error;
    }
  }
  private async listPropertiesImpl(): Promise<Property[]> {
    const response = await axios.get(`${this.baseURL}/listings`, {
      headers: { Authorization: `Bearer ${this.accessToken}` },
    });
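    // Arrow function keeps `this` bound in case transformProperty uses instance state
    return response.data.result.map((listing) => this.transformProperty(listing));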
  }
  async createBooking(data: BookingData): Promise<Booking> {
    try {
      return await this.createBookingBreaker.fire(data);
    } catch (error) {
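      // Caught values are typed `unknown` under strict TypeScript, so narrow before reading `code`
      if ((error as { code?: string }).code === 'EOPENBREAKER') {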
        // Circuit is open - queue for later
        logger.error('Circuit open, queueing booking creation');
        await this.queueBookingCreation(data);
        throw new Error('Booking queued due to API unavailability');
      }
      throw error;
    }
  }
  private async createBookingImpl(data: BookingData): Promise<Booking> {
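    const response = await axios.post(
      `${this.baseURL}/reservations`,
      {
        listingId: data.propertyId,
        guestName: data.guestName,
        // ... other fields
      },
      // Same bearer token as listPropertiesImpl
      { headers: { Authorization: `Bearer ${this.accessToken}` } }
    );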
    return this.transformBooking(response.data.result);
  }
}
4. Graceful Degradation Strategies
// Strategy 1: Return cached data
async getCachedProperties(): Promise<Property[]> {
  const cached = await redis.get(`properties:${this.orgId}`);
  return cached ? JSON.parse(cached) : [];
}
// Strategy 2: Queue operation for later
async queueBookingCreation(data: BookingData): Promise<void> {
  await webhookQueue.add('retry.booking.create', {
    data,
    channel: 'hostaway',
    retryAt: new Date(Date.now() + 60000).toISOString(), // Retry in 1 min
  });
}
// Strategy 3: Return partial data
async getAvailability(propertyId: string): Promise<Availability[]> {
  try {
    return await this.availabilityBreaker.fire(propertyId);
  } catch (error) {
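    // Caught values are typed `unknown` under strict TypeScript, so narrow before reading `code`
    if ((error as { code?: string }).code === 'EOPENBREAKER') {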
      // Return conservative "unavailable" to prevent overbooking
      logger.warn('Circuit open, returning unavailable status');
      return [{ available: false, reason: 'API unavailable' }];
    }
    throw error;
  }
}
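The catch blocks above all test for the 'EOPENBREAKER' code before choosing a fallback. One way to centralize that check is a small type guard plus a wrapper, sketched below. Both isBreakerOpenError and fireWithFallback are hypothetical helper names, not part of opossum or of the existing codebase.

```typescript
// Type guard: true only when the caught value carries the open-circuit code
// that the strategies above check for.
function isBreakerOpenError(error: unknown): error is { code: 'EOPENBREAKER' } {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === 'EOPENBREAKER'
  );
}

// Hypothetical wrapper: run the protected call and use the fallback only when
// the circuit is open. Any other error is rethrown unchanged.
async function fireWithFallback<T>(
  fire: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await fire();
  } catch (error) {
    if (isBreakerOpenError(error)) {
      return fallback();
    }
    throw error;
  }
}
```

Under these assumptions, listProperties could reduce to fireWithFallback(() => this.listPropertiesBreaker.fire(), () => this.getCachedProperties()). Operations such as createBooking that must still surface an error after queueing would keep their explicit catch block.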
Circuit Breaker States
┌─────────────┐
│   CLOSED    │  ← Normal operation
│  (healthy)  │
└──────┬──────┘
       │ 50% errors
       ↓
┌─────────────┐
│    OPEN     │  ← Failing fast (no API calls)
│  (failing)  │
└──────┬──────┘
       │ 30s cooldown
       ↓
┌─────────────┐
│  HALF-OPEN  │  ← Testing with 1 request
│  (testing)  │
└──────┬──────┘
       │
       ├─ Success → CLOSED
       └─ Failure → OPEN
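The transitions in the diagram can be sketched as a small state machine. This is an illustration only, not opossum's implementation: the class name and methods are hypothetical, the clock is injected so the sketch can be exercised without waiting, and it does not limit the half-open state to a single in-flight trial request.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class BreakerStateMachine {
  state: BreakerState = 'CLOSED';
  private openedAt = 0;
  private readonly errorThresholdPercentage: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(errorThresholdPercentage: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.errorThresholdPercentage = errorThresholdPercentage;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // CLOSED -> OPEN once the rolling error percentage reaches the threshold.
  reportErrorPercentage(pct: number): void {
    if (this.state === 'CLOSED' && pct >= this.errorThresholdPercentage) this.trip();
  }

  // OPEN -> HALF_OPEN once the cooldown has elapsed.
  // Returns whether a call may go through to the external API.
  allowRequest(): boolean {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state !== 'OPEN';
  }

  // HALF_OPEN -> CLOSED on a successful trial call, HALF_OPEN -> OPEN on a failed one.
  reportTrialResult(succeeded: boolean): void {
    if (this.state !== 'HALF_OPEN') return;
    if (succeeded) this.state = 'CLOSED';
    else this.trip();
  }

  private trip(): void {
    this.state = 'OPEN';
    this.openedAt = this.now();
  }
}

let fakeNowMs = 0;
const machine = new BreakerStateMachine(50, 30000, () => fakeNowMs);
machine.reportErrorPercentage(60); // CLOSED -> OPEN
fakeNowMs = 30000; // 30s cooldown elapsed
machine.allowRequest(); // OPEN -> HALF_OPEN, trial call allowed
machine.reportTrialResult(true); // HALF_OPEN -> CLOSED
```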
Observability
Metrics Collection
// src/monitoring/circuitBreakerMetrics.ts
import CircuitBreaker from 'opossum';
export function collectCircuitBreakerMetrics(breaker: CircuitBreaker) {
  setInterval(() => {
    const stats = breaker.stats;
    metrics.gauge('circuit_breaker.requests', stats.fires);
    metrics.gauge('circuit_breaker.failures', stats.failures);
    metrics.gauge('circuit_breaker.successes', stats.successes);
    metrics.gauge('circuit_breaker.timeouts', stats.timeouts);
    // cacheHits counts opossum's own result cache, not the Redis fallback used in getCachedProperties
    metrics.gauge('circuit_breaker.cache_hits', stats.cacheHits);
    // State (0 = closed, 1 = open, 2 = half-open)
    const state = breaker.opened ? 1 : breaker.halfOpen ? 2 : 0;
    metrics.gauge('circuit_breaker.state', state);
  }, 10000); // Every 10s
}
Dashboard (Grafana)
Circuit Breaker Health:
- State timeline (closed/open/half-open)
- Error rate (%)
- Timeout rate (%)
- Fallback usage (cache hits)
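The error and timeout rate panels can be derived from the raw counters collected above. The sketch below shows one way to compute them; the BreakerStatsSnapshot type and the dashboardRates helper are hypothetical, and in practice the same division may be simpler to express as a Grafana query.

```typescript
// Subset of the counters read from breaker.stats by the metrics collector.
interface BreakerStatsSnapshot {
  fires: number;
  failures: number;
  timeouts: number;
}

// Percentage helper that avoids dividing by zero when no calls were made.
function ratePercent(count: number, total: number): number {
  return total === 0 ? 0 : (count * 100) / total;
}

function dashboardRates(stats: BreakerStatsSnapshot): { errorRate: number; timeoutRate: number } {
  return {
    errorRate: ratePercent(stats.failures, stats.fires),
    timeoutRate: ratePercent(stats.timeouts, stats.fires),
  };
}

const rates = dashboardRates({ fires: 200, failures: 30, timeouts: 10 });
// rates.errorRate is 15 and rates.timeoutRate is 5
```

Whether timeouts are also counted under failures depends on the library version, so check that before adding the two rates together.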
Consequences
Positive
- ✅ Prevents Cascade Failures: Stop hitting broken API
- ✅ Faster Recovery: Auto-retry after cooldown
- ✅ Graceful Degradation: Return cached/default data
- ✅ Observability: Metrics on breaker states
Negative
- ❌ Delayed Error Detection: Failures are evaluated over a 10s rolling window, so the breaker may take up to that long to open
- ❌ False Positives: May open on transient errors
Mitigations
- Tune errorThresholdPercentage per API (50% default)
- Use cached data or queuing for critical operations
- Monitor breaker open events (alert if a breaker stays open >5 min)
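Per-API tuning could be expressed as a table of overrides merged over the defaults, as in the sketch below. The override values are placeholders rather than measured thresholds, and BreakerTuning, API_OVERRIDES, and tuningFor are hypothetical names. opossum's documentation also describes a volumeThreshold option (a minimum number of requests in the window before the breaker may open), which may help with false positives on low-traffic APIs; confirm it against the installed version before relying on it.

```typescript
interface BreakerTuning {
  timeout: number; // ms
  errorThresholdPercentage: number; // % failures before opening
  resetTimeout: number; // ms before a half-open attempt
}

// Mirrors the defaults defined in this ADR.
const DEFAULT_TUNING: BreakerTuning = {
  timeout: 10000,
  errorThresholdPercentage: 50,
  resetTimeout: 30000,
};

type ApiName = 'hostaway' | 'airbnb' | 'vrbo' | 'stripe';

// Hypothetical per-API overrides; the numbers are placeholders, not recommendations.
const API_OVERRIDES: Partial<Record<ApiName, Partial<BreakerTuning>>> = {
  stripe: { errorThresholdPercentage: 25 },
  hostaway: { timeout: 5000 },
};

// Overrides win over defaults; APIs without an entry get the defaults unchanged.
function tuningFor(api: ApiName): BreakerTuning {
  return { ...DEFAULT_TUNING, ...API_OVERRIDES[api] };
}
```

The result of tuningFor('stripe') could then be passed as the options argument of createCircuitBreaker.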
Validation Checklist
- Circuit breakers wrap all external API calls
- Fallback strategies defined (cache, queue, default)
- Metrics collected for all breakers
- Grafana dashboard for breaker states
- Alerts for breaker open >5 minutes
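The last checklist item needs a way to tell how long a breaker has been open. If the alert is evaluated in application code rather than in Grafana, a small tracker like the one below is one option. OpenDurationTracker is a hypothetical helper, and the clock is injected so it can be exercised without waiting five minutes.

```typescript
const OPEN_ALERT_THRESHOLD_MS = 5 * 60 * 1000; // 5 minutes, per the checklist

class OpenDurationTracker {
  private openedAt: number | null = null;
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Keep the first open timestamp, so a failed half-open trial that
  // re-opens the breaker counts as one continuous outage.
  onOpen(): void {
    if (this.openedAt === null) this.openedAt = this.now();
  }

  // A close event ends the outage.
  onClose(): void {
    this.openedAt = null;
  }

  // True once the breaker has been open for longer than the threshold.
  shouldAlert(): boolean {
    return this.openedAt !== null && this.now() - this.openedAt > OPEN_ALERT_THRESHOLD_MS;
  }
}
```

It would be wired to the same 'open' and 'close' events that createCircuitBreaker already listens to, with shouldAlert() polled from the metrics interval.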