ADR-0030: Transactional Outbox Pattern for Event Reliability
Status
Accepted - 2025-01-26
Context
TVL Platform publishes domain events (booking created, property updated) that must be reliably delivered even if event bus fails during transaction.
Decision
Transactional Outbox Pattern - Write events to outbox table within same database transaction, then publish asynchronously.
Rationale
- Atomicity: Event write + business logic in single transaction
- No Lost Events: Event persisted even if message bus fails
- Idempotency: Deduplication via event ID
- Ordering: Guaranteed order per aggregate (property, booking)
Alternatives Considered
Alternative 1: Direct Event Publishing
Rejected - Lost events if publish fails after DB commit (dual-write problem)
Alternative 2: Event Sourcing
Rejected - Overkill for MVP, complex migration from CRUD model
Alternative 3: Change Data Capture (CDC)
Rejected - Requires Debezium/Kafka, too heavy for MVP
Implementation
1. Outbox Table Schema
CREATE TABLE outbox (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  aggregate_type VARCHAR(255) NOT NULL, -- 'booking', 'property'
  aggregate_id UUID NOT NULL,
  event_type VARCHAR(255) NOT NULL, -- 'booking.created', 'property.updated'
  payload JSONB NOT NULL,
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  published_at TIMESTAMPTZ,
  INDEX idx_outbox_unpublished (created_at) WHERE published_at IS NULL
);
2. Insert Event in Transaction
// src/services/bookings/createBooking.ts
import { db } from '@/db';
export async function createBooking(input: CreateBookingInput) {
  return await db.transaction(async (trx) => {
    // 1. Insert booking
    const [booking] = await trx.insert(bookings).values({
      id: crypto.randomUUID(),
      guestName: input.guestName,
      checkIn: input.checkIn,
      checkOut: input.checkOut,
    }).returning();
    // 2. Insert outbox event (same transaction!)
    await trx.insert(outbox).values({
      aggregateType: 'booking',
      aggregateId: booking.id,
      eventType: 'booking.created',
      payload: booking,
      metadata: { userId: input.userId, traceId: input.traceId },
    });
    return booking;
  });
}
3. Outbox Publisher (Background Job)
// src/jobs/outboxPublisher.ts
import { Queue, Worker } from 'bullmq';
// Poll every 1 second
const publisherQueue = new Queue('outbox-publisher', { connection });
await publisherQueue.add(
  'poll-outbox',
  {},
  { repeat: { every: 1000 } } // 1s interval
);
const worker = new Worker('outbox-publisher', async () => {
  const events = await db
    .select()
    .from(outbox)
    .where(isNull(outbox.publishedAt))
    .orderBy(asc(outbox.createdAt))
    .limit(100);
  for (const event of events) {
    try {
      // Publish to event bus (Redis Streams - ADR-0031)
      await eventBus.publish(event.eventType, event.payload);
      // Mark as published
      await db
        .update(outbox)
        .set({ publishedAt: new Date() })
        .where(eq(outbox.id, event.id));
    } catch (error) {
      logger.error({ error, eventId: event.id }, 'Failed to publish event');
      // Retry on next poll
    }
  }
}, { connection });
Event Ordering Guarantees
Per-Aggregate Ordering
// Publish events in created_at order per aggregate
const events = await db
  .select()
  .from(outbox)
  .where(
    and(
      isNull(outbox.publishedAt),
      eq(outbox.aggregateType, 'booking'),
      eq(outbox.aggregateId, bookingId)
    )
  )
  .orderBy(asc(outbox.createdAt));
Global Ordering (Not Guaranteed)
Events from different aggregates may be published out of order (acceptable for DDD).
Cleanup Strategy
-- Delete published events older than 7 days
DELETE FROM outbox
WHERE published_at IS NOT NULL
  AND published_at < now() - INTERVAL '7 days';
Run via cron job (daily).
Consequences
Positive
- ✅ No Lost Events: Persisted with business data
- ✅ Atomicity: Single transaction = consistent state
- ✅ Retries: Unpublished events retried automatically
- ✅ Debugging: Audit trail of all events
Negative
- ❌ Latency: 1s polling delay (eventual consistency)
- ❌ Duplicate Events: Subscribers must be idempotent
- ❌ Storage Growth: Requires cleanup job
Mitigations
- Acceptable 1s delay for most events (not real-time)
- Event consumers use idempotency keys (ADR-0028)
- Daily cleanup job for published events >7 days
Validation Checklist
-  Outbox table with partial index on published_at IS NULL
- Events inserted in same transaction as business logic
- Outbox publisher runs every 1s
- Cleanup job deletes published events >7 days
- Event consumers handle duplicates