Skip to main content

ADR-0030: Transactional Outbox Pattern for Event Reliability

Status

Accepted - 2025-01-26


Context

TVL Platform publishes domain events (booking created, property updated) that must be reliably delivered even if event bus fails during transaction.


Decision

Transactional Outbox Pattern - Write events to outbox table within same database transaction, then publish asynchronously.

Rationale

  1. Atomicity: Event write + business logic in single transaction
  2. No Lost Events: Event persisted even if message bus fails
  3. Idempotency: Deduplication via event ID
  4. Ordering: Guaranteed order per aggregate (property, booking)

Alternatives Considered

Alternative 1: Direct Event Publishing

Rejected - Lost events if publish fails after DB commit (dual-write problem)

Alternative 2: Event Sourcing

Rejected - Overkill for MVP, complex migration from CRUD model

Alternative 3: Change Data Capture (CDC)

Rejected - Requires Debezium/Kafka, too heavy for MVP


Implementation

1. Outbox Table Schema

CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(255) NOT NULL, -- 'booking', 'property'
aggregate_id UUID NOT NULL,
event_type VARCHAR(255) NOT NULL, -- 'booking.created', 'property.updated'
payload JSONB NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ,
INDEX idx_outbox_unpublished (created_at) WHERE published_at IS NULL
);

2. Insert Event in Transaction

// src/services/bookings/createBooking.ts
import { db } from '@/db';

export async function createBooking(input: CreateBookingInput) {
return await db.transaction(async (trx) => {
// 1. Insert booking
const [booking] = await trx.insert(bookings).values({
id: crypto.randomUUID(),
guestName: input.guestName,
checkIn: input.checkIn,
checkOut: input.checkOut,
}).returning();

// 2. Insert outbox event (same transaction!)
await trx.insert(outbox).values({
aggregateType: 'booking',
aggregateId: booking.id,
eventType: 'booking.created',
payload: booking,
metadata: { userId: input.userId, traceId: input.traceId },
});

return booking;
});
}

3. Outbox Publisher (Background Job)

// src/jobs/outboxPublisher.ts
import { Queue, Worker } from 'bullmq';

// Poll every 1 second
const publisherQueue = new Queue('outbox-publisher', { connection });

await publisherQueue.add(
'poll-outbox',
{},
{ repeat: { every: 1000 } } // 1s interval
);

const worker = new Worker('outbox-publisher', async () => {
const events = await db
.select()
.from(outbox)
.where(isNull(outbox.publishedAt))
.orderBy(asc(outbox.createdAt))
.limit(100);

for (const event of events) {
try {
// Publish to event bus (Redis Streams - ADR-0031)
await eventBus.publish(event.eventType, event.payload);

// Mark as published
await db
.update(outbox)
.set({ publishedAt: new Date() })
.where(eq(outbox.id, event.id));
} catch (error) {
logger.error({ error, eventId: event.id }, 'Failed to publish event');
// Retry on next poll
}
}
}, { connection });

Event Ordering Guarantees

Per-Aggregate Ordering

// Publish events in created_at order per aggregate
const events = await db
.select()
.from(outbox)
.where(
and(
isNull(outbox.publishedAt),
eq(outbox.aggregateType, 'booking'),
eq(outbox.aggregateId, bookingId)
)
)
.orderBy(asc(outbox.createdAt));

Global Ordering (Not Guaranteed)

Events from different aggregates may be published out of order (acceptable for DDD).


Cleanup Strategy

-- Delete published events older than 7 days
DELETE FROM outbox
WHERE published_at IS NOT NULL
AND published_at < now() - INTERVAL '7 days';

Run via cron job (daily).


Consequences

Positive

  • No Lost Events: Persisted with business data
  • Atomicity: Single transaction = consistent state
  • Retries: Unpublished events retried automatically
  • Debugging: Audit trail of all events

Negative

  • Latency: 1s polling delay (eventual consistency)
  • Duplicate Events: Subscribers must be idempotent
  • Storage Growth: Requires cleanup job

Mitigations

  • Acceptable 1s delay for most events (not real-time)
  • Event consumers use idempotency keys (ADR-0028)
  • Daily cleanup job for published events >7 days

Validation Checklist

  • Outbox table with partial index on published_at IS NULL
  • Events inserted in same transaction as business logic
  • Outbox publisher runs every 1s
  • Cleanup job deletes published events >7 days
  • Event consumers handle duplicates

References