Skip to main content

ADR-0029: Async Webhook Processing with BullMQ

Status

Accepted - 2025-01-26


Context

Webhook processing involves database writes, API calls, and business logic that may take >5 seconds, exceeding webhook timeout limits (10-30s).


Decision

BullMQ job queue for async webhook processing with immediate HTTP 200 response.

Rationale

  1. Fast Response: Return HTTP 200 in <100ms (prevents retries)
  2. Reliability: Persistent queue (Redis Streams)
  3. Retries: Exponential backoff (already decided in ADR-0003)
  4. Observability: Job status tracking, dead letter queue

Alternatives Considered

Alternative 1: Synchronous Processing

Rejected - Slow response (5-30s), triggers partner retries, blocks HTTP thread

Alternative 2: Pub/Sub (Redis)

Rejected - No persistence (lost on restart), no retry guarantees

Alternative 3: AWS SQS

Rejected - Vendor lock-in, higher latency (200-500ms), no local dev


Implementation

// src/queues/webhookQueue.ts
import { Queue, Worker } from 'bullmq';
import Redis from 'ioredis';

const connection = new Redis(process.env.REDIS_URL, { maxRetriesPerRequest: null });

export const webhookQueue = new Queue('webhooks', { connection });

// src/routes/webhooks/hostaway.ts
app.post('/webhooks/hostaway', async (req, reply) => {
// 1. Verify signature (ADR-0027)
verifyWebhookSignature(req.body, req.headers['x-hostaway-signature']);

// 2. Check idempotency (ADR-0028)
const hash = hashPayload(req.body);
if (await isProcessed(hash)) {
return reply.status(200).send({ received: true });
}

// 3. Enqueue job (return 200 immediately)
await webhookQueue.add('hostaway.booking.created', {
payload: req.body,
hash,
receivedAt: new Date().toISOString(),
});

return reply.status(200).send({ received: true });
});

// src/workers/webhookWorker.ts
const worker = new Worker(
'webhooks',
async (job) => {
const { payload, hash } = job.data;

try {
await processHostawayWebhook(payload);
await markProcessed(hash); // Mark after successful processing
} catch (error) {
logger.error({ error, jobId: job.id }, 'Webhook processing failed');
throw error; // Trigger retry
}
},
{
connection,
concurrency: 10, // Process 10 webhooks concurrently
}
);

Job Configuration

Retry Policy (from ADR-0003)

await webhookQueue.add(
'hostaway.booking.created',
{ payload: req.body },
{
attempts: 5,
backoff: {
type: 'exponential',
delay: 2000, // 2s, 4s, 8s, 16s, 32s
},
removeOnComplete: 100, // Keep last 100 completed jobs
removeOnFail: 500, // Keep last 500 failed jobs
}
);

Concurrency Tuning

// High-priority webhooks (bookings, payments)
const bookingWorker = new Worker('webhooks.bookings', handler, {
concurrency: 20,
});

// Low-priority webhooks (property updates)
const propertyWorker = new Worker('webhooks.properties', handler, {
concurrency: 5,
});

Error Handling

Dead Letter Queue (DLQ)

worker.on('failed', async (job, error) => {
if (job.attemptsMade >= job.opts.attempts) {
// Move to DLQ after all retries exhausted
await dlqQueue.add('webhook.failed', {
originalJob: job.data,
error: error.message,
failedAt: new Date().toISOString(),
});

// Alert on-call engineer
await alertSlack({
channel: '#alerts',
text: `Webhook processing failed after ${job.attemptsMade} attempts`,
});
}
});

Consequences

Positive

  • Fast Response: <100ms HTTP 200 (prevents partner retries)
  • Reliability: Persistent queue survives restarts
  • Scalability: Horizontal scaling with multiple workers
  • Observability: BullMQ Board UI for job monitoring

Negative

  • Redis Dependency: Queue requires Redis uptime
  • Eventual Consistency: Webhook ack != processing complete

Mitigations

  • Use Upstash Redis with 99.99% SLA
  • Expose /webhooks/status/:jobId for partners to check status
  • Monitor queue depth and processing time

Validation Checklist

  • BullMQ queue configured with exponential backoff
  • HTTP 200 returned in <100ms
  • Dead letter queue for failed jobs
  • Worker concurrency tuned (10-20)
  • BullMQ Board UI enabled

References