ADR-0029: Async Webhook Processing with BullMQ
Status
Accepted - 2025-01-26
Context
Webhook processing involves database writes, API calls, and business logic that can take 5-30 seconds. Handling this synchronously risks exceeding partner webhook timeout limits (10-30s) and triggering retries.
Decision
Process webhooks asynchronously through a BullMQ job queue: the HTTP handler verifies and enqueues the event, then returns HTTP 200 immediately.
Rationale
- Fast Response: Return HTTP 200 in <100ms (prevents retries)
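- Reliability: Jobs are stored in Redis and survive application restarts (durability across Redis restarts depends on Redis persistence settings such as AOF)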
- Retries: Exponential backoff (already decided in ADR-0003)
- Observability: Job status tracking, dead letter queue
Alternatives Considered
Alternative 1: Synchronous Processing
Rejected - Slow response (5-30s), triggers partner retries, holds the HTTP connection open for the full processing time
Alternative 2: Pub/Sub (Redis)
Rejected - No persistence (lost on restart), no retry guarantees
Alternative 3: AWS SQS
Rejected - Vendor lock-in, higher latency (200-500ms), no local dev
Implementation
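// src/queues/webhookQueue.ts
import { Queue } from 'bullmq';
import Redis from 'ioredis';
const redisUrl = process.env.REDIS_URL;
if (!redisUrl) throw new Error('REDIS_URL is not set');
// maxRetriesPerRequest: null is required for connections that BullMQ workers use
export const connection = new Redis(redisUrl, { maxRetriesPerRequest: null });
export const webhookQueue = new Queue('webhooks', { connection });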
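// src/routes/webhooks/hostaway.ts
import { webhookQueue } from '../../queues/webhookQueue';
app.post('/webhooks/hostaway', async (req, reply) => {
  // 1. Verify signature (ADR-0027); assumed to throw on an invalid signature
  verifyWebhookSignature(req.body, req.headers['x-hostaway-signature']);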
  // 2. Check idempotency (ADR-0028)
  const hash = hashPayload(req.body);
  if (await isProcessed(hash)) {
    return reply.status(200).send({ received: true });
  }
  // 3. Enqueue job (return 200 immediately)
  await webhookQueue.add('hostaway.booking.created', {
    payload: req.body,
    hash,
    receivedAt: new Date().toISOString(),
  });
  return reply.status(200).send({ received: true });
});
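One gap in the handler above: markProcessed only runs after the worker finishes, so a duplicate delivery that arrives while the first job is still queued or running passes the isProcessed check and is enqueued a second time. A possible way to narrow that window, offered here as an assumption rather than part of the accepted design, is to pass the payload hash as the BullMQ jobId, since BullMQ ignores an add() call whose job ID already exists in the queue. buildJobOptions is a hypothetical helper, and the sketch assumes the hash is a hex digest.

```typescript
// Hypothetical helper: builds the options object passed as the third
// argument to webhookQueue.add(name, data, options).
interface WebhookJobOptions {
  jobId: string;
  attempts: number;
  backoff: { type: 'exponential'; delay: number };
  removeOnComplete: number;
  removeOnFail: number;
}

function buildJobOptions(hash: string): WebhookJobOptions {
  // Assumption: hash is a hex digest. BullMQ does not accept custom
  // job IDs that contain ':', so reject those early.
  if (hash.length === 0 || hash.includes(':')) {
    throw new Error('hash must be a non-empty string without ":"');
  }
  return {
    jobId: hash, // a second add() with the same ID is ignored while the job exists
    attempts: 5,
    backoff: { type: 'exponential', delay: 2000 },
    removeOnComplete: 100,
    removeOnFail: 500,
  };
}
```

The deduplication only holds while the earlier job is still kept in Redis. Once a completed job is evicted by removeOnComplete, the isProcessed check from ADR-0028 has to catch the duplicate instead.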
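// src/workers/webhookWorker.ts
import { Worker } from 'bullmq';
import { connection } from '../queues/webhookQueue'; // assumes the queue module exports its Redis connection
const worker = new Worker(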
  'webhooks',
  async (job) => {
    const { payload, hash } = job.data;
    try {
      await processHostawayWebhook(payload);
      await markProcessed(hash); // Mark after successful processing
    } catch (error) {
      logger.error({ error, jobId: job.id }, 'Webhook processing failed');
      throw error; // Trigger retry
    }
  },
  {
    connection,
    concurrency: 10, // Process 10 webhooks concurrently
  }
);
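To sanity-check concurrency: 10, a rough capacity estimate helps: one worker process can finish at most concurrency / average job duration jobs per second. The sketch below is only arithmetic, under the assumption that jobs are I/O-bound and take a roughly constant time. The function names are hypothetical, and the 5-second duration is the lower bound mentioned in the Context section.

```typescript
// Upper bound on jobs completed per second by a single worker process.
function maxThroughputPerSecond(concurrency: number, avgJobSeconds: number): number {
  if (concurrency <= 0 || avgJobSeconds <= 0) {
    throw new Error('concurrency and avgJobSeconds must be positive');
  }
  return concurrency / avgJobSeconds;
}

// Time needed to work through an existing backlog at that rate,
// assuming no new jobs arrive in the meantime.
function drainSeconds(backlog: number, concurrency: number, avgJobSeconds: number): number {
  return backlog / maxThroughputPerSecond(concurrency, avgJobSeconds);
}

console.log(maxThroughputPerSecond(10, 5)); // 2 jobs per second
console.log(drainSeconds(600, 10, 5)); // 300 seconds to clear 600 queued jobs
```

If the expected webhook arrival rate is above this bound, the queue grows without limit, so either concurrency or the number of worker processes needs to go up.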
Job Configuration
Retry Policy (from ADR-0003)
await webhookQueue.add(
  'hostaway.booking.created',
  { payload: req.body, hash, receivedAt: new Date().toISOString() }, // worker reads hash from job.data
  {
    attempts: 5, // 1 initial attempt + up to 4 retries
    backoff: {
      type: 'exponential',
      delay: 2000, // retry delays: 2s, 4s, 8s, 16s
    },
    removeOnComplete: 100, // Keep last 100 completed jobs
    removeOnFail: 500, // Keep last 500 failed jobs
  }
);
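The retry schedule implied by this configuration can be worked out explicitly. The sketch below assumes BullMQ's built-in exponential strategy waits delay * 2^(n-1) milliseconds after the n-th failed attempt, and that attempts counts the initial try, so 5 attempts leave room for 4 retries. exponentialRetryDelays is a hypothetical helper used only to show the arithmetic.

```typescript
// Returns the wait (in ms) before each retry for a job configured with
// the given number of attempts and base delay.
function exponentialRetryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // After the n-th failure there is a retry only if n < attempts.
  for (let failed = 1; failed < attempts; failed++) {
    delays.push(baseDelayMs * Math.pow(2, failed - 1));
  }
  return delays;
}

const retryDelays = exponentialRetryDelays(5, 2000);
console.log(retryDelays); // [2000, 4000, 8000, 16000]
console.log(retryDelays.reduce((sum, d) => sum + d, 0)); // 30000
```

Under these assumptions a job that keeps failing spends about 30 seconds waiting between attempts, plus its own processing time, before it reaches the dead letter queue.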
Concurrency Tuning
// High-priority webhooks (bookings, payments)
const bookingWorker = new Worker('webhooks.bookings', handler, {
  concurrency: 20,
});
// Low-priority webhooks (property updates)
const propertyWorker = new Worker('webhooks.properties', handler, {
  concurrency: 5,
});
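The handler shown under Implementation enqueues everything on the single webhooks queue, so splitting work by priority also needs a routing step at enqueue time. A minimal sketch, assuming job names follow the hostaway.<resource>.<action> pattern used elsewhere in this ADR; the resource names other than booking and the queueNameFor helper are illustrative, not taken from the Hostaway API.

```typescript
type WebhookQueueName = 'webhooks.bookings' | 'webhooks.properties';

// Resources whose events go to the high-concurrency queue (assumed names).
const HIGH_PRIORITY_RESOURCES = new Set<string>(['booking', 'payment']);

// Picks the target queue from a job name such as 'hostaway.booking.created'.
// Anything unrecognised or malformed falls back to the low-priority queue.
function queueNameFor(jobName: string): WebhookQueueName {
  const [, resource = ''] = jobName.split('.');
  return HIGH_PRIORITY_RESOURCES.has(resource)
    ? 'webhooks.bookings'
    : 'webhooks.properties';
}

console.log(queueNameFor('hostaway.booking.created')); // webhooks.bookings
console.log(queueNameFor('hostaway.property.updated')); // webhooks.properties
```

Separate queues keep a burst of property updates from delaying bookings, at the cost of one more Queue instance per priority class in the HTTP process.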
Error Handling
Dead Letter Queue (DLQ)
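// Separate queue for jobs that exhausted their retries (queue name is illustrative)
const dlqQueue = new Queue('webhooks.dlq', { connection });
worker.on('failed', async (job, error) => {
  // job may be undefined; treat a missing attempts option as a single attempt
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {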
    // Move to DLQ after all retries exhausted
    await dlqQueue.add('webhook.failed', {
      originalJob: job.data,
      error: error.message,
      failedAt: new Date().toISOString(),
    });
    // Alert on-call engineer
    await alertSlack({
      channel: '#alerts',
      text: `Webhook processing failed after ${job.attemptsMade} attempts`,
    });
  }
});
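The exhaustion check and the DLQ payload can be pulled out into pure functions so they are unit-testable without Redis. This is a sketch only: hasExhaustedRetries and toDlqRecord are hypothetical helpers, the attemptsMade field on the record is an addition for debugging, and treating a missing attempts option as a single attempt is an assumption.

```typescript
interface FailedJobInfo {
  attemptsMade: number;
  configuredAttempts?: number; // job.opts.attempts, which may be unset
}

// True when the job has used up every configured attempt.
function hasExhaustedRetries(info: FailedJobInfo): boolean {
  const maxAttempts = info.configuredAttempts ?? 1;
  return info.attemptsMade >= maxAttempts;
}

interface DlqRecord<T> {
  originalJob: T;
  error: string;
  attemptsMade: number;
  failedAt: string; // ISO 8601 timestamp
}

// Builds the payload stored on the dead letter queue.
function toDlqRecord<T>(
  data: T,
  error: Error,
  attemptsMade: number,
  now: Date = new Date(),
): DlqRecord<T> {
  return {
    originalJob: data,
    error: error.message,
    attemptsMade,
    failedAt: now.toISOString(),
  };
}
```

Inside the failed listener, the handler would call hasExhaustedRetries with job.attemptsMade and job.opts.attempts, and pass the result of toDlqRecord to dlqQueue.add.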
Consequences
Positive
- ✅ Fast Response: <100ms HTTP 200 (prevents partner retries)
- ✅ Reliability: Persistent queue survives restarts
- ✅ Scalability: Horizontal scaling with multiple workers
- ✅ Observability: Bull Board UI for job monitoring
Negative
- ❌ Redis Dependency: Queue requires Redis uptime
- ❌ Eventual Consistency: Webhook ack != processing complete
Mitigations
- Use Upstash Redis with 99.99% SLA
- Expose /webhooks/status/:jobId for partners to check status
- Monitor queue depth and processing time
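Queue-depth monitoring could start as a periodic check over the counts BullMQ reports through queue.getJobCounts('waiting', 'active', 'delayed', 'failed'). The sketch below keeps the evaluation as a pure function over those counts. The threshold values and the evaluateQueueHealth name are assumptions that would need tuning against real traffic.

```typescript
interface QueueCounts {
  waiting: number;
  active: number;
  delayed: number;
  failed: number;
}

interface HealthThresholds {
  maxWaiting: number; // backlog size that suggests workers are falling behind
  maxFailed: number; // retained failed jobs that warrant a look
}

interface QueueHealth {
  status: 'ok' | 'degraded';
  reasons: string[];
}

// Compares current job counts with thresholds and lists every breach.
function evaluateQueueHealth(counts: QueueCounts, limits: HealthThresholds): QueueHealth {
  const reasons: string[] = [];
  if (counts.waiting > limits.maxWaiting) {
    reasons.push(`backlog of ${counts.waiting} waiting jobs exceeds ${limits.maxWaiting}`);
  }
  if (counts.failed > limits.maxFailed) {
    reasons.push(`${counts.failed} failed jobs exceeds ${limits.maxFailed}`);
  }
  return { status: reasons.length === 0 ? 'ok' : 'degraded', reasons };
}
```

A scheduled task could feed the result into the same alerting channel used for DLQ entries, so a growing backlog is noticed before partners see delayed processing.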
Validation Checklist
- BullMQ queue configured with exponential backoff
- HTTP 200 returned in <100ms
- Dead letter queue for failed jobs
- Worker concurrency tuned (5-20, depending on queue priority)
- Bull Board UI enabled