ADR-0035: Structured Logging for Integration Events
Status
Accepted - 2025-01-26
Context
Integration events (webhooks, API calls, sync jobs) require detailed logging for debugging failures, audit trails, and compliance.
Decision
Structured JSON logging with correlation IDs for tracing requests across systems.
Rationale
- Machine-Readable: Parse logs with tools (Loki, Datadog)
- Correlation: Trace request from webhook → queue → DB
- Compliance: Audit trail for GDPR, SOC 2
- Debugging: Search by requestId, orgId, channel
Alternatives Considered
Alternative 1: Plain Text Logging
Rejected - Hard to parse, no structured search
Alternative 2: Custom Binary Format
Rejected - Not standard, requires custom parsers
Alternative 3: OpenTelemetry Logs
Deferred - OTLP logs not widely adopted yet, use for traces/metrics only
Implementation
1. Log Format (JSON)
{
  "timestamp": "2025-01-26T14:30:00.123Z",
  "level": "info",
  "message": "Webhook received",
  "service": "webhook-processor",
  "environment": "production",
  "correlationId": "550e8400-e29b-41d4-a716-446655440000",
  "orgId": "org-123",
  "accountId": "acc-456",
  "channel": "hostaway",
  "eventType": "booking.created",
  "duration": 125,
  "statusCode": 200,
  "error": null
}
2. Winston Logger Configuration
// src/logging/logger.ts
import winston from 'winston';
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'tvl-platform',
    environment: process.env.NODE_ENV,
  },
  transports: [
    // Console (local dev)
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple()
      ),
    }),
    // File (production)
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' }),
  ],
});
export default logger;
3. Correlation ID Middleware
// src/middleware/correlationId.ts
import { FastifyRequest, FastifyReply } from 'fastify';
import { v4 as uuidv4 } from 'uuid';
export async function correlationIdMiddleware(
  req: FastifyRequest,
  reply: FastifyReply
) {
  // Use existing correlation ID or generate new
  const correlationId = (req.headers['x-correlation-id'] as string) || uuidv4();
  // Attach to request
  req.correlationId = correlationId;
  // Return in response headers
  reply.header('X-Correlation-ID', correlationId);
}
// Register middleware
app.addHook('onRequest', correlationIdMiddleware);
4. Webhook Logging
// src/routes/webhooks/hostaway.ts
app.post('/webhooks/hostaway', async (req, reply) => {
  const startTime = Date.now();
  logger.info({
    message: 'Webhook received',
    correlationId: req.correlationId,
    channel: 'hostaway',
    eventType: req.body.event_type,
    webhookId: req.body.id,
    signature: req.headers['x-hostaway-signature'],
  });
  try {
    // Verify signature
    if (!verifyWebhookSignature(req.body, req.headers['x-hostaway-signature'])) {
      logger.warn({
        message: 'Invalid webhook signature',
        correlationId: req.correlationId,
        channel: 'hostaway',
        eventType: req.body.event_type,
      });
      return reply.status(401).send({ error: 'Invalid signature' });
    }
    // Process webhook
    await processHostawayWebhook(req.body, req.correlationId);
    logger.info({
      message: 'Webhook processed successfully',
      correlationId: req.correlationId,
      channel: 'hostaway',
      eventType: req.body.event_type,
      duration: Date.now() - startTime,
    });
    return reply.status(200).send({ received: true });
  } catch (error) {
    logger.error({
      message: 'Webhook processing failed',
      correlationId: req.correlationId,
      channel: 'hostaway',
      eventType: req.body.event_type,
      error: error.message,
      stack: error.stack,
      duration: Date.now() - startTime,
    });
    return reply.status(500).send({ error: 'Processing failed' });
  }
});
5. Background Job Logging
// src/workers/webhookWorker.ts
const worker = new Worker('webhooks', async (job) => {
  const { payload, correlationId } = job.data;
  logger.info({
    message: 'Processing webhook job',
    correlationId,
    jobId: job.id,
    eventType: payload.event_type,
    attempt: job.attemptsMade + 1,
  });
  try {
    await processWebhook(payload, correlationId);
    logger.info({
      message: 'Webhook job completed',
      correlationId,
      jobId: job.id,
      duration: job.finishedOn - job.processedOn,
    });
  } catch (error) {
    logger.error({
      message: 'Webhook job failed',
      correlationId,
      jobId: job.id,
      error: error.message,
      stack: error.stack,
      attempt: job.attemptsMade + 1,
      maxAttempts: job.opts.attempts,
    });
    throw error; // Re-throw to trigger retry
  }
});
Correlation Flow
Request Trace
1. Webhook arrives
   └─ correlationId: abc-123
2. Queued to BullMQ
   └─ correlationId: abc-123 (passed in job data)
3. Worker processes
   └─ correlationId: abc-123 (logged with job)
4. Database write
   └─ correlationId: abc-123 (stored in metadata)
5. Event published
   └─ correlationId: abc-123 (in event metadata)
All logs searchable by: correlationId:abc-123
Log Levels
// ERROR - System errors requiring attention
logger.error({
  message: 'Database connection failed',
  error: error.message,
  stack: error.stack,
});
// WARN - Unexpected but handled issues
logger.warn({
  message: 'Circuit breaker opened',
  channel: 'hostaway',
  breakerState: 'open',
});
// INFO - Important events (webhooks, API calls)
logger.info({
  message: 'Booking created',
  bookingId: booking.id,
  correlationId: req.correlationId,
});
// DEBUG - Detailed debugging (disabled in production)
logger.debug({
  message: 'Field mapping applied',
  externalField: 'checkInDate',
  internalField: 'checkIn',
  value: '2025-02-01',
});
Sensitive Data Redaction
// src/logging/redact.ts
import redact from 'fast-redact';
const redactSensitiveData = redact({
  paths: [
    'password',
    'apiKey',
    'apiSecret',
    'accessToken',
    'refreshToken',
    'creditCard',
    'ssn',
  ],
  censor: '[REDACTED]',
});
// Usage
logger.info({
  message: 'User authenticated',
  user: redactSensitiveData({
    email: 'user@example.com',
    password: 'secret123', // Will be [REDACTED]
  }),
});
Query Examples (Loki)
# All errors for a specific org
{service="tvl-platform"} |= "orgId:org-123" | json | level="error"
# Trace specific webhook
{service="tvl-platform"} |= "correlationId:abc-123"
# Hostaway circuit breaker events
{service="tvl-platform"} |= "channel:hostaway" | json | message="Circuit breaker opened"
# Slow requests (>5s)
{service="tvl-platform"} | json | duration > 5000
# Failed webhook processing
{service="tvl-platform"} |= "Webhook processing failed" | json
Consequences
Positive
- ✅ Traceability: Follow request across services
- ✅ Machine-Readable: Parse with Loki/Datadog
- ✅ Compliance: Audit trail for SOC 2
- ✅ Debugging: Rich context for errors
Negative
- ❌ Log Volume: JSON logs are verbose (2-3x plain text)
- ❌ Performance: Structured logging adds 1-2ms overhead
Mitigations
- Use log sampling in production (10% of INFO logs)
- Aggregate to Loki (compresses JSON)
- Disable DEBUG logs in production
Validation Checklist
- Winston configured with JSON format
- Correlation ID middleware registered
- All integration events logged (webhooks, API calls, jobs)
- Sensitive data redaction enabled
- Log aggregation configured (Loki - ADR-0046)