ADR-0035: Structured Logging for Integration Events

Status

Accepted - 2025-01-26

Context

Integration events (webhooks, API calls, sync jobs) require detailed logging for debugging failures, audit trails, and compliance.

Decision

Structured JSON logging with correlation IDs for tracing requests across systems.

Rationale

Machine-Readable: Parse logs with tools (Loki, Datadog)
Correlation: Trace request from webhook → queue → DB
Compliance: Audit trail for GDPR, SOC 2
Debugging: Search by requestId, orgId, channel

Alternatives Considered

Alternative 1: Plain Text Logging

Rejected - Hard to parse, no structured search

Alternative 2: Custom Binary Format

Rejected - Not standard, requires custom parsers

Alternative 3: OpenTelemetry Logs

Deferred - OTLP logs not widely adopted yet, use for traces/metrics only

Implementation

1. Log Format (JSON)

{
  "timestamp": "2025-01-26T14:30:00.123Z",
  "level": "info",
  "message": "Webhook received",
  "service": "webhook-processor",
  "environment": "production",
  "correlationId": "550e8400-e29b-41d4-a716-446655440000",
  "orgId": "org-123",
  "accountId": "acc-456",
  "channel": "hostaway",
  "eventType": "booking.created",
  "duration": 125,
  "statusCode": 200,
  "error": null
}

2. Winston Logger Configuration

// src/logging/logger.ts
import winston from 'winston';

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'tvl-platform',
    environment: process.env.NODE_ENV,
  },
  transports: [
    // Console (local dev)
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple()
      ),
    }),

    // File (production)
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' }),
  ],
});

export default logger;

3. Correlation ID Middleware

// src/middleware/correlationId.ts
import { FastifyRequest, FastifyReply } from 'fastify';
import { v4 as uuidv4 } from 'uuid';

export async function correlationIdMiddleware(
  req: FastifyRequest,
  reply: FastifyReply
) {
  // Use existing correlation ID or generate new
  const correlationId = (req.headers['x-correlation-id'] as string) || uuidv4();

  // Attach to request
  req.correlationId = correlationId;

  // Return in response headers
  reply.header('X-Correlation-ID', correlationId);
}

// Register middleware
app.addHook('onRequest', correlationIdMiddleware);

4. Webhook Logging

// src/routes/webhooks/hostaway.ts
app.post('/webhooks/hostaway', async (req, reply) => {
  const startTime = Date.now();

  logger.info({
    message: 'Webhook received',
    correlationId: req.correlationId,
    channel: 'hostaway',
    eventType: req.body.event_type,
    webhookId: req.body.id,
    signature: req.headers['x-hostaway-signature'],
  });

  try {
    // Verify signature
    if (!verifyWebhookSignature(req.body, req.headers['x-hostaway-signature'])) {
      logger.warn({
        message: 'Invalid webhook signature',
        correlationId: req.correlationId,
        channel: 'hostaway',
        eventType: req.body.event_type,
      });
      return reply.status(401).send({ error: 'Invalid signature' });
    }

    // Process webhook
    await processHostawayWebhook(req.body, req.correlationId);

    logger.info({
      message: 'Webhook processed successfully',
      correlationId: req.correlationId,
      channel: 'hostaway',
      eventType: req.body.event_type,
      duration: Date.now() - startTime,
    });

    return reply.status(200).send({ received: true });
  } catch (error) {
    logger.error({
      message: 'Webhook processing failed',
      correlationId: req.correlationId,
      channel: 'hostaway',
      eventType: req.body.event_type,
      error: error.message,
      stack: error.stack,
      duration: Date.now() - startTime,
    });

    return reply.status(500).send({ error: 'Processing failed' });
  }
});

5. Background Job Logging

// src/workers/webhookWorker.ts
const worker = new Worker('webhooks', async (job) => {
  const { payload, correlationId } = job.data;

  logger.info({
    message: 'Processing webhook job',
    correlationId,
    jobId: job.id,
    eventType: payload.event_type,
    attempt: job.attemptsMade + 1,
  });

  try {
    await processWebhook(payload, correlationId);

    logger.info({
      message: 'Webhook job completed',
      correlationId,
      jobId: job.id,
      duration: job.finishedOn - job.processedOn,
    });
  } catch (error) {
    logger.error({
      message: 'Webhook job failed',
      correlationId,
      jobId: job.id,
      error: error.message,
      stack: error.stack,
      attempt: job.attemptsMade + 1,
      maxAttempts: job.opts.attempts,
    });
    throw error; // Re-throw to trigger retry
  }
});

Correlation Flow

Request Trace

1. Webhook arrives
   └─ correlationId: abc-123

2. Queued to BullMQ
   └─ correlationId: abc-123 (passed in job data)

3. Worker processes
   └─ correlationId: abc-123 (logged with job)

4. Database write
   └─ correlationId: abc-123 (stored in metadata)

5. Event published
   └─ correlationId: abc-123 (in event metadata)

All logs searchable by: correlationId:abc-123

Log Levels

// ERROR - System errors requiring attention
logger.error({
  message: 'Database connection failed',
  error: error.message,
  stack: error.stack,
});

// WARN - Unexpected but handled issues
logger.warn({
  message: 'Circuit breaker opened',
  channel: 'hostaway',
  breakerState: 'open',
});

// INFO - Important events (webhooks, API calls)
logger.info({
  message: 'Booking created',
  bookingId: booking.id,
  correlationId: req.correlationId,
});

// DEBUG - Detailed debugging (disabled in production)
logger.debug({
  message: 'Field mapping applied',
  externalField: 'checkInDate',
  internalField: 'checkIn',
  value: '2025-02-01',
});

Sensitive Data Redaction

// src/logging/redact.ts
import redact from 'fast-redact';

const redactSensitiveData = redact({
  paths: [
    'password',
    'apiKey',
    'apiSecret',
    'accessToken',
    'refreshToken',
    'creditCard',
    'ssn',
  ],
  censor: '[REDACTED]',
});

// Usage
logger.info({
  message: 'User authenticated',
  user: redactSensitiveData({
    email: 'user@example.com',
    password: 'secret123', // Will be [REDACTED]
  }),
});

Query Examples (Loki)

# All errors for a specific org
{service="tvl-platform"} |= "orgId:org-123" | json | level="error"

# Trace specific webhook
{service="tvl-platform"} |= "correlationId:abc-123"

# Hostaway circuit breaker events
{service="tvl-platform"} |= "channel:hostaway" | json | message="Circuit breaker opened"

# Slow requests (>5s)
{service="tvl-platform"} | json | duration > 5000

# Failed webhook processing
{service="tvl-platform"} |= "Webhook processing failed" | json

Consequences

Positive

✅ Traceability: Follow request across services
✅ Machine-Readable: Parse with Loki/Datadog
✅ Compliance: Audit trail for SOC 2
✅ Debugging: Rich context for errors

Negative

❌ Log Volume: JSON logs are verbose (2-3x plain text)
❌ Performance: Structured logging adds 1-2ms overhead

Mitigations

Use log sampling in production (10% of INFO logs)
Aggregate to Loki (compresses JSON)
Disable DEBUG logs in production

Validation Checklist

Winston configured with JSON format
Correlation ID middleware registered
All integration events logged (webhooks, API calls, jobs)
Sensitive data redaction enabled
Log aggregation configured (Loki - ADR-0046)

Status​

Context​

Decision​

Rationale​

Alternatives Considered​

Alternative 1: Plain Text Logging​

Alternative 2: Custom Binary Format​

Alternative 3: OpenTelemetry Logs​

Implementation​

1. Log Format (JSON)​

2. Winston Logger Configuration​

3. Correlation ID Middleware​

4. Webhook Logging​

5. Background Job Logging​

Correlation Flow​

Request Trace​

Log Levels​

Sensitive Data Redaction​

Query Examples (Loki)​

Consequences​

Positive​

Negative​

Mitigations​

Validation Checklist​

References​