Skip to main content

ADR-0035: Structured Logging for Integration Events

Status

Accepted - 2025-01-26


Context

Integration events (webhooks, API calls, sync jobs) require detailed logging for debugging failures, audit trails, and compliance.


Decision

Structured JSON logging with correlation IDs for tracing requests across systems.

Rationale

  1. Machine-Readable: Parse logs with tools (Loki, Datadog)
  2. Correlation: Trace request from webhook → queue → DB
  3. Compliance: Audit trail for GDPR, SOC 2
  4. Debugging: Search by requestId, orgId, channel

Alternatives Considered

Alternative 1: Plain Text Logging

Rejected - Hard to parse, no structured search

Alternative 2: Custom Binary Format

Rejected - Not standard, requires custom parsers

Alternative 3: OpenTelemetry Logs

Deferred - OTLP logs not widely adopted yet, use for traces/metrics only


Implementation

1. Log Format (JSON)

{
"timestamp": "2025-01-26T14:30:00.123Z",
"level": "info",
"message": "Webhook received",
"service": "webhook-processor",
"environment": "production",
"correlationId": "550e8400-e29b-41d4-a716-446655440000",
"orgId": "org-123",
"accountId": "acc-456",
"channel": "hostaway",
"eventType": "booking.created",
"duration": 125,
"statusCode": 200,
"error": null
}

2. Winston Logger Configuration

// src/logging/logger.ts
import winston from 'winston';

const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: 'tvl-platform',
environment: process.env.NODE_ENV,
},
transports: [
// Console (local dev)
new winston.transports.Console({
format: winston.format.combine(
winston.format.colorize(),
winston.format.simple()
),
}),

// File (production)
new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
new winston.transports.File({ filename: 'logs/combined.log' }),
],
});

export default logger;

3. Correlation ID Middleware

// src/middleware/correlationId.ts
import { FastifyRequest, FastifyReply } from 'fastify';
import { v4 as uuidv4 } from 'uuid';

export async function correlationIdMiddleware(
req: FastifyRequest,
reply: FastifyReply
) {
// Use existing correlation ID or generate new
const correlationId = (req.headers['x-correlation-id'] as string) || uuidv4();

// Attach to request
req.correlationId = correlationId;

// Return in response headers
reply.header('X-Correlation-ID', correlationId);
}

// Register middleware
app.addHook('onRequest', correlationIdMiddleware);

4. Webhook Logging

// src/routes/webhooks/hostaway.ts
app.post('/webhooks/hostaway', async (req, reply) => {
const startTime = Date.now();

logger.info({
message: 'Webhook received',
correlationId: req.correlationId,
channel: 'hostaway',
eventType: req.body.event_type,
webhookId: req.body.id,
signature: req.headers['x-hostaway-signature'],
});

try {
// Verify signature
if (!verifyWebhookSignature(req.body, req.headers['x-hostaway-signature'])) {
logger.warn({
message: 'Invalid webhook signature',
correlationId: req.correlationId,
channel: 'hostaway',
eventType: req.body.event_type,
});
return reply.status(401).send({ error: 'Invalid signature' });
}

// Process webhook
await processHostawayWebhook(req.body, req.correlationId);

logger.info({
message: 'Webhook processed successfully',
correlationId: req.correlationId,
channel: 'hostaway',
eventType: req.body.event_type,
duration: Date.now() - startTime,
});

return reply.status(200).send({ received: true });
} catch (error) {
logger.error({
message: 'Webhook processing failed',
correlationId: req.correlationId,
channel: 'hostaway',
eventType: req.body.event_type,
error: error.message,
stack: error.stack,
duration: Date.now() - startTime,
});

return reply.status(500).send({ error: 'Processing failed' });
}
});

5. Background Job Logging

// src/workers/webhookWorker.ts
const worker = new Worker('webhooks', async (job) => {
const { payload, correlationId } = job.data;

logger.info({
message: 'Processing webhook job',
correlationId,
jobId: job.id,
eventType: payload.event_type,
attempt: job.attemptsMade + 1,
});

try {
await processWebhook(payload, correlationId);

logger.info({
message: 'Webhook job completed',
correlationId,
jobId: job.id,
duration: job.finishedOn - job.processedOn,
});
} catch (error) {
logger.error({
message: 'Webhook job failed',
correlationId,
jobId: job.id,
error: error.message,
stack: error.stack,
attempt: job.attemptsMade + 1,
maxAttempts: job.opts.attempts,
});
throw error; // Re-throw to trigger retry
}
});

Correlation Flow

Request Trace

1. Webhook arrives
└─ correlationId: abc-123

2. Queued to BullMQ
└─ correlationId: abc-123 (passed in job data)

3. Worker processes
└─ correlationId: abc-123 (logged with job)

4. Database write
└─ correlationId: abc-123 (stored in metadata)

5. Event published
└─ correlationId: abc-123 (in event metadata)

All logs searchable by: correlationId:abc-123


Log Levels

// ERROR - System errors requiring attention
logger.error({
message: 'Database connection failed',
error: error.message,
stack: error.stack,
});

// WARN - Unexpected but handled issues
logger.warn({
message: 'Circuit breaker opened',
channel: 'hostaway',
breakerState: 'open',
});

// INFO - Important events (webhooks, API calls)
logger.info({
message: 'Booking created',
bookingId: booking.id,
correlationId: req.correlationId,
});

// DEBUG - Detailed debugging (disabled in production)
logger.debug({
message: 'Field mapping applied',
externalField: 'checkInDate',
internalField: 'checkIn',
value: '2025-02-01',
});

Sensitive Data Redaction

// src/logging/redact.ts
import redact from 'fast-redact';

const redactSensitiveData = redact({
paths: [
'password',
'apiKey',
'apiSecret',
'accessToken',
'refreshToken',
'creditCard',
'ssn',
],
censor: '[REDACTED]',
});

// Usage
logger.info({
message: 'User authenticated',
user: redactSensitiveData({
email: 'user@example.com',
password: 'secret123', // Will be [REDACTED]
}),
});

Query Examples (Loki)

# All errors for a specific org
{service="tvl-platform"} |= "orgId:org-123" | json | level="error"

# Trace specific webhook
{service="tvl-platform"} |= "correlationId:abc-123"

# Hostaway circuit breaker events
{service="tvl-platform"} |= "channel:hostaway" | json | message="Circuit breaker opened"

# Slow requests (>5s)
{service="tvl-platform"} | json | duration > 5000

# Failed webhook processing
{service="tvl-platform"} |= "Webhook processing failed" | json

Consequences

Positive

  • Traceability: Follow request across services
  • Machine-Readable: Parse with Loki/Datadog
  • Compliance: Audit trail for SOC 2
  • Debugging: Rich context for errors

Negative

  • Log Volume: JSON logs are verbose (2-3x plain text)
  • Performance: Structured logging adds 1-2ms overhead

Mitigations

  • Use log sampling in production (10% of INFO logs)
  • Aggregate to Loki (compresses JSON)
  • Disable DEBUG logs in production

Validation Checklist

  • Winston configured with JSON format
  • Correlation ID middleware registered
  • All integration events logged (webhooks, API calls, jobs)
  • Sensitive data redaction enabled
  • Log aggregation configured (Loki - ADR-0046)

References