Structured Logging and Health Checks in Vapor

The Landmarks backend has been running on AWS since Series 1 Post 7. The sync engine from the previous post pushes and pulls data constantly. But right now, if something goes wrong, you'd only know about it when a user complains. There's no visibility into what the server is doing.

This post adds three things that every production server needs: structured logging that's actually searchable, a health check endpoint for load balancers, and request timing middleware that shows you where time is being spent.

The companion repo has the complete working code.

Structured Logging with swift-log

Vapor already uses swift-log under the hood. Every Request object has a logger property. But the default output looks like this:

[ INFO ] GET /landmarks

That's not searchable. You can't filter by user, correlate across requests, or find slow endpoints. Structured logging adds key-value metadata to every log line so you can query it later.

Adding Metadata to Every Request

Create a middleware that enriches the request logger with useful context:

struct RequestMetadataMiddleware: AsyncMiddleware {
    func respond(to request: Request, chainingTo next: any AsyncResponder) async throws -> Response {
        // Add a unique request ID for correlation
        let requestID = UUID().uuidString.prefix(8)
        request.logger[metadataKey: "request_id"] = "\(requestID)"

        // Add the route path
        request.logger[metadataKey: "path"] = "\(request.url.path)"
        request.logger[metadataKey: "method"] = "\(request.method.rawValue)"

        // Add user ID if authenticated
        if let user = request.auth.get(AuthenticatedUser.self) {
            request.logger[metadataKey: "user_id"] = "\(user.id)"
        }

        return try await next.respond(to: request)
    }
}

Register it globally in configure.swift:

app.middleware.use(RequestMetadataMiddleware())

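Now every log line written through req.logger after this middleware runs includes the request ID, path, and method. The user ID is attached only if authentication has already happened by the time this middleware executes. Vapor runs global middleware before any authenticators attached to route groups, so with group-level auth you need to register this middleware on the protected group as well, or set user_id after authentication. If you log inside a route handler using req.logger.info("something happened"), all that metadata comes along for free.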

Request Timing

Knowing how long each request takes is essential for spotting performance regressions. Add a second middleware that times each request and logs the result when the response is ready:

struct RequestTimingMiddleware: AsyncMiddleware {
    func respond(to request: Request, chainingTo next: any AsyncResponder) async throws -> Response {
        let start = ContinuousClock.now

        let response = try await next.respond(to: request)

        let duration = ContinuousClock.now - start
        let ms = duration.components.seconds * 1000
            + duration.components.attoseconds / 1_000_000_000_000_000

        request.logger.info(
            "Request completed",
            metadata: [
                "status": "\(response.status.code)",
                "duration_ms": "\(ms)"
            ]
        )

        // Add timing header for client-side debugging
        response.headers.add(name: "X-Response-Time", value: "\(ms)ms")

        return response
    }
}

With this in place, every request gets a log line like:

[ INFO ] Request completed [request_id: a1b2c3d4] [path: /landmarks] [method: GET] [user_id: 550e8400-...] [status: 200] [duration_ms: 45]

That's searchable. You can find all requests slower than 500ms, all requests from a specific user, or all 500 errors on a specific endpoint.
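The millisecond arithmetic in the timing middleware is easy to get wrong by a few zeros, so here is a minimal standalone sketch of the same conversion. The helper names wholeMilliseconds and fractionalMilliseconds are hypothetical and not part of the companion repo. A Duration stores whole seconds plus a remainder in attoseconds, and one millisecond is 10^15 attoseconds.

```swift
/// Whole milliseconds in a Duration, truncating any sub-millisecond remainder.
/// (Hypothetical helper; same arithmetic as the middleware above.)
func wholeMilliseconds(in duration: Duration) -> Int64 {
    let (seconds, attoseconds) = duration.components
    // 1 s = 1_000 ms, and 1 ms = 10^15 attoseconds.
    return seconds * 1_000 + attoseconds / 1_000_000_000_000_000
}

/// Fractional milliseconds, for endpoints fast enough that truncation hides detail.
func fractionalMilliseconds(in duration: Duration) -> Double {
    let (seconds, attoseconds) = duration.components
    return Double(seconds) * 1_000 + Double(attoseconds) / 1e15
}

print(wholeMilliseconds(in: .milliseconds(2045)))       // 2 s + 45 ms
print(fractionalMilliseconds(in: .microseconds(1500)))  // 1.5 ms
```

The integer version truncates, so a request that takes 900 microseconds is logged as 0 ms. If most of your endpoints are that fast, logging the fractional value may be more useful.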

Log Levels

Use log levels intentionally:

// Normal operations
req.logger.info("Landmarks fetched", metadata: ["count": "\(landmarks.count)"])

// Something unexpected but recoverable
req.logger.warning("Device token invalid, removing", metadata: ["token_prefix": "\(token.prefix(8))"])

// Something broke
req.logger.error("Database query failed", metadata: ["error": "\(error)"])

// Detailed debugging (only visible when log level is set to debug)
req.logger.debug("Cache miss for landmark", metadata: ["id": "\(landmarkID)"])

In production, set the log level to .info. In development, use .debug:

app.logger.logLevel = app.environment == .production ? .info : .debug

JSON Log Format

CloudWatch and other log aggregators work best with JSON. Create a custom log handler:

import Foundation  // JSONSerialization, ISO8601DateFormatter, Date
import Logging

struct JSONLogHandler: LogHandler {
    var metadata: Logger.Metadata = [:]
    var logLevel: Logger.Level = .info
    let label: String

    subscript(metadataKey key: String) -> Logger.Metadata.Value? {
        get { metadata[key] }
        set { metadata[key] = newValue }
    }

    func log(
        level: Logger.Level,
        message: Logger.Message,
        metadata: Logger.Metadata?,
        source: String,
        file: String,
        function: String,
        line: UInt
    ) {
        let merged = self.metadata.merging(metadata ?? [:]) { _, new in new }

        var dict: [String: String] = [
            "timestamp": ISO8601DateFormatter().string(from: Date()),
            "level": level.rawValue.uppercased(),
            "message": "\(message)",
            "source": source
        ]

        for (key, value) in merged {
            dict[key] = "\(value)"
        }

        if let json = try? JSONSerialization.data(
            withJSONObject: dict,
            options: [.sortedKeys]
        ), let string = String(data: json, encoding: .utf8) {
            print(string)
        }
    }
}

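Bootstrap it once, before the Application is created. swift-log allows LoggingSystem.bootstrap to be called only once per process, so if your entry point already calls Vapor's LoggingSystem.bootstrap(from: &env), replace that call rather than adding a second one: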

LoggingSystem.bootstrap { label in
    JSONLogHandler(label: label)
}

Now your logs are machine-parseable JSON that CloudWatch can index automatically.
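To see the merge-and-serialize step in isolation, here is a small Foundation-only sketch. The function jsonLogLine and its parameters are hypothetical and exist only for illustration. It mirrors what the handler does: per-call metadata overrides logger-level metadata on key collisions, and the result is serialized with sorted keys so lines are stable and easy to diff. Unlike the handler above, it writes the reserved fields last so a metadata key named "level" or "message" cannot clobber them.

```swift
import Foundation

/// Hypothetical helper: merges metadata and renders one JSON log line.
func jsonLogLine(
    level: String,
    message: String,
    loggerMetadata: [String: String],
    callMetadata: [String: String],
    timestamp: String
) -> String? {
    // Per-call metadata wins over logger-level metadata on key collisions.
    var fields = loggerMetadata.merging(callMetadata) { _, new in new }

    // Reserved fields are written last so metadata cannot overwrite them.
    fields["level"] = level
    fields["message"] = message
    fields["timestamp"] = timestamp

    guard let data = try? JSONSerialization.data(
        withJSONObject: fields,
        options: [.sortedKeys]
    ) else { return nil }
    return String(data: data, encoding: .utf8)
}

let sample = jsonLogLine(
    level: "INFO",
    message: "Request completed",
    loggerMetadata: ["request_id": "a1b2c3d4", "status": "pending"],
    callMetadata: ["status": "200", "duration_ms": "45"],
    timestamp: "2024-01-01T00:00:00Z"
)
print(sample ?? "serialization failed")
```

One detail to be aware of: by default JSONSerialization escapes forward slashes, so a path such as /landmarks is written as \/landmarks. That is still valid JSON and parses back to the original string.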

Health Check Endpoint

Load balancers need to know if your server is healthy. ECS, ALB, and Kubernetes all poll a health endpoint periodically. If it stops responding (or returns an error), traffic gets routed elsewhere.

A basic health check verifies the server is running:

app.get("health") { req async throws -> HealthResponse in
    HealthResponse(status: "ok")
}

struct HealthResponse: Content {
    let status: String
}

But that only proves the HTTP server is alive. A better health check verifies the dependencies too:

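app.get("health") { req async throws -> Response in
    var checks: [String: String] = [:]

    // Check database connectivity with a trivial raw query.
    // Requires `import SQLKit` and an SQL-backed Fluent driver.
    do {
        guard let sql = req.db as? any SQLDatabase else {
            throw Abort(.internalServerError, reason: "Database is not SQL-backed")
        }
        try await sql.raw("SELECT 1").run()
        checks["database"] = "ok"
    } catch {
        checks["database"] = "error: \(error.localizedDescription)"
    }

    let allHealthy = checks.values.allSatisfy { $0 == "ok" }

    let health = HealthResponse(
        status: allHealthy ? "ok" : "degraded",
        checks: checks,
        // Note: systemUptime is time since the host booted, not since this process started
        uptime: ProcessInfo.processInfo.systemUptime
    )

    if !allHealthy {
        req.logger.warning("Health check degraded", metadata: [
            "checks": "\(checks)"
        ])
    }

    // Load balancers and `curl -f` only look at the status code,
    // so a degraded check must not return 200.
    let response = Response(status: allHealthy ? .ok : .serviceUnavailable)
    try response.content.encode(health)
    return response
}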

struct HealthResponse: Content {
    let status: String
    let checks: [String: String]
    let uptime: TimeInterval
}

Configuring ECS Health Checks

In your ECS task definition (from Series 1 Post 7), point the health check at this endpoint:

{
    "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
    }
}

The startPeriod gives Vapor time to boot and run migrations before ECS starts checking. The command runs inside the container, so curl must be installed in the runtime image.

If you're using an Application Load Balancer, configure it to check /health on the target group. Set the healthy threshold to 2 and unhealthy threshold to 3. That gives transient issues time to resolve before the container gets killed.

Error Logging Middleware

Capture unhandled errors before Vapor converts them to 500 responses:

struct ErrorLoggingMiddleware: AsyncMiddleware {
    func respond(to request: Request, chainingTo next: any AsyncResponder) async throws -> Response {
        do {
            return try await next.respond(to: request)
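        } catch let abort as Abort {
            if abort.status.code >= 500 {
                // Aborts with a 5xx status are server-side failures
                request.logger.error("Server error", metadata: [
                    "status": "\(abort.status.code)",
                    "reason": "\(abort.reason)"
                ])
            } else {
                // Expected client errors (400, 401, 404, etc.): log at info level
                request.logger.info("Request rejected", metadata: [
                    "status": "\(abort.status.code)",
                    "reason": "\(abort.reason)"
                ])
            }
            throw abort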
        } catch {
            // Unexpected errors - always log at error level
            request.logger.error("Unhandled error", metadata: [
                "error": "\(error)",
                "type": "\(type(of: error))"
            ])
            throw error
        }
    }
}

Register it before other middleware so it catches everything:

app.middleware.use(ErrorLoggingMiddleware())
app.middleware.use(RequestTimingMiddleware())
app.middleware.use(RequestMetadataMiddleware())

Middleware executes in registration order for requests and reverse order for responses. Putting the error middleware first means it wraps everything else.
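The ordering rule is easier to trust once you have seen it run. Below is a toy model of middleware chaining in plain Swift. It is a sketch for illustration only, not Vapor's actual implementation, and every type and function name in it is made up. Each middleware records when it sees the request and when it sees the response.

```swift
// Toy model only: strings stand in for Request and Response.
typealias Responder = (String) -> String
typealias Middleware = (@escaping Responder) -> Responder

final class Trace {
    var events: [String] = []
}

/// Builds a middleware that records when it sees the request and the response.
func tracing(_ name: String, into trace: Trace) -> Middleware {
    return { next in
        return { request in
            trace.events.append("\(name): request")
            let response = next(request)
            trace.events.append("\(name): response")
            return response
        }
    }
}

/// Wraps the handler so the first-registered middleware ends up outermost.
func buildChain(_ middlewares: [Middleware], handler: @escaping Responder) -> Responder {
    middlewares.reversed().reduce(handler) { next, wrap in wrap(next) }
}

func demoTrace() -> [String] {
    let trace = Trace()
    let chain = buildChain(
        [
            tracing("ErrorLogging", into: trace),
            tracing("RequestTiming", into: trace),
            tracing("RequestMetadata", into: trace)
        ],
        handler: { _ in
            trace.events.append("handler")
            return "200 OK"
        }
    )
    _ = chain("GET /landmarks")
    return trace.events
}

print(demoTrace().joined(separator: "\n"))
```

The trace starts with ErrorLogging seeing the request and ends with ErrorLogging seeing the response, with the handler in the middle. That is why an error thrown anywhere further in passes through the error middleware's catch block on the way out.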

A Simple Metrics Endpoint

For a lightweight metrics solution without pulling in a full Prometheus client, track basic counters in memory:

actor MetricsStore {
    static let shared = MetricsStore()

    private var requestCounts: [String: Int] = [:]
    private var errorCounts: [String: Int] = [:]

    func recordRequest(path: String) {
        requestCounts[path, default: 0] += 1
    }

    func recordError(path: String) {
        errorCounts[path, default: 0] += 1
    }

    func snapshot() -> MetricsSnapshot {
        MetricsSnapshot(
            requestCounts: requestCounts,
            errorCounts: errorCounts,
            timestamp: Date()
        )
    }
}

struct MetricsSnapshot: Content {
    let requestCounts: [String: Int]
    let errorCounts: [String: Int]
    let timestamp: Date
}

Expose it on an admin endpoint (behind auth):

protected.get("admin", "metrics") { req async throws -> MetricsSnapshot in
    await MetricsStore.shared.snapshot()
}

This is intentionally simple. For a real production system, you'd want Prometheus, Grafana, or CloudWatch metrics with proper histograms and percentiles. But this gives you immediate visibility without any infrastructure changes.
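One thing to watch with counters keyed by path: if you record the raw URL path, every landmark ID produces its own dictionary entry and the store grows without bound. A possible mitigation, sketched here under the assumption that your IDs are integers or UUIDs, is to collapse ID-like segments before recording. The function normalizedMetricPath and the ":id" placeholder are hypothetical names, not part of the companion repo.

```swift
import Foundation

/// Hypothetical helper: replaces integer and UUID path segments with ":id"
/// so that /landmarks/1 and /landmarks/2 share one counter.
func normalizedMetricPath(_ path: String) -> String {
    let segments = path
        .split(separator: "/", omittingEmptySubsequences: true)
        .map { segment -> String in
            let text = String(segment)
            // Treat anything that parses as an Int or a UUID as an identifier.
            if Int(text) != nil || UUID(uuidString: text) != nil {
                return ":id"
            }
            return text
        }
    return "/" + segments.joined(separator: "/")
}

print(normalizedMetricPath("/landmarks/42"))
print(normalizedMetricPath("/landmarks/550e8400-e29b-41d4-a716-446655440000/photos"))
```

You would call this on the path before passing it to recordRequest(path:) or recordError(path:). It is a heuristic: slugs and other non-numeric identifiers pass through unchanged, so adjust the rule to match your own routes.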

Testing

Test that the health endpoint returns the right structure:

@Test func healthCheckReturnsOK() async throws {
    let app = try await Application.make(.testing)
    do {
        try configure(app)

        try await app.test(.GET, "health") { res async throws in
            #expect(res.status == .ok)
            let health = try res.content.decode(HealthResponse.self)
            #expect(health.status == "ok")
            #expect(health.checks["database"] == "ok")
        }
    } catch {
        // Shut down before rethrowing so a failing test does not leak the app
        try? await app.asyncShutdown()
        throw error
    }
    // Await the shutdown instead of firing it from an unawaited Task
    try await app.asyncShutdown()
}

What's Next

We can now see what the server is doing in production. But what about the iOS app? In the next post, we'll profile the Landmarks app with Instruments to find real performance bottlenecks and fix them before users notice.