Structured Logging and Health Checks in Vapor
The Landmarks backend has been running on AWS since Series 1 Post 7. The sync engine from the previous post pushes and pulls data constantly. But right now, if something goes wrong, you'd only know about it when a user complains. There's no visibility into what the server is doing.
This post adds three things that every production server needs: structured logging that's actually searchable, a health check endpoint for load balancers, and request timing middleware that shows you where time is being spent.
The companion repo has the complete working code: View Code
Structured Logging with swift-log
Vapor already uses swift-log under the hood. Every Request object has a logger property. But the default output looks like this:
[ INFO ] GET /landmarks
That's not searchable. You can't filter by user, correlate across requests, or find slow endpoints. Structured logging adds key-value metadata to every log line so you can query it later.
Adding Metadata to Every Request
Create a middleware that enriches the request logger with useful context:
import Vapor

struct RequestMetadataMiddleware: AsyncMiddleware {
    func respond(to request: Request, chainingTo next: any AsyncResponder) async throws -> Response {
        // Add a short request ID for correlation
        let requestID = UUID().uuidString.prefix(8)
        request.logger[metadataKey: "request_id"] = "\(requestID)"

        // Add the route path and HTTP method
        request.logger[metadataKey: "path"] = "\(request.url.path)"
        request.logger[metadataKey: "method"] = "\(request.method.rawValue)"

        // Add user ID if an authenticator has already run for this request
        if let user = request.auth.get(AuthenticatedUser.self) {
            request.logger[metadataKey: "user_id"] = "\(user.id)"
        }

        return try await next.respond(to: request)
    }
}
Register it globally in configure.swift:
app.middleware.use(RequestMetadataMiddleware())
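Now every log line written inside a request handler carries the request ID, path, and method. If you log inside a route handler using req.logger.info("something happened"), all that metadata comes along for free. One caveat: user_id only appears if authentication has already happened by the time this middleware runs. Authenticators attached to a route group run after global middleware, so either register the authenticator globally ahead of this middleware or set user_id inside the protected group.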
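If requests reach Vapor through a proxy or another service that already assigns an ID, reusing that ID lets you follow one request across systems. Here's a minimal sketch of that idea in plain Foundation. The correlationID(fromHeader:) helper and the X-Request-ID header name are assumptions for illustration, not part of the Landmarks code:

```swift
import Foundation

/// Returns the caller-supplied correlation ID when it looks safe to log,
/// otherwise a fresh 8-character ID like the middleware above generates.
/// Hypothetical helper; the validation rules here are illustrative.
func correlationID(fromHeader incoming: String?) -> String {
    if let incoming {
        let trimmed = incoming.trimmingCharacters(in: .whitespaces)
        let allowed = CharacterSet.alphanumerics.union(CharacterSet(charactersIn: "-_"))
        let isSafe = trimmed.unicodeScalars.allSatisfy { allowed.contains($0) }
        // Reject empty, oversized, or oddly formatted values rather than logging them
        if !trimmed.isEmpty && trimmed.count <= 64 && isSafe {
            return trimmed
        }
    }
    return String(UUID().uuidString.prefix(8))
}
```

In the middleware, you could pass request.headers.first(name: "X-Request-ID") to this helper instead of always generating a new UUID.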
Request Timing
Knowing how long each request takes is essential for spotting performance regressions. Add a second middleware that times each request:
struct RequestTimingMiddleware: AsyncMiddleware {
    func respond(to request: Request, chainingTo next: any AsyncResponder) async throws -> Response {
        let start = ContinuousClock.now
        let response = try await next.respond(to: request)
        let duration = ContinuousClock.now - start

        // Whole milliseconds: 1 ms = 10^15 attoseconds
        let ms = duration.components.seconds * 1000
            + duration.components.attoseconds / 1_000_000_000_000_000

        request.logger.info(
            "Request completed",
            metadata: [
                "status": "\(response.status.code)",
                "duration_ms": "\(ms)"
            ]
        )

        // Add timing header for client-side debugging
        response.headers.add(name: "X-Response-Time", value: "\(ms)ms")
        return response
    }
}
With this in place, every request gets a log line like:
[ INFO ] Request completed [request_id: a1b2c3d4] [path: /landmarks] [method: GET] [user_id: 550e8400-...] [status: 200] [duration_ms: 45]
That's searchable. You can find all requests slower than 500ms, all requests from a specific user, or all 500 errors on a specific endpoint.
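The integer math in the timing middleware truncates, so a request that finishes in 400 microseconds logs as duration_ms: 0. If you want sub-millisecond resolution, a small helper can convert a Duration to fractional milliseconds. This is a sketch; the milliseconds(_:) name is made up for this example:

```swift
/// Converts a Duration to fractional milliseconds.
/// Duration stores whole seconds plus attoseconds (10^-18 s), and 1 ms = 10^15 attoseconds.
func milliseconds(_ duration: Duration) -> Double {
    let (seconds, attoseconds) = duration.components
    return Double(seconds) * 1_000 + Double(attoseconds) / 1e15
}

let fast = milliseconds(.microseconds(400))    // roughly 0.4
let slow = milliseconds(.milliseconds(1_500))  // 1500.0
```

Logging a Double means the value is no longer a clean integer, so you may want to round to two or three decimal places before writing it to the log.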
Log Levels
Use log levels intentionally:
// Normal operations
req.logger.info("Landmarks fetched", metadata: ["count": "\(landmarks.count)"])
// Something unexpected but recoverable
req.logger.warning("Device token invalid, removing", metadata: ["token_prefix": "\(token.prefix(8))"])
// Something broke
req.logger.error("Database query failed", metadata: ["error": "\(error)"])
// Detailed debugging (only visible when log level is set to debug)
req.logger.debug("Cache miss for landmark", metadata: ["id": "\(landmarkID)"])
In production, set the log level to .info. In development, use .debug:
app.logger.logLevel = app.environment == .production ? .info : .debug
JSON Log Format
CloudWatch and other log aggregators work best with JSON. Create a custom log handler:
import Foundation
import Logging

struct JSONLogHandler: LogHandler {
    var metadata: Logger.Metadata = [:]
    var logLevel: Logger.Level = .info
    let label: String

    subscript(metadataKey key: String) -> Logger.Metadata.Value? {
        get { metadata[key] }
        set { metadata[key] = newValue }
    }

    func log(
        level: Logger.Level,
        message: Logger.Message,
        metadata: Logger.Metadata?,
        source: String,
        file: String,
        function: String,
        line: UInt
    ) {
        // Per-call metadata wins over metadata stored on the logger
        let merged = self.metadata.merging(metadata ?? [:]) { _, new in new }

        var dict: [String: String] = [
            "timestamp": ISO8601DateFormatter().string(from: Date()),
            "level": level.rawValue.uppercased(),
            "message": "\(message)",
            "source": source
        ]
        for (key, value) in merged {
            dict[key] = "\(value)"
        }

        if let json = try? JSONSerialization.data(
            withJSONObject: dict,
            options: [.sortedKeys]
        ), let string = String(data: json, encoding: .utf8) {
            print(string)
        }
    }
}
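Bootstrap it before the Application is created. swift-log only allows LoggingSystem.bootstrap to be called once per process, so if your entry point already calls LoggingSystem.bootstrap(from: &env), replace that call rather than adding a second one:
LoggingSystem.bootstrap { label in
    JSONLogHandler(label: label)
}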
Now your logs are machine-parseable JSON that CloudWatch can index automatically.
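One thing to watch with the handler above: metadata keys share a namespace with the core fields, so a metadata entry named message or level would overwrite the real one. A possible variation nests metadata under its own key. The sketch below shows just the formatting step as a standalone function using Foundation only. jsonLogLine is a hypothetical name, and the nested layout is one option rather than something CloudWatch requires:

```swift
import Foundation

/// Builds one JSON log line, keeping caller metadata under a separate "metadata" key
/// so an entry named "message" or "level" cannot overwrite a core field.
func jsonLogLine(
    level: String,
    message: String,
    metadata: [String: String],
    timestamp: Date = Date()
) -> String? {
    let payload: [String: Any] = [
        "timestamp": ISO8601DateFormatter().string(from: timestamp),
        "level": level.uppercased(),
        "message": message,
        "metadata": metadata
    ]
    guard let data = try? JSONSerialization.data(
        withJSONObject: payload,
        options: [.sortedKeys]
    ) else {
        return nil
    }
    return String(data: data, encoding: .utf8)
}

let line = jsonLogLine(
    level: "info",
    message: "Request completed",
    metadata: ["message": "shadowed", "status": "200"],
    timestamp: Date(timeIntervalSince1970: 0)
)
```

The trade-off is that queries have to reach into the nested object (for example metadata.status instead of status), so pick whichever layout fits how you search your logs.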
Health Check Endpoint
Load balancers need to know if your server is healthy. ECS, ALB, and Kubernetes all poll a health endpoint periodically. If it stops responding (or returns an error), traffic gets routed elsewhere.
A basic health check verifies the server is running:
app.get("health") { req async throws -> HealthResponse in
    HealthResponse(status: "ok")
}

struct HealthResponse: Content {
    let status: String
}
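But that only proves the HTTP server is alive. A better health check verifies the dependencies too, and returns a 503 when one of them fails so that load balancers and curl -f actually see the failure:
// Requires `import SQLKit` for the raw query below
app.get("health") { req async throws -> Response in
    var checks: [String: String] = [:]

    // Check database connectivity with a trivial raw query
    do {
        guard let sql = req.db as? any SQLDatabase else {
            throw Abort(.internalServerError, reason: "Database does not support raw SQL")
        }
        try await sql.raw("SELECT 1").run()
        checks["database"] = "ok"
    } catch {
        // Keep driver details out of the public response; log them instead
        checks["database"] = "error"
        req.logger.warning("Health check degraded", metadata: [
            "database_error": "\(error)"
        ])
    }

    let allHealthy = checks.values.allSatisfy { $0 == "ok" }
    let body = HealthResponse(
        status: allHealthy ? "ok" : "degraded",
        checks: checks,
        // systemUptime is host uptime, not process uptime
        uptime: ProcessInfo.processInfo.systemUptime
    )

    // A non-2xx status is what health checkers actually react to
    return try await body.encodeResponse(
        status: allHealthy ? .ok : .serviceUnavailable,
        for: req
    )
}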
struct HealthResponse: Content {
    let status: String
    let checks: [String: String]
    let uptime: TimeInterval
}
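If you'd rather report how long the server process itself has been running than how long the host has been up, one approach is to capture a start time yourself. A minimal sketch, with ProcessClock as a made-up name:

```swift
import Foundation

/// Tracks how long the current process has been running.
/// Hypothetical helper; not part of Vapor or the Landmarks code.
enum ProcessClock {
    /// Static properties initialize lazily on first access,
    /// so touch this once during startup (for example in configure).
    static let startedAt = Date()

    /// Seconds elapsed since `startedAt`, based on the wall clock.
    static var uptime: TimeInterval {
        Date().timeIntervalSince(startedAt)
    }
}

// At startup: force initialization so the clock starts now
_ = ProcessClock.startedAt

// Later, in the health check:
let processUptime = ProcessClock.uptime
```

Because this uses the wall clock, a system time adjustment would skew the value slightly. For a health endpoint that's usually acceptable.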
Configuring ECS Health Checks
In your ECS task definition (from Series 1 Post 7), point the health check at this endpoint:
{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 60
  }
}
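The startPeriod gives Vapor 60 seconds to boot and run migrations; failed checks during that window don't count toward retries. The command runs inside the container, so curl has to be installed in the image.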
If you're using an Application Load Balancer, configure it to check /health on the target group. Set the healthy threshold to 2 and unhealthy threshold to 3. That gives transient issues time to resolve before the container gets killed.
Error Logging Middleware
Capture unhandled errors before Vapor converts them to 500 responses:
struct ErrorLoggingMiddleware: AsyncMiddleware {
    func respond(to request: Request, chainingTo next: any AsyncResponder) async throws -> Response {
        do {
            return try await next.respond(to: request)
        } catch let abort as Abort {
            if abort.status.code >= 500 {
                // Deliberately thrown server errors - log at error level
                request.logger.error("Server error", metadata: [
                    "status": "\(abort.status.code)",
                    "reason": "\(abort.reason)"
                ])
            } else {
                // Expected errors (400, 401, 404, etc.) - log at info level
                request.logger.info("Request rejected", metadata: [
                    "status": "\(abort.status.code)",
                    "reason": "\(abort.reason)"
                ])
            }
            throw abort
        } catch {
            // Unexpected errors - always log at error level
            request.logger.error("Unhandled error", metadata: [
                "error": "\(error)",
                "type": "\(type(of: error))"
            ])
            throw error
        }
    }
}
Register it before other middleware so it catches everything:
app.middleware.use(ErrorLoggingMiddleware())
app.middleware.use(RequestTimingMiddleware())
app.middleware.use(RequestMetadataMiddleware())
Middleware executes in registration order for requests and reverse order for responses. Putting the error middleware first means it wraps everything else.
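To make the ordering concrete, here's a toy model of a middleware chain in plain Swift. It doesn't use Vapor at all; Trace, wrap, and runChain are invented for this illustration. Each layer records when it sees the request and when it sees the response:

```swift
/// Collects events in the order they happen.
final class Trace {
    var events: [String] = []
}

typealias Handler = (String) -> String

/// Wraps `next` in a layer that records when it sees the request and the response.
func wrap(_ name: String, _ trace: Trace, _ next: @escaping Handler) -> Handler {
    return { request in
        trace.events.append("\(name): request")
        let response = next(request)
        trace.events.append("\(name): response")
        return response
    }
}

/// Builds a chain from names listed in registration order and runs one request through it.
func runChain(_ names: [String]) -> [String] {
    let trace = Trace()
    let route: Handler = { _ in
        trace.events.append("route handler")
        return "200 OK"
    }
    // Build from the inside out so the first registered name ends up outermost
    let chain = names.reversed().reduce(route) { next, name in
        wrap(name, trace, next)
    }
    _ = chain("GET /landmarks")
    return trace.events
}

let events = runChain(["ErrorLogging", "RequestTiming", "RequestMetadata"])
for event in events {
    print(event)
}
```

ErrorLogging is first on the way in and last on the way out, with the route handler in the middle. That outermost position is why an error thrown by any inner layer passes through its catch block.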
A Simple Metrics Endpoint
For a lightweight metrics solution without pulling in a full Prometheus client, track basic counters in memory:
actor MetricsStore {
    static let shared = MetricsStore()

    private var requestCounts: [String: Int] = [:]
    private var errorCounts: [String: Int] = [:]

    func recordRequest(path: String) {
        requestCounts[path, default: 0] += 1
    }

    func recordError(path: String) {
        errorCounts[path, default: 0] += 1
    }

    func snapshot() -> MetricsSnapshot {
        MetricsSnapshot(
            requestCounts: requestCounts,
            errorCounts: errorCounts,
            timestamp: Date()
        )
    }
}

struct MetricsSnapshot: Content {
    let requestCounts: [String: Int]
    let errorCounts: [String: Int]
    let timestamp: Date
}
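Feed the store from your middleware, for example with await MetricsStore.shared.recordRequest(path: request.url.path) in the timing middleware and recordError(path:) in the error logging middleware. Then expose the snapshot on an admin endpoint (behind auth):
protected.get("admin", "metrics") { req async throws -> MetricsSnapshot in
    await MetricsStore.shared.snapshot()
}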
This is intentionally simple. The counters live in memory, so they reset on every restart and each container reports only its own traffic. For a real production system, you'd want Prometheus, Grafana, or CloudWatch metrics with proper histograms and percentiles. But this gives you immediate visibility without any infrastructure changes.
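If you key counters by the raw URL path, every distinct ID creates a new dictionary entry (/landmarks/1, /landmarks/2, and so on), and the store keeps growing. One way to keep it small is to collapse ID-like segments before recording. This is a sketch using Foundation only; metricKey and the :id placeholder are assumptions for the example:

```swift
import Foundation

/// Collapses ID-like path segments so "/landmarks/42" and "/landmarks/43" share one counter.
/// Hypothetical helper: treats all-digit segments and UUIDs as IDs.
func metricKey(method: String, path: String) -> String {
    let segments = path.split(separator: "/").map { segment -> String in
        let text = String(segment)
        let isNumber = !text.isEmpty && text.allSatisfy { $0.isASCII && $0.isNumber }
        let isUUID = UUID(uuidString: text) != nil
        return (isNumber || isUUID) ? ":id" : text
    }
    return "\(method) /" + segments.joined(separator: "/")
}

let listKey = metricKey(method: "GET", path: "/landmarks")
let itemKey = metricKey(method: "GET", path: "/landmarks/42")
```

You would then pass the result to recordRequest(path:) in place of the raw path. The heuristic will miss IDs that are neither numeric nor UUIDs, such as slugs, so adjust it to match your routes.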
Testing
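Test that the health endpoint returns the right structure. The database check runs a real query, so the test environment needs a reachable database:
@Test func healthCheckReturnsOK() async throws {
    let app = try await Application.make(.testing)
    do {
        try configure(app)
        try await app.test(.GET, "health") { res async throws in
            #expect(res.status == .ok)
            let health = try res.content.decode(HealthResponse.self)
            #expect(health.status == "ok")
            #expect(health.checks["database"] == "ok")
        }
    } catch {
        try await app.asyncShutdown()
        throw error
    }
    // Await shutdown explicitly; a Task started in defer may not finish before the test returns
    try await app.asyncShutdown()
}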
What's Next
We can now see what the server is doing in production. But what about the iOS app? In the next post, we'll profile the Landmarks app with Instruments to find real performance bottlenecks and fix them before users notice.