Visual Regression Testing with Grantiva and Stateful Mocks
Unit tests verify that your logic is correct. Integration tests verify that your server works. But neither one answers the question that matters most to users: does the app actually work when you tap things?
Does tapping "Reserve Visit" show the confirmation screen? Does cancelling a reservation actually remove it from the list? Does filtering by category show the right landmarks? These are real user flows, and if you're only testing them by hand, you're one refactor away from shipping a broken experience.
Grantiva answers these questions with simple YAML flows and catches visual regressions automatically. Thanks to the MockReservationStore we built in the previous post, Grantiva can test the full UI without a running server. Buttons update local state as if real API responses came back. The views react. Grantiva verifies what the user sees and captures screenshots at every step. Next time something changes, it diffs those screenshots pixel-by-pixel and tells you exactly what broke. No network, no flakiness, deterministic results.
What Is Grantiva?
Grantiva is a visual regression testing tool for iOS apps. It navigates your app using YAML-defined flows, captures screenshots at key points, and diffs them against approved baselines. If a pixel changes, you know about it before it ships.
If you've ever written XCUITest code, you know how much boilerplate is involved — launching apps, finding elements, waiting for animations, managing test runners. Grantiva replaces all of that with declarative YAML files that describe what a user does:
appId: com.kylebrowning.Landmarks
---
- launchApp
- assertVisible: "Landmarks"
- tapOn: "Golden Gate Bridge"
- assertVisible: "San Francisco, CA"
- takeScreenshot: "Landmark Detail"
That's a complete test. Launch the app, verify the title, tap a landmark, verify the detail screen, and capture a screenshot. No XCUIApplication setup, no XCTAssert chains, no brittle accessibility identifier management. And the takeScreenshot at the end captures the screen for visual diffing — if that detail screen ever changes unexpectedly, you'll see it.
Grantiva also reads Maestro flow files natively, so if you already have Maestro flows, Grantiva works as a drop-in replacement — your existing flows run unchanged.
Install Grantiva with Homebrew:
brew install grantiva/tap/grantiva
Verify your environment is ready:
grantiva doctor
Grantiva runs against a simulator, uses a built-in XCUITest driver for automation, and works headless on CI out of the box. No Accessibility permission needed.
Setting Up a Test Scheme
The trick to making Grantiva work without a server is controlling which Services implementation the app uses at launch. We need the app to use .mock services when Grantiva is driving it, and .live services in normal operation.
The cleanest way to do this is with an Active Compilation Condition. Create a dedicated Xcode scheme called "Landmarks (UI Testing)" and add UI_TESTING to Build Settings > Swift Compiler - Custom Flags > Active Compilation Conditions. Then check for it in LandmarksApp.swift:
@main
struct LandmarksApp: App {
    let services: Services

    init() {
        #if UI_TESTING
        services = .mock
        #else
        #if DEBUG
        let baseURL = URL(string: "http://localhost:8080")!
        #else
        let baseURL = URL(string: "https://api.yourapp.com")!
        #endif
        services = .live(baseURL: baseURL)
        #endif
    }

    var body: some Scene {
        WindowGroup {
            ContentView()
                .services(services)
        }
    }
}
When you build with the "Landmarks (UI Testing)" scheme, the compiler flag activates .mock services. This means every service closure uses the in-memory mock implementations from Post 9. The MockReservationStore holds reservations in an array. The LandmarkStore serves pre-loaded landmark data. No server needed.
Tell Grantiva which scheme to use in your grantiva.yml:
scheme: "Landmarks (UI Testing)"
simulator: iPhone 16
Or pass it on the command line: grantiva diff capture --scheme "Landmarks (UI Testing)".
Stateful Mocks That Actually Work
Why do stateful mocks matter so much for UI testing? Because UI tests verify state transitions, not just static screens. When Grantiva taps "Reserve Visit", something needs to actually happen. The button needs to create a reservation, the store needs to update, and SwiftUI needs to re-render. If you're stubbing responses with hardcoded data, none of that works.
The closure-based service pattern from Post 3 solves this. The ReservationStore is already @Observable — views watch it for changes. The .mock variant just provides closures that update the store directly, no server needed:
extension ReservationService {
    static var mock: ReservationService {
        let store = ReservationStore()
        return ReservationService(
            store: store,
            createReservation: { landmark, date, partySize in
                let reservation = Reservation(
                    id: UUID(),
                    landmark: landmark,
                    date: date,
                    partySize: partySize,
                    status: .confirmed
                )
                await store.addReservation(reservation)
                return reservation
            },
            // No-op: the in-memory store is already the source of truth.
            fetchReservations: { },
            cancelReservation: { id in
                await store.removeReservation(id: id)
            }
        )
    }
}
That's the entire mock. No separate mock class, no intermediary state — the closures operate directly on the @Observable store that SwiftUI is already watching. When Grantiva taps "Confirm Reservation", the createReservation closure fires, adds the reservation to the store, SwiftUI re-renders, and the confirmation screen appears. The data actually exists — not because something was hardcoded, but because the closure put it there.
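For reference, the store those closures mutate can be this small. The following is a minimal sketch assuming the `Reservation` shape implied above; the real `ReservationStore` and models come from Posts 2 and 9, and the stand-in types here are illustrative:

```swift
import Foundation
import Observation

// Illustrative stand-in for the series' Reservation model,
// matching the fields used in the mock closures above.
struct Reservation: Identifiable {
    let id: UUID
    let landmark: String
    let date: Date
    let partySize: Int
    var status: Status

    enum Status { case confirmed, cancelled }
}

// The @Observable store that SwiftUI watches. The mock closures append to
// and remove from this array, which is what triggers re-renders.
@Observable
final class ReservationStore {
    private(set) var reservations: [Reservation] = []

    func addReservation(_ reservation: Reservation) async {
        reservations.append(reservation)
    }

    func removeReservation(id: UUID) async {
        reservations.removeAll { $0.id == id }
    }
}
```

In the UI-testing build, every flow step that creates or cancels a reservation ends up in this one array, which is why the Reservations tab reflects the change immediately.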
The .mock variant slots into the Services container alongside the .live and .preview variants from Post 3:
extension Services {
    static func live(baseURL: URL) -> Services {
        Services(
            landmarks: .live(client: networkClient, baseURL: baseURL, cache: cache),
            reservations: .live(client: networkClient, baseURL: baseURL),
            analytics: .live
        )
    }

    static var mock: Services {
        Services(
            landmarks: .mock,
            reservations: .mock,
            analytics: .live
        )
    }

    static var preview: Services {
        Services(
            landmarks: .preview,
            reservations: .preview,
            analytics: .preview
        )
    }
}
Three variants, same interface. .live hits your Vapor server. .mock provides closures that update the @Observable store directly — perfect for UI testing because every interaction produces real state changes that SwiftUI sees immediately. .preview pre-populates sample data for SwiftUI previews. The views don't know or care which one they're using. They observe the store, and the store updates.
When you build with the "Landmarks (UI Testing)" scheme and #if UI_TESTING activates .mock, the entire app runs with stateful mocks. Grantiva navigates the real UI, tapping real buttons that trigger real closures that update real observable state. The only thing missing is the network.
This is what makes the visual diffs meaningful. A real server introduces variables — network latency, database state, race conditions, timeout errors. Mock stores eliminate all of that. If a Grantiva test passes once, it passes every time. A changed pixel means your code changed something, not that the server was slow.
Writing Grantiva Flows
Let's write flows that test the core user journeys in the Landmarks app. Each flow lives in your grantiva.yml configuration file. If you're already using Maestro, Grantiva will auto-detect and pick up your existing .maestro/ flows too.
Browse Landmarks
The simplest flow verifies that the app launches and displays landmark data correctly:
appId: com.kylebrowning.Landmarks
---
- launchApp
- assertVisible: "Landmarks"
- assertVisible: "Featured"
- takeScreenshot: "Home"
- tapOn: "Golden Gate Bridge"
- assertVisible: "San Francisco, CA"
- takeScreenshot: "Landmark Detail"
The flow launches the app, verifies the landmarks list and featured section appear, captures the home screen, taps into a landmark detail, verifies the location, and captures another screenshot.
Each takeScreenshot becomes a named capture point. When you run this later, Grantiva will diff "Home" and "Landmark Detail" against their baselines. If the layout changes, the font changes, or a view disappears — you'll see it in the diff.
Notice how readable this is. A non-engineer could look at this file and understand what it tests.
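One practical note: if a screen needs a moment to settle (a network spinner in other apps, an animation here), Maestro-syntax flows support explicit waits. Since Grantiva reads Maestro flows natively, a step like this should work, though you should verify it against your Grantiva version:

```yaml
# Wait up to 5 seconds for the detail screen before continuing (Maestro syntax).
- extendedWaitUntil:
    visible: "San Francisco, CA"
    timeout: 5000
```

With mock services the UI settles almost instantly, so you'll rarely need this, but it's a useful escape hatch for animated transitions.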
Reserve a Visit
This flow tests the full reservation journey — from browsing to confirming a reservation:
appId: com.kylebrowning.Landmarks
---
- launchApp
- tapOn: "Golden Gate Bridge"
- tapOn: "Reserve Visit"
- assertVisible: "Party Size"
- takeScreenshot: "Reservation Form"
- tapOn: "Confirm Reservation"
- assertVisible: "Reservation Confirmed"
- takeScreenshot: "Reservation Confirmed"
- tapOn: "Done"
- tapOn: "Reservations"
- assertVisible: "Golden Gate Bridge"
- takeScreenshot: "Reservations List"
This is where the MockReservationStore really shines. When Grantiva taps "Confirm Reservation", the mock store adds the reservation. When the flow navigates to the Reservations tab, the reservation is there because the store actually holds it in memory. No stubbed screens, no fake data injected at the view layer. The entire data flow works, just with local storage instead of a server.
The three screenshots capture the form, confirmation, and list — every visual state in the journey.
Cancel a Reservation
Testing cancellation requires first creating a reservation, then removing it:
appId: com.kylebrowning.Landmarks
---
- launchApp
# First create a reservation
- tapOn: "Golden Gate Bridge"
- tapOn: "Reserve Visit"
- tapOn: "Confirm Reservation"
- tapOn: "Done"
# Now cancel it
- tapOn: "Reservations"
- assertVisible: "Golden Gate Bridge"
- takeScreenshot: "Before Cancel"
- swipe:
    direction: "LEFT"
    from: "Golden Gate Bridge"
- tapOn: "Cancel"
- assertNotVisible: "Golden Gate Bridge"
- takeScreenshot: "After Cancel"
This flow tests a complete create-then-delete cycle. The assertNotVisible verifies that the reservation was actually removed from the list, not just hidden. The swipe-to-delete gesture is a native iOS pattern, and Grantiva handles it naturally.
The "Before Cancel" and "After Cancel" screenshots make the visual diff powerful — you can see exactly what changed when a reservation is removed.
Category Filtering
This flow verifies that category filtering shows the correct landmarks and hides others:
appId: com.kylebrowning.Landmarks
---
- launchApp
- tapOn: "Bridges"
- assertVisible: "Golden Gate Bridge"
- assertVisible: "Brooklyn Bridge"
- assertNotVisible: "Yosemite Valley"
- takeScreenshot: "Bridges Category"
- tapOn: "Mountains"
- assertVisible: "Yosemite Valley"
- assertNotVisible: "Golden Gate Bridge"
- takeScreenshot: "Mountains Category"
The assertNotVisible checks are just as important as the assertVisible ones. Verifying that "Yosemite Valley" does not appear in the Bridges category confirms that filtering actually works, not just that the category screen loads.
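On the app side, the behavior this flow pins down can be as simple as a filter over the loaded landmarks. Here is a sketch with hypothetical, simplified types; the real filtering lives in the `LandmarkStore` from earlier posts:

```swift
import Foundation

// Hypothetical simplified model; the real Landmark comes from Post 2.
struct Landmark {
    let name: String
    let category: String
}

// The behavior the assertVisible/assertNotVisible pairs verify:
// only landmarks in the selected category are shown.
func landmarks(in category: String, from all: [Landmark]) -> [Landmark] {
    all.filter { $0.category == category }
}
```

The flow exercises this logic through the real UI, so a regression anywhere between the store and the rendered list shows up in the assertions or the screenshots.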
Running Locally
First, build the XCUITest driver (once per Xcode version):
grantiva driver build
Then capture screenshots for all flows:
grantiva diff capture --scheme "Landmarks (UI Testing)"
Grantiva boots the simulator, builds the app with your UI Testing scheme, installs and launches it, then executes each flow. For every takeScreenshot, it captures a PNG and saves it to .grantiva/captures/.
Approve the captures as baselines:
grantiva diff approve
Now, after making changes, compare against baselines:
grantiva diff compare
Grantiva diffs each screenshot using two metrics: pixel difference percentage and CIE76 perceptual color distance. If either threshold is exceeded, the comparison fails and a diff image highlighting the changes is saved to .grantiva/captures/diffs/.
A few tips for local development:
Use --no-build for fast iteration. If you've already built and installed the app, skip the build step: grantiva diff capture --no-build.
Use --app-file with pre-built binaries. If your build pipeline produces an .app bundle, point Grantiva at it directly: grantiva diff capture --app-file ./build/Landmarks.app.
Use --json for scripting. All commands support --json for machine-readable output.
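Put together, a typical local loop uses the commands from above in this order:

```shell
# One-time setup: build the driver, capture initial screenshots,
# and approve them as baselines.
grantiva driver build
grantiva diff capture --scheme "Landmarks (UI Testing)"
grantiva diff approve

# Iteration: after a code change, re-capture (skipping the build if the
# app is already installed) and compare against the approved baselines.
grantiva diff capture --no-build
grantiva diff compare
```

Once the loop is muscle memory, a visual check of the whole app takes about as long as running the unit test suite.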
Running in CI
Grantiva flows should run on every pull request. A failing visual regression test means a user-facing change, and you want to review those before merging.
Here's a GitHub Actions workflow that builds the app, then runs visual regression testing:
name: Visual Regression

on:
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: macos-15
    steps:
      - uses: actions/checkout@v4
      - name: Build for UI Testing
        run: |
          xcodebuild build \
            -scheme "Landmarks (UI Testing)" \
            -destination "generic/platform=iOS Simulator" \
            -derivedDataPath build/
      - uses: actions/upload-artifact@v4
        with:
          name: app-binary
          path: build/Build/Products/Debug-iphonesimulator/Landmarks.app

  visual-regression:
    needs: build
    runs-on: macos-15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: app-binary
          path: ./app
      - name: Install Grantiva CLI
        run: brew install grantiva/tap/grantiva
      - name: Run visual regression
        env:
          GRANTIVA_API_KEY: ${{ secrets.GRANTIVA_API_KEY }}
        run: grantiva ci run --app-file ./app/Landmarks.app
Save this as .github/workflows/visual-regression.yml. A few things to note about this workflow:
The build and test are separate jobs. The build step uploads the .app as an artifact, and the visual regression job downloads it. This means you can cache builds, run multiple Grantiva configs in parallel (different devices, locales), and only rebuild when code changes.
The build step uses the UI Testing scheme. This scheme includes the UI_TESTING compilation condition, so the built app uses mock services.
grantiva ci run does everything. It boots the simulator, installs the pre-built binary, launches the app, navigates every flow, captures screenshots, diffs against baselines, uploads results to the Grantiva dashboard, and posts a GitHub Check Run on your PR with before/after diffs.
No Maestro installation needed. Grantiva reads .maestro/ flow files natively. The same YAML files work with both tools.
Wrapping Up
This is the final post in the series. Let's take a step back and look at what we've built across all ten posts.
Post 1: Navigation established type-safe navigation with a Screen enum and centralized DestinationContent. Every screen in the app is a case in an enum, and the compiler enforces valid navigation.
Post 2: API vs Domain Models separated network concerns from business logic. API models decode JSON. Domain models power the UI. The .domainModel mapping validates data at the boundary.
Post 3: Dependency Injection replaced protocols and view models with closure-based services and observable stores. A Services container with .live, .mock, and .preview variants made every layer swappable.
Post 4: Caching added memory and disk caching with LRU eviction and flexible fetch policies. Offline mode fell out naturally from the architecture.
Post 5: Vapor Backend built the server with Fluent models, response types, and RESTful routes that mirror the iOS patterns.
Post 6: Full Integration wired every layer together into a complete client-server app. Network client, API model mapping, caching, observable stores, and SwiftUI views all working in concert.
Post 7: Deployment containerized the server with Docker, deployed to AWS with ECS Fargate, and automated everything with GitHub Actions.
Post 8: Server Integration Tests added integration tests that boot a real Vapor server with in-memory SQLite and verify every endpoint with actual HTTP requests.
Post 9: iOS App Testing tested the iOS app with mock services. The MockReservationStore proved that closure-based DI makes testing straightforward — swap the closures, verify the behavior.
Post 10: Visual Regression Testing (this post) used those same mock services to power YAML-based visual regression tests that verify real user flows and catch visual changes automatically — without a running server.
The thread running through all ten posts is clean separation of concerns, applied consistently at every layer. Views don't know about networking. Services don't know about views. The server doesn't know about the client's caching strategy. Each layer has a single job and a clear interface.
That separation is what makes everything else possible. Want to test the server? Swap in an in-memory database. Want to test the iOS app? Swap in mock services. Want to test the UI visually? Build with a flag that activates mocks, point Grantiva at the binary, and let it navigate your flows. The same architectural decisions that make the code clean also make it testable. That's not a coincidence. That's the whole point.