justin 821ff9b9ef Add ZVMA pre/post script recipe + env-dump examples

Adds a Kubernetes-ZVMA companion to the existing Windows-ZVM recipe:

- scripts/examples/zerto-zvma-send.ps1 - Zerto-side sender for both
  pre and post phases, packages the Zerto* env vars into a structured
  JSON body and POSTs to a {phase}-templated webhook URL.
- scripts/examples/zerto-receiver-notify.ps1 - server-side receiver
  that posts a Slack/Teams notification, with phase-aware formatting
  and ZertoForce highlighted on pre.
- scripts/examples/zerto-receiver-vm-healthcheck.ps1 - server-side
  receiver that pings + port-probes each VM in VmDisplayNames after
  failover and writes a per-run JSON report.
- scripts/examples/send-env-vars.ps1 + save-env-vars.ps1 - generic
  env-dump client/receiver pair (the diagnostic that surfaced what
  the ZVMA scripts-service container exposes).
- docs/recipes/zerto-zvma-pre-post.md - full walkthrough mirroring
  the existing Windows-ZVM recipe's structure.
- README.md and docs/README.md - link the new recipe and examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 14:16:07 -04:00


Recipe: Zerto ZVMA (Kubernetes) pre/post scripts → notify + VM health check

Companion to the "Zerto failover post-script → DNS + service checks" recipe. That recipe targets the Windows ZVM (the older deployment, where the Zerto-side script is a .ps1 calling curl.exe). This recipe targets the ZVMA on Kubernetes, the newer deployment, where pre/post scripts run inside the in-cluster scripts-service container (Linux + pwsh 7). The webhook-server side is the same Windows service in both cases; only the Zerto-side runtime differs.

What we're building

ZVMA's scripts-service pod runs your VPG pre/post scripts inside a Linux container. It exposes a small set of Zerto* environment variables, and we want to:

- POST those variables to a Webhook Server endpoint at the start (pre) and end (post) of every VPG operation, and
- On the receiving Windows host, do something useful with them: at minimum a chat notification, and on post a quick health check of the VMs that just powered on.

The endpoints are Async, so the Zerto VPG sequence is never blocked by slow downstream actions (notifications, port probes, etc.).

Zerto VPG operation starts
   |
   +-- ZVMA scripts-service container runs:
   |     /app/scripts-files/zerto-zvma-send.ps1 -Phase pre
   |       -> POST http://webhook.dr/hook/zerto-pre   (async, returns 202)
   |
   +-- VMs come up at recovery site
   |
   +-- ZVMA scripts-service container runs:
         /app/scripts-files/zerto-zvma-send.ps1 -Phase post
           -> POST http://webhook.dr/hook/zerto-post  (async, returns 202)

(meanwhile, on the webhook server)
   /hook/zerto-pre  -> Slack/Teams notification ("Test failover starting...")
   /hook/zerto-post -> Slack/Teams notification + ping/port probe each VM,
                       write a JSON report to disk, exit non-zero on failure.

What ZVMA exposes

Captured from a real Test failover; same set is present in pre and post:

| Variable | Example | Notes |
| --- | --- | --- |
| `ZertoVPGName` | `ubuntu-2404-local` | The VPG that fired the script |
| `ZertoInternalVpgName` | `ubuntu-2404-local` | Usually identical to `ZertoVPGName` |
| `ZertoOperation` | `Test` | `Test` / `Failover` / `Move` / `FailoverBeforeCommit` / `FailoverDuringCommit` |
| `ZertoForce` | `Yes` (pre) / `No` (post) | Set to `Yes` only during the pre phase when force mode is on; reset to `No` by post |
| `VmDisplayNames` | `ubuntu-2404(1)(1)(1)` | Comma-separated for multi-VM VPGs; Test failovers add `(N)` suffixes |
| `ZertoHypervisorManagerIP` | `192.168.50.20` | The vCenter / Hyper-V manager ZVMA is talking to |
| `ZertoHypervisorManagerPort` | `443` | |
| `ZertoOutputDir` | `/app/scripts-output` | Container-side output dir (written back to ZVMA via PVC) |
| `ZertoWorkingDir` | `/app/scripts-files` | Where script files live in-container |

Branch on ZertoOperation to differentiate Test runs from real failovers. ZertoForce is only meaningful during the pre phase — capture it there if you need it later, because by post it's been reset.
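As a rough illustration of that branching, the sketch below gathers a few of the variables from the table and flags whether the run is a drill. `Get-ZertoContext` is a hypothetical helper name introduced here, not something the repo script is known to define, and treating every value other than `Test` as a real operation is an assumption.

```powershell
# Sketch only: Get-ZertoContext is a hypothetical helper, not part of the repo script.
function Get-ZertoContext {
    $ctx = [ordered]@{}
    foreach ($name in 'ZertoVPGName', 'ZertoOperation', 'ZertoForce', 'VmDisplayNames') {
        # Unset variables come back as $null instead of throwing.
        $ctx[$name] = [Environment]::GetEnvironmentVariable($name)
    }

    # Assumption: anything other than an explicit 'Test' is treated as a real operation.
    $ctx['IsTest'] = ($ctx['ZertoOperation'] -eq 'Test')

    # VmDisplayNames is comma-separated for multi-VM VPGs; drop empty entries.
    $ctx['VmNames'] = @(
        "$($ctx['VmDisplayNames'])" -split ',' |
            ForEach-Object { $_.Trim() } |
            Where-Object { $_ }
    )
    return $ctx
}
```

Inside the container this could be used as `$ctx = Get-ZertoContext; if (-not $ctx['IsTest']) { ... }`.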

1. The Zerto-side script (sender)

A ready-to-use script ships in this repo at scripts/examples/zerto-zvma-send.ps1. Place it where the scripts-service pod can read it — typically the scripts-service-scripts-files-pvc, mounted at /app/scripts-files/ — and wire it into the VPG twice:

VPG settings → Recovery → Scripts → Pre-Recovery Script Path: /app/scripts-files/zerto-zvma-send.ps1
Parameters: -Phase pre

VPG settings → Recovery → Scripts → Post-Recovery Script Path: /app/scripts-files/zerto-zvma-send.ps1
Parameters: -Phase post

The default $WebhookUrl includes {phase} so one script + one URL config serves both phases — http://webhook.dr/hook/zerto-{phase} becomes /hook/zerto-pre and /hook/zerto-post automatically. Override with -WebhookUrl and -Bearer if you'd rather pass them per-VPG.
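The {phase} substitution can be pictured as a plain string replace. This is a minimal sketch of the idea, not the shipped implementation; `Resolve-WebhookUrl` is a hypothetical name, and the default template is simply the URL quoted above.

```powershell
# Sketch only: Resolve-WebhookUrl is a hypothetical name for the templating step.
function Resolve-WebhookUrl {
    param(
        [Parameter(Mandatory)][ValidateSet('pre', 'post')][string]$Phase,
        [string]$Template = 'http://webhook.dr/hook/zerto-{phase}'
    )
    # String.Replace is a literal replace, so the braces need no regex escaping.
    return $Template.Replace('{phase}', $Phase)
}
```

A per-VPG override would just pass a different `-Template` value.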

The script POSTs a single JSON object:

{
  "phase": "pre",
  "capturedAt": "2026-05-08T17:45:54Z",
  "host": "scripts-service-f9b6cb7-4xbxq",
  "zerto": {
    "vpgName":               "ubuntu-2404-local",
    "internalVpgName":       "ubuntu-2404-local",
    "operation":             "Test",
    "force":                 "Yes",
    "vmDisplayNames":        "ubuntu-2404(1)(1)(1)",
    "hypervisorManagerIP":   "192.168.50.20",
    "hypervisorManagerPort": "443",
    "outputDir":             "/app/scripts-output",
    "workingDir":            "/app/scripts-files"
  }
}

A webhook outage does not fail the VPG: the script catches the error and exits 0. A comment in the file shows how to switch to strict mode if you'd rather have a webhook outage abort the failover.
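To make the payload shape and the fail-open behaviour concrete, here is a hedged sketch of how a sender could build the body and swallow webhook errors. The function names, the 10-second timeout, and the use of the machine name for `host` are assumptions; the shipped script may differ in all of these.

```powershell
# Sketch only: hypothetical helper names; the shipped zerto-zvma-send.ps1 may differ.
function New-ZertoPayload {
    param([Parameter(Mandatory)][string]$Phase)
    $body = [ordered]@{
        phase      = $Phase
        capturedAt = (Get-Date).ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'")
        host       = [Environment]::MachineName   # assumption: the pod name is the machine name
        zerto      = [ordered]@{
            vpgName               = $env:ZertoVPGName
            internalVpgName       = $env:ZertoInternalVpgName
            operation             = $env:ZertoOperation
            force                 = $env:ZertoForce
            vmDisplayNames        = $env:VmDisplayNames
            hypervisorManagerIP   = $env:ZertoHypervisorManagerIP
            hypervisorManagerPort = $env:ZertoHypervisorManagerPort
            outputDir             = $env:ZertoOutputDir
            workingDir            = $env:ZertoWorkingDir
        }
    }
    return ($body | ConvertTo-Json -Depth 4)
}

function Send-ZertoPayload {
    param(
        [Parameter(Mandatory)][string]$Url,
        [Parameter(Mandatory)][string]$Json,
        [string]$Bearer,
        [int]$TimeoutSec = 10
    )
    $headers = @{}
    if ($Bearer) { $headers['Authorization'] = "Bearer $Bearer" }
    try {
        Invoke-RestMethod -Uri $Url -Method Post -Body $Json `
            -ContentType 'application/json' -Headers $headers -TimeoutSec $TimeoutSec | Out-Null
        return $true
    } catch {
        # Fail open: report the problem but let the VPG operation continue.
        Write-Warning "Webhook POST failed: $($_.Exception.Message)"
        return $false
    }
}
```

A strict variant would end the script with `exit 1` when `Send-ZertoPayload` returns `$false`, instead of always exiting 0.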

2. The webhook-server-side scripts (receivers)

Two examples ship in the repo. Both read the JSON body from stdin; the webhook server delivers the body to the script's stdin when the endpoint's "JSON body to stdin" option is ticked.
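A receiver's first step might look like the sketch below: read everything from stdin, parse it, and fail early when the body is not the expected shape. `ConvertFrom-ZertoBody` is a hypothetical name, and the validation rules are assumptions rather than what the shipped receivers do.

```powershell
# Sketch only: ConvertFrom-ZertoBody is a hypothetical helper name.
function ConvertFrom-ZertoBody {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Raw)

    if ([string]::IsNullOrWhiteSpace($Raw)) {
        throw 'Empty request body on stdin.'
    }
    $payload = $Raw | ConvertFrom-Json
    if (-not $payload.zerto) {
        throw "Body has no 'zerto' object."
    }
    return $payload
}

# In a receiver script the raw text would come from stdin, for example:
#   $p = ConvertFrom-ZertoBody -Raw ([Console]::In.ReadToEnd())
```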

a. Slack/Teams notification — both phases

scripts/examples/zerto-receiver-notify.ps1 posts a single-line summary to a Slack or Teams Incoming Webhook URL. It picks an icon based on ZertoOperation:

- Test → 🧪 (benign, expected)
- Failover → 🚨 (real production event)
- Move → 🚚 (planned migration)

…and highlights ZertoForce=Yes on the pre message so you can see at a glance whether the operation was force-flagged.

Set the destination via NOTIFY_URL env var on the webhook host, or hardcode at the top of the script.
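The formatting logic could be sketched roughly as follows. The function names, the exact message layout, the Slack-style shortcodes, and the fallback icon are all illustrative assumptions; the `{"text": ...}` body is the simple form that Slack incoming webhooks accept, and other chat targets may need a different shape.

```powershell
# Sketch only: Format-ZertoNotice and Send-ZertoNotice are hypothetical names.
function Format-ZertoNotice {
    param([Parameter(Mandatory)]$Payload)

    # Slack-style shortcodes; unknown operations get a neutral icon (assumption).
    $icons = @{ Test = ':test_tube:'; Failover = ':rotating_light:'; Move = ':truck:' }
    $op    = [string]$Payload.zerto.operation
    $icon  = if ($icons.ContainsKey($op)) { $icons[$op] } else { ':information_source:' }

    $text = "$icon Zerto $op - phase: $($Payload.phase) - VPG: $($Payload.zerto.vpgName)"
    # ZertoForce is only meaningful on pre, so only flag it there.
    if ($Payload.phase -eq 'pre' -and $Payload.zerto.force -eq 'Yes') {
        $text += ' - FORCE'
    }
    return $text
}

function Send-ZertoNotice {
    param(
        [Parameter(Mandatory)][string]$Text,
        [string]$Url = $env:NOTIFY_URL
    )
    if (-not $Url) { throw 'NOTIFY_URL is not set.' }
    $json = @{ text = $Text } | ConvertTo-Json -Compress
    Invoke-RestMethod -Uri $Url -Method Post -Body $json -ContentType 'application/json' | Out-Null
}
```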

b. Post-recovery VM health check — post phase only

scripts/examples/zerto-receiver-vm-healthcheck.ps1 runs only on phase=post for operations that bring VMs up (Test/Failover/Move/FailoverBeforeCommit/FailoverDuringCommit). For each name in VmDisplayNames it:

- Strips the trailing (1)(1)(1) suffix Zerto adds on Test failovers, so DNS resolution targets the actual hostname.
- Pings (Test-Connection).
- Probes a configurable TCP port (-ProbePort, default 3389 for RDP; use 22 for SSH or 443 for the web tier).
- Writes a JSON report to C:\ProgramData\WebhookServer\zerto-healthchecks\<vpg>-<op>-<utcstamp>.json.
- Exits non-zero if any VM failed either probe, which surfaces in the webhook server's run history (and outbound callback, if configured).

Bump the endpoint's Timeout (sec) to 120 when wiring this in, since network probes can take a while.
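The per-VM steps can be sketched as below. This is an illustration under assumptions, not the shipped script: the function names are hypothetical, the suffix is assumed to be one or more trailing `(N)` groups, and the port probe uses a raw `TcpClient` with a 3-second timeout rather than whatever the real script uses.

```powershell
# Sketch only: hypothetical helper names; timeouts and report shape are assumptions.
function Get-BaseVmName {
    param([Parameter(Mandatory)][string]$DisplayName)
    # Drop one or more trailing "(N)" groups: "ubuntu-2404(1)(1)(1)" -> "ubuntu-2404".
    return ($DisplayName.Trim() -replace '(\(\d+\))+$', '')
}

function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$HostName,
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 3000
    )
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        # Wait returns $false on timeout and throws if the connect attempt failed.
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false
    } finally {
        $client.Close()
    }
}

function Test-ZertoVm {
    param(
        [Parameter(Mandatory)][string]$DisplayName,
        [int]$ProbePort = 3389
    )
    $name = Get-BaseVmName -DisplayName $DisplayName
    [pscustomobject]@{
        vm   = $name
        ping = [bool](Test-Connection -ComputerName $name -Count 1 -Quiet -ErrorAction SilentlyContinue)
        port = Test-TcpPort -HostName $name -Port $ProbePort
    }
}
```

A caller would run `Test-ZertoVm` for each name in `vmDisplayNames`, write the results with `ConvertTo-Json`, and `exit 1` if any `ping` or `port` value is false.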

3. Configure the endpoints in the GUI

Two endpoints, identical except for the slug, the description, the script, and (for the health check) the timeout.

`zerto-pre`

| Section | Setting | Value |
| --- | --- | --- |
| Identity | Slug | `zerto-pre` |
| Identity | Description | "Zerto pre-recovery: chat notification" |
| Auth | Mode | Bearer |
| Auth | Bearer secret | generate a 32-byte random string; reuse for `zerto-post` |
| Allowed clients | (one per line) | the IP of the K8s node running `scripts-service` (e.g. `192.168.50.30`) |
| Executor | Type | Windows PowerShell (or PowerShell 7) |
| Executor | Script path | `C:\scripts\zerto-receiver-notify.ps1` |
| Data passing | JSON body to stdin | ✓ |
| Run as | Identity | Service |
| Response | Mode | Async |
| Response | Timeout (sec) | `30` |
| Response | Fail on non-zero exit | unticked (async hooks have no caller to receive a 502) |
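One way to produce the 32-byte bearer secret is sketched below. `New-BearerSecret` is a hypothetical name, and the base64url encoding is a choice made here so the value pastes cleanly into headers and script parameters; any sufficiently random string works.

```powershell
# Sketch only: New-BearerSecret is a hypothetical helper name.
function New-BearerSecret {
    param([int]$ByteCount = 32)

    $bytes = [byte[]]::new($ByteCount)
    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    try {
        # Fill the buffer from the OS cryptographic random source.
        $rng.GetBytes($bytes)
    } finally {
        $rng.Dispose()
    }
    # base64url without padding: no '+', '/', or '=' to escape when pasting.
    return [Convert]::ToBase64String($bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_')
}
```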

`zerto-post`

Same as above, except:

| Setting | Value |
| --- | --- |
| Slug | `zerto-post` |
| Description | "Zerto post-recovery: notify + VM health check" |
| Script path | a wrapper that calls both receiver scripts in turn (see below) |
| Timeout (sec) | `120` |

Two receivers on one endpoint is easiest with a tiny wrapper that fans stdin out to both scripts:

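# C:\scripts\zerto-post-fanout.ps1
# Read the JSON body once, then hand the same text to each receiver.
$body = [Console]::In.ReadToEnd()
$body | & 'C:\scripts\zerto-receiver-notify.ps1'
$global:LASTEXITCODE = 0   # ignore the notifier's exit code
$body | & 'C:\scripts\zerto-receiver-vm-healthcheck.ps1'
# Propagate the health check's exit code so failures show in run history.
exit $LASTEXITCODE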

Or run the two as separate endpoints (zerto-post-notify and zerto-post-healthcheck) and have the Zerto-side script POST to both — either pattern is fine. The fanout wrapper keeps the Zerto config simpler.

4. Wire up the bearer token

On the ZVMA / scripts-service side, the right long-term home for the token is a Kubernetes Secret mounted into the pod, but the simplest approach for testing is to pass it as a parameter to the Zerto-side script:

VPG settings → Pre-Recovery Script → Parameters: -Phase pre -Bearer <paste-token>

VPG settings → Post-Recovery Script → Parameters: -Phase post -Bearer <paste-token>

For production, mount a Secret at a known path in the pod and have the sender script read from it (Get-Content /run/secrets/webhook-token).
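Reading the token from the mounted Secret could look like this sketch. `Get-WebhookToken` is a hypothetical name; the precedence (explicit `-Bearer` first, then the file at the path mentioned above) is an assumption about how you might want the fallback to behave.

```powershell
# Sketch only: Get-WebhookToken is a hypothetical helper name.
function Get-WebhookToken {
    param(
        [string]$Bearer,
        [string]$SecretPath = '/run/secrets/webhook-token'
    )
    # An explicit -Bearer parameter wins, which keeps test setups working.
    if ($Bearer) { return $Bearer }

    if (Test-Path -LiteralPath $SecretPath -PathType Leaf) {
        $raw = Get-Content -LiteralPath $SecretPath -Raw
        # Mounted secret files often end with a newline; trim it off.
        if ($raw) { return $raw.Trim() }
    }
    return $null
}
```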

5. Test before going live

Run a Test failover on a non-critical VPG. Watch:

- Slack/Teams: a :test_tube: Zerto Test - phase: pre message arrives, followed anywhere from about 30 seconds to several minutes later by a :test_tube: Zerto Test - phase: post message.
- Webhook Server GUI → run history: one run each for zerto-pre and zerto-post, both green.
- C:\ProgramData\WebhookServer\zerto-healthchecks\: a fresh JSON report named <vpg>-Test-<utcstamp>.json containing per-VM ping and port probe results.
- ZVMA: the VPG operation completes successfully; nothing in the pre/post logs blocked on the webhook.

Variations

Branch on Test vs. real failover in the receivers

The notifier already styles the message differently. To do something only on a real failover (e.g. update DNS), guard with:

if ($p.zerto.operation -ne 'Test') {
    # do the destructive thing
}

A ZertoOperation of Test means "exercise — don't touch production dependencies." Always check it before doing anything that mutates real state.

Capture `ZertoForce` from pre for use in post

ZertoForce is Yes only during the pre phase when force mode is on and is reset to No by the post phase. If your post-side logic needs to know the operation was force-flagged, save it during pre (e.g. write a small marker to the shared ZertoOutputDir) and read it back during post.
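A marker-file approach might be sketched as follows. The function names and the `<vpg>.force` file name are assumptions, as is the idea that the VPG name is safe to use in a file name; adjust if your VPG names contain characters the file system rejects.

```powershell
# Sketch only: hypothetical helper names and marker file naming.
function Save-ForceMarker {
    param(
        [string]$Directory = $env:ZertoOutputDir,
        [string]$VpgName   = $env:ZertoVPGName,
        [string]$Force     = $env:ZertoForce
    )
    # Called from the pre phase, while ZertoForce still holds its real value.
    $path = Join-Path $Directory "$VpgName.force"
    Set-Content -LiteralPath $path -Value $Force -NoNewline
    return $path
}

function Read-ForceMarker {
    param(
        [string]$Directory = $env:ZertoOutputDir,
        [string]$VpgName   = $env:ZertoVPGName
    )
    # Called from the post phase; returns 'Unknown' if pre never wrote a marker.
    $path = Join-Path $Directory "$VpgName.force"
    if (-not (Test-Path -LiteralPath $path -PathType Leaf)) { return 'Unknown' }

    $value = "$(Get-Content -LiteralPath $path -Raw)".Trim()
    # Remove the marker so a later run cannot pick up a stale value.
    Remove-Item -LiteralPath $path
    return $value
}
```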

Per-VPG endpoints

For fine-grained access control or different actions per VPG, create one endpoint per VPG (zerto-pre-app01, zerto-post-app01, …) with its own bearer token. Override -WebhookUrl and -Bearer on the Zerto side per VPG.

Audit trail

Every endpoint can have an outbound Callback URL. Configure it with your SIEM's HTTP collector and an HMAC secret, and every run produces a JSON record with runId, exit code, duration, stdout, and stderr, which is convenient for compliance.

Security note

The ZVMA scripts-service pod runs your scripts inside a Linux container with broad reach into the management cluster: anything your script does runs with whatever ServiceAccount that pod uses. Treat the script content as privileged and make sure pre/post script edit rights are restricted to trusted operators. If you're unfamiliar with the pod's RBAC posture, run Get-ChildItem Env: from inside the container and look at /var/run/secrets/kubernetes.io/serviceaccount/. The token there is what your scripts (and a malicious script) can use to talk to the K8s API.
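A quick inspection along those lines could be sketched like this. `Get-ServiceAccountExposure` is a hypothetical name; it only reports whether a token and namespace file are present and which Zerto* variables are set, and it never prints the token itself.

```powershell
# Sketch only: Get-ServiceAccountExposure is a hypothetical helper name.
function Get-ServiceAccountExposure {
    param([string]$SaDir = '/var/run/secrets/kubernetes.io/serviceaccount')

    $nsPath    = Join-Path $SaDir 'namespace'
    $namespace = $null
    if (Test-Path -LiteralPath $nsPath -PathType Leaf) {
        $raw = Get-Content -LiteralPath $nsPath -Raw
        if ($raw) { $namespace = $raw.Trim() }
    }

    [pscustomobject]@{
        # True means scripts in this pod can authenticate to the K8s API.
        TokenMounted = (Test-Path -LiteralPath (Join-Path $SaDir 'token') -PathType Leaf)
        Namespace    = $namespace
        ZertoVars    = @(Get-ChildItem Env: | Where-Object { $_.Name -like 'Zerto*' } | ForEach-Object { $_.Name })
    }
}
```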