Add ZVMA pre/post script recipe + env-dump examples

Adds a Kubernetes-ZVMA companion to the existing Windows-ZVM recipe:

- scripts/examples/zerto-zvma-send.ps1 - Zerto-side sender for both
  pre and post phases, packages the Zerto* env vars into a structured
  JSON body and POSTs to a {phase}-templated webhook URL.
- scripts/examples/zerto-receiver-notify.ps1 - server-side receiver
  that posts a Slack/Teams notification, with phase-aware formatting
  and ZertoForce highlighted on pre.
- scripts/examples/zerto-receiver-vm-healthcheck.ps1 - server-side
  receiver that pings + port-probes each VM in VmDisplayNames after
  failover and writes a per-run JSON report.
- scripts/examples/send-env-vars.ps1 + save-env-vars.ps1 - generic
  env-dump client/receiver pair (the diagnostic that surfaced what
  the ZVMA scripts-service container exposes).
- docs/recipes/zerto-zvma-pre-post.md - full walkthrough mirroring
  the existing Windows-ZVM recipe's structure.
- README.md and docs/README.md - link the new recipe and examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-08 14:16:07 -04:00
parent 4954e94d08
commit 821ff9b9ef
8 changed files with 701 additions and 4 deletions
+277
View File
@@ -0,0 +1,277 @@
# Recipe: Zerto ZVMA (Kubernetes) pre/post scripts → notify + VM health check
> Companion to [Zerto failover post-script → DNS + service checks](zerto-pre-post-scripts.md).
> That recipe targets the **Windows ZVM** (the older deployment, where the
> Zerto-side script is a `.ps1` calling `curl.exe`). **This** recipe targets
> the **ZVMA on Kubernetes** — the newer deployment, where pre/post scripts
> run inside the in-cluster `scripts-service` container (Linux + pwsh 7).
> The webhook-server side is the same Windows service in both cases; only
> the Zerto-side runtime differs.
## What we're building
ZVMA's `scripts-service` pod runs your VPG pre/post scripts inside a Linux
container. It exposes a small set of `Zerto*` environment variables, and we
want to:
1. POST those variables to a Webhook Server endpoint at the start (pre) and
end (post) of every VPG operation, and
2. On the receiving Windows host, do something useful with them — at minimum
a chat notification, and on `post` a quick health check of the VMs that
just powered on.
The endpoints are **Async**, so the Zerto VPG sequence is never blocked by
slow downstream actions (notifications, port probes, etc.).
```
Zerto VPG operation starts
|
+-- ZVMA scripts-service container runs:
| /app/scripts-files/zerto-zvma-send.ps1 -Phase pre
| -> POST http://webhook.dr/hook/zerto-pre (async, returns 202)
|
+-- VMs come up at recovery site
|
+-- ZVMA scripts-service container runs:
/app/scripts-files/zerto-zvma-send.ps1 -Phase post
-> POST http://webhook.dr/hook/zerto-post (async, returns 202)
(meanwhile, on the webhook server)
/hook/zerto-pre -> Slack/Teams notification ("Test failover starting...")
/hook/zerto-post -> Slack/Teams notification + ping/port probe each VM,
write a JSON report to disk, exit non-zero on failure.
```
## What ZVMA exposes
Captured from a real Test failover; same set is present in pre and post:
| Variable | Example | Notes |
|---|---|---|
| `ZertoVPGName` | `ubuntu-2404-local` | The VPG that fired the script |
| `ZertoInternalVpgName` | `ubuntu-2404-local` | Usually identical to `ZertoVPGName` |
| `ZertoOperation` | `Test` | `Test` / `Failover` / `Move` / `FailoverBeforeCommit` / `FailoverDuringCommit` |
| `ZertoForce` | `Yes` (pre) / `No` (post) | Set to `Yes` only during the pre phase when force mode is on; reset to `No` by post |
| `VmDisplayNames` | `ubuntu-2404(1)(1)(1)` | Comma-separated for multi-VM VPGs; Test failovers add `(N)` suffixes |
| `ZertoHypervisorManagerIP` | `192.168.50.20` | The vCenter / Hyper-V manager ZVMA is talking to |
| `ZertoHypervisorManagerPort` | `443` | |
| `ZertoOutputDir` | `/app/scripts-output` | Container-side output dir (written back to ZVMA via PVC) |
| `ZertoWorkingDir` | `/app/scripts-files` | Where script files live in-container |
Branch on `ZertoOperation` to differentiate Test runs from real failovers.
**`ZertoForce` is only meaningful during the pre phase** — capture it there
if you need it later, because by post it's been reset.
## 1. The Zerto-side script (sender)
A ready-to-use script ships in this repo at
[`scripts/examples/zerto-zvma-send.ps1`](../../scripts/examples/zerto-zvma-send.ps1).
Place it where the `scripts-service` pod can read it — typically the
`scripts-service-scripts-files-pvc`, mounted at `/app/scripts-files/` — and
wire it into the VPG twice:
> **VPG settings → Recovery → Scripts → Pre-Recovery Script**
> Path: `/app/scripts-files/zerto-zvma-send.ps1`
> Parameters: `-Phase pre`
>
> **VPG settings → Recovery → Scripts → Post-Recovery Script**
> Path: `/app/scripts-files/zerto-zvma-send.ps1`
> Parameters: `-Phase post`
The default `$WebhookUrl` includes `{phase}` so one script + one URL config
serves both phases — `http://webhook.dr/hook/zerto-{phase}` becomes
`/hook/zerto-pre` and `/hook/zerto-post` automatically. Override with
`-WebhookUrl` and `-Bearer` if you'd rather pass them per-VPG.
The script POSTs a single JSON object:
```json
{
"phase": "pre",
"capturedAt": "2026-05-08T17:45:54Z",
"host": "scripts-service-f9b6cb7-4xbxq",
"zerto": {
"vpgName": "ubuntu-2404-local",
"internalVpgName": "ubuntu-2404-local",
"operation": "Test",
"force": "Yes",
"vmDisplayNames": "ubuntu-2404(1)(1)(1)",
"hypervisorManagerIP": "192.168.50.20",
"hypervisorManagerPort": "443",
"outputDir": "/app/scripts-output",
"workingDir": "/app/scripts-files"
}
}
```
A webhook outage **does not fail the VPG** — the script catches and exits 0.
Comment in the file shows how to flip that to strict mode if you'd rather a
webhook outage abort the failover.
## 2. The webhook-server-side scripts (receivers)
Two examples ship in the repo. Both read the JSON body from stdin (the
webhook server delivers the body to the script's stdin when **JSON body to
stdin** is ticked on the endpoint).
### a. Slack/Teams notification — both phases
[`scripts/examples/zerto-receiver-notify.ps1`](../../scripts/examples/zerto-receiver-notify.ps1)
posts a single-line summary to a Slack or Teams Incoming Webhook URL. It
picks an icon based on `ZertoOperation`:
- `Test` → 🧪 — benign, expected
- `Failover` → 🚨 — real production event
- `Move` → 🚚 — planned migration
…and highlights `ZertoForce=Yes` on the **pre** message so you can see at
a glance whether the operation was force-flagged.
Set the destination via `NOTIFY_URL` env var on the webhook host, or
hardcode at the top of the script.
### b. Post-recovery VM health check — post phase only
[`scripts/examples/zerto-receiver-vm-healthcheck.ps1`](../../scripts/examples/zerto-receiver-vm-healthcheck.ps1)
runs only on `phase=post` for operations that bring VMs up
(`Test`/`Failover`/`Move`/`FailoverBeforeCommit`/`FailoverDuringCommit`).
For each name in `VmDisplayNames` it:
1. Strips the trailing `(1)(1)(1)` suffix Zerto adds on Test failovers, so
DNS resolution targets the actual hostname.
2. Pings (`Test-Connection`).
3. Probes a configurable TCP port (`-ProbePort`, default `3389` for RDP;
use `22` for SSH or `443` for the web tier).
4. Writes a JSON report to
`C:\ProgramData\WebhookServer\zerto-healthchecks\<vpg>-<op>-<utcstamp>.json`.
5. Exits non-zero if any VM failed either probe — which surfaces in the
webhook server's run history (and outbound callback, if configured).
Bump the endpoint's **Timeout (sec)** to `120` when wiring this in, since
network probes can take a while.
## 3. Configure the endpoints in the GUI
Two endpoints. Identical except for the slug, the script, and (for the
healthcheck) the timeout.
### `zerto-pre`
| Section | Setting | Value |
|---|---|---|
| Identity | Slug | `zerto-pre` |
| Identity | Description | "Zerto pre-recovery: chat notification" |
| Auth | Mode | **Bearer** |
| Auth | Bearer secret | generate a 32-byte random string; reuse for `zerto-post` |
| Allowed clients | (one per line) | the IP of the K8s node running `scripts-service` (e.g. `192.168.50.30`) |
| Executor | Type | **Windows PowerShell** (or PowerShell 7) |
| Executor | Script path | `C:\scripts\zerto-receiver-notify.ps1` |
| Data passing | JSON body to stdin | ✓ |
| Run as | Identity | **Service** |
| Response | Mode | **Async** |
| Response | Timeout (sec) | `30` |
| Response | Fail on non-zero exit | unticked *(async hooks have no caller to receive a 502)* |
### `zerto-post`
Same as above, except:
| Setting | Value |
|---|---|
| Slug | `zerto-post` |
| Description | "Zerto post-recovery: notify + VM health check" |
| Script path | a **wrapper** that calls both receiver scripts in turn (see below) |
| Timeout (sec) | `120` |
Two receivers on one endpoint is easiest with a tiny wrapper that fans
stdin out to both scripts:
```powershell
# C:\scripts\zerto-post-fanout.ps1
$body = [Console]::In.ReadToEnd()
$body | & 'C:\scripts\zerto-receiver-notify.ps1'
$body | & 'C:\scripts\zerto-receiver-vm-healthcheck.ps1'
```
Or run the two as separate endpoints (`zerto-post-notify` and
`zerto-post-healthcheck`) and have the Zerto-side script POST to both —
either pattern is fine. The fanout wrapper keeps the Zerto config simpler.
## 4. Wire up the bearer token
On the ZVMA / scripts-service side, the easiest place to put the token is
a Kubernetes Secret mounted into the pod, but the simplest approach for
testing is to pass it as a parameter to the Zerto-side script:
> VPG settings → Pre-Recovery Script → Parameters:
> `-Phase pre -Bearer <paste-token>`
>
> VPG settings → Post-Recovery Script → Parameters:
> `-Phase post -Bearer <paste-token>`
For production, mount a Secret at a known path in the pod and have the
sender script read from it (`Get-Content /run/secrets/webhook-token`).
## 5. Test before going live
Run a Test failover on a non-critical VPG. Watch:
- **Slack/Teams**: a `:test_tube: Zerto Test - phase: pre` message arrives,
followed ~30sseveral minutes later by a `:test_tube: Zerto Test - phase:
post` message.
- **Webhook Server GUI** → run history: two runs for `zerto-pre` /
`zerto-post`, both green.
- **`C:\ProgramData\WebhookServer\zerto-healthchecks\`**: a fresh JSON
report named `<vpg>-Test-<utcstamp>.json` containing per-VM ping and port
probe results.
- **ZVMA**: the VPG operation completes successfully; nothing in the
pre/post logs blocked on the webhook.
## Variations
### Branch on Test vs. real failover in the receivers
The notifier already styles the message differently. To do something only
on a real failover (e.g. update DNS), guard with:
```powershell
if ($p.zerto.operation -ne 'Test') {
# do the destructive thing
}
```
A `ZertoOperation` of `Test` means "exercise — don't touch production
dependencies." Always check it before doing anything that mutates real
state.
### Capture `ZertoForce` from pre for use in post
`ZertoForce` is `Yes` only during the **pre** phase when force mode is on
and is reset to `No` by the **post** phase. If your post-side logic needs
to know the operation was force-flagged, save it during pre (e.g. write a
small marker to the shared `ZertoOutputDir`) and read it back during post.
### Per-VPG endpoints
For fine-grained access control or different actions per VPG, create one
endpoint per VPG (`zerto-pre-app01`, `zerto-post-app01`, …) with its own
bearer token. Override `-WebhookUrl` and `-Bearer` on the Zerto side per
VPG.
### Audit trail
Every endpoint can have an outbound **Callback** URL. Configure with your
SIEM's HTTP collector + an HMAC secret, and every run produces a JSON
record with runId, exit code, duration, stdout, and stderr — convenient
for compliance.
## Security note
The ZVMA `scripts-service` pod runs your scripts inside a Linux container
with broad reach into the management cluster — anything your script does
runs with whatever ServiceAccount that pod uses. Treat the script content
as privileged and make sure pre/post script edit rights are restricted to
trusted operators. If you're unfamiliar with the pod's RBAC posture, check
`Get-ChildItem Env:` from inside the container and look at
`/var/run/secrets/kubernetes.io/serviceaccount/` — that token is what your
scripts (and a malicious script) can use to talk to the K8s API.