# Recipe: Zerto ZVMA pre/post scripts → notify + VM health check

> This is the **canonical** Zerto recipe. It targets the **ZVMA on
> Kubernetes** — the supported deployment — where pre/post scripts run
> inside the in-cluster `scripts-service` container (Linux + pwsh 7). The
> webhook-server side is a normal Windows service that does the
> Windows-domain work the ZVMA container can't reach directly.

## What we're building

ZVMA's `scripts-service` pod runs your VPG pre/post scripts inside a Linux
container. It exposes a small set of `Zerto*` environment variables, and we
want to:

1. POST those variables to a Webhook Server endpoint at the start (pre) and
   end (post) of every VPG operation, and
2. On the receiving Windows host, do something useful with them — at minimum
   a chat notification, and on `post` a quick health check of the VMs that
   just powered on.

The endpoints are **Async**, so the Zerto VPG sequence is never blocked by
slow downstream actions (notifications, port probes, etc.).

```
Zerto VPG operation starts
  |
  +-- ZVMA scripts-service container runs:
  |     /app/scripts-files/zerto-zvma-send.ps1 -Phase pre
  |       -> POST http://webhook.dr/hook/zerto-pre   (async, returns 202)
  |
  +-- VMs come up at recovery site
  |
  +-- ZVMA scripts-service container runs:
        /app/scripts-files/zerto-zvma-send.ps1 -Phase post
          -> POST http://webhook.dr/hook/zerto-post  (async, returns 202)

(meanwhile, on the webhook server)
  /hook/zerto-pre   -> Slack/Teams notification ("Test failover starting...")
  /hook/zerto-post  -> Slack/Teams notification + ping/port probe each VM,
                       write a JSON report to disk, exit non-zero on failure.
```

## What ZVMA exposes

Captured from a real Test failover; the same set is present in pre and post:

| Variable | Example | Notes |
|---|---|---|
| `ZertoVPGName` | `ubuntu-2404-local` | The VPG that fired the script |
| `ZertoInternalVpgName` | `ubuntu-2404-local` | Usually identical to `ZertoVPGName` |
| `ZertoOperation` | `Test` | `Test` / `Failover` / `Move` / `FailoverBeforeCommit` / `FailoverDuringCommit` |
| `ZertoForce` | `Yes` (pre) / `No` (post) | Set to `Yes` only during the pre phase when force mode is on; reset to `No` by post |
| `VmDisplayNames` | `ubuntu-2404(1)(1)(1)` | Comma-separated for multi-VM VPGs; Test failovers add `(N)` suffixes |
| `ZertoHypervisorManagerIP` | `192.168.50.20` | The vCenter / Hyper-V manager ZVMA is talking to |
| `ZertoHypervisorManagerPort` | `443` | |
| `ZertoOutputDir` | `/app/scripts-output` | Container-side output dir (written back to ZVMA via PVC) |
| `ZertoWorkingDir` | `/app/scripts-files` | Where script files live in-container |

Branch on `ZertoOperation` to differentiate Test runs from real failovers.
**`ZertoForce` is only meaningful during the pre phase** — capture it there
if you need it later, because by post it's been reset.
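
As a rough illustration of that branching (a sketch, not the shipped sender script), the snippet below collects the variables from the table into one object and splits `VmDisplayNames` on commas. The helper name `Get-ZertoContext` and the `IsTest` property are assumptions made for this example.

```powershell
# Hypothetical helper: gather the Zerto* environment variables into one object.
function Get-ZertoContext {
    $vmNames = @()
    if ($env:VmDisplayNames) {
        # Comma-separated for multi-VM VPGs; trim stray whitespace and drop empties.
        $vmNames = @($env:VmDisplayNames -split ',' |
            ForEach-Object { $_.Trim() } |
            Where-Object { $_ })
    }
    [pscustomobject]@{
        VpgName   = $env:ZertoVPGName
        Operation = $env:ZertoOperation
        Force     = $env:ZertoForce
        VmNames   = $vmNames
        IsTest    = ($env:ZertoOperation -eq 'Test')
    }
}

$ctx = Get-ZertoContext
if ($ctx.IsTest) {
    Write-Output "Test run for VPG '$($ctx.VpgName)'; skipping production-only steps."
} else {
    Write-Output "Real '$($ctx.Operation)' for VPG '$($ctx.VpgName)' ($($ctx.VmNames.Count) VM(s))."
}
```

Outside the `scripts-service` container the variables are unset, so the sketch simply reports an empty operation rather than failing.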

## 1. The Zerto-side script (sender)

A ready-to-use script ships in this repo at
[`scripts/examples/zerto-zvma-send.ps1`](../../scripts/examples/zerto-zvma-send.ps1).
Place it where the `scripts-service` pod can read it — typically the
`scripts-service-scripts-files-pvc`, mounted at `/app/scripts-files/` — and
wire it into the VPG twice:

> **VPG settings → Recovery → Scripts → Pre-Recovery Script**
> Path: `/app/scripts-files/zerto-zvma-send.ps1`
> Parameters: `-Phase pre`
>
> **VPG settings → Recovery → Scripts → Post-Recovery Script**
> Path: `/app/scripts-files/zerto-zvma-send.ps1`
> Parameters: `-Phase post`

The default `$WebhookUrl` includes `{phase}` so one script + one URL config
serves both phases — `http://webhook.dr/hook/zerto-{phase}` becomes
`/hook/zerto-pre` and `/hook/zerto-post` automatically. Override with
`-WebhookUrl` and `-Bearer` if you'd rather pass them per-VPG.
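
The `{phase}` substitution can be pictured with a few lines of PowerShell. This is only a sketch of the documented behaviour; the function name `Resolve-WebhookUrl` is hypothetical and the shipped script may do it differently.

```powershell
# Sketch only: mirrors the documented {phase} placeholder behaviour.
function Resolve-WebhookUrl {
    param(
        [Parameter(Mandatory)] [string] $Template,
        [Parameter(Mandatory)] [ValidateSet('pre', 'post')] [string] $Phase
    )
    # String.Replace() is a literal replace, so the braces need no escaping.
    return $Template.Replace('{phase}', $Phase)
}

Resolve-WebhookUrl -Template 'http://webhook.dr/hook/zerto-{phase}' -Phase 'pre'
Resolve-WebhookUrl -Template 'http://webhook.dr/hook/zerto-{phase}' -Phase 'post'
```

A URL without the placeholder passes through unchanged, which is what makes a per-VPG `-WebhookUrl` override work with the same code path.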

The script POSTs a single JSON object:

```json
{
  "phase": "pre",
  "capturedAt": "2026-05-08T17:45:54Z",
  "host": "scripts-service-f9b6cb7-4xbxq",
  "zerto": {
    "vpgName": "ubuntu-2404-local",
    "internalVpgName": "ubuntu-2404-local",
    "operation": "Test",
    "force": "Yes",
    "vmDisplayNames": "ubuntu-2404(1)(1)(1)",
    "hypervisorManagerIP": "192.168.50.20",
    "hypervisorManagerPort": "443",
    "outputDir": "/app/scripts-output",
    "workingDir": "/app/scripts-files"
  }
}
```

A webhook outage **does not fail the VPG** — the script catches the error and
exits 0. A comment in the file shows how to flip that to strict mode if you'd
rather a webhook outage abort the failover.
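
The lenient-versus-strict choice boils down to what the `catch` block reports. The following is a minimal sketch of that pattern, not the shipped script: `Send-ZertoWebhook`, its `-Strict` switch, and the 10-second timeout are all assumptions for illustration.

```powershell
# Sketch only: -Strict is a hypothetical switch, not a parameter of the shipped script.
function Send-ZertoWebhook {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [hashtable] $Payload,
        [string] $Bearer,
        [switch] $Strict
    )
    $headers = @{}
    if ($Bearer) { $headers['Authorization'] = "Bearer $Bearer" }
    try {
        $json = $Payload | ConvertTo-Json -Depth 5
        Invoke-RestMethod -Uri $Url -Method Post -Body $json `
            -ContentType 'application/json' -Headers $headers -TimeoutSec 10 | Out-Null
        return 0
    } catch {
        Write-Warning "Webhook POST to $Url failed: $($_.Exception.Message)"
        # Lenient (default): report success so the VPG operation carries on.
        # Strict: report failure so the caller can exit non-zero.
        if ($Strict) { return 1 } else { return 0 }
    }
}

# Usage at the end of a sender script (commented out so the sketch has no side effects):
# exit (Send-ZertoWebhook -Url $url -Payload $payload -Bearer $Bearer)
```

Returning the code instead of calling `exit` inside the function keeps the decision in one visible place at the bottom of the script.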

## 2. The webhook-server-side scripts (receivers)

Two examples ship in the repo. Both read the JSON body from stdin (the
webhook server delivers the body to the script's stdin when **JSON body to
stdin** is ticked on the endpoint).

### a. Slack/Teams notification — both phases

[`scripts/examples/zerto-receiver-notify.ps1`](../../scripts/examples/zerto-receiver-notify.ps1)
posts a single-line summary to a Slack or Teams Incoming Webhook URL. It
picks an icon based on `ZertoOperation`:

- `Test` → 🧪 — benign, expected
- `Failover` → 🚨 — real production event
- `Move` → 🚚 — planned migration

…and highlights `ZertoForce=Yes` on the **pre** message so you can see at
a glance whether the operation was force-flagged.

Set the destination via `NOTIFY_URL` env var on the webhook host, or
hardcode at the top of the script.
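
The icon selection and force highlight described above can be sketched as follows. This is not the shipped notifier: the function name `Format-ZertoMessage`, the exact message layout, the `(FORCE)` marker, and the fallback icon for other operations are assumptions.

```powershell
# Sketch only: message layout and the fallback icon are assumptions.
function Format-ZertoMessage {
    param([Parameter(Mandatory)] $Payload)

    $icon = switch ($Payload.zerto.operation) {
        'Test'     { ':test_tube:' }
        'Failover' { ':rotating_light:' }
        'Move'     { ':truck:' }
        default    { ':information_source:' }
    }
    $text = "$icon Zerto $($Payload.zerto.operation) - phase: $($Payload.phase) - VPG: $($Payload.zerto.vpgName)"
    # ZertoForce is only meaningful in the pre phase.
    if ($Payload.phase -eq 'pre' -and $Payload.zerto.force -eq 'Yes') {
        $text += ' (FORCE)'
    }
    return $text
}

$sample = '{"phase":"pre","zerto":{"vpgName":"ubuntu-2404-local","operation":"Test","force":"Yes"}}' |
    ConvertFrom-Json
Format-ZertoMessage -Payload $sample

# Posting to a Slack Incoming Webhook would then look roughly like this:
# Invoke-RestMethod -Uri $env:NOTIFY_URL -Method Post -ContentType 'application/json' `
#     -Body (@{ text = (Format-ZertoMessage -Payload $sample) } | ConvertTo-Json)
```

Keeping the formatting in a pure function makes it easy to check the wording without sending anything to chat.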

### b. Post-recovery VM health check — post phase only

[`scripts/examples/zerto-receiver-vm-healthcheck.ps1`](../../scripts/examples/zerto-receiver-vm-healthcheck.ps1)
runs only on `phase=post` for operations that bring VMs up
(`Test`/`Failover`/`Move`/`FailoverBeforeCommit`/`FailoverDuringCommit`).
For each name in `VmDisplayNames` it:

1. Strips the trailing `(1)(1)(1)` suffix Zerto adds on Test failovers, so
   DNS resolution targets the actual hostname.
2. Pings (`Test-Connection`).
3. Probes a configurable TCP port (`-ProbePort`, default `3389` for RDP;
   use `22` for SSH or `443` for the web tier).
4. Writes a JSON report to
   `C:\ProgramData\WebhookServer\zerto-healthchecks\<vpg>-<op>-<utcstamp>.json`.
5. Exits non-zero if any VM failed either probe — which surfaces in the
   webhook server's run history (and outbound callback, if configured).

Bump the endpoint's **Timeout (sec)** to `120` when wiring this in, since
network probes can take a while.
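
Steps 1 and 3 above are the parts most worth seeing in code. The sketch below shows one way to do them; it is not taken from the shipped script, and the helper names `Remove-ZertoTestSuffix` and `Test-TcpPort` plus the 3-second connect timeout are assumptions.

```powershell
# Sketch only: helper names and the timeout default are assumptions.
function Remove-ZertoTestSuffix {
    param([Parameter(Mandatory)] [string] $Name)
    # Drop one or more trailing "(N)" groups, e.g. "ubuntu-2404(1)(1)(1)" -> "ubuntu-2404".
    return ($Name -replace '(\(\d+\))+$', '').Trim()
}

function Test-TcpPort {
    param(
        [Parameter(Mandatory)] [string] $HostName,
        [int] $Port = 3389,
        [int] $TimeoutMs = 3000
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        # ConnectAsync + Wait bounds how long an unreachable host can stall the check.
        $task = $client.ConnectAsync($HostName, $Port)
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        # Refused connections and DNS failures surface as exceptions from Wait().
        return $false
    } finally {
        $client.Close()
    }
}

$names = 'ubuntu-2404(1)(1)(1),app01' -split ',' | ForEach-Object { Remove-ZertoTestSuffix $_ }
$names
```

A bounded connect timeout matters here because several unreachable VMs probed in sequence could otherwise eat most of the endpoint's 120-second budget.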

## 3. Configure the endpoints in the GUI

Two endpoints. Identical except for the slug, the script, and (for the
healthcheck) the timeout.

### `zerto-pre`

| Section | Setting | Value |
|---|---|---|
| Identity | Slug | `zerto-pre` |
| Identity | Description | "Zerto pre-recovery: chat notification" |
| Auth | Mode | **Bearer** |
| Auth | Bearer secret | generate a 32-byte random string; reuse for `zerto-post` |
| Allowed clients | (one per line) | the IP of the K8s node running `scripts-service` (e.g. `192.168.50.30`) |
| Executor | Type | **Windows PowerShell** (or PowerShell 7) |
| Executor | Script path | `C:\scripts\zerto-receiver-notify.ps1` |
| Data passing | JSON body to stdin | ✓ |
| Run as | Identity | **Service** |
| Response | Mode | **Async** |
| Response | Timeout (sec) | `30` |
| Response | Fail on non-zero exit | unticked *(async hooks have no caller to receive a 502)* |
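
For the bearer secret in the table above, one way to produce 32 random bytes is the operating system's cryptographic RNG. This is a sketch; Base64 is just one convenient encoding, and the GUI may accept any string.

```powershell
# Sketch only: 32 random bytes from the OS CSPRNG, Base64-encoded for pasting.
$bytes = New-Object byte[] 32
$rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
try {
    $rng.GetBytes($bytes)
} finally {
    $rng.Dispose()
}
$token = [Convert]::ToBase64String($bytes)
$token
```

The resulting 44-character string goes into the endpoint's **Bearer secret** field and into the sender's `-Bearer` value.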

### `zerto-post`

Same as above, except:

| Setting | Value |
|---|---|
| Slug | `zerto-post` |
| Description | "Zerto post-recovery: notify + VM health check" |
| Script path | a **wrapper** that calls both receiver scripts in turn (see below) |
| Timeout (sec) | `120` |

Two receivers on one endpoint is easiest with a tiny wrapper that fans
stdin out to both scripts:

```powershell
# C:\scripts\zerto-post-fanout.ps1
# Read the JSON body once, then hand the same text to each receiver.
$body = [Console]::In.ReadToEnd()
$body | & 'C:\scripts\zerto-receiver-notify.ps1'
$body | & 'C:\scripts\zerto-receiver-vm-healthcheck.ps1'
# Pass the health check's exit code on so a failed probe shows in run history.
exit $LASTEXITCODE
```

Or run the two as separate endpoints (`zerto-post-notify` and
`zerto-post-healthcheck`) and have the Zerto-side script POST to both —
either pattern is fine. The fanout wrapper keeps the Zerto config simpler.

## 4. Wire up the bearer token

On the ZVMA / scripts-service side, the right long-term home for the token
is a Kubernetes Secret mounted into the pod, but the simplest approach for
testing is to pass it as a parameter to the Zerto-side script:

> VPG settings → Pre-Recovery Script → Parameters:
> `-Phase pre -Bearer <paste-token>`
>
> VPG settings → Post-Recovery Script → Parameters:
> `-Phase post -Bearer <paste-token>`

For production, mount a Secret at a known path in the pod and have the
sender script read from it (`Get-Content /run/secrets/webhook-token`).
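
Reading the mounted Secret can be sketched like this. The helper name `Get-WebhookToken` is hypothetical, and the "explicit parameter wins over the file" ordering is a design assumption rather than something the shipped script is known to do.

```powershell
# Sketch only: Get-WebhookToken is a hypothetical helper.
function Get-WebhookToken {
    param(
        [string] $Bearer,
        [string] $SecretPath = '/run/secrets/webhook-token'
    )
    # An explicit -Bearer wins, which keeps per-VPG overrides working.
    if ($Bearer) { return $Bearer }
    if (Test-Path -LiteralPath $SecretPath) {
        $raw = Get-Content -LiteralPath $SecretPath -Raw
        # Mounted Secret files often end with a newline; trim it off.
        if ($raw) { return $raw.Trim() }
    }
    return $null
}
```

A caller would then build the `Authorization` header only when the function returns a value, so a missing Secret shows up as a 401 from the webhook server instead of a script error.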

## 5. Test before going live

Run a Test failover on a non-critical VPG. Watch:

- **Slack/Teams**: a `:test_tube: Zerto Test - phase: pre` message arrives,
  followed anywhere from ~30 seconds to several minutes later by a
  `:test_tube: Zerto Test - phase: post` message.
- **Webhook Server GUI** → run history: one run each for `zerto-pre` and
  `zerto-post`, both green.
- **`C:\ProgramData\WebhookServer\zerto-healthchecks\`**: a fresh JSON
  report named `<vpg>-Test-<utcstamp>.json` containing per-VM ping and port
  probe results.
- **ZVMA**: the VPG operation completes successfully; nothing in the
  pre/post logs blocked on the webhook.

## Variations

### Branch on Test vs. real failover in the receivers

The notifier already styles the message differently. To do something only
on a real failover (e.g. update DNS), guard with:

```powershell
# $p is the parsed JSON body (for example: $p = $body | ConvertFrom-Json)
if ($p.zerto.operation -ne 'Test') {
    # do the destructive thing
}
```

A `ZertoOperation` of `Test` means "exercise — don't touch production
dependencies." Always check it before doing anything that mutates real
state.

### Capture `ZertoForce` from pre for use in post

`ZertoForce` is `Yes` only during the **pre** phase when force mode is on
and is reset to `No` by the **post** phase. If your post-side logic needs
to know the operation was force-flagged, save it during pre (e.g. write a
small marker to the shared `ZertoOutputDir`) and read it back during post.
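
A marker file along those lines might look like the sketch below. The file name pattern and the helper names (`Get-ForceMarkerPath`, `Save-ZertoForce`, `Read-ZertoForce`) are assumptions, and it presumes the output directory persists between the pre and post runs of the same operation.

```powershell
# Sketch only: the marker file name and helper names are assumptions.
function Get-ForceMarkerPath {
    param([string] $VpgName = $env:ZertoVPGName, [string] $Dir = $env:ZertoOutputDir)
    # Keep the file name safe for VPG names containing unusual characters.
    $safe = $VpgName -replace '[^A-Za-z0-9._-]', '_'
    return (Join-Path $Dir "zerto-force-$safe.txt")
}

function Save-ZertoForce {        # call in the pre phase
    param([string] $Path = (Get-ForceMarkerPath), [string] $Force = $env:ZertoForce)
    Set-Content -LiteralPath $Path -Value $Force
}

function Read-ZertoForce {        # call in the post phase
    param([string] $Path = (Get-ForceMarkerPath))
    if (-not (Test-Path -LiteralPath $Path)) { return 'No' }
    $value = Get-Content -LiteralPath $Path -Raw
    Remove-Item -LiteralPath $Path      # one-shot marker: clean up after reading
    if ($value) { return $value.Trim() } else { return 'No' }
}
```

Deleting the marker after reading keeps a stale `Yes` from leaking into the next operation on the same VPG.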

### Per-VPG endpoints

For fine-grained access control or different actions per VPG, create one
endpoint per VPG (`zerto-pre-app01`, `zerto-post-app01`, …) with its own
bearer token. Override `-WebhookUrl` and `-Bearer` on the Zerto side per
VPG.

### Audit trail

Every endpoint can have an outbound **Callback** URL. Point it at your
SIEM's HTTP collector and set an HMAC secret, and every run produces a JSON
record with runId, exit code, duration, stdout, and stderr — convenient
for compliance.

## Security note

The ZVMA `scripts-service` pod runs your scripts inside a Linux container
with broad reach into the management cluster — anything your script does
runs with whatever ServiceAccount that pod uses. Treat the script content
as privileged and make sure pre/post script edit rights are restricted to
trusted operators. If you're unfamiliar with the pod's RBAC posture, run
`Get-ChildItem Env:` from inside the container and look at
`/var/run/secrets/kubernetes.io/serviceaccount/` — that token is what your
scripts (and a malicious script) can use to talk to the K8s API.