# Recipe: Zerto ZVMA pre/post scripts → notify + VM health check

> This is the **canonical** Zerto recipe. It targets the **ZVMA on
> Kubernetes** (the supported deployment), where pre/post scripts run
> inside the in-cluster `scripts-service` container (Linux + pwsh 7). The
> webhook-server side is a normal Windows service that does the
> Windows-domain work the ZVMA container can't reach directly.

## What we're building

ZVMA's `scripts-service` pod runs your VPG pre/post scripts inside a Linux
container. It exposes a small set of environment variables (mostly
`Zerto*`), and we want to:

1. POST those variables to a Webhook Server endpoint at the start (pre) and
   end (post) of every VPG operation, and
2. On the receiving Windows host, do something useful with them: at minimum
   a chat notification, and on `post` a quick health check of the VMs that
   just powered on.

The endpoints are **Async**, so the Zerto VPG sequence is never blocked by
slow downstream actions (notifications, port probes, etc.).

```
Zerto VPG operation starts
        |
        +-- ZVMA scripts-service container runs:
        |     /app/scripts-files/zerto-zvma-send.ps1 -Phase pre
        |     -> POST http://webhook.dr/hook/zerto-pre   (async, returns 202)
        |
        +-- VMs come up at recovery site
        |
        +-- ZVMA scripts-service container runs:
              /app/scripts-files/zerto-zvma-send.ps1 -Phase post
              -> POST http://webhook.dr/hook/zerto-post  (async, returns 202)

(meanwhile, on the webhook server)
  /hook/zerto-pre   -> Slack/Teams notification ("Test failover starting...")
  /hook/zerto-post  -> Slack/Teams notification + ping/port probe each VM,
                       write a JSON report to disk, exit non-zero on failure.
```

## What ZVMA exposes

Captured from a real Test failover; same set is present in pre and post:

| Variable | Example | Notes |
|---|---|---|
| `ZertoVPGName` | `ubuntu-2404-local` | The VPG that fired the script |
| `ZertoInternalVpgName` | `ubuntu-2404-local` | Usually identical to `ZertoVPGName` |
| `ZertoOperation` | `Test` | `Test` / `Failover` / `Move` / `FailoverBeforeCommit` / `FailoverDuringCommit` |
| `ZertoForce` | `Yes` (pre) / `No` (post) | Set to `Yes` only during the pre phase when force mode is on; reset to `No` by post |
| `VmDisplayNames` | `ubuntu-2404(1)(1)(1)` | Comma-separated for multi-VM VPGs; Test failovers add `(N)` suffixes |
| `ZertoHypervisorManagerIP` | `192.168.50.20` | The vCenter / Hyper-V manager ZVMA is talking to |
| `ZertoHypervisorManagerPort` | `443` | |
| `ZertoOutputDir` | `/app/scripts-output` | Container-side output dir (written back to ZVMA via PVC) |
| `ZertoWorkingDir` | `/app/scripts-files` | Where script files live in-container |

Branch on `ZertoOperation` to differentiate Test runs from real failovers.
**`ZertoForce` is only meaningful during the pre phase**: capture it there
if you need it later, because by post it's been reset.

## 1. The Zerto-side script (sender)

A ready-to-use script ships in this repo at
[`scripts/examples/zerto-zvma-send.ps1`](../../scripts/examples/zerto-zvma-send.ps1).
Place it where the `scripts-service` pod can read it (typically the
`scripts-service-scripts-files-pvc`, mounted at `/app/scripts-files/`) and
wire it into the VPG twice:

> **VPG settings → Recovery → Scripts → Pre-Recovery Script**
> - Path: `/app/scripts-files/zerto-zvma-send.ps1`
> - Parameters: `-Phase pre`
>
> **VPG settings → Recovery → Scripts → Post-Recovery Script**
> - Path: `/app/scripts-files/zerto-zvma-send.ps1`
> - Parameters: `-Phase post`

The default `$WebhookUrl` contains a `{phase}` placeholder, so one script and
one URL setting serve both phases: `http://webhook.dr/hook/zerto-{phase}`
becomes `/hook/zerto-pre` or `/hook/zerto-post` automatically. Override with
`-WebhookUrl` and `-Bearer` if you'd rather pass them per VPG.

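A minimal sketch of how that placeholder can be resolved, assuming a plain string substitution (check the shipped script for its exact logic). `Resolve-ZertoWebhookUrl` is a hypothetical helper name, not something the repo provides:

```powershell
function Resolve-ZertoWebhookUrl {
    param(
        [Parameter(Mandatory)][string] $Template,   # e.g. http://webhook.dr/hook/zerto-{phase}
        [Parameter(Mandatory)][ValidateSet('pre', 'post')][string] $Phase
    )
    # Plain (non-regex) replacement of every {phase} token. ValidateSet is
    # case-insensitive, so normalise the phase to lower case for the URL.
    $Template.Replace('{phase}', $Phase.ToLowerInvariant())
}

Resolve-ZertoWebhookUrl -Template 'http://webhook.dr/hook/zerto-{phase}' -Phase post
# → http://webhook.dr/hook/zerto-post
```

Keeping the phase out of the URL setting is what lets both VPG script slots share one `-WebhookUrl` value.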
The script POSTs a single JSON object:

```json
{
  "phase": "pre",
  "capturedAt": "2026-05-08T17:45:54Z",
  "host": "scripts-service-f9b6cb7-4xbxq",
  "zerto": {
    "vpgName": "ubuntu-2404-local",
    "internalVpgName": "ubuntu-2404-local",
    "operation": "Test",
    "force": "Yes",
    "vmDisplayNames": "ubuntu-2404(1)(1)(1)",
    "hypervisorManagerIP": "192.168.50.20",
    "hypervisorManagerPort": "443",
    "outputDir": "/app/scripts-output",
    "workingDir": "/app/scripts-files"
  }
}
```

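To show how that object maps onto the environment variables in the table above, here is an illustrative sketch, not a copy of the shipped script. The `New-ZertoPayload` name and the use of `[System.Net.Dns]::GetHostName()` for `host` are assumptions:

```powershell
function New-ZertoPayload {
    param([Parameter(Mandatory)][ValidateSet('pre', 'post')][string] $Phase)

    $inv = [System.Globalization.CultureInfo]::InvariantCulture
    [ordered]@{
        phase      = $Phase.ToLowerInvariant()
        # ISO 8601 UTC timestamp; invariant culture keeps ':' as the separator
        capturedAt = [DateTime]::UtcNow.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", $inv)
        host       = [System.Net.Dns]::GetHostName()   # inside K8s this is the pod name
        zerto      = [ordered]@{
            vpgName               = $env:ZertoVPGName
            internalVpgName       = $env:ZertoInternalVpgName
            operation             = $env:ZertoOperation
            force                 = $env:ZertoForce
            vmDisplayNames        = $env:VmDisplayNames
            hypervisorManagerIP   = $env:ZertoHypervisorManagerIP
            hypervisorManagerPort = $env:ZertoHypervisorManagerPort
            outputDir             = $env:ZertoOutputDir
            workingDir            = $env:ZertoWorkingDir
        }
    } | ConvertTo-Json -Depth 3
}
```

`[ordered]` keeps the keys in the order shown in the sample body, which makes run-history diffs easier to read; any variable ZVMA doesn't set simply serialises as `null`.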
A webhook outage **does not fail the VPG**: the script catches the error and
exits 0. A comment in the file shows how to switch to strict mode if you'd
rather have a webhook outage abort the failover.

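A minimal sketch of that fail-open pattern, assuming the payload is already a JSON string. `Send-ZertoHook` and its `-Strict` switch are hypothetical names; the comment in the shipped script is the reference for how its own strict mode works:

```powershell
function Send-ZertoHook {
    param(
        [Parameter(Mandatory)][string] $Url,
        [Parameter(Mandatory)][string] $Json,
        [string] $Bearer,
        [int]    $TimeoutSec = 15,
        [switch] $Strict          # hypothetical: exit non-zero if the POST fails
    )
    $headers = @{}
    if ($Bearer) { $headers['Authorization'] = "Bearer $Bearer" }
    try {
        Invoke-RestMethod -Method Post -Uri $Url -Body $Json `
            -ContentType 'application/json; charset=utf-8' `
            -Headers $headers -TimeoutSec $TimeoutSec -ErrorAction Stop | Out-Null
        return $true
    } catch {
        Write-Warning "Webhook POST to $Url failed: $($_.Exception.Message)"
        if ($Strict) { exit 1 }   # strict mode: let the failure propagate to Zerto
        return $false             # fail-open: caller exits 0 so the VPG continues
    }
}
```

The bounded `-TimeoutSec` matters as much as the `catch`: without it, an unresponsive webhook host would stall the VPG step even though the error is eventually swallowed.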
## 2. The webhook-server-side scripts (receivers)

Two examples ship in the repo. Both read the JSON body from stdin (the
webhook server delivers the body to the script's stdin when **JSON body to
stdin** is ticked on the endpoint).

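A minimal receiver-side parsing sketch, assuming the body arrives as a single JSON document on stdin. `ConvertFrom-ZertoBody` is a hypothetical helper; the `$p` name matches the snippet in the Variations section:

```powershell
function ConvertFrom-ZertoBody {
    param([string] $Raw)
    # An empty body almost always means the endpoint isn't passing JSON on stdin.
    if ([string]::IsNullOrWhiteSpace($Raw)) {
        throw 'Empty body on stdin: is "JSON body to stdin" ticked on the endpoint?'
    }
    $p = $Raw | ConvertFrom-Json
    if (-not $p.zerto) { throw 'Body has no "zerto" object; not a Zerto sender payload?' }
    return $p
}
```

In a receiver you would call it as `$p = ConvertFrom-ZertoBody ([Console]::In.ReadToEnd())` and then branch on `$p.phase` and `$p.zerto.operation`. Failing loudly on an empty body turns a misconfigured endpoint into a red run in history instead of a silent no-op.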
### a. Slack/Teams notification — both phases

[`scripts/examples/zerto-receiver-notify.ps1`](../../scripts/examples/zerto-receiver-notify.ps1)
posts a single-line summary to a Slack or Teams Incoming Webhook URL. It
picks an icon based on `ZertoOperation`:

- `Test` → 🧪 (benign, expected)
- `Failover` → 🚨 (real production event)
- `Move` → 🚚 (planned migration)

…and highlights `ZertoForce=Yes` on the **pre** message so you can see at
a glance whether the operation was force-flagged.

Set the destination via the `NOTIFY_URL` environment variable on the webhook
host, or hardcode it at the top of the script.

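A sketch of the icon choice and the chat POST. It assumes Slack-style `:shortcode:` emoji (Slack renders them; for Teams you may prefer the Unicode characters listed above) and an Incoming Webhook that accepts a minimal `{"text": "..."}` body. Slack's does, as did Teams' classic connector webhooks; newer Workflows-based Teams webhooks expect an Adaptive Card instead. `Format-ZertoMessage`, `Send-ChatNotification`, and the fallback icon are hypothetical, and the shipped script's wording may differ:

```powershell
function Format-ZertoMessage {
    param([Parameter(Mandatory)] $Payload)   # parsed sender JSON (phase + zerto.*)
    $op = $Payload.zerto.operation
    $icon = switch ($op) {
        'Test'     { ':test_tube:' }           # benign, expected
        'Failover' { ':rotating_light:' }      # real production event
        'Move'     { ':truck:' }               # planned migration
        default    { ':information_source:' }  # assumed fallback for other operations
    }
    $text = "$icon Zerto $op - phase: $($Payload.phase) - VPG: $($Payload.zerto.vpgName)"
    # ZertoForce is only meaningful in pre, so only flag it there.
    if ($Payload.phase -eq 'pre' -and $Payload.zerto.force -eq 'Yes') { $text += ' - *FORCE*' }
    return $text
}

function Send-ChatNotification {
    param([Parameter(Mandatory)][string] $Url, [Parameter(Mandatory)][string] $Text)
    $body = @{ text = $Text } | ConvertTo-Json
    Invoke-RestMethod -Method Post -Uri $Url -Body $body -ContentType 'application/json' -ErrorAction Stop | Out-Null
}
```

Usage, with `$p` being the parsed stdin body: `Send-ChatNotification -Url $env:NOTIFY_URL -Text (Format-ZertoMessage $p)`.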
### b. Post-recovery VM health check — post phase only

[`scripts/examples/zerto-receiver-vm-healthcheck.ps1`](../../scripts/examples/zerto-receiver-vm-healthcheck.ps1)
runs only on `phase=post` for operations that bring VMs up
(`Test`/`Failover`/`Move`/`FailoverBeforeCommit`/`FailoverDuringCommit`).
For each name in `VmDisplayNames` it:

1. Strips the trailing `(N)` suffixes Zerto adds on Test failovers (e.g.
   `(1)(1)(1)`), so DNS resolution targets the actual hostname.
2. Pings the VM (`Test-Connection`).
3. Probes a configurable TCP port (`-ProbePort`, default `3389` for RDP;
   use `22` for SSH or `443` for the web tier).
4. Writes a JSON report to
   `C:\ProgramData\WebhookServer\zerto-healthchecks\<vpg>-<op>-<utcstamp>.json`.
5. Exits non-zero if any VM failed either probe, which surfaces in the
   webhook server's run history (and outbound callback, if configured).

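Step 1 comes down to a regular expression. A minimal sketch, assuming the suffixes are always one or more `(<digits>)` groups at the very end of the name and that multi-VM VPGs separate names with commas, as the variable table says. `Get-ZertoVmHostnames` is a hypothetical helper; note it would also strip a `(2)` that is genuinely part of a VM's name:

```powershell
function Get-ZertoVmHostnames {
    param([string] $VmDisplayNames)   # e.g. "web01(1)(1), db01(1)"
    $VmDisplayNames -split ',' |
        ForEach-Object { $_.Trim() -replace '(\(\d+\))+$', '' } |   # drop trailing (N)(N)... groups
        Where-Object { $_ } |                                       # skip empty entries
        Select-Object -Unique
}

Get-ZertoVmHostnames 'ubuntu-2404(1)(1)(1)'
# → ubuntu-2404
```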
Bump the endpoint's **Timeout (sec)** to `120` when wiring this in: probes
against a VM that isn't answering have to wait out their timeouts, and on a
multi-VM VPG those waits add up.

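A sketch of a per-VM probe with explicit timeouts, which is what keeps the total run time predictable: if VMs are probed one after another, the worst case is roughly VM count × (ping timeout + TCP timeout). It uses `System.Net.Sockets.TcpClient` for the port check because that behaves the same in Windows PowerShell and PowerShell 7. `Test-VmPort`, `Test-ZertoVm`, and the 3-second default are assumptions, not the shipped script's values:

```powershell
function Test-VmPort {
    param([Parameter(Mandatory)][string] $ComputerName,
          [Parameter(Mandatory)][int] $Port,
          [int] $TimeoutMs = 3000)
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        # Wait() returns $false on timeout; a refused connection faults the task and throws.
        $task = $client.ConnectAsync($ComputerName, $Port)
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    } catch {
        return $false        # refused, DNS failure, etc.
    } finally {
        $client.Dispose()
    }
}

function Test-ZertoVm {
    param([Parameter(Mandatory)][string] $Name, [int] $ProbePort = 3389)
    [pscustomobject]@{
        name     = $Name
        ping     = [bool](Test-Connection -ComputerName $Name -Count 1 -Quiet -ErrorAction SilentlyContinue)
        port     = $ProbePort
        portOpen = Test-VmPort -ComputerName $Name -Port $ProbePort
    }
}
```

Collecting the results as objects means the JSON report is just `$results | ConvertTo-Json`, and the exit code is `1` if any object has `ping` or `portOpen` set to `$false`.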
## 3. Configure the endpoints in the GUI

Two endpoints, identical except for the slug, the description, the script,
and (for the health check) the timeout.

### `zerto-pre`

| Section | Setting | Value |
|---|---|---|
| Identity | Slug | `zerto-pre` |
| Identity | Description | "Zerto pre-recovery: chat notification" |
| Auth | Mode | **Bearer** |
| Auth | Bearer secret | generate a 32-byte random string; reuse for `zerto-post` |
| Allowed clients | (one per line) | the IP of the K8s node running `scripts-service` (e.g. `192.168.50.30`) |
| Executor | Type | **Windows PowerShell** (or PowerShell 7) |
| Executor | Script path | `C:\scripts\zerto-receiver-notify.ps1` |
| Data passing | JSON body to stdin | ✓ |
| Run as | Identity | **Service** |
| Response | Mode | **Async** |
| Response | Timeout (sec) | `30` |
| Response | Fail on non-zero exit | unticked *(async hooks have no caller to receive a 502)* |

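One way to generate the 32-byte bearer secret on the webhook host, using the .NET cryptographic RNG (available in both Windows PowerShell and PowerShell 7). URL-safe Base64 is just a convenient, copy/paste-friendly encoding and an assumption here; any printable encoding of 32 random bytes works if the GUI accepts it. `New-BearerSecret` is a hypothetical helper:

```powershell
function New-BearerSecret {
    param([int] $ByteCount = 32)
    $bytes = [byte[]]::new($ByteCount)
    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    try { $rng.GetBytes($bytes) } finally { $rng.Dispose() }
    # URL-safe Base64 without padding: 32 bytes → 43 characters.
    [Convert]::ToBase64String($bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_')
}

New-BearerSecret
```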
### `zerto-post`

Same as above, except:

| Setting | Value |
|---|---|
| Slug | `zerto-post` |
| Description | "Zerto post-recovery: notify + VM health check" |
| Script path | a **wrapper** (`C:\scripts\zerto-post-fanout.ps1`) that calls both receiver scripts in turn (see below) |
| Timeout (sec) | `120` |

Running two receivers on one endpoint is easiest with a tiny wrapper that
fans stdin out to both scripts:

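```powershell
# C:\scripts\zerto-post-fanout.ps1
$body = [Console]::In.ReadToEnd()                          # read the JSON body once
$OutputEncoding = [System.Text.UTF8Encoding]::new($false)  # pipe UTF-8 to child processes
# Run each receiver as a child process (use pwsh.exe for PowerShell 7) so the body reaches
# its real stdin; `$body | & 'script.ps1'` would not feed [Console]::In in the receiver.
$body | powershell.exe -NoProfile -NonInteractive -File 'C:\scripts\zerto-receiver-notify.ps1'
$body | powershell.exe -NoProfile -NonInteractive -File 'C:\scripts\zerto-receiver-vm-healthcheck.ps1'
exit $LASTEXITCODE  # propagate the health check's exit code to run history
```
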
Or run the two as separate endpoints (`zerto-post-notify` and
`zerto-post-healthcheck`) and have the Zerto-side script POST to both;
either pattern is fine. The fanout wrapper keeps the Zerto config simpler.

## 4. Wire up the bearer token

On the ZVMA / scripts-service side, the most robust place for the token is
a Kubernetes Secret mounted into the pod, but for testing the simplest
approach is to pass it as a parameter to the Zerto-side script:

> VPG settings → Pre-Recovery Script → Parameters:
> `-Phase pre -Bearer <paste-token>`
>
> VPG settings → Post-Recovery Script → Parameters:
> `-Phase post -Bearer <paste-token>`

For production, mount a Secret at a known path in the pod and have the
sender script read the token from it (`Get-Content /run/secrets/webhook-token`).

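A sketch of that lookup order, assuming the Secret is mounted as a file at `/run/secrets/webhook-token` (the path from the sentence above), that your setup allows mounting it into the pod, and that an explicit `-Bearer` should still win so per-VPG overrides keep working. `Resolve-BearerToken` is a hypothetical helper:

```powershell
function Resolve-BearerToken {
    param([string] $Bearer, [string] $TokenPath = '/run/secrets/webhook-token')
    if ($Bearer) { return $Bearer }                        # explicit -Bearer wins
    if (Test-Path -LiteralPath $TokenPath -PathType Leaf) {
        # -Raw reads the file as one string; Trim() drops the trailing newline.
        $token = "$(Get-Content -LiteralPath $TokenPath -Raw)".Trim()
        if ($token) { return $token }
    }
    return $null                                           # no token: request goes out unauthenticated
}
```

Returning `$null` rather than throwing keeps the sender's fail-open behaviour: the webhook server rejects the request, and the VPG carries on.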
## 5. Test before going live

Run a Test failover on a non-critical VPG. Watch:

- **Slack/Teams**: a `:test_tube: Zerto Test - phase: pre` message arrives,
  followed anywhere from ~30 s to several minutes later by a
  `:test_tube: Zerto Test - phase: post` message.
- **Webhook Server GUI** → run history: one run each for `zerto-pre` and
  `zerto-post`, both green.
- **`C:\ProgramData\WebhookServer\zerto-healthchecks\`**: a fresh JSON
  report named `<vpg>-Test-<utcstamp>.json` containing per-VM ping and port
  probe results.
- **ZVMA**: the VPG operation completes successfully; nothing in the
  pre/post logs blocked on the webhook.

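Before spending a Test failover, you can exercise the webhook side on its own by POSTing a hand-built body in the same shape the sender uses. A sketch, assuming the `http://webhook.dr` host from this recipe; `New-ZertoSmokeTestRequest`, the `smoke-test` VPG name, and the default VM name are placeholders. Swap in a real, non-critical VM so the health check has something to probe:

```powershell
function New-ZertoSmokeTestRequest {
    param(
        [Parameter(Mandatory)][ValidateSet('pre', 'post')][string] $Phase,
        [Parameter(Mandatory)][string] $Bearer,
        [string] $BaseUrl = 'http://webhook.dr/hook',
        [string] $VmDisplayNames = 'ubuntu-2404(1)'   # placeholder: a real, non-critical VM
    )
    $phaseLc = $Phase.ToLowerInvariant()
    $body = [ordered]@{
        phase      = $phaseLc
        capturedAt = [DateTime]::UtcNow.ToString('o')
        host       = 'manual-smoke-test'
        zerto      = [ordered]@{
            vpgName        = 'smoke-test'        # hypothetical VPG name
            operation      = 'Test'              # keeps receivers in "exercise" mode
            force          = 'No'
            vmDisplayNames = $VmDisplayNames
        }
    } | ConvertTo-Json -Depth 3
    # Splattable parameters for Invoke-WebRequest / Invoke-RestMethod.
    @{
        Method      = 'Post'
        Uri         = "$($BaseUrl.TrimEnd('/'))/zerto-$phaseLc"
        Body        = $body
        ContentType = 'application/json'
        Headers     = @{ Authorization = "Bearer $Bearer" }
    }
}
```

Usage: `$req = New-ZertoSmokeTestRequest -Phase post -Bearer $token; (Invoke-WebRequest @req -UseBasicParsing).StatusCode` should report `202` on an Async endpoint, after which the run appears in the GUI's run history.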
## Variations

### Branch on Test vs. real failover in the receivers

The notifier already styles the message differently. To do something only
on a real failover (e.g. update DNS), guard with:

```powershell
# $p is the parsed request body: $p = [Console]::In.ReadToEnd() | ConvertFrom-Json
if ($p.zerto.operation -ne 'Test') {
    # do the destructive thing
}
```

A `ZertoOperation` of `Test` means "exercise; don't touch production
dependencies." Always check it before doing anything that mutates real
state.

### Capture `ZertoForce` from pre for use in post

`ZertoForce` is `Yes` only during the **pre** phase when force mode is on
and is reset to `No` by the **post** phase. If your post-side logic needs
to know the operation was force-flagged, save it during pre (e.g. write a
small marker to the shared `ZertoOutputDir`) and read it back during post.

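A sketch of that marker, run inside the Zerto-side script. It assumes `ZertoOutputDir` persists between the pre and post runs of the same operation (verify this on your ZVMA; the variable table only says it is written back via PVC) and keys the file by VPG name so VPGs running at the same time don't collide. The function names and the `.zerto-force` file name are hypothetical:

```powershell
function Get-ZertoForceMarkerPath {
    param([string] $OutputDir = $env:ZertoOutputDir, [string] $VpgName = $env:ZertoVPGName)
    # One marker per VPG; replace characters that are awkward in file names.
    $safeName = $VpgName -replace '[^A-Za-z0-9._-]', '_'
    Join-Path $OutputDir "$safeName.zerto-force"
}

function Save-ZertoForce {   # call during -Phase pre
    Set-Content -LiteralPath (Get-ZertoForceMarkerPath) -Value $env:ZertoForce -NoNewline
}

function Read-ZertoForce {   # call during -Phase post; returns 'Yes', 'No', or $null
    $path = Get-ZertoForceMarkerPath
    if (Test-Path -LiteralPath $path) {
        $value = "$(Get-Content -LiteralPath $path -Raw)"
        Remove-Item -LiteralPath $path    # one-shot: don't leak into the next operation
        return $value
    }
    return $null
}
```

Deleting the marker on read means a post phase that finds no marker can tell "pre never recorded it" apart from a stale value left by an earlier operation.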
### Per-VPG endpoints

For fine-grained access control or different actions per VPG, create one
endpoint per VPG (`zerto-pre-app01`, `zerto-post-app01`, …) with its own
bearer token. Override `-WebhookUrl` and `-Bearer` on the Zerto side per
VPG.

### Audit trail

Every endpoint can have an outbound **Callback** URL. Configure it with your
SIEM's HTTP collector and an HMAC secret, and every run produces a JSON
record with runId, exit code, duration, stdout, and stderr, which is
convenient for compliance.

## Security note

The ZVMA `scripts-service` pod runs your scripts inside a Linux container
with broad reach into the management cluster: anything your script does
runs with whatever ServiceAccount that pod uses. Treat the script content
as privileged and make sure pre/post script edit rights are restricted to
trusted operators. If you're unfamiliar with the pod's RBAC posture, run
`Get-ChildItem Env:` from inside the container and look at
`/var/run/secrets/kubernetes.io/serviceaccount/`: that token is what your
scripts (and any malicious script) can use to talk to the K8s API.

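A quick, read-only way to see what a script in that pod inherits without ever printing the token. It only checks the standard in-cluster locations (`KUBERNETES_SERVICE_HOST`, normally set inside a pod, and the ServiceAccount mount named above). `Get-PodIdentitySummary` is a hypothetical helper; what the token can actually do is decided by the cluster's RBAC bindings, not by anything visible from here:

```powershell
function Get-PodIdentitySummary {
    param([string] $SaDir = '/var/run/secrets/kubernetes.io/serviceaccount')
    $nsFile    = Join-Path $SaDir 'namespace'
    $tokenFile = Join-Path $SaDir 'token'
    $ns = $null
    if (Test-Path -LiteralPath $nsFile) { $ns = "$(Get-Content -LiteralPath $nsFile -Raw)".Trim() }
    [pscustomobject]@{
        ApiServer    = $env:KUBERNETES_SERVICE_HOST
        Namespace    = $ns
        HasSaToken   = Test-Path -LiteralPath $tokenFile   # presence only; never log the token itself
        ZertoEnvVars = @(Get-ChildItem Env: | Where-Object Name -like 'Zerto*' | ForEach-Object Name)
    }
}
```

Dropping `Get-PodIdentitySummary | ConvertTo-Json` into a throwaway pre-script on a test VPG shows the answer in that run's output.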