# Recipe: Zerto ZVMA pre/post scripts → notify + VM health check
> This is the **canonical** Zerto recipe. It targets the **ZVMA on
> Kubernetes** — the supported deployment — where pre/post scripts run
> inside the in-cluster `scripts-service` container (Linux + pwsh 7). The
> webhook-server side is a normal Windows service that does the
> Windows-domain work the ZVMA container can't reach directly.
## What we're building
ZVMA's `scripts-service` pod runs your VPG pre/post scripts inside a Linux
container. It exposes a small set of `Zerto*` environment variables, and we
want to:
1. POST those variables to a Webhook Server endpoint at the start (pre) and
end (post) of every VPG operation, and
2. On the receiving Windows host, do something useful with them — at minimum
a chat notification, and on `post` a quick health check of the VMs that
just powered on.

The endpoints are **Async**, so the Zerto VPG sequence is never blocked by
slow downstream actions (notifications, port probes, etc.).
```
Zerto VPG operation starts
  |
  +-- ZVMA scripts-service container runs:
  |     /app/scripts-files/zerto-zvma-send.ps1 -Phase pre
  |       -> POST http://webhook.dr/hook/zerto-pre   (async, returns 202)
  |
  +-- VMs come up at recovery site
  |
  +-- ZVMA scripts-service container runs:
        /app/scripts-files/zerto-zvma-send.ps1 -Phase post
          -> POST http://webhook.dr/hook/zerto-post  (async, returns 202)

(meanwhile, on the webhook server)
  /hook/zerto-pre  -> Slack/Teams notification ("Test failover starting...")
  /hook/zerto-post -> Slack/Teams notification + ping/port probe each VM,
                      write a JSON report to disk, exit non-zero on failure.
```
## What ZVMA exposes
Captured from a real Test failover; same set is present in pre and post:
| Variable | Example | Notes |
|---|---|---|
| `ZertoVPGName` | `ubuntu-2404-local` | The VPG that fired the script |
| `ZertoInternalVpgName` | `ubuntu-2404-local` | Usually identical to `ZertoVPGName` |
| `ZertoOperation` | `Test` | `Test` / `Failover` / `Move` / `FailoverBeforeCommit` / `FailoverDuringCommit` |
| `ZertoForce` | `Yes` (pre) / `No` (post) | Set to `Yes` only during the pre phase when force mode is on; reset to `No` by post |
| `VmDisplayNames` | `ubuntu-2404(1)(1)(1)` | Comma-separated for multi-VM VPGs; Test failovers add `(N)` suffixes |
| `ZertoHypervisorManagerIP` | `192.168.50.20` | The vCenter / Hyper-V manager ZVMA is talking to |
| `ZertoHypervisorManagerPort` | `443` | |
| `ZertoOutputDir` | `/app/scripts-output` | Container-side output dir (written back to ZVMA via PVC) |
| `ZertoWorkingDir` | `/app/scripts-files` | Where script files live in-container |

Branch on `ZertoOperation` to differentiate Test runs from real failovers.
**`ZertoForce` is only meaningful during the pre phase.** Capture it there
if you need it later, because by post it has been reset.
## 1. The Zerto-side script (sender)
A ready-to-use script ships in this repo at
[`scripts/examples/zerto-zvma-send.ps1`](../../scripts/examples/zerto-zvma-send.ps1).
Place it where the `scripts-service` pod can read it — typically the
`scripts-service-scripts-files-pvc`, mounted at `/app/scripts-files/` — and
wire it into the VPG twice:
> **VPG settings → Recovery → Scripts → Pre-Recovery Script**
> Path: `/app/scripts-files/zerto-zvma-send.ps1`
> Parameters: `-Phase pre`
>
> **VPG settings → Recovery → Scripts → Post-Recovery Script**
> Path: `/app/scripts-files/zerto-zvma-send.ps1`
> Parameters: `-Phase post`
The default `$WebhookUrl` includes `{phase}`, so one script and one URL setting
serve both phases: `http://webhook.dr/hook/zerto-{phase}` becomes
`/hook/zerto-pre` and `/hook/zerto-post` automatically. Override with
`-WebhookUrl` and `-Bearer` if you'd rather pass them per VPG.
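
As a rough illustration of the `{phase}` substitution, the expansion can be as small as one literal string replace. `Resolve-WebhookUrl` is a hypothetical helper name used only for this sketch; the shipped script may implement it differently.

```powershell
# Hypothetical helper: expand the {phase} placeholder in a webhook URL template.
function Resolve-WebhookUrl {
    param(
        [Parameter(Mandatory)] [string] $Template,
        [Parameter(Mandatory)] [ValidateSet('pre', 'post')] [string] $Phase
    )
    # String.Replace is a literal replace (no regex), so the braces need no escaping.
    return $Template.Replace('{phase}', $Phase)
}

Resolve-WebhookUrl -Template 'http://webhook.dr/hook/zerto-{phase}' -Phase pre
```

Restricting `-Phase` with `ValidateSet` means a typo in the VPG's script parameters fails immediately instead of POSTing to a slug that doesn't exist.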
The script POSTs a single JSON object:
```json
{
  "phase": "pre",
  "capturedAt": "2026-05-08T17:45:54Z",
  "host": "scripts-service-f9b6cb7-4xbxq",
  "zerto": {
    "vpgName": "ubuntu-2404-local",
    "internalVpgName": "ubuntu-2404-local",
    "operation": "Test",
    "force": "Yes",
    "vmDisplayNames": "ubuntu-2404(1)(1)(1)",
    "hypervisorManagerIP": "192.168.50.20",
    "hypervisorManagerPort": "443",
    "outputDir": "/app/scripts-output",
    "workingDir": "/app/scripts-files"
  }
}
```
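
For orientation, a payload of this shape could be assembled from the environment variables roughly as follows. This is a sketch, not the shipped script: `New-ZertoPayload` is a hypothetical name, and using the machine name for `host` is an assumption (inside a pod it is normally the pod name).

```powershell
# Hypothetical helper: build the payload from the Zerto* environment variables.
# Environment variable names are case-sensitive on Linux, so match the table exactly.
function New-ZertoPayload {
    param(
        [Parameter(Mandatory)] [ValidateSet('pre', 'post')] [string] $Phase
    )
    [ordered]@{
        phase      = $Phase
        # ISO 8601 UTC timestamp, e.g. 2026-05-08T17:45:54Z
        capturedAt = [DateTime]::UtcNow.ToString(
            "yyyy-MM-dd'T'HH:mm:ss'Z'", [Globalization.CultureInfo]::InvariantCulture)
        # Assumption: the container hostname is what ends up in "host".
        host       = [Environment]::MachineName
        zerto      = [ordered]@{
            vpgName               = $env:ZertoVPGName
            internalVpgName       = $env:ZertoInternalVpgName
            operation             = $env:ZertoOperation
            force                 = $env:ZertoForce
            vmDisplayNames        = $env:VmDisplayNames
            hypervisorManagerIP   = $env:ZertoHypervisorManagerIP
            hypervisorManagerPort = $env:ZertoHypervisorManagerPort
            outputDir             = $env:ZertoOutputDir
            workingDir            = $env:ZertoWorkingDir
        }
    }
}

# Serialize for the POST body; -Depth 3 comfortably covers the nested object.
$json = New-ZertoPayload -Phase pre | ConvertTo-Json -Depth 3
```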
A webhook outage **does not fail the VPG**, because the script catches the
error and exits 0. A comment in the file shows how to flip that to strict mode
if you'd rather a webhook outage abort the failover.
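
The fail-open versus strict choice boils down to which exit code the `catch` block produces. The sketch below illustrates the idea. `Send-ZertoWebhook`, its `-Strict` switch, and the injectable `-Sender` script block are hypothetical names, and the `Authorization: Bearer <token>` header is an assumption about what the endpoint's Bearer mode expects.

```powershell
# Hypothetical helper: POST the payload and decide the exit code.
# Default is fail-open (0 even when the webhook is down); -Strict fails closed.
function Send-ZertoWebhook {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [string] $Json,
        [string] $Bearer,
        [switch] $Strict,
        # Injectable so the error path can be exercised without a network.
        [scriptblock] $Sender = {
            param($u, $body, $headers)
            Invoke-RestMethod -Uri $u -Method Post -Body $body -Headers $headers `
                -ContentType 'application/json' -TimeoutSec 10
        }
    )
    $headers = @{}
    # Assumption: Bearer mode expects the usual "Authorization: Bearer <token>" header.
    if ($Bearer) { $headers['Authorization'] = "Bearer $Bearer" }
    try {
        $null = & $Sender $Url $Json $headers
        return 0
    }
    catch {
        Write-Warning "Webhook POST to $Url failed: $($_.Exception.Message)"
        if ($Strict) { return 1 }
        return 0
    }
}

# At the end of a sender script:
# exit (Send-ZertoWebhook -Url $url -Json $json -Bearer $token)
```

Keeping the decision in one place makes the trade-off explicit: fail-open protects the recovery sequence, strict mode protects the guarantee that every operation was reported.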
## 2. The webhook-server-side scripts (receivers)
Two examples ship in the repo. Both read the JSON body from stdin (the
webhook server delivers the body to the script's stdin when **JSON body to
stdin** is ticked on the endpoint).
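
A receiver might begin by reading and validating that body along these lines. This is a minimal sketch: `ConvertFrom-ZertoBody` is a hypothetical helper, and the `$p` variable name simply mirrors the snippets later in this recipe.

```powershell
# Hypothetical helper: parse the request body and check its basic shape.
function ConvertFrom-ZertoBody {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $Raw)

    if ([string]::IsNullOrWhiteSpace($Raw)) {
        throw 'Empty request body; is "JSON body to stdin" ticked on the endpoint?'
    }
    $payload = $Raw | ConvertFrom-Json   # throws on malformed JSON
    if (-not $payload.zerto) {
        throw "Request body has no 'zerto' object."
    }
    return $payload
}

# In a receiver script:
# $p = ConvertFrom-ZertoBody -Raw ([Console]::In.ReadToEnd())
```

Failing loudly on an empty or malformed body makes a misconfigured endpoint show up as a red run in the history rather than as a silent no-op.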
### a. Slack/Teams notification — both phases
[`scripts/examples/zerto-receiver-notify.ps1`](../../scripts/examples/zerto-receiver-notify.ps1)
posts a single-line summary to a Slack or Teams Incoming Webhook URL. It
picks an icon based on `ZertoOperation`:
- `Test` → 🧪 — benign, expected
- `Failover` → 🚨 — real production event
- `Move` → 🚚 — planned migration

…and highlights `ZertoForce=Yes` on the **pre** message so you can see at
a glance whether the operation was force-flagged.

Set the destination via the `NOTIFY_URL` environment variable on the webhook
host, or hardcode it at the top of the script.
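
To make the icon and force-flag logic concrete, here is one way such a formatter could look. Everything in it is illustrative: `Format-ZertoNotification` is a hypothetical name, the exact wording and the `[FORCE]` tag are guesses at the real script's output, the fallback icon is an assumption, and the `{"text": ...}` body is the form Slack Incoming Webhooks accept.

```powershell
# Hypothetical formatter; the real script's wording may differ.
function Format-ZertoNotification {
    param([Parameter(Mandatory)] $Payload)

    # Slack shortcodes for the three icons listed above.
    $icons = @{ Test = ':test_tube:'; Failover = ':rotating_light:'; Move = ':truck:' }
    $op    = [string]$Payload.zerto.operation
    # Assumption: operations outside the list get a neutral icon.
    $icon  = if ($icons.ContainsKey($op)) { $icons[$op] } else { ':information_source:' }

    $text = "$icon Zerto $op - phase: $($Payload.phase) - VPG: $($Payload.zerto.vpgName)"
    # ZertoForce is only meaningful on pre, so only flag it there.
    if ($Payload.phase -eq 'pre' -and $Payload.zerto.force -eq 'Yes') {
        $text += ' [FORCE]'
    }
    # Slack Incoming Webhooks accept a JSON body of the form {"text": "..."}.
    return (@{ text = $text } | ConvertTo-Json -Compress)
}

# Posting it (not executed here):
# Invoke-RestMethod -Uri $env:NOTIFY_URL -Method Post -ContentType 'application/json' `
#     -Body (Format-ZertoNotification -Payload $p)
```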
### b. Post-recovery VM health check — post phase only
[`scripts/examples/zerto-receiver-vm-healthcheck.ps1`](../../scripts/examples/zerto-receiver-vm-healthcheck.ps1)
runs only on `phase=post` for operations that bring VMs up
(`Test`/`Failover`/`Move`/`FailoverBeforeCommit`/`FailoverDuringCommit`).
For each name in `VmDisplayNames` it:
1. Strips the trailing `(N)` suffixes (for example `(1)(1)(1)`) that Zerto
adds on Test failovers, so DNS resolution targets the actual hostname.
2. Pings (`Test-Connection`).
3. Probes a configurable TCP port (`-ProbePort`, default `3389` for RDP;
use `22` for SSH or `443` for the web tier).
4. Writes a JSON report to
`C:\ProgramData\WebhookServer\zerto-healthchecks\<vpg>-<op>-<utcstamp>.json`.
5. Exits non-zero if any VM failed either probe, which surfaces in the
webhook server's run history (and in the outbound callback, if configured).

Bump the endpoint's **Timeout (sec)** to `120` when wiring this in, since
network probes can take a while.
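
Steps 1 to 3 can be sketched as two small functions. The names `Get-RecoveredVmHostName` and `Test-VmHealth`, the 3-second connect timeout, and the result shape are all assumptions made for illustration. The shipped script is the reference. Note that the regex would also strip a genuine `(2)` at the end of a real VM name.

```powershell
# Hypothetical helpers sketching steps 1-3; names and defaults are illustrative.
function Get-RecoveredVmHostName {
    param([Parameter(Mandatory)] [string] $VmDisplayNames)
    # Step 1: split the comma-separated list and drop trailing "(N)" groups.
    $VmDisplayNames -split ',' |
        ForEach-Object { $_.Trim() -replace '\s*(\(\d+\))+$', '' } |
        Where-Object { $_ }
}

function Test-VmHealth {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [int] $ProbePort = 3389,
        [int] $TimeoutMs = 3000
    )
    # Step 2: one ICMP echo; -Quiet reduces the result to a boolean.
    $ping = [bool](Test-Connection -ComputerName $Name -Count 1 -Quiet -ErrorAction SilentlyContinue)

    # Step 3: TCP connect with a bounded wait.
    $portOpen = $false
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        $portOpen = $client.ConnectAsync($Name, $ProbePort).Wait($TimeoutMs) -and $client.Connected
    }
    catch { $portOpen = $false }   # refused, unresolvable, and similar failures
    finally { $client.Dispose() }

    [pscustomobject]@{ name = $Name; ping = $ping; port = $ProbePort; portOpen = $portOpen }
}

# Putting it together in a receiver (not executed here):
# $results = Get-RecoveredVmHostName $p.zerto.vmDisplayNames | ForEach-Object { Test-VmHealth -Name $_ }
# if ($results | Where-Object { -not ($_.ping -and $_.portOpen) }) { exit 1 }
```

A raw `TcpClient` is used for the port probe here because it works the same on Windows PowerShell and PowerShell 7 and lets the sketch bound the wait explicitly.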
## 3. Configure the endpoints in the GUI
Two endpoints. Identical except for the slug, the script, and (for the
healthcheck) the timeout.
### `zerto-pre`
| Section | Setting | Value |
|---|---|---|
| Identity | Slug | `zerto-pre` |
| Identity | Description | "Zerto pre-recovery: chat notification" |
| Auth | Mode | **Bearer** |
| Auth | Bearer secret | generate a 32-byte random string; reuse for `zerto-post` |
| Allowed clients | (one per line) | the IP of the K8s node running `scripts-service` (e.g. `192.168.50.30`) |
| Executor | Type | **Windows PowerShell** (or PowerShell 7) |
| Executor | Script path | `C:\scripts\zerto-receiver-notify.ps1` |
| Data passing | JSON body to stdin | ✓ |
| Run as | Identity | **Service** |
| Response | Mode | **Async** |
| Response | Timeout (sec) | `30` |
| Response | Fail on non-zero exit | unticked *(async hooks have no caller to receive a 502)* |
### `zerto-post`
Same as above, except:
| Setting | Value |
|---|---|
| Slug | `zerto-post` |
| Description | "Zerto post-recovery: notify + VM health check" |
| Script path | a **wrapper** that calls both receiver scripts in turn (see below) |
| Timeout (sec) | `120` |

Running two receivers on one endpoint is easiest with a tiny wrapper that fans
stdin out to both scripts:
```powershell
# C:\scripts\zerto-post-fanout.ps1
$body = [Console]::In.ReadToEnd()
$body | & 'C:\scripts\zerto-receiver-notify.ps1'
$body | & 'C:\scripts\zerto-receiver-vm-healthcheck.ps1'
```
Or run the two as separate endpoints (`zerto-post-notify` and
`zerto-post-healthcheck`) and have the Zerto-side script POST to both —
either pattern is fine. The fanout wrapper keeps the Zerto config simpler.
## 4. Wire up the bearer token
On the ZVMA / scripts-service side, the best long-term home for the token is
a Kubernetes Secret mounted into the pod, but the simplest approach for
testing is to pass it as a parameter to the Zerto-side script:
> VPG settings → Pre-Recovery Script → Parameters:
> `-Phase pre -Bearer <paste-token>`
>
> VPG settings → Post-Recovery Script → Parameters:
> `-Phase post -Bearer <paste-token>`
For production, mount a Secret at a known path in the pod and have the
sender script read from it (`Get-Content /run/secrets/webhook-token`).
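
One possible shape for that lookup is shown below: an explicit `-Bearer` wins, otherwise the mounted file is read and trimmed. `Get-WebhookToken` is a hypothetical helper, and the precedence order is a design assumption rather than something the shipped script is known to do.

```powershell
# Hypothetical helper: resolve the bearer token for the sender script.
function Get-WebhookToken {
    param(
        [string] $Bearer,
        [string] $SecretPath = '/run/secrets/webhook-token'
    )
    # An explicit parameter wins, which keeps the per-VPG testing flow working.
    if ($Bearer) { return $Bearer }

    if (Test-Path -LiteralPath $SecretPath -PathType Leaf) {
        $raw = Get-Content -LiteralPath $SecretPath -Raw
        # Mounted Secrets often end with a newline; trim it before use.
        if ($raw) { return $raw.Trim() }
    }
    throw "No bearer token: pass -Bearer or mount a Secret at $SecretPath."
}
```

Trimming matters because a stray trailing newline in the file would otherwise end up inside the `Authorization` header.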
## 5. Test before going live
Run a Test failover on a non-critical VPG. Watch:
- **Slack/Teams**: a `:test_tube: Zerto Test - phase: pre` message arrives,
followed several minutes later by a
`:test_tube: Zerto Test - phase: post` message.
- **Webhook Server GUI** → run history: two runs for `zerto-pre` /
`zerto-post`, both green.
- **`C:\ProgramData\WebhookServer\zerto-healthchecks\`**: a fresh JSON
report named `<vpg>-Test-<utcstamp>.json` containing per-VM ping and port
probe results.
- **ZVMA**: the VPG operation completes successfully; nothing in the
pre/post logs blocked on the webhook.
## Variations
### Branch on Test vs. real failover in the receivers
The notifier already styles the message differently. To do something only
on a real failover (e.g. update DNS), guard with:
```powershell
if ($p.zerto.operation -ne 'Test') {
    # do the destructive thing
}
```
A `ZertoOperation` of `Test` means "exercise — don't touch production
dependencies." Always check it before doing anything that mutates real
state.
### Capture `ZertoForce` from pre for use in post
`ZertoForce` is `Yes` only during the **pre** phase when force mode is on
and is reset to `No` by the **post** phase. If your post-side logic needs
to know the operation was force-flagged, save it during pre (e.g. write a
small marker to the shared `ZertoOutputDir`) and read it back during post.
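
A marker-file approach could look roughly like this on the Zerto side. The function names and the `zerto-force-<vpg>.txt` file name are hypothetical, and the sketch assumes the VPG name is safe to use in a file name and that pre and post see the same `ZertoOutputDir`.

```powershell
# Hypothetical helpers: persist ZertoForce during pre, read it back during post.
function Save-ZertoForceMarker {
    param(
        [string] $OutputDir = $env:ZertoOutputDir,
        [string] $VpgName   = $env:ZertoVPGName,
        [string] $Force     = $env:ZertoForce
    )
    $path = Join-Path $OutputDir "zerto-force-$VpgName.txt"
    Set-Content -LiteralPath $path -Value $Force
    return $path
}

function Read-ZertoForceMarker {
    param(
        [string] $OutputDir = $env:ZertoOutputDir,
        [string] $VpgName   = $env:ZertoVPGName
    )
    $path = Join-Path $OutputDir "zerto-force-$VpgName.txt"
    if (-not (Test-Path -LiteralPath $path)) { return $null }
    $value = [string](Get-Content -LiteralPath $path -Raw)
    # Consume the marker so a later operation cannot pick up a stale value.
    Remove-Item -LiteralPath $path
    return $value.Trim()
}

# pre:  Save-ZertoForceMarker | Out-Null
# post: $forceAtPre = Read-ZertoForceMarker
```

The post phase can then add the recovered value to its payload so the receivers see it without any shared state on the webhook host.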
### Per-VPG endpoints
For fine-grained access control or different actions per VPG, create one
endpoint per VPG (`zerto-pre-app01`, `zerto-post-app01`, …) with its own
bearer token. Override `-WebhookUrl` and `-Bearer` on the Zerto side per
VPG.
### Audit trail
Every endpoint can have an outbound **Callback** URL. Point it at your
SIEM's HTTP collector and set an HMAC secret, and every run produces a JSON
record with runId, exit code, duration, stdout, and stderr, which is
convenient for compliance.
## Security note
The ZVMA `scripts-service` pod runs your scripts inside a Linux container
with broad reach into the management cluster — anything your script does
runs with whatever ServiceAccount that pod uses. Treat the script content
as privileged and make sure pre/post script edit rights are restricted to
trusted operators. If you're unfamiliar with the pod's RBAC posture, run
`Get-ChildItem Env:` from inside the container and look at
`/var/run/secrets/kubernetes.io/serviceaccount/`. That token is what your
scripts (or a malicious one) can use to talk to the K8s API.