# Recipe: Zerto failover post-script → DNS update + service checks

This is the canonical reason Webhook Server exists.

When Zerto fails a VM over from production to DR, the VM boots fine, but **the things around it** often need attention: DNS records still point at the production IP, dependent services need to be checked, and on-call needs a heads-up. Zerto pre/post scripts run on the **Zerto Virtual Manager**, not on a domain controller, and not necessarily with admin rights to the things that need fixing. So you want a single webhook URL that the post-script hits, and a Windows host on the DR side that does the actual work under the right identity.

## What we're building

Zerto's post-recovery script (a short PowerShell file that shells out to `curl.exe`) calls `http://webhook.dr.contoso.local:8080/hook/post-failover` with a JSON body identifying the VPG and operation. The Webhook Server, running on a DR-side Windows host as a gMSA with delegated AD/DNS rights, runs PowerShell that:

1. Updates DNS A records to point the failed-over hostnames at their DR IPs
2. Waits for the failed-over VM to come up (ping + WinRM probe)
3. Connects to the VM via PowerShell remoting and checks/starts critical services
4. Sends a Teams notification with the result

The endpoint is **Async**, so the server answers with a 202 and the Zerto script returns in milliseconds. Zerto's failover sequence cannot time out on it, even if the actions take minutes. The script's full output ends up in the webhook log and (optionally) in an outbound callback.

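As a rough illustration of the request body, the sketch below builds the JSON a post-script might send. The `operation`, `vpg`, `timestamp`, and optional `vms` fields mirror what this recipe's handler and test call use; the function name `New-FailoverPayload` is hypothetical and not part of Webhook Server.

```powershell
# Hypothetical helper: builds the JSON body for the post-failover webhook.
function New-FailoverPayload {
    param(
        [Parameter(Mandatory)] [string] $Vpg,
        [string]   $Operation = 'failover',
        [string[]] $Vms
    )

    $payload = [ordered]@{
        operation = $Operation
        vpg       = $Vpg
        timestamp = (Get-Date).ToUniversalTime().ToString('o')
    }

    # Only include the VM list when the caller supplies one; the handler
    # falls back to every VM it knows about otherwise.
    if ($Vms) { $payload.vms = @($Vms) }

    $payload | ConvertTo-Json -Compress
}

New-FailoverPayload -Vpg 'AppTier' -Vms 'app01', 'app02'
```

Leaving `-Vms` off keeps the post-script generic, at the cost of the handler touching every VM in its map.
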
## Why curl and not Invoke-WebRequest?

Zerto's PowerShell runner is intentionally minimal: many environments run an older Windows on the ZVM and don't have full PowerShell modules installed. `curl.exe` ships with Windows 10 1803+ and Server 2019+ and works without any modules. Calling the endpoint with `curl.exe` also means the request doesn't depend on whichever version of `Invoke-WebRequest` ships with the host's PowerShell.

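One practical wrinkle: in Windows PowerShell 5.1, bare `curl` is an alias for `Invoke-WebRequest`, so scripts should spell out `curl.exe`. A minimal pre-flight check could look like this (the function name `Test-CurlAvailable` is made up for this example):

```powershell
function Test-CurlAvailable {
    # -CommandType Application skips aliases and functions, so this only
    # succeeds when a real curl.exe binary is on the PATH.
    $cmd = Get-Command -Name 'curl.exe' -CommandType Application -ErrorAction SilentlyContinue
    return [bool]$cmd
}

if (-not (Test-CurlAvailable)) {
    Write-Warning 'curl.exe not found on PATH; the post-script cannot call the webhook.'
}
```
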
## 1. The Zerto post-script (client side)

A ready-to-use script ships in this repo at [`scripts/examples/zerto-post-failover.ps1`](../../scripts/examples/zerto-post-failover.ps1). Copy it to the ZVM, edit `$WebhookUrl` and the bearer-token path at the top, and wire it into the VPG:

> **VPG settings → Recovery → Scripts → Post-Recovery Script**
> Path: `C:\Scripts\zerto-post-failover.ps1`
> Parameters: *(leave empty)*

The script is ~50 lines and only depends on `curl.exe` + a token file readable by the ZVM service account.

The flow:

```
Zerto VPG failover starts
   |
   +-- VM is brought up at DR site
   |
   +-- Zerto post-script fires:
   |     curl POST http://webhook.dr/hook/post-failover   (async, returns 202 in ~50ms)
   |
   +-- Zerto sees success, finishes the failover and reports done
   |
   (meanwhile, on the webhook server)
   |
   running PowerShell for several minutes:
     - update DNS
     - wait for VM ready
     - check services on VM
     - notify Teams
```

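The shipped script is not reproduced here. Purely as an illustration of the fire-and-forget pattern, a client-side call could look roughly like the following. `Get-WebhookCurlArgs` and `Send-FailoverWebhook` are hypothetical names, and the body is piped over stdin (`--data-binary '@-'`) on the assumption that this keeps the JSON quoting independent of the PowerShell version.

```powershell
# Hypothetical helper: builds the curl.exe argument list (no network access).
# '@-' tells curl to read the request body from stdin.
function Get-WebhookCurlArgs {
    param(
        [Parameter(Mandatory)] [string] $WebhookUrl,
        [Parameter(Mandatory)] [string] $Token,
        [int] $MaxTimeSeconds = 10
    )
    @(
        '--silent', '--show-error',
        '--max-time', "$MaxTimeSeconds",
        '-X', 'POST',
        '-H', "Authorization: Bearer $Token",
        '-H', 'Content-Type: application/json',
        '--data-binary', '@-',
        $WebhookUrl
    )
}

# Hypothetical wrapper: reads the token file and posts the JSON body.
function Send-FailoverWebhook {
    param(
        [Parameter(Mandatory)] [string] $WebhookUrl,
        [Parameter(Mandatory)] [string] $TokenPath,
        [Parameter(Mandatory)] [string] $JsonBody
    )
    $token    = (Get-Content -LiteralPath $TokenPath -Raw).Trim()
    $curlArgs = Get-WebhookCurlArgs -WebhookUrl $WebhookUrl -Token $token
    $JsonBody | & curl.exe @curlArgs

    # Warn instead of throwing, so a failed notification never fails the
    # Zerto operation itself.
    if ($LASTEXITCODE -ne 0) {
        Write-Warning "curl.exe exited with code $LASTEXITCODE"
    }
}
```

A call would then be `Send-FailoverWebhook -WebhookUrl $WebhookUrl -TokenPath $tokenPath -JsonBody $json`, with all three values supplied by the post-script.
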
## 2. The server-side script (does the actual work)

Save this on the webhook host as `C:\Scripts\post-failover-handler.ps1`:

```powershell
[CmdletBinding()]
param()
$ErrorActionPreference = 'Stop'

# The endpoint passes the JSON request body on stdin
$body = $input | ConvertFrom-Json

# ---------- environment specifics; edit for your site ----------
$dnsServer    = 'dc01.contoso.local'
$forwardZone  = 'contoso.local'
$teamsWebhook = 'https://contoso.webhook.office.com/...'
$drIpMap = @{
    'app01' = '10.42.10.11'
    'app02' = '10.42.10.12'
    'db01'  = '10.42.10.21'
}
$serviceMap = @{
    'app01' = @('W3SVC','MyAppSvc')
    'app02' = @('W3SVC','MyAppSvc')
    # Default SQL Server instance; a named instance uses MSSQL$NAME / SQLAgent$NAME
    'db01'  = @('MSSQLSERVER','SQLSERVERAGENT')
}
# ---------------------------------------------------------------

# Default the VM list to "all VMs we know about" if the post-script didn't
# tell us, so the same handler works without having to embed the VM list in
# every Zerto post-script.
$vms = if ($body.vms) { $body.vms } else { $drIpMap.Keys }

$summary = @()

foreach ($vm in $vms) {
    if (-not $drIpMap.ContainsKey($vm)) {
        $summary += "skip $vm (no DR IP mapping in handler)"
        continue
    }
    $ip = $drIpMap[$vm]
    # Remoting goes by FQDN: with default (Kerberos) authentication, WinRM
    # rejects a bare IP address unless HTTPS or TrustedHosts plus explicit
    # credentials are used.
    $fqdn = "$vm.$forwardZone"

    # 1. DNS - delete + re-add the A record
    try {
        $existing = Get-DnsServerResourceRecord -ZoneName $forwardZone -Name $vm `
            -RRType A -ComputerName $dnsServer -ErrorAction SilentlyContinue
        if ($existing) {
            Remove-DnsServerResourceRecord -ZoneName $forwardZone -Name $vm `
                -RRType A -RecordData $existing.RecordData.IPv4Address `
                -ComputerName $dnsServer -Force
        }
        Add-DnsServerResourceRecordA -ZoneName $forwardZone -Name $vm `
            -IPv4Address $ip -ComputerName $dnsServer -TimeToLive 00:05:00
        # Drop any cached production address on this host before remoting by name
        Clear-DnsClientCache
        $summary += "dns $vm -> $ip"
    } catch {
        $summary += "DNS! $vm $($_.Exception.Message)"
        continue
    }

    # 2. Wait for the VM to be reachable (up to 5 minutes)
    $deadline = (Get-Date).AddMinutes(5)
    $reachable = $false
    while ((Get-Date) -lt $deadline) {
        if (Test-Connection -ComputerName $ip -Count 1 -Quiet -ErrorAction SilentlyContinue) {
            try {
                # Quick WinRM probe; succeeds when the VM has finished booting
                Invoke-Command -ComputerName $fqdn -ScriptBlock { $true } -ErrorAction Stop | Out-Null
                $reachable = $true
                break
            } catch { Start-Sleep -Seconds 10 }
        } else {
            Start-Sleep -Seconds 10
        }
    }
    if (-not $reachable) {
        $summary += "wait! $vm not reachable after 5 minutes"
        continue
    }

    # 3. Check + start critical services on the VM
    if ($serviceMap.ContainsKey($vm)) {
        try {
            $svcReport = Invoke-Command -ComputerName $fqdn -ArgumentList @(,$serviceMap[$vm]) -ScriptBlock {
                param($services)
                $report = @()
                foreach ($s in $services) {
                    $svc = Get-Service -Name $s -ErrorAction SilentlyContinue
                    if (-not $svc) { $report += "$s : missing"; continue }
                    if ($svc.Status -ne 'Running') {
                        Start-Service $s
                        Start-Sleep -Seconds 2
                        $svc.Refresh()
                    }
                    $report += "$s : $($svc.Status)"
                }
                return $report
            }
            $summary += "svc $vm : $($svcReport -join ', ')"
        } catch {
            # A failed Start-Service must not abort the run before Teams is notified
            $summary += "svc! $vm $($_.Exception.Message)"
        }
    } else {
        $summary += "svc $vm (no services configured)"
    }
}

# 4. Notify Teams
$teamsBody = @{
    text = "Webhook post-failover for VPG **$($body.vpg)**:`n" + ($summary -join "`n")
} | ConvertTo-Json
try {
    Invoke-RestMethod -Uri $teamsWebhook -Method POST -ContentType 'application/json' -Body $teamsBody | Out-Null
} catch {
    $summary += "teams! notification failed: $($_.Exception.Message)"
}

# Return the summary so it shows up in the webhook log + outbound callback
$summary -join "`n"
```

Two things to call out:

- **PowerShell remoting to the VM** uses the gMSA's network identity (or whichever account the service runs as). Make sure that account can `Invoke-Command` to the failed-over hosts. Usually that means the account is a local admin on the target VMs, or you've configured constrained delegation.
- **WinRM** must be enabled on the failed-over VMs for the remoting calls to work. `Enable-PSRemoting` is the simplest route, but most production environments configure WinRM via Group Policy.

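Before relying on the handler, it can help to confirm from the webhook host that WinRM answers on each target. The sketch below uses the built-in `Test-WSMan` cmdlet; `Test-RemotingReady` is an illustrative name, not something the project ships.

```powershell
function Test-RemotingReady {
    param([Parameter(Mandatory)] [string] $ComputerName)
    try {
        # Test-WSMan only checks that the WinRM listener responds; it does not
        # prove the service account is allowed to open a remote session.
        Test-WSMan -ComputerName $ComputerName -ErrorAction Stop | Out-Null
        return $true
    } catch {
        return $false
    }
}
```

Run it as the service account, for example `Test-RemotingReady -ComputerName 'app01.contoso.local'`, and follow up with a real `Invoke-Command` to check permissions as well.
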
## 3. Configure the endpoint in the GUI

**File → New endpoint:**

| Section | Setting | Value |
|---|---|---|
| Identity | Slug | `post-failover` |
| Identity | Description | "Zerto post-recovery: DNS + service checks" |
| Auth | Mode | **Bearer** |
| Auth | Bearer secret | generate a 32-byte random string; copy it for the Zerto script's token file |
| Allowed clients | (one per line) | `10.0.0.0/8` *(your ZVM's network)* |
| Executor | Type | **Windows PowerShell** |
| Executor | Script path | `C:\Scripts\post-failover-handler.ps1` |
| Data passing | JSON body to stdin | ✓ |
| Run as | Identity | **Service** if the service runs under a gMSA with the right rights, otherwise **SpecificUser** with a delegated account |
| Response | Mode | **Async** ← critical: this is what makes the Zerto script non-blocking |
| Response | Timeout (sec) | `600` *(this is the cap on the long-running handler script, not the Zerto-facing response)* |
| Response | Fail on non-zero exit | unticked *(async hooks have no caller to receive a 502)* |

Save. Right-click the row → **Copy URL** to grab `http://webhook.dr.contoso.local:8080/hook/post-failover` and paste it into `$WebhookUrl` at the top of the Zerto-side script.

> **Why Bearer instead of HMAC?** Both work. Bearer is simpler: drop the token in a file on the ZVM that's readable by the ZVM service account and you're done. HMAC requires the Zerto-side script to compute a signature, which is doable but adds a few lines of code. Pick what fits your environment.

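For comparison, the signature computation on the Zerto side really is only a few lines. The sketch below computes a hex-encoded HMAC-SHA256 over the request body with .NET's `HMACSHA256` class. The function name is hypothetical, and the header name and encoding that Webhook Server expects are not stated in this recipe, so check the endpoint's HMAC settings before wiring it in.

```powershell
function Get-HmacSha256Hex {
    param(
        [Parameter(Mandatory)] [string] $Secret,
        [Parameter(Mandatory)] [string] $Body
    )
    $utf8 = [System.Text.Encoding]::UTF8
    $hmac = [System.Security.Cryptography.HMACSHA256]::new([byte[]]$utf8.GetBytes($Secret))
    try {
        $hash = $hmac.ComputeHash($utf8.GetBytes($Body))
        # Lower-case hex, two characters per byte
        return (($hash | ForEach-Object { $_.ToString('x2') }) -join '')
    } finally {
        $hmac.Dispose()
    }
}
```

The signature must be computed over the exact bytes that are sent, so build the JSON string once and use the same string for both the signature and the request body.
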
## 4. Wire up the bearer token

Place the bearer token in a file that only the ZVM service account (and administrators) can read:

```powershell
# on the ZVM, from elevated PowerShell
# The token must match the endpoint's Bearer secret: paste the value from the
# GUI, or generate one here and enter it in the GUI afterwards.
$token     = (New-Guid).ToString('N')
$tokenPath = 'C:\ProgramData\Zerto\webhook-token.txt'
# ASCII avoids the byte-order mark that Windows PowerShell 5.1 writes with -Encoding utf8
$token | Out-File -LiteralPath $tokenPath -Encoding ascii -NoNewline
icacls $tokenPath /inheritance:r /grant 'NT SERVICE\Zerto Online Services:R' 'BUILTIN\Administrators:F' /T
```

Adjust the service account name to whatever Zerto runs as on your version. The script reads the token from this path, so nothing else needs to change in the script itself.

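A GUID carries at most 122 random bits. If you want the full 32 random bytes suggested in the endpoint table, here is a sketch using .NET's cryptographic random number generator. `New-BearerToken` is a made-up name, and the resulting value still has to be entered as the endpoint's Bearer secret so both sides match.

```powershell
function New-BearerToken {
    param([int] $ByteCount = 32)
    $bytes = New-Object byte[] $ByteCount
    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    try { $rng.GetBytes($bytes) } finally { $rng.Dispose() }
    # Hex keeps the token free of characters that need escaping in an HTTP header
    return (($bytes | ForEach-Object { $_.ToString('x2') }) -join '')
}

$token = New-BearerToken   # 32 bytes, rendered as 64 hex characters
```
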
## 5. Test before going live

In a maintenance window, fire the webhook by hand:

```powershell
# from any machine that can reach the webhook server
$body = @{
    operation = 'test'
    vpg       = 'SmokeTest'
    timestamp = (Get-Date).ToUniversalTime().ToString('o')
} | ConvertTo-Json -Compress

# Pipe the body over stdin (--data-binary '@-'); passing JSON as a command-line
# argument loses its double quotes in Windows PowerShell 5.1.
$body | curl.exe --silent --show-error --max-time 10 -X POST `
    -H "Authorization: Bearer paste-the-token" `
    -H "Content-Type: application/json" `
    --data-binary '@-' `
    http://webhook.dr.contoso.local:8080/hook/post-failover
```

You'll get back `{"runId":"…","accepted":true}` immediately. Open the Webhook Server GUI and watch the log panel; within 30 seconds or so you'll see lines for the run. Confirm that the DNS records were updated, the services on each VM ended in `Running`, and the Teams notification arrived.

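The handler marks failed steps with a `!` after the step name (`DNS!`, `wait!`, `teams!`). When reviewing a test run, a small filter over the summary text can surface just those lines. This is a sketch against the summary format used in this recipe; `Get-FailedSteps` is a hypothetical helper and the sample text is invented.

```powershell
function Get-FailedSteps {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $Summary)
    # A failure line starts with a step name immediately followed by "!"
    $Summary -split "`r?`n" | Where-Object { $_ -match '^\s*\w+!' }
}

$sample = @'
dns app01 -> 10.42.10.11
svc app01 : W3SVC : Running, MyAppSvc : Running
wait! app02 not reachable after 5 minutes
'@

Get-FailedSteps -Summary $sample
```
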
## Variations

### Different actions for failover vs. failback

Pass an `operation` field in the body and branch on it. The Zerto-side script already sends `operation = 'failover'`. Add a separate post-failback script (or detect from `$env:ZertoOperationType`) that sends `operation = 'failback'` and have the handler revert DNS to production IPs.

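One way to structure that branch is to keep one IP map per direction and pick by `operation`. In this sketch, `Get-TargetIp` and the `192.168.10.x` production addresses are hypothetical; the DR addresses are the ones from the handler above.

```powershell
$ipMaps = @{
    failover = @{ 'app01' = '10.42.10.11'; 'app02' = '10.42.10.12'; 'db01' = '10.42.10.21' }
    # Hypothetical production addresses used when failing back
    failback = @{ 'app01' = '192.168.10.11'; 'app02' = '192.168.10.12'; 'db01' = '192.168.10.21' }
}

function Get-TargetIp {
    param(
        [Parameter(Mandatory)] [hashtable] $Maps,
        [Parameter(Mandatory)] [string]    $Operation,
        [Parameter(Mandatory)] [string]    $Vm
    )
    if (-not $Maps.ContainsKey($Operation)) {
        throw "Unknown operation '$Operation'; expected one of: $($Maps.Keys -join ', ')"
    }
    # Returns $null when the VM has no mapping, so callers can skip it
    return $Maps[$Operation][$Vm]
}

Get-TargetIp -Maps $ipMaps -Operation 'failback' -Vm 'db01'
```

Rejecting unknown operations outright means a typo in a post-script shows up as a failed run in the log instead of a silent DNS change.
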
### Per-VPG endpoints

If you want fine-grained access control or different actions per VPG, create one endpoint per VPG (`post-failover-app`, `post-failover-db`, …) and give each its own bearer token. The GUI handles dozens of endpoints fine.

### Audit trail to a SIEM

Each endpoint can have an outbound **Callback** URL. Configure it with your SIEM's HTTP collector + an HMAC secret, and every run produces a JSON record with runId, exit code, duration, stdout, and stderr, which gives you a per-run audit trail for compliance.