Files
webhook-server/docs/troubleshooting.md
T
justin f00ee0cf3a v0.1.2: Config Checkpoints dialog, descriptions, daily auto-snapshot, docs (#3)
* Documentation: install/upgrade/uninstall guides + recipes incl. Zerto

Adds a docs/ folder under the repo root with full operator documentation
aimed at sysadmins (not webhook developers). The Zerto pre/post script
recipe is the canonical "why does this exist" walkthrough; the GitHub
HMAC, AD password reset, and UI-on-desktop recipes round out common
patterns.

Pages:
- README.md (index)
- concepts.md (5-minute "what is a webhook" explainer)
- installation.md (interactive + silent install)
- upgrading.md (single-click upgrade flow + edge cases)
- uninstalling.md (clean removal + wiping ProgramData)
- runas-modes.md (Service / InteractiveUser / SpecificUser decision flow)
- service-account-and-ad.md (gMSA setup, delegated rights)
- network-and-security.md (bind addresses, allowlists, HTTPS, secret storage)
- troubleshooting.md (symptom -> first check, common errors)
- recipes/zerto-pre-post-scripts.md (canonical use case)
- recipes/github-style-hmac.md (GitHub / Stripe-shaped webhooks)
- recipes/ad-password-reset.md (gMSA-backed self-service reset)
- recipes/ui-on-desktop.md (InteractiveUser pattern)

Top-level README.md restructured to point at docs/ as the source of
truth, dropping the duplicated installation snippets.

Installer ships docs/ alongside the binaries so they're available
offline at C:\Program Files\WebhookServer\docs\. GUI Help menu gains
a "Documentation" item that opens the docs site in a browser.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Config Checkpoints dialog + daily auto-checkpoint; drop installer GUI launch

Three fixes:

1. Config Checkpoints submenu replaced with a proper dialog. Lists
   checkpoints with timestamp/size/filename, has a "Take Checkpoint
   Now" button, and a "Roll Back" button that becomes enabled when a
   row is selected. The previous click-a-menu-entry-immediate-restore
   flow was too easy to fire by accident.

2. New CheckpointScheduler BackgroundService creates a checkpoint at
   midnight every day. Combined with the existing auto-on-save
   snapshots, this guarantees a daily rollback point even if the
   config wasn't edited that day. A new "create-checkpoint" admin op
   plus AdminPipeServer.CreateCheckpoint helper does the actual file
   copy; both manual (via the dialog) and the scheduler use it.

3. Installer: drop the post-install "Launch Webhook Server" wizard
   step. It tried to launch the GUI un-elevated, which fails because
   the GUI's manifest is requireAdministrator. The Start Menu shortcut
   handles elevation correctly, so the user can launch from there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Docs: replace AD-reset recipe with realistic Zerto failover walkthrough

The AD password reset endpoint was a poor fit for what people actually
need this server for. Replaced with a realistic Zerto post-failover
example that's much closer to the project's purpose:

- Update DNS A records for failed-over hostnames
- Wait for the VM to come up at the DR site
- PowerShell-remote into the VM and check / start critical services
- Notify Teams with the result

The flagship pattern is now: Zerto post-script (curl, fire-and-forget)
calls an Async webhook endpoint -> 202 in milliseconds -> Zerto's
failover sequence is never blocked. The server runs the actual work in
the background, with full output captured in the daily log.

A ready-to-use Zerto-side script ships at
scripts/examples/zerto-post-failover.ps1 - pure curl.exe (no
PowerShell modules), reads the bearer token from a file the ZVM
service account can read.

The installer now bundles scripts/examples/ alongside docs/ so the
example is also available locally at
C:\Program Files\WebhookServer\scripts\examples\.

Removed: docs/recipes/ad-password-reset.md.
Updated: docs/README.md, README.md, the recipe content itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Restore installer GUI launch (via shellexec) + checkpoint descriptions

Two follow-ups to the previous Config Checkpoints commit:

1. Bring back the post-install "Launch Webhook Server" checkbox in the
   installer. The previous attempt failed because Inno Setup's
   postinstall flag launches via CreateProcess after Setup exits,
   bypassing the GUI's requireAdministrator manifest. Adding the
   shellexec flag switches to ShellExecute, which DOES honor the
   manifest and triggers a clean UAC prompt - so the post-install
   GUI launch works as expected.

2. Each checkpoint now carries a description, stored in a sidecar
   .meta.json file next to the snapshot. Defaults:
     - Auto-on-save: "Before save"
     - Midnight scheduler: "Nightly auto-checkpoint"
     - Manual: opens a small dialog so the user can type a meaningful
       description (defaults to "Manual checkpoint" if blank)
   The dialog and pruning both clean up sidecars alongside snapshots.
   The Config Checkpoints grid grows a Description column between
   When and Size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.1.2: bump checkpoint retention 30 -> 90

Each checkpoint is a few KB of JSON plus a tiny sidecar; even at 90
entries on a config with hundreds of endpoints the on-disk footprint
is negligible (worst case ~20 MB). With daily auto-checkpoints plus
on-save snapshots, 30 entries could fill in a couple weeks of
moderate use; 90 gives a comfortable ~3-month window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:49:09 -04:00

137 lines
6.8 KiB
Markdown

# Troubleshooting
This page indexes the most common ways things go wrong, where to look, and what to do.
## Where to look first
| Symptom | First check |
|---|---|
| GUI shows "Disconnected" | Service running? `Get-Service WebhookServer` |
| Hook returns 404 | Slug typo, or endpoint disabled |
| Hook returns 401 | Auth header / signature mismatch |
| Hook returns 403 | IP allowlist denies the caller |
| Hook returns 200 but nothing happens | Response is the script's stdout — check exit code, stderr |
| Hook returns 502 | Script ran and exited non-zero. Body has stderr. |
| Hook returns 500 | Launch error (script not found, invalid path) |
| Hook hangs | Timeout reached, or script is waiting on stdin |
| Calc / UI doesn't appear despite InteractiveUser | See [Run As modes](runas-modes.md) — common pitfalls |
## Where the logs are
`C:\ProgramData\WebhookServer\logs\webhook-YYYYMMDD.log` — daily rolling, 14-day retention by default.
Every webhook run logs:
- `[INF] Run <id> <slug> ok exit=0 dur=<ms>ms stdout=...` on success
- `[WRN] Run <id> <slug> non-zero exit=<n> dur=<ms>ms stdout=... stderr=...` on script failure
- `[WRN] Run <id> <slug> failed to launch: <reason>` on launch failure
- `[WRN] Run <id> <slug> timed out after <s>s; process killed` on timeout
The GUI's bottom panel auto-refreshes the same log file every 3 seconds. Tick the **Auto-scroll** checkbox to keep it pinned to the latest line.
## Common issues
### "Disconnected: Access to the path is denied" right after install
You launched the GUI without elevation. The admin pipe ACL is `SYSTEM` + `Administrators`-full-control; UAC token splitting denies the standard token.
**Fix in v0.1.1+**: nothing — the GUI's manifest is `requireAdministrator` and Start Menu / shortcut launches auto-elevate.
**Fix in v0.1.0**: right-click the Start Menu shortcut → **Run as administrator**, or upgrade.
### "Connection refused" hitting the hook URL
Three possibilities, in order of probability:
1. **Service stopped.** `Get-Service WebhookServer` and `Start-Service WebhookServer` if needed.
2. **Wrong port.** Default is 8080. Check **Server → Settings → HTTP port** in the GUI, or `netstat -an | findstr :8080`.
3. **Bound to a specific NIC and you're calling on another.** Check **Server → Settings → Listen on**. If "Listen on all interfaces" is unchecked and you only ticked LAN IPs, calls to `localhost` may fail. Tick `127.0.0.1` too.
### Hook works from `localhost` but not from another machine on the LAN
Windows Firewall. The installer doesn't add a firewall rule (intentional — you should choose your scope). Add one:
```powershell
# from elevated PowerShell on the webhook host
New-NetFirewallRule -DisplayName "Webhook Server HTTP 8080" -Direction Inbound `
-Action Allow -Protocol TCP -LocalPort 8080 -Profile Domain,Private
```
Use `-Profile Public` only if you really mean it. Better: front the server with a reverse proxy and don't expose 8080 directly.
### `[WRN] Run … failed to launch: launch error: An error occurred trying to start process 'X'. Access is denied.`
Likely **SpecificUser mode + `psi.UserName`** failure. Should be impossible in v0.1.1+ (we use `LogonUser` + `CreateProcessAsUser` directly). If you see this on v0.1.1, double-check the version: `Get-Item "C:\Program Files\WebhookServer\WebhookServer.Service.exe" | % VersionInfo`.
### `[WRN] Run … failed to launch: LogonUser (DOMAIN\user) failed`
The credentials don't authenticate. Common causes:
- Typo in the password (paste it back into the GUI to verify; the field is plaintext for an admin user)
- Account locked / disabled / expired
- The account is denied the right logon types — check `secpol.msc` → Local Policies → User Rights Assignment → "Deny logon as a batch job" / "Deny logon locally"
- For domain accounts: the host can't reach a DC
### `non-zero exit=-1073741502` (`0xC0000142` STATUS_DLL_INIT_FAILED)
The new process couldn't initialize. With **InteractiveUser** mode this means we tried to open `winsta0\default` and the user's session token doesn't have access (e.g., no one's logged in). With **SpecificUser** this should not occur in v0.1.1+ — we deliberately don't set lpDesktop for that mode.
### Hook returns 502 with empty stdout/stderr
The script's exit was non-zero but it didn't print anything. PowerShell's `$ErrorActionPreference = 'Stop'` is your friend — turn it on at the top of the script and any cmdlet failure becomes terminating with a clear message in stderr.
### "ServiceState: ListenerSettingsChanged" → service restart
After saving Server Settings with a port or HTTPS change, the service stops itself so the SCM restarts it on the new bindings. The GUI briefly shows "Disconnected" then reconnects. If it doesn't reconnect within ~10 seconds:
```powershell
Get-Service WebhookServer | Format-List Status, StartType
```
If the service is in `Stopped`, the SCM didn't restart it (failure-recovery only kicks in on *abnormal* termination, and a clean stop doesn't qualify). Manual:
```powershell
Start-Service WebhookServer
```
### GUI editor changes don't seem to take effect
After saving an endpoint, the service loads the new config in memory immediately — no restart needed. If a hook is mid-run when you save, that run finishes against the OLD config; the new config applies to subsequent runs.
If the GUI's grid still shows old values, hit any other endpoint or wait for the 3-second poll to refresh the display.
### Tray icon doesn't appear
Check whether the GUI is running: `Get-Process WebhookServer.Gui`. If not, the tray icon doesn't exist (it's part of the GUI process). To have a persistent tray independent of the main window, leave the GUI running and minimize it — it'll hide-to-tray rather than truly close.
To run the GUI minimized at login: create a Windows shortcut to `WebhookServer.Gui.exe`, set "Run" to "Minimized" in the shortcut properties, and put it in your user's Startup folder (`shell:startup`). The auto-elevate manifest still takes effect.
## Getting useful logs from a script
Inside your hook scripts, write to stderr for diagnostic info — Webhook Server logs stderr separately from stdout, and stderr is preserved even on success:
```powershell
[Console]::Error.WriteLine("processing item $i of $total")
```
Or use `Write-Error` which produces non-fatal errors:
```powershell
Write-Error "skipping bogus input" # stderr but doesn't terminate
```
The full stderr appears in the log line for the run, plus in the response body for sync calls.
## Asking for help
If you're stuck, file an issue at:
> https://github.com/recklessop/webhook-server/issues
Include:
- Webhook Server version (Help → About, or the file version of the `.exe`)
- Windows version (`winver`)
- The slug + relevant bits of the endpoint config (NOT the secrets)
- The log lines for the failing run (search for the runId)
- What you expected vs. what happened