fix(plugins): sync all 171 plugin SKILL.md files with fixed skills/ versions

Propagates Anti-Patterns sections, description rewrites, Required Inputs additions, and Quality Checks format fixes from skills/ to matching plugin SKILL.md copies. https://claude.ai/code/session_01MuGKn3a3Gbqoe8uM5Lmuqt
2026-06-08 13:06:21 +00:00
parent fb85a1cb55
commit affae033fe
171 changed files with 1428 additions and 56 deletions
@@ -1,6 +1,6 @@
 ---
 name: ai-ethics-review
-description: "Conduct an ethical review of an AI or ML feature, model, or product. Use when asked to run an AI ethics review, assess AI risks, audit a model for bias, or produce an AI impact assessment. Produces a structured ethics review covering fairness, transparency, privacy, safety, accountability, and societal impact with prioritised mitigations."
+description: "Conduct a structured ethical review of an AI or ML feature, model, or product. Use when preparing to deploy an AI system, assessing algorithmic risk, auditing a model for bias, or producing a responsible AI impact assessment. Produces a structured ethics review covering fairness, transparency, privacy, safety, accountability, and societal impact with a risk tier score, pre-deployment checklist, and prioritised mitigations."
 ---

 # AI Ethics Review Skill
@@ -198,6 +198,14 @@ Ask the user for these if not provided:
 - [ ] Accountability section names real people, not teams or roles
 - [ ] Mitigations are specific and time-bound — not "monitor and review"

+## Anti-Patterns
+
+- [ ] Do not limit the affected-population analysis to users of the product — AI that makes decisions about people (hiring, credit, content moderation) affects non-users who have no opt-out
+- [ ] Do not accept "we will monitor" as a mitigation without specifying what is monitored, at what threshold, and who acts
+- [ ] Do not assign fairness analysis to the model team alone — protected characteristic analysis requires input from legal, HR, or a subject-matter expert
+- [ ] Do not defer the DPIA to post-launch — for high-risk tier systems, a DPIA is a pre-requisite for lawful deployment under GDPR
+- [ ] Do not conflate statistical accuracy with fairness — a model can be 95% accurate overall while performing significantly worse for a protected group
+
 ## Example Trigger Phrases

 - "Run an AI ethics review for [feature]"
@@ -152,6 +152,14 @@ Ask the user for these if not provided:
 - **Available data** (what training/inference data exists)
 - **ML/AI lead** (who owns the technical implementation)

+## Anti-Patterns
+
+- [ ] Do not skip the "Why AI?" question — if the answer is "we want to use AI," stop and reframe around the user problem first
+- [ ] Do not launch with an undefined accuracy threshold — "good enough" is not a threshold; set a number before build begins
+- [ ] Do not design the UX to hide AI-generated output as if it were system truth — users need to know when AI is involved so they can override it
+- [ ] Do not defer the Responsible AI checklist to post-launch — bias and privacy issues are far harder to fix in production than in design
+- [ ] Do not treat model latency as a post-launch optimisation — a 6-second AI response that replaces a 1-second rule-based response is a regression, not a feature
+
 ## Quality Checks

 - [ ] "Why AI?" is answered clearly (not "because we can")
@@ -73,3 +73,11 @@ Ask the user for these if not provided:
 - [ ] Success criteria are measurable or observable (not "looks good")
 - [ ] Out-of-scope section names at least one thing that might seem in scope but isn't
 - [ ] Technical constraints are specific enough for an engineer to confirm
+
+## Anti-Patterns
+
+- [ ] Do not write the user goal in feature language ("design the checkout flow") — it must be written from the user's perspective with a motivation and outcome
+- [ ] Do not skip the "Explicitly Out of Scope" section — without it, designers will inadvertently solve problems not intended for this iteration
+- [ ] Do not list edge cases that are so generic they apply to any feature (e.g. "handle errors") — each edge case must be specific to this feature's failure modes
+- [ ] Do not hand off the brief without confirming engineering constraints are accurate — a constraint that is wrong is worse than no constraint
+- [ ] Do not omit the emotional context of the user — designs without emotional grounding produce technically correct but experientially flat results
@@ -67,3 +67,11 @@ Ask the user for these if not provided:
 - [ ] Test was not stopped early (or flagged clearly if it was)
 - [ ] Practical significance assessed separately from statistical significance
 - [ ] Sample ratio mismatch is checked in results interpretation
+
+## Anti-Patterns
+
+- [ ] Do not define success criteria after seeing preliminary results — post-hoc success definitions are HARKing (Hypothesising After Results are Known) and invalidate the experiment
+- [ ] Do not stop a test early because the result looks significant — early stopping dramatically inflates false positive rates; the test must run to the planned sample size
+- [ ] Do not treat statistical significance as the same as practical significance — a p < 0.05 result with a 0.1% lift is real but may not be worth shipping
+- [ ] Do not run the same experiment on the same population multiple times without correction — multiple testing inflates the chance of a false positive proportionally
+- [ ] Do not use more than one primary metric — multiple primary metrics require multiple hypothesis corrections and make the ship/kill decision ambiguous
@@ -1,6 +1,6 @@
 ---
 name: multi-source-signal-synthesiser
-description: "Synthesise user signals from multiple research sources into a unified insight brief, reconciling conflicting feedback. Use when asked to make sense of data from multiple sources, synthesise user research, reconcile conflicting feedback, or when the user says 'what are users really telling us' or 'make sense of all this user data'. Produces ranked insights with confidence ratings, divergent signal analysis, and research gap identification."
+description: "Synthesises user signals from multiple research sources into a unified, weighted insight brief. Use when you have data from interviews, support tickets, NPS verbatims, app reviews, or sales calls and need to reconcile contradictions, surface the underlying need behind requests, or answer 'what are users really telling us'. Produces ranked insights with confidence ratings, source weighting rationale, divergent signal analysis by user segment, and a research gap identification section."
 ---

 # Multi-Source Signal Synthesiser Skill
@@ -60,3 +60,11 @@ Ask the user for these if not provided:
 - [ ] Divergent signals identify the specific user segments, not just "some users disagree"
 - [ ] Confidence ratings are consistent with source diversity and weighting
 - [ ] "What the data does NOT tell us" section is honest about gaps
+
+## Anti-Patterns
+
+- [ ] Do not echo surface-level feature requests as insights — translate every request to the underlying need before including it as a finding
+- [ ] Do not assign High confidence to insights supported by only one source type — confidence requires corroboration across at least two distinct source types
+- [ ] Do not treat all sources as equally weighted — a single interview quote and a pattern across 200 support tickets are not comparable signals
+- [ ] Do not collapse divergent signals into a single finding — where user segments have genuinely different needs, name the segments explicitly rather than averaging them away
+- [ ] Do not omit the research gap section when key decisions rest on thin data — acting on low-confidence findings without flagging the gaps misleads product teams