Merge pull request #6 from mohitagw15856/feat/v7-engineering-skills

feat: v7.0.0 — 6 new engineering skills, star milestone tracker, SKILL_REQUEST.md
mohitagw15856
2026-04-23 15:24:27 +01:00
committed by GitHub
16 changed files with 1447 additions and 26 deletions
+4 -4
@@ -1,8 +1,8 @@
{
"$schema": "https://anthropic.com/claude-code/marketplace.schema.json",
"name": "pm-claude-skills",
"version": "6.0.0",
"description": "100 Claude Skills across 14 professions — product management, legal, finance, HR, sales, engineering, design, Figma, operations, research, and more. Now with standardised quality checks, required inputs, and 7 new skills.",
"version": "7.0.0",
"description": "106 Claude Skills across 15 professions — product management, engineering, legal, finance, HR, sales, design, Figma, operations, research, and more. Includes 6 new engineering skills: debugging, PR descriptions, system design, changelogs, test strategy, and runbooks.",
"owner": {
"name": "Mohit Aggarwal",
"email": "mohit15856@gmail.com"
@@ -82,8 +82,8 @@
},
{
"name": "pm-engineering",
"description": "Engineering & tech skills: Code Review Checklist, Incident Postmortem, API Docs Writer, Architecture Decision Record. Structured outputs for engineering teams and technical PMs.",
"version": "1.1.0",
"description": "Engineering & tech skills: Code Review Checklist, Incident Postmortem, API Docs Writer, Architecture Decision Record, Debugging Log Analyser, PR Description Writer, System Design Interview, Changelog Generator, Test Strategy Doc, Runbook Writer. 10 structured skills for engineering teams, SREs, and technical PMs.",
"version": "2.0.0",
"category": "productivity",
"source": "./plugins/pm-engineering",
"homepage": "https://github.com/mohitagw15856/pm-claude-skills"
+67 -19
@@ -1,10 +1,16 @@
# 🧠 Claude Skills Library — 100 Skills for Every Profession
# 🧠 Claude Skills Library — 106 Skills for Every Profession
> **Save 8–10 hours per week across 15 professions. Install in 2 minutes. Now with 100 skills and comprehensive quality improvements across every skill.**
[![Stars](https://img.shields.io/github/stars/mohitagw15856/pm-claude-skills?style=social)](https://github.com/mohitagw15856/pm-claude-skills/stargazers)
[![Skills](https://img.shields.io/badge/skills-106-blue)](https://github.com/mohitagw15856/pm-claude-skills)
[![Version](https://img.shields.io/badge/version-7.0.0-brightgreen)](https://github.com/mohitagw15856/pm-claude-skills/releases)
[![Install](https://img.shields.io/badge/Install%20in%20Claude%20Code-2%20minutes-orange)](https://github.com/mohitagw15856/pm-claude-skills#-quick-install-2-minutes)
[![License](https://img.shields.io/badge/license-MIT-lightgrey)](LICENSE)
A community-built library of Claude Skills covering product management, marketing, engineering, data, design, Figma, leadership, legal, finance, HR, sales, operations, research, education, and more. Each skill is a structured SKILL.md file that teaches Claude how to produce professional-grade outputs for your specific workflows.
> **Save 8–10 hours per week across 15 professions. Install in 2 minutes. Now with 106 skills including 6 new engineering skills.**
**🆕 Latest release (v6.0.0):** 100 skills milestone — 7 new skills added, plus quality improvements across all 93 existing skills (standardised descriptions, Required Inputs sections, and Quality Checks on every skill).
A community-built library of Claude Skills covering product management, engineering, marketing, data, design, Figma, leadership, legal, finance, HR, sales, operations, research, education, and more. Each skill is a structured SKILL.md file that teaches Claude how to produce professional-grade outputs for your specific workflows.
**🆕 Latest release (v7.0.0):** 6 new engineering skills added — Debugging Log Analyser, PR Description Writer, System Design Interview, Changelog Generator, Test Strategy Doc, and Runbook Writer. The `pm-engineering` bundle now has 10 skills.
---
@@ -19,6 +25,7 @@ Or install by profession:
claude plugin install pm-essentials@pm-claude-skills # Core PM + Word tracked changes
claude plugin install pm-delivery@pm-claude-skills # Delivery + PowerPoint auditor
claude plugin install pm-engineering@pm-claude-skills # Engineering + DevOps (10 skills) 🆕
claude plugin install pm-data@pm-claude-skills # Data + chart data extractor
claude plugin install pm-legal@pm-claude-skills # Legal
claude plugin install pm-finance@pm-claude-skills # Finance
@@ -39,9 +46,40 @@ ln -s ~/pm-claude-skills/skills/* ~/.claude/skills/
---
## 🆕 What's New in v6.0.0 — 100 Skills Milestone
## 🎬 See It in Action
**7 new skills added:**
**Debugging Log Analyser** — paste a stack trace or error log, get a structured root cause diagnosis with probable cause, affected code path, a specific fix, and next debugging steps.
**PR Description Writer** — share your diff or commit list, get a reviewer-friendly PR description with summary, changes made, testing steps, and reviewer notes.
**Sprint Planning Skill** — paste your sprint goals and backlog items, get a complete structured sprint plan with capacity, commitments, risks, and a day-one kickoff agenda.
> 📹 Drop a demo in [Discussions](../../discussions) and we'll feature it here.
---
## 🆕 What's New in v7.0.0 — Engineering Skills Expansion
**6 new engineering skills added to `pm-engineering`:**
| Skill | Bundle | What It Does |
|---|---|---|
| **Debugging Log Analyser** 🆕 | pm-engineering | Parse stack traces and error logs into a structured root cause diagnosis with a specific fix |
| **PR Description Writer** 🆕 | pm-engineering | Write reviewer-friendly PR descriptions from a diff, commit list, or change summary |
| **System Design Interview** 🆕 | pm-engineering | Structure complete system design answers with capacity estimates, component deep-dives, and trade-offs |
| **Changelog Generator** 🆕 | pm-engineering | Convert git commits into a polished, user-facing changelog following Keep a Changelog format |
| **Test Strategy Doc** 🆕 | pm-engineering | Write a complete test strategy with risk assessment, test types, coverage targets, and P0/P1 test cases |
| **Runbook Writer** 🆕 | pm-engineering | Write operational runbooks for deployments, incidents, and maintenance with exact commands and rollback steps |
The `pm-engineering` bundle now has **10 skills** — the most complete engineering toolkit in the library.
**Read the full story:** [Part 14 — I Rebuilt All 93 Skills and Added 7 More: What 100 Skills Taught Me About What Makes a Great Skill](https://medium.com/product-powerhouse/a-pull-request-made-me-rebuild-all-93-of-my-claude-skills-then-i-added-7-more-16d5fe3e7f85)
---
## 📖 v6.0.0 — 100 Skills Milestone
**7 skills added:**
| Skill | Bundle | What It Does |
|---|---|---|
@@ -53,13 +91,6 @@ ln -s ~/pm-claude-skills/skills/* ~/.claude/skills/
| **Sales Forecasting Model** | pm-sales | Pipeline-based forecast with stage model, scenario analysis, assumption log, and activity sanity check |
| **Tax Planning Checklist** | pm-finance | Year-end tax planning review framework across income, pension, CGT, business reliefs, and ISAs |
**Quality improvements across all 93 existing skills:**
- Every skill now has a standardised `description` field using the "Verb the thing. Use when X. Produces Y." format
- Every skill now has a `Required Inputs` section prompting Claude to ask for missing information before executing
- Every skill now has a `Quality Checks` section with specific checkboxes Claude verifies before delivering output
**Read the full story:** [Part 14 — I Rebuilt All 93 Skills and Added 7 More: What 100 Skills Taught Me About What Makes a Great Skill](https://medium.com/product-powerhouse/a-pull-request-made-me-rebuild-all-93-of-my-claude-skills-then-i-added-7-more-16d5fe3e7f85)
---
## 📚 The Article Series
@@ -85,7 +116,7 @@ This repo was built alongside a published article series. Read the full story:
---
## 🗂️ All 100 Skills
## 🗂️ All 106 Skills
### 🛠️ Product Management (Skills 1–34)
**Bundles:** `pm-essentials` · `pm-discovery` · `pm-planning` · `pm-delivery` · `pm-analytics` · `pm-strategy` · `pm-advanced` · `pm-rituals`
@@ -120,7 +151,7 @@ This repo was built alongside a published article series. Read the full story:
---
### 👩‍💻 Engineering & Tech (Skills 41–44)
### 👩‍💻 Engineering & Tech (Skills 41–50)
**Bundle:** `pm-engineering`
| # | Skill | Folder | What It Does |
@@ -129,6 +160,12 @@ This repo was built alongside a published article series. Read the full story:
| 42 | **Incident Postmortem** | `skills/incident-postmortem/` | Blameless postmortems with timeline, RCA, impact, and action items |
| 43 | **API Docs Writer** | `skills/api-docs-writer/` | Developer-facing API docs: endpoints, parameters, response schemas, code examples |
| 44 | **Architecture Decision Record** | `skills/architecture-decision-record/` | ADRs with context, options considered, decision, consequences, and risks |
| 45 | **Debugging Log Analyser** 🆕 | `skills/debugging-log-analyser/` | Parse stack traces and error logs into a structured root cause diagnosis with a specific fix |
| 46 | **PR Description Writer** 🆕 | `skills/pr-description-writer/` | Write reviewer-friendly PR descriptions from a diff, commit list, or change summary |
| 47 | **System Design Interview** 🆕 | `skills/system-design-interview/` | Structure complete system design answers with capacity estimates, component deep-dives, and trade-offs |
| 48 | **Changelog Generator** 🆕 | `skills/changelog-generator/` | Convert git commits into a polished, user-facing changelog following Keep a Changelog format |
| 49 | **Test Strategy Doc** 🆕 | `skills/test-strategy-doc/` | Write a complete test strategy with risk assessment, test types, coverage targets, and P0/P1 test cases |
| 50 | **Runbook Writer** 🆕 | `skills/runbook-writer/` | Write operational runbooks for deployments, incidents, and maintenance with exact commands and rollback steps |
---
@@ -329,7 +366,7 @@ description: "One sentence. Use when [trigger condition]. Produces [output descr
| `pitch-deck-feedback` | Startup | Investor-perspective critique of a pitch deck |
| `board-minutes` | Governance | Formal board meeting minutes from discussion notes |
Have a skill idea? [Open an issue](../../issues) or raise it in [Discussions](../../discussions).
Have a skill idea? Add it to [SKILL_REQUEST.md](SKILL_REQUEST.md), [open an issue](../../issues), or raise it in [Discussions](../../discussions). Most-voted requests get built first.
**Contributors** get credited in this README and in the article series. 🙌
@@ -352,7 +389,7 @@ claude plugin install pm-strategy@pm-claude-skills
claude plugin install pm-advanced@pm-claude-skills
claude plugin install pm-rituals@pm-claude-skills
claude plugin install pm-gtm@pm-claude-skills
claude plugin install pm-engineering@pm-claude-skills
claude plugin install pm-engineering@pm-claude-skills # 10 engineering skills 🆕
claude plugin install pm-data@pm-claude-skills
claude plugin install pm-people@pm-claude-skills
claude plugin install pm-design@pm-claude-skills
@@ -410,9 +447,20 @@ Learn more: [Anthropic's Skills documentation](https://code.claude.com/docs/en/s
---
## ⭐ If this is useful
## ⭐ Star Milestones
Star the repo so others can find it. And if you build something with these skills — raise a PR, open a discussion, or tag me in your article.
Stars unlock the next wave of skills. Here's the roadmap:
| Milestone | Unlocks | Status |
|---|---|---|
| 100 ⭐ | 10 Figma skills + quality rebuild across all 93 skills | ✅ Shipped (v6.0.0) |
| 250 ⭐ | 10 Customer Success skills (health scorecard, QBR deck, escalation brief, churn analysis) | 🔒 Locked |
| 500 ⭐ | 25 more Engineering skills (CI/CD playbooks, SLO templates, onboarding docs, debugging patterns) | 🔒 Locked |
| 1000 ⭐ | Full Startup Founder kit (fundraising memo, pitch critique, co-founder equity split) | 🔒 Locked |
**[⭐ Star this repo to unlock the next milestone →](https://github.com/mohitagw15856/pm-claude-skills)**
Want a specific skill built? [Vote or request in SKILL_REQUEST.md](SKILL_REQUEST.md).
---
+65
@@ -0,0 +1,65 @@
# Skill Requests — Community Voting Board
Have an idea for a skill? Add it here or upvote existing requests by leaving a 👍 reaction on the issue.
---
## How to Request a Skill
1. [Open an issue](../../issues/new) with the label `skill-request`
2. Include:
- **Skill name** (what you'd call it)
- **Profession** (who uses this)
- **Trigger** (when would you invoke it — e.g. "when I need to write X")
- **Output** (what should Claude produce)
3. Drop a 👍 on existing requests you'd use — most-voted get built first
---
## Milestone Unlocks
Stars drive the roadmap. Here's what's queued:
| Milestone | Unlocks |
|---|---|
| ✅ 100 ⭐ | 10 Figma skills + quality rebuild across all skills (v6.0.0) |
| 🔒 250 ⭐ | 10 Customer Success skills (health scorecard, QBR deck, escalation brief, churn analysis) |
| 🔒 500 ⭐ | 25 more Engineering skills (CI/CD playbooks, debugging deep-dives, onboarding docs, SLO templates) |
| 🔒 1000 ⭐ | Full Startup Founder kit (fundraising memo, pitch critique, co-founder agreement framework) |
**[Star this repo →](https://github.com/mohitagw15856/pm-claude-skills)**
---
## Requested Skills (Open)
Add a request by opening an issue — these are current top asks from the community:
| Skill | Profession | Requested By | Votes |
|---|---|---|---|
| `customer-health-scorecard` | Customer Success | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `qbr-deck-writer` | Customer Success | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `escalation-brief` | Customer Success | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `fundraising-memo` | Startup / Founder | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `youtube-script-writer` | Content Creator | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `newsletter-issue-writer` | Content Creator | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `analytics-event-taxonomy` | Data / Analytics | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `kpi-tree-builder` | Data / Analytics | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `dissertation-chapter-planner` | Academic | [@mohitagw15856](https://github.com/mohitagw15856) | — |
| `board-minutes` | Governance | Community | — |
> **To vote:** React with 👍 on the linked issue. To add a new request, open an issue with label `skill-request`.
---
## Recently Shipped
| Version | Skills Added |
|---|---|
| v7.0.0 | Debugging Log Analyser, PR Description Writer, System Design Interview, Changelog Generator, Test Strategy Doc, Runbook Writer |
| v6.0.0 | Teaching Lesson Plan, SEO Content Brief, Media Pitch, Change Management Plan, Workshop Facilitation Guide, Sales Forecasting Model, Tax Planning Checklist |
| v5.0.0 | 10 Figma skills |
---
*Maintained by [Mohit Aggarwal](https://github.com/mohitagw15856)*
@@ -1,13 +1,13 @@
{
"$schema": "https://anthropic.com/claude-code/plugin.schema.json",
"name": "pm-engineering",
"version": "1.0.0",
"description": "Engineering & tech skills: Code Review Checklist, Incident Postmortem, API Docs Writer, Architecture Decision Record. Structured outputs for engineering teams and technical PMs.",
"version": "2.0.0",
"description": "Engineering & tech skills: Code Review Checklist, Incident Postmortem, API Docs Writer, Architecture Decision Record, Debugging Log Analyser, PR Description Writer, System Design Interview, Changelog Generator, Test Strategy Doc, Runbook Writer. 10 structured skills for engineering teams and technical PMs.",
"author": {
"name": "Mohit Aggarwal",
"email": "mohit15856@gmail.com"
},
"homepage": "https://github.com/mohitagw15856/pm-claude-skills",
"license": "MIT",
"keywords": ["product-management", "engineering", "code-review", "incident-postmortem", "api-documentation", "adr", "architecture"]
"keywords": ["product-management", "engineering", "code-review", "incident-postmortem", "api-documentation", "adr", "architecture", "debugging", "pull-request", "system-design", "changelog", "test-strategy", "runbook", "devops"]
}
@@ -0,0 +1,82 @@
---
name: changelog-generator
description: "Convert a git log, commit list, or release notes into a polished, user-facing changelog. Use when writing release notes, generating a CHANGELOG.md entry, or documenting what changed in a version. Produces a structured changelog section with version header, categorised changes, and migration notes. Optimised for Opus 4.7 and newer models."
---
# Changelog Generator Skill
Converts raw git commits, a diff summary, or developer release notes into a polished changelog entry — categorised, user-facing, and following Keep a Changelog conventions.
## Required Inputs
Ask for these if not provided:
- **Commits or release notes** (paste `git log --oneline`, raw commit messages, or a description of what changed)
- **Version number** (e.g. 2.4.0, v1.0.0-beta.2)
- **Release date** (or "today")
- **Audience** (developers using an API / end users of a product / internal team — affects language)
- **Any breaking changes** (flag these explicitly if known)
## Output Structure
Follow [Keep a Changelog](https://keepachangelog.com) format:
---
## [X.Y.Z] — YYYY-MM-DD
### Breaking Changes ⚠️
[Only include if there are breaking changes]
- **[Breaking change]:** [What changed and what it breaks]
- **Migration required:** [Specific action the user must take]
### Added
- [New feature or capability, written from the user's perspective]
- [Another addition]
### Changed
- [Changed behaviour — what it did before vs. what it does now]
- [Performance improvement with measurable impact if known]
### Fixed
- [Bug fixed — describe what was broken, not the fix implementation]
- [Another fix]
### Deprecated
- [Deprecated thing] — use [replacement] instead. Will be removed in [version].
### Removed
- [Removed thing] — was deprecated in [version]
### Security
- [Security fix — describe the vulnerability class, not exploit details]
---
## Formatting Rules Applied
**Language:** Write for the reader, not the committer. "Add dark mode support" not "implement ThemeProvider with dark palette variant".
**Breaking changes:** Always call these out first with ⚠️. Include a migration path.
**Bug fixes:** Describe what was broken, not what was changed. "Fix crash when user has no profile picture" not "null-check avatar URL before rendering".
**Granularity:** Group related commits into one line. Don't list every micro-commit separately.
**Tone:** Active voice, imperative mood. "Add", "Fix", "Remove" — not "Added", "Fixed", "Removed".
**Empty sections:** Omit any section with no entries. Don't include empty `### Fixed` blocks.
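The categorisation and empty-section rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: it assumes the input commits follow Conventional Commits prefixes (`feat:`, `fix:`, and so on), and the `SECTION_FOR_TYPE` mapping and `build_changelog` name are hypothetical. The header uses a plain hyphen as a simplification, and the sketch does not attempt the grouping or user-facing rewording that the rules ask for.

```python
import re
from collections import defaultdict

# Assumption: commits use Conventional Commits prefixes (feat:, fix:, ...).
# This prefix-to-section mapping is hypothetical, not defined by the skill.
SECTION_FOR_TYPE = {
    "feat": "Added",
    "fix": "Fixed",
    "perf": "Changed",
    "refactor": "Changed",
    "security": "Security",
}
SECTION_ORDER = ["Breaking Changes", "Added", "Changed", "Fixed",
                 "Deprecated", "Removed", "Security"]

# Optional short hash, then type, optional (scope), optional "!" for breaking.
COMMIT_RE = re.compile(
    r"^(?:[0-9a-f]{7,40}\s+)?(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?:\s*(?P<subject>.+)$"
)


def build_changelog(version, date, log_lines):
    """Group `git log --oneline` lines into Keep a Changelog style sections."""
    sections = defaultdict(list)
    for line in log_lines:
        match = COMMIT_RE.match(line.strip())
        if not match:
            continue  # unparseable lines are skipped rather than guessed at
        subject = match.group("subject").strip()
        entry = subject[0].upper() + subject[1:]
        if match.group("bang"):
            sections["Breaking Changes"].append(entry)
        elif match.group("type") in SECTION_FOR_TYPE:
            sections[SECTION_FOR_TYPE[match.group("type")]].append(entry)
        # Other types (chore, ci, ...) are treated as not user-facing.

    lines = [f"## [{version}] - {date}"]
    for name in SECTION_ORDER:
        if sections[name]:  # empty sections are omitted
            lines.append("")
            lines.append(f"### {name}")
            lines.extend(f"- {entry}" for entry in sections[name])
    return "\n".join(lines)
```

Calling `build_changelog("2.4.0", "2026-04-23", log_lines)` with pasted `git log --oneline` output returns the entry as a single string, with breaking changes listed first.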
## Quality Checks
- [ ] Breaking changes are at the top with migration instructions
- [ ] All entries are user-facing language (no internal variable names or implementation details)
- [ ] Related commits are grouped into single entries (not listed individually)
- [ ] Version and date header is correct
- [ ] Empty sections are omitted
- [ ] Tone is imperative mood throughout
## Example Trigger Phrases
- "Write a changelog for version [X]" + [paste commits]
- "Generate release notes from these commits"
- "Turn this git log into a CHANGELOG entry"
- "Write the CHANGELOG.md update for this release"
- "What changed in this release?" + [paste commit list]
@@ -0,0 +1,79 @@
---
name: debugging-log-analyser
description: "Parse error logs, stack traces, and crash reports into a structured root cause diagnosis. Use when sharing a log, stack trace, error output, or crash dump. Produces a structured diagnosis with probable root cause, affected code path, suggested fix, and next debugging steps. Optimised for Opus 4.7 and newer models."
---
# Debugging Log Analyser Skill
Parses raw error logs, stack traces, and crash reports into a structured diagnosis with probable root cause, affected code path, and specific next steps — no hand-waving.
## Required Inputs
Ask for these if not provided:
- **The log / stack trace / error output** (paste directly or describe the error)
- **Language and framework** (e.g. Node.js + Express, Python + Django, Java Spring, Go)
- **Context** (what the user was doing when the error occurred)
- **Environment** (local dev / staging / production)
- **What they've already tried** (if anything)
## Output Structure
### 1. Error Classification
**Error type:** [Runtime exception / Build error / Config error / Network error / Memory/resource error / Unknown]
**Severity:** [Fatal / Critical / Warning / Informational]
**Recurrence pattern:** [One-off / Intermittent / Consistent / On-startup / Under load]
### 2. Stack Trace Analysis
Walk the stack frame by frame, starting from the origin:
- **Origin frame:** [File, line, function where it started]
- **Propagation path:** [How it travelled through the call stack]
- **Crash point:** [Where it ultimately threw/panicked/exited]
For each significant frame, note whether it is:
- User code (fixable here)
- Framework/library code (usually a misuse issue)
- System/runtime code (usually a config or environment issue)
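As a rough illustration of the frame classification above, the sketch below parses a Python traceback and tags each frame. It is a heuristic under stated assumptions, not the skill's method: the path rules (`site-packages` means library code, a `lib/pythonX.Y/` path means runtime code) and the names `classify_frame` and `analyse_traceback` are hypothetical, and other languages would need their own frame format.

```python
import re

# Matches the frame lines of a standard CPython traceback.
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def classify_frame(path):
    """Heuristic: guess who owns a frame from its file path."""
    normalised = path.replace("\\", "/")
    # Check installed packages first: their paths also contain lib/pythonX.Y/.
    if "/site-packages/" in normalised or "/dist-packages/" in normalised:
        return "framework/library"
    if re.search(r"/lib/python\d+(\.\d+)?/", normalised) or normalised.startswith("<"):
        return "system/runtime"
    return "user"


def analyse_traceback(text):
    """Return frames in call order; the last one is the crash point."""
    return [
        {
            "path": m.group("path"),
            "line": int(m.group("line")),
            "func": m.group("func"),
            "kind": classify_frame(m.group("path")),
        }
        for m in FRAME_RE.finditer(text)
    ]
```

Python prints the most recent call last, so `frames[0]` is the entry point and `frames[-1]` is where the exception was raised. The deepest frame tagged `user` is often the most useful place to start looking.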
### 3. Root Cause Assessment
**Probable root cause:** [1–2 sentence plain-English statement]
**Confidence:** [High / Medium / Low — and why]
**Alternative causes to rule out:** [If confidence is not high]
### 4. Affected Code Path
**Entry point:** [Where the triggering call began]
**Key function(s) involved:** [Specific functions/methods named in the trace]
**Data that triggered it:** [If inferable from the log — e.g. null value, malformed JSON]
### 5. Suggested Fix
Provide a concrete, code-level suggestion:
- What to change (the minimal fix)
- Why this fixes the root cause
- Any trade-offs or risks in the fix
- A short code snippet if helpful
### 6. Next Debugging Steps
If the root cause is uncertain, provide an ordered list of 3–5 specific debugging actions:
1. [Specific thing to check — file, log line, config value]
2. [Specific reproduction step or isolation test]
3. [Specific tool command — e.g. `strace`, `pprof`, `--verbose`, add logging at X]
### 7. Prevention
One or two concrete things that would prevent this class of error recurring:
- Better input validation at [point]
- Add monitoring/alerting for [condition]
- Test that covers [scenario]
## Quality Checks
- [ ] Root cause is specific (not "there might be a null pointer issue")
- [ ] At least one concrete code-level fix is suggested
- [ ] Next steps are actionable commands, not vague advice
- [ ] Language-specific idioms are used correctly
- [ ] Prevention is proactive (not just "add error handling")
## Example Trigger Phrases
- "Why is this crashing?" + [paste log]
- "Can you analyse this stack trace?"
- "I'm getting this error, what does it mean?"
- "Debug this log for me"
- "What's causing this exception?"
@@ -0,0 +1,87 @@
---
name: pr-description-writer
description: "Write a clear, structured pull request description from a git diff, branch summary, or commit list. Use when asked to write a PR description, draft a pull request, or document code changes. Produces a description with summary, motivation, changes made, testing steps, and reviewer guidance. Optimised for Opus 4.7 and newer models."
---
# PR Description Writer Skill
Writes structured, reviewer-friendly pull request descriptions from a diff, commit list, or informal notes. Covers the what, why, and how-to-review so reviewers can start immediately.
## Required Inputs
Ask for these if not provided:
- **What changed** (paste a git diff, `git log --oneline`, or describe the changes in plain English)
- **Why it was changed** (the problem being solved or feature being added)
- **How to test it** (any specific steps a reviewer needs to verify it works)
- **Risk level** (low / medium / high — affects how much reviewer guidance to include)
- **PR type** (feature / bug fix / refactor / dependency upgrade / config change / hotfix)
## Output Structure
### Title
A clear, imperative-mood title under 72 characters:
`[type]: [concise description of what changed]`
Examples:
- `feat: add rate limiting to the public API`
- `fix: resolve race condition in session expiry`
- `refactor: extract payment logic into PaymentService`
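The title rule above is mechanical enough to check in code. The following is a small sketch, assuming a hypothetical `check_pr_title` helper; the allowed type list and the list of non-imperative verb forms are illustrative assumptions, and the mood check is a crude word lookup rather than real grammar analysis.

```python
import re

MAX_TITLE_LENGTH = 72  # the rule above says "under 72 characters"

# Assumption: an illustrative set of types; the skill does not fix this list.
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "docs", "test", "perf"}

# Crude heuristic: common past-tense or third-person openers to reject.
NON_IMPERATIVE = {"added", "adds", "adding", "fixed", "fixes", "fixing",
                  "removed", "removes", "updated", "updates", "changed"}

TITLE_RE = re.compile(r"^(?P<type>[a-z]+): (?P<description>\S.*)$")


def check_pr_title(title):
    """Return a list of problems; an empty list means the title passes."""
    problems = []
    if len(title) >= MAX_TITLE_LENGTH:
        problems.append(f"title must be under {MAX_TITLE_LENGTH} characters")
    match = TITLE_RE.match(title)
    if not match:
        problems.append("title must look like '[type]: [description]'")
        return problems
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")
    first_word = match.group("description").split()[0].lower()
    if first_word in NON_IMPERATIVE:
        problems.append(f"'{first_word}' is not imperative mood")
    return problems
```

All three example titles above pass this check, while a title such as `fix: fixed crash on login` is flagged for mood.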
### Summary
2–3 sentences covering:
- What this PR does (the change)
- Why it was needed (the problem or goal)
- The approach taken (at a high level)
### Changes Made
Bullet list of specific changes — one bullet per logical change, not per file:
- Added [X] to handle [Y]
- Refactored [A] to reduce [B]
- Removed [C] as it was replaced by [D]
- Updated [E] to fix [F]
### Screenshots / Demo
[If UI change: include before/after screenshots or a screen recording]
[If API change: include example request/response]
[If no visual change: this section can be omitted]
### How to Test
Step-by-step instructions a reviewer can follow:
1. [Setup step if needed]
2. [Action to take]
3. [What to verify]
4. [Edge case to check]
Include any specific commands, test data, or environment flags needed.
### Testing Checklist
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Edge cases covered
- [ ] Manual testing completed
- [ ] No regressions in existing tests
### Reviewer Notes
Flag anything that warrants extra attention:
- Areas of uncertainty where a second opinion is welcome
- Deliberate trade-offs made (and why)
- Out-of-scope items noticed but not addressed
- Dependencies on other PRs (link them)
### Related
- Closes #[issue number] (if applicable)
- Related to #[PR/issue number]
## Quality Checks
- [ ] Title is imperative mood and under 72 characters
- [ ] Summary explains what AND why (not just what)
- [ ] Changes list describes logical changes (not file-by-file changes)
- [ ] Testing steps are reproducible by someone unfamiliar with the code
- [ ] Risk-appropriate reviewer guidance is included
## Example Trigger Phrases
- "Write a PR description for these changes" + [paste diff or description]
- "Draft a pull request for [feature]"
- "I need a PR description — here's what I changed"
- "Summarise these commits into a PR description"
- "Write the PR body for this branch"
@@ -0,0 +1,144 @@
---
name: runbook-writer
description: "Write an operational runbook for a service, incident type, or deployment procedure. Use when asked to write a runbook, create an ops guide, document an operational procedure, or prepare an incident response playbook. Produces a runbook with overview, prerequisites, step-by-step procedures, rollback steps, troubleshooting table, and escalation paths. Optimised for Opus 4.7 and newer models."
---
# Runbook Writer Skill
Produces operational runbooks for services, incident types, and deployment procedures — structured so an on-call engineer who's never touched the system can follow them under pressure.
## Required Inputs
Ask for these if not provided:
- **What the runbook is for** (e.g. deploying the payment service, responding to a database failover, rotating API keys)
- **Runbook type** (Deployment / Incident Response / Maintenance / Disaster Recovery)
- **System/service name and what it does** (brief description)
- **Audience** (new on-call engineers / experienced SREs / DevOps team)
- **Tech stack** (where relevant — e.g. Kubernetes, AWS RDS, Node.js)
## Output Structure
---
**Runbook:** [Runbook Title]
**Service:** [Service Name]
**Type:** [Deployment / Incident Response / Maintenance / DR]
**Last Updated:** [Date]
**Owner:** [Team or person]
**Severity:** [P1 / P2 / P3 — if incident-type]
---
### Overview
**What this runbook covers:**
[1–2 sentences on the scenario this runbook handles]
**When to use this runbook:**
- [Specific trigger condition 1 — e.g. PagerDuty alert: `high-error-rate-payment-service`]
- [Specific trigger condition 2 — e.g. Deploy needed after PR merged to `main`]
**Estimated time to complete:** [X minutes / X–Y minutes depending on outcome]
**Impact if not completed correctly:** [e.g. Payment processing degraded / Data loss risk / Users locked out]
---
### Prerequisites
**Access required:**
- [ ] [System/tool access — e.g. AWS Console: `production-account`]
- [ ] [Credential — e.g. `vault read secret/payment-service`]
- [ ] [VPN / bastion access if needed]
**Tools required:**
- [ ] [Tool name and version — e.g. `kubectl` v1.28+]
- [ ] [CLI or dashboard name]
**Before you start:**
- [ ] [Prerequisite check — e.g. Verify current deployment is healthy in Grafana]
- [ ] [Prerequisite action — e.g. Announce in `#ops-live` that you're starting]
---
### Procedure
Number every step. Use exact commands. Do not paraphrase tool names or flags.
**Step 1: [Action name]**
[What you're doing and why — one sentence]
```bash
# Exact command
[command here]
```
**Expected output:** `[what should appear if this worked]`
**If this fails:** [Exact error message to look for] → [What to do, or see Troubleshooting]
**Step 2: [Action name]**
[Same structure as Step 1]
**Step 3: Verify**
Always include a verification step after the main procedure:
```bash
[verification command]
```
**Expected state:** [What a healthy system looks like after this runbook completes]
---
### Rollback
How to undo this procedure if something went wrong:
**Step R1: [Rollback action]**
```bash
[rollback command]
```
**Verify rollback:** `[command to confirm rollback succeeded]`
---
### Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| [Error message or observable symptom] | [Why this happens] | [Exact fix or next step] |
| [Another symptom] | [Cause] | [Resolution] |
---
### Escalation
If this runbook does not resolve the issue:
| Condition | Who to Contact | How |
|---|---|---|
| [e.g. DB unavailable after 10 min] | [DBA on-call] | [PagerDuty policy: `db-oncall`] |
| [e.g. Payment provider unresponsive] | [Vendor contact] | [Contact in 1Password: `vendor-escalation`] |
**Always update the incident timeline in [tool] before escalating.**
---
### Post-Procedure Checklist
After completing the runbook:
- [ ] Announce completion in `#ops-live` with outcome
- [ ] Update the incident ticket / deploy log
- [ ] Verify alerts have resolved in monitoring dashboard
- [ ] If this revealed a gap in this runbook — update it now (link to edit process)
---
## Quality Checks
- [ ] Every step has an exact command (no "run the deploy script")
- [ ] Expected output is specified for each step so the engineer knows whether it worked
- [ ] Failure path is explicit for each step (not "if it fails, investigate")
- [ ] Rollback procedure is complete and independently testable
- [ ] Escalation paths name specific contacts, not just team names
- [ ] Runbook can be followed by someone who has never touched this system
## Example Trigger Phrases
- "Write a runbook for [service] deployment"
- "Create an incident response runbook for [alert type]"
- "I need a runbook for [procedure]"
- "Document the operational procedure for [X]"
- "Write an ops playbook for [scenario]"
@@ -0,0 +1,135 @@
---
name: system-design-interview
description: "Structure a complete system design answer for interview questions or real architecture sessions. Use when asked to design a system, answer a system design interview question, or architect a solution at scale. Produces a structured answer covering requirements, capacity estimates, high-level design, component deep-dives, trade-offs, and follow-up considerations. Optimised for Opus 4.7 and newer models."
---
# System Design Interview Skill
Structures a complete, interview-grade system design response — covering clarifying questions, requirements, capacity estimates, architecture, component design, and trade-offs. Works equally well for real architecture sessions.
## Required Inputs
Ask for these if not provided:
- **The system to design** (e.g. "design a URL shortener", "design a notification service", "design Twitter's feed")
- **Scope** (interview prep / real architecture decision / practice run)
- **Scale target** (rough numbers: DAU, requests/sec, data volume — or "assume typical web scale")
- **Constraints or priorities** (e.g. prioritise availability over consistency, minimise cost, low-latency reads)
## Output Structure
### 1. Clarifying Questions
Before designing, list 4–6 questions that would change the design. Examples:
- Read-heavy or write-heavy? (affects caching and DB choice)
- Global or single-region? (affects latency requirements)
- Strong or eventual consistency? (affects storage and replication)
- Acceptable latency targets? (p50 / p99)
- Any existing infrastructure constraints?
Then proceed with stated assumptions if answering an interview question.
### 2. Functional Requirements
**Core features (must have):**
- [Feature 1]
- [Feature 2]
- [Feature 3]
**Out of scope (for this design):**
- [What's deliberately excluded and why]
### 3. Non-Functional Requirements
| Requirement | Target |
|---|---|
| Availability | [e.g. 99.9% / 99.99%] |
| Latency | [e.g. p95 < 100ms for reads] |
| Throughput | [e.g. 10k writes/sec peak] |
| Consistency | [Strong / Eventual] |
| Durability | [e.g. 99.999% — no data loss] |
### 4. Capacity Estimation
**Traffic:**
- DAU: [X]
- Reads/sec: [X] (peak: [X])
- Writes/sec: [X] (peak: [X])
**Storage:**
- Per record size: [X bytes]
- Records per day: [X]
- 5-year storage: [X GB/TB]
**Bandwidth:**
- Inbound: [X MB/s]
- Outbound: [X MB/s]
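The estimates above are plain arithmetic, so a short script can keep them consistent. The following is a minimal sketch, not part of the skill itself: the `Workload` fields, the peak factor of 3, and the example numbers are all illustrative assumptions.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    dau: int                        # daily active users
    reads_per_user_per_day: float
    writes_per_user_per_day: float
    record_bytes: int               # average size of one stored record
    peak_factor: float = 3.0        # assumed peak-to-average ratio


def estimate(w: Workload, years: int = 5) -> dict:
    """Back-of-envelope traffic, storage, and bandwidth figures."""
    avg_reads = w.dau * w.reads_per_user_per_day / SECONDS_PER_DAY
    avg_writes = w.dau * w.writes_per_user_per_day / SECONDS_PER_DAY
    records_per_day = w.dau * w.writes_per_user_per_day
    storage_bytes = records_per_day * w.record_bytes * 365 * years
    return {
        "reads_per_sec": avg_reads,
        "peak_reads_per_sec": avg_reads * w.peak_factor,
        "writes_per_sec": avg_writes,
        "peak_writes_per_sec": avg_writes * w.peak_factor,
        "records_per_day": records_per_day,
        "storage_tb": storage_bytes / 1e12,
        "inbound_mb_per_sec": avg_writes * w.record_bytes / 1e6,
        "outbound_mb_per_sec": avg_reads * w.record_bytes / 1e6,
    }


if __name__ == "__main__":
    # Hypothetical URL-shortener workload, for illustration only.
    demo = Workload(dau=10_000_000, reads_per_user_per_day=10,
                    writes_per_user_per_day=0.1, record_bytes=500)
    for name, value in estimate(demo).items():
        print(f"{name}: {value:,.2f}")
```

Stating the peak factor and record size explicitly makes the assumptions easy to challenge during the interview.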
### 5. High-Level Architecture
```
[Client] → [CDN/Edge] → [Load Balancer] → [API Servers] → [Cache] → [DB]
→ [Message Queue] → [Workers]
```
Describe each layer in 1–2 sentences explaining its role and technology choice.
### 6. Component Deep-Dive
Pick the 2–3 most critical/interesting components and go deep:
**[Component 1: e.g. Database Layer]**
- Choice: [Technology and why — e.g. PostgreSQL for ACID guarantees, Cassandra for write throughput]
- Schema design (high-level): [Key tables/collections and their structure]
- Indexing strategy: [What gets indexed and why]
- Replication: [Primary-replica / Multi-primary — and why]
**[Component 2: e.g. Caching Strategy]**
- Cache type: [Redis / Memcached — and why]
- What gets cached: [Hot data — e.g. user sessions, frequent reads]
- Cache invalidation: [TTL / Write-through / Write-behind — trade-offs]
- Cache hit rate target: [e.g. 95%]
**[Component 3: e.g. API Design]**
- Key endpoints: [List the 3–5 most important API calls]
- Authentication: [JWT / OAuth / API keys]
- Rate limiting: [Where and at what rate]
### 7. Data Flow
Walk through the two most critical paths end-to-end:
**Write path:** [Step 1 → Step 2 → Step 3...]
**Read path:** [Step 1 → Step 2 → Step 3...]
### 8. Scaling Bottlenecks and Mitigations
| Bottleneck | Mitigation |
|---|---|
| [e.g. DB write throughput] | [e.g. sharding by user_id, write batching] |
| [e.g. Hot-key cache misses] | [e.g. local in-process cache, probabilistic early expiry] |
| [e.g. Single region latency] | [e.g. multi-region deployment, GeoDNS routing] |
### 9. Trade-offs and Alternatives
Be explicit about what was chosen and what was sacrificed:
| Decision | Why | Trade-off |
|---|---|---|
| [e.g. Eventual consistency] | [Higher availability, lower latency] | [Stale reads possible] |
| [e.g. SQL over NoSQL] | [Complex queries, ACID transactions] | [Harder to shard horizontally] |
| [e.g. Async processing via queue] | [Decoupled, more resilient] | [Eventual delivery, harder to debug] |
### 10. Follow-up Considerations
Things to tackle in production but out of scope for this design session:
- Monitoring and alerting (what metrics matter)
- Disaster recovery and backup strategy
- Security (auth, encryption at rest/transit, rate limiting)
- Cost optimisation at scale
- Gradual rollout and feature flagging
## Quality Checks
- [ ] Clarifying questions are design-changing (not generic filler)
- [ ] Capacity estimates use real numbers (not just "it scales")
- [ ] At least 2 component deep-dives with technology choices justified
- [ ] Trade-offs section is honest (not just benefits of chosen approach)
- [ ] Data flow is described end-to-end for the critical path
## Example Trigger Phrases
- "Help me answer a system design interview: [question]"
- "Design [system] for a system design interview"
- "How would I architect [system] at scale?"
- "I have a system design interview — the question is [X]"
- "Design a [URL shortener / chat system / notification service / feed]"
@@ -0,0 +1,127 @@
---
name: test-strategy-doc
description: "Write a test strategy document from a feature spec, PRD, or system description. Use when asked to create a test plan, write a test strategy, define QA approach, or plan testing for a feature or release. Produces a complete test strategy with scope, risk assessment, test types, coverage targets, and a prioritised test case outline. Optimised for Opus 4.7 and newer models."
---
# Test Strategy Document Skill
Produces a complete test strategy from a feature spec, PRD, or system description — covering scope, test types, risk areas, coverage requirements, and a prioritised test case outline.
## Required Inputs
Ask for these if not provided:
- **Feature or system being tested** (paste a spec, PRD, or describe it in plain English)
- **Tech stack** (language, framework, testing tools already in use if known)
- **Risk level** (low / medium / high / critical — affects depth and coverage requirements)
- **Timeline** (when does this need to ship — affects prioritisation)
- **Team context** (who is doing the testing — developers / dedicated QA / both)
## Output Structure
### 1. Test Scope
**In scope:**
- [Specific functionality being tested]
- [Integration points covered]
- [User-facing flows included]
**Out of scope:**
- [What is deliberately not tested here — and why]
- [Dependencies owned by other teams]
**Assumptions:**
- [What the test strategy assumes is true — e.g. mocked services, test data availability]
### 2. Risk Assessment
Identify the highest-risk areas first — these drive depth and coverage:
| Area | Risk Level | Why | Test Priority |
|---|---|---|---|
| [e.g. Payment processing] | High | Money movement, regulatory | P0 — exhaustive |
| [e.g. User authentication] | High | Security boundary | P0 — exhaustive |
| [e.g. Email notifications] | Medium | External dependency | P1 — happy path + key failures |
| [e.g. UI copy changes] | Low | Visual only, reversible | P2 — smoke only |
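The example rows above follow a simple pattern (High maps to P0, Medium to P1, Low to P2), and this skill gates performance tests on medium+ risk and security tests on high+ risk. A minimal sketch of that mapping is shown below; the `Risk` enum and function names are hypothetical, and treating Critical the same as High is an assumption.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def priority_for(risk: Risk) -> str:
    """Map a risk level to a test priority, following the example table."""
    if risk >= Risk.HIGH:       # assumption: Critical is handled like High
        return "P0"
    if risk == Risk.MEDIUM:
        return "P1"
    return "P2"


def required_test_types(risk: Risk) -> list:
    """Unit, integration, and E2E always; performance at medium+, security at high+."""
    types = ["unit", "integration", "e2e"]
    if risk >= Risk.MEDIUM:
        types.append("performance")
    if risk >= Risk.HIGH:
        types.append("security")
    return types


if __name__ == "__main__":
    for level in Risk:
        print(level.name, priority_for(level), required_test_types(level))
```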
### 3. Test Types and Coverage
**Unit Tests**
- **What:** Individual functions and methods in isolation
- **Who writes:** Developer
- **Coverage target:** [e.g. 80% line coverage on new code / 100% on critical paths]
- **Tools:** [e.g. Jest, pytest, go test]
- **Focus areas for this feature:** [Specific logic that needs unit coverage]
**Integration Tests**
- **What:** Service interactions, database operations, API contracts
- **Who writes:** Developer / QA
- **Coverage target:** [All happy paths + key failure modes]
- **Tools:** [e.g. Supertest, pytest + testcontainers]
- **Focus areas:** [Specific integrations at risk — e.g. third-party API, DB schema changes]
**End-to-End Tests**
- **What:** Critical user journeys from browser/client to database
- **Who writes:** QA / Developer
- **Coverage target:** [Top N user journeys — list them]
- **Tools:** [e.g. Playwright, Cypress, Selenium]
- **Focus areas:** [The 3–5 most critical user flows]
**Performance Tests** *(include only if risk is medium+)*
- **What:** Load, stress, or latency testing
- **Targets:** [Specific numbers — e.g. 200 req/sec at p95 < 200ms]
- **Tools:** [e.g. k6, Locust, JMeter]
**Security Tests** *(include only if risk is high+)*
- **What:** OWASP Top 10 checks relevant to this feature
- **Focus:** [Auth bypasses, injection, data exposure]
- **Tools:** [e.g. OWASP ZAP, manual penetration testing, Snyk]
### 4. Test Case Outline
Priority-ordered list of specific test cases:
**P0 — Must pass before merge:**
| Test Case | Type | Expected Outcome |
|---|---|---|
| [e.g. User can log in with valid credentials] | E2E | [Redirect to dashboard, session created] |
| [e.g. Invalid login returns 401] | Integration | [Error message displayed, no session] |
| [e.g. Password is never stored in plain text] | Unit | [bcrypt hash in DB] |
**P1 — Must pass before release:**
| Test Case | Type | Expected Outcome |
|---|---|---|
| [e.g. Login fails gracefully when DB is down] | Integration | [User sees friendly error, 503] |
| [e.g. Rate limiting blocks after 5 failed attempts] | Integration | [429 returned, account flagged] |
**P2 — Should pass, can ship with known issues tracked:**
| Test Case | Type | Expected Outcome |
|---|---|---|
| [e.g. Login page renders correctly on mobile] | E2E | [Layout matches design] |
### 5. Test Data Requirements
- [Specific test data needed — e.g. test user accounts with various states]
- [External service stubs or mocks needed]
- [Database seed data requirements]
- [Any PII concerns and how test data handles them]
### 6. Definition of Done
Testing is complete when:
- [ ] All P0 test cases pass
- [ ] All P1 test cases pass
- [ ] Code coverage meets the stated target
- [ ] No critical or high severity bugs open
- [ ] Performance targets met (if applicable)
- [ ] Security checks completed (if applicable)
## Quality Checks
- [ ] Risk table is populated and drives test priority (not filled in generically)
- [ ] P0 test cases cover the highest-risk paths specifically
- [ ] Each test type names a concrete tool (not "some testing framework")
- [ ] Definition of Done is measurable (not "tests are done when QA is happy")
## Example Trigger Phrases
- "Write a test strategy for [feature]" + [paste spec or PRD]
- "Create a test plan for [system]"
- "How should we test [feature]?"
- "I need a QA plan for this sprint"
- "What tests do we need for [X]?"
+82
@@ -0,0 +1,82 @@
---
name: changelog-generator
description: "Convert a git log, commit list, or release notes into a polished, user-facing changelog. Use when writing release notes, generating a CHANGELOG.md entry, or documenting what changed in a version. Produces a structured changelog section with version header, categorised changes, and migration notes. Optimised for Opus 4.7 and newer models."
---
# Changelog Generator Skill
Converts raw git commits, a diff summary, or developer release notes into a polished changelog entry — categorised, user-facing, and following Keep a Changelog conventions.
## Required Inputs
Ask for these if not provided:
- **Commits or release notes** (paste `git log --oneline`, raw commit messages, or a description of what changed)
- **Version number** (e.g. 2.4.0, v1.0.0-beta.2)
- **Release date** (or "today")
- **Audience** (developers using an API / end users of a product / internal team — affects language)
- **Any breaking changes** (flag these explicitly if known)
## Output Structure
Follow [Keep a Changelog](https://keepachangelog.com) format:
---
## [X.Y.Z] — YYYY-MM-DD
### Breaking Changes ⚠️
[Only include if there are breaking changes]
- **[Breaking change]:** [What changed and what it breaks]
- **Migration required:** [Specific action the user must take]
### Added
- [New feature or capability, written from the user's perspective]
- [Another addition]
### Changed
- [Changed behaviour — what it did before vs. what it does now]
- [Performance improvement with measurable impact if known]
### Fixed
- [Bug fixed — describe what was broken, not the fix implementation]
- [Another fix]
### Deprecated
- [Deprecated thing] — use [replacement] instead. Will be removed in [version].
### Removed
- [Removed thing] — was deprecated in [version]
### Security
- [Security fix — describe the vulnerability class, not exploit details]
---
## Formatting Rules Applied
**Language:** Write for the reader, not the committer. "Add dark mode support" not "implement ThemeProvider with dark palette variant".
**Breaking changes:** Always call these out first with ⚠️. Include a migration path.
**Bug fixes:** Describe what was broken, not what was changed. "Fix crash when user has no profile picture" not "null-check avatar URL before rendering".
**Granularity:** Group related commits into one line. Don't list every micro-commit separately.
**Tone:** Active voice, imperative mood. "Add", "Fix", "Remove" — not "Added", "Fixed", "Removed".
**Empty sections:** Omit any section with no entries. Don't include empty `### Fixed` blocks.
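The mechanical part of these rules (bucket commits into sections, put breaking changes first, omit empty sections) can be sketched in a few lines. This is an illustrative sketch only: it assumes the pasted log uses Conventional Commits prefixes such as `feat:` and `fix:`, which the skill does not require, and the `TYPE_TO_SECTION` mapping is a hypothetical choice. Rewriting subjects into user-facing language still needs human or model judgement.

```python
import re

SECTION_ORDER = ["Breaking Changes", "Added", "Changed", "Fixed",
                 "Deprecated", "Removed", "Security"]

# Assumed mapping from commit type to changelog section.
TYPE_TO_SECTION = {
    "feat": "Added",
    "fix": "Fixed",
    "perf": "Changed",
    "refactor": "Changed",
    "security": "Security",
}

# Optional short hash, then "type(scope)!: subject".
LINE = re.compile(
    r"^(?:[0-9a-f]{7,40}\s+)?(?P<type>\w+)(?:\([^)]*\))?(?P<bang>!)?:\s*(?P<subject>.+)$"
)


def render_changelog(version: str, date: str, log_lines) -> str:
    sections = {}
    for raw in log_lines:
        match = LINE.match(raw.strip())
        if not match:
            continue
        if match["bang"]:
            section = "Breaking Changes"
        else:
            section = TYPE_TO_SECTION.get(match["type"])
        if section is None:
            continue  # chore, docs, test, ci: not user-facing
        subject = match["subject"]
        sections.setdefault(section, []).append(subject[0].upper() + subject[1:])

    out = [f"## [{version}] - {date}"]
    for name in SECTION_ORDER:
        entries = sections.get(name)
        if entries:  # empty sections are omitted entirely
            out += ["", f"### {name}"] + [f"- {entry}" for entry in entries]
    return "\n".join(out)


if __name__ == "__main__":
    print(render_changelog("2.4.0", "2026-04-23", [
        "a1b2c3d feat: add dark mode support",
        "b2c3d4e fix(auth): crash when user has no profile picture",
        "c3d4e5f chore: bump deps",
        "d4e5f6a feat!: drop v1 API endpoints",
    ]))
```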
## Quality Checks
- [ ] Breaking changes are at the top with migration instructions
- [ ] All entries are user-facing language (no internal variable names or implementation details)
- [ ] Related commits are grouped into single entries (not listed individually)
- [ ] Version and date header is correct
- [ ] Empty sections are omitted
- [ ] Tone is imperative mood throughout
## Example Trigger Phrases
- "Write a changelog for version [X]" + [paste commits]
- "Generate release notes from these commits"
- "Turn this git log into a CHANGELOG entry"
- "Write the CHANGELOG.md update for this release"
- "What changed in this release?" + [paste commit list]
+79
@@ -0,0 +1,79 @@
---
name: debugging-log-analyser
description: "Parse error logs, stack traces, and crash reports into a structured root cause diagnosis. Use when sharing a log, stack trace, error output, or crash dump. Produces a structured diagnosis with probable root cause, affected code path, suggested fix, and next debugging steps. Optimised for Opus 4.7 and newer models."
---
# Debugging Log Analyser Skill
Parses raw error logs, stack traces, and crash reports into a structured diagnosis with probable root cause, affected code path, and specific next steps — no hand-waving.
## Required Inputs
Ask for these if not provided:
- **The log / stack trace / error output** (paste directly or describe the error)
- **Language and framework** (e.g. Node.js + Express, Python + Django, Java Spring, Go)
- **Context** (what the user was doing when the error occurred)
- **Environment** (local dev / staging / production)
- **What they've already tried** (if anything)
## Output Structure
### 1. Error Classification
**Error type:** [Runtime exception / Build error / Config error / Network error / Memory/resource error / Unknown]
**Severity:** [Fatal / Critical / Warning / Informational]
**Recurrence pattern:** [One-off / Intermittent / Consistent / On-startup / Under load]
### 2. Stack Trace Analysis
Walk the stack frame by frame, starting from the origin:
- **Origin frame:** [File, line, function where it started]
- **Propagation path:** [How it travelled through the call stack]
- **Crash point:** [Where it ultimately threw/panicked/exited]
For each significant frame, note whether it is:
- User code (fixable here)
- Framework/library code (usually a misuse issue)
- System/runtime code (usually a config or environment issue)
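For Python tracebacks, the three frame categories above can be approximated from file paths. The sketch below is one possible heuristic, not a definitive classifier: the function names and the `project_root` argument are hypothetical, and it assumes third-party code lives in the interpreter's site-packages directories as reported by `sysconfig`.

```python
import sysconfig
import traceback
from pathlib import Path


def _is_under(path: str, root: str) -> bool:
    """True if `path` is inside the directory `root`."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


def classify_frame(filename: str, project_root: str) -> str:
    """Label a frame as 'user', 'library', 'runtime', or 'unknown'."""
    if filename.startswith("<frozen"):
        return "runtime"   # e.g. <frozen importlib._bootstrap>
    if filename.startswith("<"):
        return "unknown"   # e.g. <stdin>, <string>
    paths = sysconfig.get_paths()
    # site-packages is checked first because it often sits inside the stdlib directory
    if _is_under(filename, paths["purelib"]) or _is_under(filename, paths["platlib"]):
        return "library"
    if _is_under(filename, paths["stdlib"]):
        return "runtime"
    if _is_under(filename, project_root):
        return "user"
    return "unknown"


def walk(exc: BaseException, project_root: str) -> list:
    """Return (kind, file, line, function) per frame, outermost call first, raise site last."""
    return [
        (classify_frame(frame.filename, project_root), frame.filename, frame.lineno, frame.name)
        for frame in traceback.extract_tb(exc.__traceback__)
    ]


if __name__ == "__main__":
    import json
    try:
        json.loads("{bad json")
    except ValueError as error:
        for row in walk(error, project_root="."):
            print(row)
```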
### 3. Root Cause Assessment
**Probable root cause:** [1–2 sentence plain-English statement]
**Confidence:** [High / Medium / Low — and why]
**Alternative causes to rule out:** [If confidence is not high]
### 4. Affected Code Path
**Entry point:** [Where the triggering call began]
**Key function(s) involved:** [Specific functions/methods named in the trace]
**Data that triggered it:** [If inferable from the log — e.g. null value, malformed JSON]
### 5. Suggested Fix
Provide a concrete, code-level suggestion:
- What to change (the minimal fix)
- Why this fixes the root cause
- Any trade-offs or risks in the fix
- A short code snippet if helpful
### 6. Next Debugging Steps
If the root cause is uncertain, provide an ordered list of 3–5 specific debugging actions:
1. [Specific thing to check — file, log line, config value]
2. [Specific reproduction step or isolation test]
3. [Specific tool command — e.g. `strace`, `pprof`, `--verbose`, add logging at X]
### 7. Prevention
One or two concrete things that would prevent this class of error recurring:
- Better input validation at [point]
- Add monitoring/alerting for [condition]
- Test that covers [scenario]
## Quality Checks
- [ ] Root cause is specific (not "there might be a null pointer issue")
- [ ] At least one concrete code-level fix is suggested
- [ ] Next steps are actionable commands, not vague advice
- [ ] Language-specific idioms are used correctly
- [ ] Prevention is proactive (not just "add error handling")
## Example Trigger Phrases
- "Why is this crashing?" + [paste log]
- "Can you analyse this stack trace?"
- "I'm getting this error, what does it mean?"
- "Debug this log for me"
- "What's causing this exception?"
+87
@@ -0,0 +1,87 @@
---
name: pr-description-writer
description: "Write a clear, structured pull request description from a git diff, branch summary, or commit list. Use when asked to write a PR description, draft a pull request, or document code changes. Produces a description with summary, motivation, changes made, testing steps, and reviewer guidance. Optimised for Opus 4.7 and newer models."
---
# PR Description Writer Skill
Writes structured, reviewer-friendly pull request descriptions from a diff, commit list, or informal notes. Covers the what, why, and how-to-review so reviewers can start immediately.
## Required Inputs
Ask for these if not provided:
- **What changed** (paste a git diff, `git log --oneline`, or describe the changes in plain English)
- **Why it was changed** (the problem being solved or feature being added)
- **How to test it** (any specific steps a reviewer needs to verify it works)
- **Risk level** (low / medium / high — affects how much reviewer guidance to include)
- **PR type** (feature / bug fix / refactor / dependency upgrade / config change / hotfix)
## Output Structure
### Title
A clear, imperative-mood title under 72 characters:
`[type]: [concise description of what changed]`
Examples:
- `feat: add rate limiting to the public API`
- `fix: resolve race condition in session expiry`
- `refactor: extract payment logic into PaymentService`
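The length limit and the `[type]: [description]` shape can be checked mechanically; imperative mood cannot, and still needs a reviewer's eye. A minimal sketch follows. The `ALLOWED_TYPES` list is an assumption loosely based on the PR types named in Required Inputs, not a list this skill defines.

```python
import re

MAX_LEN = 72  # the title must be under 72 characters

# Assumed set of type prefixes, for illustration only.
ALLOWED_TYPES = ("feat", "fix", "refactor", "chore", "docs", "test", "perf")

TITLE = re.compile(r"^(?P<type>[a-z]+): (?P<description>\S.*)$")


def check_title(title: str) -> list:
    """Return a list of problems; an empty list means the title passes."""
    problems = []
    if len(title) >= MAX_LEN:
        problems.append(f"title is {len(title)} characters; must be under {MAX_LEN}")
    match = TITLE.match(title)
    if match is None:
        problems.append("expected the form '[type]: [description]'")
    elif match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match['type']}'")
    return problems


if __name__ == "__main__":
    for candidate in ("feat: add rate limiting to the public API",
                      "Added rate limiting"):
        print(candidate, "->", check_title(candidate) or "ok")
```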
### Summary
2–3 sentences covering:
- What this PR does (the change)
- Why it was needed (the problem or goal)
- The approach taken (at a high level)
### Changes Made
Bullet list of specific changes — one bullet per logical change, not per file:
- Added [X] to handle [Y]
- Refactored [A] to reduce [B]
- Removed [C] as it was replaced by [D]
- Updated [E] to fix [F]
### Screenshots / Demo
[If UI change: include before/after screenshots or a screen recording]
[If API change: include example request/response]
[If no visual change: this section can be omitted]
### How to Test
Step-by-step instructions a reviewer can follow:
1. [Setup step if needed]
2. [Action to take]
3. [What to verify]
4. [Edge case to check]
Include any specific commands, test data, or environment flags needed.
### Testing Checklist
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Edge cases covered
- [ ] Manual testing completed
- [ ] No regressions in existing tests
### Reviewer Notes
Flag anything that warrants extra attention:
- Areas of uncertainty where a second opinion is welcome
- Deliberate trade-offs made (and why)
- Out-of-scope items noticed but not addressed
- Dependencies on other PRs (link them)
### Related
- Closes #[issue number] (if applicable)
- Related to #[PR/issue number]
## Quality Checks
- [ ] Title is imperative mood and under 72 characters
- [ ] Summary explains what AND why (not just what)
- [ ] Changes list describes logical changes (not file-by-file changes)
- [ ] Testing steps are reproducible by someone unfamiliar with the code
- [ ] Risk-appropriate reviewer guidance is included
## Example Trigger Phrases
- "Write a PR description for these changes" + [paste diff or description]
- "Draft a pull request for [feature]"
- "I need a PR description — here's what I changed"
- "Summarise these commits into a PR description"
- "Write the PR body for this branch"
+144
@@ -0,0 +1,144 @@
---
name: runbook-writer
description: "Write an operational runbook for a service, incident type, or deployment procedure. Use when asked to write a runbook, create an ops guide, document an operational procedure, or prepare an incident response playbook. Produces a runbook with overview, prerequisites, step-by-step procedures, rollback steps, troubleshooting table, and escalation paths. Optimised for Opus 4.7 and newer models."
---
# Runbook Writer Skill
Produces operational runbooks for services, incident types, and deployment procedures — structured so an on-call engineer who's never touched the system can follow them under pressure.
## Required Inputs
Ask for these if not provided:
- **What the runbook is for** (e.g. deploying the payment service, responding to a database failover, rotating API keys)
- **Runbook type** (Deployment / Incident Response / Maintenance / Disaster Recovery)
- **System/service name and what it does** (brief description)
- **Audience** (new on-call engineers / experienced SREs / DevOps team)
- **Tech stack** (where relevant — e.g. Kubernetes, AWS RDS, Node.js)
## Output Structure
---
**Runbook:** [Runbook Title]
**Service:** [Service Name]
**Type:** [Deployment / Incident Response / Maintenance / DR]
**Last Updated:** [Date]
**Owner:** [Team or person]
**Severity:** [P1 / P2 / P3 — if incident-type]
---
### Overview
**What this runbook covers:**
[1–2 sentences on the scenario this runbook handles]
**When to use this runbook:**
- [Specific trigger condition 1 — e.g. PagerDuty alert: `high-error-rate-payment-service`]
- [Specific trigger condition 2 — e.g. Deploy needed after PR merged to `main`]
**Estimated time to complete:** [X minutes / X–Y minutes depending on outcome]
**Impact if not completed correctly:** [e.g. Payment processing degraded / Data loss risk / Users locked out]
---
### Prerequisites
**Access required:**
- [ ] [System/tool access — e.g. AWS Console: `production-account`]
- [ ] [Credential — e.g. `vault read secret/payment-service`]
- [ ] [VPN / bastion access if needed]
**Tools required:**
- [ ] [Tool name and version — e.g. `kubectl` v1.28+]
- [ ] [CLI or dashboard name]
**Before you start:**
- [ ] [Prerequisite check — e.g. Verify current deployment is healthy in Grafana]
- [ ] [Prerequisite action — e.g. Announce in `#ops-live` that you're starting]
---
### Procedure
Number every step. Use exact commands. Do not paraphrase tool names or flags.
**Step 1: [Action name]**
[What you're doing and why — one sentence]
```bash
# Exact command
[command here]
```
**Expected output:** `[what should appear if this worked]`
**If this fails:** [Exact error message to look for] → [What to do, or see Troubleshooting]
**Step 2: [Action name]**
[Same structure as Step 1]
**Step 3: Verify**
Always include a verification step after the main procedure:
```bash
[verification command]
```
**Expected state:** [What a healthy system looks like after this runbook completes]
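The step structure above (exact command, expected output, explicit failure path) also lends itself to a scripted dry run. The sketch below is illustrative only: the `Step` fields and helper names are hypothetical, and the demo command is a stand-in for a real verification command.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    command: list          # exact argv, no shell interpolation
    expected_output: str   # substring that proves the step worked
    if_this_fails: str     # explicit failure path taken from the runbook


def run_step(step: Step, timeout: float = 60.0) -> bool:
    """Run one step; it passes only on exit code 0 AND the expected output."""
    try:
        result = subprocess.run(step.command, capture_output=True,
                                text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"[FAIL] {step.name}: {exc}")
        print(f"       If this fails: {step.if_this_fails}")
        return False
    ok = result.returncode == 0 and step.expected_output in result.stdout
    print(f"[{'OK' if ok else 'FAIL'}] {step.name}")
    if not ok:
        print(f"       If this fails: {step.if_this_fails}")
    return ok


def run_procedure(steps):
    """Stop at the first failing step and return it (None if all passed)."""
    for step in steps:
        if not run_step(step):
            return step
    return None


if __name__ == "__main__":
    # Stand-in command; a real runbook would use its own verification command.
    verify = Step("Verify", [sys.executable, "-c", "print('status: healthy')"],
                  "healthy", "See Troubleshooting, then start Rollback")
    run_procedure([verify])
```

Checking both the exit code and the output guards against commands that exit 0 while reporting a degraded state.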
---
### Rollback
How to undo this procedure if something went wrong:
**Step R1: [Rollback action]**
```bash
[rollback command]
```
**Verify rollback:** `[command to confirm rollback succeeded]`
---
### Troubleshooting
| Symptom | Likely Cause | Resolution |
|---|---|---|
| [Error message or observable symptom] | [Why this happens] | [Exact fix or next step] |
| [Another symptom] | [Cause] | [Resolution] |
---
### Escalation
If this runbook does not resolve the issue:
| Condition | Who to Contact | How |
|---|---|---|
| [e.g. DB unavailable after 10 min] | [DBA on-call] | [PagerDuty policy: `db-oncall`] |
| [e.g. Payment provider unresponsive] | [Vendor contact] | [Contact in 1Password: `vendor-escalation`] |
**Always update the incident timeline in [tool] before escalating.**
---
### Post-Procedure Checklist
After completing the runbook:
- [ ] Announce completion in `#ops-live` with outcome
- [ ] Update the incident ticket / deploy log
- [ ] Verify alerts have resolved in monitoring dashboard
- [ ] If this revealed a gap in this runbook — update it now (link to edit process)
---
## Quality Checks
- [ ] Every step has an exact command (no "run the deploy script")
- [ ] Expected output is specified for each step so the engineer knows if it worked
- [ ] Failure path is explicit for each step (not "if it fails, investigate")
- [ ] Rollback procedure is complete and independently testable
- [ ] Escalation paths name specific contacts, not just team names
- [ ] Runbook can be followed by someone who has never touched this system
## Example Trigger Phrases
- "Write a runbook for [service] deployment"
- "Create an incident response runbook for [alert type]"
- "I need a runbook for [procedure]"
- "Document the operational procedure for [X]"
- "Write an ops playbook for [scenario]"
+135
@@ -0,0 +1,135 @@
---
name: system-design-interview
description: "Structure a complete system design answer for interview questions or real architecture sessions. Use when asked to design a system, answer a system design interview question, or architect a solution at scale. Produces a structured answer covering requirements, capacity estimates, high-level design, component deep-dives, trade-offs, and follow-up considerations. Optimised for Opus 4.7 and newer models."
---
# System Design Interview Skill
Structures a complete, interview-grade system design response — covering clarifying questions, requirements, capacity estimates, architecture, component design, and trade-offs. Works equally well for real architecture sessions.
## Required Inputs
Ask for these if not provided:
- **The system to design** (e.g. "design a URL shortener", "design a notification service", "design Twitter's feed")
- **Scope** (interview prep / real architecture decision / practice run)
- **Scale target** (rough numbers: DAU, requests/sec, data volume — or "assume typical web scale")
- **Constraints or priorities** (e.g. prioritise availability over consistency, minimise cost, low-latency reads)
## Output Structure
### 1. Clarifying Questions
Before designing, list 4–6 questions that would change the design. Examples:
- Read-heavy or write-heavy? (affects caching and DB choice)
- Global or single-region? (affects latency requirements)
- Strong or eventual consistency? (affects storage and replication)
- Acceptable latency targets? (p50 / p99)
- Any existing infrastructure constraints?
Then proceed with stated assumptions if answering an interview question.
### 2. Functional Requirements
**Core features (must have):**
- [Feature 1]
- [Feature 2]
- [Feature 3]
**Out of scope (for this design):**
- [What's deliberately excluded and why]
### 3. Non-Functional Requirements
| Requirement | Target |
|---|---|
| Availability | [e.g. 99.9% / 99.99%] |
| Latency | [e.g. p95 < 100ms for reads] |
| Throughput | [e.g. 10k writes/sec peak] |
| Consistency | [Strong / Eventual] |
| Durability | [e.g. 99.999% — no data loss] |
### 4. Capacity Estimation
**Traffic:**
- DAU: [X]
- Reads/sec: [X] (peak: [X])
- Writes/sec: [X] (peak: [X])
**Storage:**
- Per record size: [X bytes]
- Records per day: [X]
- 5-year storage: [X GB/TB]
**Bandwidth:**
- Inbound: [X MB/s]
- Outbound: [X MB/s]
### 5. High-Level Architecture
```
[Client] → [CDN/Edge] → [Load Balancer] → [API Servers] → [Cache] → [DB]
→ [Message Queue] → [Workers]
```
Describe each layer in 1–2 sentences explaining its role and technology choice.
### 6. Component Deep-Dive
Pick the 2–3 most critical/interesting components and go deep:
**[Component 1: e.g. Database Layer]**
- Choice: [Technology and why — e.g. PostgreSQL for ACID guarantees, Cassandra for write throughput]
- Schema design (high-level): [Key tables/collections and their structure]
- Indexing strategy: [What gets indexed and why]
- Replication: [Primary-replica / Multi-primary — and why]
**[Component 2: e.g. Caching Strategy]**
- Cache type: [Redis / Memcached — and why]
- What gets cached: [Hot data — e.g. user sessions, frequent reads]
- Cache invalidation: [TTL / Write-through / Write-behind — trade-offs]
- Cache hit rate target: [e.g. 95%]
**[Component 3: e.g. API Design]**
- Key endpoints: [List the 3–5 most important API calls]
- Authentication: [JWT / OAuth / API keys]
- Rate limiting: [Where and at what rate]
### 7. Data Flow
Walk through the two most critical paths end-to-end:
**Write path:** [Step 1 → Step 2 → Step 3...]
**Read path:** [Step 1 → Step 2 → Step 3...]
### 8. Scaling Bottlenecks and Mitigations
| Bottleneck | Mitigation |
|---|---|
| [e.g. DB write throughput] | [e.g. sharding by user_id, write batching] |
| [e.g. Hot-key cache misses] | [e.g. local in-process cache, probabilistic early expiry] |
| [e.g. Single region latency] | [e.g. multi-region deployment, GeoDNS routing] |
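"Probabilistic early expiry" in the table above may be unfamiliar, so here is a minimal in-process sketch of one common formulation (often called XFetch): each read may refresh the entry slightly before its TTL ends, with a probability that rises as expiry approaches, so concurrent readers do not all miss at the same instant. The class name, the `beta` default, and the injectable `clock` and `rng` parameters are illustrative assumptions.

```python
import math
import random
import time


class EarlyExpiryCache:
    def __init__(self, beta: float = 1.0, clock=time.monotonic, rng=random.random):
        self._beta = beta      # values above 1 refresh earlier, below 1 later
        self._clock = clock
        self._rng = rng
        self._store = {}       # key -> (value, compute_seconds, expires_at)

    def get(self, key, recompute, ttl: float):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None:
            value, compute_seconds, expires_at = entry
            # 1 - rng() lies in (0, 1], so the log is finite and the head start is >= 0.
            head_start = compute_seconds * self._beta * -math.log(1.0 - self._rng())
            if now + head_start < expires_at:
                return value   # still fresh and not selected for early refresh
        started = self._clock()
        value = recompute()
        compute_seconds = self._clock() - started
        self._store[key] = (value, compute_seconds, self._clock() + ttl)
        return value


if __name__ == "__main__":
    cache = EarlyExpiryCache()
    print(cache.get("profile:42", lambda: {"name": "example"}, ttl=30.0))
```

Scaling the head start by the measured recompute time means expensive entries are refreshed earlier than cheap ones.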
### 9. Trade-offs and Alternatives
Be explicit about what was chosen and what was sacrificed:
| Decision | Why | Trade-off |
|---|---|---|
| [e.g. Eventual consistency] | [Higher availability, lower latency] | [Stale reads possible] |
| [e.g. SQL over NoSQL] | [Complex queries, ACID transactions] | [Harder to shard horizontally] |
| [e.g. Async processing via queue] | [Decoupled, more resilient] | [Eventual delivery, harder to debug] |
### 10. Follow-up Considerations
Things to tackle in production but out of scope for this design session:
- Monitoring and alerting (what metrics matter)
- Disaster recovery and backup strategy
- Security (auth, encryption at rest/transit, rate limiting)
- Cost optimisation at scale
- Gradual rollout and feature flagging
## Quality Checks
- [ ] Clarifying questions are design-changing (not generic filler)
- [ ] Capacity estimates use real numbers (not just "it scales")
- [ ] At least 2 component deep-dives with technology choices justified
- [ ] Trade-offs section is honest (not just benefits of chosen approach)
- [ ] Data flow is described end-to-end for the critical path
## Example Trigger Phrases
- "Help me answer a system design interview: [question]"
- "Design [system] for a system design interview"
- "How would I architect [system] at scale?"
- "I have a system design interview — the question is [X]"
- "Design a [URL shortener / chat system / notification service / feed]"
+127
@@ -0,0 +1,127 @@
---
name: test-strategy-doc
description: "Write a test strategy document from a feature spec, PRD, or system description. Use when asked to create a test plan, write a test strategy, define QA approach, or plan testing for a feature or release. Produces a complete test strategy with scope, risk assessment, test types, coverage targets, and a prioritised test case outline. Optimised for Opus 4.7 and newer models."
---
# Test Strategy Document Skill
Produces a complete test strategy from a feature spec, PRD, or system description — covering scope, test types, risk areas, coverage requirements, and a prioritised test case outline.
## Required Inputs
Ask for these if not provided:
- **Feature or system being tested** (paste a spec, PRD, or describe it in plain English)
- **Tech stack** (language, framework, testing tools already in use if known)
- **Risk level** (low / medium / high / critical — affects depth and coverage requirements)
- **Timeline** (when does this need to ship — affects prioritisation)
- **Team context** (who is doing the testing — developers / dedicated QA / both)
## Output Structure
### 1. Test Scope
**In scope:**
- [Specific functionality being tested]
- [Integration points covered]
- [User-facing flows included]
**Out of scope:**
- [What is deliberately not tested here — and why]
- [Dependencies owned by other teams]
**Assumptions:**
- [What the test strategy assumes is true — e.g. mocked services, test data availability]
### 2. Risk Assessment
Identify the highest-risk areas first — these drive depth and coverage:
| Area | Risk Level | Why | Test Priority |
|---|---|---|---|
| [e.g. Payment processing] | High | Money movement, regulatory | P0 — exhaustive |
| [e.g. User authentication] | High | Security boundary | P0 — exhaustive |
| [e.g. Email notifications] | Medium | External dependency | P1 — happy path + key failures |
| [e.g. UI copy changes] | Low | Visual only, reversible | P2 — smoke only |
### 3. Test Types and Coverage
**Unit Tests**
- **What:** Individual functions and methods in isolation
- **Who writes:** Developer
- **Coverage target:** [e.g. 80% line coverage on new code / 100% on critical paths]
- **Tools:** [e.g. Jest, pytest, go test]
- **Focus areas for this feature:** [Specific logic that needs unit coverage]
**Integration Tests**
- **What:** Service interactions, database operations, API contracts
- **Who writes:** Developer / QA
- **Coverage target:** [All happy paths + key failure modes]
- **Tools:** [e.g. Supertest, pytest + testcontainers]
- **Focus areas:** [Specific integrations at risk — e.g. third-party API, DB schema changes]
**End-to-End Tests**
- **What:** Critical user journeys from browser/client to database
- **Who writes:** QA / Developer
- **Coverage target:** [Top N user journeys — list them]
- **Tools:** [e.g. Playwright, Cypress, Selenium]
- **Focus areas:** [The 3–5 most critical user flows]
**Performance Tests** *(include only if risk is medium+)*
- **What:** Load, stress, or latency testing
- **Targets:** [Specific numbers — e.g. 200 req/sec at p95 < 200ms]
- **Tools:** [e.g. k6, Locust, JMeter]
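A latency target such as the example above can be checked against measured samples once a load test has run. The sketch below assumes latencies are collected as a list of milliseconds and uses the nearest-rank percentile method. The function names and the 200 ms default are illustrative, and tools such as k6 or Locust report percentiles on their own.

```python
def p95(latencies_ms):
    """Return the 95th percentile of the samples using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_target(latencies_ms, target_ms=200):
    """True when the p95 latency is strictly below the target."""
    return p95(latencies_ms) < target_ms


samples = list(range(1, 101))  # hypothetical samples: 1 ms through 100 ms
print(p95(samples), meets_latency_target(samples))
```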
**Security Tests** *(include only if risk is high+)*
- **What:** OWASP Top 10 checks relevant to this feature
- **Focus:** [Auth bypasses, injection, data exposure]
- **Tools:** [e.g. OWASP ZAP, manual penetration testing, Snyk]
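The inclusion rules above (performance tests at medium risk and above, security tests at high risk and above) can be expressed as a small selection function. This is a sketch under the assumption that risk is one of the four levels listed in Required Inputs. `RISK_ORDER` and `select_test_types` are hypothetical names.

```python
# Risk levels from Required Inputs, ordered from lowest to highest.
RISK_ORDER = ["low", "medium", "high", "critical"]


def select_test_types(risk):
    """List the test types the strategy should include for a given risk level."""
    level = RISK_ORDER.index(risk.lower())
    types = ["unit", "integration", "end-to-end"]  # always included
    if level >= RISK_ORDER.index("medium"):
        types.append("performance")  # include only if risk is medium+
    if level >= RISK_ORDER.index("high"):
        types.append("security")  # include only if risk is high+
    return types


print(select_test_types("medium"))
```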
### 4. Test Case Outline
Priority-ordered list of specific test cases:
**P0 — Must pass before merge:**
| Test Case | Type | Expected Outcome |
|---|---|---|
| [e.g. User can log in with valid credentials] | E2E | [Redirect to dashboard, session created] |
| [e.g. Invalid login returns 401] | Integration | [Error message displayed, no session] |
| [e.g. Password is never stored in plain text] | Unit | [Stored value is a bcrypt hash, never the raw password] |
**P1 — Must pass before release:**
| Test Case | Type | Expected Outcome |
|---|---|---|
| [e.g. Login fails gracefully when DB is down] | Integration | [User sees friendly error, 503] |
| [e.g. Rate limiting blocks after 5 failed attempts] | Integration | [429 returned, account flagged] |
**P2 — Should pass, can ship with known issues tracked:**
| Test Case | Type | Expected Outcome |
|---|---|---|
| [e.g. Login page renders correctly on mobile] | E2E | [Layout matches design] |
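The three priority tiers act as gates: P0 blocks merge, P0 and P1 together block release, and P2 failures are tracked but never block. A minimal sketch of that rule follows, assuming test results arrive as `(priority, passed)` pairs. The `gate_passes` helper is hypothetical and not part of any test framework.

```python
# Priorities that must be fully green for each gate.
REQUIRED_BY_GATE = {
    "merge": {"P0"},
    "release": {"P0", "P1"},
}


def gate_passes(results, gate):
    """results: iterable of (priority, passed) pairs; gate: 'merge' or 'release'."""
    required = REQUIRED_BY_GATE[gate]
    # P2 results are never required, so they are ignored by both gates.
    return all(passed for priority, passed in results if priority in required)


results = [("P0", True), ("P0", True), ("P1", False), ("P2", False)]
print(gate_passes(results, "merge"), gate_passes(results, "release"))
```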
### 5. Test Data Requirements
- [Specific test data needed — e.g. test user accounts with various states]
- [External service stubs or mocks needed]
- [Database seed data requirements]
- [Any PII concerns and how test data handles them]
### 6. Definition of Done
Testing is complete when:
- [ ] All P0 test cases pass
- [ ] All P1 test cases pass
- [ ] Code coverage meets the stated target
- [ ] No critical or high severity bugs open
- [ ] Performance targets met (if applicable)
- [ ] Security checks completed (if applicable)
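Because each item in the checklist is measurable, it can be evaluated mechanically from a test run summary. The sketch below is one possible shape, not a prescribed format. `RunSummary` and `unmet_criteria` are hypothetical names, and `None` is an assumed way to mark the "if applicable" items as not applicable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSummary:
    p0_failures: int
    p1_failures: int
    coverage: float                   # measured coverage, e.g. 0.83
    coverage_target: float            # stated target, e.g. 0.80
    open_critical_or_high_bugs: int
    performance_met: Optional[bool] = None  # None means not applicable
    security_done: Optional[bool] = None    # None means not applicable


def unmet_criteria(summary):
    """Return the Definition of Done items that are not yet satisfied."""
    checks = {
        "All P0 test cases pass": summary.p0_failures == 0,
        "All P1 test cases pass": summary.p1_failures == 0,
        "Code coverage meets the stated target": summary.coverage >= summary.coverage_target,
        "No critical or high severity bugs open": summary.open_critical_or_high_bugs == 0,
        # Items marked None (not applicable) count as satisfied.
        "Performance targets met": summary.performance_met is not False,
        "Security checks completed": summary.security_done is not False,
    }
    return [name for name, ok in checks.items() if not ok]


summary = RunSummary(p0_failures=0, p1_failures=1, coverage=0.83,
                     coverage_target=0.80, open_critical_or_high_bugs=0,
                     performance_met=True)
print(unmet_criteria(summary))
```

Testing is complete when the returned list is empty.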
## Quality Checks
- [ ] Risk table is populated and drives test priority (not filled in generically)
- [ ] P0 test cases cover the highest-risk paths specifically
- [ ] Each test type names a concrete tool (not "some testing framework")
- [ ] Definition of Done is measurable (not "tests are done when QA is happy")
## Example Trigger Phrases
- "Write a test strategy for [feature]" + [paste spec or PRD]
- "Create a test plan for [system]"
- "How should we test [feature]?"
- "I need a QA plan for this sprint"
- "What tests do we need for [X]?"