---
name: rfc-writer
description: "Write an engineering RFC (Request for Comments) for a technical decision, architectural change, or significant implementation approach. Use when asked to write an RFC, document a technical proposal, create a design doc, write an architecture decision for review, or produce a technical specification for team feedback. Produces a complete RFC document covering problem statement, motivation, proposed solution, alternatives rejected, implementation plan, migration plan, security and performance implications, observability changes, rollout plan, and open questions."
---

# RFC Writer Skill

Produce a complete engineering RFC (Request for Comments) for a technical decision or architectural change. An RFC is a structured proposal document — not a persuasion document. Its purpose is to expose a decision to scrutiny, surface trade-offs, document alternatives considered, and create a permanent record of why a choice was made.

A good RFC makes it possible for someone who wasn't in the room to understand years later why the team built something the way they did.

## Required Inputs

Ask for these if not already provided:
- **RFC title and author** — what this RFC is about and who is proposing it
- **Problem being solved** — what is broken, missing, or inadequate today; why action is needed now
- **Proposed solution** — the approach the author is recommending, at least at a high level
- **Context and constraints** — team size, existing architecture, timeline pressures, budget limits, compliance requirements
- **Alternatives considered** — at least 2 alternative approaches the author has thought about
- **Current status** — is this pre-decision (seeking feedback) or post-decision (documenting a made decision)?

## Output Format

---

# RFC [Number]: [Title]

**Author:** [Name] | **Team:** [Team name]
**Created:** [Date] | **Last updated:** [Date]
**Status:** Draft | In Review | Approved | Rejected | Superseded by RFC-[X]
**Ticket:** [JIRA-XXX] | **Slack thread:** [#channel link]
**Review deadline:** [Date — when comments should be submitted by]

---

## Abstract

[2–4 sentences summarising the entire RFC. Should stand alone — someone reading only this should understand what is being proposed, why, and what the main trade-off is. Write this last.]

---

## 1. Problem Statement

[Describe the problem being solved. Focus on the *problem*, not the solution. Be specific, and quantify wherever possible.]

**Current state:**
[Describe how things work today — the existing system, process, or architecture. Include any relevant constraints or limitations.]

**Why this is a problem now:**
[Why is this being addressed now rather than earlier or later? Reference metrics, incidents, product requirements, or scaling thresholds that make this urgent or timely.]

**Example of the problem in practice:**
[A concrete scenario or incident that illustrates the problem. This helps reviewers understand the real-world impact, not just the abstract description.]

```
// Example: current behaviour that illustrates the problem
[code snippet, log output, or sequence description showing the problem]
```

**Impact of not solving this:**
- [Impact 1 — e.g. "New tenant onboarding requires 3 hours of manual configuration per account"]
- [Impact 2 — e.g. "Auth service handles 400 req/s; projected to hit capacity within 8 weeks at current growth"]
- [Impact 3 — e.g. "Current approach is incompatible with the upcoming multi-region requirement"]
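
A capacity claim such as "projected to hit capacity within 8 weeks at current growth" is easier to review when the arithmetic behind it is shown. The following is a minimal sketch under stated assumptions: growth compounds weekly, the 400 req/s figure is read as the capacity limit, and the function name, the 300 req/s current load, and the 4% weekly growth rate are hypothetical values chosen for illustration.

```python
import math


def weeks_until_capacity(current_rps: float, capacity_rps: float, weekly_growth: float) -> float:
    """Weeks until load reaches capacity, assuming compound weekly growth.

    weekly_growth is a fraction, e.g. 0.04 for 4% growth per week.
    """
    if current_rps <= 0 or capacity_rps <= 0:
        raise ValueError("request rates must be positive")
    if current_rps >= capacity_rps:
        return 0.0  # already at or over capacity
    if weekly_growth <= 0:
        return math.inf  # load never reaches capacity without growth
    # Solve current * (1 + g)^w = capacity for w.
    return math.log(capacity_rps / current_rps) / math.log(1.0 + weekly_growth)


# Hypothetical inputs: 300 req/s today, 400 req/s capacity, 4% weekly growth.
weeks = weeks_until_capacity(300, 400, 0.04)
print(f"Capacity reached in about {weeks:.1f} weeks")
```

Stating the growth model alongside the result lets reviewers challenge the assumption rather than the conclusion.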

---

## 2. Goals and Non-Goals

**Goals:**
- [ ] [Specific, measurable outcome — e.g. "Reduce tenant onboarding time from 3 hours to <5 minutes"]
- [ ] [e.g. "Support 2,000 req/s on the auth service with P99 latency ≤50ms"]
- [ ] [e.g. "Enable multi-region deployment without changes to the application layer"]

**Non-goals:** *(what this RFC explicitly does not address)*
- [e.g. "This RFC does not address authentication for internal service-to-service calls — see RFC-042"]
- [e.g. "Performance improvements to the existing system — this RFC replaces it"]
- [e.g. "Migration of historical data — covered in a follow-on RFC"]

**Success metrics:**
| Metric | Current | Target | Measurement method |
|---|---|---|---|
| [e.g. Onboarding time] | [3 hours] | [<5 minutes] | [Prometheus histogram on onboarding job duration] |
| [e.g. Auth latency P99] | [120ms] | [≤50ms] | [Datadog APM] |
| [e.g. Engineer setup time] | [4 hours] | [<30 minutes] | [Onboarding survey] |

---

## 3. Background and Motivation

[Provide the context a reviewer needs to evaluate the proposal. This is not a repeat of the problem statement — it is the surrounding technical and business context.]

**Existing system overview:**
[Describe the relevant parts of the current architecture. Include an ASCII diagram if the relationships between components help understanding.]

```
[ASCII diagram of current architecture — optional but strongly recommended for architectural RFCs]

  ┌──────────┐     ┌──────────────┐     ┌──────────────┐
  │  Client  │────▶│  [Service A] │────▶│  [Service B] │
  └──────────┘     └──────────────┘     └──────────────┘
                           │
                           ▼
                   ┌──────────────┐
                   │  [Database]  │
                   └──────────────┘
```

**Prior work and related decisions:**
- [RFC-XXX: Title — relevant previous decision; link]
- [ADR-XXX: Title — architectural decision record]
- [Any external standards, blog posts, or vendor documentation that informs this proposal]

**Constraints:**
- [e.g. Must remain backward compatible with v1 API clients for 12 months]
- [e.g. Team has no Rust expertise — solution must be in Python or Go]
- [e.g. Must be deployable without a maintenance window]

---

## 4. Proposed Solution

[Describe the proposed approach clearly and specifically. Include enough detail that an engineer could begin implementing from this document, but don't write the code — that is for the PR.]

### 4.1 High-Level Approach

[1–3 paragraphs describing the overall solution. Explain the key idea and why it solves the problem.]

### 4.2 Architecture

```
[ASCII diagram of the proposed architecture — what the system looks like after this RFC is implemented]

  ┌──────────┐     ┌──────────────────┐     ┌──────────────┐
  │  Client  │────▶│  [New Component] │────▶│  [Service B] │
  └──────────┘     └──────────────────┘     └──────────────┘
                           │                       │
                           ▼                       ▼
                   ┌──────────────┐       ┌──────────────┐
                   │  [Store A]   │       │  [Store B]   │
                   └──────────────┘       └──────────────┘
```

### 4.3 Detailed Design

[Break the solution into its key components or decisions. For each, explain what it does and why it was designed this way.]

**Component / Decision 1: [Name]**

[Description of this component — what it does, how it works, why this approach was chosen.]

```
// Example interface, API contract, or pseudocode (not implementation code)
[Relevant schema, API definition, data flow, or pseudocode]
```

**Component / Decision 2: [Name]**

[Description]

**Component / Decision 3: [Name]**

[Description]

### 4.4 API Changes

*Complete this section if the RFC introduces or modifies any API endpoints, events, or interfaces.*

**New endpoints / events:**
```
[HTTP method + path or event name]
Request: { ... }
Response: { ... }
```

**Modified endpoints:**
- `[endpoint]`: [what changes and why; backward compatibility note]

**Deprecated endpoints:**
- `[endpoint]`: deprecated in favour of `[new endpoint]` — removal timeline: [date/version]

### 4.5 Data Model Changes

*Complete this section if any database schema or data structure changes are required.*

[Describe schema changes at a high level. Reference the database-migration-plan skill for detailed migration steps.]

```sql
-- Key schema changes (abbreviated — full migration in [link])
[DDL statements for key additions/changes]
```

---

## 5. Alternatives Considered

*Every alternative must include an explicit reason why it was rejected. "We went with the proposed solution" is not a reason.*

### Alternative 1: [Name]

**Description:**
[What this alternative would involve.]

**Pros:**
- [Pro 1]
- [Pro 2]

**Cons:**
- [Con 1]
- [Con 2]

**Why rejected:**
[Specific reason — e.g. "Requires 3× the infrastructure cost", "Incompatible with multi-region requirement", "Team has no expertise in this technology and the ramp-up would miss the Q3 deadline"]

---

### Alternative 2: [Name]

**Description:**
[What this alternative would involve.]

**Pros:**
- [Pro 1]
- [Pro 2]

**Cons:**
- [Con 1]
- [Con 2]

**Why rejected:**
[Specific reason]

---

### Alternative 3: Do nothing / defer

**Description:**
Accept the current state and revisit the problem in [timeframe].

**Why rejected:**
[Why deferring is not acceptable — reference the impact of not solving this from Section 1.]

---

## 6. Implementation Plan

**Estimated effort:** [X engineer-weeks] | **Target completion:** [Date / Quarter]
**Team:** [Who is building this — names or roles]

| Phase | Description | Duration | Dependencies | Owner |
|---|---|---|---|---|
| 1 | [e.g. Core implementation — new component built and tested] | [X weeks] | [None] | [Name] |
| 2 | [e.g. Integration — connect new component to existing services] | [X weeks] | [Phase 1 complete] | [Name] |
| 3 | [e.g. Rollout — canary deploy, then full rollout] | [X weeks] | [Phase 2 + staging validated] | [Name] |
| 4 | [e.g. Cleanup — deprecate old system, remove feature flags] | [X weeks] | [Phase 3 stable for X weeks] | [Name] |

**Key milestones:**
- [ ] [Date]: [Milestone — e.g. "Core implementation complete and code-reviewed"]
- [ ] [Date]: [Milestone — e.g. "Staging environment validation complete"]
- [ ] [Date]: [Milestone — e.g. "10% canary traffic without regression"]
- [ ] [Date]: [Milestone — e.g. "Full rollout complete"]
- [ ] [Date]: [Milestone — e.g. "Old system decommissioned"]

---

## 7. Migration Plan

*Complete this section if the RFC requires migrating existing users, data, or API consumers.*

**Migration strategy:** [Big-bang / Phased / Parallel-run / Opt-in]

**Who is affected:**
- [e.g. All existing API v1 consumers — requires updated client libraries]
- [e.g. X million rows in the `orders` table require backfilling]

**Migration steps:**
1. [Step 1 — describe action, who does it, estimated duration]
2. [Step 2]
3. [Step 3]

**Backward compatibility window:** [How long will the old system/API remain available?]

**Communication plan:**
- [Who needs to be notified, when, and how — e.g. "API consumers will receive a deprecation notice 3 months before the old endpoint is removed"]

---

## 8. Security Implications

[Describe the security impact of this change. If there are no security implications, state that explicitly with reasoning — do not leave this section blank.]

| Concern | Impact | Mitigation |
|---|---|---|
| [e.g. New API endpoint exposed to internet] | [e.g. New attack surface] | [e.g. Rate limiting, auth required, WAF rules] |
| [e.g. New data stored — user PII] | [e.g. GDPR scope expanded] | [e.g. Encrypted at rest, access log, data retention policy] |
| [e.g. Service-to-service communication] | [e.g. Token forgery risk] | [e.g. mTLS between services] |

**Has a threat model been produced or updated?** [Yes — link / No — required before implementation / Not required — reason]

---

## 9. Performance Implications

[Describe the expected performance impact. Include projections for the new system and how they were estimated.]

| Metric | Current | Projected | Measurement method |
|---|---|---|---|
| [e.g. P99 latency — /api/auth] | [120ms] | [≤50ms] | [Load test results — link] |
| [e.g. Database query count per request] | [12] | [3] | [Query logging in staging] |
| [e.g. Memory per instance] | [512MB] | [768MB] | [Profiling — link] |
| [e.g. Infrastructure cost] | [$X/month] | [$Y/month] | [AWS cost calculator estimate] |

**Load testing:** [Has load testing been done? Link to results. If not, when will it be done?]

**Performance risks:**
- [Risk 1 — e.g. "New component adds a network hop that may increase tail latency under congestion — needs validation at 2× peak load"]
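
The P99 rows in the table above only mean something if the percentile method is stated. A minimal sketch of a nearest-rank P99 check against a latency budget follows. It assumes raw latency samples are available (for example from a load test); the function names are hypothetical, and monitoring tools may estimate percentiles differently (for instance from histogram buckets), so figures can differ slightly from this calculation.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_budget(samples: Sequence[float], pct: float, budget_ms: float) -> bool:
    """True if the pct-th percentile latency is within the budget."""
    return percentile_nearest_rank(samples, pct) <= budget_ms


# Hypothetical load-test latencies in milliseconds.
latencies_ms = [12.0, 15.5, 18.2, 22.0, 31.4, 44.9, 47.3, 49.0]
print(meets_budget(latencies_ms, 99, 50.0))
```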

---

## 10. Observability Changes

*Describe what new or changed metrics, logs, traces, and alerts this RFC introduces.*

**New metrics:**
| Metric name | Type | Description | Alert threshold |
|---|---|---|---|
| `[service].[component].[metric]` | [counter/gauge/histogram] | [What it measures] | [e.g. P99 > 100ms for 5 min] |

**New log events:**
| Event | Level | When emitted | Key fields |
|---|---|---|---|
| `[event.name]` | INFO | [When] | `user_id`, `duration_ms`, `result` |
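
A log event row like the one above can be emitted as one JSON object per line so that the key fields stay queryable. This is a minimal sketch using only the Python standard library. The event name `tenant.onboarded`, the logger name, the helper `format_event`, and the field values are hypothetical, and JSON lines are one possible encoding rather than something this template prescribes.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("onboarding")  # hypothetical logger name


def format_event(event: str, **fields) -> str:
    """Serialise an event name plus its key fields as a single JSON line."""
    return json.dumps({"event": event, **fields}, sort_keys=True)


# Emit the event at INFO with the key fields listed in the table row.
logger.info(format_event("tenant.onboarded", user_id="u-123", duration_ms=412, result="ok"))
```

Keeping field names identical to the "Key fields" column makes it straightforward to check the implementation against the RFC later.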

**Distributed tracing:** [Are spans added for new components? Which operations are instrumented?]

**Dashboard changes:** [New dashboard / updated existing dashboard — link]

---

## 11. Rollout Plan

**Rollout strategy:** [Feature flag / Canary / Blue-green / Gradual traffic shift / Full deploy]

| Stage | Traffic % | Duration | Success criteria | Rollback trigger |
|---|---|---|---|---|
| Internal testing | 0% (dogfood) | [X days] | [No errors in internal usage] | Any error |
| Canary | 1% | [X hours] | [Error rate <0.1%; P99 latency within budget] | Error rate >0.5% |
| Limited rollout | 10% | [X days] | [As above + business metrics stable] | Error rate >0.2% |
| Full rollout | 100% | — | [All success metrics from Section 2 met] | Any SLO breach |
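
The stage table above can be read as a small decision rule: roll back when the rollback trigger fires, promote when the success criteria hold, otherwise wait. A minimal sketch of that rule follows, using the canary thresholds from the table and the 50ms P99 budget from the Section 2 example. The `Stage` class, the `evaluate` function, and the intermediate "hold" outcome are assumptions made for illustration; the table itself does not define what happens between the two thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_pct: int
    max_error_rate: float       # success criterion, as a fraction (0.001 = 0.1%)
    rollback_error_rate: float  # rollback trigger, as a fraction


def evaluate(stage: Stage, error_rate: float, p99_ms: float, p99_budget_ms: float) -> str:
    """Return 'rollback', 'promote', or 'hold' for the observed metrics."""
    if error_rate > stage.rollback_error_rate:
        return "rollback"
    if error_rate < stage.max_error_rate and p99_ms <= p99_budget_ms:
        return "promote"
    return "hold"  # neither trigger nor success criteria met; keep observing


# Canary row from the table: success at <0.1% errors, rollback at >0.5%.
CANARY = Stage("Canary", traffic_pct=1, max_error_rate=0.001, rollback_error_rate=0.005)

print(evaluate(CANARY, error_rate=0.0005, p99_ms=42.0, p99_budget_ms=50.0))
```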

**Feature flag:** [Name of feature flag, if applicable] — managed in [LaunchDarkly / Unleash / config]

**Rollback procedure:**
```
// How to roll back if the rollout needs to be reversed
1. [Step 1 — e.g. Toggle feature flag to off]
2. [Step 2 — e.g. Deploy previous version]
3. [Step 3 — e.g. Notify stakeholders]
```

---

## 12. Open Questions

[List any unresolved questions, design decisions not yet made, or areas where the author is specifically seeking feedback. Assign an owner and a resolution deadline for each.]

| # | Question | Owner | Deadline | Resolution |
|---|---|---|---|---|
| 1 | [e.g. Should we use optimistic or pessimistic locking for concurrent updates to [resource]?] | [Name] | [Date] | [Pending / [Answer]] |
| 2 | [e.g. What is the retention policy for [new data type]?] | [Name] | [Date] | [Pending / [Answer]] |
| 3 | [e.g. Do we need a read replica for this query pattern at launch, or can we defer it?] | [Name] | [Date] | [Pending / [Answer]] |

---

## 13. Decision

*To be filled in after the review period closes.*

**Decision:** [Approved / Rejected / Approved with modifications]
**Decision date:** [Date]
**Decision makers:** [Names]

**Summary of key feedback addressed:**
- [Feedback item and how it was resolved]

**Conditions of approval (if any):**
- [e.g. Must complete load testing before Phase 2 begins]

---

## Quality Checks

- [ ] The problem statement is specific and quantified — not "the current system is slow" but "P99 latency is 800ms; budget is 200ms"
- [ ] Goals section includes measurable success metrics, not aspirational statements
- [ ] Every alternative has an explicit rejection reason — not just a list of cons
- [ ] Security implications section is completed, not left blank
- [ ] Performance implications include projected numbers, not just "should be better"
- [ ] Open questions are assigned to named owners with deadlines — not floating
- [ ] The RFC is written to be read by someone who was not in the planning conversations
- [ ] Migration plan addresses all affected parties — users, API consumers, data — not just the technical steps
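
Some of these checks can be approximated mechanically before human review. The sketch below is a hypothetical helper, not part of this skill: it assumes the RFC is Markdown that follows the heading layout of the template above, and it only covers two checks (required sections present, every alternative has a "Why rejected" entry). The function name `lint_rfc` and the choice of required sections are illustrative.

```python
import re

# Sections the checklist calls out explicitly (illustrative subset).
REQUIRED_SECTIONS = (
    "Problem Statement",
    "Alternatives Considered",
    "Security Implications",
    "Open Questions",
)


def lint_rfc(markdown_text: str) -> list[str]:
    """Return a list of human-readable problems found in an RFC draft."""
    problems = []

    # Collect level-2 headings, ignoring an optional "N. " section number.
    headings = re.findall(
        r"^##[ \t]+(?:\d+\.[ \t]+)?(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE
    )
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")

    # Each "### Alternative" block should contain a rejection reason.
    chunks = re.split(r"^### Alternative", markdown_text, flags=re.MULTILINE)[1:]
    for index, chunk in enumerate(chunks, start=1):
        if "**Why rejected:**" not in chunk:
            problems.append(f"alternative {index} has no rejection reason")

    return problems
```

A check like this cannot judge whether a rejection reason is specific or whether a metric is meaningful, so it complements the checklist rather than replacing it.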