New skills added to pm-engineering bundle (now 10 skills total): - debugging-log-analyser: stack trace → structured root cause diagnosis + fix - pr-description-writer: diff/commits → reviewer-ready PR description - system-design-interview: full system design with capacity, components, trade-offs - changelog-generator: git log → polished Keep a Changelog entry - test-strategy-doc: spec/PRD → complete test strategy with P0/P1 test cases - runbook-writer: operational runbooks with exact commands, rollback, and escalation README updates: - 5 shields.io badges (stars, skill count, version, install, license) - "See It in Action" demo section - pm-engineering added to Quick Install list - Star Milestone Tracker (100/250/500/1000 stars roadmap) - Engineering table extended from 4 to 10 skills (41–50) - Article 14 link resolved from remote merge Config updates: - marketplace.json: v6.0.0 → v7.0.0, "106 skills" - pm-engineering plugin.json: v1.0.0 → v2.0.0 New file: SKILL_REQUEST.md — community skill voting board Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
5.4 KiB
name, description
| name | description |
|---|---|
| system-design-interview | Structure a complete system design answer for interview questions or real architecture sessions. Use when asked to design a system, answer a system design interview question, or architect a solution at scale. Produces a structured answer covering requirements, capacity estimates, high-level design, component deep-dives, trade-offs, and follow-up considerations. Optimised for Opus 4.7 and newer models. |
System Design Interview Skill
Structures a complete, interview-grade system design response — covering clarifying questions, requirements, capacity estimates, architecture, component design, and trade-offs. Works equally well for real architecture sessions.
Required Inputs
Ask for these if not provided:
- The system to design (e.g. "design a URL shortener", "design a notification service", "design Twitter's feed")
- Scope (interview prep / real architecture decision / practice run)
- Scale target (rough numbers: DAU, requests/sec, data volume — or "assume typical web scale")
- Constraints or priorities (e.g. prioritise availability over consistency, minimise cost, low-latency reads)
Output Structure
1. Clarifying Questions
Before designing, list 4–6 questions that would change the design. Examples:
- Read-heavy or write-heavy? (affects caching and DB choice)
- Global or single-region? (affects latency requirements)
- Strong or eventual consistency? (affects storage and replication)
- Acceptable latency targets? (p50 / p99)
- Any existing infrastructure constraints?
Then proceed with stated assumptions if answering an interview question.
2. Functional Requirements
Core features (must have):
- [Feature 1]
- [Feature 2]
- [Feature 3]
Out of scope (for this design):
- [What's deliberately excluded and why]
3. Non-Functional Requirements
| Requirement | Target |
|---|---|
| Availability | [e.g. 99.9% / 99.99%] |
| Latency | [e.g. p95 < 100ms for reads] |
| Throughput | [e.g. 10k writes/sec peak] |
| Consistency | [Strong / Eventual] |
| Durability | [e.g. 99.999% — no data loss] |
4. Capacity Estimation
Traffic:
- DAU: [X]
- Reads/sec: [X] (peak: [X])
- Writes/sec: [X] (peak: [X])
Storage:
- Per record size: [X bytes]
- Records per day: [X]
- 5-year storage: [X GB/TB]
Bandwidth:
- Inbound: [X MB/s]
- Outbound: [X MB/s]
5. High-Level Architecture
[Client] → [CDN/Edge] → [Load Balancer] → [API Servers] → [Cache] → [DB]
→ [Message Queue] → [Workers]
Describe each layer in 1–2 sentences explaining its role and technology choice.
6. Component Deep-Dive
Pick the 2–3 most critical/interesting components and go deep:
[Component 1: e.g. Database Layer]
- Choice: [Technology and why — e.g. PostgreSQL for ACID guarantees, Cassandra for write throughput]
- Schema design (high-level): [Key tables/collections and their structure]
- Indexing strategy: [What gets indexed and why]
- Replication: [Primary-replica / Multi-primary — and why]
[Component 2: e.g. Caching Strategy]
- Cache type: [Redis / Memcached — and why]
- What gets cached: [Hot data — e.g. user sessions, frequent reads]
- Cache invalidation: [TTL / Write-through / Write-behind — trade-offs]
- Cache hit rate target: [e.g. 95%]
[Component 3: e.g. API Design]
- Key endpoints: [List the 3–5 most important API calls]
- Authentication: [JWT / OAuth / API keys]
- Rate limiting: [Where and at what rate]
7. Data Flow
Walk through the two most critical paths end-to-end:
Write path: [Step 1 → Step 2 → Step 3...] Read path: [Step 1 → Step 2 → Step 3...]
8. Scaling Bottlenecks and Mitigations
| Bottleneck | Mitigation |
|---|---|
| [e.g. DB write throughput] | [e.g. sharding by user_id, write batching] |
| [e.g. Hot-key cache misses] | [e.g. local in-process cache, probabilistic early expiry] |
| [e.g. Single region latency] | [e.g. multi-region deployment, GeoDNS routing] |
9. Trade-offs and Alternatives
Be explicit about what was chosen and what was sacrificed:
| Decision | Why | Trade-off |
|---|---|---|
| [e.g. Eventual consistency] | [Higher availability, lower latency] | [Stale reads possible] |
| [e.g. SQL over NoSQL] | [Complex queries, ACID transactions] | [Harder to shard horizontally] |
| [e.g. Async processing via queue] | [Decoupled, more resilient] | [Eventual delivery, harder to debug] |
10. Follow-up Considerations
Things to tackle in production but out of scope for this design session:
- Monitoring and alerting (what metrics matter)
- Disaster recovery and backup strategy
- Security (auth, encryption at rest/transit, rate limiting)
- Cost optimisation at scale
- Gradual rollout and feature flagging
Quality Checks
- Clarifying questions are design-changing (not generic filler)
- Capacity estimates use real numbers (not just "it scales")
- At least 2 component deep-dives with technology choices justified
- Trade-offs section is honest (not just benefits of chosen approach)
- Data flow is described end-to-end for the critical path
Example Trigger Phrases
- "Help me answer a system design interview: [question]"
- "Design [system] for a system design interview"
- "How would I architect [system] at scale?"
- "I have a system design interview — the question is [X]"
- "Design a [URL shortener / chat system / notification service / feed]"