Files
pm-claude-skills/plugins/pm-engineering/skills/system-design-interview/SKILL.md
T
mohitagw15856 929fa3ad7f Restore trigger phrases as ## Usage Examples across 10 engineering skills
Renamed ## Example Trigger Phrases → ## Usage Examples to make the section
clearly human-facing documentation rather than a system instruction.
Restores content that was removed in the previous quality pass.

Skills updated (both skills/ and plugins/pm-engineering/skills/):
code-review-checklist, debugging-log-analyser, changelog-generator,
pr-description-writer, system-design-interview, test-strategy-doc,
runbook-writer, incident-postmortem, api-docs-writer,
architecture-decision-record

https://claude.ai/code/session_01C3HwChrccJd145vJ6Z7ajF
2026-05-20 12:06:26 +00:00

5.3 KiB
Raw Blame History

name, description
name description
system-design-interview Structure a complete system design answer for interview questions or real architecture sessions. Use when asked to design a system, answer a system design interview question, or architect a solution at scale. Produces a structured answer covering requirements, capacity estimates, high-level design, component deep-dives, trade-offs, and follow-up considerations.

System Design Interview Skill

Structures a complete, interview-grade system design response — covering clarifying questions, requirements, capacity estimates, architecture, component design, and trade-offs. Works equally well for real architecture sessions.

Required Inputs

Ask for these if not provided:

  • The system to design (e.g. "design a URL shortener", "design a notification service", "design Twitter's feed")
  • Scope (interview prep / real architecture decision / practice run)
  • Scale target (rough numbers: DAU, requests/sec, data volume — or "assume typical web scale")
  • Constraints or priorities (e.g. prioritise availability over consistency, minimise cost, low-latency reads)

Output Format

1. Clarifying Questions

Before designing, list 46 questions that would change the design. Examples:

  • Read-heavy or write-heavy? (affects caching and DB choice)
  • Global or single-region? (affects latency requirements)
  • Strong or eventual consistency? (affects storage and replication)
  • Acceptable latency targets? (p50 / p99)
  • Any existing infrastructure constraints?

Then proceed with stated assumptions if answering an interview question.

2. Functional Requirements

Core features (must have):

  • [Feature 1]
  • [Feature 2]
  • [Feature 3]

Out of scope (for this design):

  • [What's deliberately excluded and why]

3. Non-Functional Requirements

Requirement Target
Availability [e.g. 99.9% / 99.99%]
Latency [e.g. p95 < 100ms for reads]
Throughput [e.g. 10k writes/sec peak]
Consistency [Strong / Eventual]
Durability [e.g. 99.999% — no data loss]

4. Capacity Estimation

Traffic:

  • DAU: [X]
  • Reads/sec: [X] (peak: [X])
  • Writes/sec: [X] (peak: [X])

Storage:

  • Per record size: [X bytes]
  • Records per day: [X]
  • 5-year storage: [X GB/TB]

Bandwidth:

  • Inbound: [X MB/s]
  • Outbound: [X MB/s]

5. High-Level Architecture

[Client] → [CDN/Edge] → [Load Balancer] → [API Servers] → [Cache] → [DB]
                                                         → [Message Queue] → [Workers]

Describe each layer in 12 sentences explaining its role and technology choice.

6. Component Deep-Dive

Pick the 23 most critical/interesting components and go deep:

[Component 1: e.g. Database Layer]

  • Choice: [Technology and why — e.g. PostgreSQL for ACID guarantees, Cassandra for write throughput]
  • Schema design (high-level): [Key tables/collections and their structure]
  • Indexing strategy: [What gets indexed and why]
  • Replication: [Primary-replica / Multi-primary — and why]

[Component 2: e.g. Caching Strategy]

  • Cache type: [Redis / Memcached — and why]
  • What gets cached: [Hot data — e.g. user sessions, frequent reads]
  • Cache invalidation: [TTL / Write-through / Write-behind — trade-offs]
  • Cache hit rate target: [e.g. 95%]

[Component 3: e.g. API Design]

  • Key endpoints: [List the 35 most important API calls]
  • Authentication: [JWT / OAuth / API keys]
  • Rate limiting: [Where and at what rate]

7. Data Flow

Walk through the two most critical paths end-to-end:

Write path: [Step 1 → Step 2 → Step 3...] Read path: [Step 1 → Step 2 → Step 3...]

8. Scaling Bottlenecks and Mitigations

Bottleneck Mitigation
[e.g. DB write throughput] [e.g. sharding by user_id, write batching]
[e.g. Hot-key cache misses] [e.g. local in-process cache, probabilistic early expiry]
[e.g. Single region latency] [e.g. multi-region deployment, GeoDNS routing]

9. Trade-offs and Alternatives

Be explicit about what was chosen and what was sacrificed:

Decision Why Trade-off
[e.g. Eventual consistency] [Higher availability, lower latency] [Stale reads possible]
[e.g. SQL over NoSQL] [Complex queries, ACID transactions] [Harder to shard horizontally]
[e.g. Async processing via queue] [Decoupled, more resilient] [Eventual delivery, harder to debug]

10. Follow-up Considerations

Things to tackle in production but out of scope for this design session:

  • Monitoring and alerting (what metrics matter)
  • Disaster recovery and backup strategy
  • Security (auth, encryption at rest/transit, rate limiting)
  • Cost optimisation at scale
  • Gradual rollout and feature flagging

Quality Checks

  • Clarifying questions are design-changing (not generic filler)
  • Capacity estimates use real numbers (not just "it scales")
  • At least 2 component deep-dives with technology choices justified
  • Trade-offs section is honest (not just benefits of chosen approach)
  • Data flow is described end-to-end for the critical path

Usage Examples

  • "Help me answer a system design interview: [question]"
  • "Design [system] for a system design interview"
  • "How would I architect [system] at scale?"
  • "I have a system design interview — the question is [X]"
  • "Design a [URL shortener / chat system / notification service / feed]"