feat: add 4 community skills (partial batch 1/3) — instagram downloader, substack scraper, notes humanizer, last-30-days research

https://claude.ai/code/session_01E4bTUWxx4Zo5rsFpad5X5B
This commit is contained in:
Mohit Aggarwal
2026-05-27 09:30:04 +00:00
parent d213ccde1c
commit 2c92636980
4 changed files with 1157 additions and 0 deletions
+665
View File
@@ -0,0 +1,665 @@
---
name: instagram-post-downloader
description: "Download Instagram posts — single images or full carousels — directly from a URL. Fetches high-resolution files from Instagram's CDN, saves them into a named folder, and stitches carousel slides into a single PDF. Supports batch downloading of multiple URLs at once. Use when asked to download, save, or archive an Instagram post, reel thumbnail, or carousel."
---
# Instagram Post Downloader Skill
Downloads Instagram posts at full resolution from Instagram's CDN — no screenshots, no compression. Handles single images, carousels (multi-slide posts), and Reel cover images. For carousels, produces individual slide files plus a single stitched PDF. Supports batch URLs in one run.
---
## PREREQUISITE — Domain Allowlist
Before this skill can fetch any media, you must add Instagram's CDN domain to Claude Code's allowlist:
**Settings → Capabilities → Domain allowlist → Add:**
```
*.cdninstagram.com
```
Without this, all CDN fetch calls will be blocked. If you see a permission error when Claude attempts a fetch to `cdninstagram.com`, this is the fix.
---
## Required Inputs
Claude will ask for these if not provided upfront:
| Input | Required | Notes |
|---|---|---|
| Instagram post URL(s) | Yes | One per line, or comma-separated. `https://www.instagram.com/p/XXXX/` or `https://www.instagram.com/reel/XXXX/` format |
| Output directory | No | Defaults to `./instagram-downloads/` in the current working directory |
| PDF stitch for carousels | No | Defaults to **yes** — produces `carousel.pdf` alongside individual slides |
| File naming prefix | No | Optional prefix added before slide filenames, e.g. `brand_``brand_slide_01.jpg` |
**Batch input example:**
```
https://www.instagram.com/p/ABC123/
https://www.instagram.com/p/DEF456/
https://www.instagram.com/p/GHI789/
```
---
## Output Structure
For each URL processed, Claude creates a folder named after the post caption (first 40 characters, sanitised — spaces become underscores, special characters stripped). If no caption is available, the folder is named after the post shortcode.
### Single image post
```
instagram-downloads/
└── this_is_the_caption_first_40_chars/
├── image.jpg
└── metadata.txt
```
### Carousel post
```
instagram-downloads/
└── carousel_caption_first_40_chars/
├── slide_01.jpg
├── slide_02.jpg
├── slide_03.jpg
├── slide_04.jpg
├── carousel.pdf ← all slides stitched in order
└── metadata.txt
```
### Batch run (3 URLs)
```
instagram-downloads/
├── first_post_caption_sanitised/
│ ├── image.jpg
│ └── metadata.txt
├── second_post_carousel_caption/
│ ├── slide_01.jpg
│ ├── slide_02.jpg
│ ├── carousel.pdf
│ └── metadata.txt
└── third_post_caption_here/
├── image.jpg
└── metadata.txt
```
### metadata.txt format
```
Post URL: https://www.instagram.com/p/XXXX/
Shortcode: XXXX
Type: carousel | single_image | reel
Slide count: 4 (carousel only)
Caption: [full caption text]
Username: @username
Fetched at: 2026-05-27T14:32:00Z
CDN URLs:
slide_01.jpg https://scontent.cdninstagram.com/v/...
slide_02.jpg https://scontent.cdninstagram.com/v/...
```
### Completion summary (printed to terminal)
```
Instagram Post Downloader — Batch Complete
==========================================
URLs processed: 3
Posts saved: 3
Total files: 11 (9 images + 2 PDFs)
Skipped: 0
Output dir: /Users/you/project/instagram-downloads/
Results:
✓ this_is_the_caption_first_40_chars/ 1 image
✓ carousel_caption_first_40_chars/ 4 slides → carousel.pdf
✓ third_post_caption_here/ 1 image
```
---
## How Claude Should Execute This Skill
### Step 1 — Collect and validate inputs
1. Accept the URL(s) from the user. If the user pastes a comma-separated list, split on commas. If they paste one per line, split on newlines.
2. Validate each URL matches `instagram.com/p/`, `instagram.com/reel/`, or `instagram.com/tv/`. Flag malformed URLs before proceeding.
3. Confirm the output directory. If none provided, use `./instagram-downloads/` and tell the user.
4. Ask about PDF stitching preference only if the user hasn't said either way. Default is yes.
### Step 2 — For each URL: fetch the post page
Fetch the Instagram post page HTML:
```
GET https://www.instagram.com/p/{shortcode}/?__a=1&__d=dis
```
Instagram frequently changes its API surface. Use this fallback chain in order:
**Attempt A — JSON endpoint:**
```
https://www.instagram.com/p/{shortcode}/?__a=1&__d=dis
```
Parse the JSON response. Look for `graphql.shortcode_media` or `data.shortcode_media`.
**Attempt B — Embed page (most reliable):**
```
https://www.instagram.com/p/{shortcode}/embed/captioned/
```
Fetch this page's HTML and extract `og:image` meta tags and any `window.__additionalDataLoaded` or `window.__StaticData` JSON blobs embedded in `<script>` tags.
**Attempt C — oEmbed endpoint:**
```
https://api.instagram.com/oembed/?url=https://www.instagram.com/p/{shortcode}/&omitscript=true
```
This returns `thumbnail_url` — useful for single images, but only gives the first frame for carousels.
**Headers to include on all requests:**
```
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36
Accept-Language: en-US,en;q=0.9
Accept: text/html,application/xhtml+xml,application/json
```
### Step 3 — Extract CDN image URLs
From the fetched data, extract all high-resolution CDN URLs. Instagram CDN URLs follow these patterns:
```
https://scontent.cdninstagram.com/v/...jpg?...
https://scontent-lax3-1.cdninstagram.com/v/...jpg?...
https://instagram.fXXX1-1.fbcdn.net/v/...jpg?...
```
**For single image posts:**
- Extract the single `display_url` or the largest `display_resources` entry (pick the one with the highest `config_width`).
**For carousel posts:**
- Look for `edge_sidecar_to_children.edges[]` in the JSON. Each edge has its own `node.display_url` and `node.display_resources[]`.
- Iterate all edges in order. This determines slide numbering.
- Pick the highest-resolution variant from each slide's `display_resources` array.
**For Reels:**
- The cover image is extractable the same way as a single image.
- The video file itself requires a third-party tool (see Bonus section).
**If JSON extraction fails**, fall back to scraping `<meta property="og:image">` tags from the page HTML — this gives at least one image URL (the first slide or only image).
### Step 4 — Sanitise folder name
Build the folder name from the post caption:
1. Take the first 40 characters of the caption.
2. Strip all characters that are not alphanumeric, spaces, or hyphens.
3. Replace spaces and hyphens with underscores.
4. Lowercase the result.
5. Strip leading/trailing underscores.
6. If the result is empty (e.g. caption was all emoji), use the post shortcode instead.
```python
import re
def sanitise_folder_name(caption: str, shortcode: str) -> str:
truncated = caption[:40]
cleaned = re.sub(r'[^a-zA-Z0-9 \-]', '', truncated)
underscored = re.sub(r'[\s\-]+', '_', cleaned).strip('_').lower()
return underscored if underscored else shortcode
```
### Step 5 — Create output folder structure
```python
import os
base_dir = "./instagram-downloads"
folder_name = sanitise_folder_name(caption, shortcode)
post_dir = os.path.join(base_dir, folder_name)
os.makedirs(post_dir, exist_ok=True)
```
If a folder with that name already exists (e.g. running the same URL twice), append the shortcode to avoid collision: `folder_name_SHORTCODE`.
### Step 6 — Download each image file
For each CDN URL, download the file with a streaming GET request:
```python
import requests
def download_file(url: str, dest_path: str) -> bool:
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
"Referer": "https://www.instagram.com/",
}
response = requests.get(url, headers=headers, stream=True, timeout=30)
response.raise_for_status()
with open(dest_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
f.write(chunk)
return True
```
Name files:
- Single image: `image.jpg`
- Carousel slides: `slide_01.jpg`, `slide_02.jpg`, ... (zero-padded to 2 digits, or 3 digits if >99 slides)
Detect file format from the `Content-Type` header or URL extension. Instagram serves JPEG for photos and may serve WebP in some cases — preserve the actual extension.
### Step 7 — Stitch carousel PDF (if applicable)
After all slides are downloaded, stitch them into a single PDF using Pillow:
```python
from PIL import Image
def stitch_to_pdf(image_paths: list[str], output_path: str) -> None:
"""
Combine a list of image files into a single multi-page PDF.
Each image becomes one page. Page size matches the image dimensions.
"""
images = []
for path in sorted(image_paths): # sort ensures slide_01, slide_02, ... order
img = Image.open(path).convert("RGB")
images.append(img)
if not images:
return
first = images[0]
rest = images[1:]
first.save(
output_path,
format="PDF",
save_all=True,
append_images=rest,
resolution=150.0,
)
```
Save as `carousel.pdf` in the post folder. If Pillow is not installed, run `pip install Pillow` first — or instruct the user to do so.
**Dependency check at start of skill:**
```python
try:
from PIL import Image
except ImportError:
print("Pillow not installed. Run: pip install Pillow")
print("PDF stitching will be skipped. Individual slides will still be downloaded.")
skip_pdf = True
```
### Step 8 — Write metadata.txt
Write a `metadata.txt` file into the post folder with all extracted metadata:
```python
from datetime import datetime, timezone
def write_metadata(post_dir, post_url, shortcode, post_type, caption, username, cdn_urls):
lines = [
f"Post URL: {post_url}",
f"Shortcode: {shortcode}",
f"Type: {post_type}",
]
if post_type == "carousel":
lines.append(f"Slide count: {len(cdn_urls)}")
lines += [
f"Caption: {caption}",
f"Username: @{username}",
f"Fetched at: {datetime.now(timezone.utc).isoformat()}",
"CDN URLs:",
]
for filename, url in cdn_urls.items():
lines.append(f" {filename:<16} {url}")
with open(os.path.join(post_dir, "metadata.txt"), "w", encoding="utf-8") as f:
f.write("\n".join(lines) + "\n")
```
### Step 9 — Print completion summary
After processing all URLs, print the summary table to the terminal (format shown in Output Structure section above). Include:
- Total URLs attempted
- Posts successfully saved
- Total files written (images + PDFs separately)
- Any URLs that were skipped and the reason
### Step 10 — Handle errors gracefully
| Error scenario | Action |
|---|---|
| URL is not an Instagram URL | Skip with message: "Skipped — not an Instagram URL: [url]" |
| Post is private or requires login | Skip with message: "Skipped — post is private or login required: [url]" |
| CDN fetch returns 403/404 | Try alternate CDN URL if available; if none, skip slide and note in metadata |
| Pillow not installed | Skip PDF stitching, save slides only, note in summary |
| Network timeout | Retry once after 5 seconds; if still failing, skip and log |
| Folder name collision | Append shortcode suffix to folder name |
| Rate limiting (429) | Wait 10 seconds and retry; log if retry also fails |
---
## Bonus — Downloading Instagram Reels (Video)
This skill covers images and carousel PDFs. For Reels video files, Claude Code cannot download video directly without a third-party tool, because Instagram's video CDN uses signed URLs and additional auth tokens.
**Recommended approach for Reels:**
Use `yt-dlp`, a maintained open-source tool:
```bash
# Install
pip install yt-dlp
# Download a Reel
yt-dlp "https://www.instagram.com/reel/XXXX/" -o "%(title)s.%(ext)s"
# Download to a specific folder
yt-dlp "https://www.instagram.com/reel/XXXX/" \
-o "./instagram-downloads/%(uploader)s_%(id)s.%(ext)s"
# Download best quality
yt-dlp -f "bestvideo+bestaudio" "https://www.instagram.com/reel/XXXX/"
```
Claude can run this command via Bash if the user asks. `yt-dlp` handles the auth token extraction automatically for public Reels.
---
## Full Script Template
Claude should offer to write this as a standalone script (`instagram_downloader.py`) that the user can run independently:
```python
#!/usr/bin/env python3
"""
Instagram Post Downloader
Fetches high-res images from public Instagram posts and carousels.
Requires: pip install requests Pillow
"""
import os
import re
import sys
import json
import time
import requests
from datetime import datetime, timezone
from pathlib import Path
try:
from PIL import Image
PILLOW_AVAILABLE = True
except ImportError:
PILLOW_AVAILABLE = False
print("Warning: Pillow not installed. PDF stitching disabled. Run: pip install Pillow")
HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Referer": "https://www.instagram.com/",
}
def extract_shortcode(url: str) -> str:
match = re.search(r"instagram\.com/(?:p|reel|tv)/([A-Za-z0-9_-]+)", url)
if not match:
raise ValueError(f"Cannot extract shortcode from URL: {url}")
return match.group(1)
def fetch_post_data(shortcode: str) -> dict:
"""Try multiple endpoints to get post JSON data."""
# Attempt A: JSON endpoint
try:
url = f"https://www.instagram.com/p/{shortcode}/?__a=1&__d=dis"
r = requests.get(url, headers=HEADERS, timeout=15)
if r.status_code == 200:
data = r.json()
media = (data.get("graphql", {}).get("shortcode_media") or
data.get("data", {}).get("shortcode_media"))
if media:
return media
except Exception:
pass
# Attempt B: Embed page
try:
url = f"https://www.instagram.com/p/{shortcode}/embed/captioned/"
r = requests.get(url, headers=HEADERS, timeout=15)
html = r.text
# Look for JSON blob in script tags
matches = re.findall(r'window\.__additionalDataLoaded\([^,]+,(\{.+?\})\);', html)
for blob in matches:
try:
data = json.loads(blob)
media = (data.get("graphql", {}).get("shortcode_media") or
data.get("data", {}).get("shortcode_media"))
if media:
return media
except json.JSONDecodeError:
continue
except Exception:
pass
return {}
def get_cdn_urls(media: dict) -> list[tuple[str, str]]:
"""Return list of (filename, cdn_url) tuples."""
results = []
media_type = media.get("__typename", "")
if media_type == "GraphSidecar":
edges = media.get("edge_sidecar_to_children", {}).get("edges", [])
for i, edge in enumerate(edges, start=1):
node = edge.get("node", {})
resources = node.get("display_resources", [])
url = (max(resources, key=lambda r: r.get("config_width", 0)).get("src")
if resources else node.get("display_url", ""))
if url:
ext = "jpg" if "jpg" in url.lower() else "webp"
filename = f"slide_{i:02d}.{ext}"
results.append((filename, url))
else:
resources = media.get("display_resources", [])
url = (max(resources, key=lambda r: r.get("config_width", 0)).get("src")
if resources else media.get("display_url", ""))
if url:
ext = "jpg" if "jpg" in url.lower() else "webp"
results.append((f"image.{ext}", url))
return results
def sanitise_folder_name(caption: str, shortcode: str) -> str:
truncated = caption[:40] if caption else ""
cleaned = re.sub(r"[^a-zA-Z0-9 \-]", "", truncated)
underscored = re.sub(r"[\s\-]+", "_", cleaned).strip("_").lower()
return underscored if underscored else shortcode
def download_file(url: str, dest_path: str) -> bool:
r = requests.get(url, headers=HEADERS, stream=True, timeout=30)
r.raise_for_status()
with open(dest_path, "wb") as f:
for chunk in r.iter_content(chunk_size=8192):
f.write(chunk)
return True
def stitch_pdf(image_paths: list[str], output_path: str) -> None:
if not PILLOW_AVAILABLE:
return
images = [Image.open(p).convert("RGB") for p in sorted(image_paths)]
if images:
images[0].save(output_path, format="PDF", save_all=True,
append_images=images[1:], resolution=150.0)
def process_url(post_url: str, base_dir: str, stitch_pdf_flag: bool) -> dict:
result = {"url": post_url, "status": "ok", "files": [], "error": None}
try:
shortcode = extract_shortcode(post_url)
media = fetch_post_data(shortcode)
caption = ""
username = ""
if media:
caption_edges = media.get("edge_media_to_caption", {}).get("edges", [])
caption = caption_edges[0]["node"]["text"] if caption_edges else ""
owner = media.get("owner", {})
username = owner.get("username", "")
folder_name = sanitise_folder_name(caption, shortcode)
post_dir = os.path.join(base_dir, folder_name)
if os.path.exists(post_dir):
post_dir = f"{post_dir}_{shortcode}"
os.makedirs(post_dir, exist_ok=True)
cdn_urls = get_cdn_urls(media) if media else []
if not cdn_urls:
# Fallback: oEmbed
oembed_url = f"https://api.instagram.com/oembed/?url={post_url}&omitscript=true"
r = requests.get(oembed_url, headers=HEADERS, timeout=10)
if r.status_code == 200:
thumb = r.json().get("thumbnail_url", "")
if thumb:
cdn_urls = [("image.jpg", thumb)]
username = r.json().get("author_name", "")
downloaded_paths = []
cdn_map = {}
for filename, url in cdn_urls:
dest = os.path.join(post_dir, filename)
download_file(url, dest)
downloaded_paths.append(dest)
cdn_map[filename] = url
result["files"].append(filename)
if stitch_pdf_flag and len(downloaded_paths) > 1 and PILLOW_AVAILABLE:
pdf_path = os.path.join(post_dir, "carousel.pdf")
stitch_pdf(downloaded_paths, pdf_path)
result["files"].append("carousel.pdf")
post_type = "carousel" if len(cdn_urls) > 1 else "single_image"
write_metadata(post_dir, post_url, shortcode, post_type, caption, username, cdn_map)
result["files"].append("metadata.txt")
except Exception as e:
result["status"] = "error"
result["error"] = str(e)
return result
def write_metadata(post_dir, post_url, shortcode, post_type, caption, username, cdn_map):
lines = [
f"Post URL: {post_url}",
f"Shortcode: {shortcode}",
f"Type: {post_type}",
]
if post_type == "carousel":
lines.append(f"Slide count: {len([k for k in cdn_map if 'slide' in k])}")
lines += [
f"Caption: {caption}",
f"Username: @{username}",
f"Fetched at: {datetime.now(timezone.utc).isoformat()}",
"CDN URLs:",
]
for fn, url in cdn_map.items():
lines.append(f" {fn:<18} {url}")
with open(os.path.join(post_dir, "metadata.txt"), "w", encoding="utf-8") as f:
f.write("\n".join(lines) + "\n")
def main(urls: list[str], base_dir: str = "./instagram-downloads", stitch: bool = True):
os.makedirs(base_dir, exist_ok=True)
results = []
for url in urls:
url = url.strip()
if not url:
continue
print(f"Processing: {url}")
r = process_url(url, base_dir, stitch)
results.append(r)
time.sleep(1) # polite delay between requests
# Summary
ok = [r for r in results if r["status"] == "ok"]
err = [r for r in results if r["status"] == "error"]
total_files = sum(len(r["files"]) for r in ok)
print("\nInstagram Post Downloader — Batch Complete")
print("==========================================")
print(f"URLs processed: {len(results)}")
print(f"Posts saved: {len(ok)}")
print(f"Total files: {total_files}")
print(f"Errors: {len(err)}")
print(f"Output dir: {os.path.abspath(base_dir)}\n")
for r in results:
if r["status"] == "ok":
print(f" OK {r['url']}")
else:
print(f" ERR {r['url']}{r['error']}")
if __name__ == "__main__":
if len(sys.argv) < 2:
print("Usage: python instagram_downloader.py <url1> [url2] ...")
sys.exit(1)
main(sys.argv[1:])
```
---
## Quality Checks
Before marking the task complete, verify each item:
- [ ] Domain allowlist confirmed — `*.cdninstagram.com` is added before any fetch attempts
- [ ] All provided URLs validated as Instagram URLs before processing begins
- [ ] CDN URLs are the highest-resolution variants available (largest `config_width` selected)
- [ ] Folder name is sanitised — no special characters, no spaces, max 40 chars from caption
- [ ] Folder collision handled — shortcode appended if folder already exists
- [ ] Carousel slides numbered sequentially with zero-padding (`slide_01`, `slide_02`, ...)
- [ ] PDF includes all slides in correct order (not alphabetical — by slide index)
- [ ] metadata.txt written to every post folder, including full CDN URLs
- [ ] Pillow dependency checked at startup — graceful fallback if not available
- [ ] Batch completion summary printed with file counts and any errors
- [ ] Private post errors caught and reported — not silently skipped
- [ ] Rate limiting handled — at least 1 second delay between requests
- [ ] No credential or cookie storage — skill operates on public posts only
---
## Example Trigger Phrases
- "Download this Instagram post for me: https://www.instagram.com/p/ABC123/"
- "Save that carousel to my downloads folder"
- "Can you grab all the slides from this Instagram post and make a PDF?"
- "Download these 5 Instagram posts" [followed by list of URLs]
- "Archive this IG post before it gets deleted"
- "I need the full-res images from this carousel"
- "Download the images from this Instagram URL and stitch them into a PDF"
- "Batch download these Instagram posts" [followed by URLs]
- "Save the slides from this Instagram carousel as individual JPEGs"
- "Get me the high-res version of this Instagram image"
---
## Notes on Instagram's Anti-Scraping Measures
Instagram actively changes its page structure and API endpoints. If all three fetch attempts fail:
1. The embed page method (`/embed/captioned/`) is historically the most stable — start there.
2. CDN URLs expire. Download immediately after fetching — do not store URLs and download later.
3. Instagram may return a login wall for some posts even if they're technically public. If this happens, the skill cannot proceed without authentication (which is out of scope).
4. If Instagram returns a 429, wait 1030 seconds before retrying. Reduce batch size for large lists.
This skill is designed for public posts only. It does not support login, sessions, or private content.
---
*Originally inspired by a skill from Frank and Diana Dovgopol (Write, Prompt, Scale) — adapted and extended for this library.*
+150
View File
@@ -0,0 +1,150 @@
---
name: last-30-days-research
description: Multi-platform research skill that gathers recent (last 30 days) opinions, sentiment, and signal on any topic from Reddit, X/Twitter, and the web. Cuts through SEO-stuffed results to surface what real people are actually saying.
---
# Last 30 Days Research
## The Problem
Googling gives SEO-stuffed "best of" lists written six months ago by someone who has never used the thing. Real honest takes live on Reddit threads, X replies, and niche communities — but chasing them across platforms eats your afternoon. This skill does the chase for you.
## Required Inputs
| Input | Required | Notes |
|-------|----------|-------|
| Topic | Yes | Tool, trend, feature, product, event, company — anything with a name |
| Date scope | No | Defaults to last 30 days. Can override to last 7 days or last 90 days |
| Angle | No | e.g. "focus on developer sentiment" or "looking for pricing complaints specifically" |
## Output Structure
The output is a structured research report with the following sections, delivered in this exact order:
```
## Last 30 Days Research: [Topic]
Research window: [Date 30 days ago] → [Today's date]
---
## What People Agree On
[Consensus points that appear across multiple platforms — most reliable signal]
## Where People Disagree
[Active debates, contrasting views — include which side has more weight]
## Pain Points That Keep Coming Up
[Recurring complaints and frustrations — strongest signal of real problems]
## Positive Signals
[What people genuinely praise — not PR, but unprompted appreciation]
## Most Interesting Takes
[Contrarian, unexpected, or surprisingly insightful comments worth noting]
## Sources
[Links to the most useful threads/posts found — 510 links with brief labels]
## Signal Confidence
[High / Medium / Low — with a one-line rationale based on data volume and consistency]
```
Each section should contain substantive content, not placeholders. If a section has no findings (e.g. no positive signals found), state that explicitly rather than leaving it empty or fabricating content.
## Instructions for Claude
### Step 1 — Calculate the date window
Determine today's date and subtract 30 days to get the research start date. Format: YYYY-MM-DD. Use these dates explicitly in every search query.
### Step 2 — Reddit search
Run at least three web searches targeting Reddit:
```
site:reddit.com "[topic]" after:[30-days-ago-date]
site:reddit.com "[topic]" 2025
reddit.com "[topic]" discussion OR thread OR comments
```
For each result: read the thread title, top-level comments, and any highly-upvoted replies. Record the key claims and the URL.
If the topic has common synonyms or abbreviations, run additional searches with those (e.g. "Claude Code" and "claude.code" and "Anthropic coding tool").
### Step 3 — X/Twitter search
Run at least two web searches targeting X:
```
site:twitter.com OR site:x.com "[topic]" after:[30-days-ago-date]
"[topic]" site:x.com -is:retweet
```
Note: X search via web has limitations. If results are sparse, supplement with searches for specific accounts known to discuss the topic area (e.g. tech journalists, domain experts).
### Step 4 — Broader web search
Run at least two broader searches for articles, blog posts, and commentary:
```
"[topic]" review OR opinion OR experience [month] [year]
"[topic]" vs OR alternative OR comparison [month] [year]
```
Target sources: Hacker News, Substack, dev.to, personal blogs, product communities. Avoid press releases and vendor-authored content.
### Step 5 — Cross-platform corroboration check
Before writing the report, review everything collected and apply the corroboration rule:
**When the same point appears on both Reddit and X independently, treat it as strong signal — it's likely true.**
A point mentioned only once on one platform is a data point, not a finding. Weight your sections accordingly.
### Step 6 — Write the report
Populate each section of the output structure. Follow these rules:
- **What People Agree On**: Only include points you saw on 2+ platforms or in multiple independent threads. These are your most reliable findings.
- **Where People Disagree**: Name the sides. "Some say X, others say Y — and the X camp seems louder based on upvote counts / engagement."
- **Pain Points**: Be specific. "Performance issues" is weak. "Cold start times over 4 seconds on the free tier" is useful.
- **Positive Signals**: Must be unprompted praise, not from product marketing or sponsored content.
- **Most Interesting Takes**: At least 2, maximum 5. Quote or closely paraphrase where possible.
- **Sources**: Include the actual URLs. Label each one briefly (e.g. "Reddit thread: 'Has anyone switched from X to Y?'").
- **Signal Confidence**: Rate High/Medium/Low based on:
- High = 10+ sources, consistent signal across platforms
- Medium = 510 sources, some inconsistency
- Low = fewer than 5 sources, or highly fragmented signal
### Step 7 — Sanity check before delivering
Before outputting the report, verify:
- [ ] Every claim in the report traces to an actual source found during research (not prior knowledge)
- [ ] The date window was actually applied to searches, not ignored
- [ ] No fabricated or hallucinated URLs in the Sources section
- [ ] Signal Confidence rating reflects the actual data volume, not optimism
## Quality Checks
- [ ] At minimum 3 Reddit searches were run with the date filter applied
- [ ] At minimum 2 X/Twitter searches were run
- [ ] At minimum 2 broader web searches were run
- [ ] Cross-platform corroboration principle was applied (same point on multiple platforms = stronger signal)
- [ ] Pain Points section contains specific, concrete details — not vague generalisations
- [ ] Sources section contains real URLs (not hallucinated), verified during research
- [ ] Signal Confidence is rated and justified
- [ ] If a section has no findings, it says so explicitly rather than being omitted or padded
- [ ] No vendor-authored content or press releases treated as independent signal
- [ ] Synonyms and alternative names for the topic were searched
## Example Trigger Phrases
- "What are people saying about Cursor AI from the last 30 days?"
- "Research Vercel's recent sentiment"
- "Last 30 days on the Arc browser shutdown"
- "What's the current vibe on Supabase?"
- "What are developers saying about Claude Code lately?"
- "Research [topic] from the last 30 days"
- "Give me a signal report on [product]"
- "What's the Reddit and Twitter take on [trend]?"
+166
View File
@@ -0,0 +1,166 @@
---
name: notes-humanizer
description: Strips AI writing patterns from text and rewrites it to sound genuinely human — not by softening it, but by removing statistical defaults and injecting the specific signals that human writers produce.
---
# Notes Humanizer
"Humanize this" prompts don't work because they don't know what to remove. AI text has specific, identifiable defaults — em dashes used as parenthetical substitutes, rule-of-three lists where all items have identical rhythm, sentences that hover between 15 and 20 words. Fix those defaults, add the signals human writers actually produce, and the text stops reading as synthetic. This skill does that systematically, in two phases, and shows you exactly what changed and why.
> Credit: Originally created by Orel (TheIndiepreneur) — adapted and extended for this library.
---
## Required Inputs
| Input | Format | Notes |
|---|---|---|
| Text to humanize | Paste directly into the chat | Any length. Works on paragraphs, full articles, social posts, emails. |
No other inputs required. Claude will not ask clarifying questions before starting — it works with what's given.
---
## Output Structure
### Section 1: What Was Found
A plain-language audit of the AI patterns detected in the original text, before any rewriting:
```
PATTERNS DETECTED
─────────────────
Em dashes used as parenthetical substitutes: 3
Filler openers ("Let's dive in", "It's worth noting", etc.): 2
Rule-of-three lists with identical rhythm: 1
Sentence length variance: low (avg 17 words, range 1421)
Hedging qualifiers: 4
Passive constructions where active is cleaner: 2
```
### Section 2: Side-by-Side Comparison
| Original | Rewritten |
|---|---|
| [original paragraph] | [rewritten paragraph] |
(One row per paragraph or logical block. Short texts get the full comparison in one table. Long texts get the table collapsed to changed sections only, with unchanged sections noted.)
### Section 3: Change Log
Every specific change made, with the reason:
```
CHANGES MADE
────────────────────────────────────────────────
1. Removed em dash in "success — and it shows"
→ Rewritten as "success (and it shows)"
Why: em dash here is a parenthetical substitute, not a genuine pause
2. Deleted "It's worth noting that"
Why: pure filler — the sentence is stronger without it
3. Broke rule-of-three list "X, Y, and Z"
→ "X and Y. Z is different — [expanded thought]"
Why: all three items had identical rhythm; broke the pattern
4. Added short sentence: "That's the problem."
Why: needed a sub-8-word sentence to vary rhythm
5. Added sentence starting with "But"
Why: human writers do this; AI avoids it as a statistical default
6. Added specific example: [detail added]
Why: the original made an abstract claim with no grounding detail
7. Added aside: "(I've watched this fail three times in a row)"
Why: breaks fourth wall slightly; signals genuine perspective
```
### Section 4: Clean Output
The full rewritten text, ready to copy and paste — no annotations, no formatting artifacts.
```
[Full rewritten text here]
```
---
## Instructions for Claude
### Phase 1: Audit
Read the full text before making any changes. Identify and count every instance of these patterns:
**Patterns to remove or rewrite:**
| Pattern | Action |
|---|---|
| Em dash used as parenthetical substitute (`word — word` where a comma or parenthesis would work) | Replace with parentheses or rewrite the clause |
| "Let's dive in" | Delete or replace with a direct first sentence |
| "In conclusion" | Delete or rewrite as a genuine closing thought |
| "It's worth noting that" | Delete — the sentence stands without it |
| "At its core" | Delete or rewrite |
| "Game-changer" | Replace with what the thing actually changes |
| "Delve" | Replace with look, dig, explore — or rewrite the sentence |
| "Navigate" used metaphorically for non-navigation tasks | Replace with a direct verb |
| Rule-of-three lists where all three items have identical grammatical structure and similar word count | Break the third item out as its own sentence or expand it |
| Sentences where every sentence in a paragraph falls in the 1422 word range | Deliberately add one very short sentence and one longer one |
| "Needless to say" | Delete |
| "It's important to note that" | Delete |
| Passive constructions where the active form is more direct | Flip to active |
Do not remove every em dash — only the ones used as parenthetical substitutes. Do not remove all hedging — only empty hedging that adds no information.
### Phase 2: Inject
After stripping patterns, add the following signals. Each one should emerge from the actual content — don't add generic filler:
1. **One genuine opinion or take.** The author appears to actually believe something specific. State it without hedging. ("This approach works, and I think most people underestimate how rarely the alternative does.")
2. **One specific detail, example, or number.** Ground the most abstract claim in the text with something concrete. If the text says "this happens frequently," add a real or illustrative number. If it says "many companies do this," name the type of company.
3. **One aside or parenthetical thought that breaks the fourth wall slightly.** This is the signal most synthetic text lacks — the writer momentarily steps out of the formal argument to say something human. ("(I've seen this specific mistake made by people who absolutely should have known better.)")
4. **At least one sentence under 8 words.** Make it land on a point, not a transition.
5. **One sentence that starts with "And" or "But."** Place it where the rhythm earns it, not randomly.
### Phase 3: Report
Present the output in the four-section structure defined above. The change log must list every individual change — not categories of change, but specific instances. If you changed three em dashes, list all three separately.
### Handling edge cases
- **If the text is already mostly clean:** Report what you found (or didn't find), make the few remaining changes, and note explicitly that the original was close. Don't invent problems.
- **If the text is very short (under 100 words):** Skip the comparison table. Show original, then rewritten, then change log.
- **If the text is over 1,500 words:** Process the full text but collapse the comparison table to changed sections only.
---
## Quality Checks
- [ ] Audit was completed before rewriting (patterns counted, not just detected)
- [ ] Every removed pattern is listed in the change log with a specific reason
- [ ] Em dashes were assessed individually — only parenthetical-substitute uses were removed
- [ ] Rule-of-three lists: the rhythm was actually checked, not just the fact that there were three items
- [ ] At least one sentence under 8 words was added (or was already present)
- [ ] At least one sentence starts with "And" or "But" in the final text
- [ ] The specific detail or example added connects to an actual claim in the text, not floated in generically
- [ ] The aside breaks the fourth wall slightly without being forced or cutesy
- [ ] The change log lists specific instances, not categories
- [ ] The clean output section has no annotations or formatting artifacts — ready to paste
- [ ] If the original was already clean, that was stated explicitly rather than changes invented
---
## Example Trigger Phrases
- "Humanize this text: [paste]"
- "Use the notes-humanizer skill on this draft"
- "This reads like ChatGPT wrote it — fix it: [paste]"
- "Strip the AI out of this and make it sound like a real person wrote it"
- "Run the humanizer on this LinkedIn post: [paste]"
- "This has too many em dashes and rule-of-three lists — clean it up: [paste]"
- "Make this email sound less robotic: [paste]"
+176
View File
@@ -0,0 +1,176 @@
---
name: substack-notes-scraper
description: Scrapes a Substack Notes page and exports engagement data (likes, comments, restacks) to a formatted .xlsx file with conditional formatting and summary stats.
---
# Substack Notes Scraper
Substack has no public API for Notes analytics. You can't see likes, comments, and restacks in one place without scrolling through your feed manually. This skill scrapes the rendered Notes page, filters to only your original content, and exports everything to a spreadsheet you can actually analyze.
> Credit: Originally created by a Substack newsletter author — adapted and extended for this library.
---
## Required Inputs
| Input | Format | Example |
|---|---|---|
| Notes URL | Full URL to the Notes tab | `https://substack.com/@handle/notes` |
| Author handle or name | Exact handle or display name | `@handle` or `Jane Smith` |
| Date range | Plain English or explicit range | `last 30 days` or `Jan 2026 Mar 2026` |
Claude will ask for these if not provided upfront.
---
## Output Structure
### File
```
substack-notes-[handle]-[YYYY-MM-DD].xlsx
```
### Sheet: "Notes Data"
| Column | Description |
|---|---|
| Date | Publication date (YYYY-MM-DD) |
| Text Preview | First 200 characters of the note |
| Full Text | Complete note text |
| Likes | Like count at time of scrape |
| Comments | Comment count |
| Restacks | Restack count |
| Total Engagement | Likes + Comments + Restacks |
| Link | Direct URL to the note |
| Note Type | `original` or `restack` |
**Formatting applied:**
- Row 1: frozen header row
- Auto-filter enabled on all columns
- Top 20% by Likes column: highlighted yellow (`#FFF2CC`)
- Column widths: auto-fit to content, min 12, max 60
### Sheet: "Summary"
```
Scrape Date: [YYYY-MM-DD HH:MM UTC]
Author: [handle]
Date Range: [start] [end]
Total Notes: [n]
Original Notes: [n]
Restacks Filtered: [n]
Avg Likes: [n.n]
Avg Comments: [n.n]
Avg Restacks: [n.n]
Avg Total Eng: [n.n]
Best Note (Likes): [date] — [first 80 chars] — [n] likes
Best Note (Eng): [date] — [first 80 chars] — [n] total engagement
```
---
## Instructions for Claude
### Step 1: Validate inputs
Confirm the three required inputs are present. If any are missing, ask before proceeding. Parse the date range into a concrete start date and end date (convert relative ranges like "last 30 days" to explicit dates using today's date).
### Step 2: Fetch the Notes page
Use `WebFetch` to load the Notes URL. Substack Notes pages are JavaScript-rendered — request the full rendered HTML. If WebFetch returns a skeleton page without note content, note this in your response and ask the user to paste the page HTML manually or confirm browser access is available.
### Step 3: Paginate through all notes in the date window
Substack Notes load incrementally. Repeat fetching or scrolling until either:
- A note's date falls outside the target date range (stop loading more), or
- No new content loads on the next request.
Rate-limit: wait 2 seconds between each paginated request. Do not hammer the endpoint.
### Step 4: Parse each note
For every note element found on the page, extract:
- **Date**: the timestamp on the note (convert to YYYY-MM-DD)
- **Author**: the display name or handle shown on the note
- **Full text**: complete body text, stripping HTML tags
- **Text preview**: first 200 characters of full text
- **Likes count**: the number shown on the like/heart counter
- **Comments count**: the number shown on the comment counter
- **Restacks count**: the number shown on the restack counter
- **Link**: the direct permalink to the note
- **Note type**: `original` if the author matches the specified author; `restack` if it belongs to someone else
### Step 5: Filter
Keep ALL rows in the data (restacks included as rows with `Note Type = restack`). The Summary sheet stats should count only `original` notes. Mark restacks clearly so the user can filter them out themselves in Excel if preferred.
Apply date filter: exclude any note outside the specified date range.
### Step 6: Calculate Total Engagement
For each row: `Total Engagement = Likes + Comments + Restacks`
### Step 7: Identify top 20% by Likes
Sort original notes by Likes descending. Mark the top 20% (round up) for conditional formatting. These rows will be highlighted yellow in the output file.
### Step 8: Build the .xlsx file
Use Python with `openpyxl` to generate the file. Structure:
```python
# Required libraries
import openpyxl
from openpyxl.styles import PatternFill, Font, Alignment
from openpyxl.utils import get_column_letter
from datetime import datetime
# Sheet 1: Notes Data
# - Write header row, bold, freeze row 1
# - Write all data rows
# - Apply auto-filter: ws.auto_filter.ref = ws.dimensions
# - Apply yellow fill to top-20% rows by likes
# - Auto-size columns (iterate cells to find max length)
# Sheet 2: Summary
# - Write summary stats as key-value pairs, no table format
```
Name the file `substack-notes-[handle]-[YYYY-MM-DD].xlsx` using today's date.
### Step 9: Report back
After generating the file, report:
- File path
- Total notes found, original vs. restacks
- Date range actually covered
- Top 3 notes by total engagement (date + preview + stats)
- Any notes or warnings (e.g., page didn't fully load, some dates were ambiguous)
---
## Quality Checks
- [ ] All three required inputs were confirmed before starting
- [ ] Rate limiting honored: 2-second delay between paginated requests
- [ ] Author filter applied correctly — restacks are included as rows but flagged, not silently dropped
- [ ] Date range filter applied — no notes outside the window appear in the data
- [ ] Total Engagement column is Likes + Comments + Restacks (not hardcoded)
- [ ] Top 20% highlight is based on the actual data distribution, not a fixed threshold
- [ ] Header row is frozen and auto-filter is active
- [ ] Summary sheet stats reference only `original` notes, not restacks
- [ ] File is named with the author handle and today's date
- [ ] If the page failed to load properly, the user was told — not silently given an empty file
---
## Example Trigger Phrases
- "Scrape my Substack Notes and export to Excel — my handle is @handle, last 60 days"
- "Use the substack-notes-scraper skill on https://substack.com/@handle/notes for Q1 2026"
- "Pull my notes engagement data into a spreadsheet"
- "Export my Substack Notes stats with likes and restacks — author: Jane Smith, JanMar 2026"
- "Run the Substack scraper on my notes page and show me which posts performed best"