Note
Newsletter System Spec
Newsletter System Spec
Internal planning document. This folder is excluded from notes.iri-ai.com auto-sync.
Purpose
A private, daily newsletter for a modern AI engineer — someone who builds, trains, deploys, or researches AI systems and needs to stay at the frontier without drowning in noise. Coverage spans AI research, AI industry, world news (AI-relevant), Korean tech, GitHub tooling, Reddit practitioner discussions, and curated social commentary from key AI voices.
Runs on a schedule: 5am ET (early brief) and 8am ET (full edition) daily.
Architecture
gather.py
|-- RSS feeds (world, tech, AI, Korean)
|-- arXiv API (cs.AI, cs.LG, cs.CL, cs.CV, cs.RO, quant-ph)
|-- GitHub trending scraper
|-- Reddit API (r/MachineLearning, r/LocalLLaMA, r/artificial, etc.)
|-- DuckDuckGo News search (DDG news)
|-- Semantic Scholar API (citation enrichment for arXiv papers)
|-- Social scraper (Karpathy, LeCun, Altman, Musk, Hotz)
|
v
Gathered JSON (all items tagged with date, source, body, citation fields)
|
v
editorial.py <-- Claude Opus 4.6 via NVIDIA Bedrock (aws/anthropic/bedrock-claude-opus-4-6)
|-- analyze_world_news()
|-- analyze_tech_news()
|-- analyze_ai_news()
|-- analyze_ai_papers() <-- uses Semantic Scholar citation_count / influential_citations
|-- analyze_github_tools()
|-- analyze_space_quantum()
|-- analyze_physical_ai()
|-- analyze_reddit()
|-- analyze_social()
|-- analyze_korean()
|
v
Editorial dict (section -> LLM-analyzed markdown bullets)
|
v
spa.py ──────────────────────────────────────> /Newsletter/YYYY-MM-DD Daily Newsletter.html
| (HTML single-page app, right-side TOC panel)
v
md_writer.py (or inline) ─────────────────────> /Newsletter/YYYY-MM-DD Daily Newsletter.md
(Full Obsidian markdown, all sections)
All output files land in the Hermes working directory and are also written to the Obsidian vault at /Newsletter/.
Data Sources
RSS Feeds
| Category | Feeds |
|---|---|
| World News | BBC World Service, CNN, Reuters |
| Tech / AI | TechCrunch, VentureBeat |
| Korean | Korea Herald, Yonhap News Agency, JoongAng Daily |
RSS items are filtered to same-day only. Any item older than today's date (ET) is discarded before editorial analysis.
arXiv
- Categories polled: cs.AI, cs.LG, cs.CL, cs.CV, cs.RO, quant-ph, eess.SY
- Filtered to papers submitted or updated within the last 48 hours
- Each paper is enriched via Semantic Scholar API with
citation_countandinfluential_citationsbefore being passed to editorial
GitHub Trending
- Scraped from github.com/trending (daily and weekly)
- Filtered for AI/ML/systems-relevant repos by keyword and topic tags
- Subreddits: r/MachineLearning, r/LocalLLaMA, r/artificial, r/singularity, r/nvidia, r/comfyui
- Sorted by top/hot, filtered for substantive technical threads (score threshold)
DuckDuckGo News
- Targeted queries: "AI", "LLM", "GPU compute", "AI regulation", Korean AI companies
- Used to supplement gaps in RSS coverage on breaking stories
Semantic Scholar
- Called for each arXiv paper to retrieve
citation_countandinfluential_citations - Used in
analyze_ai_papers()to gate inclusion of low-signal papers
Social (Persona Tracking)
Tracked personas (public posts/statements):
- Andrej Karpathy (@karpathy)
- Yann LeCun (@ylecun)
- Sam Altman (@sama)
- Elon Musk (@elonmusk)
- George Hotz (@realGeorgeHotz)
Hard Requirements
Data Freshness
- ALL news items must be same-day only (today's date in ET timezone)
- Items from yesterday or earlier are silently dropped before LLM analysis
- Date (YYYY-MM-DD) must appear on every bullet in the output
Formatting (enforced in PERSONA system prompt)
- Every insight is a bullet starting with
-(dash space). No numbered lists, no prose paragraphs - Bold key terms with
**double asterisks** - First use of any technical term, acronym, or niche concept must include an inline parenthetical definition
- Each bullet must be self-contained — a reader seeing only that bullet understands what happened and why it matters
- NEVER use ellipsis (
...) or (…) anywhere. If a quote or summary must be shortened, end at a complete sentence or clause. No trailing dots. - Every item referencing a paper, repo, or event must include the date:
(YYYY-MM-DD) - End every section with
**Recommended action:** <specific, concrete step>
Paper Quality Gate
- Only high-impact papers are included in the AI Papers section
- Inclusion criteria: citation_count >= 5, OR influential_citations >= 1, OR genuine architectural novelty for 0-citation papers in last 48h
- Citation count must be stated in the bullet: e.g.
(47 citations as of today) - Incremental ablations, sub-5% benchmark improvements, and domain-application papers without architectural novelty are explicitly excluded
UI: Right-Side TOC Panel
- The HTML SPA output must include a fixed right-side Table of Contents panel
- TOC links jump to each section anchor
- TOC highlights the currently visible section on scroll
- Panel collapses on mobile
Editorial Model
- Model:
aws/anthropic/bedrock-claude-opus-4-6 - Provider: NVIDIA Inference API (
https://inference-api.nvidia.com/v1/chat/completions) - Auth:
NVIDIA_API_KEYenvironment variable (also acceptsOPENAI_API_KEYfallback) - Temperature: 0.7
- System prompt:
PERSONAstring innewsletter_editorial.py
Korean Coverage
Dedicated section (analyze_korean()) covering:
- Companies: Samsung (semiconductor + AI), Kakao (AI services), Naver (HyperCLOVA, search AI), SK Telecom / SK Hynix (HBM, AI infra)
- Government: MSIT (Ministry of Science and ICT) AI policy, NRF grants, K-AI national strategy, AI safety regulation proposals
- Startups: Korean AI startups gaining traction (funding rounds, product launches, international expansion)
- Competitive positioning: honest assessment of whether Korean AI development is genuinely differentiating vs. following the global playbook 12-18 months behind
Sources: Korea Herald RSS, Yonhap RSS, JoongAng RSS, DDG Korean-language queries
Deployment
Runs via Hermes cronjobs on the local machine:
| Time (ET) | Job | Description |
|---|---|---|
| 5:00 AM | Newsletter early run | gather.py → editorial.py → spa.py + md output |
| 8:00 AM | Newsletter full run | Same pipeline, catches any late-breaking items |
Cron entries managed by Hermes scheduler. Output files written to:
- HTML:
/Users/hkder/Documents/obsidian_vault/Newsletter/YYYY-MM-DD Daily Newsletter.html - MD:
/Users/hkder/Documents/obsidian_vault/Newsletter/YYYY-MM-DD Daily Newsletter.md
Output Format
HTML SPA (spa.py)
- Single self-contained HTML file (no external CDN dependencies in critical path)
- Dark/light theme toggle
- Fixed right-side TOC panel with section jump links and scroll-spy highlighting
- Sections: World News | Tech | AI News | AI Papers | GitHub | Space & Quantum | Physical AI | Reddit | Social Voices | Korean Tech
- Responsive: TOC collapses to hamburger on narrow screens
- Inline CSS + minimal vanilla JS only (no React, no build step)
Obsidian Markdown
- Full markdown version of the same content
- Compatible with Obsidian's markdown renderer
- Frontmatter includes:
title,date,tags: [newsletter, daily],sync_exclude: true - All sections use
##headers for Obsidian outline nav - Internal links to any referenced concept pages where they exist in the vault
Sync Exclusion
The /Newsletter/ folder in the Obsidian vault is excluded from the notes.iri-ai.com auto-sync.
Reason: newsletter files are large, auto-generated daily, and not intended for the shared/published notes site. They are local-only documents.
Exclusion is enforced via the sync configuration — the Newsletter folder does not appear in the sync manifest.
File Locations (Scripts)
| Script | Path |
|---|---|
| Data gathering | /Users/hkder/.hermes/scripts/newsletter_gather.py |
| Editorial analysis | /Users/hkder/.hermes/scripts/newsletter_editorial.py |
| HTML SPA renderer | /Users/hkder/.hermes/scripts/newsletter_spa.py |
| Markdown writer | /Users/hkder/.hermes/scripts/newsletter_md.py (or inline in gather/spa) |
| Orchestrator | /Users/hkder/.hermes/scripts/newsletter_run.py (or run_newsletter.sh) |
Improvement Log
2026-03-24
- Spec document created: This file written to Obsidian vault as system planning document.
- Ellipsis ban enforced: Added explicit rule to PERSONA system prompt in
newsletter_editorial.py—NEVER use ellipsis (...) or (…) anywhere. - Semantic Scholar citation gating: Updated
analyze_ai_papers()prompt innewsletter_editorial.pyto usecitation_countandinfluential_citationsfields for inclusion decisions. Papers with citation_count >= 5 or influential_citations >= 1 are prioritized; 0-citation papers require genuine architectural novelty. Citation count now stated in every paper bullet. - Architecture documented: Full gather → editorial → SPA + MD pipeline documented with data sources, deployment schedule, and hard requirements.
Open TODOs
- Confirm 5am/8am ET cron entries are active in Hermes scheduler
- Verify Semantic Scholar API calls are wired into gather.py and passing citation fields to editorial.py
- Add health check: if gather.py produces 0 same-day items, alert before running editorial
- Consider adding Hacker News (top 10 stories) as a supplementary source
- Add Korean-language DDG queries for deeper local coverage
- Test SPA TOC scroll-spy on mobile viewport