Note

Newsletter System Spec

◇ obsidian_vault Newsletter March 24, 2026 newslettersystemspecinternal

Newsletter System Spec

Internal planning document. This folder is excluded from notes.iri-ai.com auto-sync.

Purpose

A private, daily newsletter for a modern AI engineer — someone who builds, trains, deploys, or researches AI systems and needs to stay at the frontier without drowning in noise. Coverage spans AI research, AI industry, world news (AI-relevant), Korean tech, GitHub tooling, Reddit practitioner discussions, and curated social commentary from key AI voices.

Runs on a schedule: 5am ET (early brief) and 8am ET (full edition) daily.

Architecture

gather.py
    |-- RSS feeds (world, tech, AI, Korean)
    |-- arXiv API (cs.AI, cs.LG, cs.CL, cs.CV, cs.RO, quant-ph)
    |-- GitHub trending scraper
    |-- Reddit API (r/MachineLearning, r/LocalLLaMA, r/artificial, etc.)
    |-- DuckDuckGo News search (DDG news)
    |-- Semantic Scholar API (citation enrichment for arXiv papers)
    |-- Social scraper (Karpathy, LeCun, Altman, Musk, Hotz)
         |
         v
    Gathered JSON (all items tagged with date, source, body, citation fields)
         |
         v
editorial.py   <-- Claude Opus 4.6 via NVIDIA Bedrock (aws/anthropic/bedrock-claude-opus-4-6)
    |-- analyze_world_news()
    |-- analyze_tech_news()
    |-- analyze_ai_news()
    |-- analyze_ai_papers()       <-- uses Semantic Scholar citation_count / influential_citations
    |-- analyze_github_tools()
    |-- analyze_space_quantum()
    |-- analyze_physical_ai()
    |-- analyze_reddit()
    |-- analyze_social()
    |-- analyze_korean()
         |
         v
    Editorial dict (section -> LLM-analyzed markdown bullets)
         |
         v
    spa.py  ──────────────────────────────────────> /Newsletter/YYYY-MM-DD Daily Newsletter.html
         |                                           (HTML single-page app, right-side TOC panel)
         v
    md_writer.py (or inline) ─────────────────────> /Newsletter/YYYY-MM-DD Daily Newsletter.md
                                                     (Full Obsidian markdown, all sections)

All output files land in the Hermes working directory and are also written to the Obsidian vault at /Newsletter/.

Data Sources

RSS Feeds

Category	Feeds
World News	BBC World Service, CNN, Reuters
Tech / AI	TechCrunch, VentureBeat
Korean	Korea Herald, Yonhap News Agency, JoongAng Daily

RSS items are filtered to same-day only. Any item older than today's date (ET) is discarded before editorial analysis.

arXiv

Categories polled: cs.AI, cs.LG, cs.CL, cs.CV, cs.RO, quant-ph, eess.SY
Filtered to papers submitted or updated within the last 48 hours
Each paper is enriched via Semantic Scholar API with citation_count and influential_citations before being passed to editorial

GitHub Trending

Scraped from github.com/trending (daily and weekly)
Filtered for AI/ML/systems-relevant repos by keyword and topic tags

Subreddits: r/MachineLearning, r/LocalLLaMA, r/artificial, r/singularity, r/nvidia, r/comfyui
Sorted by top/hot, filtered for substantive technical threads (score threshold)

DuckDuckGo News

Targeted queries: "AI", "LLM", "GPU compute", "AI regulation", Korean AI companies
Used to supplement gaps in RSS coverage on breaking stories

Semantic Scholar

Called for each arXiv paper to retrieve citation_count and influential_citations
Used in analyze_ai_papers() to gate inclusion of low-signal papers

Social (Persona Tracking)

Tracked personas (public posts/statements):

Andrej Karpathy (@karpathy)
Yann LeCun (@ylecun)
Sam Altman (@sama)
Elon Musk (@elonmusk)
George Hotz (@realGeorgeHotz)

Hard Requirements

Data Freshness

ALL news items must be same-day only (today's date in ET timezone)
Items from yesterday or earlier are silently dropped before LLM analysis
Date (YYYY-MM-DD) must appear on every bullet in the output

Formatting (enforced in PERSONA system prompt)

Every insight is a bullet starting with - (dash space). No numbered lists, no prose paragraphs
Bold key terms with **double asterisks**
First use of any technical term, acronym, or niche concept must include an inline parenthetical definition
Each bullet must be self-contained — a reader seeing only that bullet understands what happened and why it matters
NEVER use ellipsis (...) or (…) anywhere. If a quote or summary must be shortened, end at a complete sentence or clause. No trailing dots.
Every item referencing a paper, repo, or event must include the date: (YYYY-MM-DD)
End every section with **Recommended action:** <specific, concrete step>

Paper Quality Gate

Only high-impact papers are included in the AI Papers section
Inclusion criteria: citation_count >= 5, OR influential_citations >= 1, OR genuine architectural novelty for 0-citation papers in last 48h
Citation count must be stated in the bullet: e.g. (47 citations as of today)
Incremental ablations, sub-5% benchmark improvements, and domain-application papers without architectural novelty are explicitly excluded

UI: Right-Side TOC Panel

The HTML SPA output must include a fixed right-side Table of Contents panel
TOC links jump to each section anchor
TOC highlights the currently visible section on scroll
Panel collapses on mobile

Editorial Model

Model: aws/anthropic/bedrock-claude-opus-4-6
Provider: NVIDIA Inference API (https://inference-api.nvidia.com/v1/chat/completions)
Auth: NVIDIA_API_KEY environment variable (also accepts OPENAI_API_KEY fallback)
Temperature: 0.7
System prompt: PERSONA string in newsletter_editorial.py

Korean Coverage

Dedicated section (analyze_korean()) covering:

Companies: Samsung (semiconductor + AI), Kakao (AI services), Naver (HyperCLOVA, search AI), SK Telecom / SK Hynix (HBM, AI infra)
Government: MSIT (Ministry of Science and ICT) AI policy, NRF grants, K-AI national strategy, AI safety regulation proposals
Startups: Korean AI startups gaining traction (funding rounds, product launches, international expansion)
Competitive positioning: honest assessment of whether Korean AI development is genuinely differentiating vs. following the global playbook 12-18 months behind

Sources: Korea Herald RSS, Yonhap RSS, JoongAng RSS, DDG Korean-language queries

Deployment

Runs via Hermes cronjobs on the local machine:

Time (ET)	Job	Description
5:00 AM	Newsletter early run	gather.py → editorial.py → spa.py + md output
8:00 AM	Newsletter full run	Same pipeline, catches any late-breaking items

Cron entries managed by Hermes scheduler. Output files written to:

HTML: /Users/hkder/Documents/obsidian_vault/Newsletter/YYYY-MM-DD Daily Newsletter.html
MD: /Users/hkder/Documents/obsidian_vault/Newsletter/YYYY-MM-DD Daily Newsletter.md

Output Format

HTML SPA (`spa.py`)

Single self-contained HTML file (no external CDN dependencies in critical path)
Dark/light theme toggle
Fixed right-side TOC panel with section jump links and scroll-spy highlighting
Sections: World News | Tech | AI News | AI Papers | GitHub | Space & Quantum | Physical AI | Reddit | Social Voices | Korean Tech
Responsive: TOC collapses to hamburger on narrow screens
Inline CSS + minimal vanilla JS only (no React, no build step)

Obsidian Markdown

Full markdown version of the same content
Compatible with Obsidian's markdown renderer
Frontmatter includes: title, date, tags: [newsletter, daily], sync_exclude: true
All sections use ## headers for Obsidian outline nav
Internal links to any referenced concept pages where they exist in the vault

Sync Exclusion

The /Newsletter/ folder in the Obsidian vault is excluded from the notes.iri-ai.com auto-sync.

Reason: newsletter files are large, auto-generated daily, and not intended for the shared/published notes site. They are local-only documents.

Exclusion is enforced via the sync configuration — the Newsletter folder does not appear in the sync manifest.

File Locations (Scripts)

Script	Path
Data gathering	`/Users/hkder/.hermes/scripts/newsletter_gather.py`
Editorial analysis	`/Users/hkder/.hermes/scripts/newsletter_editorial.py`
HTML SPA renderer	`/Users/hkder/.hermes/scripts/newsletter_spa.py`
Markdown writer	`/Users/hkder/.hermes/scripts/newsletter_md.py` (or inline in gather/spa)
Orchestrator	`/Users/hkder/.hermes/scripts/newsletter_run.py` (or `run_newsletter.sh`)

Improvement Log

2026-03-24

Spec document created: This file written to Obsidian vault as system planning document.
Ellipsis ban enforced: Added explicit rule to PERSONA system prompt in newsletter_editorial.py — NEVER use ellipsis (...) or (…) anywhere.
Semantic Scholar citation gating: Updated analyze_ai_papers() prompt in newsletter_editorial.py to use citation_count and influential_citations fields for inclusion decisions. Papers with citation_count >= 5 or influential_citations >= 1 are prioritized; 0-citation papers require genuine architectural novelty. Citation count now stated in every paper bullet.
Architecture documented: Full gather → editorial → SPA + MD pipeline documented with data sources, deployment schedule, and hard requirements.

Open TODOs

Confirm 5am/8am ET cron entries are active in Hermes scheduler
Verify Semantic Scholar API calls are wired into gather.py and passing citation fields to editorial.py
Add health check: if gather.py produces 0 same-day items, alert before running editorial
Consider adding Hacker News (top 10 stories) as a supplementary source
Add Korean-language DDG queries for deeper local coverage
Test SPA TOC scroll-spy on mobile viewport

Newsletter System Spec

Newsletter System Spec

Purpose

Architecture

Data Sources

RSS Feeds

arXiv

GitHub Trending

Reddit

DuckDuckGo News

Semantic Scholar

Social (Persona Tracking)

Hard Requirements

Data Freshness

Formatting (enforced in PERSONA system prompt)

Paper Quality Gate

UI: Right-Side TOC Panel

Editorial Model

Korean Coverage

Deployment

Output Format

HTML SPA (spa.py)

Obsidian Markdown

Sync Exclusion

File Locations (Scripts)

Improvement Log

2026-03-24

Open TODOs

HTML SPA (`spa.py`)