Note

Newsletter System Spec

obsidian_vault Newsletter March 24, 2026 newslettersystemspecinternal

Newsletter System Spec

Internal planning document. This folder is excluded from notes.iri-ai.com auto-sync.


Purpose

A private, daily newsletter for a modern AI engineer — someone who builds, trains, deploys, or researches AI systems and needs to stay at the frontier without drowning in noise. Coverage spans AI research, AI industry, world news (AI-relevant), Korean tech, GitHub tooling, Reddit practitioner discussions, and curated social commentary from key AI voices.

Runs on a schedule: 5am ET (early brief) and 8am ET (full edition) daily.


Architecture

gather.py
    |-- RSS feeds (world, tech, AI, Korean)
    |-- arXiv API (cs.AI, cs.LG, cs.CL, cs.CV, cs.RO, quant-ph)
    |-- GitHub trending scraper
    |-- Reddit API (r/MachineLearning, r/LocalLLaMA, r/artificial, etc.)
    |-- DuckDuckGo News search (DDG news)
    |-- Semantic Scholar API (citation enrichment for arXiv papers)
    |-- Social scraper (Karpathy, LeCun, Altman, Musk, Hotz)
         |
         v
    Gathered JSON (all items tagged with date, source, body, citation fields)
         |
         v
editorial.py   <-- Claude Opus 4.6 via NVIDIA Bedrock (aws/anthropic/bedrock-claude-opus-4-6)
    |-- analyze_world_news()
    |-- analyze_tech_news()
    |-- analyze_ai_news()
    |-- analyze_ai_papers()       <-- uses Semantic Scholar citation_count / influential_citations
    |-- analyze_github_tools()
    |-- analyze_space_quantum()
    |-- analyze_physical_ai()
    |-- analyze_reddit()
    |-- analyze_social()
    |-- analyze_korean()
         |
         v
    Editorial dict (section -> LLM-analyzed markdown bullets)
         |
         v
    spa.py  ──────────────────────────────────────> /Newsletter/YYYY-MM-DD Daily Newsletter.html
         |                                           (HTML single-page app, right-side TOC panel)
         v
    md_writer.py (or inline) ─────────────────────> /Newsletter/YYYY-MM-DD Daily Newsletter.md
                                                     (Full Obsidian markdown, all sections)

All output files land in the Hermes working directory and are also written to the Obsidian vault at /Newsletter/.


Data Sources

RSS Feeds

Category Feeds
World News BBC World Service, CNN, Reuters
Tech / AI TechCrunch, VentureBeat
Korean Korea Herald, Yonhap News Agency, JoongAng Daily

RSS items are filtered to same-day only. Any item older than today's date (ET) is discarded before editorial analysis.

arXiv

  • Categories polled: cs.AI, cs.LG, cs.CL, cs.CV, cs.RO, quant-ph, eess.SY
  • Filtered to papers submitted or updated within the last 48 hours
  • Each paper is enriched via Semantic Scholar API with citation_count and influential_citations before being passed to editorial

GitHub Trending

  • Scraped from github.com/trending (daily and weekly)
  • Filtered for AI/ML/systems-relevant repos by keyword and topic tags

Reddit

  • Subreddits: r/MachineLearning, r/LocalLLaMA, r/artificial, r/singularity, r/nvidia, r/comfyui
  • Sorted by top/hot, filtered for substantive technical threads (score threshold)

DuckDuckGo News

  • Targeted queries: "AI", "LLM", "GPU compute", "AI regulation", Korean AI companies
  • Used to supplement gaps in RSS coverage on breaking stories

Semantic Scholar

  • Called for each arXiv paper to retrieve citation_count and influential_citations
  • Used in analyze_ai_papers() to gate inclusion of low-signal papers

Social (Persona Tracking)

Tracked personas (public posts/statements):

  • Andrej Karpathy (@karpathy)
  • Yann LeCun (@ylecun)
  • Sam Altman (@sama)
  • Elon Musk (@elonmusk)
  • George Hotz (@realGeorgeHotz)

Hard Requirements

Data Freshness

  • ALL news items must be same-day only (today's date in ET timezone)
  • Items from yesterday or earlier are silently dropped before LLM analysis
  • Date (YYYY-MM-DD) must appear on every bullet in the output

Formatting (enforced in PERSONA system prompt)

  • Every insight is a bullet starting with - (dash space). No numbered lists, no prose paragraphs
  • Bold key terms with **double asterisks**
  • First use of any technical term, acronym, or niche concept must include an inline parenthetical definition
  • Each bullet must be self-contained — a reader seeing only that bullet understands what happened and why it matters
  • NEVER use ellipsis (...) or () anywhere. If a quote or summary must be shortened, end at a complete sentence or clause. No trailing dots.
  • Every item referencing a paper, repo, or event must include the date: (YYYY-MM-DD)
  • End every section with **Recommended action:** <specific, concrete step>

Paper Quality Gate

  • Only high-impact papers are included in the AI Papers section
  • Inclusion criteria: citation_count >= 5, OR influential_citations >= 1, OR genuine architectural novelty for 0-citation papers in last 48h
  • Citation count must be stated in the bullet: e.g. (47 citations as of today)
  • Incremental ablations, sub-5% benchmark improvements, and domain-application papers without architectural novelty are explicitly excluded

UI: Right-Side TOC Panel

  • The HTML SPA output must include a fixed right-side Table of Contents panel
  • TOC links jump to each section anchor
  • TOC highlights the currently visible section on scroll
  • Panel collapses on mobile

Editorial Model

  • Model: aws/anthropic/bedrock-claude-opus-4-6
  • Provider: NVIDIA Inference API (https://inference-api.nvidia.com/v1/chat/completions)
  • Auth: NVIDIA_API_KEY environment variable (also accepts OPENAI_API_KEY fallback)
  • Temperature: 0.7
  • System prompt: PERSONA string in newsletter_editorial.py

Korean Coverage

Dedicated section (analyze_korean()) covering:

  • Companies: Samsung (semiconductor + AI), Kakao (AI services), Naver (HyperCLOVA, search AI), SK Telecom / SK Hynix (HBM, AI infra)
  • Government: MSIT (Ministry of Science and ICT) AI policy, NRF grants, K-AI national strategy, AI safety regulation proposals
  • Startups: Korean AI startups gaining traction (funding rounds, product launches, international expansion)
  • Competitive positioning: honest assessment of whether Korean AI development is genuinely differentiating vs. following the global playbook 12-18 months behind

Sources: Korea Herald RSS, Yonhap RSS, JoongAng RSS, DDG Korean-language queries


Deployment

Runs via Hermes cronjobs on the local machine:

Time (ET) Job Description
5:00 AM Newsletter early run gather.py → editorial.py → spa.py + md output
8:00 AM Newsletter full run Same pipeline, catches any late-breaking items

Cron entries managed by Hermes scheduler. Output files written to:

  • HTML: /Users/hkder/Documents/obsidian_vault/Newsletter/YYYY-MM-DD Daily Newsletter.html
  • MD: /Users/hkder/Documents/obsidian_vault/Newsletter/YYYY-MM-DD Daily Newsletter.md

Output Format

HTML SPA (spa.py)

  • Single self-contained HTML file (no external CDN dependencies in critical path)
  • Dark/light theme toggle
  • Fixed right-side TOC panel with section jump links and scroll-spy highlighting
  • Sections: World News | Tech | AI News | AI Papers | GitHub | Space & Quantum | Physical AI | Reddit | Social Voices | Korean Tech
  • Responsive: TOC collapses to hamburger on narrow screens
  • Inline CSS + minimal vanilla JS only (no React, no build step)

Obsidian Markdown

  • Full markdown version of the same content
  • Compatible with Obsidian's markdown renderer
  • Frontmatter includes: title, date, tags: [newsletter, daily], sync_exclude: true
  • All sections use ## headers for Obsidian outline nav
  • Internal links to any referenced concept pages where they exist in the vault

Sync Exclusion

The /Newsletter/ folder in the Obsidian vault is excluded from the notes.iri-ai.com auto-sync.

Reason: newsletter files are large, auto-generated daily, and not intended for the shared/published notes site. They are local-only documents.

Exclusion is enforced via the sync configuration — the Newsletter folder does not appear in the sync manifest.


File Locations (Scripts)

Script Path
Data gathering /Users/hkder/.hermes/scripts/newsletter_gather.py
Editorial analysis /Users/hkder/.hermes/scripts/newsletter_editorial.py
HTML SPA renderer /Users/hkder/.hermes/scripts/newsletter_spa.py
Markdown writer /Users/hkder/.hermes/scripts/newsletter_md.py (or inline in gather/spa)
Orchestrator /Users/hkder/.hermes/scripts/newsletter_run.py (or run_newsletter.sh)

Improvement Log

2026-03-24

  • Spec document created: This file written to Obsidian vault as system planning document.
  • Ellipsis ban enforced: Added explicit rule to PERSONA system prompt in newsletter_editorial.pyNEVER use ellipsis (...) or (…) anywhere.
  • Semantic Scholar citation gating: Updated analyze_ai_papers() prompt in newsletter_editorial.py to use citation_count and influential_citations fields for inclusion decisions. Papers with citation_count >= 5 or influential_citations >= 1 are prioritized; 0-citation papers require genuine architectural novelty. Citation count now stated in every paper bullet.
  • Architecture documented: Full gather → editorial → SPA + MD pipeline documented with data sources, deployment schedule, and hard requirements.

Open TODOs

  • Confirm 5am/8am ET cron entries are active in Hermes scheduler
  • Verify Semantic Scholar API calls are wired into gather.py and passing citation fields to editorial.py
  • Add health check: if gather.py produces 0 same-day items, alert before running editorial
  • Consider adding Hacker News (top 10 stories) as a supplementary source
  • Add Korean-language DDG queries for deeper local coverage
  • Test SPA TOC scroll-spy on mobile viewport