
Reading Log

A running log of papers, blog posts, and other technical material I read.


Index by Topic

LLMs / Foundation Models

Date | Title | Type | Source
2026-02-12 | GLM-5 - From Vibe Coding to Agentic Engineering | Blog Post | Z.ai (Zhipu AI)
2026-02-12 | DeepSeek-V3.2 Paper | Paper | DeepSeek-AI

Attention / Architecture

Date | Title | Type | Source
2026-02-12 | Attention Is All You Need | Paper | Vaswani et al. (Google)
2026-02-12 | DeepSeek-V3.2 Paper | Paper | DeepSeek-AI

Long Context / Inference Scaling

Date | Title | Type | Source
2026-02-12 | Recursive Language Models | Paper | Zhang, Kraska, Khattab
2026-02-12 | DeepSeek-V3.2 Paper | Paper | DeepSeek-AI

Reinforcement Learning

Date | Title | Type | Source
2026-02-12 | GLM-5 - From Vibe Coding to Agentic Engineering | Blog Post | Z.ai (Zhipu AI)
2026-02-12 | DeepSeek-V3.2 Paper | Paper | DeepSeek-AI

Timeline

2026

February

  • 2026-02-12 Attention Is All You Need — The foundational 2017 paper introducing the Transformer architecture. Replaces recurrence with self-attention (the Q/K/V scaled dot-product mechanism), enabling O(1) sequential operations per layer and massive parallelism. Every modern LLM descends from this.
  • 2026-02-12 DeepSeek-V3.2 Paper — Introduces DeepSeek Sparse Attention (DSA): a two-stage pipeline (Lightning Indexer + top-k selection) that reduces O(L²) attention to O(L·k), cutting long-context inference cost by 2x with negligible quality loss. Also features scaled RL and agentic data synthesis. Frontier-level performance.
  • 2026-02-12 GLM-5 - From Vibe Coding to Agentic Engineering — Z.ai launches GLM-5: a 744B-parameter MoE (40B active), MIT-licensed, trained on Huawei Ascend hardware. Best open-source model on reasoning, coding, and agentic benchmarks. Integrates DSA for 200K context. Introduces the "slime" async RL infrastructure.
  • 2026-02-12 Recursive Language Models — RLMs treat the prompt as an external variable in a REPL, letting the model write code to recursively decompose and process inputs up to 100x larger than its context window. An 8B model post-trained for 48 H100-hours approaches GPT-5 on long-context tasks. Attacks long context from the algorithm side (vs. DSA from the architecture side).
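
The Q/K/V mechanism from the Attention entry can be sketched in a few lines of NumPy. This is a toy single-head version; shapes and names are illustrative, not taken from any paper's code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Dense attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (L, L) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ V                             # convex mix of value rows

rng = np.random.default_rng(0)
L, d = 6, 4
Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))
V = rng.normal(size=(L, d))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (6, 4)
```

Every query attends to all L keys, which is where the O(L²) cost in the DSA entry comes from.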
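
The select-then-attend idea behind DSA can be illustrated with a minimal sketch. Here plain dot products stand in for the learned Lightning Indexer (which this is not); the point is only that each query does full attention over k selected keys instead of all L:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Each query attends only to the k keys with the highest indexer
    score, reducing per-query cost from O(L) to O(k)."""
    d = Q.shape[-1]
    # Stage 1: cheap indexer scores (a toy stand-in for the real indexer).
    index_scores = Q @ K.T                                        # (L, L)
    topk = np.argpartition(-index_scores, k - 1, axis=-1)[:, :k]  # (L, k)
    # Stage 2: exact softmax attention over the selected keys only.
    out = np.empty_like(Q)
    for i in range(Q.shape[0]):
        sel = topk[i]
        s = Q[i] @ K[sel].T / np.sqrt(d)
        s = np.exp(s - s.max())
        out[i] = (s / s.sum()) @ V[sel]
    return out

rng = np.random.default_rng(1)
L, d = 8, 4
Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))
V = rng.normal(size=(L, d))
out = topk_sparse_attention(Q, K, V, k=3)
print(out.shape)  # (8, 4)
```

Setting k = L recovers dense attention exactly, since softmax over a full, reordered key set yields the same weighted sum.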
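
The RLM recipe (prompt-as-variable, recursive decomposition) can be caricatured in a few lines. Here `llm` is a stand-in keyword scanner so the sketch runs end to end, not a real model call, and `max_chars` plays the role of the context window:

```python
def llm(prompt: str) -> str:
    """Placeholder for a real model call: first line is the query,
    the rest is context; return the context lines matching the query."""
    query, _, text = prompt.partition("\n")
    return "; ".join(ln for ln in text.splitlines() if query in ln)

def recursive_answer(query: str, text: str, max_chars: int = 200) -> str:
    # The prompt lives as a variable in the REPL: if it fits the
    # "context window", answer directly; otherwise split and recurse.
    if len(text) <= max_chars:
        return llm(query + "\n" + text)
    mid = len(text) // 2
    cut = text.rfind("\n", 0, mid) + 1 or mid   # split on a line boundary
    partials = [recursive_answer(query, part, max_chars)
                for part in (text[:cut], text[cut:])]
    # Combine the partial answers with one final short call.
    return llm(query + "\n" + "\n".join(p for p in partials if p))

doc = "\n".join(["filler line %d" % i for i in range(40)] + ["needle: secret"])
print(recursive_answer("needle", doc))  # prints "needle: secret"
```

A real RLM lets the model itself write the decomposition code at inference time; the fixed halving strategy above is just one possible plan it might emit.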

Stats

  • Total entries: 4
  • Blog posts: 1
  • Papers: 3