Pre-ML Era: Classical NLP
Rule-based systems and statistical NLP — what came before learning, and why it hit a wall.
Supervised Learning → Neural Nets
ML as function search, loss and gradient descent, neural network anatomy — with an animated forward pass.
Transformers → Inference
Attention, training, MoE, FlashAttention — plus an animated prefill/decode loop showing why the KV cache exists.
Mathematics of AI
The linear algebra, calculus, and probability that make the previous two docs precise.
Prompt & Context Engineering
Tokens as currency: harnessing AI, saving tokens, fighting hallucination.
Embeddings & RAG
Meaning as geometry: vector search, HNSW, the animated RAG pipeline, and where retrieval fails.
Agents End to End
The tool-use loop animated, MCP, and the design patterns that survive production.
Evaluation
Why benchmarks mislead, the animated eval-building loop, and the LLM-judge problem.
Sampling & Decoding
Temperature, top-p, grammar-constrained JSON, speculative decoding — the animated sampling funnel.
Multimodal & Diffusion
How models see (VLMs) vs how they paint (diffusion) — animated denoising, and why image models can't spell.
Safety & Alignment
The alignment stack, jailbreaks vs prompt injection (animated attack walkthrough), and defense in depth.
The Developer Perspective
How the paradigm of coding is shifting for working engineers.
The Developer Harness
Skills, rules, workflows, memory banks, subagents — software answers to the context problem.
CPU vs GPU
Serial genius vs parallel army; memory bandwidth as the real bottleneck — with live data-traffic animation.
AI & Human Evolution
Knowledge transfer, genes to GPUs — the long arc.
Inference Anatomy
Prefill vs decode, KV cache math, batching — animated request lifecycle from Send to streamed tokens.
Context Caching & Cost
Tokens as currency: animated cache hit vs miss, cache-friendly prompt anatomy, and the three token price classes.
GPU Memory Hierarchy
The bandwidth wall, arithmetic intensity, and an animated naive-vs-FlashAttention walkthrough.
Software Context Solutions
Skills, rules, workflows, memory banks, subagents — each mapped to the hardware bottleneck it relieves.
The Inference Optimization Stack
Model → Memory → Runtime → Cluster: MoE, SSMs, GQA, quantization, PagedAttention, kernel fusion, tensor parallelism, Splitwise — one animated mental model.