AI Learning · End to End

The Shifting Paradigm of Code

From static code to static + dynamic. From CPU cycles to tokens as currency. What, how, and why — connecting software approaches down to hardware bottlenecks.

Knowledge graph — the whole series at a glance
Foundations 1–5 · Systems 6–10 · Practice 11–16 · Perception


Theory track — concepts in order
00 · Prehistory

Pre-ML Era: Classical NLP

Rule-based systems and statistical NLP — what came before learning, and why it hit a wall.

01 · Foundations

Supervised Learning → Neural Nets

ML as function search, loss and gradient descent, neural network anatomy — with an animated forward pass.

02 · The Engine

Transformers → Inference

Attention, training, MoE, FlashAttention — plus an animated prefill/decode loop showing why the KV cache exists.

03 · The Math

Mathematics of AI

The linear algebra, calculus, and probability that make the previous two docs precise.

04 · The Craft

Prompt & Context Engineering

Tokens as currency: harnessing AI, saving tokens, fighting hallucination.

Practice track — building with AI
11 · Retrieval

Embeddings & RAG

Meaning as geometry: vector search, HNSW, the animated RAG pipeline, and where retrieval fails.

12 · Action

Agents End to End

The animated tool-use loop, MCP, and the design patterns that survive production.

13 · Measurement

Evaluation

Why benchmarks mislead, the animated eval-building loop, and the LLM-judge problem.

14 · The Dice

Sampling & Decoding

Temperature, top-p, grammar-constrained JSON, speculative decoding — the animated sampling funnel.

15 · Vision

Multimodal & Diffusion

How models see (VLMs) vs how they paint (diffusion) — animated denoising, and why image models can't spell.

16 · Trust

Safety & Alignment

The alignment stack, jailbreaks vs prompt injection (animated attack walkthrough), and defense in depth.

Perception track — the engineer's lens
Perspective

The Developer Perspective

How the paradigm of coding is shifting for working engineers.

Harness

The Developer Harness

Skills, rules, workflows, memory banks, subagents — software answers to the context problem.

Hardware

CPU vs GPU

Serial genius vs parallel army; memory bandwidth as the real bottleneck — with live data-traffic animation.

Big picture

AI & Human Evolution

Knowledge transfer, genes to GPUs — the long arc.

Systems track — software meets hardware
06 · The Pipeline

Inference Anatomy

Prefill vs decode, KV cache math, batching — animated request lifecycle from Send to streamed tokens.

07 · The Economics

Context Caching & Cost

Tokens as currency: animated cache hit vs miss, cache-friendly prompt anatomy, the three token price classes.

08 · The Silicon

GPU Memory Hierarchy

The bandwidth wall, arithmetic intensity, and an animated naive-vs-FlashAttention walkthrough.

09 · Capstone

Software Context Solutions

Skills, rules, workflows, memory banks, subagents — each mapped to the hardware bottleneck it relieves.

10 · The Map

The Inference Optimization Stack

Model → Memory → Runtime → Cluster: MoE, SSMs, GQA, quantization, PagedAttention, kernel fusion, tensor parallelism, Splitwise — one animated mental model.