AI Through the Lens of Human Evolution

🔍 The Core Analogy

Why comparing AI to biological evolution isn't just a metaphor — it's a structural isomorphism.

The Central Insight

Both biological evolution and AI training solve the same abstract problem: find a compact encoding of "what works" by optimizing over a massive amount of experience.

Biology

Experience: millions of years of organism-environment interactions
Encoding: DNA — 3 billion base pairs
Optimizer: natural selection (gradient-free!)
Output: a brain that can learn, adapt, survive

AI

Experience: trillions of tokens of human-written text
Encoding: model weights — billions of parameters
Optimizer: gradient descent (differentiable!)
Output: a system that reasons, generates, acts

The Structural Mapping

Biology	AI Model
DNA base pairs (A, T, G, C)	Model weight values (float16)
Gene (functional DNA sequence)	Attention head / MLP neuron cluster
Phenotype (expressed organism)	Model behavior / output quality
Natural selection (fitness function)	Loss function + gradient descent
Mutation (random variation)	Random initialization + noise in SGD
Genome (full genetic code)	Model weights (full parameter set)
Evolution (billions of years)	Training (days to months on GPUs)

🧬 DNA: The Original Code

Understanding genetic encoding — what it is, how it works, and why it's the most successful information system ever.

What DNA Is: An Information Encoding System

DNA is a 4-character alphabet (A, T, G, C) encoding a program that builds and runs a biological organism. It's not a static file — it's an active program that responds to environmental signals.

Base Pair Structure — The Alphabet

A–T: Adenine pairs with Thymine (2 hydrogen bonds)
G–C: Guanine pairs with Cytosine (3 hydrogen bonds, stronger)

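Human genome: about 3.2 billion base pairs per haploid copy. At 2 bits per base, that is roughly 6.4 gigabits, or about 0.8 GB of raw sequence. Epigenetic marks add further regulatory information on top of the sequence. Nearly every one of the body's roughly 37 trillion cells carries the complete program (mature red blood cells, which lose their nucleus, are a notable exception).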
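As a back-of-envelope check on the storage figure, the sketch below counts 2 bits per base (four symbols need log2(4) bits) and packs a short sequence four bases to a byte. The function names and the particular bit assignment are illustrative assumptions, not a standard genomics format.

```python
BITS_PER_BASE = 2  # four symbols (A, C, G, T) need log2(4) = 2 bits

# Hypothetical 2-bit code; any one-to-one assignment would work.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def genome_size_bytes(base_pairs: int) -> float:
    """Raw storage for a sequence at 2 bits per base."""
    return base_pairs * BITS_PER_BASE / 8


def pack_bases(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | ENCODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


size_gb = genome_size_bytes(3_200_000_000) / 1e9
print(f"{size_gb:.1f} GB")          # 0.8 GB
print(len(pack_bases("ACGTAC")))    # 2 bytes for 6 bases
```

This counts only the raw sequence of one haploid copy; it says nothing about epigenetic state.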

DNA → Protein: The Execution Pipeline

Transcription
DNA → mRNA. A section of DNA (gene) is copied into messenger RNA. Like reading a function from source code.

↓

Translation
mRNA → Protein. Ribosomes read codons (3-base groups = 64 combos → 20 amino acids). Like a JIT compiler.

↓

Protein Folding
Amino acid chain folds into 3D structure. The shape determines function. AlphaFold 2 (AI) made accurate structure prediction routine for many proteins, a problem that had stood for about 50 years.

↓

Phenotype Expression
Proteins build structures, catalyze reactions, signal cells. Gene program → observable organism.

The Codon Table: Nature's Lookup Table

// 4 bases, 3-base codons = 4³ = 64 combos
// → 20 amino acids + 3 stop codons

AUG → Methionine (START)
GGG → Glycine
UAA → STOP
CAU → Histidine

// Redundancy: multiple codons → same amino acid
// (like multiple bytecodes → same operation)
GGU, GGC, GGA, GGG → all Glycine
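The lookup-table view can be made concrete with a small translation routine. This is a minimal sketch that contains only the codons listed above (a real table has all 64 entries); the `translate` function and the three-letter residue labels are illustrative choices.

```python
# Partial codon table: only the entries shown above.
CODON_TABLE = {
    "AUG": "Met",  # also the START signal
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CAU": "His",
    "UAA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE.get(codon, "???")  # unknown in this partial table
        if residue == "STOP":
            break
        protein.append(residue)
    return protein


print(translate("AUGGGCCAUUAA"))  # ['Met', 'Gly', 'His']
```

Swapping GGC for GGA in the input gives the same output, which is the redundancy noted in the table.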

The Compression Insight

DNA doesn't store "eye color = blue" directly. It encodes how to build the proteins that lead to eye color. This is indirect encoding, like a program that generates an image rather than storing the pixels. Neural networks work in a similar way: rather than storing facts as records, they encode how to generate likely-correct answers.

Genes, Alleles, and Loci

A gene is a specific DNA sequence that encodes one functional unit (usually a protein). A locus is its position on a chromosome. An allele is one specific variant of a gene at that locus.

Humans are diploid — two copies of each chromosome, so two alleles per gene (one from each parent). These two alleles interact to produce the expressed trait.

Gene: Eye Color Gene (OCA2)
Chromosome: 15q11-q13
Allele 1 (paternal): B (brown-encoding)
Allele 2 (maternal): b (blue-encoding)

Genotype: Bb
Phenotype: Brown eyes (B is dominant)

// Two variants of the same gene at one locus
// The dominant allele determines the phenotype
// (simplified: real eye color is polygenic)

⚡ Dominant & Recessive: The Expression Logic

How two alleles interact to produce one phenotype — and the profound AI parallel in model behavior.

Mendel's Discovery: Discrete Inheritance

Gregor Mendel (1860s) bred 29,000 pea plants to discover that traits aren't blended — they're encoded in discrete particles (genes) that follow rules. The dominant allele is expressed when present; the recessive only expresses in homozygous form.

Homozygous Dominant

Both alleles dominant

Expressed: Dominant trait

Heterozygous

One of each

Expressed: Dominant trait

Homozygous Recessive

Both alleles recessive

Expressed: Recessive trait

// Punnett Square for Bb × Bb cross
Parent 1: Bb × Parent 2: Bb

Offspring probabilities:
BB: 25% → dominant phenotype
Bb: 50% → dominant phenotype (B masks b)
bb: 25% → recessive phenotype

// 3:1 dominant:recessive ratio — Mendel's famous finding
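The Punnett square above can be enumerated directly. The sketch below assumes single-letter alleles with uppercase meaning dominant; `punnett` and `dominant_fraction` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Genotype probabilities for one gene, e.g. punnett('Bb', 'Bb')."""
    # Each parent passes on one of its two alleles with equal probability.
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def dominant_fraction(genotypes: dict[str, Fraction]) -> Fraction:
    """Share of offspring showing the dominant trait (complete dominance)."""
    return sum(
        (p for g, p in genotypes.items() if any(c.isupper() for c in g)),
        Fraction(0),
    )


cross = punnett("Bb", "Bb")
print(cross)                     # BB: 1/4, Bb: 1/2, bb: 1/4
print(dominant_fraction(cross))  # 3/4
```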

Types of Dominance

Complete Dominance
B completely masks b, so the phenotype is all-or-nothing. Classic Mendelian inheritance.

Incomplete Dominance
Bb produces intermediate phenotype (e.g., red × white → pink flowers). Both genes partially expressed.

Codominance
Both alleles fully expressed simultaneously. Blood type AB: both A and B antigens present.

Polygenic Traits
Multiple genes contribute additively. Height, skin color, intelligence — continuous distributions, not discrete categories.

The AI Parallel: Dominance in Neural Nets

Neural network weights exhibit analogous "dominance" patterns — some directions in weight space dominate model behavior while others are latent (recessive).

Dominant (high-magnitude weights)
Large weights in attention heads act like strong feature detectors. They tend to "express" in the output and are hard to suppress.

Recessive (dormant capabilities)
Model has capabilities that don't express in normal prompting but appear under specific activation ("jailbreaks" are recessive gene expression).

Epigenetic = System Prompt / Fine-tuning
Same weights (genotype), different behavior (phenotype) based on context. System prompt is the epigenome.

🔑 The Hidden Information Principle

Both DNA and neural networks store more information than they express. Recessive alleles are carried silently through generations until two carriers mate — then they express. Model weights contain "latent capabilities" that only express under specific prompting conditions. In both cases, the phenotype is not the genotype. What you see is not all that exists.

🦴 Human Evolution: The Optimization Process

4 billion years of gradient-free optimization, producing the most complex information-processing system known.

~4 Billion Years Ago

The Origin: RNA World → DNA

Life begins as self-replicating RNA molecules. The first "code" — storing just enough information to make copies of itself. DNA evolves as a more stable storage medium. The fundamental discovery: information can direct matter to replicate itself.

🤖 AI Parallel: Random initialization. A randomly-parameterized network is like primordial soup — no structure, no capability. Just potential.

~600 Million Years Ago

Multicellularity: Specialization Emerges

Cells with identical DNA begin specializing into different tissues. The same genome encodes both a neuron and a liver cell — context (epigenetics, cellular environment) determines which genes activate. Gene regulation becomes as important as the genes themselves.

🤖 AI Parallel: Transformer layers specializing. Attention heads differentiate: some focus on syntax, others on semantics, others on long-range dependencies. Same architecture, different learned roles depending on layer and position.

~500 Million Years Ago

The Nervous System: In-Context Learning Hardware

Evolution invents the neuron — a cell specialized for signal transmission. Neural circuits allow organisms to respond to environment within their lifetime, not just across generations. The genome now encodes not just static structure but a learning machine.

🤖 AI Parallel: The meta-learning insight. Evolution "discovered" that encoding a learner (brain) is more flexible than encoding specific behaviors. Similarly, foundation models are trained to be learners, not just answer specific questions.

~2 Million Years Ago

Homo Genus: The Intelligence Explosion

Brain size triples in 2 million years. Tool use, social behavior, planning emerge. The neocortex expands — a general-purpose pattern recognition and prediction engine. Critically: the genome encodes brain plasticity, not specific skills. Skills are acquired via environmental input (learning).

🤖 AI Parallel: Scaling laws. As model size (parameter count) increases, emergent capabilities appear — capabilities not seen in smaller models. Intelligence scales non-linearly with substrate size.

~100,000 Years Ago

Language: The Knowledge Transfer Protocol

Homo sapiens develop syntactic language — the ability to encode arbitrary concepts in sequences of symbols and transmit them between minds. This creates a new information channel outside of genetics: cultural transmission. Knowledge no longer dies with its holder. It can be compressed, transmitted, reconstructed.

🤖 AI Parallel: Language IS the training modality. LLMs are trained on language because language is the compression format for human knowledge. Tokens are the vocabulary of this channel.

~5,000 Years Ago

Writing: Persistent External Memory

Cuneiform, hieroglyphics, alphabets — writing externalizes knowledge from biological memory into physical substrate. Knowledge becomes persistent across generations without biological transmission. The first "hard drive" — knowledge that survives the death of its author.

🤖 AI Parallel: The training corpus. All human writing since Sumerian tablets is the "dataset" — the accumulated cultural genome that trains the model.

2020s

AI: The First Non-Biological Intelligence

Large language models trained on essentially all human writing begin exhibiting reasoning, creativity, and problem-solving. For the first time, the cumulative product of human cognitive evolution is compressed into a non-biological substrate and made queryable at scale.

This is the key moment: the information channel that evolution built over 4 billion years (DNA → neuron → language → writing → internet) now feeds back into a system that can act on it.

📚 The Accumulation of Human Knowledge

How knowledge compounds through cultural evolution — and why this matters for understanding what LLMs actually contain.

The Compounding Stack of Human Knowledge

Layer 7 · 2010s–Now

The Internet + LLMs (~5 trillion+ tokens)

All human knowledge digitized, interlinked, searchable. Wikipedia, arXiv, Stack Overflow, GitHub — the sum of all written human thought accessible to ML training.

~5T tokens of text

Layer 6 · 1900s

The Scientific Revolution Compounds

Peer review, reproducibility, journals — knowledge becomes self-correcting. Each generation builds on verified prior work. Einstein builds on Maxwell, who builds on Faraday.

~150M scientific papers by 2024

Layer 5 · 1450s

Printing Press: Knowledge Democratized

Gutenberg enables mass replication of books. Knowledge diffuses from monasteries to merchants, scientists, and eventually everyone. Replication at scale — the first CDN for human thought.

~180M book titles by 2024

Layer 4 · ~300 BCE onward

Libraries: Knowledge Centralized

Alexandria (3rd century BCE) and Baghdad's House of Wisdom (9th century CE): knowledge aggregated from multiple civilizations. Cross-pollination: Greek logic + Indian numerals + Arab astronomy = the foundation of modern science.

Layer 3 · 3000 BCE

Writing: Knowledge Persists Beyond Death

Mesopotamian mathematics, Egyptian medicine, Vedic hymns — first externalized knowledge. Ideas outlive their authors. The first escape from biological memory limitations.

Layer 2 · 100,000 BCE

Language: Knowledge Transmits Between Minds

Oral tradition, storytelling, teaching. Knowledge can survive the death of the knower — but only if transmitted to others. Fragile, lossy, but revolutionary.

Layer 1 · Biological

DNA: Knowledge Encoded in Evolution

Instincts, neural architecture, cognitive biases — knowledge hard-coded by 4 billion years of selection. The foundation layer everything else builds on.

📐 The Scale of Human Knowledge

All human knowledge that exists as text: one rough estimate is ~10²³ bits.
GPT-4 training data (unofficial estimate): ~13 trillion tokens × ~4 bytes ≈ 52 TB
GPT-4 parameters (unofficial estimate): 1.76 trillion × 2 bytes (BF16) ≈ 3.5 TB

Compression ratio: ~52 TB of training text → ~3.5 TB of weights
That's roughly 15× compression of the training corpus into a queryable, generative model. This is lossy compression: the structure (reasoning patterns, language, knowledge relationships) is largely preserved even though most verbatim text is not.
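The ratio can be reproduced in a few lines. The inputs are the figures quoted above, which are unofficial estimates rather than published numbers; the variable names are illustrative.

```python
# Inputs are the estimates quoted in the text, not confirmed figures.
TRAINING_TOKENS = 13e12   # ~13 trillion tokens
BYTES_PER_TOKEN = 4       # rough average for English text
PARAMETERS = 1.76e12      # ~1.76 trillion parameters
BYTES_PER_PARAM = 2       # BF16

corpus_tb = TRAINING_TOKENS * BYTES_PER_TOKEN / 1e12
weights_tb = PARAMETERS * BYTES_PER_PARAM / 1e12
ratio = corpus_tb / weights_tb

print(f"corpus  ≈ {corpus_tb:.0f} TB")
print(f"weights ≈ {weights_tb:.2f} TB")
print(f"ratio   ≈ {ratio:.1f}x")
```

Under these assumptions the ratio comes out a little under 15×; changing the bytes-per-token figure moves it proportionally.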

🔐 Encoding Intelligence: DNA → Weights

The mathematical and conceptual parallels between genetic encoding and neural weight encoding.

How DNA Encodes Intelligence

// The genome doesn't store "how to recognize a face"
// It stores how to build a visual cortex

DNA → Brain architecture:
- Cortical column structure
- Hebbian plasticity rules
- Neurotransmitter chemistry
- Synaptic pruning schedules

Brain (+ experience) → Intelligence

// Intelligence = genotype × environment
// Neither alone is sufficient

How Weights Encode Intelligence

// The model doesn't store "the capital of France"
// It stores how to activate "Paris" given context

Weights → Capabilities:
- Attention patterns (which tokens relate)
- MLP associations (concept mappings)
- Layer hierarchies (abstract features)
- Residual stream (information routing)

Weights (+ context/prompt) → Intelligence

// Capability = weights × context
// Prompting is the "environment" for the model

The Information Hierarchy: From Bits to Behavior

Level	Biology	AI Model	Function
Raw storage	A, T, G, C bases	Float16 weight values	Information carrier
Functional unit	Gene (coding sequence)	Attention head / MLP layer	Specific capability
Regulatory	Promoters, enhancers	Layer norm, temperature	Control expression
Module	Chromosome	Transformer block	Functional grouping
Complete system	Genome	Model weights	Full capability set
Expressed behavior	Phenotype	Model output / behavior	Observable result
Context	Epigenome + environment	System prompt + context	Modulates expression
Optimizer	Natural selection	Adam / gradient descent	Drives improvement
Generation time	20–25 years	Days–months	Update cycle
Population size	8 billion humans	1 model (many runs)	Variation explored

🧬 The Critical Difference: Gradient Information

Evolution is blind: it cannot compute gradients. It explores by random mutation and selection, and each "trial" is a lifetime. AI training has access to the gradient of the loss, a local estimate (computed on a mini-batch) of the direction in parameter space that reduces error fastest. This is why AI "evolves" orders of magnitude faster than biology. Where evolution needed roughly 4 billion years, a training run takes weeks to months on a GPU cluster.
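To make the contrast tangible, here is a toy comparison on a 10-dimensional bowl, f(x) = Σ xᵢ². It is a sketch under strong assumptions: the objective, step size, mutation scale, and the simple keep-the-better-child rule are all illustrative and are not a model of real evolution or real training.

```python
import random


def loss(x):
    return sum(v * v for v in x)


def grad(x):
    # Analytic gradient of sum(x_i^2)
    return [2 * v for v in x]


def gradient_descent(x, steps, lr=0.1):
    """Follow the negative gradient; one gradient evaluation per step."""
    for _ in range(steps):
        x = [v - lr * g for v, g in zip(x, grad(x))]
    return x


def mutate_and_select(x, steps, sigma=0.1, seed=0):
    """Gradient-free search: random mutation, keep the child only if better."""
    rng = random.Random(seed)
    best, best_loss = x, loss(x)
    for _ in range(steps):
        child = [v + rng.gauss(0.0, sigma) for v in best]
        child_loss = loss(child)
        if child_loss < best_loss:
            best, best_loss = child, child_loss
    return best


start = [1.0] * 10
gd_loss = loss(gradient_descent(start, steps=100))
es_loss = loss(mutate_and_select(start, steps=100))
print(f"gradient descent: {gd_loss:.3e}")
print(f"mutate + select:  {es_loss:.3e}")
```

With the same budget of 100 steps, the gradient-based run ends many orders of magnitude closer to the minimum, because every step uses directional information instead of a blind guess.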

⚙️ GPU + Training: How the Intelligence Gets Built

The concrete mechanism connecting human knowledge → compressed model intelligence.

The Full Pipeline: From Human Knowledge to Model

📝

1. Data Collection: Digitized Human Knowledge

Web crawls (Common Crawl), books (Books3), code (GitHub), science (arXiv), Wikipedia. ~5–15 trillion tokens. Each token ≈ 0.75 words. Preprocessing: deduplication, quality filtering, toxicity removal.

~5T tokens · ~10TB raw text · Many languages

🔢

2. Tokenization: Language → Numbers

BPE (Byte Pair Encoding) splits text into subword tokens. "unbelievable" → ["un", "believ", "able"]. Vocabulary: ~50,000–128,000 tokens. Each token mapped to an integer ID. Language is now a sequence of integers — suitable for matrix math.

"The cat sat" → [464, 3797, 3332]
Each ID → 4096-dim embedding vector
Sentence → Matrix of shape [seq_len × d_model]
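The merge procedure behind BPE can be sketched on a toy corpus. This is a simplified character-level version under stated assumptions: production tokenizers work on bytes, handle whitespace and word frequencies, and use far larger vocabularies. `learn_bpe`, `merge_pair`, and `apply_bpe` are illustrative names.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Repeatedly merge the most frequent adjacent symbol pair."""
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in words:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair seen wins
        merges.append(best)
        words = [merge_pair(symbols, best) for symbols in words]
    return merges


def apply_bpe(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)                       # [('l', 'o'), ('lo', 'w')]
print(apply_bpe("lowest", merges))  # ['low', 'e', 's', 't']
```

Each resulting subword would then be assigned an integer ID and looked up in the embedding table.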

🧠

3. Pre-training: Next Token Prediction

The model learns to predict the next token given all previous tokens. This sounds simple — but to predict well, the model must learn grammar, facts, reasoning, code, math, style. All human knowledge is indirectly compressed into this single objective.

// Loss function:
L = -Σ log P(token_t | token_1, ..., token_{t-1})

// Forward pass: predict next token
// Backward pass: gradient flows through all weights
// Repeat for ~5 trillion tokens
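The loss above is a sum of negative log-probabilities, one per position. The sketch below computes it from raw logits in plain Python; the function names and the tiny four-token vocabulary are assumptions for illustration, and real training uses batched tensor code.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits_per_step, targets):
    """L = -sum_t log P(target_t | previous tokens)."""
    loss = 0.0
    for logits, target in zip(logits_per_step, targets):
        probs = softmax(logits)
        loss -= math.log(probs[target])
    return loss


# A model with no knowledge: uniform over a 4-token vocabulary.
uniform = next_token_loss([[0.0] * 4] * 3, targets=[0, 1, 2])
# A model that puts most of its mass on the correct token each step.
confident = next_token_loss(
    [[5.0, 0.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]],
    targets=[0, 1, 2],
)
print(uniform, confident)
```

The uniform model pays log(4) per token; the confident one pays far less, and that gap is what gradient descent pushes on.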

⚡

4. GPU Execution: Parallelism at Scale

Training a 70B model takes on the order of thousands of GPUs running for weeks to months, depending on the token budget. Data parallelism: different batches on different GPUs. Tensor parallelism: split weight matrices across GPUs. Pipeline parallelism: different layers on different GPUs. Gradients are synchronized with all-reduce, over NVLink within a node and the cluster network across nodes.

~2,000 H100s (example cluster) · 3.35 TB/s HBM bandwidth · NVLink 900 GB/s
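Data parallelism rests on one identity: averaging the gradients from equal-sized shards gives the same result as the gradient over the whole batch. The sketch below checks this for a one-parameter linear model in plain Python. The "workers" are ordinary list slices and `all_reduce_mean` is a stand-in for a real collective operation, so this illustrates the arithmetic only, not a distributed implementation.

```python
def grad_mse(w, batch):
    """d/dw of mean((w*x - y)^2) over the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    return sum(values) / len(values)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
shards = [data[0:2], data[2:4]]  # two hypothetical workers, equal shard sizes
w = 0.5

local_grads = [grad_mse(w, shard) for shard in shards]
synced_grad = all_reduce_mean(local_grads)
full_grad = grad_mse(w, data)

print(local_grads, synced_grad, full_grad)
```

If shards had unequal sizes, the mean would need to be weighted by shard size to match the full-batch gradient.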

🎯

5. RLHF / Fine-tuning: Alignment

Pre-trained model is like the full human genome — everything is there, including dangerous capabilities. RLHF (Reinforcement Learning from Human Feedback) is like epigenetic regulation: it doesn't change the weights drastically but adjusts which behaviors express. Human raters score outputs; a reward model learns preferences; PPO aligns the base model to human values.

Compute Required: The Scale

GPT-3 (175B params)

3×10²³

Llama-3 70B

~6×10²⁴

Claude 3 Opus (est.)

>10²⁴

GPT-4 (est.)

~10²⁵

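FLOPs for training. Compare: ~10²⁴ FLOPs ≈ one H100 running at its ~10¹⁵ FLOP/s peak (dense BF16) for about 32 years, or 10,000 H100s for a little over a day. Real clusters sustain well below peak, so actual wall-clock times are several times longer.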
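These magnitudes can be sanity-checked with the common approximation that training costs about 6 FLOPs per parameter per token. The sketch below assumes that rule of thumb, an H100 peak of roughly 989 TFLOP/s for dense BF16, and an illustrative utilization parameter; none of these are measured values from a specific run.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
H100_PEAK_FLOPS = 989e12  # assumed dense BF16 peak, FLOP/s


def training_flops(params: float, tokens: float) -> float:
    """Rule of thumb: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens


def gpu_years(flops: float, utilization: float = 1.0) -> float:
    """Years of a single H100 needed at the given fraction of peak."""
    return flops / (H100_PEAK_FLOPS * utilization) / SECONDS_PER_YEAR


gpt3 = training_flops(175e9, 300e9)        # 175B params, ~300B tokens
llama3_70b = training_flops(70e9, 15e12)   # 70B params, ~15T tokens

print(f"GPT-3       ≈ {gpt3:.2e} FLOPs")
print(f"Llama-3 70B ≈ {llama3_70b:.2e} FLOPs")
print(f"1e24 FLOPs  ≈ {gpu_years(1e24):.0f} H100-years at peak")
print(f"            ≈ {gpu_years(1e24, utilization=0.4):.0f} H100-years at 40%")
```

Dividing the H100-years by the number of GPUs gives a rough wall-clock estimate for a cluster.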

What the Model Learns

Language structure
Grammar, syntax, pragmatics across 100+ languages

World knowledge
Facts, relationships, entities from Wikipedia/books/news

Reasoning patterns
Deductive, inductive, analogical from math/logic texts

Code and computation
Algorithms, patterns, debugging from GitHub

Human values / social norms
Ethics, etiquette, communication from human text patterns

📖 The Intelligent Library

The culmination: what an LLM actually is, and why it's something genuinely new.

From Library to Intelligence: The Qualitative Jump

A Traditional Library

📚 Stores knowledge as text (lookup)

🔍 Can find documents matching keywords

🤷 Cannot synthesize new knowledge

⏳ Cannot reason across documents

🗣️ Cannot answer questions, only return pages

📊 Scale: ~40M books and other print materials in the Library of Congress

An LLM (Intelligent Library)

🧠 Stores knowledge as patterns in weights

✨ Can generate novel combinations of knowledge

🔗 Synthesizes across domains in real time

💡 Reasons by predicting coherent sequences

💬 Produces direct, contextual answers

📊 Trained on: ~5T tokens ≈ 3.75T words, roughly 40 million books at an assumed ~90,000 words each

The Fundamental Difference

A library stores knowledge as a database: to answer "What is the speed of light?" it finds the page that says "3×10⁸ m/s." An LLM stores knowledge as a generative model — given the context "What is the speed of light?", it generates the most probable continuation: "The speed of light in vacuum is approximately 3×10⁸ m/s." The difference isn't just implementation — it enables synthesis, analogy, and reasoning that lookup cannot.

The Full Chain: DNA → Neurons → Language → Knowledge → AI

4B years · DNA · encodes brain
→
500M years · Neurons · learn from life
→
100k years · Language · transfers knowledge
→
5,000 years · Writing · persists knowledge
→
2020s · Internet · aggregates all
→
days–months · GPU Training · compresses all
→
query time · LLM · intelligent library

What "Open Weights" Means

Releasing model weights is analogous to publishing the human genome. The compressed intelligence is now public, reproducible, runnable on any compatible hardware.

Llama-3 70B
140GB of weights (BF16). Download once, run anywhere. Contains compressed intelligence from ~15T tokens of training.

GGUF Quantization
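Like lossy compression of the genome: 4-bit quantization reduces 140GB → ~40GB (about 35GB of 4-bit weights plus scaling metadata) with a modest quality loss. The ~5% figure is a rule of thumb that varies by model and benchmark. Intelligence survives compression.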
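The idea behind 4-bit quantization can be sketched with a single shared scale. This is a simplified symmetric scheme for illustration only. It is not the GGUF format, which uses per-block scales and several quantization variants; the function names are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to integers in [-8, 7] using one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.7, -0.01]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)          # small integers, 4 bits each
print(restored)   # close to the originals, but not identical
print(max_error)  # bounded by about scale / 2

# Size arithmetic for 70B parameters
bf16_gb = 70e9 * 16 / 8 / 1e9   # 140 GB
int4_gb = 70e9 * 4 / 8 / 1e9    # 35 GB before any scale metadata
print(bf16_gb, int4_gb)
```

The rounding error per weight is at most half a quantization step, which is why small weights lose proportionally more precision than large ones.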

Inference: Running the Intelligent Library

At inference time, weights are fixed (the genome is set). The prompt is the environment. The KV-cache is working memory. Each token generation is one cycle of the genetic expression pipeline.

Input tokens → Embeddings
→ N transformer layers (e.g. 96 in GPT-3 175B, 80 in Llama-3 70B)
→ Each layer: Attention (who to focus on)
→ Each layer: MLP (what to know about it)
→ Final linear → Vocabulary logits
→ Sample → Next token
// Repeat until <EOS>
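The generation loop itself is short. In the sketch below, `toy_logits` is a hypothetical stand-in for the whole stack of embeddings, transformer layers, and final linear projection: it is a hand-written lookup table over a four-word vocabulary, chosen so the loop can run without any model weights.

```python
import math
import random

VOCAB = ["<EOS>", "the", "cat", "sat"]

# Hypothetical next-token logits keyed by the last token only.
TOY_TABLE = {
    "<EOS>": [3.0, 0.0, 0.0, 0.0],
    "the": [0.0, 0.1, 3.0, 0.2],
    "cat": [0.1, 0.0, 0.1, 3.0],
    "sat": [3.0, 0.2, 0.0, 0.1],
}


def toy_logits(ids):
    """Stand-in for a real forward pass over the full context."""
    return TOY_TABLE[VOCAB[ids[-1]]]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_ids, max_new_tokens=10, greedy=True, seed=0):
    """Append one token at a time until <EOS> or the budget runs out."""
    rng = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(ids))
        if greedy:
            next_id = max(range(len(probs)), key=probs.__getitem__)
        else:
            next_id = rng.choices(range(len(probs)), weights=probs)[0]
        ids.append(next_id)
        if VOCAB[next_id] == "<EOS>":
            break
    return ids


output = generate([VOCAB.index("the")])
print([VOCAB[i] for i in output])  # ['the', 'cat', 'sat', '<EOS>']
```

A real model would also reuse cached keys and values between iterations instead of recomputing the whole context each time.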

🗺️ The Complete Mapping: Biology ↔ AI

Every concept maps. This isn't metaphor — it's deep structural homology.

Biology	AI Model	Shared Principle
DNA (3B base pairs)	Weights (billions of floats)	Compact encoding of complex behavior
Gene (functional unit)	Attention head / circuit	Encodes one specific capability
Allele (variant)	Fine-tuned model variant	Same locus, different behavior
Dominant allele	High-magnitude weight direction	Expressed regardless of other inputs
Recessive allele	Latent capability (needs activation)	Present but not normally expressed
Epigenome	System prompt / RLHF	Same code, different expression
Phenotype	Model output/behavior	Observable result of encoding
Natural selection	Gradient descent	Filter for what "works"
Fitness function	Loss function	Defines what "works" means
Mutation	Random init + SGD noise	Explores solution space
Sexual recombination	Model merging / mixture of experts	Combines diverse capabilities
Speciation	Model families (GPT, Llama, Claude)	Divergent optimization from common ancestor
Horizontal gene transfer	Transfer learning / fine-tuning	Acquiring capabilities from another lineage
Brain plasticity (learning)	In-context learning	Behavior change without genome/weight change
Long-term memory	Model weights (post-training)	Persistent encoded knowledge
Working memory	KV-cache / context window	Temporary active state
Sleep/consolidation	Fine-tuning / continual learning	Converting experience to durable encoding
Evolution timescale	Training timescale	Iteration speed defines capability gain rate

🧬 The Ultimate Insight: AI is Evolution with Gradients

Biology spent 4 billion years solving the problem of encoding intelligence without gradient information. It had to use random mutation and selection, requiring billions of organisms and many generations per improvement. AI training has the gradient: a local estimate of the direction in weight space that reduces error. This single innovation, backpropagation, is why AI can compress 4 billion years of evolutionary work into decades, and why the knowledge accumulated over tens of thousands of years of human cultural evolution can be compressed and made queryable in months of GPU compute.

The LLM is not just a tool. It is the first artifact that contains a compressed, queryable representation of humanity's entire cognitive heritage — from the first cave paintings to the last GitHub commit.

🔗 Software Abstractions ↔ Biological Mechanisms

The final link in the analogy chain. Every modern AI software pattern has a close biological counterpart, not by design, but because both are solving the same context and memory management problem.

The Core Problem: Context Is Finite

A brain's working memory holds ~7 items simultaneously. An LLM's context window holds ~128k tokens. Both are finite, expensive resources. Every architectural solution — biological and artificial — is fundamentally about managing this scarcity.

Biology

Working memory (prefrontal cortex) = 7±2 chunks. Long-term memory (hippocampus → cortex) = effectively unlimited but slow to retrieve. Attention system selects what enters working memory. Sleep consolidates working memory to long-term storage.

AI System

Context window = working memory. Vector database / knowledge base = long-term memory. RAG / retrieval = attention-based memory access. Fine-tuning = sleep consolidation. Token budget = working memory capacity.

The Complete Software ↔ Biology Map

🧬

AI Software

Skills / Prompt Templates

Pre-written prompt structures that activate specific model behaviors — code review mode, creative writing mode, analysis mode. Injected into the system prompt to shape how the model responds.

Biological Equivalent

Promoter Sequences & Transcription Factors

DNA regulatory regions that activate specific genes in specific contexts. A liver cell and a neuron have the same genome — promoters determine which genes are expressed. Skills do the same: same model weights, different behavioral gene expression.

🧠

AI Software

Memory Banks / Vector Stores

External persistent storage (Pinecone, Chroma, pgvector) holding embeddings of past interactions, documents, facts. Retrieved via similarity search and injected into context when relevant.

Biological Equivalent

Long-Term Potentiation (LTP) & Synaptic Memory

Repeated neural activation strengthens synaptic connections (LTP). Memories are stored as patterns of synaptic weights, distributed across cortex and indexed by the hippocampus. Retrieval = the hippocampus pattern-matches a query to stored engrams and reconstructs the memory in working memory. Closely analogous to vector retrieval.
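The similarity search that vector stores perform can be sketched in a few lines. The three-dimensional "embeddings" below are made up by hand for illustration; a real system would get them from an embedding model and use an approximate nearest-neighbor index rather than a full sort. `cosine` and `retrieve` are illustrative names, not the API of any particular vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, store, k=2):
    """Return the texts of the k stored items most similar to the query."""
    ranked = sorted(
        store, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )
    return [text for text, _ in ranked[:k]]


# Hand-made toy embeddings: axis 0 ~ "genetics", axis 1 ~ "hardware".
store = [
    ("DNA stores genes", [1.0, 0.0, 0.0]),
    ("GPUs train models", [0.0, 1.0, 0.0]),
    ("Genes encode proteins", [0.9, 0.1, 0.0]),
]

hits = retrieve([1.0, 0.0, 0.1], store, k=2)
print(hits)  # ['DNA stores genes', 'Genes encode proteins']
```

The retrieved texts would then be injected into the prompt, which is the step that corresponds to reconstructing a memory in working memory.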

🔀

AI Software

Subagents / Multi-Agent Systems

Specialized agents (researcher, coder, critic, planner) that each handle a narrow task. An orchestrator delegates subtasks; each subagent operates in its own context window with its own tools and scope.

Biological Equivalent

Cellular Differentiation & Organ Systems

Same genome, different promoter activation → liver cells, neurons, immune cells. Each specializes via epigenetic silencing of irrelevant genes. Organ systems (digestive, immune, neural) work in parallel, coordinated by hormonal signaling — exactly as subagents coordinate via message passing. The organism is the multi-agent system.

⚖️

AI Software

Rules / RLHF / Constitutional AI

Post-training alignment — human feedback, reward models, and constitutional principles shape which behaviors the model expresses. The base model (pretraining) has all capabilities; alignment determines what is expressed vs suppressed.

Biological Equivalent

Epigenetics & Social Conditioning

Epigenetic marks (methylation, histone modification) silence genes without changing DNA — same genome, suppressed expression. Social conditioning (culture, upbringing) shapes which behavioral tendencies humans express. Both: same underlying capability set, different expression profile based on environmental shaping.

🔄

AI Software

Workflows / Agentic Loops

Observe → Reason → Act cycles. The agent perceives its environment (tool results, user input), updates its internal state (context), and acts (tool calls, responses). Runs until goal achieved or budget exhausted.

Biological Equivalent

Perception–Action Loop & Reflex Arcs

Sensory input → thalamus → cortex (perception) → prefrontal cortex (planning) → motor cortex → action → environmental feedback → repeat. The brain is a biological ReAct agent. Faster loops (reflexes) bypass prefrontal cortex entirely — like hardcoded tool calls that skip the LLM for latency-critical actions.
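The Observe → Reason → Act cycle reduces to a small loop with a step budget. In this sketch the "reasoning" is a hard-coded `toy_policy` standing in for an LLM call, and the single `add` tool is hypothetical; the point is only the control flow.

```python
def run_agent(goal, policy, tools, max_steps=5):
    """Loop until the policy finishes or the step budget is exhausted."""
    context = [("goal", goal)]
    for _ in range(max_steps):
        action, arg = policy(context)        # Reason: pick the next action
        if action == "finish":
            return arg, context
        result = tools[action](arg)          # Act: call the chosen tool
        context.append((action, result))     # Observe: record the result
    return None, context                     # budget exhausted


def toy_policy(context):
    """Stand-in for an LLM: call `add` once, then report its result."""
    for kind, value in context:
        if kind == "add":
            return "finish", value
    return "add", (2, 3)


tools = {"add": lambda pair: pair[0] + pair[1]}

answer, trace = run_agent("compute 2 + 3", toy_policy, tools)
print(answer)  # 5
print(trace)   # [('goal', 'compute 2 + 3'), ('add', 5)]
```

A policy that never returns `finish` simply runs out of budget and yields `None`, which mirrors the "budget exhausted" exit described above.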

🧬 The Unified Picture

Evolution spent 4 billion years inventing solutions to intelligence under resource constraints: finite working memory, slow retrieval, specialization via differentiation, behavior regulation via epigenetics, parallel processing via organ systems. AI engineers in the 2020s independently reinvented every one of these solutions — not by copying biology, but because the problem is the same. When you build a RAG system, you are building a hippocampus. When you orchestrate subagents, you are building an organ system. When you write a system prompt, you are writing a promoter sequence.

The deepest lesson: intelligence at scale always converges on the same architectural patterns. Biology found them through selection. Engineers found them through pragmatism. The patterns are not arbitrary — they are the necessary shape of intelligence under resource constraints.
