NanoChat Architecture Documentation
Welcome to the comprehensive architectural documentation for nanochat - a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable codebase.
Project Overview
NanoChat is designed as the "best ChatGPT that $100 can buy" - a complete end-to-end pipeline that trains a conversational AI model from scratch. It covers tokenization, pretraining, finetuning, evaluation, inference, and web serving in approximately 8,000 lines of code across 45 files.
Source: README.md:1-20
Quick Navigation
Architecture
- System Overview - High-level system design and key components
- Training Pipeline - Complete training flow from tokenization to deployment
Core Components
- GPT Model - Transformer architecture with modern improvements
- Inference Engine - Efficient inference with KV caching and tool use
- Tokenizer - BPE tokenization in GPT-4 style
- Data Pipeline - Distributed data loading and preprocessing
Scripts & Tools
- Training Scripts - Base training, mid-training, and supervised fine-tuning
- Evaluation Scripts - Model evaluation and benchmarking
- Inference Scripts - CLI and web-based chat interfaces
Task Framework
- Evaluation Tasks - Multi-choice and generative evaluation framework
- Task Implementations - ARC, GSM8K, MMLU, and other benchmarks
Developer Guides
- Getting Started - Quick start guide for new developers
- Training Your Model - Step-by-step training walkthrough
- Customization Guide - Adding personality and abilities
Reference
- Glossary - Key terms and concepts
- API Reference - Function and class documentation
Key Features
Modern Transformer Architecture
- Rotary Positional Embeddings (RoPE) instead of learned positional embeddings
- QK Normalization for improved training stability
- ReLUΒ² activation in MLP layers
- Group Query Attention (GQA) for efficient inference
- No bias terms in linear layers for simplicity
Source: nanochat/gpt.py:1-15
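A minimal PyTorch sketch of how these pieces fit together in one block (illustrative only: the module names, head counts, and dimensions below are assumptions, not nanochat's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope_cache(T, head_dim, base=10000.0):
    # Precompute cos/sin tables for rotary positional embeddings.
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.outer(torch.arange(T).float(), inv_freq)  # (T, head_dim/2)
    return angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]

def apply_rope(x, cos, sin):
    # Rotate channel pairs by position-dependent angles instead of adding
    # learned positional embeddings.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class CausalSelfAttention(nn.Module):
    def __init__(self, dim=768, n_head=12, n_kv_head=4):
        super().__init__()
        self.n_head, self.n_kv_head = n_head, n_kv_head
        self.head_dim = dim // n_head
        # No bias terms in any linear layer.
        self.wq = nn.Linear(dim, n_head * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_head * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_head * self.head_dim, bias=False)
        self.wo = nn.Linear(n_head * self.head_dim, dim, bias=False)

    def forward(self, x, cos, sin):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_head, self.head_dim)
        k = self.wk(x).view(B, T, self.n_kv_head, self.head_dim)
        v = self.wv(x).view(B, T, self.n_kv_head, self.head_dim)
        # QK normalization: RMS-normalize queries and keys before attention
        # (F.rms_norm requires PyTorch >= 2.4).
        q, k = F.rms_norm(q, (self.head_dim,)), F.rms_norm(k, (self.head_dim,))
        q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (B, heads, T, hd)
        # GQA: each group of query heads shares one KV head.
        rep = self.n_head // self.n_kv_head
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(y.transpose(1, 2).reshape(B, T, -1))

class MLP(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.fc = nn.Linear(dim, 4 * dim, bias=False)
        self.proj = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        # ReLU^2 activation: ReLU followed by squaring.
        return self.proj(F.relu(self.fc(x)) ** 2)
```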
Efficient Training Pipeline
- Distributed training with PyTorch DDP across multiple GPUs
- Mixed optimizer approach: AdamW for embeddings, Muon for linear layers
- Gradient accumulation for large effective batch sizes
- Automatic device detection (CUDA, CPU, MPS)
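The mixed-optimizer setup hinges on partitioning parameters by shape and role. A hedged sketch of that split (the selection heuristic and the Muon import path below are assumptions for illustration; Muon is not part of torch.optim, and nanochat bundles its own implementation):

```python
import torch
import torch.nn as nn

def split_param_groups(model: nn.Module):
    # Heuristic partition: 2-D weight matrices inside the transformer blocks
    # go to Muon; embeddings and any 1-D parameters go to AdamW. nanochat's
    # actual selection rule may differ.
    matrix_params, other_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "embed" not in name:
            matrix_params.append(p)
        else:
            other_params.append(p)
    return matrix_params, other_params

# Usage sketch (the Muon import path is hypothetical):
# from nanochat.muon import Muon
# matrix_params, other_params = split_param_groups(model)
# adamw = torch.optim.AdamW(other_params, lr=3e-4, betas=(0.9, 0.95))
# muon = Muon(matrix_params, lr=0.02)
```

Gradient accumulation composes naturally with this: run several micro-batches, scale each loss by 1/accum_steps before backward(), and step both optimizers once per accumulation cycle.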
Production-Ready Inference
- KV cache for efficient autoregressive generation
- Tool use capabilities with Python calculator integration
- Streaming responses for real-time chat experience
- Multi-GPU serving with worker pools
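The role of the KV cache is easiest to see in a stripped-down decode loop. A minimal sketch (`forward_step` and its cache signature are placeholders, not the engine's real API):

```python
import torch

@torch.no_grad()
def generate(model, prompt_tokens, max_new_tokens, temperature=1.0):
    # Prefill: process the whole prompt once and keep the per-layer key/value
    # tensors, instead of recomputing them at every step.
    logits, kv_cache = model.forward_step(prompt_tokens, kv_cache=None)
    for _ in range(max_new_tokens):
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        yield next_tok  # streaming: tokens are emitted as they are sampled
        # Decode: only the single new token runs through the model; its
        # keys/values are appended to the cache.
        logits, kv_cache = model.forward_step(next_tok, kv_cache=kv_cache)
```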
Comprehensive Evaluation
- CORE metric from DCLM paper for base model quality
- Multiple choice tasks: ARC, MMLU for knowledge assessment
- Generative tasks: GSM8K for mathematical reasoning
- Chat evaluation for conversational ability
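For the multiple-choice benchmarks, one standard scoring recipe (a reasonable mental model here, though nanochat's task framework may format prompts differently) is to pick the option whose continuation the model assigns the lowest average loss:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_multiple_choice(model, tokenizer, question, choices):
    losses = []
    q_ids = tokenizer.encode(question)
    for choice in choices:
        c_ids = tokenizer.encode(" " + choice)
        ids = torch.tensor([q_ids + c_ids])
        logits = model(ids)  # (1, T, vocab_size)
        # Logits at position t predict token t+1, so the answer tokens are
        # predicted by the slice starting at len(q_ids) - 1.
        ans_logits = logits[0, len(q_ids) - 1 : -1]
        losses.append(F.cross_entropy(ans_logits, torch.tensor(c_ids)).item())
    return min(range(len(choices)), key=losses.__getitem__)
```

Generative tasks like GSM8K instead sample a full completion and check the extracted final answer against the reference.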
Training Stages
The pipeline runs in stages: tokenizer training, base pretraining, mid-training, and supervised fine-tuning. Each stage builds upon the previous, gradually transforming the model from a basic language model into a capable chat assistant.
Repository Structure
nanochat/
├── nanochat/            # Core library modules
│   ├── gpt.py           # Transformer model implementation
│   ├── engine.py        # Inference engine with KV cache
│   ├── tokenizer.py     # BPE tokenizer
│   ├── dataloader.py    # Distributed data loading
│   └── ...
├── scripts/             # Training and inference scripts
│   ├── base_train.py    # Base model pretraining
│   ├── chat_sft.py      # Supervised fine-tuning
│   ├── chat_web.py      # Web interface server
│   └── ...
├── tasks/               # Evaluation task framework
│   ├── common.py        # Base Task class
│   ├── gsm8k.py         # Math reasoning evaluation
│   ├── arc.py           # Science QA evaluation
│   └── ...
└── speedrun.sh          # Complete $100 training pipeline
Source: README.md:310-350
Hardware Requirements
- Recommended: 8x H100 GPUs for full training pipeline (~$100 for 4-hour training)
- Minimum: Single GPU with 16GB+ VRAM (will take longer, requires parameter tuning)
- CPU/MPS: Supported but not practical for full training
Getting Started
- Quick Demo: Run bash speedrun.sh for the complete $100 training pipeline
- Chat Interface: After training, run python -m scripts.chat_web for the web UI
- Development: See the Getting Started Guide
External Resources
- Live Demo: nanochat.karpathy.ai (d34 model)
- GitHub Repository: github.com/karpathy/nanochat
- Course Context: Part of LLM101n course by Eureka Labs
Generated: 2025-01-10T15:30:00Z
Sources: README.md, pyproject.toml, nanochat/__init__.py