NanoChat Architecture Documentation

Welcome to the comprehensive architectural documentation for nanochat - a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable codebase.

Project Overview

nanochat is designed as the "best ChatGPT that $100 can buy" - a complete end-to-end pipeline that trains a conversational AI model from scratch. It covers tokenization, pretraining, finetuning, evaluation, inference, and web serving in approximately 8,000 lines of code across 45 files.

Source: README.md:1-20

Quick Navigation

πŸ—οΈ Architecture

🧩 Core Components

πŸ“œ Scripts & Tools

🎯 Task Framework

πŸ“š Developer Guides

πŸ“– Reference

Key Features

Modern Transformer Architecture

  • Rotary Positional Embeddings (RoPE) instead of learned positional embeddings
  • QK Normalization for improved training stability (both sketched below)
  • ReLU² activation in MLP layers
  • Grouped-Query Attention (GQA) for efficient inference
  • No bias terms in linear layers for simplicity

Source: nanochat/gpt.py:1-15
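
A minimal sketch of how the first two bullets typically combine inside attention. The helper names are illustrative rather than nanochat's exact gpt.py API, and F.rms_norm requires PyTorch 2.4+:

```python
import torch
import torch.nn.functional as F

def rms_norm(x: torch.Tensor) -> torch.Tensor:
    # Parameter-free RMS norm over the channel dimension, as used for QK normalization
    return F.rms_norm(x, (x.size(-1),))

def apply_rotary_emb(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # RoPE: rotate channel pairs by position-dependent angles instead of
    # adding learned positional embeddings to the input
    d = x.size(-1) // 2
    x1, x2 = x[..., :d], x[..., d:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def prepare_qk(q, k, cos, sin):
    # Normalize queries and keys for stability, then apply the rotation
    return apply_rotary_emb(rms_norm(q), cos, sin), apply_rotary_emb(rms_norm(k), cos, sin)
```

With GQA, the key/value projections simply produce fewer heads than the queries and are broadcast across query groups during attention.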

Efficient Training Pipeline

  • Distributed training with PyTorch DDP across multiple GPUs
  • Mixed optimizer approach: AdamW for embeddings, Muon for linear layers (split sketched below)
  • Gradient accumulation for large effective batch sizes
  • Automatic device detection (CUDA, CPU, MPS)
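
A hedged sketch of the mixed-optimizer split named above; the shape-and-name rule and all hyperparameters are assumptions, and plain SGD stands in for Muon, which is not part of stock PyTorch:

```python
import torch
from torch import nn

def build_optimizers(model: nn.Module):
    # Assumed split rule: hidden 2-D weight matrices go to the Muon-style
    # optimizer; embeddings and non-matrix parameters go to AdamW.
    muon_params, adamw_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "embed" not in name:
            muon_params.append(p)
        else:
            adamw_params.append(p)
    adamw = torch.optim.AdamW(adamw_params, lr=3e-4, betas=(0.9, 0.95))
    # SGD with momentum marks where Muon would plug in; the real Muon update
    # orthogonalizes the momentum buffer before applying it to the weights.
    muon_like = torch.optim.SGD(muon_params, lr=0.02, momentum=0.95)
    return [adamw, muon_like]
```

Gradient accumulation then amounts to dividing each micro-batch loss by the number of accumulation steps and calling step() on both optimizers once per effective batch.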

Production-Ready Inference

  • KV cache for efficient autoregressive generation (sketched below)
  • Tool use capabilities with Python calculator integration
  • Streaming responses for real-time chat experience
  • Multi-GPU serving with worker pools
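
A self-contained sketch of the KV-cache idea behind the first bullet; nanochat's engine.py has its own cache layout, so the shapes and names here are illustrative only:

```python
import torch

class KVCache:
    # Minimal per-layer key/value cache for autoregressive decoding.
    def __init__(self, batch: int, n_kv_heads: int, max_seq: int, head_dim: int, device="cpu"):
        self.k = torch.zeros(batch, n_kv_heads, max_seq, head_dim, device=device)
        self.v = torch.zeros(batch, n_kv_heads, max_seq, head_dim, device=device)
        self.pos = 0  # number of tokens cached so far

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor):
        # Write this step's keys/values (one token at a time during decoding),
        # then return views over the full cached prefix to attend against.
        t = k_new.size(2)
        self.k[:, :, self.pos:self.pos + t] = k_new
        self.v[:, :, self.pos:self.pos + t] = v_new
        self.pos += t
        return self.k[:, :, :self.pos], self.v[:, :, :self.pos]
```

With the cache in place, each new token costs a forward pass over a single position instead of re-encoding the entire prefix.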

Comprehensive Evaluation

  • CORE metric from the DCLM paper for base model quality
  • Multiple-choice tasks: ARC, MMLU for knowledge assessment (scoring recipe sketched below)
  • Generative tasks: GSM8K for mathematical reasoning
  • Chat evaluation for conversational ability
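
For the multiple-choice tasks, a standard recipe (sketched here against an assumed model/tokenizer interface, not the tasks/ framework API) is to score each candidate answer by the log-likelihood of its tokens given the question, then pick the argmax:

```python
import torch

@torch.no_grad()
def score_choice(model, tokenizer, prompt: str, choice: str) -> float:
    # Assumed interfaces: tokenizer.encode(str) -> list[int], and
    # model(ids) -> logits of shape (1, T, vocab_size).
    ids = tokenizer.encode(prompt + choice)
    n_prompt = len(tokenizer.encode(prompt))
    x = torch.tensor(ids[:-1])[None]  # inputs
    y = torch.tensor(ids[1:])[None]   # next-token targets (shifted by one)
    logp = model(x).log_softmax(dim=-1)
    tok_logp = logp.gather(-1, y[..., None]).squeeze(-1)  # (1, T)
    # Sum log-probs over the answer tokens only
    return tok_logp[0, n_prompt - 1:].sum().item()

# best = max(choices, key=lambda c: score_choice(model, tok, question, c))
```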

Training Stages

```mermaid
flowchart LR
    Tokenizer[Tokenizer Training] --> Base[Base Pretraining]
    Base --> Mid[Mid-training]
    Mid --> SFT[Supervised Fine-tuning]
    SFT --> RL[Reinforcement Learning]
    RL --> Deploy[Web Deployment]
    Base -.-> BaseEval[Base Evaluation]
    Mid -.-> ChatEval[Chat Evaluation]
    SFT -.-> SFTEval[SFT Evaluation]
    RL -.-> RLEval[RL Evaluation]
```

Each stage builds upon the previous, gradually transforming the model from a basic language model to a capable chat assistant.

Repository Structure

```text
nanochat/
├── nanochat/             # Core library modules
│   ├── gpt.py            # Transformer model implementation
│   ├── engine.py         # Inference engine with KV cache
│   ├── tokenizer.py      # BPE tokenizer
│   ├── dataloader.py     # Distributed data loading
│   └── ...
├── scripts/              # Training and inference scripts
│   ├── base_train.py     # Base model pretraining
│   ├── chat_sft.py       # Supervised fine-tuning
│   ├── chat_web.py       # Web interface server
│   └── ...
├── tasks/                # Evaluation task framework
│   ├── common.py         # Base Task class
│   ├── gsm8k.py          # Math reasoning evaluation
│   ├── arc.py            # Science QA evaluation
│   └── ...
└── speedrun.sh           # Complete $100 training pipeline
```

Source: README.md:310-350

Hardware Requirements

  • Recommended: 8x H100 GPUs for the full training pipeline (~$100 for a ~4-hour run)
  • Minimum: A single GPU with 16GB+ VRAM (training takes longer and requires parameter tuning)
  • CPU/MPS: Supported but not practical for full training (device detection sketched below)
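
Device support like this usually reduces to a preference-ordered probe at startup; a minimal sketch, not nanochat's exact helper:

```python
import torch

def autodetect_device() -> str:
    # Prefer CUDA, then Apple Silicon's MPS backend, then fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```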

Getting Started

  1. Quick Demo: Run bash speedrun.sh for the complete $100 training pipeline
  2. Chat Interface: After training, run python -m scripts.chat_web for the web UI
  3. Development: See Getting Started Guide

Generated: 2025-01-10T15:30:00Z
Sources: README.md, pyproject.toml, nanochat/__init__.py
