NanoChat Architecture Documentation
Welcome to the comprehensive architectural documentation for nanochat - a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable codebase.
Project Overview
NanoChat is designed as the "best ChatGPT that $100 can buy" - a complete end-to-end pipeline that trains a conversational AI model from scratch. It covers tokenization, pretraining, finetuning, evaluation, inference, and web serving in approximately 8,000 lines of code across 45 files.
Source: README.md:1-20
Quick Navigation
Architecture
- System Overview - High-level system design and key components
- Training Pipeline - Complete training flow from tokenization to deployment
Core Components
- GPT Model - Transformer architecture with modern improvements
- Inference Engine - Efficient inference with KV caching and tool use
- Tokenizer - BPE tokenization in GPT-4 style
- Data Pipeline - Distributed data loading and preprocessing
Scripts & Tools
- Training Scripts - Base training, mid-training, and supervised fine-tuning
- Evaluation Scripts - Model evaluation and benchmarking
- Inference Scripts - CLI and web-based chat interfaces
Task Framework
- Evaluation Tasks - Multi-choice and generative evaluation framework
- Task Implementations - ARC, GSM8K, MMLU, and other benchmarks
Developer Guides
- Getting Started - Quick start guide for new developers
- Training Your Model - Step-by-step training walkthrough
- Customization Guide - Adding personality and abilities
Reference
- Glossary - Key terms and concepts
- API Reference - Function and class documentation
Key Features
Modern Transformer Architecture
- Rotary Positional Embeddings (RoPE) instead of learned positional embeddings
- QK Normalization for improved training stability
- ReLUΒ² activation in MLP layers
- Group Query Attention (GQA) for efficient inference
- No bias terms in linear layers for simplicity
Source: nanochat/gpt.py:1-15
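A minimal PyTorch sketch of how these pieces fit together in one block (illustrative only: the module names, head counts, and dimensions below are assumptions, not nanochat's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope_cache(T, head_dim, base=10000.0):
    # Precompute cos/sin tables for rotary positional embeddings.
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.outer(torch.arange(T).float(), inv_freq)  # (T, head_dim/2)
    return angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]

def apply_rope(x, cos, sin):
    # Rotate channel pairs by position-dependent angles instead of adding
    # learned positional embeddings.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class CausalSelfAttention(nn.Module):
    def __init__(self, dim=768, n_head=12, n_kv_head=4):
        super().__init__()
        self.n_head, self.n_kv_head = n_head, n_kv_head
        self.head_dim = dim // n_head
        # No bias terms in any linear layer.
        self.wq = nn.Linear(dim, n_head * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_head * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_head * self.head_dim, bias=False)
        self.wo = nn.Linear(n_head * self.head_dim, dim, bias=False)

    def forward(self, x, cos, sin):
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.n_head, self.head_dim)
        k = self.wk(x).view(B, T, self.n_kv_head, self.head_dim)
        v = self.wv(x).view(B, T, self.n_kv_head, self.head_dim)
        # QK normalization: RMS-normalize queries and keys before attention
        # (F.rms_norm requires PyTorch >= 2.4).
        q, k = F.rms_norm(q, (self.head_dim,)), F.rms_norm(k, (self.head_dim,))
        q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (B, heads, T, hd)
        # GQA: each group of query heads shares one KV head.
        rep = self.n_head // self.n_kv_head
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(y.transpose(1, 2).reshape(B, T, -1))

class MLP(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.fc = nn.Linear(dim, 4 * dim, bias=False)
        self.proj = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        # ReLU^2 activation: ReLU followed by squaring.
        return self.proj(F.relu(self.fc(x)) ** 2)
```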
Efficient Training Pipeline
- Distributed training with PyTorch DDP across multiple GPUs
- Mixed optimizer approach: AdamW for embeddings, Muon for linear layers
- Gradient accumulation for large effective batch sizes
- Automatic device detection (CUDA, CPU, MPS)
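The mixed-optimizer setup hinges on partitioning parameters by shape and role. A hedged sketch of that split (the selection heuristic and the Muon import path below are assumptions for illustration; Muon is not part of torch.optim, and nanochat bundles its own implementation):

```python
import torch
import torch.nn as nn

def split_param_groups(model: nn.Module):
    # Heuristic partition: 2-D weight matrices inside the transformer blocks
    # go to Muon; embeddings and any 1-D parameters go to AdamW. nanochat's
    # actual selection rule may differ.
    matrix_params, other_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "embed" not in name:
            matrix_params.append(p)
        else:
            other_params.append(p)
    return matrix_params, other_params

# Usage sketch (the Muon import path is hypothetical):
# from nanochat.muon import Muon
# matrix_params, other_params = split_param_groups(model)
# adamw = torch.optim.AdamW(other_params, lr=3e-4, betas=(0.9, 0.95))
# muon = Muon(matrix_params, lr=0.02)
```

Gradient accumulation composes naturally with this: run several micro-batches, scale each loss by 1/accum_steps before backward(), and step both optimizers once per accumulation cycle.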
Production-Ready Inference
- KV cache for efficient autoregressive generation
- Tool use capabilities with Python calculator integration
- Streaming responses for real-time chat experience
- Multi-GPU serving with worker pools
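The role of the KV cache is easiest to see in a stripped-down decode loop. A minimal sketch (`forward_step` and its cache signature are placeholders, not the engine's real API):

```python
import torch

@torch.no_grad()
def generate(model, prompt_tokens, max_new_tokens, temperature=1.0):
    # Prefill: process the whole prompt once and keep the per-layer key/value
    # tensors, instead of recomputing them at every step.
    logits, kv_cache = model.forward_step(prompt_tokens, kv_cache=None)
    for _ in range(max_new_tokens):
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        yield next_tok  # streaming: tokens are emitted as they are sampled
        # Decode: only the single new token runs through the model; its
        # keys/values are appended to the cache.
        logits, kv_cache = model.forward_step(next_tok, kv_cache=kv_cache)
```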
Comprehensive Evaluation
- CORE metric from DCLM paper for base model quality
- Multiple choice tasks: ARC, MMLU for knowledge assessment
- Generative tasks: GSM8K for mathematical reasoning
- Chat evaluation for conversational ability
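For the multiple-choice benchmarks, one standard scoring recipe (a reasonable mental model here, though nanochat's task framework may format prompts differently) is to pick the option whose continuation the model assigns the lowest average loss:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_multiple_choice(model, tokenizer, question, choices):
    losses = []
    q_ids = tokenizer.encode(question)
    for choice in choices:
        c_ids = tokenizer.encode(" " + choice)
        ids = torch.tensor([q_ids + c_ids])
        logits = model(ids)  # (1, T, vocab_size)
        # Logits at position t predict token t+1, so the answer tokens are
        # predicted by the slice starting at len(q_ids) - 1.
        ans_logits = logits[0, len(q_ids) - 1 : -1]
        losses.append(F.cross_entropy(ans_logits, torch.tensor(c_ids)).item())
    return min(range(len(choices)), key=losses.__getitem__)
```

Generative tasks like GSM8K instead sample a full completion and check the extracted final answer against the reference.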
Training Stages
The pipeline runs in stages: tokenizer training, base pretraining, mid-training, and supervised fine-tuning. Each stage builds upon the previous, gradually transforming the model from a basic language model into a capable chat assistant.
Repository Structure
nanochat/
├── nanochat/            # Core library modules
│   ├── gpt.py           # Transformer model implementation
│   ├── engine.py        # Inference engine with KV cache
│   ├── tokenizer.py     # BPE tokenizer
│   ├── dataloader.py    # Distributed data loading
│   └── ...
├── scripts/             # Training and inference scripts
│   ├── base_train.py    # Base model pretraining
│   ├── chat_sft.py      # Supervised fine-tuning
│   ├── chat_web.py      # Web interface server
│   └── ...
├── tasks/               # Evaluation task framework
│   ├── common.py        # Base Task class
│   ├── gsm8k.py         # Math reasoning evaluation
│   ├── arc.py           # Science QA evaluation
│   └── ...
└── speedrun.sh          # Complete $100 training pipeline
Source: README.md:310-350
Hardware Requirements
- Recommended: 8x H100 GPUs for full training pipeline (~$100 for 4-hour training)
- Minimum: Single GPU with 16GB+ VRAM (will take longer, requires parameter tuning)
- CPU/MPS: Supported but not practical for full training
Getting Started
- Quick Demo: Run bash speedrun.sh for the complete $100 training pipeline
- Chat Interface: After training, run python -m scripts.chat_web for the web UI
- Development: See the Getting Started Guide
External Resources
- Live Demo: nanochat.karpathy.ai (d34 model)
- GitHub Repository: github.com/karpathy/nanochat
- Course Context: Part of LLM101n course by Eureka Labs
Generated: 2025-01-10T15:30:00Z
Sources: README.md, pyproject.toml, nanochat/__init__.py