Customization Guide

Comprehensive guide to customizing nanochat for your specific use cases and requirements.

Overview

Nanochat is designed to be highly modular and customizable. You can modify:

  • Model Architecture - Layer counts, dimensions, attention mechanisms
  • Tokenizer Configuration - Vocabulary size, special tokens, encoding
  • Training Data - Custom datasets, conversation formats
  • Task Definitions - New evaluation tasks and benchmarks
  • Inference Behavior - Generation parameters, conversation handling

This guide covers the most common customization scenarios.

Model Architecture

Modifying Model Size

Create custom model configurations by editing the model size definitions:

python
# In nanochat/gpt.py - add custom model size
MODEL_CONFIGS = {
    "tiny": {"n_layer": 12, "n_head": 12, "n_embd": 768},
    "small": {"n_layer": 12, "n_head": 12, "n_embd": 768}, 
    "medium": {"n_layer": 24, "n_head": 16, "n_embd": 1024},
    "large": {"n_layer": 36, "n_head": 20, "n_embd": 1280},
    # Add your custom size
    "custom": {"n_layer": 18, "n_head": 14, "n_embd": 896},
}
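
Once registered, a size can be turned into a model. A minimal sketch, assuming the names above live in nanochat/gpt.py as shown in this guide:

python
from nanochat.gpt import GPT, GPTConfig, MODEL_CONFIGS

# Build a config from a registered size and instantiate the model
config = GPTConfig(**MODEL_CONFIGS["custom"])
model = GPT(config)
print(sum(p.numel() for p in model.parameters()), "parameters")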

Advanced Architecture Changes

For deeper architectural modifications:

python
# Custom attention mechanism
import torch.nn as nn

class CustomGPT(GPT):
    def __init__(self, config):
        super().__init__(config)
        # Replace the standard transformer blocks with a custom implementation
        self.transformer.h = nn.ModuleList([
            CustomBlock(config) for _ in range(config.n_layer)
        ])

Common Modifications:

  • Attention Variants: Multi-query, grouped-query, sliding window
  • Position Encodings: ALiBi, learned positions, relative positions
  • Activation Functions: SwiGLU, GeGLU, GELU variants
  • Normalization: RMSNorm, LayerNorm variants, pre/post-norm (see the RMSNorm sketch below)
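
As a concrete example of one item from this list, here is a minimal RMSNorm sketch; the module name and its wiring into a block are illustrative:

python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Root-mean-square normalization: no mean-centering, no bias
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the reciprocal RMS over the last dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)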

Model Configuration

Key configuration parameters in nanochat/gpt.py:

python
@dataclass
class GPTConfig:
    block_size: int = 1024      # Context length
    vocab_size: int = 32768     # Vocabulary size  
    n_layer: int = 12           # Number of layers
    n_head: int = 12            # Attention heads
    n_embd: int = 768           # Hidden dimension
    dropout: float = 0.0        # Dropout rate
    bias: bool = True           # Use bias in linear layers
    rope_base: float = 10000.0  # RoPE frequency base
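
To sanity-check a configuration before training, you can estimate its parameter count. As a rough rule of thumb for this kind of GPT, each layer contributes about 12 * n_embd^2 parameters (attention plus MLP) and the token embeddings add vocab_size * n_embd:

python
def approx_params(cfg: GPTConfig) -> int:
    # ~12 * n_embd^2 per layer, plus the token embedding table
    return 12 * cfg.n_layer * cfg.n_embd ** 2 + cfg.vocab_size * cfg.n_embd

# e.g. the defaults: 12 * 12 * 768^2 + 32768 * 768 ≈ 110M parameters
print(f"{approx_params(GPTConfig()) / 1e6:.0f}M parameters")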

Tokenizer Customization

Custom Vocabulary

Create domain-specific tokenizers:

bash
# Custom tokenizer training
python -m scripts.tok_train \
    --vocab-size 50000 \
    --custom-tokens "special1,special2,special3" \
    --domain-data /path/to/domain/data

Special Token Configuration

Add custom special tokens:

python
# In nanochat/tokenizer.py
SPECIAL_TOKENS = {
    "<|bos|>": 0,
    "<|eos|>": 1,
    "<|user_start|>": 2,
    "<|user_end|>": 3,
    "<|assistant_start|>": 4,
    "<|assistant_end|>": 5,
    # Add custom tokens
    "<|system|>": 6,
    "<|function_call|>": 7,
    "<|code|>": 8,
}

Conversation Format

Customize conversation rendering:

python
class CustomTokenizer(HuggingFaceTokenizer):
    def render_conversation(self, messages):
        # Custom conversation format
        tokens = [self.get_bos_token_id()]
        
        for message in messages:
            if message["role"] == "system":
                tokens.append(self.encode_special("<|system|>"))
                tokens.extend(self.encode(message["content"]))
            # ... handle "user" and "assistant" roles analogously

        return tokens

Custom Training Data

Data Format

Nanochat expects training data in a conversation format:

json
{
  "messages": [
    {"role": "user", "content": "Question text"},
    {"role": "assistant", "content": "Response text"}
  ]
}
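
Before training, it is worth validating records against this schema. A minimal sketch, assuming the data is stored as JSON Lines (one object per line):

python
import json

def load_conversations(path):
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            # Every record must carry a non-empty list of role/content messages
            assert record["messages"], "empty conversation"
            for msg in record["messages"]:
                assert msg["role"] in ("system", "user", "assistant")
                assert isinstance(msg["content"], str)
            yield record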

Adding Data Sources

Create custom data loaders:

python
# Custom dataset in nanochat/dataset.py
class CustomDataset:
    def __init__(self, data_path):
        self.data = self.load_data(data_path)
    
    def __iter__(self):
        for item in self.data:
            yield self.format_conversation(item)
    
    def format_conversation(self, item):
        return {
            "messages": [
                {"role": "user", "content": item["question"]},
                {"role": "assistant", "content": item["answer"]}
            ]
        }

Data Mixing

Combine multiple data sources:

python
# In training scripts
data_sources = [
    ("custom_data", 0.3),      # 30% custom data
    ("fineweb", 0.5),          # 50% general web data  
    ("code_data", 0.2),        # 20% code data
]
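
How the weights are applied depends on your training loop. A minimal sketch that draws each conversation from a source in proportion to its weight, assuming each source yields formatted conversations:

python
import random

def mix_sources(sources, seed=42):
    # sources: list of (iterable, weight) pairs built from a spec like data_sources
    rng = random.Random(seed)
    iterators = [iter(src) for src, _ in sources]
    weights = [w for _, w in sources]
    while True:
        # Pick a source in proportion to its weight, then draw one example
        # (sources are assumed long enough; handle StopIteration as needed)
        i = rng.choices(range(len(iterators)), weights=weights)[0]
        yield next(iterators[i])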

Custom Tasks

Creating New Tasks

Implement custom evaluation tasks:

python
# tasks/custom_task.py
from tasks.common import Task

class CustomTask(Task):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.data = self.load_data()
    
    @property
    def eval_type(self):
        return 'generative'  # or 'categorical'
    
    def get_example(self, index):
        item = self.data[index]
        return {
            "messages": [
                {"role": "user", "content": item["prompt"]},
                {"role": "assistant", "content": item["solution"]}
            ]
        }
    
    def evaluate(self, conversation, completion):
        # Custom evaluation logic
        return self.check_correctness(completion)

Task Mixtures

Create custom task combinations:

python
from tasks.common import TaskMixture
from tasks.gsm8k import GSM8K  
from tasks.custom_task import CustomTask

# Custom task mixture
custom_mixture = TaskMixture([
    GSM8K(subset="main", split="train"),
    CustomTask(),
    CustomTask(),  # Oversample by including twice
])

Inference Customization

Custom Generation

Modify generation behavior:

python
# Custom engine with modified generation
class CustomEngine(Engine):
    def __init__(self, *args, use_custom_sampling=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_custom_sampling = use_custom_sampling

    def generate(self, tokens, **kwargs):
        # Custom generation logic
        temperature = kwargs.get('temperature', 0.7)
        top_k = kwargs.get('top_k', 50)

        # Route to a custom sampling strategy when enabled
        if self.use_custom_sampling:
            return self.custom_sample(tokens, temperature, top_k)
        else:
            return super().generate(tokens, **kwargs)

Response Processing

Add custom post-processing:

python
class CustomChatInterface:
    def __init__(self, engine):
        self.engine = engine
        self.response_filters = [
            self.safety_filter,
            self.format_filter,
            self.length_filter,
        ]
    
    def process_response(self, response):
        for filter_fn in self.response_filters:
            response = filter_fn(response)
        return response
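
Each filter maps a response string to a (possibly modified) response string. A hypothetical length_filter, written standalone for clarity:

python
def length_filter(response: str, max_words: int = 512) -> str:
    # Truncate overly long responses at a word boundary
    words = response.split()
    if len(words) <= max_words:
        return response
    return " ".join(words[:max_words]) + " ..."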

Tool Integration

Add custom tool support:

python
# Custom tool for domain-specific tasks
class CustomTool:
    def __init__(self):
        self.name = "custom_calculator"
    
    def execute(self, code):
        # Domain-specific computation
        result = self.custom_compute(code)
        return {"result": result, "success": True}

# Integrate into engine
engine = Engine(model, tokenizer)
engine.add_tool(CustomTool())

Training Customization

Custom Loss Functions

Implement domain-specific losses:

python
# Custom loss for specialized training
import torch.nn as nn

class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.ce_loss = nn.CrossEntropyLoss()
        self.auxiliary_loss = AuxiliaryLoss()  # placeholder for your auxiliary objective

    def forward(self, logits, targets):
        # Flatten (batch, seq, vocab) logits for cross-entropy
        main_loss = self.ce_loss(logits.view(-1, logits.size(-1)), targets.view(-1))
        aux_loss = self.auxiliary_loss(logits, targets)
        return main_loss + 0.1 * aux_loss  # small weight on the auxiliary term

Custom Optimizers

Add specialized optimization:

python
# Custom optimizer configuration
import torch

def get_custom_optimizer(model, config):
    # Separate learning rates for different components
    embedding_params = model.transformer.wte.parameters()
    attention_params = [p for block in model.transformer.h
                        for p in block.attn.parameters()]

    return torch.optim.AdamW([
        {"params": embedding_params, "lr": config.embed_lr},
        {"params": attention_params, "lr": config.attn_lr},
    ])

Training Schedules

Implement custom learning rate schedules:

python
import math

class CustomScheduler:
    def __init__(self, optimizer, total_steps):
        self.optimizer = optimizer
        self.total_steps = total_steps
        # Remember each group's initial LR so step() can rescale it
        for group in self.optimizer.param_groups:
            group.setdefault('base_lr', group['lr'])

    def step(self, current_step):
        # Linear warmup for the first 10% of steps, then cosine decay
        if current_step < 0.1 * self.total_steps:
            lr_mult = current_step / (0.1 * self.total_steps)
        else:
            lr_mult = 0.5 * (1 + math.cos(math.pi * current_step / self.total_steps))

        for param_group in self.optimizer.param_groups:
            param_group['lr'] = param_group['base_lr'] * lr_mult
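
Typical usage in a training loop, combined with the optimizer factory above (compute_loss and batches are placeholders for your own forward pass and data iterator):

python
optimizer = get_custom_optimizer(model, config)
scheduler = CustomScheduler(optimizer, total_steps=10_000)

for step in range(10_000):
    scheduler.step(step)  # update LRs before the optimizer step
    loss = compute_loss(model, next(batches))  # your forward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()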

Configuration Management

Environment Variables

Use environment variables for easy customization:

bash
# Set custom model path
export NANOCHAT_MODEL_PATH=/path/to/custom/models

# Custom data directory  
export NANOCHAT_DATA_DIR=/custom/data/path

# Training configuration
export NANOCHAT_BATCH_SIZE=64
export NANOCHAT_LEARNING_RATE=1e-4
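
Scripts can then read these variables at startup. A minimal sketch; the defaults shown are illustrative:

python
import os

model_path = os.environ.get("NANOCHAT_MODEL_PATH", "out/models")
data_dir = os.environ.get("NANOCHAT_DATA_DIR", "data")
batch_size = int(os.environ.get("NANOCHAT_BATCH_SIZE", "32"))
learning_rate = float(os.environ.get("NANOCHAT_LEARNING_RATE", "3e-4"))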

Configuration Files

Create custom configuration files:

yaml
# custom_config.yaml
model:
  size: custom
  context_length: 2048
  vocab_size: 40000

training:
  batch_size: 128
  learning_rate: 3e-4
  total_steps: 100000
  
data:
  sources:
    - name: custom_domain
      weight: 0.5
    - name: general
      weight: 0.5

Load configurations:

python
import yaml

def load_custom_config(config_path):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return config
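
The loaded dictionary can then be mapped onto the GPTConfig fields shown earlier; a sketch assuming the YAML layout above:

python
cfg = load_custom_config("custom_config.yaml")

model_config = GPTConfig(
    block_size=cfg["model"]["context_length"],
    vocab_size=cfg["model"]["vocab_size"],
)
batch_size = cfg["training"]["batch_size"]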

Deployment Customization

Custom Web Interface

Modify the web UI:

html
<!-- Custom UI template -->
<div class="custom-chat-interface">
    <div class="custom-header">
        <h1>My Custom Assistant</h1>
    </div>
    <div class="custom-chat-area">
        <!-- Custom chat interface -->
    </div>
</div>

API Extensions

Add custom API endpoints:

python
# Custom API endpoints (assumes the FastAPI app from the serving script)
from pydantic import BaseModel

class CustomRequest(BaseModel):
    data: str

@app.post("/custom/analyze")
async def custom_analysis(request: CustomRequest):
    result = await custom_processing(request.data)  # your async handler
    return {"analysis": result}

@app.get("/custom/stats")
async def custom_statistics():
    return get_custom_metrics()  # your metrics helper

Production Scaling

Configure for production deployment:

python
# Custom production configuration
class ProductionConfig:
    def __init__(self):
        self.max_workers = 16
        self.timeout = 30
        self.memory_limit = "32GB"
        self.gpu_memory_fraction = 0.9

Best Practices

Code Organization

  • Keep customizations in separate modules
  • Use configuration files for hyperparameters
  • Implement proper error handling
  • Add comprehensive logging

Testing Custom Components

  • Unit test custom implementations
  • Validate on small datasets first
  • Compare against baseline implementations
  • Monitor performance metrics

Documentation

  • Document custom configuration options
  • Provide usage examples
  • Maintain compatibility notes
  • Update relevant documentation

Version Control

  • Use git branches for experiments
  • Tag stable custom versions
  • Maintain configuration history
  • Document breaking changes

Sources:

  • nanochat/gpt.py (model architecture)
  • nanochat/tokenizer.py (tokenizer customization)
  • nanochat/engine.py (inference customization)
  • tasks/common.py (task framework)
Last updated: 1/10/2026