Customization Guide

Comprehensive guide to customizing nanochat for your specific use cases and requirements.

Overview

Nanochat is designed to be highly modular and customizable. You can modify:

  • Model Architecture - Layer counts, dimensions, attention mechanisms
  • Tokenizer Configuration - Vocabulary size, special tokens, encoding
  • Training Data - Custom datasets, conversation formats
  • Task Definitions - New evaluation tasks and benchmarks
  • Inference Behavior - Generation parameters, conversation handling

This guide covers the most common customization scenarios.

Model Architecture

Modifying Model Size

Create custom model configurations by editing the model size definitions:

python
# In nanochat/gpt.py - add custom model size
MODEL_CONFIGS = {
    "tiny": {"n_layer": 12, "n_head": 12, "n_embd": 768},
    "small": {"n_layer": 12, "n_head": 12, "n_embd": 768}, 
    "medium": {"n_layer": 24, "n_head": 16, "n_embd": 1024},
    "large": {"n_layer": 36, "n_head": 20, "n_embd": 1280},
    # Add your custom size
    "custom": {"n_layer": 18, "n_head": 14, "n_embd": 896},
}
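
Once registered, a size can be turned into a model. A minimal sketch, assuming the names above live in nanochat/gpt.py as shown in this guide:

python
from nanochat.gpt import GPT, GPTConfig, MODEL_CONFIGS

# Build a config from a registered size and instantiate the model
config = GPTConfig(**MODEL_CONFIGS["custom"])
model = GPT(config)
print(sum(p.numel() for p in model.parameters()), "parameters")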

Advanced Architecture Changes

For deeper architectural modifications:

python
# Custom attention mechanism
import torch.nn as nn

class CustomGPT(GPT):
    def __init__(self, config):
        super().__init__(config)
        # Replace the standard transformer blocks with a custom implementation
        self.transformer.h = nn.ModuleList([
            CustomBlock(config) for _ in range(config.n_layer)
        ])

Common Modifications:

  • Attention Variants: Multi-query, grouped-query, sliding window
  • Position Encodings: ALiBi, learned positions, relative positions
  • Activation Functions: SwiGLU, GeGLU, GELU variants
  • Normalization: RMSNorm, LayerNorm variants, pre/post-norm (see the RMSNorm sketch below)
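
As a concrete example of one item from this list, here is a minimal RMSNorm sketch; the module name and its wiring into a block are illustrative:

python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Root-mean-square normalization: no mean-centering, no bias
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the reciprocal RMS over the last dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)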

Model Configuration

Key configuration parameters in nanochat/gpt.py:

python
@dataclass
class GPTConfig:
    block_size: int = 1024      # Context length
    vocab_size: int = 32768     # Vocabulary size  
    n_layer: int = 12           # Number of layers
    n_head: int = 12            # Attention heads
    n_embd: int = 768           # Hidden dimension
    dropout: float = 0.0        # Dropout rate
    bias: bool = True           # Use bias in linear layers
    rope_base: float = 10000.0  # RoPE frequency base
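
To sanity-check a configuration before training, you can estimate its parameter count. As a rough rule of thumb for this kind of GPT, each layer contributes about 12 * n_embd^2 parameters (attention plus MLP) and the token embeddings add vocab_size * n_embd:

python
def approx_params(cfg: GPTConfig) -> int:
    # ~12 * n_embd^2 per layer, plus the token embedding table
    return 12 * cfg.n_layer * cfg.n_embd ** 2 + cfg.vocab_size * cfg.n_embd

# e.g. the defaults: 12 * 12 * 768^2 + 32768 * 768 ≈ 110M parameters
print(f"{approx_params(GPTConfig()) / 1e6:.0f}M parameters")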

Tokenizer Customization

Custom Vocabulary

Create domain-specific tokenizers:

bash
# Custom tokenizer training
python -m scripts.tok_train \
    --vocab-size 50000 \
    --custom-tokens "special1,special2,special3" \
    --domain-data /path/to/domain/data

Special Token Configuration

Add custom special tokens:

python
# In nanochat/tokenizer.py
SPECIAL_TOKENS = {
    "<|bos|>": 0,
    "<|eos|>": 1,
    "<|user_start|>": 2,
    "<|user_end|>": 3,
    "<|assistant_start|>": 4,
    "<|assistant_end|>": 5,
    # Add custom tokens
    "<|system|>": 6,
    "<|function_call|>": 7,
    "<|code|>": 8,
}

Conversation Format

Customize conversation rendering:

python
class CustomTokenizer(HuggingFaceTokenizer):
    def render_conversation(self, messages):
        # Custom conversation format
        tokens = [self.get_bos_token_id()]
        
        for message in messages:
            if message["role"] == "system":
                tokens.append(self.encode_special("<|system|>"))
                tokens.extend(self.encode(message["content"]))
            # ... handle "user" and "assistant" roles analogously

        return tokens

Custom Training Data

Data Format

Nanochat expects training data in a conversation format:

json
{
  "messages": [
    {"role": "user", "content": "Question text"},
    {"role": "assistant", "content": "Response text"}
  ]
}
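
Before training, it is worth validating records against this schema. A minimal sketch, assuming the data is stored as JSON Lines (one object per line):

python
import json

def load_conversations(path):
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            # Every record must carry a non-empty list of role/content messages
            assert record["messages"], "empty conversation"
            for msg in record["messages"]:
                assert msg["role"] in ("system", "user", "assistant")
                assert isinstance(msg["content"], str)
            yield record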

Adding Data Sources

Create custom data loaders:

python
# Custom dataset in nanochat/dataset.py
class CustomDataset:
    def __init__(self, data_path):
        self.data = self.load_data(data_path)
    
    def __iter__(self):
        for item in self.data:
            yield self.format_conversation(item)
    
    def format_conversation(self, item):
        return {
            "messages": [
                {"role": "user", "content": item["question"]},
                {"role": "assistant", "content": item["answer"]}
            ]
        }

Data Mixing

Combine multiple data sources:

python
# In training scripts
data_sources = [
    ("custom_data", 0.3),      # 30% custom data
    ("fineweb", 0.5),          # 50% general web data  
    ("code_data", 0.2),        # 20% code data
]
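
How the weights are applied depends on your training loop. A minimal sketch that draws each conversation from a source in proportion to its weight, assuming each source yields formatted conversations:

python
import random

def mix_sources(sources, seed=42):
    # sources: list of (iterable, weight) pairs built from a spec like data_sources
    rng = random.Random(seed)
    iterators = [iter(src) for src, _ in sources]
    weights = [w for _, w in sources]
    while True:
        # Pick a source in proportion to its weight, then draw one example
        # (sources are assumed long enough; handle StopIteration as needed)
        i = rng.choices(range(len(iterators)), weights=weights)[0]
        yield next(iterators[i])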

Custom Tasks

Creating New Tasks

Implement custom evaluation tasks:

python
# tasks/custom_task.py
from tasks.common import Task

class CustomTask(Task):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.data = self.load_data()
    
    @property
    def eval_type(self):
        return 'generative'  # or 'categorical'
    
    def get_example(self, index):
        item = self.data[index]
        return {
            "messages": [
                {"role": "user", "content": item["prompt"]},
                {"role": "assistant", "content": item["solution"]}
            ]
        }
    
    def evaluate(self, conversation, completion):
        # Custom evaluation logic
        return self.check_correctness(completion)

Task Mixtures

Create custom task combinations:

python
from tasks.common import TaskMixture
from tasks.gsm8k import GSM8K  
from tasks.custom_task import CustomTask

# Custom task mixture
custom_mixture = TaskMixture([
    GSM8K(subset="main", split="train"),
    CustomTask(),
    CustomTask(),  # Oversample by including twice
])

Inference Customization

Custom Generation

Modify generation behavior:

python
# Custom engine with modified generation
class CustomEngine(Engine):
    def __init__(self, *args, use_custom_sampling=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_custom_sampling = use_custom_sampling

    def generate(self, tokens, **kwargs):
        # Custom generation logic
        temperature = kwargs.get('temperature', 0.7)
        top_k = kwargs.get('top_k', 50)

        # Route to a custom sampling strategy when enabled
        if self.use_custom_sampling:
            return self.custom_sample(tokens, temperature, top_k)
        else:
            return super().generate(tokens, **kwargs)

Response Processing

Add custom post-processing:

python
class CustomChatInterface:
    def __init__(self, engine):
        self.engine = engine
        self.response_filters = [
            self.safety_filter,
            self.format_filter,
            self.length_filter,
        ]
    
    def process_response(self, response):
        for filter_fn in self.response_filters:
            response = filter_fn(response)
        return response
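
Each filter maps a response string to a (possibly modified) response string. A hypothetical length_filter, written standalone for clarity:

python
def length_filter(response: str, max_words: int = 512) -> str:
    # Truncate overly long responses at a word boundary
    words = response.split()
    if len(words) <= max_words:
        return response
    return " ".join(words[:max_words]) + " ..."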

Tool Integration

Add custom tool support:

python
# Custom tool for domain-specific tasks
class CustomTool:
    def __init__(self):
        self.name = "custom_calculator"
    
    def execute(self, code):
        # Domain-specific computation
        result = self.custom_compute(code)
        return {"result": result, "success": True}

# Integrate into engine
engine = Engine(model, tokenizer)
engine.add_tool(CustomTool())

Training Customization

Custom Loss Functions

Implement domain-specific losses:

python
# Custom loss for specialized training
import torch.nn as nn

class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.ce_loss = nn.CrossEntropyLoss()
        self.auxiliary_loss = AuxiliaryLoss()  # placeholder for your auxiliary objective

    def forward(self, logits, targets):
        # Flatten (batch, seq, vocab) logits for cross-entropy
        main_loss = self.ce_loss(logits.view(-1, logits.size(-1)), targets.view(-1))
        aux_loss = self.auxiliary_loss(logits, targets)
        return main_loss + 0.1 * aux_loss  # small weight on the auxiliary term

Custom Optimizers

Add specialized optimization:

python
# Custom optimizer configuration
import torch

def get_custom_optimizer(model, config):
    # Separate learning rates for different components
    embedding_params = model.transformer.wte.parameters()
    attention_params = [p for block in model.transformer.h
                        for p in block.attn.parameters()]

    return torch.optim.AdamW([
        {"params": embedding_params, "lr": config.embed_lr},
        {"params": attention_params, "lr": config.attn_lr},
    ])

Training Schedules

Implement custom learning rate schedules:

python
import math

class CustomScheduler:
    def __init__(self, optimizer, total_steps):
        self.optimizer = optimizer
        self.total_steps = total_steps
        # Remember each group's initial LR so step() can rescale it
        for group in self.optimizer.param_groups:
            group.setdefault('base_lr', group['lr'])

    def step(self, current_step):
        # Linear warmup for the first 10% of steps, then cosine decay
        if current_step < 0.1 * self.total_steps:
            lr_mult = current_step / (0.1 * self.total_steps)
        else:
            lr_mult = 0.5 * (1 + math.cos(math.pi * current_step / self.total_steps))

        for param_group in self.optimizer.param_groups:
            param_group['lr'] = param_group['base_lr'] * lr_mult
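
Typical usage in a training loop, combined with the optimizer factory above (compute_loss and batches are placeholders for your own forward pass and data iterator):

python
optimizer = get_custom_optimizer(model, config)
scheduler = CustomScheduler(optimizer, total_steps=10_000)

for step in range(10_000):
    scheduler.step(step)  # update LRs before the optimizer step
    loss = compute_loss(model, next(batches))  # your forward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()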

Configuration Management

Environment Variables

Use environment variables for easy customization:

bash
# Set custom model path
export NANOCHAT_MODEL_PATH=/path/to/custom/models

# Custom data directory  
export NANOCHAT_DATA_DIR=/custom/data/path

# Training configuration
export NANOCHAT_BATCH_SIZE=64
export NANOCHAT_LEARNING_RATE=1e-4
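
Scripts can then read these variables at startup. A minimal sketch; the defaults shown are illustrative:

python
import os

model_path = os.environ.get("NANOCHAT_MODEL_PATH", "out/models")
data_dir = os.environ.get("NANOCHAT_DATA_DIR", "data")
batch_size = int(os.environ.get("NANOCHAT_BATCH_SIZE", "32"))
learning_rate = float(os.environ.get("NANOCHAT_LEARNING_RATE", "3e-4"))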

Configuration Files

Create custom configuration files:

yaml
# custom_config.yaml
model:
  size: custom
  context_length: 2048
  vocab_size: 40000

training:
  batch_size: 128
  learning_rate: 3e-4
  total_steps: 100000
  
data:
  sources:
    - name: custom_domain
      weight: 0.5
    - name: general
      weight: 0.5

Load configurations:

python
import yaml

def load_custom_config(config_path):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return config
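
The loaded dictionary can then be mapped onto the GPTConfig fields shown earlier; a sketch assuming the YAML layout above:

python
cfg = load_custom_config("custom_config.yaml")

model_config = GPTConfig(
    block_size=cfg["model"]["context_length"],
    vocab_size=cfg["model"]["vocab_size"],
)
batch_size = cfg["training"]["batch_size"]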

Deployment Customization

Custom Web Interface

Modify the web UI:

html
<!-- Custom UI template -->
<div class="custom-chat-interface">
    <div class="custom-header">
        <h1>My Custom Assistant</h1>
    </div>
    <div class="custom-chat-area">
        <!-- Custom chat interface -->
    </div>
</div>

API Extensions

Add custom API endpoints:

python
# Custom API endpoints (assumes the FastAPI app from the serving script)
from pydantic import BaseModel

class CustomRequest(BaseModel):
    data: str

@app.post("/custom/analyze")
async def custom_analysis(request: CustomRequest):
    result = await custom_processing(request.data)  # your async handler
    return {"analysis": result}

@app.get("/custom/stats")
async def custom_statistics():
    return get_custom_metrics()  # your metrics helper

Production Scaling

Configure for production deployment:

python
# Custom production configuration
class ProductionConfig:
    def __init__(self):
        self.max_workers = 16
        self.timeout = 30
        self.memory_limit = "32GB"
        self.gpu_memory_fraction = 0.9

Best Practices

Code Organization

  • Keep customizations in separate modules
  • Use configuration files for hyperparameters
  • Implement proper error handling
  • Add comprehensive logging

Testing Custom Components

  • Unit test custom implementations
  • Validate on small datasets first
  • Compare against baseline implementations
  • Monitor performance metrics

Documentation

  • Document custom configuration options
  • Provide usage examples
  • Maintain compatibility notes
  • Update relevant documentation

Version Control

  • Use git branches for experiments
  • Tag stable custom versions
  • Maintain configuration history
  • Document breaking changes

Sources:

  • nanochat/gpt.py (model architecture)
  • nanochat/tokenizer.py (tokenizer customization)
  • nanochat/engine.py (inference customization)
  • tasks/common.py (task framework)
Last updated: 1/10/2026