# Customization Guide
Comprehensive guide to customizing nanochat for your specific use cases and requirements.
## Overview
Nanochat is designed to be highly modular and customizable. You can modify:
- **Model Architecture**: layer counts, dimensions, attention mechanisms
- **Tokenizer Configuration**: vocabulary size, special tokens, encoding
- **Training Data**: custom datasets, conversation formats
- **Task Definitions**: new evaluation tasks and benchmarks
- **Inference Behavior**: generation parameters, conversation handling
This guide covers the most common customization scenarios.
## Model Architecture

### Modifying Model Size
Create custom model configurations by editing the model size definitions:
```python
# In nanochat/gpt.py - add a custom model size
MODEL_CONFIGS = {
    "tiny":   {"n_layer": 6,  "n_head": 6,  "n_embd": 384},
    "small":  {"n_layer": 12, "n_head": 12, "n_embd": 768},
    "medium": {"n_layer": 24, "n_head": 16, "n_embd": 1024},
    "large":  {"n_layer": 36, "n_head": 20, "n_embd": 1280},
    # Add your custom size
    "custom": {"n_layer": 18, "n_head": 14, "n_embd": 896},
}
```
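A named entry can then be expanded into a config and a model. This is a minimal usage sketch, assuming `GPT` and `GPTConfig` accept these fields as keyword arguments:

```python
# Hypothetical usage: build a model from a named config entry
from nanochat.gpt import GPT, GPTConfig, MODEL_CONFIGS

config = GPTConfig(**MODEL_CONFIGS["custom"])  # n_layer=18, n_head=14, n_embd=896
model = GPT(config)
```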
### Advanced Architecture Changes
For deeper architectural modifications:
```python
import torch.nn as nn

# Custom attention mechanism
class CustomGPT(GPT):
    def __init__(self, config):
        super().__init__(config)
        # Replace the standard transformer blocks with a custom implementation
        self.transformer.h = nn.ModuleList([
            CustomBlock(config) for _ in range(config.n_layer)
        ])
```
**Common Modifications:**

- **Attention variants**: multi-query, grouped-query, sliding window
- **Position encodings**: ALiBi, learned positions, relative positions
- **Activation functions**: SwiGLU, GeGLU, GELU variants
- **Normalization**: RMSNorm, LayerNorm variants, pre/post-norm (see the RMSNorm sketch below)
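As a concrete example of one variant from the list, here is a minimal standalone RMSNorm module (a sketch, not nanochat's own implementation) that a custom block could use in place of LayerNorm:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the RMS over the last dimension, then scale
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```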
### Model Configuration

Key configuration parameters in `nanochat/gpt.py`:
```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024      # Context length
    vocab_size: int = 32768     # Vocabulary size
    n_layer: int = 12           # Number of layers
    n_head: int = 12            # Attention heads
    n_embd: int = 768           # Hidden dimension
    dropout: float = 0.0        # Dropout rate
    bias: bool = True           # Use bias in linear layers
    rope_base: float = 10000.0  # RoPE frequency base
```
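For instance, a longer-context variant only needs to override the relevant field:

```python
# 4x the default context window; all other fields keep their defaults above
config = GPTConfig(block_size=4096)
```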
## Tokenizer Customization

### Custom Vocabulary
Create domain-specific tokenizers:
```bash
# Custom tokenizer training
python -m scripts.tok_train \
    --vocab-size 50000 \
    --custom-tokens "special1,special2,special3" \
    --domain-data /path/to/domain/data
```
### Special Token Configuration
Add custom special tokens:
```python
# In nanochat/tokenizer.py
SPECIAL_TOKENS = {
    "<|bos|>": 0,
    "<|eos|>": 1,
    "<|user_start|>": 2,
    "<|user_end|>": 3,
    "<|assistant_start|>": 4,
    "<|assistant_end|>": 5,
    # Add custom tokens
    "<|system|>": 6,
    "<|function_call|>": 7,
    "<|code|>": 8,
}
```
### Conversation Format
Customize conversation rendering:
```python
class CustomTokenizer(HuggingFaceTokenizer):
    def render_conversation(self, messages):
        # Custom conversation format
        tokens = [self.get_bos_token_id()]
        for message in messages:
            # Prefix system messages with the custom special token
            if message["role"] == "system":
                tokens.append(self.encode_special("<|system|>"))
            tokens.extend(self.encode(message["content"]))
        return tokens
```
## Custom Training Data

### Data Format

Nanochat expects conversations in the following format:
```json
{
  "messages": [
    {"role": "user", "content": "Question text"},
    {"role": "assistant", "content": "Response text"}
  ]
}
```
### Adding Data Sources
Create custom data loaders:
```python
# Custom dataset in nanochat/dataset.py
import json

class CustomDataset:
    def __init__(self, data_path):
        self.data = self.load_data(data_path)

    def load_data(self, data_path):
        # Assumes one JSON object per line with "question" and "answer" fields
        with open(data_path) as f:
            return [json.loads(line) for line in f]

    def __iter__(self):
        for item in self.data:
            yield self.format_conversation(item)

    def format_conversation(self, item):
        return {
            "messages": [
                {"role": "user", "content": item["question"]},
                {"role": "assistant", "content": item["answer"]},
            ]
        }
```
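Usage is then plain iteration (the path here is a hypothetical example):

```python
dataset = CustomDataset("/path/to/custom_data.jsonl")
for conversation in dataset:
    print(conversation["messages"][0]["content"])  # the user turn
```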
### Data Mixing
Combine multiple data sources:
```python
# In training scripts
data_sources = [
    ("custom_data", 0.3),  # 30% custom data
    ("fineweb", 0.5),      # 50% general web data
    ("code_data", 0.2),    # 20% code data
]
```
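One minimal way to realize these weights, assuming the training scripts do not already provide a mixer, is to sample a source per example in proportion to its weight:

```python
import random

def sample_source(data_sources):
    """Pick a data source name with probability proportional to its weight."""
    names, weights = zip(*data_sources)
    return random.choices(names, weights=weights, k=1)[0]

# About 30% of draws come from "custom_data", 50% from "fineweb", etc.
source = sample_source(data_sources)
```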
## Custom Tasks

### Creating New Tasks
Implement custom evaluation tasks:
```python
# tasks/custom_task.py
from tasks.common import Task

class CustomTask(Task):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.data = self.load_data()

    @property
    def eval_type(self):
        return 'generative'  # or 'categorical'

    def get_example(self, index):
        item = self.data[index]
        return {
            "messages": [
                {"role": "user", "content": item["prompt"]},
                {"role": "assistant", "content": item["solution"]},
            ]
        }

    def evaluate(self, conversation, completion):
        # Custom evaluation logic
        return self.check_correctness(completion)
```
### Task Mixtures
Create custom task combinations:
```python
from tasks.common import TaskMixture
from tasks.gsm8k import GSM8K
from tasks.custom_task import CustomTask

# Custom task mixture
custom_mixture = TaskMixture([
    GSM8K(subset="main", split="train"),
    CustomTask(),
    CustomTask(),  # Oversample by including twice
])
```
## Inference Customization

### Custom Generation
Modify generation behavior:
```python
# Custom engine with modified generation
class CustomEngine(Engine):
    def generate(self, tokens, **kwargs):
        # Custom generation logic
        temperature = kwargs.get('temperature', 0.7)
        top_k = kwargs.get('top_k', 50)
        # Add custom sampling strategies
        if self.use_custom_sampling:
            return self.custom_sample(tokens, temperature, top_k)
        else:
            return super().generate(tokens, **kwargs)
```
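For reference, the `custom_sample` hook above could implement standard temperature-plus-top-k sampling over the final-position logits; this is a self-contained sketch, not nanochat's built-in sampler:

```python
import torch

def sample_next_token(logits, temperature=0.7, top_k=50):
    """Sample one token id from (batch, vocab) logits."""
    logits = logits / max(temperature, 1e-8)           # temperature scaling
    v, _ = torch.topk(logits, top_k)                   # top-k threshold values
    logits[logits < v[..., -1, None]] = -float("inf")  # mask everything else
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```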
### Response Processing
Add custom post-processing:
```python
class CustomChatInterface:
    def __init__(self, engine):
        self.engine = engine
        self.response_filters = [
            self.safety_filter,
            self.format_filter,
            self.length_filter,
        ]

    def process_response(self, response):
        for filter_fn in self.response_filters:
            response = filter_fn(response)
        return response
```
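Each filter is just a function from string to string. For example, a hypothetical `length_filter` method (the character limit is an illustrative assumption, not a nanochat default):

```python
def length_filter(self, response, max_chars=4000):
    # Truncate overlong responses at the last sentence boundary, if any
    if len(response) <= max_chars:
        return response
    truncated = response[:max_chars]
    cut = truncated.rfind(".")
    return truncated[:cut + 1] if cut != -1 else truncated
```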
### Tool Integration
Add custom tool support:
```python
# Custom tool for domain-specific tasks
class CustomTool:
    def __init__(self):
        self.name = "custom_calculator"

    def execute(self, code):
        # Domain-specific computation
        result = self.custom_compute(code)
        return {"result": result, "success": True}

# Integrate into the engine
engine = Engine(model, tokenizer)
engine.add_tool(CustomTool())
```
## Training Customization

### Custom Loss Functions
Implement domain-specific losses:
```python
import torch.nn as nn

# Custom loss for specialized training
class CustomLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.ce_loss = nn.CrossEntropyLoss()
        self.auxiliary_loss = AuxiliaryLoss()  # see the sketch below

    def forward(self, logits, targets):
        main_loss = self.ce_loss(logits, targets)
        aux_loss = self.auxiliary_loss(logits, targets)
        return main_loss + 0.1 * aux_loss
```
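`AuxiliaryLoss` is a placeholder; one concrete choice is a z-loss, which penalizes large log-partition values to keep logits well-scaled (a sketch, not part of nanochat):

```python
import torch
import torch.nn as nn

class AuxiliaryLoss(nn.Module):
    """z-loss: mean squared logsumexp over the vocabulary dimension."""
    def forward(self, logits, targets):
        z = torch.logsumexp(logits, dim=-1)  # log-partition per position
        return (z ** 2).mean()               # targets are unused here
```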
### Custom Optimizers
Add specialized optimization:
```python
import torch

# Custom optimizer configuration
def get_custom_optimizer(model, config):
    # Separate learning rates for different components
    embedding_params = model.transformer.wte.parameters()
    attention_params = [p for block in model.transformer.h
                        for p in block.attn.parameters()]
    return torch.optim.AdamW([
        {"params": embedding_params, "lr": config.embed_lr},
        {"params": attention_params, "lr": config.attn_lr},
    ])
```
### Training Schedules
Implement custom learning rate schedules:
```python
import math

class CustomScheduler:
    def __init__(self, optimizer, total_steps):
        self.optimizer = optimizer
        self.total_steps = total_steps

    def step(self, current_step):
        # Linear warmup over the first 10% of steps, then cosine decay
        if current_step < 0.1 * self.total_steps:
            lr_mult = current_step / (0.1 * self.total_steps)
        else:
            lr_mult = 0.5 * (1 + math.cos(math.pi * current_step / self.total_steps))
        # Assumes 'base_lr' was stored in each param group (see usage below)
        for param_group in self.optimizer.param_groups:
            param_group['lr'] = param_group['base_lr'] * lr_mult
```
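The schedule reads `base_lr` from each param group, so store it once after building the optimizer. A minimal usage sketch, assuming `model` and the training step come from your script:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
for group in optimizer.param_groups:
    group["base_lr"] = group["lr"]  # CustomScheduler scales from this value

scheduler = CustomScheduler(optimizer, total_steps=100_000)
for step in range(100_000):
    # ... forward, backward, optimizer.step(), optimizer.zero_grad() ...
    scheduler.step(step)
```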
## Configuration Management

### Environment Variables
Use environment variables for easy customization:
```bash
# Set custom model path
export NANOCHAT_MODEL_PATH=/path/to/custom/models

# Custom data directory
export NANOCHAT_DATA_DIR=/custom/data/path

# Training configuration
export NANOCHAT_BATCH_SIZE=64
export NANOCHAT_LEARNING_RATE=1e-4
```
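Scripts can then read these with fallback defaults (the defaults shown here are illustrative assumptions):

```python
import os

model_path = os.environ.get("NANOCHAT_MODEL_PATH", "./models")
data_dir = os.environ.get("NANOCHAT_DATA_DIR", "./data")
batch_size = int(os.environ.get("NANOCHAT_BATCH_SIZE", "32"))
learning_rate = float(os.environ.get("NANOCHAT_LEARNING_RATE", "3e-4"))
```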
### Configuration Files
Create custom configuration files:
```yaml
# custom_config.yaml
model:
  size: custom
  context_length: 2048
  vocab_size: 40000

training:
  batch_size: 128
  learning_rate: 3e-4
  total_steps: 100000

data:
  sources:
    - name: custom_domain
      weight: 0.5
    - name: general
      weight: 0.5
```
Load configurations:
```python
import yaml

def load_custom_config(config_path):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return config
```
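Values are then plain dictionary lookups:

```python
config = load_custom_config("custom_config.yaml")
batch_size = config["training"]["batch_size"]  # 128
sources = config["data"]["sources"]            # list of {name, weight} entries
```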
## Deployment Customization

### Custom Web Interface
Modify the web UI:
```html
<!-- Custom UI template -->
<div class="custom-chat-interface">
  <div class="custom-header">
    <h1>My Custom Assistant</h1>
  </div>
  <div class="custom-chat-area">
    <!-- Custom chat interface -->
  </div>
</div>
```
### API Extensions
Add custom API endpoints:
```python
# Custom API endpoints
@app.post("/custom/analyze")
async def custom_analysis(request: CustomRequest):
    result = await custom_processing(request.data)
    return {"analysis": result}

@app.get("/custom/stats")
async def custom_statistics():
    return get_custom_metrics()
```
### Production Scaling
Configure for production deployment:
```python
# Custom production configuration
class ProductionConfig:
    def __init__(self):
        self.max_workers = 16
        self.timeout = 30            # seconds
        self.memory_limit = "32GB"
        self.gpu_memory_fraction = 0.9
```
## Best Practices

### Code Organization
- Keep customizations in separate modules
- Use configuration files for hyperparameters
- Implement proper error handling
- Add comprehensive logging
### Testing Custom Components
- Unit test custom implementations
- Validate on small datasets first
- Compare against baseline implementations
- Monitor performance metrics
### Documentation
- Document custom configuration options
- Provide usage examples
- Maintain compatibility notes
- Update relevant documentation
### Version Control
- Use git branches for experiments
- Tag stable custom versions
- Maintain configuration history
- Document breaking changes
Sources:

- `nanochat/gpt.py` (model architecture)
- `nanochat/tokenizer.py` (tokenizer customization)
- `nanochat/engine.py` (inference customization)
- `tasks/common.py` (task framework)