autoresearch¶
Autonomous LLM pretraining: Karpathy’s autoresearch, with a macOS/MPS fork for Apple Silicon. autoresearch is a set of command-line scripts, not an importable Python package, so this page is a runbook rather than an autodoc dump.
Source: llamaclaw/autoresearch.
Entry points¶
| Script | Purpose |
|---|---|
| … | Pre-training loop. Defaults: depth=4, batch=2, TOTAL_BATCH=8192, TIME_BUDGET=1800 s (30 min). |
| … | Same loop with frozen hyperparameters, the “known-good” baseline. |
| … | TurboQuant PTQ evaluation against the trained checkpoint. |
| … | LLM-driven autonomous training loop. Reads … |
| … | Rigorous multi-seed TurboQuant benchmark with RTN baseline, SQNR analysis, Pareto plot, and JSON output. |
| … | Convert an autoresearch … |
| … | 5-phase orchestrator: train → quantize → benchmark → convert → verify. Pass … |
| … | Wrapper that launches … |
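The 5-phase flow named in the table above can be sketched as a minimal shell loop. This is illustrative only: the real orchestrator’s script name and flags are not shown on this page, and the status-file name and field below are assumptions modeled on the page’s key=value convention.

```shell
# Illustrative 5-phase driver; phase names come from the table above.
# "orchestrator_status.txt" is a hypothetical status file, not one the page names.
for phase in train quantize benchmark convert verify; do
  echo "phase=$phase" > orchestrator_status.txt  # overwrite with the current phase
  echo "running $phase"                          # the real script would invoke the phase here
done
cat orchestrator_status.txt
```

After the loop, the status file holds the last phase that ran, which is what a polling progress bar would display.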
Hardware¶
macOS / Apple Silicon: PyTorch MPS. Tested on an M2 with 8 GB; batch=2 and PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 are required to avoid OOM.
Linux / CUDA: upstream Karpathy branch.
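On MPS, the watermark ratio is set in the environment before launching training. A hedged sketch: only the variable and its value come from this page; the training script’s name and flags are elided here and stay elided.

```shell
# Cap MPS allocations at 70% of unified memory to avoid OOM on 8 GB machines.
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7
printenv PYTORCH_MPS_HIGH_WATERMARK_RATIO
# Then launch the training script (name not shown on this page), e.g.:
#   python <training script> ...
```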
Status files¶
train_status.txt, quant_status.txt, agent_status.txt, bench_status.txt
are plain key=value files that the shell scripts poll to render animated
progress bars. They are the observability surface for running experiments.
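A minimal sketch of that key=value convention follows. The field names (phase, step, total) are assumptions; the page only specifies the file names and the plain key=value format.

```shell
# Write a status file the way a training phase might (field names assumed).
cat > train_status.txt <<'EOF'
phase=pretrain
step=120
total=1000
EOF

# Poll individual keys, as a progress-bar loop would.
step=$(grep '^step=' train_status.txt | cut -d= -f2)
total=$(grep '^total=' train_status.txt | cut -d= -f2)
echo "step ${step} of ${total}"
```

Because each write replaces the whole file, a reader polling it never sees a partially updated key set from an earlier run.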
Results persistence¶
results.tsv and results_history.tsv are append-only — written after each
phase, never deferred to a “final documentation” phase. EXPERIMENT_LOG.md
appends with --- separators between runs. Per-run snapshots land in
logs/<RUN_TAG>/.
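The append-after-each-phase discipline can be sketched as below. The TSV columns and log fields are illustrative, not the actual schema, and the file names carry a `_demo` suffix to make clear they are not the repository’s real artifacts.

```shell
# Append one row per phase; the header is written only if the file is new.
[ -f results_demo.tsv ] || printf 'run_tag\tval_bpb\n' > results_demo.tsv
printf 'demo-run\t1.60\n' >> results_demo.tsv

# EXPERIMENT_LOG.md-style append, with a --- separator between runs.
printf -- '---\nrun: demo-run\nval_bpb: 1.60\n' >> EXPERIMENT_LOG_demo.md
```

Appending immediately after each phase means a crashed run still leaves every completed phase’s results on disk.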
Reproducible experiment result (2026-04-07)¶
val_bpb ≈ 1.60
TurboQuant: 4-bit gives 7.6× compression at cosine similarity 0.995; 3-bit gives 10× at 0.983; 2-bit gives 14.6× at ~0.95.
Config: depth=4, batch=2, TOTAL_BATCH=8192, TIME_BUDGET=1800, eval_tokens = 10 × VAL_TOKENS.