Links
Selected references worth standing behind.
This is the broad index: papers, essays, posts, repos, tools, benchmarks, docs, hubs, and useful references.
AI Safety
Emergent Misalignment
Research writeup. Narrow finetuning can produce broad behavioral shifts.
Narrow Misalignment is Hard, Emergent Misalignment is Easy
Essay. Why broad misalignment may be an easier solution than local harmful behavior.
Training Large Language Models on Narrow Tasks Can Lead to Broad Misalignment
Paper. Journal version of the narrow-training-to-broad-misalignment result.
Weird Generalization and Inductive Backdoors
Paper. Related work on unexpected generalization patterns and hidden failure modes.
Evaluation
Medmarks
Benchmark suite for medical capabilities in language models.
LAB-Bench
Paper. Benchmark for language models doing biology research tasks.
FOMO26
Challenge. Foundation model challenge for brain MRI.
Open Graph Benchmark
Benchmark suite. Standardized graph ML datasets, loaders, and evaluators.
RoboTwin
Paper. Dual-arm robot benchmark using generative digital twins for scalable task and data generation.
Tool Use And Agents
Harness Engineering
Essay. Building products with agents through environments, specs, and reliability loops.
Code Mode
Post. Tool use through code interfaces rather than repeated chat-level tool calls.
Context Mode
Post. Pattern for keeping agent context usable when tools produce large outputs.
Agents Learn Their Runtime
Paper. Persistent versus reset Python interpreters in CodeAct-style training.
AI Gave Birth to the 100x Engineer
Essay. Case study on compounding agent workflows with test harnesses and supporting tools.
ML Systems
Frontier Model Training Methodologies
Survey. Open frontier training recipes and implementation choices.
Scaling LLMs with JAX
Book. Distributed training practice.
Beyond Language Modeling
Paper. From-scratch multimodal pretraining study.
How to Train the Best Embedding Model in the World
Essay. Embedding model training, label noise, verification, and dataset scale.
GPU MODE
Community and resource hub for GPU programming.
CUDA Writeups by Tushar Gautam
Posts. Implementation-forward notes on CUDA kernels and optimization.
Research Engineering
Distill
Essays. A high bar for clear technical exposition.
You and Your Research
Essay. Hamming on choosing important problems and organizing a life around serious work.
An Opinionated Guide to ML Research
Essay. Practical advice on developing taste in machine learning research.
Principles of Effective Research
Essay. Research as a skill that can be deliberately improved.
Fast
Essay. Examples of ambitious work happening faster than conventional expectations.
Design Docs for Machine Learning Systems
Essay. Clear technical writing as part of engineering.
Machine Learning Design Patterns
Essay. Recurring patterns in applied ML systems.
Reproducing Deep Reinforcement Learning
Essay. Experimental fragility in deep RL.
ML Productivity
Essay. Making ML experimentation less ad hoc.
Harvard CS197: Communicating Computer Science Research
Course notes. Turning technical work into a clear research artifact.
Your LLM-assisted scientific breakthrough probably isn’t real
Essay. Caution against mistaking plausible machine-generated research narratives for evidence.
The Missing Semester
Practical computing tools that make technical work less fragile.
How to Read a Paper
Compact guide to reading research papers deliberately.
Programmable Biology
Evo 2
Paper and code. Long-context genomic foundation model for sequence modeling and design.
HyenaDNA
Paper. Long-context sequence models at nucleotide resolution.
AlphaFold
Paper. Foundational protein structure prediction.
AlphaFold 3
Paper. Structure prediction for biomolecular complexes and interactions.
OpenFold
Open implementation and training stack for AlphaFold-style systems.
Rosalind
Bioinformatics algorithms through concrete programming problems.