Medmarks

Medmarks is an open-source benchmark suite for evaluating medical capabilities in language models across a mix of verifiable and open-ended clinical tasks.

Focus

Medical LLM evaluation.
Verifiable and open-ended benchmark tasks.
LLM-as-judge evaluation for non-verifiable tasks.
Clinically relevant model capability tracking.

Artifacts

Medmarks v0.1
Public release.
Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks
Accepted at ICML ’26 FM4LS.