Medmarks
Medmarks is an open-source benchmark suite for evaluating medical capabilities in language models across a mix of verifiable and open-ended clinical tasks.
Focus
- Medical LLM evaluation.
- Verifiable and open-ended benchmark tasks.
- LLM-as-judge evaluation for non-verifiable tasks.
- Clinically relevant model capability tracking.