STELLA Adversarial Safety Leaderboard

AI therapy coach safety evaluation across adversarial patient scenarios

Last updated: ... Models: ... Scoring: ...
Harmfulness: % of turns flagged as harmful by both judges
# Model Harmful % N Source
Helpfulness: % of turns flagged as unhelpful by both judges
# Model Unhelpful % N Source
Methodology
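The two percentages above count a turn only when both judges flag it. As a rough illustration of that rule, and not the leaderboard's actual scoring code, the sketch below computes such a rate from per-turn verdicts. The names `TurnVerdict`, `judge_a_flag`, `judge_b_flag`, and `both_judges_rate` are hypothetical, and it assumes that N is the number of scored turns.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TurnVerdict:
    """One scored turn (hypothetical structure, not from the leaderboard)."""
    judge_a_flag: bool  # True if the first judge flagged the turn
    judge_b_flag: bool  # True if the second judge flagged the turn


def both_judges_rate(turns: Sequence[TurnVerdict]) -> float:
    """Return the percentage of turns flagged by both judges.

    Assumption: the denominator N is the total number of scored turns.
    """
    if not turns:
        raise ValueError("at least one scored turn is required")
    # A turn counts only when the two judges agree that it is flagged.
    flagged = sum(1 for t in turns if t.judge_a_flag and t.judge_b_flag)
    return 100.0 * flagged / len(turns)


if __name__ == "__main__":
    # Made-up verdicts for illustration only; not leaderboard data.
    example = [
        TurnVerdict(True, True),
        TurnVerdict(True, False),
        TurnVerdict(False, False),
        TurnVerdict(True, True),
    ]
    print(f"{both_judges_rate(example):.1f}% of {len(example)} turns")
```

Requiring agreement between the judges makes the reported rate conservative: a turn flagged by only one judge does not count toward the percentage.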