OpenAI: HealthBench, an Open-Source Benchmark for Evaluating the Performance and Safety of Large Language Models in Healthcare (English Version)

Uploaded by: wx****7c

2025-05-26

8 MB, 39 pages


Summary

We present HealthBench, an open-source benchmark measuring the performance and safety of large language models in healthcare. HealthBench consists of 5,000 multi-turn conversations between a model and an individual user or healthcare professional. Responses are evaluated using conversation-specific rubrics created by 262 physicians. Unlike previous multiple-choice or short-answer benchmarks, HealthBench enables realistic, open-ended evaluation through 48,562 unique rubric criteria spanning sever
