
OpenAI: HealthBench, an Open-Source Benchmark for Evaluating the Performance and Safety of Large Language Models in Healthcare (English Edition)

Uploaded by: wx****7c
2025-05-26
8 MB, 39 pages
We present HealthBench, an open-source benchmark measuring the performance and safety of large language models in healthcare. HealthBench consists of 5,000 multi-turn conversations between a model and an individual user or healthcare professional. Responses are evaluated using conversation-specific rubrics created by 262 physicians. Unlike previous multiple-choice or short-answer benchmarks, HealthBench enables realistic, open-ended evaluation through 48,562 unique rubric criteria spanning sever
