CSET: Research on Methods for Evaluating the Potential for Malicious Use of Advanced AI Systems, 2025 (English edition)

Published by: wx****fa
2025-03-20
1 MB, 15 pages
Cybersecurity

The simplest way to test whether there is a risk of system X being used for malicious behavior Y is to see if X can do Y, just once. Red-teamers and stress testers adopt an adversary’s mindset and probe an AI system for “identification of harmful capabilities, outputs, or infrastructure threats.”6 If a model does not produce harmful behavior on the first try, the next step is to iterate. Researchers use different techniques, including improving prompts (the input fed to the model, such a

