CSET (2025): Research on Methods for Assessing the Potential for Malicious Use of Advanced AI Systems
The simplest way to test whether there is a risk of system X being used for malicious behavior Y is to see if X can do Y, just once. Red-teamers and stress testers adopt an adversary's mindset and probe an AI system for "identification of harmful capabilities, outputs, or infrastructure threats."6 If a model does not produce harmful behavior on the first try, the next step is to iterate. Researchers use different techniques, including improving prompts (the input fed to the model).
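
As a rough illustration of this test-once-then-iterate loop, the sketch below cycles through successively refined prompts and stops at the first output a reviewer flags as harmful. The names (query_model, flags_harmful) and the toy prompt variants are assumptions made for illustration; they are not drawn from the report or from any particular model API.

# A minimal sketch, assuming a callable model interface and a harmfulness check.
# All identifiers here are illustrative, not part of the CSET report.

from typing import Callable, Iterable, Optional

def red_team_probe(
    query_model: Callable[[str], str],      # sends a prompt to the system under test
    flags_harmful: Callable[[str], bool],   # reviewer or classifier judging the output
    prompt_variants: Iterable[str],         # successively refined prompts for behavior Y
) -> Optional[str]:
    """Return the first prompt that elicits harmful behavior, or None."""
    for prompt in prompt_variants:
        output = query_model(prompt)
        if flags_harmful(output):
            # A single success is enough to show the risk exists.
            return prompt
    return None

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    def query_model(prompt: str) -> str:
        return f"[model refuses]: {prompt}"

    def flags_harmful(output: str) -> bool:
        return "refuses" not in output

    variants = ["direct request", "reworded request", "request framed as fiction"]
    print(red_team_probe(query_model, flags_harmful, variants))

The point of the loop is that a negative result on any single prompt says little; the tester keeps refining the input until either the harmful behavior appears once or the variants are exhausted.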