Researchers uncover surprising method to hack the guardrails of LLMs

Researchers from Carnegie Mellon University and the Center for AI Safety have discovered a new attack that appends adversarial suffixes to prompts, overriding the guardrails of large language models (LLMs). These guardrails are safety measures designed to prevent AI models from generating harmful content.
