Safety & Risk

AI Red Teaming

The practice of systematically probing AI systems for vulnerabilities, biases, and failure modes through adversarial testing.

Full Definition

AI Red Teaming is a security and safety practice where dedicated teams (human or automated) systematically probe AI systems to discover vulnerabilities, biases, undesirable behaviors, and edge cases that weren't caught during development and testing. For autonomous AI agents, red teaming involves testing for prompt injection susceptibility, jailbreaking attempts, bias in decision-making, hallucination tendencies, and unexpected behavior under adversarial conditions. Red teaming is increasingly mandated by regulations (the EU AI Act requires adversarial testing for high-risk systems) and is a key component of responsible AI deployment. Tools like garak, PyRIT, and custom attack frameworks automate parts of the red teaming process, while human experts bring creativity and domain-specific knowledge to discover novel vulnerabilities.