Safety & Risk

Privacy Leakage

The inadvertent or unauthorized exposure of personally identifiable information or sensitive data by an AI agent during reasoning or output generation.

Full Definition

Privacy Leakage occurs when an AI agent surfaces, transmits, or includes personally identifiable information (PII), confidential business data, or other sensitive content in its outputs or tool calls without authorization. In autonomous agents, leakage can happen through multiple paths: the agent retrieves documents containing PII during retrieval-augmented generation (RAG) and includes it verbatim in its response; the agent's reasoning trace exposes sensitive context received from another system; the agent passes PII to an external API as a tool-call parameter; or the agent generates outputs that inadvertently identify individuals from aggregate data.

Unlike deliberate data breaches, privacy leakage is typically unintentional: a consequence of the agent's general-purpose reasoning operating on sensitive data without adequate guardrails. GDPR, HIPAA, and CCPA all impose liability for inadvertent PII disclosure regardless of intent.

Prevention requires PII detection at both the output level (scanning generated text before delivery) and the tool-call level (scanning parameters before external transmission).
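The two-level detection described above can be sketched as follows. This is a minimal illustration, not a production detector: the regex patterns, function names, and `[REDACTED:...]` marker format are all hypothetical, and a real deployment would use a dedicated PII-detection service with far broader coverage.

```python
import re

# Hypothetical, deliberately simple PII patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_pii(text):
    """Return a list of (kind, match) pairs found in `text`."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((kind, match))
    return hits

def redact(text):
    """Output-level guardrail: replace each detected PII span with a
    [REDACTED:<kind>] marker before the response is delivered."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text

def scan_tool_call(params):
    """Tool-call-level guardrail: scan every string parameter before
    external transmission; return findings so the caller can block
    or redact the call."""
    findings = {}
    for key, value in params.items():
        if isinstance(value, str):
            hits = find_pii(value)
            if hits:
                findings[key] = hits
    return findings
```

Applying `redact` to generated text and `scan_tool_call` to outgoing parameters covers both exposure paths named above; the design choice of scanning at the boundary (rather than inside the model's reasoning) means the guardrail works regardless of how the sensitive data entered the agent's context.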