---
type: Glossary Term
title: Distillation Attack
description: "A distillation attack is a form of model theft in which an adversary repeatedly queries a proprietary AI model through its public interface, harvests the responses, and uses them to train a competing model."
resource: "https://www.contextstudios.ai/glossary/distillation-attack"
category: security
language: en
timestamp: "2026-06-26T12:03:40.995Z"
---

# Distillation Attack

A distillation attack is a form of model theft in which an adversary repeatedly queries a proprietary AI model through its public interface, harvests the responses, and uses those outputs to train a competing model of their own. The attacker effectively clones a high-value model's behavior without ever touching its weights, training data, or architecture: the capability is reconstructed purely from observed inputs and outputs.

Mechanically, the approach mirrors legitimate model distillation, in which a provider deliberately trains a smaller student model on the outputs of its own larger teacher model. The difference is consent: in an attack, another company's intellectual property is extracted without permission. The tactic gained prominence when Anthropic told the US Senate that Alibaba-linked operators had distilled Claude at scale.

The exposure runs in both directions. If you operate your own model, a successful attack can replicate years of investment in a matter of days. If you rely on third-party models, it is worth asking about the provenance of the model you are building on. Defenses range from rate limiting and anomaly detection to output watermarking and contractual usage restrictions.
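
The query, harvest, and train loop can be illustrated with a toy example. The sketch below assumes the "proprietary model" is a two-feature logistic classifier that returns a probability (a soft label) for each query. `HypotheticalTeacherAPI`, `harvest`, and `train_student` are illustrative names, not any real provider's interface, and real attacks target language models with prompts and text completions, which this sketch does not model. The student never reads the teacher's hidden parameters, only its outputs. Because the student here happens to share the teacher's model class, gradient descent on cross-entropy against the harvested soft labels recovers those parameters closely.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


class HypotheticalTeacherAPI:
    """Toy stand-in for a proprietary model behind a public interface.
    Callers can use query() but are not meant to read _weights or _bias."""

    def __init__(self) -> None:
        self._weights = (2.0, -3.0)  # hidden parameters (hypothetical values)
        self._bias = 0.5

    def query(self, x: tuple[float, float]) -> float:
        z = self._weights[0] * x[0] + self._weights[1] * x[1] + self._bias
        return sigmoid(z)  # soft output: probability of class 1


def harvest(api: HypotheticalTeacherAPI, n_queries: int, seed: int = 0):
    """Send random inputs to the API and record (input, output) pairs."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_queries):
        x = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        dataset.append((x, api.query(x)))
    return dataset


def train_student(dataset, epochs: int = 2000, lr: float = 1.0):
    """Fit a logistic student to the harvested soft labels with
    full-batch gradient descent on cross-entropy."""
    w0 = w1 = b = 0.0
    n = len(dataset)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), target in dataset:
            p = sigmoid(w0 * x0 + w1 * x1 + b)
            err = p - target  # d(cross-entropy)/d(logit) for a soft target
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return w0, w1, b
```

Calling `train_student(harvest(HypotheticalTeacherAPI(), 200))` yields student parameters close to the teacher's hidden `(2.0, -3.0, 0.5)`, even though only `query()` outputs were used. With a large language model, the student is typically a different architecture and only approximates the teacher's behavior rather than matching its parameters.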

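On the defensive side, the first two measures listed above, rate limiting and anomaly detection, can be sketched as a per-client monitor. This is a hypothetical illustration. `QueryMonitor`, the sliding-window limit, and the "more than `factor` times the median client" heuristic are assumptions chosen for clarity, not a production design. Timestamps are passed in explicitly so the logic is deterministic and easy to test.

```python
from collections import defaultdict, deque
from statistics import median


class QueryMonitor:
    """Hypothetical per-client guard: a sliding-window rate limit plus a
    crude volume-based anomaly flag. Thresholds are illustrative only."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent: dict[str, deque] = defaultdict(deque)  # timestamps still inside the window
        self._totals: dict[str, int] = defaultdict(int)     # lifetime count of accepted queries

    def allow(self, client_id: str, now: float) -> bool:
        """Return True and record the query if the client is under its limit."""
        window = self._recent[client_id]
        # Drop timestamps that have slid out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # over the limit: reject without recording
        window.append(now)
        self._totals[client_id] += 1
        return True

    def flag_heavy_users(self, factor: float = 10.0) -> list[str]:
        """Flag clients whose accepted-query total exceeds factor x the median."""
        if not self._totals:
            return []
        typical = median(self._totals.values())
        return sorted(c for c, n in self._totals.items() if n > factor * typical)
```

A volume check like this only catches crude, single-account harvesting. An adversary can spread queries across many accounts or stay just under the limit, which is one reason such checks are combined with the other defenses the entry lists, such as output watermarking and contractual usage restrictions.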