intracept

Intracept is safety research into the adaptive vulnerability of frontier language models. We use coevolutionary MAP-Elites to evolve attacks and defenses under competitive pressure, measuring the gap between static and adaptive evaluation.
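
To make the coevolutionary loop concrete, here is a minimal toy sketch, not Intracept's implementation. Everything in it is an illustrative assumption: attacks and defenses are stand-in float vectors, `attack_score` is an invented distance-based fitness, and the behavior descriptor is simply the first two genes. It shows only the structure: two MAP-Elites archives, each scored against the other's current elites.

```python
import random

def descriptor(genome, bins=5):
    """Map a genome in [0, 1]^n to a 2-D grid cell using its first two genes."""
    return tuple(min(int(g * bins), bins - 1) for g in genome[:2])

def mutate(genome, rng, sigma=0.1):
    """Gaussian mutation, clipped to [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genome]

def attack_score(attack, defense):
    """Hypothetical stand-in for attack success: mean absolute gene difference."""
    return sum(abs(a - d) for a, d in zip(attack, defense)) / len(attack)

def mean_fitness(genome, opponents, as_attacker):
    """Average fitness against the opposing archive's current elites."""
    scores = [
        attack_score(genome, o) if as_attacker else 1.0 - attack_score(o, genome)
        for o in opponents
    ]
    return sum(scores) / len(scores)

def coevolve(generations=50, genome_len=4, seed=0):
    """Run two MAP-Elites archives (cell -> elite genome) against each other."""
    rng = random.Random(seed)
    attacks, defenses = {}, {}
    for archive in (attacks, defenses):
        genome = [rng.random() for _ in range(genome_len)]
        archive[descriptor(genome)] = genome
    for _ in range(generations):
        for archive, opponents, as_attacker in (
            (attacks, defenses, True),
            (defenses, attacks, False),
        ):
            child = mutate(rng.choice(list(archive.values())), rng)
            cell = descriptor(child)
            rivals = list(opponents.values())
            incumbent = archive.get(cell)
            # Re-score the incumbent too: the opposing archive keeps changing.
            if incumbent is None or (
                mean_fitness(child, rivals, as_attacker)
                > mean_fitness(incumbent, rivals, as_attacker)
            ):
                archive[cell] = child
    return attacks, defenses
```

Calling `coevolve()` returns the two archives as dictionaries keyed by grid cell. In a real system the genomes would be prompts or system-prompt hardenings and the fitness would come from model evaluations.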

Static safety benchmarks overestimate model robustness. The adaptive gap, the difference between what static evaluation reports and what adaptive evaluation finds, is the number that takes coevolutionary machinery to compute; it is always greater than zero.
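
As a rough illustration, one way to put a number on the adaptive gap is the attack success rate under adaptive evaluation minus the rate under static evaluation. This definition and the function names are assumptions made for the example, not the project's published metric.

```python
def attack_success_rate(outcomes):
    """Fraction of attempts that succeeded; outcomes is a list of booleans."""
    if not outcomes:
        raise ValueError("need at least one attack outcome")
    return sum(outcomes) / len(outcomes)

def adaptive_gap(static_outcomes, adaptive_outcomes):
    """Assumed definition: adaptive success rate minus static success rate."""
    return attack_success_rate(adaptive_outcomes) - attack_success_rate(static_outcomes)

# Made-up outcomes: 1 of 10 fixed benchmark prompts succeeds,
# while 4 of 10 evolved attacks succeed against the same model.
static_outcomes = [True] + [False] * 9
adaptive_outcomes = [True] * 4 + [False] * 6
gap = adaptive_gap(static_outcomes, adaptive_outcomes)
```

With these made-up outcomes the gap is about 0.3: the static benchmark alone would have missed a 30-point difference in attack success.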

Full attack lineages trace how successful attacks descend and mutate. Cross-model transfer maps which vulnerabilities are shared across architectures.
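
A lineage can be pictured as a chain of parent pointers. The sketch below assumes each archived attack stores the ID of the attack it was mutated from; the `AttackRecord` fields and the IDs are hypothetical, chosen only to show the traversal.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class AttackRecord:
    attack_id: str
    parent_id: Optional[str]  # None marks a seed attack with no ancestor
    mutation: str             # label of the operator that produced this attack

def lineage(records: List[AttackRecord], attack_id: str) -> List[AttackRecord]:
    """Return the ancestry of attack_id, ordered from seed to descendant."""
    by_id = {r.attack_id: r for r in records}
    chain, seen = [], set()
    current: Optional[str] = attack_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        record = by_id[current]  # raises KeyError if an ancestor is missing
        chain.append(record)
        current = record.parent_id
    return list(reversed(chain))

records = [
    AttackRecord("a0", None, "seed"),
    AttackRecord("a1", "a0", "paraphrase"),
    AttackRecord("a2", "a1", "role-shift"),
    AttackRecord("b0", None, "seed"),
]
path = [r.attack_id for r in lineage(records, "a2")]
```

Here `path` is `["a0", "a1", "a2"]`, the descent of `a2` from its seed; the unrelated seed `b0` does not appear.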

Methodology

Attacks are evolved against open-weight models (Llama 3.3 70B, Mistral Large) and transfer-tested against closed models at evolutionary checkpoints. Closed models are evaluation targets, not evolutionary targets. This design follows the adaptive-attack methodology in Nasr & Carlini (2025) and is ToS-safe across all providers.
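
The separation between evolutionary targets and evaluation targets can be sketched as a loop in which closed-model results are logged but never fed back into selection. This is a hedged outline under assumed interfaces: `evolve_step`, the judge callables, and the checkpoint interval are placeholders, not the project's API.

```python
from typing import Callable, Dict, List, Tuple

Attack = str
Judge = Callable[[Attack], bool]  # True if the attack succeeds against a target

def run_with_transfer_checks(
    population: List[Attack],
    evolve_step: Callable[[List[Attack]], List[Attack]],  # uses open-weight targets only
    closed_targets: Dict[str, Judge],
    generations: int,
    checkpoint_every: int,
) -> Tuple[List[Attack], Dict[int, Dict[str, float]]]:
    """Evolve a population and record closed-model transfer rates at checkpoints."""
    if not population:
        raise ValueError("population must not be empty")
    transfer_log: Dict[int, Dict[str, float]] = {}
    for gen in range(1, generations + 1):
        population = evolve_step(population)
        if gen % checkpoint_every == 0:
            # Evaluation only: these rates are recorded and never
            # passed back to evolve_step as a fitness signal.
            transfer_log[gen] = {
                name: sum(judge(a) for a in population) / len(population)
                for name, judge in closed_targets.items()
            }
    return population, transfer_log

# Toy usage with stand-in functions (no real models involved).
final_population, log = run_with_transfer_checks(
    population=["", "a"],
    evolve_step=lambda pop: [a + "x" for a in pop],
    closed_targets={"closed-model": lambda a: len(a) >= 3},
    generations=4,
    checkpoint_every=2,
)
```

In this toy run, `log` holds one entry per checkpoint (generations 2 and 4), each mapping a closed target's name to the fraction of the current population that transferred.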

Responsible disclosure

Findings are shared with the safety teams of affected labs before public release. The evolved defense archive (system prompt hardenings that survived 500 generations of coevolutionary pressure) will be published openly under the MIT License as a community contribution.

Benchmark and pre-print forthcoming.
