# Intracept

> Safety research into the adaptive vulnerability of frontier language models under coevolutionary pressure.

Intracept is a Python engine that evolves adversarial attacks and defenses against frontier LLMs using coevolutionary MAP-Elites, producing the Intracept Benchmark. The research measures the adaptive gap: the vulnerability that static benchmarks miss and that only coevolutionary machinery can surface.

## What It Does

- Evolves adversarial attacks against frontier LLMs using MAP-Elites with 8 mutation operators
- Co-evolves defenses (system prompt hardening) under competitive pressure
- Measures cross-model transfer: attacks evolved against open-weight models, tested against closed models
- Tracks attack phylogenetics: lineage, mutation productivity, convergent techniques
- Computes InterceptScore, with the adaptive gap as the headline metric

## Methodology: Open-Weight Primary, Closed-Model Transfer Test

Attacks are evolved exclusively against open-weight models (Llama 3.3 70B and Mistral Large, via Bedrock). Closed models are used only as transfer-test targets at evolutionary checkpoints; they are evaluation targets, not evolutionary targets. This design follows the adaptive-attack methodology of Nasr & Carlini (2025) and is ToS-safe across all providers.

## Responsible Disclosure

Findings are shared with affected lab safety teams before public release. The evolved defense archive will be published openly under the MIT license as a community contribution.
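The attack/defense coevolution described under "What It Does" can be sketched in a few lines. This is a minimal, illustrative sketch only: `Candidate`, `mutate`, `evaluate`, and `coevolution_step` are hypothetical names, the mutation operator is a placeholder tag rather than a real prompt rewrite, and the judge is a random stand-in for an LLM-based scorer.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Candidate:
    prompt: str
    fitness: float = 0.0
    lineage: list = field(default_factory=list)  # operator ids applied, oldest first

def mutate(attack: Candidate, operator_id: int) -> Candidate:
    # Placeholder mutation: a real operator would rewrite the prompt
    # (persona injection, encoding tricks, etc.); here we only tag it
    # so the lineage bookkeeping is visible.
    return Candidate(prompt=f"{attack.prompt} [op{operator_id}]",
                     lineage=attack.lineage + [operator_id])

def evaluate(attack: Candidate, defense: str) -> float:
    # Stand-in judge: real fitness would come from an LLM judge scoring
    # whether the target model (running under `defense`) was jailbroken.
    return random.random()

def coevolution_step(attacks, defenses, n_operators=8):
    # One generation: mutate each attack with one of the 8 operators,
    # score the child against the current defense, and keep the better
    # of parent and child (a (1+1)-style update per slot).
    best_defense = defenses[0]
    next_gen = []
    for parent in attacks:
        child = mutate(parent, random.randrange(n_operators))
        child.fitness = evaluate(child, best_defense)
        next_gen.append(child if child.fitness > parent.fitness else parent)
    return next_gen

random.seed(0)
population = [Candidate("seed attack") for _ in range(4)]
population = coevolution_step(population, ["baseline system prompt"])
print(len(population))  # population size is preserved across generations
```

In the real engine the defense side would be evolved in alternation (hardening the system prompt against the current elite attacks), which is what produces the Red Queen dynamics the project measures.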
## Key Concepts

- MAP-Elites Archive: 10 attack vectors x 12 techniques = 120 cells, top-5 per cell
- Coevolution: attacks and defenses evolve together (Red Queen dynamics)
- Adaptive Gap: coevolutionary ASR minus static ASR, the headline metric
- Transfer Topology: cross-model vulnerability matrix at evolutionary depth
- Phylogenetics: full attack lineage tracking and mutation productivity analysis

## Open Source

- intracept-bench on PyPI: InterceptScore, archive, coevolution loop, judge interface, phylogenetics tools, transfer topology, visualization (MIT)
- Evolved defense archive: system prompt defenses that survived 500 generations of coevolutionary pressure

## Models Evaluated

- Evolved on (open-weight): Llama 3.3 70B, Mistral Large
- Transfer-tested against: GPT-4o, GPT-4o-mini, Claude Sonnet 4, Claude Haiku 4.5, Gemini 2.5 Flash, Gemini 2.5 Pro, Qwen2.5-72B, Command R+

## Research

Built on: MAP-Elites (Mouret & Clune 2015), coevolution (Hillis 1990), Rainbow Teaming (Samvelyan et al., NeurIPS 2024), RainbowPlus (Dang 2025), AutoDAN-Turbo (Liu et al., ICLR 2025), adaptive attacks (Nasr & Carlini 2025).

## Links

- Site: https://intracept.dev
- GitHub: https://github.com/laurenalexander2/intracept
- PyPI: intracept-bench (forthcoming)
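The archive and headline metric above are simple to state in code. A minimal sketch, assuming a dict-keyed grid; `insert` and `adaptive_gap` are illustrative names, not the published intracept-bench API.

```python
# MAP-Elites archive: 10 attack vectors x 12 techniques = 120 cells,
# each cell retaining only its top-5 scoring attacks (the elites).
N_VECTORS, N_TECHNIQUES, TOP_K = 10, 12, 5

archive = {}  # (vector_idx, technique_idx) -> list of (score, prompt), best first

def insert(vector: int, technique: int, score: float, prompt: str) -> None:
    cell = archive.setdefault((vector, technique), [])
    cell.append((score, prompt))
    cell.sort(key=lambda entry: entry[0], reverse=True)
    del cell[TOP_K:]  # truncate to the top-5 elites

def adaptive_gap(coevo_asr: float, static_asr: float) -> float:
    # Headline metric: attack success rate under coevolutionary pressure
    # minus the ASR of a static benchmark against the same target.
    return coevo_asr - static_asr

# Filling one cell with 7 candidates leaves only the 5 best.
for i in range(7):
    insert(0, 0, score=i / 10, prompt=f"attack-{i}")
print(len(archive[(0, 0)]))                 # 5
print(round(adaptive_gap(0.62, 0.41), 2))   # 0.21 (ASR values are made up)
```

The adaptive gap is large exactly when a model looks safe on a frozen attack set but degrades once attacks are allowed to adapt, which is why it serves as the benchmark's headline number.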