VisInject — Adversarial Prompt Injection Demo

Pick an attack prompt, view the Stage 1 universal abstract image that encodes it, then upload a clean image; the app fuses the two via CLIP ViT-B/32 and the AnyAttack decoder.
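Conceptually (the actual training loop lives in the repo), Stage 2 nudges the fused image's CLIP embedding toward the universal image's embedding while a pixel-space penalty keeps it close to the clean upload. A toy numpy sketch of that objective, with random vectors standing in for CLIP embeddings (the function names and the weighting term `lam` are illustrative assumptions, not the app's code):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fusion_loss(adv_emb, target_emb, adv_px, clean_px, lam=0.1):
    # Maximize embedding similarity to the universal image's embedding,
    # while penalizing visible pixel-space deviation from the clean image.
    return -cosine(adv_emb, target_emb) + lam * np.mean((adv_px - clean_px) ** 2)
```

When the fused image matches the target embedding exactly and is pixel-identical to the clean image, the loss bottoms out at -1.0; real optimization trades the two terms off.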

The output is visually indistinguishable from your clean image (PSNR ≈ 25 dB), but Vision-Language Models (VLMs) read it as containing the target phrase.
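PSNR (peak signal-to-noise ratio) is the standard measure of that pixel-level closeness. A minimal numpy sketch of the metric (the `psnr` helper is illustrative, not part of the app):

```python
import numpy as np

def psnr(clean: np.ndarray, adversarial: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((clean.astype(np.float64) - adversarial.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform per-pixel error of 14/255 gives 10*log10(255^2 / 14^2) ≈ 25.2 dB,
# i.e. roughly the perturbation budget quoted above.
```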

Limitations: this demo runs only Stage 2 (fusion). It cannot retrain universal images for new prompts (Stage 1 needs GPU + multiple VLMs loaded), nor can it verify the attack against a VLM in-app (Stage 3 needs GPU). For the full pipeline, see the GitHub repo.

First call is slow (~30–60 s) while CLIP, the decoder, and the universal image download to the Space cache. Subsequent calls are 2–5 s.
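That first-call/later-call gap is the usual lazy-load-and-cache pattern: the heavy download happens once, then every request reuses the in-memory models. A minimal sketch of the pattern (the loader and names are illustrative, not the Space's actual code):

```python
from functools import lru_cache

CALLS = {"n": 0}

def slow_load():
    # Stand-in for downloading CLIP, the decoder, and the universal image.
    CALLS["n"] += 1
    return "models"

@lru_cache(maxsize=None)
def get_models():
    # Cached: the expensive load runs on the first call only;
    # subsequent calls return the cached result immediately.
    return slow_load()
```

Calling `get_models()` repeatedly triggers `slow_load()` exactly once, which is why only the first request pays the 30–60 s cost.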

Step 1 — Pick an attack prompt

The target phrase the attacker wants the VLM to emit


About

v1.5 Methodology

Attack success is now scored by a dual-axis LLM judge (DeepSeek-V4-Pro in thinking mode, calibrated against Claude Opus 4.7 with Cohen's κ = 0.79 on the injection axis). The two axes are reported separately: Influence (did the response change?) and Precise Injection (did the target concept come through?). See the paper §3.4 for the full methodology and the dataset README for the reproducibility manifest (the cache-replay path needs no API key to reproduce the paper numbers).
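Cohen's κ quantifies agreement between two judges beyond what chance alone would produce. A minimal sketch of the computation (the helper is illustrative; the paper's κ = 0.79 comes from its own label sets):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[l] * cb[l] for l in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

Identical labelings give κ = 1; agreement at exactly the chance rate gives κ = 0, so 0.79 indicates substantial agreement between the two judge models.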

VisInject is released for defensive security research. Do not use it to target production systems without authorization.