VisInject — Adversarial Prompt Injection Demo
Pick an attack prompt (the target phrase the attacker wants the VLM to emit), view the Stage 1 universal abstract image that encodes it, then upload a clean image; the app fuses the two via CLIP ViT-B/32 and the AnyAttack Decoder.
The output is visually indistinguishable from your clean image (PSNR ≈ 25 dB), but Vision-Language Models read it as containing the target phrase.
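To check the imperceptibility claim on your own output, a minimal PSNR sanity check is enough. This is the standard definition for same-sized RGB images scaled to [0, 1]; the file names are placeholders for your clean and fused images.

```python
import numpy as np
from PIL import Image

def psnr(clean: np.ndarray, adv: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for same-sized images in [0, max_val]."""
    mse = np.mean((clean - adv) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

clean = np.asarray(Image.open("clean.png").convert("RGB"), dtype=np.float32) / 255.0
adv = np.asarray(Image.open("fused.png").convert("RGB"), dtype=np.float32) / 255.0
print(f"PSNR: {psnr(clean, adv):.1f} dB")  # the demo output sits around 25 dB
```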
Limitations: this demo runs only Stage 2 (fusion). It cannot retrain universal images for new prompts (Stage 1 needs a GPU and multiple loaded VLMs), nor verify the attack against a VLM in-app (Stage 3 also needs a GPU). For the full pipeline, see the GitHub repo.
The first call is slow (~30–60 s) while CLIP, the decoder, and the universal image download into the Space cache; subsequent calls take 2–5 s.
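The warm-up cost is just lazy loading: each artifact is fetched once into the Space's Hugging Face cache and reused afterwards. A minimal sketch of that pattern, using the decoder repo ID from the About section (the checkpoint filename is illustrative, not the repo's actual layout):

```python
from functools import lru_cache
from huggingface_hub import hf_hub_download
import torch

@lru_cache(maxsize=1)
def load_decoder(device: str = "cpu"):
    # Downloaded on the first call only; later calls hit the local HF cache.
    path = hf_hub_download(repo_id="jiamingzz/anyattack", filename="decoder.pth")  # filename is a placeholder
    return torch.load(path, map_location=device)  # checkpoint format depends on the repo
```

CLIP ViT-B/32 and the universal images are cached the same way, which is why only the first request pays the download time.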
About
- Code: github.com/jeffliulab/VisInject
- Experimental data (147 response_pairs, 21 universal images, 147 adv images, v3 dual-axis judge results): datasets/jeffliulab/visinject
- Decoder weights: jiamingzz/anyattack, from Zhang et al., AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models, CVPR 2025.
v1.5 Methodology
Attack success is now scored by a dual-axis LLM judge (DeepSeek-V4-Pro in thinking mode, calibrated against Claude Opus 4.7 with Cohen's κ = 0.79 on the injection axis). The two axes are reported separately: Influence (did the response change?) and Precise Injection (did the target concept come through?). See §3.4 of the paper for the full methodology and the dataset README for the reproducibility manifest (the cache-replay path needs no API key to reproduce the paper numbers).
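For context on the calibration number, Cohen's κ measures agreement between two labelers beyond chance, and scikit-learn exposes it directly. The label lists below are made-up placeholders, not data from the paper:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary injection-axis verdicts (1 = target concept present) from the two judges.
deepseek_labels = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
claude_labels   = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(cohen_kappa_score(deepseek_labels, claude_labels))  # the paper reports κ = 0.79 on the real set
```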
VisInject is released for defensive security research. Do not use it to target production systems without authorization.