What are invisible Unicode characters and why are they dangerous?

Characters like zero-width space (U+200B), zero-width non-joiner (U+200C), and byte order mark (U+FEFF) are invisible but can affect tokenization. Attackers embed invisible instructions that bypass display-level filters but get processed by the model.

What strictness level should I use?

Use Low for general sanitization of user content. Use Medium when you need to prevent prompt injection but preserve natural language. Use High only for structured inputs like codes or IDs where you can afford to strip most characters.

Does escaping injection keywords fully prevent injection attacks?

No. Escaping is a heuristic that disrupts common patterns but sophisticated attackers can work around it. Use this alongside semantic classification and output monitoring for defense-in-depth.

AI Input Sanitizer

Remove invisible Unicode, escape injection keywords, and strip dangerous content from LLM input.

Sanitize user input before sending to LLM APIs with three strictness levels. Low: removes invisible Unicode characters (zero-width spaces, BOM). Medium: also escapes prompt injection keywords by wrapping them in brackets. High: strips everything except alphanumeric characters, spaces, and basic punctuation. Shows a diff of what changed.

Strictness level

Also escape injection keywords with brackets

Input text

Sanitized output

Related Tools

AICAI Prompt Injection CheckerNEW

Detect prompt injection attacks in text with pattern matching and a 0-10 risk score.

AJDAI Jailbreak Pattern DetectorNEW

Detect DAN, developer mode, roleplay exploits, and encoding tricks in AI prompts.

ATCAI Text CleanerNEW

Clean and sanitize text for LLM input by stripping HTML, normalizing Unicode, and collapsing whitespace.

APDAI PII DetectorNEW

Detect personal information (email, phone, SSN, credit card, IP, date of birth) in text before sending to LLMs.

Learn More

guide:prompt injection prevention guide:pii handling

FAQ

What are invisible Unicode characters and why are they dangerous?: Characters like zero-width space (U+200B), zero-width non-joiner (U+200C), and byte order mark (U+FEFF) are invisible but can affect tokenization. Attackers embed invisible instructions that bypass display-level filters but get processed by the model.
What strictness level should I use?: Use Low for general sanitization of user content. Use Medium when you need to prevent prompt injection but preserve natural language. Use High only for structured inputs like codes or IDs where you can afford to strip most characters.
Does escaping injection keywords fully prevent injection attacks?: No. Escaping is a heuristic that disrupts common patterns but sophisticated attackers can work around it. Use this alongside semantic classification and output monitoring for defense-in-depth.