Regex Pattern for HTML Tags

Matching HTML tags with regex comes with a famous caveat: regex is not an HTML parser and should not be used to parse arbitrary HTML in production. HTML allows optional closing tags, self-closing variations, quoted attribute values that contain angle brackets, and malformed markup that browsers recover from but a regex cannot. For those cases, a DOM parser (the browser's DOMParser, jsdom, cheerio, or BeautifulSoup) is the right tool.
That said, regex on HTML has legitimate uses in controlled scenarios: stripping all tags from a string before storing plain text in a database, finding simple patterns such as anchor tags in clean HTML you generated yourself, extracting meta tag attributes from a page head in a known format, or doing a quick substitution on well-structured template output.
The pattern in this example matches two shapes: a complete element (an opening tag, its content, and the matching closing tag, where the backreference \1 requires the closing name to equal the opening name) and a self-closing tag ending with />. It captures the tag name in group 1, the attribute text in group 2, and the inner content in group 3. A closing tag on its own does not match, and because [a-z]+ accepts letters only, elements such as <h1>...</h1> are not matched either. Since . does not match a newline, an element must fit on a single line. The pattern uses the gi flags: g for global, to find all matches, and i for case-insensitive matching, so it handles both <P> and <p>.
For the most common use case, stripping all HTML tags, a simpler pattern is /<[^>]*>/g. It matches everything from a < to the next >, including the tag name and attributes. The [^>]* means "any run of characters other than >", which keeps a match from crossing into the next tag. It works for well-formed HTML as long as no attribute value or comment contains a literal >, and it removes only the tags, so the text inside script and style elements stays in the output.
Test cases in this example include: a paragraph with a class attribute, a self-closing img tag with multiple attributes, an anchor tag, a self-closing br tag, and a div with a nested span. Each represents a different tag structure the pattern should match.
Real-world scenarios: a CMS that receives rich-text input needs a plain-text excerpt for search indexing; a newsletter builder strips all tags before generating a plain-text fallback; a content scraper extracts the text of specific headings from fetched HTML.
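
The tag-stripping use case can be sketched roughly as follows. The helper name stripTags is hypothetical, and replacing each tag with a space instead of an empty string is an assumption made here so that text from adjacent blocks does not fuse into one word.

```javascript
// Hypothetical helper: reduce a trusted, well-formed HTML fragment to plain text.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, " ") // swap every tag for a space (assumption: keeps words apart)
    .replace(/\s+/g, " ")     // collapse the whitespace runs that leaves behind
    .trim();
}

const excerpt = stripTags(
  '<p class="intro">Hello <b>world</b></p><p>Second line</p>'
);
console.log(excerpt); // Hello world Second line
```

Character references such as &amp; are left undecoded and text inside script or style elements is kept, so a sketch like this only suits input you control.
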
Tips for customisation: to match only specific tags, replace [a-z]+ in the pattern with the tag name you want, or write a dedicated pattern. For example, /<a\b[^>]*>(.*?)<\/a>/gi matches anchor elements and captures their inner content, which you can use to extract all link text from a document. The \b after the tag name keeps the pattern from also matching tags such as <abbr>.
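
A minimal sketch of the link-text extraction, assuming clean single-line anchors. The helper name extractLinkText is hypothetical, and the \b after the tag name is there so that <abbr> is not mistaken for an anchor.

```javascript
// Hypothetical helper: collect the inner content of every <a> element.
function extractLinkText(html) {
  // \b stops the pattern from matching <abbr>, <article>, and similar names.
  const anchorPattern = /<a\b[^>]*>(.*?)<\/a>/gi;
  // matchAll yields one match per anchor; index 1 holds the captured content.
  return Array.from(html.matchAll(anchorPattern), (match) => match[1]);
}

const sample =
  '<p>See <a href="/docs">the docs</a> or <A HREF="/faq">the FAQ</A>.</p>';
console.log(extractLinkText(sample)); // [ 'the docs', 'the FAQ' ]
```

The lazy (.*?) stops at the first </a> it meets, which is what keeps two anchors on the same line from being merged into one match.
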

Example
/<([a-z]+)([^<]*)(?:>(.*)<\/\1>|\s+\/>)/gi

# Test input:
<p class="intro">Hello world</p>
<img src="logo.png" alt="Logo" />
<a href="https://example.com">Click here</a>
<br />
<div id="main"><span>Nested</span></div>
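
As a rough check, the pattern can be run over the test input with matchAll. The attribute group is written here as ([^<]*), with no nested quantifier, which accepts the same text and avoids heavy backtracking on long input. The variable names and the shape of the result objects are illustrative assumptions.

```javascript
const tagPattern = /<([a-z]+)([^<]*)(?:>(.*)<\/\1>|\s+\/>)/gi;

const testInput = [
  '<p class="intro">Hello world</p>',
  '<img src="logo.png" alt="Logo" />',
  '<a href="https://example.com">Click here</a>',
  "<br />",
  '<div id="main"><span>Nested</span></div>',
].join("\n");

// One result per match: group 1 is the tag name, group 2 the raw attribute
// text, group 3 the inner content (undefined for self-closing tags).
const results = Array.from(testInput.matchAll(tagPattern), (m) => ({
  tag: m[1],
  attributes: m[2].trim(),
  content: m[3] ?? null,
}));

for (const result of results) {
  console.log(result);
}
```

This yields five matches: p, img, a, br, and div. The nested span is not reported separately; it appears inside the div's captured content, because a global search resumes after the end of the previous match.
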

FAQ

Why should I not parse HTML with regex?
HTML allows arbitrary nesting, optional closing tags, unquoted attribute values, and malformed markup that browsers still accept, and a regular expression cannot track that structure. For robust HTML processing, use the browser DOM API or a parser like cheerio.
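
Two of these failure modes are easy to reproduce. The inputs below are made-up illustrations, not taken from the test cases above.

```javascript
// 1. A quoted attribute value may legally contain ">".
const tricky = '<a title="a > b" href="/x">link</a>';

// The simple pattern stops at the first ">" it sees, inside the title value.
const firstTag = tricky.match(/<[^>]*>/)[0];
console.log(firstTag); // <a title="a >

// Stripping with the same pattern leaves attribute debris in the text.
const stripped = tricky.replace(/<[^>]*>/g, "");
console.log(JSON.stringify(stripped)); // " b\" href=\"/x\">link"

// 2. Nested elements of the same name: a lazy match pairs the outer
// opening tag with the inner closing tag.
const nested = "<div>outer <div>inner</div> tail</div>";
const lazyCapture = nested.match(/<div>(.*?)<\/div>/)[1];
console.log(lazyCapture); // outer <div>inner
```
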
How do I strip all HTML tags from a string?
Use a simple pattern like /<[^>]*>/g with replace to remove all tag-like substrings. This works for clean HTML but can fail when an attribute value or comment contains a literal >, and it leaves the text inside script and style elements in place.
What does the i flag do in regex?
The i flag enables case-insensitive matching, so the pattern matches both <P> and <p>. HTML tags are case-insensitive in the HTML specification.
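
A short illustration of the flag's effect, using made-up input strings:

```javascript
const upper = "<P>Shouted paragraph</P>";

// Without i, the lowercase "p" in the pattern does not match "P".
const withoutFlag = /<p>/.test(upper);
console.log(withoutFlag); // false

// With i, letter case is ignored.
const withFlag = /<p>/i.test(upper);
console.log(withFlag); // true

// The flag also applies to the backreference, so mixed-case pairs match.
const mixedPair = /<([a-z]+)>.*<\/\1>/i.test("<P>mixed</p>");
console.log(mixedPair); // true
```
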

Related Examples