Regex Pattern for URL Matching
URL matching regex is needed in a surprising variety of situations: extracting links from user-generated content before storing it in a database, converting plain-text URLs in a message body to clickable links, validating that a form field contains a URL pointing to an allowed domain, or parsing specific parts of a URL for analytics. Getting the pattern right means balancing comprehensiveness with maintainability.
The pattern in this example targets HTTP and HTTPS URLs, the two schemes used by virtually all web resources. It matches an optional www. prefix, 1 to 256 characters of host name, a literal dot, a top-level domain of 1 to 6 characters, and then an optional path, query string, and fragment. There is no dedicated port group: a port such as :8080 is accepted because the colon appears in the character classes. The character class in the path segment is intentionally broad to handle the wide variety of characters that appear in real URLs, including hyphens, underscores, dots, tildes, percent-encoded sequences, and query parameters with special characters.
Using the global flag /g is important when scanning a block of text for all URLs. Without it, String.prototype.match returns only the first match, and String.prototype.matchAll throws a TypeError.
The test cases in this example include a bare domain, a URL with a path, query string, and anchor fragment, a URL with a non-standard port, a non-URL string (which should not match), and an FTP URL (which this HTTP/HTTPS-only pattern correctly rejects).
Edge cases that trip up URL regex: localhost URLs lack a TLD, so patterns that require a dot in the domain, like this one, won't match http://localhost:3000. IP address URLs like http://192.168.1.1/api lack a traditional domain structure; this particular pattern happens to accept them because its TLD class allows digits, but it checks nothing about octet ranges, so http://999.999.999.999 passes too. If you need to match these hosts deliberately, add alternation branches that handle them explicitly.
Capture groups are useful for extracting URL components. As written, the pattern's groups capture only the www. prefix and the path; wrapping the domain segment in its own parentheses lets you retrieve just the domain from each match, which is useful for domain allow-list checking or analytics dashboards showing which external sites you link to.
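A minimal Node.js sketch of the /g behaviour, run over this page's test cases joined into one block of text. It assumes the pattern is written as a JavaScript regex literal, which requires escaping each slash (\/) and the literal dot before the TLD (\.); the names URL_RE, text, allMatches, and firstOnly are illustrative.

```javascript
// This page's pattern as a JavaScript regex literal: "\/" escapes each slash and
// "\." makes the dot before the TLD literal instead of "any character".
const URL_RE =
  /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}([-a-zA-Z0-9()@:%_+.~#?&\/=]*)/g;

// The test cases from this page, joined into one block of text.
const text = [
  "https://example.com",
  "http://www.site.co.uk/path?q=1#anchor",
  "https://sub.domain.org:8080/api/v2",
  "not-a-url",
  "ftp://wrong-protocol.com",
].join("\n");

// With the g flag, String.prototype.match returns every full match.
const allMatches = text.match(URL_RE);
console.log(allMatches.length); // 3
console.log(allMatches.join("\n"));

// The same pattern without g: match() returns only the first match plus its groups.
const firstOnly = text.match(new RegExp(URL_RE.source));
console.log(firstOnly[0]); // https://example.com
```

The last two test cases produce no match: not-a-url has no scheme at all, and the ftp:// URL never satisfies the https? prefix.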
Real-world scenario: a content moderation tool scans user posts for URLs, extracts the domains, and checks them against a blocklist of known phishing domains before the post is displayed, all driven by a URL-matching regex that feeds a domain extraction step.
Tips: for production URL parsing where you need to extract components reliably, consider using the URL constructor (new URL(str)) in JavaScript instead of regex. It parses according to the WHATWG URL Standard, throws a TypeError on invalid input, and gives you structured access to protocol, hostname, port, pathname, search, and hash. Because it parses a single string rather than searching through text, a common combination is a regex to find candidate URLs and new URL() to parse each one.
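The moderation scenario can be sketched in Node.js roughly as follows, combining both approaches: the regex finds candidate URLs in a post, and new URL() extracts each hostname for the blocklist check. The blocklist, its domains (under the reserved .test TLD), and the function name findBlockedDomains are hypothetical; a real tool would load its list from a maintained phishing feed and might also need to match subdomains.

```javascript
// Hypothetical blocklist; a real tool would load this from a maintained feed.
const BLOCKED_DOMAINS = new Set(["phish.test", "login-verify.test"]);

// This page's pattern as a JavaScript regex literal (escapes restored).
const URL_RE =
  /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}([-a-zA-Z0-9()@:%_+.~#?&\/=]*)/g;

// Returns the blocklisted domains linked from `post`, in order of appearance.
function findBlockedDomains(post) {
  const blocked = [];
  for (const match of post.matchAll(URL_RE)) {
    let hostname;
    try {
      // The URL constructor throws a TypeError for strings it cannot parse.
      hostname = new URL(match[0]).hostname;
    } catch {
      continue; // skip regex candidates that are not valid URLs
    }
    // Simplification: treat "www.x" and "x" as the same domain.
    const domain = hostname.replace(/^www\./, "");
    if (BLOCKED_DOMAINS.has(domain)) blocked.push(domain);
  }
  return blocked;
}

const post = "Check https://www.phish.test/reset and https://example.com/docs";
console.log(findBlockedDomains(post)); // [ 'phish.test' ]
```

Letting new URL() produce the hostname, rather than slicing it out of the regex match, keeps the blocklist comparison independent of how permissive the regex's host class is.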
/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}([-a-zA-Z0-9()@:%_+.~#?&\/=]*)/g
# Test cases:
https://example.com
http://www.site.co.uk/path?q=1#anchor
https://sub.domain.org:8080/api/v2
not-a-url
ftp://wrong-protocol.com
FAQ
- Should I use the g flag for URL matching?
- Use the g (global) flag when you want to find all URLs in a block of text. Without it, the regex only returns the first match.
- Does this pattern match localhost URLs?
- Standard URL regex patterns require a TLD, which localhost lacks. Add a separate branch for localhost and IP addresses if you need to match development URLs.
- How do I extract just the domain from a URL?
- Use a capturing group around the domain segment of the pattern. Each match then exposes what that group captured, by its index in the match array or by name if you use a named group such as (?<host>...).
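Both FAQ answers can be combined in one pattern. The sketch below is a hypothetical variant of this page's pattern, not a drop-in replacement: the host segment is wrapped in a named group host, a localhost alternation branch is added, the www. group is made non-capturing, and an explicit optional port group (?::\d+)? is added. Dotted IPv4 hosts already pass through the first branch because its TLD class allows digits, so no separate IP branch is included, and octet ranges are still not validated. The names URL_WITH_HOST_RE and extractHosts are illustrative.

```javascript
// Hypothetical variant: the host is captured as the named group "host", and a
// "localhost" branch is tried only after the original name.tld branch fails.
const URL_WITH_HOST_RE =
  /https?:\/\/(?:www\.)?(?<host>[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}|localhost)(?::\d+)?([-a-zA-Z0-9()@:%_+.~#?&\/=]*)/g;

// Collect the captured host from every URL in a block of text (needs the g flag).
function extractHosts(text) {
  return [...text.matchAll(URL_WITH_HOST_RE)].map((m) => m.groups.host);
}

const sample = [
  "https://example.com",
  "http://www.site.co.uk/path?q=1#anchor",
  "https://sub.domain.org:8080/api/v2",
  "http://localhost:3000",
  "http://192.168.1.1/api",
].join("\n");

console.log(extractHosts(sample).join(", "));
// example.com, site.co.uk, sub.domain.org, localhost, 192.168.1.1
```

Branch order matters here: trying localhost first would let a URL such as http://localhost.example.com stop at "localhost" and push the rest into the path group.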