
What regular expression do browsers use for HTML5 input type=url?

Question:

I'm working on an HTML5 input pattern polyfill and I'm trying to validate an input type=url in JavaScript exactly as the browser (Chrome) does, but I can't find any documentation on a JavaScript- or Perl-compatible regular expression. As it's a polyfill, I don't particularly mind whether it matches all URLs exactly (which is impossible); what matters is that it imitates how the browser works.

Would anyone know of an identical pattern in Perl syntax?

Thanks

Answer1:

Read the relevant specification at <a href="http://www.w3.org/TR/html5/forms.html#url-state-(type=url)" rel="nofollow">http://www.w3.org/TR/html5/forms.html#url-state-(type=url)</a>:

Your polyfill should start by sanitizing the input, i.e. removing line breaks and trimming the string. The sentence "<em>User agents must not allow users to insert "LF" (U+000A) or "CR" (U+000D) characters</em>" may also be relevant.
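A minimal sketch of that sanitization step, assuming the "whitespace" to trim is the HTML set of space characters (space, tab, LF, form feed, CR). The function name `sanitizeUrlValue` is made up for illustration and is not taken from any browser or shim:

```javascript
// Hypothetical helper: mimics the "strip line breaks, then trim" step
// described for input type=url. Not copied from any browser source.
function sanitizeUrlValue(value) {
  // Remove every LF (U+000A) and CR (U+000D), wherever it occurs.
  var stripped = String(value).replace(/[\n\r]/g, '');
  // Trim the remaining HTML space characters (space, tab, form feed)
  // from both ends. LF and CR are already gone at this point.
  return stripped.replace(/^[ \t\f]+|[ \t\f]+$/g, '');
}
```

The polyfill would then validate the sanitized string rather than the raw field value.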

The result should be a <a href="http://www.w3.org/TR/html5/infrastructure.html#valid-url" rel="nofollow">valid</a>, <a href="http://www.w3.org/TR/url/#concept-absolute-url" rel="nofollow">absolute</a> URL. RFCs <a href="http://tools.ietf.org/html/rfc3986" rel="nofollow">3986</a> and <a href="http://tools.ietf.org/html/rfc3987" rel="nofollow">3987</a>, which are referenced there, describe URL validation; the section about <a href="http://www.w3.org/TR/url/#url-parsing" rel="nofollow">parsing URLs</a> may be of interest as well.

Your polyfill might not only validate URIs; it may also <a href="http://www.w3.org/TR/html5/infrastructure.html#resolving-urls" rel="nofollow">resolve relative URIs</a>. At the very least, validating a URI will be much simpler with an algorithm than by finding an appropriate regexp. Still, the RFC itself gives a regexp for parsing an <em>already validated</em> URI in <a href="http://tools.ietf.org/html/rfc3986#appendix-B" rel="nofollow">appendix B</a>.
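For reference, the appendix B expression from RFC 3986 translates to JavaScript roughly as below. Note that it only splits a reference into its components and does not validate anything. The names `splitUri` and `looksAbsolute` are hypothetical, and the scheme check is a simplification of "absolute URL", not what Chrome actually does:

```javascript
// RFC 3986, appendix B: splits a URI reference into its five components.
// Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
var RFC3986_APPENDIX_B = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: returns the components as an object.
// Components that are absent come back as undefined.
function splitUri(uri) {
  // Every part of the pattern is optional, so exec() never returns null.
  var m = RFC3986_APPENDIX_B.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// Hypothetical, simplified check: a reference is treated as absolute
// when it has a syntactically valid scheme (ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )).
function looksAbsolute(uri) {
  var scheme = splitUri(uri).scheme;
  return typeof scheme === 'string' && /^[a-z][a-z0-9+.\-]*$/i.test(scheme);
}

// The example from the RFC:
// splitUri('http://www.ics.uci.edu/pub/ietf/uri/#Related')
// → scheme "http", authority "www.ics.uci.edu", path "/pub/ietf/uri/",
//   query undefined, fragment "Related"
```

A check like this is far looser than a full validator, but it is enough to reject relative references such as `foo/bar` before running anything more expensive.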

Answer2:

After searching through several HTML5 shivs on GitHub to see if anyone else has come across an ideal expression, I believe I found something that's very close but it doesn't match perfectly.

Alexander Farkas (<a href="https://github.com/aFarkas/webshim/blob/master/src/shims/form-shim-extend.js#L285" rel="nofollow">https://github.com/aFarkas/webshim/blob/master/src/shims/form-shim-extend.js#L285</a>) uses this pattern to test URLs:

/^([a-z]([a-z]|\d|\+|-|\.)*):(\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?((\[(|(v[\da-f]{1,}\.(([a-z]|\d|-|\.|_|~)|[!\$&'\(\)\*\+,;=]|:)+))\])|((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=])*)(:\d*)?)(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*|(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)){0})(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i;

Also, just for anyone who stumbles across this via Google, if you don't need the pattern, but just want to check if something's valid through JavaScript (perhaps onChange), you can use the formelement.checkValidity() method. Obviously this doesn't help with a polyfill (which assumes no native HTML5 validation support) but it is useful nonetheless.
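As a rough sketch of how a polyfill could combine the two approaches: use the native check when the field really is a URL input, and fall back to a pattern otherwise. The function `isUrlFieldValid` and the `fallbackTest` callback are made up for illustration. The sketch assumes that a browser without type=url support reports the field's `type` as "text", and it treats an empty value as valid, ignoring `required`:

```javascript
// Hypothetical helper: prefers native validation, falls back to a
// caller-supplied test (for example, a wrapper around the pattern above).
function isUrlFieldValid(input, fallbackTest) {
  // Assumption: browsers that do not understand type=url report "text" here.
  var hasNative = input.type === 'url' &&
                  typeof input.checkValidity === 'function';
  if (hasNative) {
    return input.checkValidity();
  }
  // Simplification: an empty field counts as valid (the "required"
  // attribute is not considered in this sketch).
  return input.value === '' || fallbackTest(input.value);
}
```

In a page this might be called as `isUrlFieldValid(document.getElementById('site'), function (v) { return urlPattern.test(v); })`, where `site` and `urlPattern` are placeholders.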
