What regular expression do browsers use for HTML5 input type=url?

Question:

I'm working on an HTML5 input pattern polyfill and I'm trying to validate an input type=url in JavaScript exactly as the browser (Chrome) does, but I can't find any documentation on a JavaScript- or Perl-compatible regular expression. As it's a polyfill, I don't particularly mind whether it matches all URLs exactly (which is impossible); what matters is that it imitates how the browser works.

Would anyone know of an identical pattern in Perl syntax?

Thanks

Answer1:

Read the relevant specification at <a href="http://www.w3.org/TR/html5/forms.html#url-state-(type=url)" rel="nofollow">http://www.w3.org/TR/html5/forms.html#url-state-(type=url)</a>:

Your polyfill should start by sanitizing the input, i.e. removing line breaks and trimming the string. The sentence "<em>User agents must not allow users to insert "LF" (U+000A) or "CR" (U+000D) characters</em>" might also be relevant.
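
The sanitizing step described above could look roughly like the following. This is a minimal sketch, not the browser's actual code: the function name `sanitizeUrlInput` is made up for illustration, and it assumes that "trimming" means removing leading and trailing space, tab, and form feed (the ASCII whitespace left over once CR and LF are gone).

```javascript
// Hypothetical helper (name is illustrative, not from the spec).
// Step 1: remove every LF (U+000A) and CR (U+000D).
// Step 2: trim leading/trailing space, tab and form feed.
function sanitizeUrlInput(raw) {
  return String(raw)
    .replace(/[\r\n]/g, "")
    .replace(/^[ \t\f]+|[ \t\f]+$/g, "");
}

// Usage: run this before any validity check.
var cleaned = sanitizeUrlInput("  http://exa\nmple.com/\r\n ");
```

An explicit character class is used instead of `String.prototype.trim()` because `trim()` also strips non-ASCII whitespace, which may or may not match what a given browser does.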

The result should be a <a href="http://www.w3.org/TR/html5/infrastructure.html#valid-url" rel="nofollow">valid</a>, <a href="http://www.w3.org/TR/url/#concept-absolute-url" rel="nofollow">absolute</a> URL. The RFCs referenced there, <a href="http://tools.ietf.org/html/rfc3986" rel="nofollow">3986</a> and <a href="http://tools.ietf.org/html/rfc3987" rel="nofollow">3987</a>, describe URL validation; the section on <a href="http://www.w3.org/TR/url/#url-parsing" rel="nofollow">parsing URLs</a> may also be of interest.
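
To illustrate the "absolute" requirement, here is a rough sketch that relies on the standard `URL` constructor, which throws when given a relative reference and no base. The function name `isAbsoluteUrl` is hypothetical. The big assumption is that `URL` is available at all, which is often not the case in the older browsers a polyfill targets, and its parser is not guaranteed to agree with a browser's type=url check in every case.

```javascript
// Hypothetical helper: true if the string parses as an absolute URL.
// Assumes a global URL constructor exists (modern browsers, Node.js).
function isAbsoluteUrl(value) {
  try {
    new URL(value); // no base argument, so relative references throw
    return true;
  } catch (e) {
    return false;
  }
}

// Usage:
var a = isAbsoluteUrl("http://example.com/"); // has a scheme
var b = isAbsoluteUrl("/just/a/path");        // relative, no scheme
```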

Your polyfill might not only validate URIs but also <a href="http://www.w3.org/TR/html5/infrastructure.html#resolving-urls" rel="nofollow">resolve relative URIs</a>. In any case, validating a URI will be much simpler with an algorithm than with a single regexp. That said, the RFC itself gives a regexp for parsing an <em>already validated</em> URI in <a href="http://tools.ietf.org/html/rfc3986#appendix-B" rel="nofollow">appendix B</a>.
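
For reference, the appendix B expression translates to JavaScript as sketched below. It only splits a URI reference into its components and accepts almost any string, so it is not a validator. The wrapper `splitUri` and the property names are illustrative choices; the capture-group numbers follow the RFC.

```javascript
// Regexp from RFC 3986, appendix B, with "/" escaped for a JS literal.
var RFC3986_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical wrapper: components that are absent come back undefined.
function splitUri(uri) {
  var m = RFC3986_PARTS.exec(uri); // every part is optional, so this always matches
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// Usage, with the example string from the RFC:
var parts = splitUri("http://www.ics.uci.edu/pub/ietf/uri/#Related");
```

A polyfill could use the split result as a first step, for example to reject input without a scheme before checking each component separately.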

Answer2:

After searching through several HTML5 shivs on GitHub to see if anyone else has come across an ideal expression, I believe I found something that's very close but it doesn't match perfectly.

Alexander Farkas (<a href="https://github.com/aFarkas/webshim/blob/master/src/shims/form-shim-extend.js#L285" rel="nofollow">https://github.com/aFarkas/webshim/blob/master/src/shims/form-shim-extend.js#L285</a>) uses this pattern to test URLs:

/^([a-z]([a-z]|\d|\+|-|\.)*):(\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?((\[(|(v[\da-f]{1,}\.(([a-z]|\d|-|\.|_|~)|[!\$&'\(\)\*\+,;=]|:)+))\])|((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=])*)(:\d*)?)(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*|(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)|((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)){0})(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i;

Also, for anyone who stumbles across this via Google: if you don't need the pattern but just want to check whether something is valid through JavaScript (perhaps onChange), you can use the formelement.checkValidity() method. Obviously this doesn't help with a polyfill (which assumes no native HTML5 validation support), but it is useful nonetheless.
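
A small sketch of the checkValidity() approach mentioned above, assuming a browser with native constraint validation. The function name `isValidUrlNatively` and its optional `doc` parameter are made up for illustration; the parameter exists only so a different document object can be passed in.

```javascript
// Hypothetical helper: asks the browser itself whether a value passes
// type=url validation, using a detached <input> element.
// Assumes native HTML5 validation support (so not usable inside the polyfill).
function isValidUrlNatively(value, doc) {
  doc = doc || document;
  var input = doc.createElement("input");
  input.type = "url";
  input.value = value;
  return input.checkValidity();
}
```

Note that an empty value counts as valid unless the field is also marked required, so callers may want to handle the empty string separately.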
