
Matching subdomain and top domain using regex in Java

Question:

This is a follow-up to the question <a href="https://stackoverflow.com/questions/12393918/regex-to-match-pattern-with-subdomain-in-java" rel="nofollow">Regex to match pattern with subdomain in java</a>.

I use the pattern below to match the domain and its subdomains:

Pattern pattern = Pattern.compile("http://([a-z0-9]*.)example.com");

This pattern matches the following:

<ul><li>http://asd.example.com</li> <li>http://example.example.com</li> <li>http://www.example.com</li> </ul>

but it does not match

<ul><li>http://example.com</li> </ul>

Can anyone tell me how to match http://example.com too?

Answer1:

Just make the first part optional by putting a ? after the group:

Pattern pattern = Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");

Note that an unescaped . matches any character. Use \\. (as written in a Java string literal) to match a literal dot.
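A minimal sketch of how the corrected pattern behaves, assuming the whole input should be a bare URL and is therefore checked with matches(). The class name OptionalSubdomainDemo and the helper isExampleUrl are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class OptionalSubdomainDemo {
    // Pattern from the answer: the "label + dot" group is optional and the dots are escaped.
    static final Pattern EXAMPLE_HOST =
            Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");

    // Hypothetical helper: matches() requires the entire input to fit the pattern.
    static boolean isExampleUrl(String url) {
        return EXAMPLE_HOST.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com",
            "http://www.example.com",
            "http://asd.example.com",
            "http://wwwXexampleYcom"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isExampleUrl(url));
        }
    }
}
```

The first three URLs are accepted and the last one is rejected. With the unescaped dots of the pattern in the question, the last input would have been accepted, which is why the escaping matters.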

Answer2:

You can use this regex pattern to extract the domain (with its scheme, if present) from a URL whose host consists of letters and dots:

\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}

For example:

Input = http://www.google.com/search?q=a
Output = http://www.google.com

Input = ftp://www.google.com/search?q=a
Output = ftp://www.google.com

Input = www.google.com/search?q=a
Output = www.google.com

Here, \p{L}{0,10} matches the scheme (http, https, ftp, and possibly others), (?:://)? matches the :// separator if it appears, and [\p{L}\.]{1,50} matches the host, such as foo.bar.foo.com. The rest of the URL is cut off.

And here is the Java code that does the job:

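import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is illustrative; the original snippet did not include one.
public class DomainExtractor {

    public static final String DOMAIN_PATTERN = "\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}";

    // Returns the first match of DOMAIN_PATTERN in url, or "" if there is none.
    public static String getDomain(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }
        Pattern p = Pattern.compile(DOMAIN_PATTERN);
        Matcher m = p.matcher(url);
        if (m.find()) {
            return m.group();
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(getDomain("www.google.com/search?q=a"));
    }
}

Output = www.google.com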

Finally, if you only want to match domains ending in "example.com", append it to the end of the pattern. The quantifier on the host part becomes {0,50} so that the bare domain still matches:

\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com

This returns every domain ending in "example.com":

Input = http://www.foo.bar.example.com/search?q=a
Output = http://www.foo.bar.example.com
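A short sketch of the example.com variant in use, assuming the same find()-based approach as the code above. The class name ExampleDomainDemo and the helper findExampleDomain are hypothetical. The sketch also shows one limitation: the pattern does not require a dot before "example.com", so a host that merely ends in those letters is accepted too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExampleDomainDemo {
    // Pattern from the answer, restricted to hosts ending in example.com.
    static final Pattern EXAMPLE_DOMAIN =
            Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com");

    // Hypothetical helper: returns the first match, or "" when nothing matches.
    static String findExampleDomain(String url) {
        if (url == null) {
            return "";
        }
        Matcher m = EXAMPLE_DOMAIN.matcher(url);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Subdomains and the bare domain are both returned.
        System.out.println(findExampleDomain("http://www.foo.bar.example.com/search?q=a"));
        System.out.println(findExampleDomain("http://example.com"));
        // Other domains yield an empty string.
        System.out.println(findExampleDomain("http://www.google.com/search?q=a"));
        // Caveat: this unrelated host is also accepted, because no dot is required.
        System.out.println(findExampleDomain("http://notexample.com"));
    }
}
```

If hosts such as notexample.com must be rejected, the pattern needs an extra constraint, for instance requiring each subdomain label to end in a dot and anchoring the match.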

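Note: \p{Ll} can be used instead of \p{L} when the input is known to be lowercase. \p{Ll} matches only lowercase Unicode letters, while \p{L} matches any Unicode letter. Host names are case-insensitive, so a URL written with uppercase letters needs \p{L}. Neither class matches digits or hyphens, which are also valid in host names.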
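To make the difference between the two letter classes concrete, here is a small sketch that runs both variants of the pattern against the same inputs. The class name LetterClassDemo, the constants ANY_LETTER and LOWER_ONLY, and the helper firstMatch are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterClassDemo {
    // \p{L}: any Unicode letter. \p{Ll}: lowercase Unicode letters only.
    static final String ANY_LETTER = "\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}";
    static final String LOWER_ONLY = "\\p{Ll}{0,10}(?:://)?[\\p{Ll}\\.]{1,50}";

    // Hypothetical helper: returns the first match of regex in input, or "".
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String lower = "http://www.google.com/search";
        String mixed = "http://WWW.Google.com/search";

        // Both variants return the full scheme and host for lowercase input.
        System.out.println(firstMatch(ANY_LETTER, lower));
        System.out.println(firstMatch(LOWER_ONLY, lower));

        // Only the \p{L} variant returns the full host for mixed-case input.
        System.out.println(firstMatch(ANY_LETTER, mixed));
        System.out.println(firstMatch(LOWER_ONLY, mixed));
    }
}
```

With mixed-case input the \p{Ll} variant no longer returns the full host, so it is only a safe substitute when the URLs have been lowercased beforehand.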
