
How to correctly replace multiple white spaces with a single white space in PHP?

<h3>Question</h3>

I was scouring SO answers and found that the solution most of them give for replacing multiple spaces is:

$new_str = preg_replace("/\s+/", " ", $str);

But in many cases the whitespace is not just plain spaces: it includes characters such as line feed, form feed, carriage return, non-breaking space, etc. This wiki states that Unicode defines twenty-five characters as whitespace.

So how do we replace all these characters as well using regular expressions?
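To make the question concrete, here is a small PHP sketch (PHP 7+, assuming the json extension is available, as it always is from PHP 8.0) that lists the 25 code points with the Unicode White_Space property and reports which of them a given pattern fails to match. The names WHITE_SPACE_CODE_POINTS, utf8_char and unmatched_code_points are made up for illustration.

```php
<?php
// The 25 code points with the Unicode White_Space property.
const WHITE_SPACE_CODE_POINTS = [
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, // tab, LF, VT, FF, CR
    0x0020, 0x0085, 0x00A0, 0x1680,         // space, NEL, NBSP, Ogham space mark
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
    0x2006, 0x2007, 0x2008, 0x2009, 0x200A, // en quad .. hair space
    0x2028, 0x2029, 0x202F, 0x205F, 0x3000, // LS, PS, narrow NBSP, math space, ideographic space
];

// Hypothetical helper: encode a BMP code point as UTF-8 via a JSON \uXXXX
// escape (json_decode is used only to avoid depending on mbstring or intl).
function utf8_char(int $codePoint): string
{
    return json_decode(sprintf('"\\u%04x"', $codePoint));
}

// Hypothetical helper: return the code points (as "U+XXXX") that $pattern
// does NOT match when each one is tested on its own.
function unmatched_code_points(string $pattern): array
{
    $missing = [];
    foreach (WHITE_SPACE_CODE_POINTS as $cp) {
        if (preg_match($pattern, utf8_char($cp)) !== 1) {
            $missing[] = sprintf('U+%04X', $cp);
        }
    }
    return $missing;
}
```

With '/\A\s\z/u' the returned array should be empty; the same check can be run against any hand-written character class before relying on it.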


<h3>Answer1:</h3>

When you pass the u modifier, \s becomes Unicode-aware. So a simple solution is to use:

$new_str = preg_replace("/\s+/u", " ", $str);

See the PHP online demo.
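A minimal sketch of the same idea as a reusable function (the name collapse_whitespace is hypothetical). One detail worth handling: with /u, preg_replace() returns null when the subject is not valid UTF-8, so the sketch turns that into an exception instead of silently passing null along.

```php
<?php
// Collapse every run of Unicode whitespace into a single ASCII space.
function collapse_whitespace(string $str): string
{
    $result = preg_replace('/\s+/u', ' ', $str);
    if ($result === null) {
        // With /u, preg_replace() returns null for invalid UTF-8 input.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// NBSP, tab, newline and ideographic space (U+3000) all collapse.
echo collapse_whitespace("a\u{00A0}\u{00A0}b\t\n c\u{3000}d"), "\n"; // prints "a b c d"
```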


<h3>Answer2:</h3>

The first thing to do is to read this explanation of how Unicode can be treated in regex. Coming specifically to PHP, we first need to add the PCRE modifier 'u' so that the engine treats the pattern and subject as UTF-8. So this would be:

$pattern = "/<our-pattern-here>/u";

The next thing to note is that in a PHP regex pattern a Unicode character can be written as \x{00A0}, where 00A0 is the hexadecimal code point of the non-breaking space. So if we want to replace consecutive non-breaking spaces with a single space we would have:

$pattern = "/\x{00A0}+/u";
$new_str = preg_replace($pattern, " ", $str);
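A small runnable version of the snippet above. It uses single quotes for the pattern so PHP never touches the backslash, and builds the test string with PHP's own "\u{00A0}" escape (PHP 7+), which produces the same character as the two UTF-8 bytes C2 A0.

```php
<?php
// \x{00A0} is the PCRE escape for U+00A0 (non-breaking space).
$pattern = '/\x{00A0}+/u';
$str = "one\u{00A0}\u{00A0}\u{00A0}two";

$new_str = preg_replace($pattern, ' ', $str);
var_dump($new_str);            // string(7) "one two"
var_dump(bin2hex("\u{00A0}")); // string(4) "c2a0"
```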

And if we want to include other whitespace characters mentioned in the wiki, such as:

<ul><li>\x{000D} carriage return</li> <li>\x{000C} form feed</li> <li>\x{0085} next line</li> </ul>

Our pattern becomes:

$pattern = "/[\x{00A0}\x{000D}\x{000C}\x{0085}]+/u";
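A quick check of the character-class pattern on a mixed run (the sample string is made up). Note that /u matters here: without it, \x{00A0} and \x{0085} would be read as single bytes rather than as the UTF-8 encoded characters.

```php
<?php
// Any run mixing NBSP, CR, FF and NEL, in any order, becomes one space.
$pattern = '/[\x{00A0}\x{000D}\x{000C}\x{0085}]+/u';
$str = "left\u{00A0}\r\u{000C}\u{0085}\u{00A0}right";

$new_str = preg_replace($pattern, ' ', $str);
var_dump($new_str); // string(10) "left right"
```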

This single pass is actually fine for performance: a character class followed by + is matched in one left-to-right scan, because the engine only checks whether each character belongs to the class; there are no combinations of characters to try. It also collapses any mixed run of these characters into a single space.

An alternative is a two-step approach: first replace every occurrence of each of these characters with a normal space, then replace runs of spaces with a single normal space. For the first step we drop the [ ]+ and separate the characters with the alternation operator | instead:

$pattern = "/\x{00A0}|\x{000D}|\x{000C}|\x{0085}/u";
$new_str = preg_replace($pattern, " ", $str); // one-to-one replacement: each of these characters becomes one normal space, so 5 of them give 5 normal spaces
$final_str = preg_replace("/\s+/", " ", $new_str); // runs of normal spaces now become a single normal space

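A sketch of the two-step approach wrapped in a function, next to a one-pass character class that should produce the same result (both function names are hypothetical). The one-pass class lists the four characters plus ASCII whitespace explicitly, rather than using \s with /u, so that it matches exactly what the two passes together match.

```php
<?php
// Two-pass version of the answer's code. Returns null on invalid UTF-8.
function normalize_two_pass(string $str): ?string
{
    // Pass 1: each listed character becomes exactly one ordinary space.
    $new_str = preg_replace('/\x{00A0}|\x{000D}|\x{000C}|\x{0085}/u', ' ', $str);
    if ($new_str === null) {
        return null;
    }
    // Pass 2 (no /u, as in the answer): collapse runs of whitespace.
    // Under PHP's default "C" locale this \s matches only ASCII whitespace.
    return preg_replace('/\s+/', ' ', $new_str);
}

// One-pass equivalent: the listed characters plus ASCII whitespace
// (space, \t, \n, \x0B, \f, \r) in a single class. Null on invalid UTF-8.
function normalize_one_pass(string $str): ?string
{
    return preg_replace('/[\x{00A0}\x{0085}\t\n\x0B\f\r ]+/u', ' ', $str);
}

$sample = "a\u{00A0}\u{00A0} \r\n\tb\u{0085}c";
var_dump(normalize_two_pass($sample)); // string(5) "a b c"
var_dump(normalize_one_pass($sample)); // string(5) "a b c"
```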
<h3>Answer3:</h3>

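A pattern that matches all Unicode whitespace characters is [\pZ\pC]. Here is a unit test to prove it. Be aware that it matches more than whitespace: \pC ("Other") also covers format characters such as U+00AD SOFT HYPHEN and U+200D ZERO WIDTH JOINER.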

If you're parsing user input in UTF-8 and need to normalize it, it's important to base your match on the full list of Unicode whitespace characters. So, to answer your question:

$new_str = preg_replace("/[\pZ\pC]+/u", " ", $str);
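A short sketch of this pattern in action (the sample strings are made up). The first call collapses a run of ideographic space, tab, newline and space; the second part compares it with /\s+/u on a word containing U+00AD SOFT HYPHEN, whose general category is Cf and therefore falls under \pC.

```php
<?php
// \pZ = separators, \pC = "Other" (includes controls such as \t and \n).
$collapsed = preg_replace('/[\pZ\pC]+/u', ' ', "a\u{3000}\t\n b");
var_dump($collapsed); // string(3) "a b"

// \pC is wider than whitespace: the soft hyphen is replaced by [\pZ\pC]+
// but left in place by \s+ with /u.
$word  = "co\u{00AD}operate";
$viaZC = preg_replace('/[\pZ\pC]+/u', ' ', $word);
$viaS  = preg_replace('/\s+/u', ' ', $word);
var_dump($viaZC);          // string(10) "co operate"
var_dump($viaS === $word); // bool(true)
```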

Source: https://stackoverflow.com/questions/40264465/how-to-correctly-replace-multiple-white-spaces-with-a-single-white-space-in-php
