82238

How to do a regular expression replace in MySQL?

I have a table with ~500k rows; varchar(255) UTF8 column filename contains a file name;

I'm trying to strip out various strange characters out of the filename - thought I'd use a character class: [^a-zA-Z0-9()_ .\-]

Now, <strong>is there a function in MySQL that lets you replace through a regular expression</strong>? I'm looking for a similar functionality to REPLACE() function - simplified example follows:

SELECT REPLACE('stackowerflow', 'ower', 'over'); Output: "stackoverflow" /* does something like this exist? */ SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-'); Output: "-tackover-low"

I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.

(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack)

Answer1:

MySQL 8.0+ you could use natively REGEXP_REPLACE.

12.5.2 Regular Expressions:

<strong>REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])</strong>

Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.

and Regular expression support:

Previously, MySQL used the Henry Spencer regular expression library to support regular expression operators (REGEXP, RLIKE).

Regular expression support has been reimplemented using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. The REGEXP_LIKE() function performs regular expression matching in the manner of the REGEXP and RLIKE operators, which now are synonyms for that function. <strong>In addition, the REGEXP_INSTR(), REGEXP_REPLACE(), and REGEXP_SUBSTR() functions are available to find match positions and perform substring substitution and extraction, respectively.</strong>

SELECT REGEXP_REPLACE('Stackoverflow','[A-Zf]','-',1,0,'c'); -- Output: -tackover-low

<strong>DBFiddle Demo</strong>

Answer2:

No.

But if you have access to your server, you could use a user defined function (UDF) like mysql-udf-regexp.

<strong>EDIT:</strong> MySQL 8.0+ you could use natively REGEXP_REPLACE. More in answer above

Answer3:

Use MariaDB instead. It has a function

REGEXP_REPLACE(col, regexp, replace)

See MariaDB docs and PCRE Regular expression enhancements

Note that you can use regexp grouping as well (I found that very useful):

SELECT REGEXP_REPLACE("stackoverflow", "(stack)(over)(flow)", '\\2 - \\1 - \\3')

returns

over - stack - flow

Answer4:

My brute force method to get this to work was just:

<ol> <li>Dump the table - mysqldump -u user -p database table > dump.sql</li> <li>Find and replace a couple patterns - find /path/to/dump.sql -type f -exec sed -i 's/old_string/new_string/g' {} \;, There are obviously other perl regeular expressions you could perform on the file as well.</li> <li>Import the table - mysqlimport -u user -p database table < dump.sql</li> </ol>

If you want to make sure the string isn't elsewhere in your dataset, run a few regular expressions to make sure they all occur in a similar environment. It's also not that tough to create a backup before you run a replace, in case you accidentally destroy something that loses depth of information.

Answer5:

I recently wrote a MySQL function to replace strings using regular expressions. You could find my post at the following location:

http://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/

Here is the function code:

DELIMITER $$ CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000)) RETURNS VARCHAR(1000) DETERMINISTIC BEGIN DECLARE temp VARCHAR(1000); DECLARE ch VARCHAR(1); DECLARE i INT; SET i = 1; SET temp = ''; IF original REGEXP pattern THEN loop_label: LOOP IF i>CHAR_LENGTH(original) THEN LEAVE loop_label; END IF; SET ch = SUBSTRING(original,i,1); IF NOT ch REGEXP pattern THEN SET temp = CONCAT(temp,ch); ELSE SET temp = CONCAT(temp,replacement); END IF; SET i=i+1; END LOOP; ELSE SET temp = original; END IF; RETURN temp; END$$ DELIMITER ;

Example execution:

mysql> select regex_replace('[^a-zA-Z0-9\-]','','2my test3_text-to. check \\ my- sql (regular) ,expressions ._,');

Answer6:

we solve this problem without using regex this query replace only exact match string.

update employee set employee_firstname = trim(REPLACE(concat(" ",employee_firstname," "),' jay ',' abc '))

Example:

emp_id employee_firstname

1 jay

2 jay ajay

3 jay

After executing query result:

emp_id employee_firstname

1 abc

2 abc ajay

3 abc

Answer7:

I'm happy to report that since this question was asked, now there is a satisfactory answer! Take a look at this terrific package:

https://github.com/mysqludf/lib_mysqludf_preg

Sample SQL:

SELECT PREG_REPLACE('/(.*?)(fox)/' , 'dog' , 'the quick brown fox' ) AS demo;

I found the package from this blog post as linked on this question.

Answer8:

<strong>UPDATE 2:</strong> A useful set of regex functions including REGEXP_REPLACE have now been provided in MySQL 8.0. This renders reading on unnecessary unless you're constrained to using an earlier version.

<hr>

<strong>UPDATE 1:</strong> Have now made this into a blog post: http://stevettt.blogspot.co.uk/2018/02/a-mysql-regular-expression-replace.html

<hr>

The following expands upon the function provided by Rasika Godawatte but trawls through all necessary substrings rather than just testing single characters:

-- ------------------------------------------------------------------------------------ -- USAGE -- ------------------------------------------------------------------------------------ -- SELECT reg_replace(<subject>, -- <pattern>, -- <replacement>, -- <greedy>, -- <minMatchLen>, -- <maxMatchLen>); -- where: -- <subject> is the string to look in for doing the replacements -- <pattern> is the regular expression to match against -- <replacement> is the replacement string -- <greedy> is TRUE for greedy matching or FALSE for non-greedy matching -- <minMatchLen> specifies the minimum match length -- <maxMatchLen> specifies the maximum match length -- (minMatchLen and maxMatchLen are used to improve efficiency but are -- optional and can be set to 0 or NULL if not known/required) -- Example: -- SELECT reg_replace(txt, '^[Tt][^ ]* ', 'a', TRUE, 2, 0) FROM tbl; DROP FUNCTION IF EXISTS reg_replace; DELIMITER // CREATE FUNCTION reg_replace(subject VARCHAR(21845), pattern VARCHAR(21845), replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT, maxMatchLen INT) RETURNS VARCHAR(21845) DETERMINISTIC BEGIN DECLARE result, subStr, usePattern VARCHAR(21845); DECLARE startPos, prevStartPos, startInc, len, lenInc INT; IF subject REGEXP pattern THEN SET result = ''; -- Sanitize input parameter values SET minMatchLen = IF(minMatchLen < 1, 1, minMatchLen); SET maxMatchLen = IF(maxMatchLen < 1 OR maxMatchLen > CHAR_LENGTH(subject), CHAR_LENGTH(subject), maxMatchLen); -- Set the pattern to use to match an entire string rather than part of a string SET usePattern = IF (LEFT(pattern, 1) = '^', pattern, CONCAT('^', pattern)); SET usePattern = IF (RIGHT(pattern, 1) = '$', usePattern, CONCAT(usePattern, '$')); -- Set start position to 1 if pattern starts with ^ or doesn't end with $. IF LEFT(pattern, 1) = '^' OR RIGHT(pattern, 1) <> '$' THEN SET startPos = 1, startInc = 1; -- Otherwise (i.e. pattern ends with $ but doesn't start with ^): Set start pos -- to the min or max match length from the end (depending on "greedy" flag). ELSEIF greedy THEN SET startPos = CHAR_LENGTH(subject) - maxMatchLen + 1, startInc = 1; ELSE SET startPos = CHAR_LENGTH(subject) - minMatchLen + 1, startInc = -1; END IF; WHILE startPos >= 1 AND startPos <= CHAR_LENGTH(subject) AND startPos + minMatchLen - 1 <= CHAR_LENGTH(subject) AND !(LEFT(pattern, 1) = '^' AND startPos <> 1) AND !(RIGHT(pattern, 1) = '$' AND startPos + maxMatchLen - 1 < CHAR_LENGTH(subject)) DO -- Set start length to maximum if matching greedily or pattern ends with $. -- Otherwise set starting length to the minimum match length. IF greedy OR RIGHT(pattern, 1) = '$' THEN SET len = LEAST(CHAR_LENGTH(subject) - startPos + 1, maxMatchLen), lenInc = -1; ELSE SET len = minMatchLen, lenInc = 1; END IF; SET prevStartPos = startPos; lenLoop: WHILE len >= 1 AND len <= maxMatchLen AND startPos + len - 1 <= CHAR_LENGTH(subject) AND !(RIGHT(pattern, 1) = '$' AND startPos + len - 1 <> CHAR_LENGTH(subject)) DO SET subStr = SUBSTRING(subject, startPos, len); IF subStr REGEXP usePattern THEN SET result = IF(startInc = 1, CONCAT(result, replacement), CONCAT(replacement, result)); SET startPos = startPos + startInc * len; LEAVE lenLoop; END IF; SET len = len + lenInc; END WHILE; IF (startPos = prevStartPos) THEN SET result = IF(startInc = 1, CONCAT(result, SUBSTRING(subject, startPos, 1)), CONCAT(SUBSTRING(subject, startPos, 1), result)); SET startPos = startPos + startInc; END IF; END WHILE; IF startInc = 1 AND startPos <= CHAR_LENGTH(subject) THEN SET result = CONCAT(result, RIGHT(subject, CHAR_LENGTH(subject) + 1 - startPos)); ELSEIF startInc = -1 AND startPos >= 1 THEN SET result = CONCAT(LEFT(subject, startPos), result); END IF; ELSE SET result = subject; END IF; RETURN result; END// DELIMITER ;

<strong>Demo</strong>

Rextester Demo

<strong>Limitations</strong>

<ol> <li>This method is of course going to take a while when the subject string is large. <strong>Update:</strong> Have now added minimum and maximum match length parameters for improved efficiency when these are known (zero = unknown/unlimited).</li> <li>It <strong>won't</strong> allow substitution of backreferences (e.g. \1, \2 etc.) to replace capturing groups. If this functionality is needed, please see this answer which attempts to provide a workaround by updating the function to allow a secondary find and replace within each found match (at the expense of increased complexity).</li> <li>If ^and/or $ is used in the pattern, they must be at the very start and very end respectively - e.g. patterns such as (^start|end$) are not supported.</li> <li>There is a "greedy" flag to specify whether the overall matching should be greedy or non-greedy. Combining greedy and lazy matching within a single regular expression (e.g. a.*?b.*) is not supported.</li> </ol>

<strong>Usage Examples</strong>

The function has been used to answer the following StackOverflow questions:

Recommend

  • Compiling a library in C for Android
  • \\w in PHP preg_replace covers only second byte of UTF-8 chars
  • What is the equivalent of regexp \\Q…\\E in postgresql?
  • What does different namespaces mean in Flex?
  • what will be the sql query for checking same pairs of column values in a table?
  • preg_match breaks with more then 5100 characters
  • Rank by two columns and keep ties
  • Need help converting DataTable to XML VB.NET
  • Pandas split array based on condition
  • How to group a list of lists by date using Linq?
  • Find minimum balance in mobile
  • What are good and safe keylength for El-Gamal?
  • How to Optimize mach_msg_trap
  • How to get the percentage of my brute forcer?
  • Compare a column between 2 csv files and write differences using Python
  • href inside href [duplicate]
  • Finding max value of a weighted subset sum of a power set
  • R h2o.glm - issue with max_active_predictors
  • how to read a file in prolog?
  • How do I add a File Type Association in a Windows Phone 8.1 app manifest?
  • Streaming screenshots over WebRTC as a video stream from iOS
  • IE6 changes DOCTYPE to a bad one
  • Where these are stored?
  • Runtime.exec() gives Error: Could not find or load main class
  • Doctrine/Symfony entity generator and generating entity from one table
  • quiver not drawing arrows just lots of blue, matlab
  • Default parameter as generic type
  • Linq Merge lists
  • gspread or such: help me get cell coordinates (not value)
  • Zurb Foundation _global.scss meta styles for js?
  • How to run “Deployd” on port 80 instead of port 5000 in webserver.
  • Display images in Django
  • Problem deserializing objects from cache on MyBatis 3/Java
  • Trying to switch camera back to front but getting exception
  • align graphs with different xlab
  • QuartzCore.framework for Mono Develop
  • RestKit - RKRequestDelegate does not exist
  • Free memory of cv::Mat loaded using FileStorage API
  • Angular 2 constructor injection vs direct access
  • Programmatically clearing map cache