I'm filtering through a string (pulled from a text file) and removing all <script> and </script> tags using preg_replace. For some reason, it removes the actual text "script" but leaves the <> and </>. I've tried subbing in /< (to try and treat it as a literal), but that just generates errors. How do I get it to remove the brackets as well? The input is <script>Text</script>. Here's the code:

$file = file_get_contents($directory . "original-" . $name);
$file = htmlentities($file);
$file = preg_replace('<script>', '', $file);
$file = preg_replace('<\script>', '', $file);

And here is the output:



$html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);

But you might also want to have a look at the strip_tags function.
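As a rough comparison of the two suggestions, the sketch below runs both on the same string. The sample markup and the variable names are made up for illustration. Note that strip_tags only removes the tags themselves, so the text inside the script element is left behind.

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = '<p>Hello</p><script>alert(1)</script>';

// The regex removes the whole script element, including its body.
$viaRegex = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);

// strip_tags removes every tag but keeps the text between them.
$viaStripTags = strip_tags($html);

var_dump($viaRegex);     // string(12) "<p>Hello</p>"
var_dump($viaStripTags); // string(13) "Helloalert(1)"
```

Which one fits depends on whether the script body should survive as plain text or disappear entirely.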


The pattern you pass to the preg_* functions must have a delimiter before and after it. PHP allows many different delimiters, so it is treating your angle brackets as the regexp delimiters rather than as part of the pattern. I ordinarily use { and } as delimiters; many other people use slashes, hash signs, square brackets, or parentheses. Angle brackets are also permitted as delimiters, which is why your pattern fails.

You can solve this by adding some delimiters around your patterns, e.g.:

$file = preg_replace('/<script>/', '', $file);
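To make the delimiter behaviour concrete, here is a minimal sketch using the input from the question. The variable names are mine. It also illustrates something that seems worth checking in the original code: once htmlentities has run, the brackets have become &lt; and &gt;, so a pattern containing a literal < no longer matches.

```php
<?php
$input = '<script>Text</script>';

// '<' and '>' are taken as delimiters, so the pattern is just "script".
$broken = preg_replace('<script>', '', $input);
var_dump($broken); // string(9) "<>Text</>"

// With explicit delimiters the brackets are part of the pattern.
// '#' is used for the closing tag so the slash needs no escaping.
$fixed = preg_replace('/<script>/', '', $input);
$fixed = preg_replace('#</script>#', '', $fixed);
var_dump($fixed); // string(4) "Text"

// After htmlentities there is no literal '<' left to match.
$encoded = htmlentities($input); // &lt;script&gt;Text&lt;/script&gt;
$stillEncoded = preg_replace('/<script>/', '', $encoded);
var_dump($stillEncoded === $encoded); // bool(true)
```

If that last case applies, running the replacements before htmlentities is one way around it.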

Also, note that PHP regular expressions are case sensitive, so your pattern is foiled by a tag that says <SCRIPT> or <Script>. The i modifier after the pattern (after the closing delimiter) makes it case insensitive (/<script>/i). Also, there are many different ways to write HTML tags that are still interpreted by the browser, e.g.:

<script type="text/javascript">...</script>
<script src="..." />
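A small sketch of how the i and s modifiers deal with some of these variants, using the pattern from the other answer. The helper name strip_script_blocks and the sample strings are hypothetical. The self-closing form has no closing tag, so this particular pattern leaves it alone, which is one example of a variant slipping through.

```php
<?php
// Hypothetical helper wrapping the pattern from the earlier answer.
// i = case insensitive, s = let '.' match newlines as well.
function strip_script_blocks(string $html): string
{
    return preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
}

$upper     = strip_script_blocks('<SCRIPT>alert(1)</SCRIPT>ok');
$multiline = strip_script_blocks("<script type=\"text/javascript\">\nalert(1)\n</script>ok");
$selfClose = strip_script_blocks('<script src="a.js" />ok');

var_dump($upper);     // string(2) "ok"
var_dump($multiline); // string(2) "ok"
var_dump($selfClose); // unchanged: there is no </script> to match
```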

On a side note, and maybe I'm reading too much into your question, you should not, I repeat, not use regexps to parse HTML, and especially not to sanitize it.
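For what it's worth, one parser-based alternative is PHP's DOMDocument. The following is only a sketch, assuming the DOM extension is available; the sample markup is invented, and removing script elements alone is not a complete sanitizer (inline event handlers, for instance, are untouched).

```php
<?php
// Hypothetical sample input.
$html = '<div><p>Hello</p><SCRIPT type="text/javascript">alert(1)</SCRIPT></div>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Copy the live node list to an array first, because removing nodes
// while iterating over the live list would skip elements.
foreach (iterator_to_array($doc->getElementsByTagName('script')) as $node) {
    $node->parentNode->removeChild($node);
}

// saveHTML returns a full document (doctype, html and body wrappers).
$clean = $doc->saveHTML();
echo $clean;
```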




