Hinnerk Altenburg

Web Developer in Hamburg, Germany

Strip all HTML tags with Perl like PHP’s strip_tags() does


The Perl regular expression (regexp/regex) equivalent to PHP’s strip_tags() is:

while ($string =~ s/<\S[^<>]*(?:>|$)//gs) {};

Please note that it also treats an opening “<” followed by a non-whitespace character as the start of a tag and strips everything after it, even if the tag is never closed by a “>”. This is the same behavior as PHP’s strip_tags().
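To make that behavior concrete, here is a minimal sketch that wraps the substitution in a small subroutine and runs it on a few inputs. The name strip_tags_regex and the sample strings are my own illustration, and the outputs in the comments describe what this regexp does, not what any particular PHP version returns.

```perl
use strict;
use warnings;

# Hypothetical helper name; it only wraps the substitution shown above.
sub strip_tags_regex {
    my ($string) = @_;
    # Repeat until nothing matches any more: one pass can leave behind
    # text that only matches as a tag on the next pass.
    1 while $string =~ s/<\S[^<>]*(?:>|$)//gs;
    return $string;
}

# Ordinary, properly closed tags are removed.
print strip_tags_regex('<p>Hello <b>world</b></p>'), "\n";   # Hello world

# A "<" followed by whitespace is not treated as a tag.
print strip_tags_regex('1 < 2 and 3 > 2'), "\n";             # 1 < 2 and 3 > 2

# An unclosed "<b" swallows everything up to the end of the string.
print strip_tags_regex('Price <b unclosed text'), "\n";      # "Price " (with trailing space)
```

The loop is written as "1 while ..." here, which behaves the same as the empty while block above.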

Update: This regexp only passes my tests against PHP 4.x; PHP 5.x is considerably smarter when it comes to edge cases. Building a Perl equivalent will be a challenge, as all the different approaches on CPAN also fail the test.

Update 2010-07-07: I’m currently porting strip_tags() from the C source code of PHP 5.3.2 to a CPAN module. Stay tuned.

Written by Hinnerk

December 23rd, 2009 at 2:30 pm

Posted in English


