Ensuring that HTML string has proper number of closing tags

Any suggestions for a reliable method to ensure that a string of HTML code is valid? I'm not too concerned with W3C warnings; I really just need to make sure that every opening tag is closed.

What I'm really working with is user-submitted content that's often copied from another site (usually blurbs of news articles). I have a ton of regexes in place to remove tags, styles, etc. that might cause a conflict with my site, so I'd really like to have a safety net in place to make sure that I don't accidentally remove a closing tag or something that then messes up the display on my site.

My initial thought was to write a function to add every opening tag in the string to an array, excluding <br> and <img> (all other void elements are stripped, anyway). Then do the same thing with every closing tag. Then, if the lengths of the two arrays aren't the same, add closing tags to the end of the string until the lengths do match. It wouldn't be perfect, but it would (err, should) prevent a display error on the entire page.

Thoughts? Or is there a module that would do the same thing... but better?

    print validator($HTML_string);

    sub validator {
        local $_ = shift;
        my (%open, %close);

        # Safety net for the while loops: never iterate more times than there are "<" characters
        my $count = () = /</g;

        # Count opening tags
        my $x = 0;
        while ($x <= $count && m#<(\w+)[^>]*>#g) {
            my $tag = lc($1);
            if ($tag ne 'img' && $tag ne 'br') {
                $open{$tag}++;
            }
            $x++;
        }

        # Count closing tags
        my $y = 0;
        while ($y <= $count && m#</(\w+)>#g) {
            my $tag = lc($1);
            $close{$tag}++;
            $y++;
        }

        # If more opening tags than closing, add closing to make it fit
        for my $key (keys %open) {
            if ($open{$key} > ($close{$key} // 0)) {
                my $fix = $open{$key} - ($close{$key} // 0);

                # add </$key> to the end of $_, $fix number of times
                $_ .= "</$key>" x $fix;
            }
        }

        return $_;
    }

    my (%open, %close);

    # Safety net for the while loop
    my $count = () = /</g;

    # Count opening and closing tags in a single pass
    my $x = 0;
    while ($x <= $count && m#<(/)?(\w+)[^>]*>#g) {
        my $tag = lc($2);
        if ($tag ne 'img' && $tag ne 'br') {
            $open{$tag}  //= 0;
            $close{$tag} //= 0;
            if ($1) { $close{$tag}++; }
            else    { $open{$tag}++; }
        }
        $x++;
    }

    # If more opening tags than closing, add closing to make it fit
    for my $key (keys %open) {
        if ($open{$key} > $close{$key}) {
            # number of missing closing tags (a subtraction, not an assignment)
            my $fix = $open{$key} - $close{$key};

            # add </$key> to the end of $_, $fix number of times
            $_ .= "</$key>" x $fix;
        }
    }
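One limitation of counting per tag name is that the missing closers get appended in hash order, which may not match how the tags were nested. A minimal sketch of a stack-based variant follows, assuming the same simplifications as the code above: only <br> and <img> are treated as void, and no ">" appears inside attribute values or comments. The name close_open_tags is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: append missing closing tags in reverse nesting order.
sub close_open_tags {
    my ($html) = @_;
    my %void = map { $_ => 1 } qw(br img);   # assumption: only void tags left in the input
    my @stack;                               # tags currently open, innermost last

    while ($html =~ m{<(/?)(\w+)[^>]*>}g) {
        my ($is_close, $tag) = ($1, lc $2);
        next if $void{$tag};

        if (!$is_close) {
            push @stack, $tag;
        }
        else {
            # Remove the nearest matching open tag; a stray closer is ignored.
            for my $i (reverse 0 .. $#stack) {
                if ($stack[$i] eq $tag) {
                    splice @stack, $i, 1;
                    last;
                }
            }
        }
    }

    # Whatever is still on the stack was never closed: close innermost first.
    $html .= "</$_>" for reverse @stack;
    return $html;
}

print close_open_tags('<div><p>Some <b>bold text<br>'), "\n";
# prints: <div><p>Some <b>bold text<br></b></p></div>
```

Like the counting version, this only appends closers at the end of the string, so it keeps a truncated blurb from breaking the rest of the page but does not repair tags that are closed in the wrong order mid-string.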


csdude55


phranque
