And wouldn't that mean that the loop never executes in the first place, even once, because the condition fails at the outset?
I'm lost.
In theory, there would never be an <a href without a closing </a>, so this shouldn't be an issue. If one did exist, then the entire string would become a link, which would be a whole different type of problem! LOL
In practice, elsewhere in the script I convert [ipsum.com...] to:
<a href='https://www.ipsum.com' target='_new'>https://www.ipsum.com</a>
Then, if the user modifies their post, I remove the tag and keep its contents before converting it again. If I don't, I end up with something like:
<a href='<a href='https://www.ipsum.com' target='_new'>https://www.ipsum.com</a>'><a href='https://www.ipsum.com' target='_new'>https://www.ipsum.com</a></a>
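To make that edit cycle concrete, here is a rough sketch in JavaScript (an assumption on my part, based on the mention of contenteditable). The names linkify, unwrap, and relink are made up for illustration, and URL_RE is a deliberately simplified stand-in for whatever the real script uses to find URLs. It only shows why unwrapping first keeps the conversion repeatable:

```javascript
// Simplified URL matcher (assumption): stops at whitespace, quotes, and angle brackets.
const URL_RE = /https?:\/\/[^\s<>'"]+/g;

// Hypothetical converter: wraps each bare URL in an anchor tag.
function linkify(text) {
  return text.replace(URL_RE, (url) => `<a href='${url}' target='_new'>${url}</a>`);
}

// Hypothetical unwrapper: replaces each anchor with its inner text.
function unwrap(html) {
  return html.replace(/<a\b[^>]*>(.*?)<\/a>/gi, "$1");
}

// Unwrap first, then convert again, so repeated edits give the same result.
function relink(html) {
  return linkify(unwrap(html));
}

const once = linkify("Visit https://www.ipsum.com today");
const nested = linkify(once);   // converting twice without unwrapping nests the tags
const stable = relink(once);    // unwrapping first gives back the single link

console.log(nested.includes("<a href='<a href='")); // → true
console.log(stable === once);                       // → true
```

Skipping the unwrap step is what produces the nested mess: the URL inside the href attribute gets matched and wrapped a second time.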
It's also not uncommon for a user to copy text from another site and paste it to my contenteditable, so I need to be prepared to strip their tags, too. These tags can include style, class, onWhatever, data-whatever, and it's not too uncommon to see made-up attributes! So it's easier to just strip the whole tag.
So this:
<a[^>]* href=(["']).*?\2[^>]*>(.*?)</a>
is supposed to catch anything that:
1. starts with <a
2. is optionally followed by anything other than >
3. followed by a whitespace, then href=
4. followed by either a " or '
5. followed by anything until it gets to the matching " or '; this should be a URL, so it can contain :, /, ?, =, &, %, ;, and I guess any other punctuation that could appear in a query string. And since it could come from another site, there's no guarantee that it's been validated
6. followed by anything until it gets to the first >, which should mark the end of the tag
Then it should match anything until it gets to the first </a>, which should close the opening tag.
And it can forget everything except for that last match, the text between the opening tag and </a>.
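Here is a minimal sketch of how those steps could be checked, again assuming JavaScript. The helper name unwrapAnchors and the sample string are made up for illustration. One difference from the pattern as posted: the quote is the first capture group there, so this sketch refers back to it with \1 and takes the link text from $2. If the real pattern sits inside an outer group, \2 would be the right reference instead.

```javascript
// The pattern from the post, with \1 pointing back at the quote group (see note above).
const ANCHOR_RE = /<a[^>]* href=(["']).*?\1[^>]*>(.*?)<\/a>/gi;

// Hypothetical helper: keeps only the text between the opening tag and </a>.
function unwrapAnchors(html) {
  return html.replace(ANCHOR_RE, "$2");
}

const sample =
  "See <a class='x' href='https://www.ipsum.com' target='_new'>https://www.ipsum.com</a> for details.";

console.log(unwrapAnchors(sample));
// → See https://www.ipsum.com for details.
```

Two caveats with this sketch: the . in the pattern does not match line breaks, so a link whose text spans several lines would be skipped, and an attribute value containing > would end the tag match early.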