Forum Moderators: coopster


Regex: find string matches within anchor tags

match href and link text

         

Scooter

1:25 pm on Nov 26, 2008 (gmt 0)

10+ Year Member



Hello, I am looking for a regex that will find matches in the href and/or the link text, and let me replace the entire link with some text if any string match is found.

I have a regex that seems to match the href portion, and with preg_replace it lets me replace the entire link with ordinary text. But it won't match the link-text portion:
$regex = "#(<a[^>]*?)$value(.*?)<\/a>#si";

And this regex seems to match the link text, but when I try to preg_replace the entire link, it only replaces the link-text portion and not the entire anchor tag:
$regex = "#(?!<a*?)($value)(.*?</a>)#si";
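A quick way to see what that second regex actually replaces. This is a small sketch; the value "My" and the sample link are made up for illustration. Because the match only starts at $value, preg_replace can only replace from there up to </a>, and the opening <a ...> tag is never part of the match:

```php
<?php
// Hypothetical sample data: what the second regex from the question matches.
$value = 'My'; // the string being searched for (illustrative)
$html  = '<a href="/x">My Page</a>';

// The match begins at $value, so the opening <a href="/x"> is not part of it.
$regex  = "#(?!<a*?)($value)(.*?</a>)#si";
$result = preg_replace($regex, 'test', $html);

echo $result, "\n"; // the <a href="/x"> prefix is left behind
```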

Basically, I'd like to find string matches in everything between the opening and closing anchor tags (<a and </a>), and replace the entire anchor tag if any match is found. I think that would probably be enough for my needs, unless someone can easily come up with a regex that matches the href and link text specifically.

Any help would be appreciated.

[edited by: eelixduppy at 8:56 am (utc) on Nov. 27, 2008]
[edit reason] disabled smileys [/edit]

coopster

4:05 pm on Dec 1, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Are you trying to replace the entire anchor every time, whether an href was matched or the node's text was matched? Perhaps a plain example would help. For example, if the link was
<a class="mylink" href="http://www.example.com/page.htm">My Page</a>

... what would be the string you are attempting to match? And what would the replacement anchor element look like after the replacement?

Scooter

5:08 pm on Dec 1, 2008 (gmt 0)

10+ Year Member



I am trying to replace the entire anchor with a plain string every time there is a match. Any kind of string is fine.

For example, say I am looking for a case-insensitive match for the string "my", "page" or ".com". I would like to replace the anchor code below with just the text "test" (no tags, nothing, just text).

<a class="mylink" href="http://www.example.com/page.htm">My Page</a>

I'm assuming that, with the strings I am looking for (e.g. ?a=r), a match is unlikely to occur in the other attributes, but I'm compromising on what I think is the easiest solution because I am so stuck on this.

If a string comparison can be made against just the href and the link text, that would be perfect (I just couldn't get anything like that to work). If not, something that finds matches anywhere in the anchor tag would still work for me.
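The "check only the href and the link text" version can be sketched with preg_replace_callback, which lets each anchor be inspected before deciding whether to replace it. The function name replace_matching_links and the sample HTML are hypothetical, and the sketch assumes simple, well-formed anchors; a regex cannot handle arbitrary HTML reliably.

```php
<?php
/**
 * Replace every <a ...>...</a> whose href or link text contains one of
 * $needles (case-insensitive) with $replacement. Other links are kept as-is.
 * Hypothetical helper; assumes simple, well-formed anchors.
 */
function replace_matching_links(string $html, array $needles, string $replacement): string
{
    // Group 1: the attributes of the opening tag, group 2: the link text.
    $anchor = '#<a\b([^>]*)>(.*?)</a>#is';

    return preg_replace_callback($anchor, function (array $m) use ($needles, $replacement): string {
        // Pull the href value (quoted or unquoted) out of the attributes.
        $href = '';
        if (preg_match('#\bhref\s*=\s*(["\']?)([^"\'\s>]*)\1#i', $m[1], $h)) {
            $href = $h[2];
        }
        $text = strip_tags($m[2]); // ignore any markup inside the link text

        foreach ($needles as $needle) {
            if (stripos($href, $needle) !== false || stripos($text, $needle) !== false) {
                return $replacement; // drop the whole anchor
            }
        }
        return $m[0]; // no match: leave the link untouched
    }, $html);
}

$html = '<p><a class="mylink" href="http://www.example.com/page.htm">My Page</a> '
      . '<a href="/about">About us</a></p>';
echo replace_matching_links($html, ['my', 'page', '.com'], 'test'), "\n";
```

Note that the class attribute is never searched here, which avoids the accidental attribute matches mentioned above.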

This is what I was trying to get to work, if anyone can help:
[webmasterworld.com...]

$regex = "/<a\s[^>]*href\s*=\s*([\"\']?)(".$filter_string."[^\" >]*?)\\1[^>]*>(.*)<\/a>/siU";
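Whatever the final pattern looks like, a PHP variable dropped into a regex (like $filter_string above) is itself treated as regex syntax. A string such as ?a=r starts with a quantifier, so on its own it does not even compile; preg_quote() escapes it so it is matched literally. A minimal sketch with made-up URLs:

```php
<?php
// Hypothetical filter string containing regex metacharacters.
$filter_string = '?a=r';

// Interpolated raw, the pattern is "/?a=r/": "?" has nothing to repeat, so the
// pattern fails to compile and preg_match() returns false (warning suppressed).
$raw = @preg_match('/' . $filter_string . '/', 'page.php?a=r');

// preg_quote() escapes metacharacters (and the "/" delimiter) for a literal match.
$safe = '/' . preg_quote($filter_string, '/') . '/i';
$ok   = preg_match($safe, '<a href="page.php?a=r">Go</a>');
```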

SarK0Y

3:16 pm on Dec 6, 2008 (gmt 0)

10+ Year Member



Hi, Scooter.
I suggest capturing it like this:
$patterns1 = "/\<a.+href=\s*\'\s*([\w\.]*)\s*\'[^\>]*\>/i";
$postText="<a id='5' href='my.com' >my5</a>";
$matches=null;
preg_match_all($patterns1, $postText, $matches);
Then sieve through $matches and replace what you want.
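The "capture everything, then sieve" idea can be sketched as follows. This version changes the pattern to capture the whole anchor (not just the opening tag) so the entire link can be swapped out; the sample HTML, the $wanted list and the "test" replacement are hypothetical.

```php
<?php
// Hypothetical sample data.
$postText = "<a id='5' href='my.com' >my5</a> and <a id='6' href='other.org' >x</a>";
$wanted   = ['my.com'];

// Whole anchor; group 1 is a single-quoted href made of word characters and dots.
$pattern = "/<a\s[^>]*?href=\s*'\s*([\w.]*)\s*'[^>]*>.*?<\/a>/is";
preg_match_all($pattern, $postText, $matches, PREG_SET_ORDER);

$result = $postText;
foreach ($matches as $m) {
    // $m[0] is the full <a ...>...</a>, $m[1] the captured href.
    if (in_array($m[1], $wanted, true)) {
        $result = str_replace($m[0], 'test', $result); // only sieved links change
    }
}
echo $result, "\n";
```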

[edited by: SarK0Y at 3:43 pm (utc) on Dec. 6, 2008]

Scooter

10:21 pm on Dec 6, 2008 (gmt 0)

10+ Year Member



Thanks SarK0Y,

Say I was trying to find and replace certain links in a web page: how would I go about it with preg_match_all, since preg_match_all only stores the matches, right? (Currently I'm using preg_replace.)

Also, if I understand your suggestion correctly, your solution will find all the anchor tags in a page. I'm only looking for certain links with specific strings in them, which is why I have a PHP variable in the regex.

Let me know if I understood you correctly.

SarK0Y

12:29 am on Dec 7, 2008 (gmt 0)

10+ Year Member



Hi, Scooter.
If you need to replace the matched patterns with a string (or an array of strings) right away, use preg_replace; if the replacement string has to be built dynamically, use preg_replace_callback; if the replacement needs additional checks and not every match should be replaced, use preg_match_all and sieve the resulting array.
Addendum: in your case, I don't know how to build the regex without using capturing groups '(...)', so I suggested preg_match_all.
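To illustrate the middle case, where the replacement has to be built at run time: in this hypothetical sketch each link becomes its text plus a running footnote number, something a fixed preg_replace replacement string cannot produce. footnote_links is an illustrative name, and the pattern only handles double-quoted href values.

```php
<?php
// Hypothetical helper: turn every link into "text [n]" and collect the URLs.
function footnote_links(string $html, array &$urls): string
{
    $urls = [];
    return preg_replace_callback(
        '#<a\b[^>]*\bhref\s*=\s*"([^"]*)"[^>]*>(.*?)</a>#is',
        function (array $m) use (&$urls): string {
            $urls[] = $m[1];                          // remember the href
            return $m[2] . ' [' . count($urls) . ']'; // link text plus a running number
        },
        $html
    );
}

$urls = [];
echo footnote_links('See <a href="/a">A</a> and <a href="/b">B</a>.', $urls), "\n";
// See A [1] and B [2].
```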

[edited by: SarK0Y at 12:35 am (utc) on Dec. 7, 2008]

doodlebee

3:48 am on Dec 7, 2008 (gmt 0)

10+ Year Member



I don't know if this is *exactly* what you need, but if I'm understanding your question correctly, you're looking for anchor tags that contain certain input, and that input should be replaced?

I've used this (or a version of it - sometimes I need to mess with it a little bit) several times and it works nicely:


$content = preg_replace('/<p>\s*<!--(.*)-->\s*<\/p>/i', 'replacement', $content);

The "<!--(.*)-->" section looks for a string in my content that looks like an HTML comment, such as <!-- text here -->. The (.*) part covers the "text here"; it can be anything. The pattern just needs "<!--" at the beginning and "-->" at the end, and preg_replace swaps the whole match for whatever I want.

It also looks for <p> at the beginning and </p> at the end; the \s* allows any whitespace after the "<p>" and before the "<!--" (and likewise between "-->" and "</p>").
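A quick check of what that pattern replaces, using made-up content. preg_replace substitutes the entire match, so the <p> tags and comment markers go along with the text; keeping them means capturing them and writing them back with $1 and $3:

```php
<?php
// Hypothetical content with a "command" comment in its own paragraph.
$content = "<p>Intro</p>\n<p> <!-- command here --> </p>\n<p>Outro</p>";

// The whole match (tags and comment markers included) is replaced.
$result = preg_replace('/<p>\s*<!--(.*)-->\s*<\/p>/i', 'replacement', $content);

// Capture the wrapper pieces and put them back to change only the comment text.
$kept = preg_replace('/(<p>\s*<!--)(.*)(-->\s*<\/p>)/i', '$1 new text $3', $content);

echo $result, "\n", $kept, "\n";
```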

Hopefully that's what you're looking for :)

Scooter

10:07 am on Dec 7, 2008 (gmt 0)

10+ Year Member



Thanks, y'all!
I think I'll go with preg_replace.

Doodlebee, do you know any way to replace the entire "<!--(.*)-->" section with whatever you want, rather than just what is between the "<!--" and "-->" comment markers?

I might be able to work with that, if I can put a PHP variable in it, which brings me to my other question. If I were using DOS-style wildcards (*), I'd be looking for the regex equivalent of *$match_string*, as in: find any string that contains $match_string, regardless of what comes before or after it (as long as it is between the <a> tags). If a match is found, replace the anchor tag and everything in it with something else.
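A regex counterpart of *$match_string*, limited to a single anchor, might look like the sketch below (replace_links_containing is a hypothetical helper and the sample HTML is made up). It uses a "tempered" dot, (?:(?!</a>).), instead of a plain .*?, so the search cannot leak from one link into the next, and preg_quote() so the search string is matched literally.

```php
<?php
/**
 * Replace any <a ...>...</a> whose opening tag or link text contains $needle
 * (case-insensitive). (?:(?!</a>).) matches any character that does not start
 * "</a>", so a match always stays inside one link.
 */
function replace_links_containing(string $html, string $needle, string $replacement): string
{
    $n = preg_quote($needle, '#'); // treat the search string literally
    $pattern = '#<a\b(?:(?!</a>).)*?' . $n . '(?:(?!</a>).)*</a>#is';
    return preg_replace($pattern, $replacement, $html);
}

$html = '<a href="/x">Home</a> and <a class="mylink" href="http://www.example.com/page.htm">My Page</a>';
echo replace_links_containing($html, 'page', 'test'), "\n";
```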

Currently I've come up with versions of *?$value(.*?), like in the first post in this thread, but they either match only the link and not the link text, or don't replace the entire anchor tag.

Basically I want to replace a link with a string if a certain kind of string is found between the anchor tags.

[edited by: Scooter at 10:11 am (utc) on Dec. 7, 2008]

doodlebee

2:45 pm on Dec 7, 2008 (gmt 0)

10+ Year Member



Yes. The above code looks for a paragraph containing just a comment, such as "<p> <!-- comment here --> </p>", and preg_replace swaps the whole match, <p> tags included, for the replacement. (That's why I called it my "base" thing: it applies to so many different things, and I just edit it as I need to.) I haven't had to do this in a long time, so I can't recall off the top of my head exactly what you're looking for, but I *believe* that if you just want to replace the entire link, you would do:


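$content = preg_replace('/<a\s(.*?)<\/a>/is', 'replacement text here', $content);

I *think* that might do it (\s matches a single whitespace character, .*? matches anything, as little as possible, up to the first </a>, and the s flag lets it span line breaks), but you'll have to play with it and see.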
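Whether the middle part is greedy (.*) or lazy (.*?) matters as soon as a line holds more than one link. A small comparison with made-up content:

```php
<?php
// Hypothetical line with two links.
$content = "<a href='1'>A</a> keep this <a href='2'>B</a>";

// Greedy: (.*) runs to the LAST </a>, so the text between the links is eaten too.
$greedy = preg_replace('/<a\s*(.*)\s*<\/a>/i', 'X', $content);

// Lazy: (.*?) stops at the FIRST </a>, so each link is replaced on its own.
$lazy = preg_replace('/<a\s(.*?)<\/a>/is', 'X', $content);

echo $greedy, "\n", $lazy, "\n";
```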

Just for the record, the base thing (preg_replace('/<p>\s*<!--(.*)-->\s*<\/p>/i', 'replacement', $content)) is what I use in a function for WordPress stuff. So when I'm in a WordPress post, I can write something like "<!-- command here -->" and the function searches the post ($content) for the pattern above and replaces it with 'replacement'. (Hopefully that gives you a better idea of what I use it for.) It's very versatile.

>> I might be able to work with that, if I can put a php variable in it.

You should be able to. Going on what I said before:

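$content = preg_replace('/<a\s(.*?)<\/a>/is', $var, $content);

That's if you want the replacement to be a variable. Pass $var itself, not '$var': PHP does not interpolate inside single quotes, so '$var' would insert the literal text $var. I can't imagine the link itself would be a variable, though!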
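The quoting difference is easy to check. In this sketch the values are made up; a single-quoted '$var' reaches preg_replace as the literal characters $var, while the bare variable passes its value:

```php
<?php
$var     = 'replacement text'; // hypothetical replacement
$content = '<p><a href="/x">link</a></p>';

// Single quotes: no interpolation, so the literal characters $var are inserted.
$literal = preg_replace('/<a\s(.*?)<\/a>/is', '$var', $content);

// Passing the variable itself inserts its value.
$value = preg_replace('/<a\s(.*?)<\/a>/is', $var, $content);
```

If the replacement value could itself contain sequences like $1 or \1, preg_replace would treat them as backreferences; returning the value from a preg_replace_callback callback inserts it verbatim instead.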

>> I'd be looking for the regex equivalent of *$match_string*, as in find any string that has $match_string in it regardless of what is before or after the $match_string...

You might be looking for strpos(), so you can do:


if (strpos($content, $var) !== false) {
    $content = preg_replace('/<a\s(.*?)<\/a>/is', 'replacement text here', $content);
}

where $var is what you're looking for; if it's found, then do the replacement. strpos() looks for the exact characters (case-sensitive; use stripos() for case-insensitive) and returns the numeric position of the first occurrence, or false if there isn't one. That's the tricky part: a match at position 0 counts as "false" in a loose comparison, so != or == can misfire, and you have to use !== false (or === false) to make it work.
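The return values of strpos() and stripos() can be checked directly; the sample string here is made up:

```php
<?php
$content = 'my page';

// strpos() returns the int offset of the first occurrence, or false if absent.
$atStart  = strpos($content, 'my');    // 0: found at the very beginning
$missing  = strpos($content, 'zzz');   // false: not found
$caseless = stripos($content, 'PAGE'); // 3: case-insensitive search

// 0 is loosely "falsy", so compare strictly against false.
$found = strpos($content, 'my') !== false; // true
```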

>> Basically I want to replace a link with a string if a certain kind of string is found between the anchor tags.

If I were to write a function for this, it'd probably look something like this:


function find_link($content) {
    $find = '/<a\s.*?<\/a>/is';          // any anchor element, matched lazily
    $replace = 'text replacement here';
    if (preg_match($find, $content)) {    // only bother replacing if a link exists
        $content = preg_replace($find, $replace, $content);
    }
    return $content;
}

If it's a particular *kind* of link, for example one pointing to a certain site, then you could change the above $find to:

$find = '/<a\s[^>]*somesite\.com\/.*?<\/a>/is';

I don't know how accurate that is. It's totally untested (and I'm experiencing PHP brain-fry this weekend, so knowing my luck this week, if you try it right out of the box it'll throw errors), but it should give you a starting point. I'm also not quite sure how you'd actually make the function run. I typically code for WordPress, which has built-in hooks you can readily use for custom code (in the above case, I'd end it with

add_filter('the_content', 'find_link');

so it would run on the post content and replace the link).

But that should give you something to go on anyway.

SarK0Y

7:06 pm on Dec 7, 2008 (gmt 0)

10+ Year Member



Hi, Scooter.
We have two possible tasks:
>> Simple task: find '<a....href="my.com"....>.......</a>' and replace it with '<a href="newmy.com">...</a>'. For that, a single regex applied to the HTML is quite enough.
>> More difficult task: insert a new URL while preserving all the other attributes of the '<A>' tag. Doing that with nothing but preg_replace, without additional code, would be fantastic, IMHO. If it's possible, please show me that regex!
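The "more difficult task" can be handled by preg_replace alone with capture groups and backreferences: capture the part of the tag before the URL plus the quote character, then write them back around the new URL. A sketch assuming the href value is quoted; swap_href and the sample link are hypothetical.

```php
<?php
/**
 * Swap the href value of matching links while keeping every other attribute.
 * Group 1: from "<a" up to and including "href=", group 2: the quote character.
 */
function swap_href(string $html, string $oldUrl, string $newUrl): string
{
    $pattern = '/(<a\s[^>]*?\bhref\s*=\s*)(["\'])' . preg_quote($oldUrl, '/') . '\2/i';
    // ${1} and ${2} re-insert the captures; braces keep them apart from the URL.
    $replacement = '${1}${2}' . $newUrl . '${2}';
    return preg_replace($pattern, $replacement, $html);
}

echo swap_href("<a id='5' href='my.com' class=\"x\">my5</a>", 'my.com', 'newmy.com'), "\n";
```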

Scooter

3:26 am on Dec 8, 2008 (gmt 0)

10+ Year Member



Thanks. I've tried all kinds of regexes and haven't found the right one yet.
I'll keep at it.

Does anyone know why
$regex = "#(<a[^>]*?)$value(.*?)<\/a>#si";

would only find string matches contained in the opening "<a ...>" tag, and not in the link text between the tags as well?
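The reason is the [^>]*? part: it cannot cross the ">" that closes the opening tag, so $value has to appear before that ">", i.e. inside <a ...>, for the pattern to match at all. A small check with a hypothetical $value and sample links:

```php
<?php
$value = 'page'; // hypothetical search string (not escaped with preg_quote here)
$regex = "#(<a[^>]*?)$value(.*?)<\/a>#si";

// "page" inside the opening tag: [^>]*? can reach it.
$inHref = preg_match($regex, '<a href="/page.htm">Home</a>');

// "page" only in the link text: [^>]*? stops at ">" first, so no match.
$inText = preg_match($regex, '<a href="/x">My page</a>');
```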

SarK0Y

9:51 am on Dec 9, 2008 (gmt 0)

10+ Year Member



>>$regex = "#(<a[^>]*?)$value(.*?)<\/a>#si";
Maybe like this: "/(\<a[^\>]*\>)$value(.*?)\<\/a\>/si";

[edited by: SarK0Y at 9:56 am (utc) on Dec. 9, 2008]

Scooter

2:53 pm on Dec 9, 2008 (gmt 0)

10+ Year Member



I came up with this, but it matches only the link text:
$regex = "#(<a[^\>]*\>)$value(.*?)<\/a>#si";

I'm wondering if I need something like this:
$regex = "#(<a[^\>]*?$value.*?\>)$value(.*?)<\/a>#si";
(but this doesn't match anything.)

[edited by: eelixduppy at 6:26 pm (utc) on Dec. 9, 2008]
[edit reason] disabled smileys [/edit]

SarK0Y

3:46 pm on Dec 9, 2008 (gmt 0)

10+ Year Member



Hmmm... '>' and '<' are special symbols, so they must be escaped with '\'. Apart from that, I suggest this regex: "/\<a\s+.+href=\s*\'\s*([\w\.]*)\s*\'[^\>]*\>/i"; thanks to the '(...)' group, you will get the part of the match you need and can replace it with whatever you want.

Scooter

4:32 pm on Dec 9, 2008 (gmt 0)

10+ Year Member



SarK0Y, first thanks.

As far as I know, '>' and '<' are not special symbols in regex.

Can you explain a little more about what you mean by '(...)', and why your pattern has no $value variable? You totally lost me there.

SarK0Y

12:14 am on Dec 10, 2008 (gmt 0)

10+ Year Member



Scooter, 'preg_match_all' builds a multidimensional array that stores the full matches and the matches for each subpattern; '(...)' is a subpattern. A simple example (with preg_match, which returns only the first match):
$str="<a id='5' href='my.com' >My</a>";
$ptn="/\<a\s.*id=\s*\'([\w\.]+)\'\s.*href=\s*\'([\w\.]+)\'[^\>]*\>/";
preg_match($ptn, $str, $matches);
print_r($matches);
output:

Array
(
[0] => <a id='5' href='my.com' >
[1] => 5
[2] => my.com
)
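For comparison, preg_match_all collects every match. In this sketch the sample string is made up, and the pattern swaps the example's .* for [^>]*? so a match cannot run from one tag into the next:

```php
<?php
$str = "<a id='5' href='my.com' >My</a> <a id='7' href='b.org' >B</a>";
$ptn = "/<a\s[^>]*?id=\s*'([\w.]+)'\s[^>]*?href=\s*'([\w.]+)'[^>]*>/";

// Default PREG_PATTERN_ORDER: $m[0] = all full matches, $m[1] = all group-1 captures, ...
preg_match_all($ptn, $str, $m);

// PREG_SET_ORDER: one array per match, each shaped like preg_match()'s result.
preg_match_all($ptn, $str, $sets, PREG_SET_ORDER);

print_r($m);
```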

[edited by: SarK0Y at 12:17 am (utc) on Dec. 10, 2008]

[edited by: eelixduppy at 12:32 am (utc) on Dec. 10, 2008]
[edit reason] formatting [/edit]