How does a search engine treat these URLs?




Are they the same?

Patrick Taylor

9:16 am on Jul 27, 2006 (gmt 0)

I have a series of web pages, e.g.:

ht*p://www.site.com/index.php?main_page=index
ht*p://www.site.com/index.php?main_page=about
ht*p://www.site.com/index.php?main_page=contact
etc.

They all look like the homepage, just with different parameters.

To a search engine, are these separate URLs, each indexable with their own content? I should know the answer to this, but actually I'm not so sure.

trillianjedi

9:43 am on Jul 27, 2006 (gmt 0)

To a search engine, are these separate URLs, each indexable with their own content?

Yes.
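At the URL level, the reason is that the query string is part of the URL, so the three addresses differ even though they share one script. A minimal Python sketch of that, using hypothetical example.com stand-ins for the obfuscated URLs (it shows URL identity only, not how any particular engine ranks or indexes them):

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-ins for the URLs in the question.
urls = [
    "http://www.example.com/index.php?main_page=index",
    "http://www.example.com/index.php?main_page=about",
    "http://www.example.com/index.php?main_page=contact",
]

for url in urls:
    parts = urlsplit(url)
    # Same script path every time; only the query string differs.
    print(parts.path, parse_qs(parts.query))

# Compared as whole strings, the three URLs are three distinct keys.
distinct = set(urls)
print(len(distinct))  # 3
```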

kaled

11:25 am on Jul 27, 2006 (gmt 0)

I raised this matter some weeks ago but received zero replies.

Search engines require a means of identifying url parameters as either being content-related (index separately) or function-related (index as a single page).

The method I proposed was that all parameters after a null parameter should be ignored by search engines, thus:-

page.html?param=data&&ignoreme=data2
page.php?&ignoreme=data
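A minimal sketch of how a crawler could apply this proposed convention, written in Python with a hypothetical `strip_after_null_param` helper. It only illustrates the proposal above; no search engine is known to implement it.

```python
def strip_after_null_param(url: str) -> str:
    """Drop every query parameter that follows the first empty ("null") one.

    Illustrates the convention proposed above (hypothetical helper);
    URL fragments (#...) are not handled.
    """
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string at all
    kept = []
    for param in query.split("&"):
        if param == "":
            break  # null parameter: ignore it and everything after it
        kept.append(param)
    return base + ("?" + "&".join(kept) if kept else "")


print(strip_after_null_param("page.html?param=data&&ignoreme=data2"))  # page.html?param=data
print(strip_after_null_param("page.php?&ignoreme=data"))               # page.php
```

Both example URLs collapse to the part a search engine would index under this scheme; URLs without a null parameter pass through unchanged.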

There may be a better way - any suggestions?

Kaled.

trillianjedi

12:49 pm on Jul 27, 2006 (gmt 0)

Search engines require a means of identifying url parameters as either being content-related (index separately) or function-related (index as a single page).

I've always seen this as a matter of good design, i.e. it is up to the webmaster to create good URLs. Every content URL should be unique. Further standards shouldn't really be necessary as long as the URL structure makes sense in its use.

Wherever I've had URLs which represent some form of duplicate content and extend a pre-existing URL, I've always used a "NOINDEX" robots meta tag in the page head to tell SEs not to index them.

For example:-

example.com/page1.html
example.com/page1.html?printview=true

Same page, same content, different markup. In this case the "printview" page is served with a NOINDEX tag in the header.
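A minimal sketch of the print-view case, assuming the page head is generated in Python by a hypothetical `head_for` helper. It emits the standard `<meta name="robots" content="noindex">` tag only when the query string carries `printview=true`, leaving the normal view indexable.

```python
from urllib.parse import urlsplit, parse_qs


def head_for(url: str) -> str:
    """Build a minimal <head> for the page at `url` (hypothetical helper).

    The print view duplicates page1.html's content, so it gets a robots
    noindex meta tag; the normal view is left indexable.
    """
    params = parse_qs(urlsplit(url).query)
    is_print_view = params.get("printview", ["false"])[0] == "true"
    tags = ["<title>Page 1</title>"]
    if is_print_view:
        tags.append('<meta name="robots" content="noindex">')
    return "<head>" + "".join(tags) + "</head>"


print(head_for("http://example.com/page1.html"))
print(head_for("http://example.com/page1.html?printview=true"))
```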

mod_rewrite also makes most of this easy work: it can make variable names appear to the browser or spider as directories, and those directories can then be blocked from being indexed with robots.txt.
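A rough Python sketch of that combination, under stated assumptions: the `/print/` directory and the `rewrite` function are hypothetical, and the regex only mimics what a mod_rewrite rule would do internally (it is not Apache configuration). The robots.txt check uses Python's standard `urllib.robotparser`.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical rewrite: /print/page1.html is served by page1.html?printview=true.
# The spider only ever sees the directory-style URL.
REWRITE = re.compile(r"^/print/([\w-]+)\.html$")


def rewrite(path: str) -> str:
    """Map a directory-style URL to the script URL that actually serves it."""
    m = REWRITE.match(path)
    return f"/{m.group(1)}.html?printview=true" if m else path


# robots.txt blocking the "virtual" /print/ directory.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /print/",
])

for path in ["/page1.html", "/print/page1.html"]:
    allowed = robots.can_fetch("*", "http://example.com" + path)
    print(f"{path:20} -> {rewrite(path):30} crawlable={allowed}")
```

Because robots.txt rules match by path prefix, turning the parameter into a directory is what makes a single `Disallow` line cover every print view.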

lorax

3:47 pm on Jul 27, 2006 (gmt 0)

To a search engine, are these separate URLs, each indexable with their own content?

Maybe. It really depends upon the engine.

Patrick Taylor

7:22 pm on Jul 31, 2006 (gmt 0)

Thanks for the replies. I suppose the search engine is Google, and they're not URLs I have much control over.

[edited by: Patrick_Taylor at 7:24 pm (utc) on July 31, 2006]

Jim Catanich

2:28 pm on Aug 1, 2006 (gmt 0)

The question should be posed differently: how do I get the SEs to index "query string" parameter pages? Most dynamic applications (e.g. ASP / PHP) generate links like these:

a.asp?part-number=016-1660-01
a.asp?part-number=02684-00002
a.asp?part-number=02684-00003
a.asp?part-number=02684-00004
a.asp?part-number=02684-60001

If the above links are hard-coded, they will be indexed, but nothing below them will be. Dynamic code design sets up very serious problems for all SEs. Consider the following scenario:

<% For i = 1 To 1000000 %>
<% For j = 1 To 1000000 %>
<a href="a.asp?id=<%=i%>&amp;op=<%=j%>">item <%=i%>/<%=j%></a>
<% Next %>
<% Next %>

This type of dynamic looping would create a trillion (1,000,000 × 1,000,000) distinct URLs and hang the SE.
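The loop bounds multiply, since every (id, op) pair is its own URL. A quick Python check of the scale; the crawl rate is an arbitrary hypothetical figure chosen only to make the number concrete:

```python
# Loop bounds from the example above.
I_MAX = 1_000_000
J_MAX = 1_000_000

url_count = I_MAX * J_MAX  # every (id, op) pair is a distinct URL
print(f"{url_count:,} URLs")  # 1,000,000,000,000 URLs

# Hypothetical crawl rate, chosen only to give a sense of scale.
FETCHES_PER_SECOND = 1_000
seconds = url_count / FETCHES_PER_SECOND
years = seconds / (365 * 24 * 3600)
print(f"about {years:.1f} years at {FETCHES_PER_SECOND:,} fetches/s")
```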

The best workaround is to create a "static" sitemap listing all the URLs. The SEs will index it every time, but they will not follow any scenario that could create a "loop".
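The post doesn't say which sitemap format it means (a plain HTML page of links would also fit). One option is an XML file in the sitemaps.org format, sketched here in Python with a hypothetical `build_sitemap` helper and the part numbers from the example above. Note that a single sitemap file is limited to 50,000 URLs, so a large catalogue needs several files.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default xmlns, without a prefix.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(base_url: str, part_numbers: list[str]) -> str:
    """Return a sitemaps.org-style XML document listing one URL per part."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for part in part_numbers:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        # urlencode escapes any characters that are unsafe in a query string.
        loc.text = f"{base_url}?{urlencode({'part-number': part})}"
    return ET.tostring(urlset, encoding="unicode")


parts = ["016-1660-01", "02684-00002", "02684-00003", "02684-00004", "02684-60001"]
print(build_sitemap("http://www.example.com/a.asp", parts))
```

Every URL is listed explicitly from a finite data source, so the crawler never has to follow generated links to discover them.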

Jim Catanich

[edited by: trillianjedi at 4:20 pm (utc) on Aug. 1, 2006]
[edit reason] Examplifying - please see TOS, thanks. [/edit]