I work for a large website in Germany. Our application is configured to pass the session ID in the URL if the user does not accept cookies. Googlebot does not accept cookies, so we ended up with lots of different URLs for the same page in the Google index. This is easy to spot with the "inurl:" operator in a Google query.
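To make the duplicate-URL problem concrete, here is a minimal Python sketch that collapses URL variants differing only by a session ID. The parameter names `sid`, `sessionid` and `phpsessid`, and the `;jsessionid=` path suffix used by Java servlet containers, are assumptions for illustration. The post doesn't say which framework or parameter name the site uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; adjust to what your application emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_id(url: str) -> str:
    """Return url with session-ID query parameters and ;jsessionid= removed."""
    parts = urlsplit(url)
    # Keep every query pair whose key is not a session parameter.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    # Servlet containers append ";jsessionid=..." to the path instead of the query.
    path = parts.path.split(";jsessionid=")[0]
    return urlunsplit((parts.scheme, parts.netloc, path,
                       urlencode(query), parts.fragment))
```

With this, `http://www.example.de/shop?item=42&sid=abc123` and `http://www.example.de/shop;jsessionid=ABC?item=42` both map to `http://www.example.de/shop?item=42`, which is the single URL you would want in the index.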
Our workaround for that problem involves disabling the sessionid in the URL for Googlebot. (It makes no sense to have them for bots anyway, since Googlebot will most likely never log into our application with a user account and buy something :-) )
Now my company has received some consulting from Google Germany. One engineer there stated that disabling the sessionid in the URL for Googlebot will be counted as cloaking and will eventually lead to exclusion from the index. On the other hand, the same engineer stated that we must avoid duplicate content (the same content under different URLs on the same domain) by all means. To me these two statements contradict each other and make no sense, since both cannot be followed at once. Is there an official Google way to deal with sessions in URLs?
Will using Sitemaps to communicate the "official" URL eliminate the duplicate URLs from the Google index?
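For reference, a sitemap listing only the session-free URLs could be generated as below. This is a minimal sketch following the sitemaps.org 0.9 schema; `build_sitemap` is a hypothetical helper, and it does not by itself answer whether Google drops duplicates it has already indexed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string listing each distinct URL once, in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys drops repeated URLs while preserving first-seen order.
    for u in dict.fromkeys(urls):
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" as "&amp;", which the sitemap protocol requires.
        ET.SubElement(url_el, "loc").text = u
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```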
Thanks for any help,
Sebastian
P.S.: This is a cross-post from here:
[groups.google.com...]
but no one cared to answer there :-(
[edited by: encyclo at 2:57 am (utc) on Feb. 21, 2008]
[edit reason] fixed formatting [/edit]
Our workaround for that problem involves disabling the sessionid in the URL for Googlebot.
I wouldn't do it only for Googlebot, but for Slurp and MSNBot too, and for all other well-behaved bots.
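A minimal sketch of that idea in Python: append the session ID to links only for cookieless human visitors, and never for the crawlers named above. The user-agent tokens, the `sid` parameter and the `build_link` helper are assumptions for illustration, not an authoritative bot list. User-Agent strings can be spoofed, so this check alone is easy to fool.

```python
# Substrings identifying Googlebot, Yahoo! Slurp and MSNBot (assumed, not exhaustive).
BOT_TOKENS = ("googlebot", "slurp", "msnbot")

def is_known_bot(user_agent: str) -> bool:
    """True if the User-Agent contains one of the known crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)

def build_link(path: str, session_id: str, user_agent: str,
               cookies_enabled: bool) -> str:
    """Hypothetical helper: add ?sid=/&sid= only for cookieless human visitors."""
    if cookies_enabled or is_known_bot(user_agent):
        return path  # cookie carries the session, or it's a bot: keep one clean URL
    sep = "&" if "?" in path else "?"
    return f"{path}{sep}sid={session_id}"
```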
It makes no sense to have them for bots anyway, since Googlebot will most likely never log into our application with a user account and buy something.
That's a sensible conclusion, and one that I think Google and the other major engines would appreciate. Anything that helps them index the site without getting caught up in duplicate content, loops, etc. benefits both you and them.
My understanding is that cloaking by IP in this instance is probably the best solution.
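One common way to implement the IP side of this is the reverse-then-forward DNS lookup that Google documents for verifying Googlebot: resolve the IP to a hostname, check that it ends in `googlebot.com` or `google.com`, then resolve that hostname back and confirm it yields the same IP. This is a swapped-in technique, not something specified in the thread. The sketch below is IPv4-only (`gethostbyname_ex`), and the `reverse`/`forward` parameters are hypothetical hooks so the logic can be exercised without network access.

```python
import socket

# Domains Google documents for its crawler hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        host = reverse(ip)[0].lower()
    except OSError:  # socket.herror / gaierror are OSError subclasses
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # reverse DNS points outside Google's domains
    try:
        addresses = forward(host)[2]  # (hostname, aliases, ipv4_addresses)
    except OSError:
        return False
    # A spoofer can fake reverse DNS for their own IP, but not the forward record.
    return ip in addresses
```

Because the lookups are slow, a site would typically cache the result per IP rather than resolve on every request.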