[sitemaps.org...]
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
'Together we're announcing [sitemaps.org...] which provides details of the current release of the Sitemaps protocol and will include future updates as we continue to collaborate on this common protocol. By offering an open standard for web sites, webmasters can use a single format to create a catalog of their site URLs and to notify changes to the major search engines. This should make it easier for web sites to provide search engines with content and metadata. And in turn, search engines can spend less time crawling unchanged pages and can update indexes faster as new content is discovered. This will help us reflect the changes more quickly, and improve our ability to provide more timely and relevant search results for users. Sitemaps is available to any site owner who wishes to communicate more easily with participating search engines. Simply create and upload an XML Sitemap and submit the URL of the file to search engines.'
A whois lookup on the domain says it was created on 12-Aug-2001 and the owner is Google.
[edited by: Brett_Tabke at 2:49 pm (utc) on Nov. 16, 2006]
[edit reason] fixed link [/edit]
I just wonder: if I have a sitemap like this, is it still a bad idea to go more than 2 levels deep? I see pages around 4 levels deep with PR5.
You said at [webmasterworld.com...]:
All pages should be linked to more than one other page on your site, and not more than 2 levels deep from root.
Thanks
[edited by: encyclo at 3:06 pm (utc) on Nov. 16, 2006]
[edit reason] fixed link [/edit]
It all just looks like a repackaged and debranded version of the Sitemaps documentation on Google's site.
So is the news just that the Sitemap schema should point to the new domain, and that Yahoo (and Microsoft?) will now accept it?
I wonder if Yahoo SiteExplorer will continue to accept other formats (not that I won't be sending them to the sitemap schema'd file now, rather than an Atomized version).
Anyone have something tangible with regard to Microsoft? There isn't an existing sitemap service for MSN, is there?
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
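For anyone who would rather generate a file like this than write it by hand, here is a minimal sketch using only Python's standard library. The function name build_sitemap and the tuple layout of the entries are assumptions made for illustration; only the element names and the namespace come from the sample.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample sitemap.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML as bytes.

    entries: iterable of (loc, lastmod, changefreq, priority) string tuples
    (a hypothetical layout chosen for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

if __name__ == "__main__":
    data = build_sitemap(
        [("http://www.example.com/", "2005-01-01", "monthly", "0.8")]
    )
    print(data.decode("utf-8"))
```

Building the tree with a library rather than string concatenation means every <url> element is closed and special characters in URLs (such as &) are escaped automatically.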
The one problem with the current system is that if you have different subdomains, you need a different sitemap for each subdomain. They don't allow one sitemap linking pages on another subdomain.
Example:
mysite.com site maps can't link to sub.mysite.com, you need to put another site map at sub.mysite.com just for pages under that domain. Does it make sense or am I tripping here?
Thanks
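To illustrate the restriction described above, here is a rough sketch that flags URLs whose host differs from the host the sitemap is served from. The function name urls_outside_sitemap_host is made up for this example, and it only compares hostnames; it does not check any other location rules the protocol may impose.

```python
from urllib.parse import urlsplit

def urls_outside_sitemap_host(sitemap_url, page_urls):
    """Return the page URLs whose host differs from the sitemap's host."""
    sitemap_host = urlsplit(sitemap_url).hostname
    return [u for u in page_urls if urlsplit(u).hostname != sitemap_host]

if __name__ == "__main__":
    offenders = urls_outside_sitemap_host(
        "http://mysite.com/sitemap.xml",
        ["http://mysite.com/a.html", "http://sub.mysite.com/b.html"],
    )
    # Each URL listed here would need to go into a sitemap on its own subdomain.
    print(offenders)
```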
Someone said in the summer that the spiders are constantly fighting with 40% broken links. I assume they are fed up with dedicating their bandwidth to that crap. Search engines will always perform some spidering the way they used to, but I guess sites with correct sitemaps of this kind will be spidered much more often and more regularly in the future.
> So, can we submit the exact same sitemap file to google and yahoo now?
If I remember correctly the only difference to the sitemap the webmaster-central-console requires is the urlset-tag. It would be helpful if one of the google insiders could confirm if or when we might alternatively submit the urlset-tag listed in the example kapow gave.
I understand how to submit (http://www.sitemaps.org/faq.html#faq_after_submission), but I was wondering if anyone knew what the ping URLs were so I can submit automatically without creating specific Webmaster Central and Site Explorer logins:
You can also submit your Sitemap using an HTTP request (replace <searchengine_URL> with the URL provided by the search engine):
Issue your request to the following URL:
<searchengine_URL>/ping?sitemap=sitemap_url
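As a sketch of how that request URL could be assembled: the helper below URL-encodes the sitemap address and appends it to the /ping?sitemap= pattern quoted above. The name build_ping_url and the host searchengine.example are placeholders invented for this example; the real <searchengine_URL> values have to come from each search engine.

```python
from urllib.parse import quote

def build_ping_url(searchengine_url, sitemap_url):
    """Build <searchengine_URL>/ping?sitemap=<URL-encoded sitemap_url>."""
    # safe="" so that ':' and '/' in the sitemap URL are percent-encoded too.
    encoded = quote(sitemap_url, safe="")
    return f"{searchengine_url.rstrip('/')}/ping?sitemap={encoded}"

if __name__ == "__main__":
    # searchengine.example is a placeholder, not a real ping endpoint.
    print(build_ping_url("http://searchengine.example",
                         "http://www.example.com/sitemap.xml"))
```

The resulting URL can then be requested with any HTTP client, for example urllib.request.urlopen, once the real endpoint is known.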
1. Duplicate Content
2. Page not found errors (as Oliver Henniges mentions above)
3. Faster refreshes (due to less downloading)
There have been many discussions where webmasters complained that the SEs keep spidering and listing pages that weren't meant for that, mostly when it comes to dynamic content.
I think all 3 SEs have problems with listing duplicate pages. With a standard like [sitemaps.org...], a webmaster can specify which of his pages are to be spidered and listed only once, and all the SEs will follow...
This will also help to get rid of "not found" pages faster, and will make refreshes faster, since the SEs will download less...
In general: thumbs up to Google, MSN and Yahoo!
You can also submit your Sitemap using an HTTP request (replace <searchengine_URL> with the URL provided by the search engine):
Could anybody please tell me what the <searchengine_URL> is for each of Google, Yahoo, and MSN?
For some reason they didn't include the most important info... duh...
I think this is a delicate business. Google most certainly doesn't use XML sitemaps very much, though the ideals are lofty.
a) It's the only way a webmaster has of providing subjective input to the crawler - through the <priority> tag you can tell the crawler which of your pages you prefer over others. If it comes to a choice between two of your equally-ranked pages, the one you give the highest priority to should be presented first in the SERPs.
b) There is potential for bandwidth savings, though I'm not sure this has been thought through properly. First there is <lastmod>. In theory - with a "trusted" site - the crawler can use <lastmod> to decide which pages it should spider. This cuts both ways. If I find a minor and trivial typo, I fix it and upload the page. Today, the spider will spot the change on its next HEAD or conditional GET (If-Modified-Since) and download the page for reindexing. Pointless, because anyone visiting the page will see the change anyway. So I could make the change, upload the page, and NOT modify the <lastmod>, which saves the crawler bandwidth. Or, if the change is significant, I update <lastmod>, reload the sitemap and resubmit it.
There are other ramifications, and I'm in the process of writing a web page. The concept of a "trusted" sitemap looms and promises to be a nastier issue than -30 penalties. A lot of the tools generate XML sitemaps with the current date in <lastmod> and a <changefreq> of daily - anyone expecting to fool a search engine thus is as dumb as Shrub.
This is how I have been using sitemaps. I have a lot of archived news articles on one site that I don't want Google to bother reindexing on a regular basis, but I would like them indexed once as they are an important reference for some people, so I set <changefreq> to monthly. At the moment it hasn't had much effect on the spider, but we'll see in the future.
I have a number of pages indexed where I present a random collection of our products to my customers. The only way to account for this correctly in a sitemap with respect to the <lastmod> tag would be to let the XML file run through the PHP parser (or any other language) and insert today's date. I don't do that yet; I'm happy if the pages are indexed at all, so I leave the <lastmod> date at the day I changed some of the rest of the pages.
I assume spiders cannot read the correct lastmod-date from the filesystem, can they? How would you handle this?
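One possible way to handle this for static files is to derive <lastmod> from the file's modification time when the sitemap is generated. This is only a sketch: the function name lastmod_from_file is invented for the example, and it does not help with dynamically generated pages such as a random product listing, where the script's own timestamp says nothing about the content.

```python
import os
from datetime import datetime, timezone

def lastmod_from_file(path):
    """Return the file's modification date as YYYY-MM-DD (UTC)."""
    mtime = os.path.getmtime(path)
    return datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")

if __name__ == "__main__":
    # Print the lastmod value for this script itself as a quick demonstration.
    print(lastmod_from_file(__file__))
```

For dynamic pages, the date would have to come from wherever the content lives (for example, a database timestamp) instead of the filesystem.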
I think this will not just reduce the spider technical overhead, but potentially create a legal protection for the search engines.
If the spiders can be targeted by the webmasters' direction through sitemaps (versus 'just' scanning through everything), the webmasters no longer have a case for claims of 'illegal copying' or similar.
[edited by: Tapolyai at 9:01 pm (utc) on Nov. 17, 2006]
1. Spam will have a much reduced effect on the search results and therefore the quality of results should increase.
2. Quicker refreshing search indexes with better quality results as well as reduced bandwidth usage for both publisher and search engine.
3. In the future, with the sitemaps becoming standardized and universal it will enable small business sites to get indexed without having to redesign their site due to poor crawlability.
4. Content that is never updated or rarely updated should not be constantly crawled by search spiders and this can save a lot of bandwidth on heavy traffic sites especially.
The very fact that it has taken so long for some form of agreement to be made shows that the search engines took their time before looking at collaboration as ultimately all of them gain from better quality information rather than just relying on their own crawlers.
Does anyone have the sitemap ping addresses for Google and Yahoo?
I found these in someone's blog:
[google.com...]
[search.live.com...]
[siteexplorer.search.yahoo.com...]
I haven't tested them though.