Sitemap.xml file creation for 500000 pages


sviba

11:37 am on Feb 15, 2007 (gmt 0)

10+ Year Member



I have created a sitemap.xml file. My site contains 500,000 pages, so I split the sitemap into multiple files (sitemap1.xml, sitemap2.xml, etc.). Each file contains 50,000 URLs.

My site is on a subdomain, with URLs like http://mysite.example.com/.

So I have created my sitemap files at
http://mysite.example.com/sitemap1.xml
http://mysite.example.com/sitemap2.xml etc...

I have also created the sitemap_index.xml file at
http://www.example.com/sitemap_index.xml

The sitemap_index.xml contains:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://mysite.example.com/sitemap1.xml</loc>
</sitemap>
<sitemap>
<loc>http://mysite.example.com/sitemap2.xml</loc>
</sitemap>
etc........
</sitemapindex>

Please tell me whether these files are set up correctly.

[edited by: encyclo at 3:54 am (utc) on Feb. 17, 2007]
[edit reason] switched to example.com [/edit]
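
For anyone generating an index like the one above by script rather than by hand, here is a minimal Python sketch. It assumes the sitemap1.xml, sitemap2.xml, ... naming from the post and the 50,000-URL per-file figure used in this thread; the function names are made up for illustration and are not part of any sitemap tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file figure used in this thread


def sitemap_file_count(total_pages, per_file=MAX_URLS_PER_SITEMAP):
    """Number of sitemap files needed to list total_pages URLs (rounds up)."""
    return (total_pages + per_file - 1) // per_file


def build_sitemap_index(base_url, file_count):
    """Return index XML listing sitemap1.xml .. sitemapN.xml under base_url."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, file_count + 1):
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url.rstrip('/')}/sitemap{n}.xml"
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


count = sitemap_file_count(500_000)
index_xml = build_sitemap_index("http://mysite.example.com", count)
print(count)
```

With 500,000 pages at 50,000 URLs per file, this lists ten sitemap files in the index. Where the index file itself should live is a separate question.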

sviba

4:16 am on Feb 20, 2007 (gmt 0)

10+ Year Member



Please tell me whether the sitemap_index file location is correct.

DXL

7:06 pm on Feb 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Each file contains 50,000 URLs.

From what I've read on here before, I was under the impression that spiders will only access a certain number of links on a sitemap, and that number wasn't remotely close to 50,000.

[edited by: caveman at 5:57 pm (utc) on Feb. 26, 2007]

SandySEO

10:42 am on Mar 5, 2007 (gmt 0)

10+ Year Member



Yes DXL, you are right. Search engines like Google read 500-1000 URLs on a page, so 50,000 is too many.

-SandySEO

sviba

10:50 am on Mar 5, 2007 (gmt 0)

10+ Year Member



Please refer to sitemaps.org/protocol.html. A sitemap file can list up to 50,000 URLs, with a file size of up to 10 MB.

[edited by: caveman at 5:02 pm (utc) on Mar. 5, 2007]
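
A quick way to check a generated file against those two limits is sketched below in Python. The 50,000-URL and 10 MB figures are the ones cited in the post above, so check the protocol page for the limits currently in force. The function name is hypothetical, and measuring the size as UTF-8 encoded, uncompressed bytes is an assumption.

```python
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, the figure cited in this thread


def check_sitemap_limits(xml_text, url_count):
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    size = len(xml_text.encode("utf-8"))  # assumed: size of the uncompressed file
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs exceeds the limit of {MAX_URLS}")
    if size > MAX_BYTES:
        problems.append(f"{size} bytes exceeds the limit of {MAX_BYTES}")
    return problems


print(check_sitemap_limits("<urlset/>", 50_000))
```

A file that trips either check would need to be split further.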

caveman

5:32 pm on Mar 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



sviba,
"example.com/" and "mysite.example.com/" are technically different sites.

From sitemaps.org:

All URLs listed in the Sitemap must reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.example.com/sitemap.xml, it can't include URLs from http://subdomain.example.com. If the Sitemap is located at http://www.example.com/myfolder/sitemap.xml, it can't include URLs from http://www.example.com.

On the assumption that sitemap indexes are treated similarly to sitemaps, I'd place each index at the root level of its own site. Then each index can cover all pages under the root.

You can also validate your sitemaps, which, given your example and the number of pages you want indexed, is probably a very good idea. ;-)
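
The location rule quoted above can be checked mechanically. The Python sketch below is one reading of that rule: a page URL is accepted only if it has the same host as the sitemap and sits at or below the sitemap's folder. The function name is made up for illustration, and whether sitemap index files are held to the same rule is an assumption here too.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, page_url):
    """True if page_url is on the sitemap's host and under the sitemap's folder."""
    sm = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    if not sm.hostname or sm.hostname != page.hostname:
        return False  # different host, e.g. www vs. a subdomain
    # Folder that holds the sitemap: "/" for /sitemap.xml, "/myfolder/" for /myfolder/sitemap.xml
    folder = sm.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(folder)


print(allowed_in_sitemap("http://www.example.com/sitemap.xml",
                         "http://subdomain.example.com/page1.html"))
```

Running every `<loc>` value through a check like this before submitting would catch a mixed-host listing early.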

sviba

4:14 am on Mar 6, 2007 (gmt 0)

10+ Year Member



Now I have created the sitemap index file at
http://mysite.example.com/sitemap_index.xml

The sitemap_index.xml contains:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://mysite.example.com/sitemap1.xml</loc>
</sitemap>
<sitemap>
<loc>http://mysite.example.com/sitemap2.xml</loc>
</sitemap>
etc........
</sitemapindex>

The sitemap1, sitemap2, ... files are also located at
http://mysite.example.com/

Please tell me whether these files are set up correctly.

Thanks...

[edited by: caveman at 7:00 am (utc) on Mar. 6, 2007]

sviba

4:18 am on Mar 6, 2007 (gmt 0)

10+ Year Member



sitemap1.xml contains the first 50,000 URLs,
sitemap2.xml contains the next 50,000 URLs, etc.

For example,

sitemap1.xml
<loc>http://mysite.example.com/page1.html</loc>
<loc>http://mysite.example.com/page1/sub-page1.html</loc>
<loc>http://mysite.example.com/page1/sub-page1/business1.html</loc>
etc....
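
The excerpt above shows only the `<loc>` elements. In a complete sitemap file, each `<loc>` sits inside a `<url>` element, and all of them sit inside a single `<urlset>` root. A minimal Python sketch of splitting a URL list into 50,000-entry chunks and building one such file follows; the helper names are hypothetical and the three page URLs are the ones from the post.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunks(urls, size=50_000):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Return sitemap XML with one <url><loc>...</loc></url> entry per URL."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = u
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    "http://mysite.example.com/page1.html",
    "http://mysite.example.com/page1/sub-page1.html",
    "http://mysite.example.com/page1/sub-page1/business1.html",
]
sitemap1_xml = build_urlset(pages)
print(sitemap1_xml)
```

For the full site, each chunk from `chunks(all_urls)` would go through `build_urlset` and be saved as sitemap1.xml, sitemap2.xml, and so on.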