Forum Moderators: phranque
My brain hurts.
A description for this result is not available because of this site's robots.txt – learn more
Wouldn't it be more practical to switch off auto-indexing?
from your description it sounds like the content in the what-is-here.html file should actually be in the index.html file
That is exactly right. I'm just wondering whether I am better off redirecting the index.html file to the what-is-here.html page in each subdirectory, or renaming what-is-here.html to index.html.
regarding your meta noindexed content showing up in the index
I am not having this problem, but I can see where the confusion is. The index.html pages are not being indexed, they do not appear in the sitemap, and I'm sure they are crawled and ignored. They are not blocked in robots.txt. My problem was that although the index.html files are not in my sitemap, the subdirectories are listed, and if crawled, each of those URLs would serve up a dummy index.html file. I can control what goes in the sitemap by file extension, but folders/directories are only either on or off; I can't list every page in a subdirectory without listing the subdirectory as a URL too.
Can't you tell your sitemap script to index only files with certain extensions? If there's nothing in there but images, there's no reason for it to be on the sitemap at all. G### will find the individual images; they're linked from pages.
Yes. It is not images that concern me, but you just gave me an idea. If I rename the index.html pages to index.php, they will not give the subdirectory URLs to the sitemap script, because .php files are ignored, and I would not need to do anything else. My htaccess is only redirecting requests for index.html to the subdirectory URL. If I rename them to index.php, that should keep the subdirectories out of my sitemap.
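To make the rename idea concrete, here is a rough Python sketch of how an extension-filtering sitemap script might behave. It is not the actual script discussed in this thread: the function name `sitemap_urls`, the `INCLUDED_EXTENSIONS` setting, and the rule that an index file is reported as its directory URL are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Assumption: the sitemap script only looks at files with these extensions.
INCLUDED_EXTENSIONS = {".html"}


def sitemap_urls(paths, base="http://www.example.com"):
    """Return the URLs a hypothetical extension-filtering script would list.

    Assumed behaviour: a file named index.<ext> with an included extension
    is reported as its directory URL; other included files are reported
    as-is; everything else (such as .php) is ignored.
    """
    urls = []
    for path in paths:
        p = PurePosixPath(path.lstrip("/"))
        if p.suffix.lower() not in INCLUDED_EXTENSIONS:
            continue  # ignored extension, e.g. index.php
        if p.stem == "index":
            parent = p.parent.as_posix()
            # An index file stands in for the directory URL itself.
            urls.append(f"{base}/" if parent == "." else f"{base}/{parent}/")
        else:
            urls.append(f"{base}/{p.as_posix()}")
    return urls


before = sitemap_urls(["subdirectory/index.html", "subdirectory/what-is-here.html"])
after = sitemap_urls(["subdirectory/index.php", "subdirectory/what-is-here.html"])
```

Under these assumptions, `before` contains both the subdirectory URL and the what-is-here.html page, while `after` contains only the what-is-here.html page. Whether a real sitemap script treats index files this way would need to be checked against its own settings.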
renaming what-is-here.html to index.html
http://www.example.com/subdirectory/index.html
http://www.example.com/subdirectory/index.php