I'm trying to get the pages on our site re-indexed. Some time ago a canonical URL tag was added to our pages, but it pointed to the homepage on every single page of the site. My best guess is that this caused Google to stop indexing those pages. This was fixed as of 2/22, so most pages now simply list themselves as canonical. The main exception is pages that were being linked to as index.php?cPath=##### instead of our rewritten URLs; those now list the rewritten URL as canonical. I've run into two problems while watching to see whether the pages get indexed again. The first is that about 80% of the main subpages still aren't indexed. I can't tell whether I'm just being impatient or whether I should be worried: it's been more than two weeks, and these pages are linked to from every one of our 600 or so indexed pages.
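For reference, this is roughly how I've been spot-checking that the canonical tags are now correct. It's only a sketch in Python using the standard library; the function names and the example.com URLs are made up for illustration, and it assumes you already have the page's HTML as a string.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attrs.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def classify_canonical(page_url, html, homepage_url):
    """Label a page's canonical tag as missing, self, homepage, or other."""
    canonical = find_canonical(html)
    if canonical is None:
        return "missing"
    # Ignore a trailing slash when comparing URLs (a simplifying assumption).
    if canonical.rstrip("/") == page_url.rstrip("/"):
        return "self"
    if canonical.rstrip("/") == homepage_url.rstrip("/"):
        return "homepage"
    return "other"


# Hypothetical example: a subpage whose canonical still points at the homepage.
html = '<html><head><link rel="canonical" href="https://www.example.com/"></head></html>'
print(classify_canonical("https://www.example.com/widgets", html, "https://www.example.com/"))
```

Any page that comes back as "homepage" would still have the old broken tag; "other" is what I'd expect for the index.php?cPath= pages that now point at their rewritten URL.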
The other problem is a little stranger. Some of the pages Google has started to index have dramatically incorrect URLs with zero links pointing to them. The cause seems to be that the CMS takes something like index.php?cPath=#####_yyyy_zzzz and uses it to build the breadcrumbs on page zzzz (showing the structure as ##### > yyyy > zzzz). However, this means the same page could theoretically be displayed using any number of paths: just cPath=zzzz, or cPath=www_yyyy_zzzz, or pretty much cPath={anything}_zzzz. As I've been watching for pages to re-enter Google's index, I've noticed Google indexing some of them under a seemingly arbitrary path. But these URLs have absolutely no incoming links. What gives? Does anyone have any idea what could be causing this? Surely a page with zero incoming links is never going to show up in a regular Google search.
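To make the cPath behavior concrete, here is a small Python sketch of what I think is going on. It assumes (based on what I've observed, not on the CMS source) that only the last underscore-separated segment of cPath selects the page; the function names, IDs, and URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def page_id_from_cpath(url):
    """Return the last cPath segment, which appears to select the page."""
    query = parse_qs(urlsplit(url).query)
    cpath = query.get("cPath", [""])[0]
    # Earlier segments seem to be used only for building breadcrumbs.
    return cpath.split("_")[-1] if cpath else None


# Hypothetical lookup from page ID to the rewritten (canonical) URL.
CANONICAL_URLS = {"30": "https://www.example.com/widgets/blue-widgets"}


def canonical_for(url):
    """Map any cPath variant of a page to its single rewritten URL."""
    return CANONICAL_URLS.get(page_id_from_cpath(url))


# All of these variants resolve to the same page ID.
variants = [
    "https://www.example.com/index.php?cPath=10_20_30",
    "https://www.example.com/index.php?cPath=30",
    "https://www.example.com/index.php?cPath=99_30",
]
for url in variants:
    print(url, "->", canonical_for(url))
```

If that assumption holds, every variant is a duplicate of one page, which is why I'd expect each of them to carry the same rewritten URL as its canonical.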