So while this page may have ranked for several years:
http://www.example.com/gallery/nid-12345/keyword-phrase.aspx
If you request a garbage URL from your own site, do you end up somewhere? If so, where? What's in the address bar? What's in the body of the page? If the page is blank, is there any source code?
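One way to answer those questions without eyeballing the browser is a small probe script. This is only a sketch: the function names `classify_probe` and `probe` and the category labels are made up for illustration, and it assumes the site is reachable over plain HTTP(S) from wherever you run it.

```python
import urllib.error
import urllib.request
import uuid


def classify_probe(requested_url, final_url, status, body):
    """Classify the response to a request for a URL that should not exist."""
    if status in (404, 410):
        return "ok: hard %d" % status
    if status >= 400:
        return "unexpected error status %d" % status
    if final_url != requested_url:
        # The address bar changed: the garbage request was redirected.
        return "soft 404: redirected to %s with status %d" % (final_url, status)
    if not body.strip():
        # Blank page: nothing in the body, not even source code.
        return "soft 404: blank page with status %d" % status
    return "soft 404: content served with status %d" % status


def probe(base_url):
    """Request a random, almost certainly nonexistent path under base_url."""
    url = base_url.rstrip("/") + "/" + uuid.uuid4().hex
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify_probe(url, resp.geturl(), resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the error object still carries the response.
        body = err.read().decode("utf-8", errors="replace")
        return classify_probe(url, err.geturl(), err.code, body)
```

Calling `probe("http://www.example.com")` against your own domain should report a hard 404; anything labelled "soft 404" means the garbage URL ended up somewhere it shouldn't.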
While a URL might be requested as example.com/gallery/id-12345-page-name-here by someone "out there" on the web, that request will be mapped via an internal rewrite to a function like /gallery.aspx?id=12345&name=page-name-here or similar. The fact that the URL request has been fulfilled by the .aspx file means the server returns 200 OK. This happens even when there is no content in the database matching this request. It's therefore up to the ASPX script to return a 404 status code and error page in that case.
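To make that division of labour concrete, here is a rough sketch in Python rather than ASPX. The `GALLERY` dict is a hypothetical stand-in for the database, and `rewrite` and `handle` are illustrative names, not part of any real framework. The point is only that the rewrite succeeds for any well-formed URL, so the script has to be the one that decides on 404.

```python
import re

# Hypothetical stand-in for the database table behind the gallery.
GALLERY = {12345: "page-name-here"}

# Public URL pattern, mirroring the rewrite described above.
PUBLIC_URL = re.compile(r"^/gallery/id-(\d+)-([a-z0-9-]+)$")


def rewrite(path):
    """Map a public path to the internal script's parameters, or None."""
    match = PUBLIC_URL.match(path)
    if match is None:
        return None
    return {"id": int(match.group(1)), "name": match.group(2)}


def handle(path):
    """Return (status, body). The script, not the server, chooses 404."""
    params = rewrite(path)
    if params is None or params["id"] not in GALLERY:
        # No matching content: send a real 404 instead of an empty 200 page.
        return 404, "Not Found"
    return 200, "Gallery item %d" % params["id"]
```

Note that `rewrite` happily maps `/gallery/id-99999-page-name-here` to parameters even though no such record exists; without the check in `handle`, that request would come back as 200.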
With correct design of the complete system, duplicate content and "soft 404" problems can be completely eliminated.
Another idea I've seen is that they are looking for new content, but the list of potential URLs to probe, even for a small site, is prohibitively large.
It should do so by looking up that text in the database, and then either returning 404 if it doesn't match what was requested or else redirecting to the correct version of the URL.
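A minimal sketch of that check, under one possible reading: return 404 when the ID has no record at all, and 301 to the stored version when the record exists but the text in the URL differs. The `GALLERY` dict again stands in for the database, and `resolve` and the URL layout are assumptions for illustration only.

```python
# Hypothetical stand-in for the database: id -> canonical URL text.
GALLERY = {12345: "page-name-here"}


def resolve(item_id, requested_name):
    """Return (status, location) for a requested id and URL text."""
    canonical = GALLERY.get(item_id)
    if canonical is None:
        # Nothing in the database for this id.
        return 404, None
    if requested_name != canonical:
        # Record exists, but the URL text is wrong: send to the one true URL.
        return 301, "/gallery/id-%d-%s" % (item_id, canonical)
    # Exact match: serve the page.
    return 200, None
```

With this in place only one URL per record ever returns 200, which is what removes the duplicate content. A stricter site could return 404 for any text mismatch instead of redirecting.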
The SERVER returns 200 IF it can find a page corresponding to the request, either by direct URL or by an internal redirect (e.g. if 404, then return the home page with 200), but that is a deliberate decision by the site designer or host. If there is no database content, there should be a section in the script that says: "Invoke 404".
Can you speak at all to how resource-intensive these lookups would be on the database?