Is there any valid, up-to-date information on what PDF file size Google will reliably index (and make searchable)?
There are some "dusty" statements that mention 100 KB, but I can't confirm this from my own experience; it must be much more...?
Greetings from the seaside, Dietmar
Tastatura
7:29 pm on Feb 1, 2007 (gmt 0)
Hi and welcome to WebmasterWorld!
I don't think G cares how big the file size is
[edited by: Tastatura at 7:29 pm (utc) on Feb. 1, 2007]
seaside
7:44 pm on Feb 1, 2007 (gmt 0)
> I don't think G cares how big the file size is
Yes. Google *reads* files of any size, but evidently not the whole content of these big files becomes part of the Google index.
Possibly this is a matter of the general structure and content of the site.
-- seaside
Tastatura
8:15 pm on Feb 1, 2007 (gmt 0)
For “regular” pages G doesn’t care about file size (per public statements by G officials), and since it can read the PDF format, it would really surprise me if it cared about PDF file size. I am not sure, but from your response you might be mixing up causation and correlation. Reading, indexing and ranking are separate affairs. A search engine might find the content (read it) and decide, for various reasons, that it doesn’t want to put it into the (main, etc.) index. So it might read the whole file, and hence not care about file size, but it might or might not put it into the index. The same goes for ranking. In a nutshell, and very simplistically, a search engine needs to find content, consume it, figure out which bucket it goes in, and decide at what position in the bucket.
seaside
8:56 pm on Feb 1, 2007 (gmt 0)
> So it might read the whole file [...] but it might or might not put it into the index.
Yes, BUT between "might" and "might not" there are specific circumstances in which G puts only a *part* of a PDF file into the index, and I'm looking for the conditions that determine the size of that part.