thanks,
I'll throw my $0.02 worth in ;)
Search engines aside, you must also consider your target audience once they have discovered the resource. Are they going to have Microsoft Word or a Word viewer available? Many folks out there do not. Most will have a PDF reader, but all of them will obviously have an HTML reader (a web browser). Convert to HTML, the language of the web, and you'll cover all bets.
PDF is also the format of choice when it comes to creating eBooks on the Internet. Hardly anybody is creating .exe eBooks any more; they are all in PDF format, so you don't need to worry about people not knowing how to read a PDF.
Also, Google can index the contents of PDF documents.
Sean K.
I have a similar concern with some large tutorials I have produced, which are full of pictures and tables.
I initially used PDF, but then I realised that although this format is very usable, it is not search engine optimised.
Only the 1st page of the PDF file is indexed.
Thus, if you want this content to be fully indexed it has to be in HTML.
K
- the format is a closed, binary one whose longevity is not assured. Over time, changes in the Word format mean that older documents don't display correctly in newer versions.
- not all users have Microsoft Word (or the version of Word that you are using), and alternative programs may not be able to read the file
- with Word you can usually backtrack through the corrections and changes made to a document, so the end user has the final version plus an editing history. This can be anything from awkward to embarrassing, depending on the content!
You should choose the most appropriate open format that suits your needs. The three best options (as suggested by others above) are PDF, HTML or RTF. Each has its advantages and disadvantages.
PDF is best for large documents that you want your users to download but not edit, and it preserves the fonts, formatting and images across all platforms. HTML is the most compatible, but is best suited to online viewing. RTF is great if you want the user to be able to download and then copy or edit the contents.
Have you noticed recently how PDF files are taking longer to show onscreen? The latest PDF reader seems to insist on the whole file being loaded before displaying anything of use to the reader.
This goes against the whole purpose of the PDF format imho.
So - I'd still go with PDF, but ask (or force) your readers to right click and save to their hard drive.
Adobe really should remember their audience.
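On the "force" option mentioned above: one common way to do this is for the server to send a Content-Disposition: attachment header with the PDF, which asks the browser to save the file rather than open it inline. Here is a minimal sketch in Python. The function name and the example file name are made up for illustration, and how you attach the headers depends entirely on your own server setup.

```python
def pdf_download_headers(filename: str, size_bytes: int) -> list[tuple[str, str]]:
    """Build response headers that ask the browser to save a PDF to disk.

    Hypothetical helper: plug the result into whatever server or
    framework you actually use.
    """
    # Drop characters that would break the quoted filename parameter.
    safe_name = filename.replace('"', "").replace("\\", "")
    return [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(size_bytes)),
        # "attachment" requests a save dialog instead of inline display.
        ("Content-Disposition", f'attachment; filename="{safe_name}"'),
    ]


if __name__ == "__main__":
    for name, value in pdf_download_headers("tutorial.pdf", 1_250_000):
        print(f"{name}: {value}")
```

Browsers generally honour this header, but it is still a request rather than a guarantee, so telling readers to right click and save remains a sensible fallback.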
It's a shame the 'Fast Web View' / linearized option is not switched on by default in all PDF converters.
In Acrobat Reader you can check whether your PDF is linearized by opening the PDF and selecting "Document Properties" from the File menu; it tells you in there.
Let the reader know in advance that it is a PDF file. Too many people put up a link to a PDF file without specifying that it is a PDF, and this annoys most people.
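If you have a lot of files to check, a rough test can also be scripted. A linearized PDF carries a /Linearized entry in the dictionary of its first object, near the very start of the file, so looking for that marker in the first 1024 bytes is a reasonable heuristic. The sketch below is only that: the function names are my own, and it does not verify that the file is still validly linearized (a file edited after linearization can keep the marker).

```python
import re


def looks_linearized(data: bytes) -> bool:
    """Heuristic check on the leading bytes of a PDF file.

    Returns True if a /Linearized entry appears in the first 1024 bytes
    after a PDF header. This does not validate the linearization itself.
    """
    head = data[:1024]
    if not head.startswith(b"%PDF-"):
        return False  # does not look like a PDF at all
    return re.search(rb"/Linearized\s+\d", head) is not None


def file_looks_linearized(path: str) -> bool:
    """Apply the same check to a file on disk (path is whatever you pass in)."""
    with open(path, "rb") as fh:
        return looks_linearized(fh.read(1024))


if __name__ == "__main__":
    import sys

    for pdf_path in sys.argv[1:]:
        state = "linearized" if file_looks_linearized(pdf_path) else "not linearized"
        print(f"{pdf_path}: {state}")
```

Run it with one or more file names as arguments; treat a "not linearized" result as a hint to re-save the file with the Fast Web View option rather than as a definitive verdict.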
--justablink
If you want your browser to load each page as needed, the PDF must be linearized.
Not sure what that means, but I'll go and see if I can fix my problem. What if the original file was PDF'd in the year 2000 and is password protected? Can I still "linearize" it?
I think HTML would be best, but for the longer articles (I just received one that was 55 pages) I should stick with PDF, and users can print more easily if need be.
One way of handling this is to write a brief overview/excerpt/table of contents/etc. of the PDF contents and publish this as an HTML page, with the PDF linked on it (i.e. one HTML page for each PDF).
This way you can get a nice search engine-optimised HTML page, with the PDF available for people who want to read more - and you don't spend all your time converting massive PDFs into HTML pages ;)
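To make the one-page-per-PDF idea concrete, here is a small sketch that builds such a landing page from a title, a summary and the PDF's URL. It also labels the link as a PDF with its size, as suggested earlier in the thread. The function name, parameters and page layout are all illustrative assumptions, not anybody's actual setup.

```python
from html import escape


def landing_page(title: str, summary: str, pdf_url: str, pdf_size_mb: float) -> str:
    """Return a minimal HTML page that summarises a PDF and links to it."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <p>{escape(summary)}</p>\n"
        # Say up front that the link is a PDF, and how big it is.
        f'  <p><a href="{escape(pdf_url)}">Download the full document</a> '
        f"(PDF, {pdf_size_mb:.1f} MB)</p>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(landing_page(
        "Example tutorial",
        "A short overview of what the full document covers.",
        "example-tutorial.pdf",
        1.2,
    ))
```

The summary text is what the search engines get to index, so it is worth writing it by hand rather than pasting in the first paragraph of the PDF.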
I keep documents as HTML (no menus or logos) and then modify them automatically as follows:
a) For HTML - adding menus, logos etc. on-the-fly
b) For PDF - change certain HTML to LaTeX codes and strip other tags, then pdflatex it (install tetex-latex on *nix)
c) For email - wordwrap to a width of 72, use custom replacements for headers (e.g. H2 -> upper case with a row of stars below, H3 -> upper case only etc.), strip out all remaining HTML
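As an illustration of step (c), here is a rough sketch of the email conversion in Python. It is not the poster's actual script: it assumes the source HTML only uses h2, h3 and p as block elements (anything outside them is ignored), and the function name is made up.

```python
import html
import re
import textwrap


def html_to_email(source: str, width: int = 72) -> str:
    """Convert simple HTML (h2, h3 and p blocks only) to plain text for email."""
    blocks = []
    pattern = r"<(h2|h3|p)\b[^>]*>(.*?)</\1>"
    for tag, body in re.findall(pattern, source, flags=re.I | re.S):
        text = re.sub(r"<[^>]+>", "", body)            # strip remaining inline tags
        text = html.unescape(" ".join(text.split()))   # collapse whitespace, decode entities
        tag = tag.lower()
        if tag == "h2":
            title = text.upper()
            blocks.append(title + "\n" + "*" * len(title))  # row of stars below
        elif tag == "h3":
            blocks.append(text.upper())                     # upper case only
        else:
            blocks.append(textwrap.fill(text, width=width)) # wrap body text at 72
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = (
        "<h2>Getting <em>Started</em></h2>"
        "<p>" + "This is some body text that needs wrapping. " * 4 + "</p>"
        "<h3>Notes</h3><p>Short paragraph.</p>"
    )
    print(html_to_email(sample))
```

Regular expressions are fine for HTML you control and keep deliberately simple, as described above; for arbitrary pages a proper HTML parser would be the safer choice.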
In the dialog box that is displayed, click on "Clean Up" and tick the "Optimize the PDF for Fast Web View" checkbox.
Basically, what linearization does is put the internal PDF structure in linear order, so the browser can load the PDF data in sequence without having to parse the entire PDF structure first.
Sean K.