Re: pdf vs txt or html

John St. Julien (stjulien who-is-at UDel.Edu)
Tue, 13 Jul 1999 16:04:10 -0400

XMCAers,

I use PDF files in my own teaching and collegial exchanges. There are
problems, but I see different ones from those that Martin cites, and I think
that overall it is a good alternative. Some quick comments on Martin's
points:

>- PDF files do not open their contents to web search engines.
This is not strictly true. It is true that most search engines do not yet
know how to read PDF files, but nothing in the format prevents indexing.
(Infoseek is supposed to index PDF files.)

>- PDF files do not provide internal anchors for external hyperlinks.
True, and a good point. You can have a PDF file open to an index but that
is not the same.

>- PDF files are huge and slow to download.
Not necessarily. This is the way we often encounter them, but it is
almost universally a product of decisions made by the person formatting the
PDF file. The impression comes from the fact that when you build a PDF by
simply scanning in pages as images, you do get huge files. But that is a
product of the data type, not the format. An HTML file that displayed the
same page images would actually be "larger" and take longer to download.
PDF files are very efficient and compact: for even moderately long
documents, if you have direct access to the text and images, you can save a
PDF file that is smaller than a comparable HTML file with pictures.
A nice feature included since at least Acrobat version 3 is that you can
download PDF files incrementally. Once the first page is downloaded,
subsequent pages download in the background while the user reads the
initial pages.
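Page-at-a-time downloading works best when the file has been saved in
"linearized" (Acrobat's "Fast Web View") form, which puts a linearization
dictionary, marked by the key /Linearized, near the very start of the file.
The sketch below is only a rough heuristic under that assumption: it looks
for that key in the first 1024 bytes. The function name looks_linearized is
hypothetical, and a real check would parse the dictionary properly.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Rough heuristic (an assumption, not a full parser): a linearized PDF
    places a dictionary containing the key /Linearized near the start of
    the file, so we scan only the first 1024 bytes for it."""
    # Every PDF file begins with a "%PDF-" version header.
    if not pdf_bytes.startswith(b"%PDF-"):
        return False
    head = pdf_bytes[:1024]
    return b"/Linearized" in head


if __name__ == "__main__":
    sample = b"%PDF-1.3\n1 0 obj << /Linearized 1 /L 48213 >> endobj\n"
    print(looks_linearized(sample))
```

To check a file on disk, read just its opening bytes, e.g.
`looks_linearized(open("paper.pdf", "rb").read(1024))`.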

>- PDF readers impose system requirements that effectively exclude
> users with older and smaller systems.
This is a problem, and it will get worse as the format evolves and
readers are no longer made for older systems. But the PDF creator can
decide on the level of compatibility. "Modern" readers are readily
available for the Mac (68020 chip with System 7) and the PC (386 chip with
Windows 3.1). These are old systems to be using on the web at all; older
systems than these will be strained for reasons well beyond PDF files.

>- Regarding authenticity, it is unclear how a simple text file
> can be un-authentic.
Hmmn, I have the opposite immediate reaction. No one publishes or even
writes in ASCII text. (There is no bold face or italic in ASCII.) How can
stripping away the context of publication be authentic? That would be so
only if the sequence of characters is the true, complete meaning of the
work. If the design of the work matters at all, it is not. In the extreme:
consider trying to read Derrida's _Glas_ as a text file. Less extreme:
reading Dewey's _Logic_ in the original face and layout lends a contextual
meaning to each word read. Each period has its style. ...I do miss the
yellowing pages though. :-)

It seems to me that there are very large advantages to the fixed, original
layout for the practices of learning in academic communities. You can ask
students or colleagues to bring a marked up reading to discuss and they
will be able to refer to page numbers, column breaks, and the image on the
left side of page 103. Makes referencing a lot simpler too.

>- Regarding the likelyhood of editing the original, isn't that a
> function of server security?

Not with PDF: it is also a function of the security level encoded in the
file. (Though the default is to be entirely "open" to all readers.)

I do wish that PDF were not proprietary. I'd like someone to develop a fat
format that would let the reader determine whether to use a
page-description based layout (like PDF) or tags (HTMLish). I do wish that
SGML were widely available. I worry about making a computer necessary to do
intellectual work. And I worry about the things that David Kirshner alludes
to.

But then I am a chronically dissatisfied worrywart.

Best, John St. Julien