Portable Document Format.
Adobe’s platform-independent format for distributing documents. You can
recognize PDF files by the *.pdf extension. You will commonly find them on
commercial websites, where they are used to distribute product literature or
complex technical documentation. PDF allows searching and scrolling, just like
HTML in a browser.
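The extension is only a naming convention. A slightly more robust check, sketched below in Python, looks for the `%PDF-` marker that PDF files begin with. The function names are made up for illustration, and the check says nothing about whether the rest of the file is valid.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


def header_version(data: bytes) -> Optional[str]:
    """Return the version text from the header line, or None if not a PDF."""
    if not looks_like_pdf(data):
        return None
    first_line = data.splitlines()[0]
    # Everything after the marker on the first line is the version text.
    return first_line[len(PDF_MAGIC):].decode("ascii", errors="replace").strip()
```

In practice you would pass in the first few bytes read from the file in binary mode, rather than the whole document.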
Similarities to HTML
PDF is similar to HTML in that:
- Both let you prepare documents for the web, hard disk or CD.
- You can convert HTML to PDF and PDF to HTML.
- Both support hypertext links.
- Both support embedded images, sound files and movies.
- Both support forms you fill in and submit. PDF calls that FDF
(Forms Data Format).
- Both will display the first part of a document even before the whole document is
downloaded.
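As a rough illustration of the forms point above, the sketch below builds a minimal FDF file, the data-only companion format that carries field names and values. The helper names and the sample field are hypothetical, and the structure shown is a simplified reading of the format that only handles plain ASCII values, not a complete implementation.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def build_fdf(fields: dict) -> str:
    """Return FDF text holding one /T (name) and /V (value) pair per field."""
    entries = "\n".join(
        f"<< /T ({escape_pdf_string(name)}) /V ({escape_pdf_string(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [\n{entries}\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
```

Calling `build_fdf({"name": "Ann"})` gives a small text file that form-aware software could, in principle, merge back into the matching PDF form.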
Differences From HTML
PDF is different from HTML in that:
- It is oriented around pages. HTML is oriented around documents with no page
breaks. PDF does not have a way of reflowing page breaks the way you would in
PageMaker.
- PDF is not designed to be modified. You modify the original documents, perhaps
in MS Word format, then regenerate the PDF. With HTML you modify the master HTML
documents directly.
- PDF looks identical on all machines. HTML adjusts itself to the resolutions and
fonts available on the viewing machine.
- PDF requires proprietary tools to create. HTML can be created with tools as
simple as NOTEPAD.
- PDF documents are designed to be printed. HTML documents are not. Most printers
cannot understand PDF. Software must convert the PDF to the native printer
language, possibly PostScript. A few printers, such as the HP Color LaserJet
4650 series, can understand PDF directly.
- Adobe pulled off a near miracle, persuading the major font companies to allow
their fonts to be embedded in PDF files, without royalties. There is no way to
do this with standard HTML, CSS or Java. If you include the fonts, everyone will
see the document with the proper fonts. Only the glyphs you actually use in the
document are included. If you don’t want the bulk, the distiller will save
some information about the font metrics instead. People without the fonts
installed will then see the document rendered with whatever fonts they have
available: the viewer’s reader software uses Adobe’s morphing master font
technology to warp one of the installed fonts into a substitute that will at
least have the same spacing, even if it looks nothing like the original font.
- PDF documents may include Adobe PostScript Type 1 or Microsoft TrueType fonts.
HTML documents normally never embed fonts. I don’t even know what the
format is when they do.
- With HTML, the person viewing an HTML document can override all your stylistic
choices. With PDF he will always view the document exactly as you designed it.
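The two font ideas in the list above, embedding only the glyphs actually used and warping a substitute font to keep the original spacing, can be sketched roughly as follows. The width tables are invented numbers and the function names are hypothetical; real viewers work from full font metrics, not from a per-character dictionary.

```python
def glyphs_used(text: str) -> set:
    """Characters a subset font would need for this text (whitespace ignored for simplicity)."""
    return {ch for ch in text if not ch.isspace()}


def width_scale_factors(original_widths: dict, substitute_widths: dict, text: str) -> dict:
    """Horizontal scale per character so the substitute matches the original advance width."""
    factors = {}
    for ch in sorted(glyphs_used(text)):
        factors[ch] = original_widths[ch] / substitute_widths[ch]
    return factors


# Invented advance widths, in thousandths of an em.
original = {"P": 600, "D": 700, "F": 550}
substitute = {"P": 500, "D": 700, "F": 605}
factors = width_scale_factors(original, substitute, "PDF PDF")
```

A factor above 1 means the substitute glyph has to be stretched, a factor below 1 means it has to be squeezed, so the line breaks stay where the author put them.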
Advantages of PDF
The advantages of PDF format are:
- It is a lot less work to prepare a PDF document if you already have either a
word processing document or a printed page. Preparing the HTML document is often
almost as much work as retyping from scratch.
- You can rapidly create documents for the web from printed materials, including
graphic images. The raw materials might be MS Word documents, PostScript files,
books, or brochures. You don’t need a lot of manual keying and touchup the
way you would to create the equivalent HTML documents. All you have to do is
scan a page, or print a word processing document to a special printer driver and
Adobe Acrobat does the rest. The conversion program is slow, but the results
require little effort to touch up.
- PDF gives you absolute control over the final look of the document, unlike HTML
where you can only give hints.
- You can control whether the reader is permitted to cut/paste or make hard copy
printouts. You can control whether others are permitted to modify the document.
Clever users can bypass these restrictions.
- PDF documents print properly with page breaks in the right places. HTML
documents break printed pages half way through images or even half way through a
line of text. HTML printouts are a mess!
- PDF is much more compact than the equivalent set of graphics images. PDF is
essentially an OCRed document in modified PostScript format. Part of the secret
of the compactness is that standard fonts need not be included as part of the
document. They come with the viewer. However, PDF is not as compact as HTML. It
contains much more precise font and positioning information.
- PDF documents come with a miniature search engine built into every document.
- PDF documents have automatically generated tables of contents, thumbnails and
indexes.
- If you buy Acrobat 4, you can add annotations and highlights to documents. You
can have many people adding comments to a document. You can filter annotations
by person or date. You can sort them by person, date, type or page number.
Annotations may be comments, stamps (e.g. approved, confidential), highlighting,
thumbtacks, paperclips or scribbles.
- PDF has article threading. If you scan in a magazine, you can leave it exactly
the same as the original layout. The viewer program will automatically guide the
reader through the disconnected pieces of an article. With HTML you must
manually cut and paste the various pieces into separate documents, one for each
article. You have to show Acrobat the flow of each article, but this is a tiny
fraction of the work of cutting and pasting the fragments into a new document
for each article.
- PDF documents can be digitally signed. You can be sure the document is not a
forgery and has not been tampered with or damaged.
- It is easy to take an existing paper form and convert it for electronic use in
PDF. With HTML you must start from scratch and design the form electronically.
Designing forms is much simpler in PDF. In HTML, it requires a programmer.
Anyone could do it in Acrobat/PDF. You can also design forms electronically in
PDF, or use any graphics, word processor or publishing software to design them.
- Apple Mac OS X uses PDF as the basis of its Quartz imaging model.
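The permission controls mentioned in the list above are stored, in PDF's standard security handler, as bit flags in a single integer. The sketch below decodes a few of them. The bit positions reflect my reading of the PDF specification and the function name is hypothetical. The flags are requests that the viewer is expected to honour, which is one reason clever users can bypass them.

```python
# Bit values for the permissions integer (bit 3 = 4, bit 4 = 8, and so on).
PERMISSION_BITS = {
    "print": 1 << 2,
    "modify": 1 << 3,
    "copy": 1 << 4,
    "annotate": 1 << 5,
}


def decode_permissions(p: int) -> dict:
    """Map each permission name to True if its bit is set in p."""
    unsigned = p & 0xFFFFFFFF  # the value is stored as a signed 32-bit integer
    return {name: bool(unsigned & bit) for name, bit in PERMISSION_BITS.items()}
```

For example, `decode_permissions(-44)` reports printing and copying as allowed, and modifying and annotating as refused.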
Disadvantages of PDF
The disadvantages of PDF format are:
- PDF is supported under Windows 95/98/ME/NT/W2K/XP/W2K3 and Mac only. HTML is
supported on nearly everything.
- PDF documents are bulkier than HTML documents.
- PDF is oriented around fixed size pages. You turn pages electronically much the
way you would pages in a magazine, or use a hand tool to drag the paper. You can’t
scroll the way you are used to with the mouse wheel or scrollbars, though you
can still scroll after a fashion. Usually you navigate by clicking repeatedly,
by using the hand tool, or by following the table of contents. In HTML, the
document is one giant page you can scroll through continuously. I think HTML
maps more naturally to the screen.
- Search engines traditionally did not index PDF documents, because they could
not see inside them. In 2001 February, the Google search engine started indexing
PDF documents. Google is the fastest and most accurate of the search engines, so
the rest should follow suit.
- People who wish to read the documents must install the Acrobat viewer to be able
to see the documents. It is free, but that is still a hassle that will
discourage novices from looking at your documents.
- PDF is proprietary to Adobe. You must purchase the Acrobat program before you
can prepare PDF documents. There are now some third party tools for creating PDF
documents, and some add-ons that work with Acrobat, but chances are you will
need Acrobat itself at the core. There are a number of tools that are optional,
but highly desirable, for preparing PDF documents, some of them expensive.
- PDF uses only the bulky AIF and WAV file sound formats. It does not support the
much more compact MP3 format.
- PDF supports Apple QuickTime and AVI movie formats only. People who view your
documents still must separately install the QuickTime and AVI engines to see the
movies. HTML supports dozens of other movie formats.
- PDF forms are for submitting data only, either to a database, to a CGI server or
to email. They are not suitable for doing inquiries the way HTML forms are. I am
not 100% sure of this point. The Adobe
representative I talked to did not seem to understand my questions in this area.
- PDF documents embed the fonts needed to view them, even if the user already has
them installed. This makes the PDF files bulkier than they need be, and slows
downloading them. Ideally, the font files would be separate and downloaded only
if needed, and then cached so they would not need to be downloaded every time.
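The caching scheme suggested in the last point is not something PDF does. Purely as an illustration of the idea, a viewer-side cache might look like the sketch below. The class, the `download` callback and the font names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


class FontCache:
    """Keep downloaded fonts so each one is fetched at most once."""

    def __init__(self, installed: Iterable[str], download: Callable[[str], bytes]):
        self._installed: Set[str] = set(installed)  # fonts already on the machine
        self._download = download                   # hypothetical network fetch
        self._cache: Dict[str, bytes] = {}

    def ensure(self, font_name: str) -> str:
        """Report where the font came from: 'installed', 'cached' or 'downloaded'."""
        if font_name in self._installed:
            return "installed"
        if font_name in self._cache:
            return "cached"
        self._cache[font_name] = self._download(font_name)
        return "downloaded"
```

With this arrangement a font the user already has is never transferred, and a missing font costs one download no matter how many documents use it.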
PDF Tools
You don’t have to choose. You can prepare your documents in PDF format,
then export in HTML, and post both on your website, reaping the benefits of both.
Let your users decide which they prefer to view. Search engines will bring
people to your site who then may choose to look at the PDF, especially if they want
a printed hard copy.
Enfocus PitStop is a plug-in for Acrobat that solves many of the small
irritations with Adobe Acrobat. It comes highly recommended.
Linux PDF tools tend to be free.
You can create PDF files using an Adobe Web-Based Conversion Service for
a monthly fee. This is a reasonable alternative for low volumes or to experiment.
DocuDesk
converts MS Word and 250 other apps. I suspect it works by acting as a virtual
printer driver.
JPedal is a Java library
for reading and displaying PDF files. It has a JWS (Java Web
Start) PDF viewer that lets you view a PDF file on any
Java-supported platform. You can extract text or images from PDF files. You can
extract data from FDF forms. There are three versions. There is a stripped-down
free open source version. The Enterprise version is priced either for a single
seat or for a site licence. You need to negotiate a licence to include it in
your distributed software.