A phobia of binary format files. It is especially widespread in the Unix community.
Then
To understand how this fear got started you must understand what
computers were like in the beginning.
In the early days, there was almost no standardisation. A byte was anywhere from 6
to 12 bits, a word anywhere from 8 to 64 bits. Every machine had at least two proprietary floating point
formats. Sometimes each installation defined its own custom
character set. Some machines were big-endian, some little-endian, some two’s-complement, some ones’-complement. Reading
a file from someone else’s computer was quite an undertaking. It was less daunting if
the file was pure characters, because character data was easier to decipher.
Because the documentation on a format was typically so sketchy, when a program did
not work it was easier to deal with human-readable character data than with binary data,
even though the character data was bulkier.
Data formats were not taken very seriously. Formats were defined procedurally:
the format was whatever the program produced. This sufficed because there was very little
interchange of data. If data were exchanged, it would always be read and written by
the same program on the same hardware, so there was no need to define precisely what
the format was.
Even magnetic tape densities, proprietary tape formats and tape labels caused interchange
problems.
Microsoft used binary formats for its MS Word and Excel products. However, they
considered the format a proprietary secret. They would often change the format
without telling anyone. They arranged formats to be deliberately incompatible as a
dodge to trick customers into upgrading. Once version N+1 had touched a document,
version N could no longer read it. Everyone had to upgrade to version N+1 at
considerable cost, just to be able to read their documents again. Microsoft sold only
version N+1, so there was no legitimate way new users in a shop could stick with
version N to avoid the problem. Microsoft traumatised programmers against binary
formats. Programmers gradually decoded and documented the formats as best they could.
It was an undertaking comparable to breaking the German Enigma code, and there was no
guarantee the result was 100% accurate. Whenever
programmers think of binary formats, they instantly associate them with Microsoft’s
wicked behaviour. In NLP (Neuro-Linguistic Programming) terms, binary format has
become a negative anchor.
CORBA (Common Object Request Broker Architecture) made a brave stab at letting you
exchange binary data between different platforms. The catch was that
CORBA made such a production of it that the very thought
made programmers want to lie down and take a nap.
Now
Today, things have changed:
- We have converged on the IEEE (Institute of Electrical and Electronics Engineers) 754
standard for binary floating point interchange.
- We have standardised on big-endian byte order as the network byte order.
- Nearly all hardware can read/write 8-bit, 16-bit, 32-bit and 64-bit binary two’s-complement
integers, signed and unsigned, as well as IEEE
floating point.
- Java allows serialised objects to be exchanged between machines with totally
different internal hardware. Though it started in Java, there is no reason other languages could not implement
the same protocols.
- The Internet means data is now routinely exchanged between computers from
different manufacturers, using software from different vendors on each end. It is now
extremely important to precisely define the data formats and to create
programs to verify that the standards are being adhered to.
- The Internet means it is more important than ever to exchange data in compact
formats. If you don’t, you waste bandwidth, air time, computing power,
battery life in hand-held devices and, most of all, people’s time waiting for
transmissions to complete.
- We are moving into an age with an explosion of hand-held devices that communicate
with the Internet the same way cellphones do. These too must be accommodated. They
have very tight RAM (Random Access Memory) and
CPU (Central Processing Unit) constraints. Air time is also considerably
more expensive and considerably slower than the cable connections that desktop
machines enjoy. Further, the amount of bandwidth is limited by the radio
frequency spectrum. We are rapidly running out of cellphone-type bandwidth. You
have to be ultra-efficient to even play the game.
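The Java serialisation point in the list above can be tried in a few lines. This is a minimal sketch, assuming a hypothetical `Reading` class whose fields are made up for the illustration; `ObjectOutputStream.writeObject` and `ObjectInputStream.readObject` are the standard `java.io` methods. The bytes follow the platform-independent Java serialisation stream format (they begin with the magic number `0xACED`), so they can be written on one machine and read on another whatever its CPU, provided the reader has the same class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Round-trips an object through Java's standard serialisation format. */
final class SerialisationRoundTrip {

    /** Hypothetical record type, used only for this illustration. */
    static final class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sensor;
        final long timestamp;
        final double value;

        Reading(String sensor, long timestamp, double value) {
            this.sensor = sensor;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Serialises obj to a byte array with ObjectOutputStream.writeObject. */
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object from bytes with ObjectInputStream.readObject. */
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Reading original = new Reading("thermo-1", 1_700_000_000_000L, 21.5);
        byte[] wire = toBytes(original);
        Reading copy = (Reading) fromBytes(wire);
        System.out.println(copy.sensor + " " + copy.timestamp + " " + copy.value
                + " (" + wire.length + " bytes)");
    }
}
```

Declaring `serialVersionUID` explicitly, as here, keeps compatible edits to the class from making older streams unreadable.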
Advantages
There are several major advantages to binary
formats:
Compactness
They are compact to store, compact to process in
RAM and
compact to transmit over the Internet. In contrast, some text formats such as
XML (Extensible Markup Language) can be an order of magnitude fluffier.
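To put a rough number on the difference, here is a minimal sketch that encodes the same 1000 seven-digit integers two ways: with `java.io.DataOutputStream` (4 bytes each, plus a 4-byte count) and as a simple XML document whose `<values>`/`<value>` element names are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Compares the size of a binary and an XML encoding of the same integers. */
final class SizeComparison {

    /** A 4-byte count followed by each value as a 4-byte big-endian int. */
    static byte[] binary(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    /** The same values as a simple, hypothetical XML document in UTF-8. */
    static byte[] xml(int[] values) {
        StringBuilder sb = new StringBuilder("<values>\n");
        for (int v : values) {
            sb.append("  <value>").append(v).append("</value>\n");
        }
        sb.append("</values>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = 1_000_000 + i;   // seven-digit numbers
        }
        System.out.println("binary: " + binary(values).length + " bytes");   // binary: 4004 bytes
        System.out.println("xml:    " + xml(values).length + " bytes");      // xml:    25019 bytes
    }
}
```

Here the XML is roughly six times larger; longer element names, attributes or deeper indentation push the ratio higher.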
Speed
A well-designed binary format is computer-friendly. The computer
can rapidly navigate the data, finding what it wants without having to parse the parts
it does not want.
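One common way to get this navigability is to prefix every record with its length, so a reader can hop from record to record without decoding payloads it is not interested in. The sketch below assumes a hypothetical layout of a 4-byte big-endian length followed by that many UTF-8 payload bytes; the class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical layout: each record is a 4-byte length, then the payload. */
final class SkipAhead {

    /** Packs payloads into one buffer, each preceded by its byte length. */
    static ByteBuffer pack(String... payloads) {
        byte[][] encoded = new byte[payloads.length][];
        int total = 0;
        for (int i = 0; i < payloads.length; i++) {
            encoded[i] = payloads[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);   // big-endian by default
        for (byte[] e : encoded) {
            buf.putInt(e.length).put(e);
        }
        buf.flip();
        return buf;
    }

    /** Returns record n, jumping over earlier records without decoding them. */
    static String recordAt(ByteBuffer buf, int n) {
        int pos = 0;
        for (int i = 0; i < n; i++) {
            pos += 4 + buf.getInt(pos);   // absolute read: skip length field and payload
        }
        int len = buf.getInt(pos);
        byte[] payload = new byte[len];
        ByteBuffer view = buf.duplicate();   // leaves the caller's position untouched
        view.position(pos + 4);
        view.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack("alpha", "a much longer record we never parse", "gamma");
        System.out.println(recordAt(buf, 2));   // prints gamma
    }
}
```

An index of record offsets stored at the start or end of the file would turn the hop loop into a single lookup.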
Simplicity
Though a binary format might look terrifying to a human
viewing it with the wrong tool, such as NOTEPAD, from the computer’s point
of view, it takes much less code to read and analyse a binary format file. This
is especially important in hand-held devices where RAM
for code and battery power to drive that code is at a premium.
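As an illustration of how little code a reader needs, here is a minimal sketch for a hypothetical fixed-layout record: an `int` id, a `long` timestamp and a `double` value. `DataOutputStream` writes each field high byte first (big-endian) and writes a `double` as its IEEE 754 bit pattern, so the 20-byte result follows the network-order and IEEE conventions listed earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical fixed-layout record: id (int), timestamp (long), value (double). */
final class FixedRecord {
    final int id;
    final long timestamp;
    final double value;

    FixedRecord(int id, long timestamp, double value) {
        this.id = id;
        this.timestamp = timestamp;
        this.value = value;
    }

    /** Writes 4 + 8 + 8 = 20 bytes, big-endian, value as an IEEE 754 double. */
    byte[] write() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeLong(timestamp);
            out.writeDouble(value);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order: no tokenising, no number parsing. */
    static FixedRecord read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new FixedRecord(in.readInt(), in.readLong(), in.readDouble());
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new FixedRecord(7, 1_700_000_000_000L, 3.25).write();
        FixedRecord r = read(data);
        System.out.println(data.length + " bytes: " + r.id + " " + r.timestamp + " " + r.value);
    }
}
```

The decoding is a single line; an equivalent text reader would have to split fields, trim whitespace, parse numbers and decide what to do with each malformed case.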
Accuracy
If you use text files for information interchange, there will be
a conversion from binary to text to prepare them and a conversion back to binary to read
them. Each of those conversions can introduce small errors if you are not
careful, especially with IEEE
floating point. If you go directly from binary to binary, there are two fewer places you
can go wrong.
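A minimal sketch of the kind of error meant here: formatting a `double` with `%f` (six decimal places by default) and parsing it back silently loses digits, whereas copying the 64 IEEE 754 bits through `Double.doubleToRawLongBits` and `Double.longBitsToDouble` is exact. `Locale.ROOT` is used so the decimal separator is always a period, another trap that text conversion sets.

```java
import java.util.Locale;

/** Compares a careless text round trip with a binary one for doubles. */
final class TextRoundTrip {

    /** Careless text conversion: %f keeps only six decimal places. */
    static double viaSixDecimals(double d) {
        // Locale.ROOT guarantees '.' as the decimal separator for parseDouble.
        return Double.parseDouble(String.format(Locale.ROOT, "%f", d));
    }

    /** Binary conversion: copy the exact 64-bit IEEE 754 pattern. */
    static double viaBits(double d) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d));
    }

    public static void main(String[] args) {
        double x = 1.0 / 3.0;
        System.out.println(viaSixDecimals(x) == x);   // false: digits were lost
        System.out.println(viaBits(x) == x);          // true: identical bits
    }
}
```

Careful text conversion is possible: `Double.toString` emits enough digits that `Double.parseDouble` recovers the same value. That care is exactly what the binary route makes unnecessary.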
Symptoms
XML is probably the fluffiest,
least efficient text format ever conceived. It is the complete antithesis of a binary
format. Addiction to XML is a symptom of a severe case of binaphobia.
Treatment
The binaphobic wakes in the night, terrified that he has written a
program to create a binary format and that, for some reason, he can no longer read the data.
What can be done to reassure the binaphobic?
- Use industry-standard protocols and well-tested libraries to read and write the
data. Then at worst you will be missing a field. You are then no worse off than had
you done the whole thing in text.
- Ask him: when was the last time anyone lost a
serialised object because of bugs in readObject or
writeObject?
- Use the proper debugging tools to study your binary format files. You would not
use NOTEPAD to modify an MS Word document, so why would you think it the appropriate
tool to examine and edit a binary format document? Use a binary format
editor/inspector. Programmers have no fear of the binary
TCP/IP (Transmission Control Protocol/Internet Protocol) packet formats because they use proper tools to examine
the bits in the packets, rather than trying to analyse them with NOTEPAD.
- If you invent a new binary format, get different people to write the reader,
writer, verifier and inspector/editor. That will help iron out inconsistencies or
ambiguities in the format specification, and each program will cross-check the others.
Then get lots of other people to use it. The more people using it, the less likely
a bug will slip through unnoticed.
- Remind him that bugs in properly tested programs are rare. Errors in text files
prepared with NOTEPAD are extremely common. He is like the fool who sits in his car
in the garage with the motor running to avoid being hit by lightning.
- Let the binaphobic use a binary editor. He imagines somehow it will be harder
to use than NOTEPAD. He imagines that it will force him to fiddle bits in hex
notation. He has no idea that a modern binary editor is like a spreadsheet with the
formulas locked, one that validates each field as you enter it.
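The reader/writer/verifier split recommended above can start very small. Here is a minimal sketch of a verifier for an entirely hypothetical format: a 4-byte `BINF` magic number, a 2-byte version that must be 1, a 4-byte record count, then 12-byte records of `int` id, IEEE 754 `float` value (which, as an example rule, must not be NaN) and `int` flags restricted to 0 or 1. It collects problems into a list rather than throwing, so one run reports as many violations as it safely can.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Verifier for a hypothetical "BINF" format (all integers big-endian):
 *   bytes 0-3   magic "BINF" in ASCII
 *   bytes 4-5   version, must be 1
 *   bytes 6-9   record count n
 *   then n 12-byte records: int id, IEEE 754 float value, int flags (0 or 1)
 */
final class BinfVerifier {
    static final byte[] MAGIC = "BINF".getBytes(StandardCharsets.US_ASCII);
    static final int HEADER = 10;
    static final int RECORD = 12;

    /** Returns the problems found; an empty list means the file conforms. */
    static List<String> verify(byte[] file) {
        List<String> problems = new ArrayList<>();
        if (file.length < HEADER) {
            problems.add("file is shorter than the " + HEADER + "-byte header");
            return problems;
        }
        ByteBuffer buf = ByteBuffer.wrap(file);   // big-endian by default
        for (int i = 0; i < MAGIC.length; i++) {
            if (buf.get(i) != MAGIC[i]) {
                problems.add("bad magic number");
                break;
            }
        }
        short version = buf.getShort(4);
        if (version != 1) {
            problems.add("unsupported version " + version);
        }
        int count = buf.getInt(6);
        if (count < 0) {
            problems.add("negative record count " + count);
            return problems;
        }
        long expected = HEADER + (long) count * RECORD;
        if (expected != file.length) {
            problems.add("record count " + count + " implies " + expected
                    + " bytes, but the file has " + file.length);
            return problems;
        }
        for (int r = 0; r < count; r++) {
            int base = HEADER + r * RECORD;
            if (Float.isNaN(buf.getFloat(base + 4))) {
                problems.add("record " + r + ": value is NaN");
            }
            int flags = buf.getInt(base + 8);
            if (flags != 0 && flags != 1) {
                problems.add("record " + r + ": flags must be 0 or 1, got " + flags);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Build a one-record file that should pass.
        ByteBuffer good = ByteBuffer.allocate(HEADER + RECORD);
        good.put(MAGIC).putShort((short) 1).putInt(1);
        good.putInt(42).putFloat(2.5f).putInt(1);
        System.out.println(verify(good.array()));   // prints []
    }
}
```

Because the verifier is written from the specification rather than from the writer’s code, a disagreement between the two points at either a bug or an ambiguity in the specification.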