A phobia of binary format files. It is especially widespread in the Unix community.
Then
To understand how this fear got started you must understand what computers were like in the beginning.
In the early days, there was almost no standardisation. A byte was anywhere from 6 to 12 bits, a word anywhere from 8 to 64 bits. Every machine had
at least two proprietary floating point formats. Sometimes each installation defined its own custom
character set. Some machines were big-endian, some little-endian; some used two’s-complement arithmetic,
some ones’-complement. Reading a file from someone else’s computer was quite an undertaking. It was easier
when the file was pure characters, because then a human could decipher the format by eye.
Documentation on formats was typically sketchy, so when a program did not work, it was easier to
debug human-readable character data than binary data, even though the text was bulkier.
Data formats were not taken very seriously. Formats were defined procedurally: the format was whatever the program
produced. This sufficed because there was very little interchange of data. If data were exchanged, they would
always be read and written by the same program on the same hardware, so there was no need to define precisely
what the format was.
Even magnetic tape caused interchange problems, with its varying densities, proprietary formats and labels.
Microsoft used binary formats for its MS Word and Excel products. However, they considered the format a
proprietary secret. They would often change the format without telling anyone. They arranged formats to be
deliberately incompatible as a dodge to trick customers into upgrading. Once Version N+1 had touched a document,
Version N could no longer read it. Everyone had to upgrade to Version N+1 at considerable cost, just to be able
to read their documents again. Microsoft only sold Version N+1, so there was no legitimate way new users in a
shop could stick with Version N to avoid the problem. Microsoft traumatised programmers against binary formats.
Programmers gradually decoded and documented the formats as best they could. It was an undertaking comparable to
breaking the German Enigma code, and there was no guarantee the result was 100%
accurate. Whenever programmers think binary format, they instantly associate it with Microsoft’s wicked
behaviour. In NLP (Neuro Linguistic Programming) terms, binary format has become a negative anchor.
CORBA (Common Object Request Broker Architecture) made a brave stab at letting you exchange binary data between different
platforms. The catch was that CORBA made such a production of it that the very thought made programmers want to lie
down and take a nap.
Now
Today, things have changed:
- We have converged on IEEE (Institute of Electrical & Electronics Engineers) format standards for binary floating point interchange.
- We have standardised on big-endian format for network order.
- Nearly all hardware can read/write 8-bit, 16-bit, 32-bit and 64-bit binary two’s-complement integers, signed
and unsigned, as well as IEEE floating point.
- Java allows serialised objects to be exchanged between machines with totally different internal hardware.
Though it started in Java, there is no reason other languages could not
implement the same protocols.
- The Internet means data is now routinely exchanged between computers from different manufacturers, using
software from different vendors on each end. It now becomes extremely important to precisely define the data
formats, and to create programs to verify that the standards are being adhered to.
- The Internet means it is more important than ever to exchange data in compact formats. If you don’t,
you waste bandwidth, air time, computing power, battery life in hand-held devices, and, most of all, people
time waiting for transmissions to complete.
- We are moving to an age with an explosion of hand-held devices that communicate with the Internet the same way
cell phones do. These too must be accommodated. They have very tight RAM (Random Access Memory) and CPU (Central Processing Unit) constraints. Air
time is considerably more expensive and considerably slower than the cable connections that desktop machines
enjoy. Further, the amount of bandwidth is limited by the radio frequency spectrum. We are rapidly running out
of cell-phone-type bandwidth. You have to be ultra-efficient to even play the game.
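The first three points in the list above (IEEE floating point, big-endian network order, fixed-width two's-complement integers) can be illustrated with a minimal Java sketch. The record layout, the class name PortableRecord and its fields are hypothetical, invented only for this example; any machine that follows the same conventions should be able to read the resulting 20 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical record: an id, a timestamp and a measured value.
class PortableRecord {
    // 4-byte int + 8-byte long + 8-byte double = 20 bytes
    static final int SIZE = Integer.BYTES + Long.BYTES + Double.BYTES;

    /** Encodes the fields in big-endian (network) order. */
    static byte[] encode(int id, long timestampMillis, double value) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(id);               // 32-bit two's-complement, high byte first
        buf.putLong(timestampMillis); // 64-bit two's-complement, high byte first
        buf.putDouble(value);         // 64-bit IEEE 754 bit pattern, high byte first
        return buf.array();
    }

    /** Reads the value field back from its fixed offset in the record. */
    static double decodeValue(byte[] record) {
        return ByteBuffer.wrap(record)
                .order(ByteOrder.BIG_ENDIAN)
                .getDouble(Integer.BYTES + Long.BYTES);
    }

    public static void main(String[] args) {
        byte[] record = encode(42, 1_000L, 98.6);
        System.out.println(record.length + " bytes, value = " + decodeValue(record));
    }
}
```

ByteBuffer is big-endian by default; the order is spelled out here only to make the convention visible.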
Advantages
There are several major advantages to binary formats:
Compactness
They are compact to store, compact to process in RAM, and compact to transmit over the Internet. In contrast,
some text formats such as XML (eXtensible Markup Language) can be an order of magnitude fluffier.
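As a rough illustration of the size difference, here is a small Java sketch that encodes the same integers both ways and counts the bytes. The class name SizeComparison and the tag names readings and reading are made up for the example; the actual ratio depends heavily on the tag names and on the data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SizeComparison {
    /** Bytes needed when each value is a fixed 32-bit binary integer. */
    static int binarySize(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.position();
    }

    /** Bytes needed when each value is wrapped in a (hypothetical) XML element. */
    static int xmlSize(int[] values) {
        StringBuilder xml = new StringBuilder("<readings>");
        for (int v : values) {
            xml.append("<reading>").append(v).append("</reading>");
        }
        xml.append("</readings>");
        return xml.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] values = {1_000_000, 2_000_000, 3_000_000};
        System.out.println("binary: " + binarySize(values) + " bytes");
        System.out.println("XML:    " + xmlSize(values) + " bytes");
    }
}
```

For these three values the binary form takes 12 bytes and the XML form 99 bytes.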
Speed
A well designed binary format is computer-friendly. The computer can rapidly navigate the data finding what
it wants without having to parse that which it does not want.
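One way a format can be computer-friendly is to use fixed-width records, so the position of any field can be computed rather than searched for. The following Java sketch assumes a hypothetical layout of a 4-byte id followed by an 8-byte price; the class and method names are invented for illustration.

```java
import java.nio.ByteBuffer;

class FixedRecords {
    // Hypothetical layout: 4-byte id followed by 8-byte price = 12 bytes per record.
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES;

    /** Packs parallel arrays of ids and prices into consecutive fixed-width records. */
    static ByteBuffer write(int[] ids, double[] prices) {
        ByteBuffer buf = ByteBuffer.allocate(ids.length * RECORD_SIZE);
        for (int i = 0; i < ids.length; i++) {
            buf.putInt(ids[i]).putDouble(prices[i]);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Jumps straight to one record's price without touching the records before it. */
    static double priceAt(ByteBuffer file, int recordIndex) {
        return file.getDouble(recordIndex * RECORD_SIZE + Integer.BYTES);
    }

    public static void main(String[] args) {
        ByteBuffer file = write(new int[] {1, 2, 3}, new double[] {1.5, 2.5, 3.5});
        System.out.println("price of third record: " + priceAt(file, 2));
    }
}
```

A text file holding the same records would normally have to be scanned line by line to reach the third one.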
Simplicity
Though a binary format might look terrifying to a human viewing it with the wrong tool, such as NOTEPAD, from
the computer’s point of view, it takes much less code to read and analyse a binary format file. This is
especially important in hand-held devices, where RAM for code and battery power to drive that code are at a
premium.
Accuracy
If you use text files for information interchange, there will be a conversion from binary to text to prepare them and
a conversion back to binary to read them. Each of those conversions can introduce small errors if you are not
careful, especially with IEEE floating point. If you go directly from binary to binary, there are two fewer places you
can go wrong.
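A small Java sketch shows how a careless text conversion can lose precision. It assumes the writer formats a double with a fixed number of decimal places, a common habit; the class and method names are hypothetical. A careful writer can avoid the loss (Java's Double.toString produces enough digits to round-trip), but the binary path needs no such care.

```java
import java.util.Locale;

class RoundTrip {
    /** Writes the value as text with a fixed number of decimals, then parses it back. */
    static double viaText(double value, int decimals) {
        String text = String.format(Locale.ROOT, "%." + decimals + "f", value);
        return Double.parseDouble(text);
    }

    /** Copies the raw 64-bit IEEE 754 pattern, as a binary format would. */
    static double viaBinary(double value) {
        long bits = Double.doubleToLongBits(value);
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        double original = 1.0 / 3.0;
        System.out.println("text round trip exact:   " + (viaText(original, 6) == original));
        System.out.println("binary round trip exact: " + (viaBinary(original) == original));
    }
}
```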
Symptoms
XML is probably the fluffiest, least efficient text format ever conceived. It is the
complete antithesis of a binary format. Addiction to XML is a symptom of a severe case of binaphobia.
Treatment
The binaphobic wakes in the night terrified he has written a program to create a binary format and now for some
reason he cannot read the data. What can be done to reassure the binaphobic?
- Use industry standard protocols and well tested libraries to read and write the data. Then at worst you
will be missing a field. You are then no worse off than had you done the whole thing in text.
- Remind him, "When was the last time anyone lost a serialised object because of bugs in readObject or writeObject?"
- Use the proper debugging tools to study your binary format files. You would not use NOTEPAD to modify an MS
Word document, so why would you think it the appropriate tool to examine and edit a binary format document? Use a
binary format editor/inspector. Programmers have no fear of the binary TCP/IP (Transmission Control Protocol/Internet Protocol) format, because they use proper
tools to examine the bits in the packets, rather than trying to analyse them with NOTEPAD.
- If you invent a new binary format, get different people to write the reader, writer, verifier, and
inspector/editor. That will help iron out inconsistencies or ambiguities in the format specification, and each
author’s work will cross-check the others’. Then get lots of other people to use it. The more people using it, the less
likely a bug will slip through unnoticed.
- Remind him that bugs in properly tested programs are rare. Errors in text files prepared with NOTEPAD are
extremely common. He is like the fool who sits in his car in the garage with the motor running to avoid being
hit by lightning.
- Let the binaphobic use a binary editor. He imagines somehow it will be harder to use than NOTEPAD. He
imagines that it will force him to fiddle bits with hex notation. He has no idea that a modern binary editor is
like a spreadsheet with the formulas locked that validates each field as you enter it.
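To make the point about well tested libraries concrete, here is a minimal Java sketch of a serialised object making the round trip through writeObject and readObject. The Reading class and the SerialRoundTrip wrapper are hypothetical, invented for the example; only the java.io classes are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialRoundTrip {
    // Hypothetical payload type used only for this illustration.
    static final class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final double value;

        Reading(int id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    /** Serialises the object to a byte array with the standard library writer. */
    static byte[] write(Reading reading) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(reading);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Reconstructs the object with the standard library reader. */
    static Reading read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Reading) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Reading copy = read(write(new Reading(7, 2.5)));
        System.out.println("id = " + copy.id + ", value = " + copy.value);
    }
}
```

None of this code touches individual bits or bytes; the library handles the format, which is the reassurance the binaphobic needs.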