armouring : Java Glossary


armouring

Converting binary data into printable gibberish so that data transport systems will not corrupt it. You see it used often in certificates, email and HTTP (Hypertext Transfer Protocol) communications.

There are many data transport systems that ignore, act on or otherwise meddle with control characters embedded in the data. They may trim trailing blanks, change line-end characters, convert tabs to spaces and so on. Any of these actions would totally corrupt binary data. To pass binary data through such a meddlesome channel, e.g. the email system, it must first be armoured, i.e. converted to use only safe printable characters that will not be meddled with, e.g. a-z A-Z 0-9 and the vanilla punctuation. I sometimes refer to characters that need special processing to pass through a channel as awkward.
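
To make the problem concrete, here is a minimal sketch. The meddle method is a hypothetical stand-in for a channel that converts tabs to spaces, changes line ends and trims trailing blanks; it is not modelled on any particular mail system. The class and method names are my own inventions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class MeddlesomeChannelDemo {

    /** Hypothetical channel: tabs become spaces, CRLF becomes LF, trailing whitespace is trimmed. */
    static String meddle(String text) {
        return text.replace('\t', ' ')
                   .replace("\r\n", "\n")
                   .replaceAll("\\s+$", "");
    }

    /** Send the bytes unarmoured, one char per byte (ISO-8859-1 maps bytes 1:1 to chars). */
    static byte[] sendRaw(byte[] data) {
        String onTheWire = new String(data, StandardCharsets.ISO_8859_1);
        return meddle(onTheWire).getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Armour with Base64 before sending, dearmour on arrival. */
    static byte[] sendArmoured(byte[] data) {
        String onTheWire = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(meddle(onTheWire));
    }

    public static void main(String[] args) {
        // Binary data containing a NUL, a tab, a CRLF pair, a high-bit byte and a trailing blank.
        byte[] binary = {0x00, 0x09, 0x0D, 0x0A, (byte) 0xFF, 0x20};
        System.out.println("raw survived:      " + Arrays.equals(binary, sendRaw(binary)));
        System.out.println("armoured survived: " + Arrays.equals(binary, sendArmoured(binary)));
    }
}
```

The raw bytes come out changed, while the armoured copy survives because the Base64 text contains nothing for this channel to meddle with.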

MIME (Multipurpose Internet Mail Extensions) email and email attachments have a configurable encoding scheme, controlled via the Content-Transfer-Encoding MIME header, often base64 or Quoted-Printable.
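
As a rough sketch of what a base64-armoured MIME body part might look like, the following uses the JDK's java.util.Base64 MIME encoder, which breaks its output into lines of at most 76 characters. The header is written here under its registered name, Content-Transfer-Encoding. The class name, the method names and the Content-Type value are assumptions chosen for illustration; a real mail library would do much more.

```java
import java.util.Arrays;
import java.util.Base64;

class MimeBase64Sketch {

    /** Build a minimal MIME-style part: two headers, a blank line, then the base64 body. */
    static String armourPart(byte[] payload) {
        String body = Base64.getMimeEncoder().encodeToString(payload);
        return "Content-Type: application/octet-stream\r\n"
             + "Content-Transfer-Encoding: base64\r\n"
             + "\r\n"
             + body;
    }

    /** Recover the payload: skip the headers, then decode, ignoring the line breaks. */
    static byte[] dearmourPart(String part) {
        int blankLine = part.indexOf("\r\n\r\n");
        if (blankLine < 0) {
            throw new IllegalArgumentException("no blank line between headers and body");
        }
        return Base64.getMimeDecoder().decode(part.substring(blankLine + 4));
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) (i * 37); // arbitrary binary test pattern
        }
        String part = armourPart(payload);
        System.out.println(part);
        System.out.println("round trip ok: " + Arrays.equals(payload, dearmourPart(part)));
    }
}
```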

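Unfortunately, armouring bulks the message up, from about 33% larger for base64 to as much as three times the original size for worst-case Quoted-Printable, depending on the technique you use. The other end has to recognise the armouring technique and do the reverse to get the binary back.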

When 8-bit data are encoded in printable characters, the more printable characters used in the representation, generally the more efficient the protocol. However, the more characters used, the greater the odds one of the characters used will be interfered with by your communication channel.
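
The trade-off can be put in numbers. An alphabet of N characters can carry at most log2(N) bits per character, so each 8-bit byte needs at least 8 / log2(N) output characters. The sketch below works this out for a few alphabet sizes. The class and method names are hypothetical, and the figure for 94 characters (printable ASCII without space) is only a theoretical floor, not a claim about any particular scheme.

```java
import java.util.Locale;

class AlphabetEfficiency {

    /** Minimum output characters needed per input byte for an alphabet of the given size. */
    static double expansionFactor(int alphabetSize) {
        double bitsPerChar = Math.log(alphabetSize) / Math.log(2);
        return 8.0 / bitsPerChar;
    }

    public static void main(String[] args) {
        int[] sizes = {16, 64, 94}; // hex, base64, printable ASCII without space
        for (int size : sizes) {
            System.out.println(String.format(Locale.ROOT,
                    "%2d characters: at least %.2f output chars per byte",
                    size, expansionFactor(size)));
        }
    }
}
```

A 16-character alphabet needs 2 characters per byte and a 64-character alphabet needs about 1.33, which matches the doubling for hexadecimal and the 33% growth for base64.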

Armouring Schemes

Unfortunately, there is a plethora of techniques. It is not always obvious just from looking which one was used to encode the data:

base64: common in certificates, passwords, email, email attachments, cookies and HTTP. Base64 uses a small cast of characters to convert 8-bit data into printable characters: a to z, A to Z, 0 to 9, + / and =. You might do this to convert any binary data to printable. This makes base64 suitable for encoding binary data as SQL (Structured Query Language) strings, which will work no matter what the character encoding. Unfortunately + / and = all have special meaning in URLs (Uniform Resource Locators). See Base64 for free Java source code. Every three bytes in the original fluff up to four characters in the encoded form. This 33% increase in size occurs independent of what bytes appear in your data. At the receiving end you convert the printable characters back to the 8-bit data.
url-encoded. See the separate entry on it.
base64u: A variant of Base64 that avoids the + / and = characters that have special meaning in URLs, GET and POST. You can treat its output either as not needing URLEncoding, or as already URLEncoded. Used to armour bytes or anything that can be converted to bytes, e.g. via serialization.
the Transporter: optionally handles serialising/reconstituting, compression/decompression, signing/verifying, heavy duty encryption/decryption and Base64u armouring/dearmouring, all with lightweight classes. Use it when you want to include arbitrary Java Objects in your CGI (Common Gateway Interface) GETs and POSTs.
Quoted-Printable (RFC 2045): used in newsgroup messages and email. It uses the following set of characters to convert 8-bit data into printable characters: space, a to z, A to Z, 0 to 9, and the punctuation in the ranges ! to < and > to ~, with = reserved as the escape character. It converts unsafe bytes into =XX where XX is the two-digit hex value of the byte. In the best case, your message is the same size as the original. In a pathological case, your message can balloon up to three times the original size.
hexadecimal: two characters per byte, each one of 0 to 9 or A to F. The result is always exactly double the size of the original. This is one of the easiest schemes to write code for.
binhex: a hex variant used on the Macintosh.
UUEncode: similar to Base64 in that both use 64 ASCII (American Standard Code for Information Interchange) characters, each representing 6 bits of the data, but they are not compatible. Base64 uses upper case, lower case, digits and only three punctuation symbols. UUEncode uses 28 punctuation symbols and only upper case letters. Also, the uuencode command has a structure to its output, with a header containing a file name and permissions, line-length encoding characters, and a footer, none of which are part of Base64.
CMP Encode: dates back to 1985. Very efficient for text that is mostly printable already. CMP (Canadian Mind Products) Encode uses the full 95 ASCII printable characters excluding space. Printable characters it leaves as is. It encodes control characters with a lead ^, e.g. code 3 becomes ^C. High-bit characters are encoded with a lead `. It has a simple compression scheme for repeating character strings. In the best case, your message can be even smaller than the original. In a pathological case, your message can balloon up to twice the original size. Unfortunately, Java code for this algorithm is not currently available. Pascal source and executable are available. This algorithm is not a recognised official MIME encoding.
CMP Encrypt: dates back to 1985. Also encrypts the data with a theoretically uncrackable one-time pad. Pascal source and executable are available.
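
Several of the schemes above can be compared side by side in a few lines of Java. This is only a sketch under stated assumptions. The URL-safe encoder from java.util.Base64 (which substitutes - and _ for + and /) stands in for base64u and is not necessarily character-for-character identical to the Base64u class mentioned above. The Quoted-Printable method is deliberately simplified: it does no soft line breaks and ignores the trailing-space rule of RFC 2045. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class ArmourSchemesSketch {

    private static final String HEX = "0123456789ABCDEF";

    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** URL-safe alphabet: '-' and '_' replace '+' and '/', and the '=' padding is dropped. */
    static String toBase64Url(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    /** Two hex characters per byte. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            int v = b & 0xFF;
            sb.append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
        }
        return sb.toString();
    }

    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex characters");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Simplified Quoted-Printable: safe bytes pass through, everything else becomes =XX. */
    static String toQuotedPrintable(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int v = b & 0xFF;
            boolean safe = v == ' ' || (v >= '!' && v <= '~' && v != '=');
            if (safe) {
                sb.append((char) v);
            } else {
                sb.append('=').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFB, (byte) 0xFF, (byte) 0xFE};
        System.out.println(toBase64(data));          // +//+
        System.out.println(toBase64Url(data));       // -__-
        System.out.println(toHex(data));             // FBFFFE
        System.out.println(toQuotedPrintable(data)); // =FB=FF=FE
        byte[] text = "a=b caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toQuotedPrintable(text)); // a=3Db caf=E9
    }
}
```

The three test bytes were picked because they exercise the characters that differ between the standard and URL-safe alphabets. The last line shows why Quoted-Printable suits mostly printable text: only the = and the accented letter get escaped.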

Base64
Cramfull
encipher
encode
encoding
encrypt
MIME
printable
registration key
the Transporter
url-encoded

	This page is posted on the web at:	http://mindprod.com/jgloss/armouring.html