A way of armouring awkward characters so they can be sent safely. Browsers use
url-encoding on HTTP GET and POST requests to the server. On a GET, they embed the
encoded data in the URL itself. Url-encoding is also what the
application/x-www-form-urlencoded MIME type refers to.
You see url-encoding every time you do a Google search, e.g.
http://www.google.com/search?client=opera&rls=en&q=%22rabbits%22%2BEaster+eggs&sourceid=opera&ie=utf-8&oe=utf-8
The request url-encodes my query:
"rabbits"+Easter eggs
There are two flavours of urlencoding, one used in URLs, and one used in forms.
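To make the Google example above concrete, here is a minimal sketch that decodes the q parameter back into the original query. It assumes Java 10 or later (for the Charset overload of URLDecoder.decode) and assumes the value was encoded as UTF-8, as the ie=utf-8 parameter suggests. The class and method names are made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class DecodeQueryDemo {
    /** Decodes one form-style url-encoded value, assuming UTF-8. */
    static String decode(String encoded) {
        // '+' turns back into a space; each %xy triplet turns back into one byte.
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value of the q parameter from the search URL.
        System.out.println(decode("%22rabbits%22%2BEaster+eggs"));
        // prints: "rabbits"+Easter eggs
    }
}
```

Note that the literal + in the query had to travel as %2B, because a bare + means a space in this flavour.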
URL Encoding
Ironically, you don’t use java.net.URLEncoder.encode and
java.net.URLDecoder.decode to handle encoding URLs or GET
parameters. Unfortunately, the URL class provides no
escaping features. You must build the address with the URI class,
which does the escaping, and then convert it to a URL with toURL().
The encoding algorithm is described in RFC 2396.
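As a rough sketch of that approach, the five-argument URI constructor quotes illegal characters such as spaces in the path and query, and toURL() then yields a URL. The host, path and query values are invented for the example, and the helper name toEscapedUrl is hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of any library.
class UriEscapeDemo {
    /** Builds an http URL, letting URI escape illegal characters in path and query. */
    static URL toEscapedUrl(String host, String path, String query) {
        try {
            // URI(scheme, authority, path, query, fragment) quotes characters
            // that are not legal in each component, e.g. space becomes %20.
            URI uri = new URI("http", host, path, query, null);
            return uri.toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("cannot build URL", e);
        }
    }

    public static void main(String[] args) {
        URL url = toEscapedUrl("www.example.com", "/my docs/file name.html", "q=rabbits eggs");
        System.out.println(url);
        // prints: http://www.example.com/my%20docs/file%20name.html?q=rabbits%20eggs
    }
}
```

Here the space becomes %20 rather than +, which is one visible difference between the URL flavour and the form flavour.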
Properly speaking, you should not see a bare & in
URLs embedded in HTML; they should be pre-encoded as &amp;. I
wrote a utility called Amper that
processes *.html files to make this correction.
Form Encoding
Form url-encoding is handled by java.net.URLEncoder.encode and decoding by
java.net.URLDecoder.decode. This is only intended for String
data with a few awkward characters in it, not heavy-duty binary. Encodings you
will likely use in conjunction with URLEncoder
include ISO-8859-1 (Latin-1), UTF-8
and windows-1250.
java.net.URLEncoder uses the following set of
characters to convert 8-bit data into printable characters:
a to z, A to Z, 0 to 9, -, ., *, and _.
It works like this:
- The alphanumeric characters a to z,
A to Z, 0
to 9 remain the same.
- The special characters ., -,
*, and _ remain the same.
- The space character is converted into a plus sign +.
- All other 16-bit characters are unsafe and are first converted into one or more
bytes using some encoding scheme. Then each
byte is represented by the 3-character string %xy,
where xy is the two-digit hexadecimal representation of the byte, e.g. $ →
%24, % → %25, & → %26, / → %2F and : → %3A.
You must URLEncode only once. If you URLEncode something already URLEncoded, the
% signs themselves get encoded as %25 (so %24 becomes %2524) and the receiver
sees gibberish.
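The rules above can be tried out with a short sketch like the following. It assumes Java 10 or later for the Charset overload of URLEncoder.encode (older versions take the charset name as a String and throw UnsupportedEncodingException). The class and method names are invented for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class FormEncodeDemo {
    /** Form-encodes a string using the given character encoding. */
    static String encode(String s, Charset cs) {
        return URLEncoder.encode(s, cs);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;

        // Safe characters pass through; the space becomes a plus sign.
        System.out.println(encode("a-b.c*d_e f", utf8)); // a-b.c*d_e+f

        // Unsafe characters become %xy triplets.
        System.out.println(encode("$%&/:", utf8));       // %24%25%26%2F%3A

        // The chosen encoding decides which bytes get written (e with acute accent).
        System.out.println(encode("\u00e9", utf8));                        // %C3%A9
        System.out.println(encode("\u00e9", StandardCharsets.ISO_8859_1)); // %E9

        // Encoding twice encodes the % sign itself.
        System.out.println(encode(encode("$", utf8), utf8)); // %2524
    }
}
```

The last line shows why the receiver must know which encoding was used and why encoding twice breaks the round trip: a single decode of %2524 gives %24, not $.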
In the best case, your message is the same size as the original. In a
pathological case, your message can balloon up to three times the original size.
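A small sketch of the size claim, again assuming Java 10 or later and UTF-8; the class name is made up. Text made only of safe characters keeps its length, while text made only of unsafe single-byte characters triples. Counted in characters rather than bytes, a character that needs three UTF-8 bytes grows to nine.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class EncodedSizeDemo {
    /** Returns the length of the form-encoded version of s, assuming UTF-8. */
    static int encodedLength(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("rabbits")); // 7: best case, unchanged
        System.out.println(encodedLength("$&/"));     // 9: every byte becomes %xy
        System.out.println(encodedLength("\u20ac"));  // 9: euro sign is 3 UTF-8 bytes
    }
}
```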
Learning More
Sun’s Javadoc on the URLEncoder class
Sun’s Javadoc on the URLDecoder class
Sun’s Javadoc on the URL class
Sun’s Javadoc on the URI class