HTTP : Java Glossary


HTTP
HTTP (Hypertext Transfer Protocol). A protocol used on the Internet by web browsers to transport text and graphics. It focuses on grabbing a page at a time, rather than setting up a session. Applets also use it to download jars, classes and resources. Browsers use it to download files and images, not just HTML (Hypertext Markup Language) text.
  • Message Headers From Browser To Server
  • User Agent
  • Message Headers From Server To Browser
  • Language and Charset
  • GET vs POST
  • Sample Code
  • Under the Hood
  • response codes
  • Speeding Up HTTP
  • Rant
  • Learning More
  • Links

Message Headers From Browser To Server

Fields in the headers let browsers and servers communicate. You set them up with URLConnection.setRequestProperty or more specialised methods. Some of the acronyms you will encounter in deciphering HTTP headers include: HTTP, MSIE and CLR. The User-Agent codes are idiotic. In a sane universe, the field would contain just the browser name and version. For example:

HTTP Headers that Browsers Send Servers
Field Typical Value Meaning
User-Agent: Which browser is being used/simulated. Typical values:
  • Java.exe default: Java/1.8.0_102 (last revised/verified: 2016-05-27)
  • Chrome: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 (last revised/verified: 2016-08-22)
  • Firefox: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0 (last revised/verified: 2016-09-19)
  • Opera: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36 OPR/39.0.2256.48 (last revised/verified: 2016-08-22)
  • Safari: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2 (last revised/verified: 2015-09-27)
  • SeaMonkey: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0 SeaMonkey/2.39 (last revised/verified: 2015-12-10)
  • Avant: Mozilla/5.0 (Windows NT 6.1; WOW64; Avant TriCore) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36 (last revised/verified: 2015-12-10)
  • Edge: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586 (last revised/verified: 2016-05-27)
  • IE (Internet Explorer) 11.0: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; Avant Browser; rv:11.0) like Gecko (last revised/verified: 2015-12-10)
  • IE 9: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; Avant Browser)
  • IE 8: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; Avant Browser; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E) (last revised/verified: 2011-02-03)
  • IE 7: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
  • Netscape: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.5pre) Gecko/20070710 firefox/2.0.0.4 Navigator/9.0b2
Host: localhost:8081 destination url, server:port.
Accept: application/xhtml+voice+xml;version=1.2, application/x-xhtml+voice+xml;version=1.2, application/x-shockwave-flash,text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
or
text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
MIME (Multipurpose Internet Mail Extensions) types the browser is willing to accept. The encoding of this field is described in RFC 7231 section 5.3.2 (previously RFC 2616 section 14.1) and in the more friendly w3.org version. Roughly, the q numbers define your preference: the higher the number, the higher the preference. The default is 1. The q applies to the preceding MIME type. You set this with URLConnection.setRequestProperty( "Accept", … ); not "accept" as the Sun docs erroneously suggest.
Accept-Language: en Language the browser is willing to accept.
Accept-Charset: windows-1252, utf-8, utf-16, iso-8859-1;q=0.6, *;q=0.1 Character set encodings the browser is willing to accept.
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0 compression schemes the browser is willing to accept.
  • deflate: zlib format defined in RFC 1950 plus the deflate compression mechanism described in RFC 1951. This is a stripped down gzip without the header.
  • gzip, alias x-gzip: Java-style gzip, RFC 1952, Lempel-Ziv coding with a 32-bit CRC (Cyclic Redundancy Check).
  • compress, alias x-compress: UNIX compress. Rarely used.
  • identity: as-is, no compression. Use it in the Accept-Encoding header, but not the Content-Encoding header. Just leave out the Content-Encoding if it is identity.
Referer: http://mindprod.com/jgloss/http.html The web page that contained the link that triggered this request. If the page was loaded locally from hard disk, this field will be missing.
If-Modified-Since: Mon, 06 Feb 2006 01:24:23 GMT Only bother with the request if the file has changed since this date, given in GMT (Greenwich Mean Time); otherwise the browser already has a copy in cache. If the file has not changed, you will get a 304 Not Modified response code.
Connection: Keep-Alive Requests that the server keep the socket open for further messages. Persistent connections are the default in HTTP 1.1, so you don’t need to send it.
Keep-Alive: 300 Requests that the server keep the socket open 300 seconds for further messages.
Pragma: no-cache requests getting a fresh copy from the server, rather than from a cache.
Content-Type: application/x-www-form-urlencoded MIME type of the payload to the server.
Content-Length: 114 length in encoded bytes of the payload to the server.
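
As a minimal sketch of how some of the fields in the table above might be set from Java, assuming the standard java.net.URLConnection API. The class name RequestHeaderSketch and the particular values are illustrative only, not part of the com.mindprod.http package. Nothing goes out on the wire here, because the code never connects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

/** Hypothetical helper for illustration; not part of com.mindprod.http. */
final class RequestHeaderSketch {

    /** Prepare, but do not send, a request carrying a few browser-style header fields. */
    static URLConnection prepare(String url) {
        try {
            // openConnection only creates the request object; no network traffic yet.
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
            conn.setRequestProperty("Accept", "text/html, image/gif, image/jpeg, */*; q=.2");
            conn.setRequestProperty("Accept-Language", "en");
            // If you ask for compression, you must decompress the response yourself.
            conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
            // Specialised setter for If-Modified-Since, in milliseconds since 1970:
            // Mon, 06 Feb 2006 01:24:23 GMT
            conn.setIfModifiedSince(1139189063000L);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn = prepare("http://localhost:8081/index.html");
        System.out.println("Accept: " + conn.getRequestProperty("Accept"));
    }
}
```

All the fields must be set before the connection is made; setting one afterwards throws an IllegalStateException.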

Beware using HttpURLConnection.setFollowRedirects( false ); This reportedly causes trouble in recent JDKs (Java Development Kits). Even when it is set true, it will not automatically follow redirects embedded in the page itself with: <META HTTP-EQUIV=Refresh.
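
One possible alternative, sketched here under assumptions, is to turn off automatic redirects for a single connection with setInstanceFollowRedirects and work out the new URL yourself from the response code and the Location field. setFollowRedirects is static and affects every connection in the JVM. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for illustration; not part of com.mindprod.http. */
final class RedirectSketch {

    /** Build a connection that will NOT follow redirects on its own. Nothing is sent yet. */
    static HttpURLConnection openWithoutAutoRedirect(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Per-connection switch, unlike the static setFollowRedirects.
            conn.setInstanceFollowRedirects(false);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Work out where a redirect points.
     * Returns the absolute target URL, or null if the response is not a redirect.
     * A malformed Location value makes URI.resolve throw IllegalArgumentException.
     */
    static String redirectTarget(String requestedUrl, int responseCode, String location) {
        boolean redirect = responseCode == 301 || responseCode == 302
                || responseCode == 303 || responseCode == 307 || responseCode == 308;
        if (!redirect || location == null) {
            return null;
        }
        // Location may be relative, so resolve it against the URL that was requested.
        return URI.create(requestedUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        System.out.println(
                redirectTarget("http://mindprod.com/jgloss/http.html", 301, "/index.html"));
    }
}
```

After a real request you would pass conn.getResponseCode() and conn.getHeaderField( "Location" ) to redirectTarget.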

Message Headers From Server To Browser

You read these with URLConnection.getHeaderField or more specialised methods after the connection has been made.
HTTP Headers that Servers Send Browsers
Field Typical Value Meaning
Server: Apache/2.0.55 (NETWARE) mod_perl/1.99_12 Perl/v5.8.4 Which server software is being used.
Accept-Ranges: bytes Informs the browser that the server supports downloading just parts of files, down to single-byte granularity.
Location: http://mindprod.com/index.html If the URL (Uniform Resource Locator) has been redirected/moved, this is the new URL to use instead. You can tell if it is permanently or temporarily redirected by looking at the response code.
Keep-Alive: timeout=15, max=99 How long, in seconds, to keep this socket open for more messages, and the maximum number of further requests.
Connection: Keep-Alive Requests that the browser keep the socket open for further messages.
Content-Type: image/png MIME type of the payload from the server. Also used to specify the charset encoding, e.g. Content-Type: text/html; charset=utf-8
Content-Encoding: gzip gzip, x-gzip or deflate, or not present if no compression.
Content-Disposition: attachment;filename=smile.png Server suggests a filename to save this download under.
Content-Length: 842 length in encoded bytes of the payload from the server.
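
A minimal sketch of reading some of these fields, assuming the standard java.net.URLConnection API. The class ResponseHeaderSketch and its methods are hypothetical. charsetOf shows one simple way to pull the charset out of a Content-Type value; it is not a full parser for the header syntax.

```java
import java.net.URLConnection;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of com.mindprod.http. */
final class ResponseHeaderSketch {

    /** Extract the charset parameter from a Content-Type value, or null if none is given. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the MIME type itself; any parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // The value may optionally be wrapped in double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    /** Summarise a few response fields. Reading a field connects first if necessary. */
    static String describe(URLConnection conn) {
        return "Server=" + conn.getHeaderField("Server")
                + " Content-Type=" + conn.getContentType()
                + " charset=" + charsetOf(conn.getContentType())
                + " Content-Length=" + conn.getContentLengthLong();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8"));
    }
}
```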

In the real world, the conversations between browser/client and server are much more complicated and slipshod than you might suppose. Each query often results in a flurry of permanent and temporary redirects back and forth. Each element on an HTML page must be requested independently. Sometimes servers will send back a fail error code, then send the page anyway. Or they will send a 404 with an OK text response code. Sometimes servers refuse HEAD requests, but accept the equivalent GET. Sometimes servers send back https: in response to an http: request. Sometimes servers give you a totally different page from the one you requested and don’t tell you the one you wanted is no longer available. Sometimes servers redirect to localhost, or send back gibberish messages. Sometimes a server won’t send you a page if you have recently requested it. They expect you to have cached it. Browsers just do their best to muddle through. When you start emulating browsers with code, you get pretty flaky programs.

Language and Charset

You might wonder, where does the server encode the language and character set? Oddly, often not in the HTTP header, but embedded in the web page itself, typically in META tags. Embedding this information makes it easier for web page authors to control, even if it makes finding the information slightly more difficult for the browser.

GET vs POST

In a GET, the parameters are embedded in the URL sent to the host, with various header fields following the URL. The first parameter starts with a ?. Subsequent ones start with &. An = separates the keyword from the value. Here is what a typical message sent to the server looks like:
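
As a rough illustration, the following sketch assembles the text of a simple GET message by hand. The class name, the /search path and the parameter names q and lang are made up for the example; a real browser would send many more header fields.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of com.mindprod.http. */
final class GetRequestSketch {

    /** URL-encode one keyword or value, e.g. space becomes +. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Build the ?key=value&key=value tail of the URL. */
    static String queryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            // First parameter starts with ?, subsequent ones with &.
            sb.append(sb.length() == 0 ? '?' : '&');
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    /** Assemble the raw text of a minimal GET message. Lines end in CR LF. */
    static String rawGet(String host, String path, Map<String, String> params) {
        return "GET " + path + queryString(params) + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Accept: text/html\r\n"
                + "\r\n"; // a blank line ends the header
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "java glossary");
        params.put("lang", "en");
        System.out.print(rawGet("localhost:8081", "/search", params));
    }
}
```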

A POST is like a GET, with optional embedded parameters in the URL sent to the host. In addition, it has a message tacked on the end after a blank line, like this:
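
As a rough illustration, the following sketch assembles the text of a simple form POST by hand, with the parameters carried in the body rather than the URL. The class name, the /search path and the parameter names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of com.mindprod.http. */
final class PostRequestSketch {

    /** URL-encode one keyword or value, e.g. space becomes +. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Build an application/x-www-form-urlencoded body: key=value&key=value. */
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    /** Assemble the raw text of a minimal POST message: header, blank line, body. */
    static String rawPost(String host, String path, Map<String, String> params) {
        String body = formBody(params);
        // Content-Length counts encoded bytes, not characters.
        int length = body.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "java glossary");
        params.put("lang", "en");
        System.out.println(rawPost("localhost:8081", "/search", params));
    }
}
```

With HttpURLConnection you would not build this text yourself: you call setDoOutput( true ), set the Content-Type, and write the body to getOutputStream.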

Sample Code

This code for doing GET and POST is from the com.mindprod.http package. You can download the whole package.

Code to do a GET

Code to do a POST

Code to do a HEAD

Code to do a Fetch (generic GET)

Code to do a PROBE (check if page present)

Base class for GET, POST and PROBE

Code to READ

Read the response either as bytes (readBytesBlocking) or converted to a String

Under the Hood

What happens when your Java-based browser requests a page?
  1. URL.openConnection just sets up a place to build the HTTP header. It does no communicating with the outside world.
  2. HttpURLConnection.connect() requests sending the header to the server. This request triggers opening a TCP/IP (Transmission Control Protocol/Internet Protocol) socket connection to the server. This is done by sending a SYN connection request packet. The server sends back a SYN+ACK. Then the client sends an ACK, upon which may be piggybacked some data.
  3. If you look at what URL.openConnection gives you with getClass, you will see a variety of classes such as:
    • HttpURLConnection
    • HttpsURLConnectionImpl
    • FileURLConnection
    • JarURLConnection

    depending on which protocol you specified e.g. http:, https:, file:, jar:

  4. The connect call triggers sending the GET header composed of all the header fields set up before the .connect call. The GET request header includes a list of the encodings and compression algorithms the browser would like in response. .connect does not return until the HTTP header is safely sent out on the wire. .connect can take a long time to return since it waits for the other end to respond, or a timeout.
  5. The browser calls HttpURLConnection.getResponseCode to see if the request went OK. This blocks until the server responds with an HTTP response header.
  6. Then the browser calls HttpURLConnection.getInputStream and reads the text of the message from the server containing the requested web page. Using the standard TCP/IP protocol flow-control features, the server sends data only as fast as the browser can read it.
  7. The browser then scans the web page for the URLs (Uniform Resource Locators) of embedded images and puts out GET requests for them.
  8. Then various images usually come back from the server on the original socket. The browser could elect to request each image on its own socket so they can arrive simultaneously.
The stream is made purely of printable characters. The server can detect the start of a new GET request by looking for line terminators.
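
The steps above can be sketched in code roughly as follows. To keep the example runnable without the Internet, it serves one fixed page from a throwaway loopback server built with the JDK's com.sun.net.httpserver.HttpServer; that server, the class name UnderTheHoodSketch and the page text are assumptions for the demonstration only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical walk-through for illustration; not part of com.mindprod.http. */
final class UnderTheHoodSketch {

    /** Fetch a page as text, roughly following the numbered steps. */
    static String fetch(String url) {
        try {
            // Step 1: build the request object; nothing is sent yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept", "text/html");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            // Step 2: open the TCP/IP socket to the server.
            conn.connect();
            // Step 5: blocks until the server's response header has arrived.
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("unexpected response code " + code);
            }
            // Step 6: read the body of the response.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Serve one fixed page on an ephemeral loopback port, fetch it, then shut down. */
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            byte[] page = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
            server.createContext("/", exchange -> {
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(200, page.length);
                exchange.getResponseBody().write(page);
                exchange.close();
            });
            server.start();
            try {
                return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Steps 7 and 8, scanning the page for embedded images and fetching them, would repeat fetch for each URL found.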

Speeding Up HTTP

There are several things you might consider to speed up HTTP transmissions.

Rant

HTTP is an unnecessarily fluffy and asymmetrical protocol. If I were redesigning it:

Learning More

RFC 1945 HTTP 1.0 specification.

RFC 2045 MIME Part One: Format of Internet Message Bodies, specifies the various headers used to describe the structure of MIME messages.

RFC 2046 MIME Part Two: Media Types, describes the general structure of the MIME media typing system and defines an initial set of media types.

RFC 2047 MIME Part Three: Message Header Extensions for non-ASCII (American Standard Code for Information Interchange) text.

RFC 6838 and RFC 4289 MIME Part Four: Registration Procedures.

RFC 2049 MIME Part Five: Conformance Criteria and Examples, provides some illustrative examples of MIME message formats.

RFC 2183 The Content-Disposition header field.

RFC 7230 updates the HTTP protocol. Details of what all the header fields do.

RFC 2617 (obsolete), replaced by RFC 7235: details on how to send username and password in HTTP headers to restrict access.

Oracle’s Javadoc on the URLConnection class.
Oracle’s Javadoc on the HttpURLConnection class.
Oracle’s Javadoc on the JarURLConnection class.

HttpsURLConnectionImpl and FileURLConnection are undocumented.


This page is posted
on the web at:

http://mindprod.com/jgloss/http.html

Contact Roedy. Please feel free to link to this page without explicit permission.
