The HTTP Protocol
The Hypertext Transfer Protocol (HTTP) is the standard protocol for communication between web browsers and web servers and defines how data is transferred and how the server and client talk to each other. For each request from the client to the server there are four steps:
- The client opens a TCP connection to the server on port 80 (by default).
- The client sends a message to the server requesting the resource at a specified path. The request includes a header followed by a blank line and, for some methods, data for the request.
- The server sends a response to the client. The response begins with a response code, followed by a header with metadata, a blank line, and the requested data or an error message.
- The server closes the connection.
The above steps are the basic HTTP 1.0 procedure. In later versions of HTTP, starting with HTTP 1.1, multiple requests and responses can be sent in series over a single TCP connection. In other words, steps 2 and 3 above can repeat. A typical client request looks like the following:
    GET /index.php HTTP/1.1
    Host: yhscs.us
    Connection: keep-alive
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8
You can view the request information of a web page in Chrome by using the Inspect function and going to the Network section.
The first line above is the request line that includes a method, a path to the resource, and the version of HTTP. The GET method will be discussed later, but it basically asks the server to return a representation of the resource at the path /index.php. In HTTP 1.0 the request line is the only line that is required for a request; HTTP 1.1 also requires the Host line.
Each line after the request line takes the form "Keyword: Value" and both sides should be ASCII. A line in the header is terminated by a carriage-return linefeed pair (\r\n).
The first keyword in the above example is the host, which allows web servers to differentiate between different named hosts at the same IP address.
In HTTP 1.1 and later the connection keyword allows you to specify a keep-alive connection that will stay connected while multiple resources are accessed.
The user-agent keyword lets the server know what browser is trying to access the resource, which may mean the server sends a response optimized for that browser.
The accept keyword tells the server the types of data the client can handle. The client in the example above lists four specific MIME types: text/html, application/xhtml+xml, application/xml, and image/webp, followed by the wildcard */* for any other type. A MIME type is specified at two levels: a type and a subtype. The type shows generally what type of data is contained such as an image or text. The subtype identifies the specific type such as GIF image, JPEG image, or WEBP image. There are eight top-level MIME types:
- text/* for human-readable words
- image/* for pictures
- model/* for 3D models
- audio/* for sound
- video/* for movies
- application/* for binary data
- message/* for protocol-specific messages
- multipart/* for containers of multiple resources
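The two-level structure of a MIME type can be sketched in a few lines of Java. This is a minimal illustration, not a full Accept parser: the class name AcceptHeader and both method names are made up for this example, and the q= parameters are simply discarded rather than used for ranking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class AcceptHeader {
    // Returns the media ranges listed in an Accept value,
    // dropping parameters such as ";q=0.9".
    static List<String> mediaRanges(String acceptValue) {
        List<String> ranges = new ArrayList<>();
        for (String part : acceptValue.split(",")) {
            int semicolon = part.indexOf(';');
            String range = (semicolon < 0 ? part : part.substring(0, semicolon)).trim();
            if (!range.isEmpty()) {
                ranges.add(range);
            }
        }
        return ranges;
    }

    // Splits "type/subtype" into its two levels.
    static String[] typeAndSubtype(String mimeType) {
        int slash = mimeType.indexOf('/');
        if (slash <= 0 || slash == mimeType.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + mimeType);
        }
        return new String[] { mimeType.substring(0, slash), mimeType.substring(slash + 1) };
    }

    public static void main(String[] args) {
        String accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
        for (String range : mediaRanges(accept)) {
            String[] levels = typeAndSubtype(range);
            System.out.println("type=" + levels[0] + " subtype=" + levels[1]);
        }
    }
}
```

Note that the wildcard entry */* comes back as a media range too, with * as both type and subtype.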
Once the server sees a blank line (\r\n\r\n) it knows the request is complete. The server then returns a status line along with header information like so:
    HTTP/1.1 200 OK
    Date: Fri, 16 Sep 2016 12:33:08 GMT
    Server: Apache
    Cache-Control: max-age=0
    Expires: Fri, 16 Sep 2016 12:33:08 GMT
    X-UA-Compatible: IE=edge
    X-Content-Type-Options: nosniff
    Vary: Accept-Encoding
    Content-Encoding: gzip
    Content-Length: 3192
    Keep-Alive: timeout=2, max=100
    Connection: Keep-Alive
    Content-Type: text/html; charset=UTF-8
The first line indicates the protocol as well as a response code that indicates the status of the response. 200 OK is the most common response code. You can go to Wikipedia to see a list of all response codes, including some unofficial codes. It is important to know that codes from 100 to 199 always indicate an informational response, codes 200 to 299 always indicate success, 300 to 399 always indicate redirection, 400 to 499 always indicate a client error, and 500 to 599 always indicate a server error. Some of the most important response codes to know are the following:
- 200 OK: The request succeeded.
- 301 Moved Permanently: The resource has moved to a new URL. The client should move to the new URL and update any bookmarks to the old one.
- 401 Unauthorized: A username and password are required to access this resource.
- 403 Forbidden: The server is deliberately refusing to process a request.
- 404 Not Found: The requested resource was not found.
- 418 I'm a teapot: Attempting to brew coffee with a teapot.
- 500 Internal Server Error: The server does not know how to handle an unexpected condition.
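The ranges described above lend themselves to a small classifier. The following is a rough sketch rather than a complete response parser: the class StatusLine and its methods are names invented for this example, and the category is derived purely from the first digit of the code.

```java
// Hypothetical helper for illustration; not part of any library.
class StatusLine {
    final String version;
    final int code;
    final String reason;

    StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // Parses a line such as "HTTP/1.1 200 OK" into version, code, and reason phrase.
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), reason);
    }

    // Maps a code to its class using the hundreds digit.
    static String category(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine status = parse("HTTP/1.1 404 Not Found");
        System.out.println(status.code + " (" + status.reason + ") is a " + category(status.code));
    }
}
```

Because the category depends only on the range, a client can react sensibly even to a code it has never seen before.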
HTTP 1.0 opens a new connection for every request, and opening the connection can take more time than transmitting the data itself. Encrypted HTTPS connections that use SSL or TLS can take even more time since setting up a secure connection involves more steps than setting up a regular socket.
In HTTP 1.1 and later, the server doesn't have to close the socket after it sends a response. The server can leave the socket open and wait for a new request from the client on the same socket so multiple requests and responses can be sent in a series over a single TCP connection. A client indicates it is willing to do this by sending the Connection: Keep-Alive header.
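As a rough sketch, the decision whether to leave the socket open can be written as a single function. The class and method names are hypothetical. The defaults used when no Connection header is present (persistent for HTTP/1.1, not persistent for HTTP/1.0) come from the HTTP/1.1 specification rather than from this lesson, so treat them as an assumption of the example.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of any library.
class ConnectionPolicy {
    // connectionHeader is the value of the Connection header, or null if it is absent.
    static boolean keepAlive(String httpVersion, String connectionHeader) {
        String value = connectionHeader == null
                ? ""
                : connectionHeader.trim().toLowerCase(Locale.ROOT);
        if (value.equals("close")) {
            return false;   // either side asked to end the connection
        }
        if (value.equals("keep-alive")) {
            return true;    // explicit request to reuse the socket
        }
        // Assumed default: HTTP/1.1 connections persist unless told otherwise.
        return httpVersion.equals("HTTP/1.1");
    }

    public static void main(String[] args) {
        System.out.println(keepAlive("HTTP/1.1", "Keep-Alive"));
        System.out.println(keepAlive("HTTP/1.0", null));
    }
}
```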
The URL class in Java supports Keep-Alive by default. You can control Java's use of HTTP Keep-Alive with the following properties:
- http.keepAlive: Set to true or false to enable or disable HTTP Keep-Alive.
- http.maxConnections: The number of sockets you are willing to keep open at once. The default is 5.
- http.keepAlive.remainingData: Set to true to let Java clean up after abandoned connections. The default is false.
- sun.net.http.errorstream.enableBuffering: Set to true to attempt to buffer the short error streams from 400- and 500-level responses so the connection can be freed sooner. The default is false.
- sun.net.http.errorstream.bufferSize: The number of bytes to use for buffering error streams. The default is 4,096 bytes.
- sun.net.http.errorstream.timeout: The number of milliseconds before timing out a read from the error stream. The default is 300 milliseconds.
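These are ordinary system properties, so they can be set with -D flags on the command line or programmatically. A minimal sketch of the second approach follows; the class name KeepAliveConfig and the value 10 are arbitrary choices for this example. It is safest to assume the properties must be set before the program opens its first HTTP connection, since the JDK may read them only once.

```java
// Hypothetical helper for illustration; not part of any library.
class KeepAliveConfig {
    // Sets two of the properties listed above as system properties.
    static void apply(boolean keepAlive, int maxConnections) {
        System.setProperty("http.keepAlive", String.valueOf(keepAlive));
        System.setProperty("http.maxConnections", String.valueOf(maxConnections));
    }

    public static void main(String[] args) {
        // Call this early, before any URL connection is opened.
        apply(true, 10);
        System.out.println("http.keepAlive=" + System.getProperty("http.keepAlive"));
        System.out.println("http.maxConnections=" + System.getProperty("http.maxConnections"));
    }
}
```

The command-line equivalent would be along the lines of java -Dhttp.keepAlive=true -Dhttp.maxConnections=10 followed by the main class name.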
Communication with an HTTP server follows a request-response pattern. Each HTTP request has two or three parts:
- A start line containing the HTTP method and a path to the resource.
- A header of name-value fields.
- A request body containing a representation of the resource (POST and PUT only).
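The two or three parts of a request can be assembled as a plain string. The sketch below is one way to do it, assuming HTTP/1.1 and a UTF-8 body; RequestBuilder is a name invented for this example, and it adds a Content-Length header itself whenever a body is supplied.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any library.
class RequestBuilder {
    private static final String CRLF = "\r\n";

    // body may be null for methods such as GET that send no representation.
    static String build(String method, String path, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        // Part 1: the start line.
        sb.append(method).append(' ').append(path).append(" HTTP/1.1").append(CRLF);
        // Part 2: the header, one "Name: value" field per line.
        for (Map.Entry<String, String> field : headers.entrySet()) {
            sb.append(field.getKey()).append(": ").append(field.getValue()).append(CRLF);
        }
        if (body != null) {
            int length = body.getBytes(StandardCharsets.UTF_8).length;
            sb.append("Content-Length: ").append(length).append(CRLF);
        }
        // A blank line ends the header.
        sb.append(CRLF);
        // Part 3: the optional body.
        if (body != null) {
            sb.append(body);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "yhscs.us");
        headers.put("Connection", "keep-alive");
        // Show the line terminators as visible escapes.
        System.out.println(build("GET", "/index.php", headers, null).replace("\r\n", "\\r\\n\n"));
    }
}
```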
There are four main HTTP methods that identify the operations that can be performed:
The GET method retrieves a representation of a resource and can be repeated without concern if it fails. Its output is often cached and can often be bookmarked and prefetched without concern.
The PUT method uploads a representation of a resource at a known URL. It can also be repeated without concern if it fails since putting the same document on the same server twice in a row leaves the server in the same state as only putting it once.
The DELETE method removes a resource from a specified URL. If you aren't sure a delete request succeeded you can simply send the request again.
The POST method is the most general of the four methods. It uploads a representation of a resource to the server, but it does not specify what to do with that resource. The server may move the resource to a different URL or use the data in the resource to update a database. POST is intended for actions that commit to something while GET is intended for noncommittal actions such as browsing a static web page. Adding an item to an online shopping cart should use the GET method since it doesn't commit to a purchase. Purchasing the item, however, should use POST since it commits to the purchase.
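One practical consequence of these descriptions is that a client can automatically retry a failed GET, PUT, or DELETE, but should not blindly resend a POST. A minimal sketch of that rule, with a hypothetical class name and restricted to the four methods discussed here:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of any library.
class HttpMethods {
    // Methods the text describes as repeatable without concern.
    private static final Set<String> REPEATABLE =
            new HashSet<>(Arrays.asList("GET", "PUT", "DELETE"));

    // Returns true if a failed request with this method can simply be sent again.
    static boolean safeToRetry(String method) {
        return REPEATABLE.contains(method.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String method : Arrays.asList("GET", "PUT", "DELETE", "POST")) {
            System.out.println(method + " retry allowed: " + safeToRetry(method));
        }
    }
}
```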
The GET method retrieves a representation of the resource identified by a URL. The URL class in Java uses the GET method to communicate with HTTP servers. The path and query string of a GET request let the server know what to do.
POST and PUT are a bit more complex. The representation of the requested resource is sent in the body of the request after the header. The following four items are sent in order:
- A start line that includes the method, path and query string, and HTTP version
- An HTTP header
- A blank line (\r\n\r\n)
- The body
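Because the blank line marks where the header ends, a receiver can split a request into these items by searching for \r\n\r\n. The following sketch assumes the whole request is already in one string and that header names are matched case-sensitively, both simplifications; ParsedRequest and the path /notes.txt are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any library.
class ParsedRequest {
    final String method;
    final String target;   // path and query string
    final String version;
    final Map<String, String> headers;
    final String body;

    private ParsedRequest(String method, String target, String version,
                          Map<String, String> headers, String body) {
        this.method = method;
        this.target = target;
        this.version = version;
        this.headers = headers;
        this.body = body;
    }

    static ParsedRequest parse(String raw) {
        // The first blank line separates the header from the body.
        int blank = raw.indexOf("\r\n\r\n");
        String head = blank < 0 ? raw : raw.substring(0, blank);
        String body = blank < 0 ? "" : raw.substring(blank + 4);

        String[] lines = head.split("\r\n");
        String[] start = lines[0].split(" ");
        if (start.length != 3) {
            throw new IllegalArgumentException("Malformed start line: " + lines[0]);
        }

        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
        return new ParsedRequest(start[0], start[1], start[2], headers, body);
    }

    public static void main(String[] args) {
        ParsedRequest request = parse(
                "PUT /notes.txt HTTP/1.1\r\nHost: yhscs.us\r\nContent-Length: 5\r\n\r\nhello");
        System.out.println(request.method + " " + request.target + " " + request.version);
        System.out.println(request.headers);
        System.out.println(request.body);
    }
}
```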
The following POST request sends form data to a server:
    POST /alumniContact.php HTTP/1.1
    Date: Sun, 18 Apr 2016 21:47:02
    Host: yhscs.us
    Content-type: application/x-www-form-urlencoded
    Content-length: 84

    name=Derek+Miller&email=email%40example.com&message=Hey%2C+Edric.&recipient=Edric+Yu
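A form body like the one above can be produced with java.net.URLEncoder, which turns spaces into + and characters such as the comma into percent escapes. This is a sketch under a few assumptions: the class FormPost is hypothetical, the body is labeled with the application/x-www-form-urlencoded media type that browsers normally use for form data, and the request is only built as a string, not sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any library.
class FormPost {
    // Encodes the fields as name=value pairs joined by '&'.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    // Builds the full POST request text; Content-Length counts the bytes of the body.
    static String request(String path, String host, Map<String, String> fields) {
        String body = encode(fields);
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Derek Miller");
        fields.put("message", "Hey, Edric.");
        fields.put("recipient", "Edric Yu");
        System.out.println(request("/alumniContact.php", "yhscs.us", fields));
    }
}
```

Computing Content-Length from the encoded bytes, rather than typing it by hand, keeps the header in step with the body.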
Cookies are small strings of text used to store persistent information. Cookies are passed from server to client and back again through HTTP headers and are used for login credentials, shopping cart contents, user settings, and more.
To set a cookie in a browser, the server includes a Set-Cookie header line. The following example sets a username that is used to identify the current user on the website.
    HTTP/1.1 200 OK
    Content-Type: text/html
    Set-Cookie: user=CoachMiller
If a browser makes a second request to the same server, it will send the cookie back in a Cookie line in the HTTP request header:
    GET / HTTP/1.1
    Host: ymsrunning.com
    Cookie: user=CoachMiller
    Accept: text/html
A server can only set cookies for its own domain or a parent domain it belongs to, so vgc.yhscs.us can set cookies for yhscs.us but not for ymsrunning.com or .us. Cookies are also limited by path so a cookie set to yhscs.us/apcs/ also applies to yhscs.us/apcs/lessons, but not to yhscs.us.
Cookies can be set to expire at a specific time by using the expires attribute, as in the following example:
Set-Cookie: user=CoachMiller; expires=Wed, 21-Dec-2015 15:23:00 GMT
You can also set the cookie to expire after a certain amount of time (in seconds) has elapsed:
Set-Cookie: user=CoachMiller; Max-Age=3600
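The secure attribute restricts a cookie to encrypted HTTPS connections, and the httponly attribute hides it from client-side scripts such as JavaScript:
Set-Cookie: user=CoachMiller; pass=fakePass; secure; httponly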
Java 6 and later include java.net.CookieManager, a subclass of CookieHandler, but you must install it as the default handler before it is used:
    CookieManager manager = new CookieManager();
    CookieHandler.setDefault(manager);
After those two lines, Java will store any cookies sent by HTTP servers that you connect to with the URL class and will send the stored cookies back to those servers in future requests.
To get and put cookies locally you can retrieve the store where CookieManager saves its cookies:
CookieStore store = manager.getCookieStore();
You can control the cookies in the store using the following methods:
    public void add(URI uri, HttpCookie cookie)
    public List<HttpCookie> get(URI uri)
    public List<HttpCookie> getCookies()
    public List<URI> getURIs()
    public boolean remove(URI uri, HttpCookie cookie)
    public boolean removeAll()
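A short sketch of these store methods in use follows. It works on a private CookieManager without installing it as the default handler, so it has no effect on other connections; the class name CookieStoreDemo and the helper storeWithUser are invented for the example, and the URI and cookie value are taken from the earlier samples.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical demo class for illustration.
class CookieStoreDemo {
    // Creates a fresh in-memory store holding a single "user" cookie for the site.
    static CookieStore storeWithUser(URI site, String user) {
        CookieManager manager = new CookieManager();
        CookieStore store = manager.getCookieStore();
        HttpCookie cookie = new HttpCookie("user", user);
        cookie.setPath("/");
        store.add(site, cookie);
        return store;
    }

    public static void main(String[] args) {
        URI site = URI.create("http://yhscs.us/");
        CookieStore store = storeWithUser(site, "CoachMiller");

        // Look up the cookies associated with the site.
        for (HttpCookie cookie : store.get(site)) {
            System.out.println(cookie.getName() + "=" + cookie.getValue());
        }
        System.out.println("URIs with cookies: " + store.getURIs());

        // Remove the cookie again and confirm the store is empty.
        HttpCookie first = store.getCookies().get(0);
        store.remove(site, first);
        System.out.println("Cookies left: " + store.getCookies().size());
    }
}
```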
The HttpCookie class has some methods that are useful for inspecting cookies, although some are deprecated as a part of the defunct Cookie 2 specification.
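As a brief illustration, the static HttpCookie.parse method turns a Set-Cookie header line into cookie objects whose attributes can then be read with getters. The class CookieInspector, its two helper methods, and the sample header are assumptions made for this sketch; isHttpOnly requires Java 7 or later.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class CookieInspector {
    // Parses a Set-Cookie header line and returns the first cookie it contains.
    static HttpCookie first(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        return cookies.get(0);
    }

    // Formats the cookie the way a client sends it back in a request header.
    static String cookieHeader(HttpCookie cookie) {
        return "Cookie: " + cookie.getName() + "=" + cookie.getValue();
    }

    public static void main(String[] args) {
        HttpCookie cookie = first("Set-Cookie: user=CoachMiller; Max-Age=3600; Secure; HttpOnly");
        System.out.println("name:     " + cookie.getName());
        System.out.println("value:    " + cookie.getValue());
        System.out.println("max age:  " + cookie.getMaxAge() + " seconds");
        System.out.println("secure:   " + cookie.getSecure());
        System.out.println("httponly: " + cookie.isHttpOnly());
        System.out.println(cookieHeader(cookie));
    }
}
```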