Yorkville High School Computer Science Department

Network Programming :: Lessons :: The HTTP Protocol

HTTP Requests

The Hypertext Transfer Protocol (HTTP) is the standard protocol for communication between web browsers and web servers and defines how data is transferred and how the server and client talk to each other. For each request from the client to the server there are four steps:

  1. The client opens a TCP connection to the server on port 80 (by default).
  2. The client sends a message to the server requesting the resource at a specified path. The request includes a header followed by a blank line and, for some requests, data for the request.
  3. The server sends a response to the client. The response begins with a response code, followed by a header with metadata, a blank line, and the requested data or an error message.
  4. The server closes the connection.
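
The four steps above can be sketched with a plain socket. This is a minimal illustration, not a full HTTP client: the class name SimpleHttpExchange and its two methods are made up for this example, and it sends an HTTP 1.0 style request so the server closes the connection when it is done.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class SimpleHttpExchange {

    // Builds a minimal request: request line, Host header, then the blank line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // Performs one request/response exchange and returns the raw response text.
    static String fetch(String host, int port, String path) throws IOException {
        // Step 1: open a TCP connection to the server.
        try (Socket socket = new Socket(host, port)) {
            // Step 2: send the request.
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Step 3: read the response until the server closes its side (step 4).
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return new String(response.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires network access; yhscs.us is the host used in this lesson.
        System.out.println(fetch("yhscs.us", 80, "/index.php"));
    }
}
```

The try-with-resources block closes the client's end of the socket even if reading fails partway through.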

The above steps are the basic HTTP 1.0 procedure. In later versions of HTTP, starting with HTTP 1.1, multiple requests and responses can be sent in series over a single TCP connection. In other words, steps 2 and 3 above can repeat. A typical client request looks like the following:

GET /index.php HTTP/1.1
Host: yhscs.us
Connection: keep-alive
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

You can view the request information of a web page in Chrome by using the Inspect function and going to the Network section.

The first line above is the request line, which includes a method, a path to the resource, and the version of HTTP. The GET method will be discussed later, but it basically asks the server to return a representation of the resource at the path /index.php. In HTTP 1.0 the request line is the only line required for a request; HTTP 1.1 also requires the Host header.
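
As a rough sketch of how a server might pull the three parts out of a request line, the following splits the line on spaces. The RequestLine class is invented for this example, and a real server would be more forgiving about malformed input.

```java
// Hypothetical value class for illustration only.
class RequestLine {
    final String method;
    final String path;
    final String version;

    RequestLine(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    // Splits a line such as "GET /index.php HTTP/1.1" into its three parts.
    static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.php HTTP/1.1");
        System.out.println(r.method + " | " + r.path + " | " + r.version);
    }
}
```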

Each line after the request line takes the form "Keyword: Value", and both sides should be ASCII. Each line in the header is terminated by a carriage-return linefeed pair (\r\n).
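
The "Keyword: Value" format is simple enough to parse by hand. The sketch below is one possible approach, with a made-up HeaderParser class; it keeps keywords exactly as written, although HTTP treats them as case-insensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class HeaderParser {

    // Parses the header lines that follow the request line into a map.
    static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : headerBlock.split("\r\n")) {
            if (line.isEmpty()) {
                break; // a blank line marks the end of the header
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip anything that is not "Keyword: Value"
            }
            String keyword = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(keyword, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String block = "Host: yhscs.us\r\nConnection: keep-alive\r\n\r\n";
        System.out.println(parse(block));
    }
}
```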

The first keyword in the above example is the host, which allows web servers to differentiate between different named hosts at the same IP address.

In HTTP 1.1 and later the connection keyword allows you to specify a keep-alive connection that will stay connected while multiple resources are accessed.

The user-agent keyword lets the server know what browser is trying to access the resource, which may mean the server sends a response optimized for that browser.

The accept keyword tells the server the types of data the client can handle. The client in the example above can handle four MIME types: text/html, application/xhtml+xml, application/xml, and image/webp. The final entry, */*, is a wildcard meaning the client will accept any other type, and the q values rank the client's preferences on a scale from 0 to 1. A MIME type is specified at two levels: a type and a subtype. The type shows generally what kind of data is contained, such as an image or text. The subtype identifies the specific format, such as GIF image, JPEG image, or WEBP image. There are eight top-level MIME types: text, image, audio, video, application, message, multipart, and model.
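
To make the type/subtype split concrete, here is a small sketch that breaks an Accept value into its MIME types. The AcceptHeader class and its method names are assumptions made for this example, and it simply drops the q parameters instead of sorting by them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class AcceptHeader {

    // Returns the MIME types in an Accept value, dropping parameters such as ";q=0.9".
    static List<String> mimeTypes(String acceptValue) {
        List<String> types = new ArrayList<>();
        for (String item : acceptValue.split(",")) {
            String type = item.split(";")[0].trim();
            if (!type.isEmpty()) {
                types.add(type);
            }
        }
        return types;
    }

    // The part before the slash, such as "image".
    static String topLevelType(String mime) {
        return mime.substring(0, mime.indexOf('/'));
    }

    // The part after the slash, such as "webp".
    static String subtype(String mime) {
        return mime.substring(mime.indexOf('/') + 1);
    }

    public static void main(String[] args) {
        String accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
        for (String mime : mimeTypes(accept)) {
            System.out.println(topLevelType(mime) + " / " + subtype(mime));
        }
    }
}
```

Note that the wildcard */* comes back as a fifth entry, since it is written in the same type/subtype form as the others.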

HTTP Response

Once the server sees a blank line (\r\n\r\n), it knows the request header is complete. The response begins with a status line followed by header information like so:

HTTP/1.1 200 OK
Date: Fri, 16 Sep 2016 12:33:08 GMT
Server: Apache
Cache-Control: max-age=0
Expires: Fri, 16 Sep 2016 12:33:08 GMT
X-UA-Compatible: IE=edge
X-Content-Type-Options: nosniff
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 3192
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

The first line indicates the protocol as well as a response code that indicates the status of the response. 200 OK is the most common response code. You can go to Wikipedia to see a list of all response codes, including some unofficial codes. It is important to know that codes from 100 to 199 always indicate an informational response, codes 200 to 299 always indicate success, 300 to 399 always indicate redirection, 400 to 499 always indicate a client error, and 500 to 599 always indicate a server error. Some of the most important response codes to know are the following:
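
The range rule can be checked in code with integer division on the response code. The sketch below is only an illustration: the StatusCodes class is made up, and the four sample codes (200 OK, 301 Moved Permanently, 404 Not Found, 500 Internal Server Error) are widely used codes picked for this example, not necessarily the ones this lesson originally listed.

```java
// Hypothetical helper class for illustration only.
class StatusCodes {

    // Maps a response code to the category given by its hundreds digit.
    static String category(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    // Extracts the numeric code from a status line such as "HTTP/1.1 200 OK".
    static int codeOf(String statusLine) {
        return Integer.parseInt(statusLine.split(" ")[1]);
    }

    public static void main(String[] args) {
        String[] statusLines = {
            "HTTP/1.1 200 OK",
            "HTTP/1.1 301 Moved Permanently",
            "HTTP/1.1 404 Not Found",
            "HTTP/1.1 500 Internal Server Error"
        };
        for (String line : statusLines) {
            int code = codeOf(line);
            System.out.println(code + " -> " + category(code));
        }
    }
}
```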

Keep-Alive

HTTP 1.0 opens a new connection for every request, and setting up that connection can take more time than transmitting the data itself. Encrypted HTTPS connections that use SSL or TLS can take even more time, since setting up a secure connection involves more steps than setting up a regular socket.

In HTTP 1.1 and later, the server doesn't have to close the socket after it sends a response. The server can leave the socket open and wait for a new request from the client on the same socket so multiple requests and responses can be sent in a series over a single TCP connection. A client indicates it is willing to do this by sending the Connection: Keep-Alive header.

The URL class in Java supports Keep-Alive by default. You can control Java's use of HTTP Keep-Alive with the following properties:
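
Two properties documented among Java's networking system properties are http.keepAlive (true by default) and http.maxConnections (5 by default, the number of idle connections kept open per destination). The sketch below sets just these two and may not cover every property this lesson had in mind; the KeepAliveSettings class is made up for the example.

```java
// Hypothetical helper class for illustration only.
class KeepAliveSettings {

    // Sets the two system properties. They should be set before the program
    // opens its first HTTP connection, since they may only be read once.
    static void configure(boolean keepAlive, int maxConnections) {
        System.setProperty("http.keepAlive", Boolean.toString(keepAlive));
        System.setProperty("http.maxConnections", Integer.toString(maxConnections));
    }

    public static void main(String[] args) {
        configure(true, 5);
        System.out.println("http.keepAlive = " + System.getProperty("http.keepAlive"));
        System.out.println("http.maxConnections = " + System.getProperty("http.maxConnections"));
    }
}
```

The same values can be passed on the command line instead, for example with -Dhttp.keepAlive=false.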

HTTP Methods

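Communication with an HTTP server follows a request-response pattern. Each HTTP request has two or three parts: a start line containing the method and the path to the resource, a header of name-value fields, and, for POST and PUT only, a body containing a representation of the resource.

There are four main HTTP methods that identify the operations that can be performed: GET, PUT, DELETE, and POST.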

The GET method retrieves a representation of a resource and can be repeated without concern if it fails. Its output is often cached, and GET requests can be bookmarked and prefetched without concern.

The PUT method uploads a representation of a resource at a known URL. It can also be repeated without concern if it fails since putting the same document on the same server twice in a row leaves the server in the same state as only putting it once.

The DELETE method removes a resource from a specified URL. If you aren't sure a delete request succeeded you can simply send the request again.

The POST method is the most general of the four methods. It uploads a representation of a resource to the server, but it does not specify what to do with that resource. The server may move the resource to a different URL or use the data in the resource to update a database. POST is intended for actions that commit to something, while GET is intended for noncommittal actions such as browsing a static web page. Adding an item to an online shopping cart should use the GET method since it doesn't commit to a purchase. Purchasing the item, however, should use POST since it commits to the purchase.
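
One way to summarize the four descriptions above is by whether a request can safely be sent again when its outcome is unknown. The enum below is a sketch of that idea; the names HttpMethod, isRepeatable, and mayRetry are invented for this example.

```java
import java.util.Locale;

// Hypothetical enum for illustration only.
enum HttpMethod {
    GET(true),
    PUT(true),
    DELETE(true),
    POST(false);

    private final boolean repeatable;

    HttpMethod(boolean repeatable) {
        this.repeatable = repeatable;
    }

    // True when sending the same request twice leaves the server in the
    // same state as sending it once.
    boolean isRepeatable() {
        return repeatable;
    }

    // Decides whether a client may resend a request whose outcome is unknown.
    static boolean mayRetry(String methodName) {
        return valueOf(methodName.toUpperCase(Locale.ROOT)).isRepeatable();
    }

    public static void main(String[] args) {
        for (HttpMethod m : values()) {
            System.out.println(m + " repeatable: " + m.isRepeatable());
        }
    }
}
```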

The GET method retrieves a representation of the resource identified by a URL. The URL class in Java uses the GET method to communicate with HTTP servers. The path and query string of a GET request let the server know what to do.
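
The default can be confirmed without touching the network, because opening a connection object does not send anything until it is actually used. In this sketch the UrlGetExample class and the lesson=http query string are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class for illustration only.
class UrlGetExample {

    // Returns the method an HttpURLConnection will use if none is set explicitly.
    static String defaultMethod(String address) {
        try {
            URL url = URI.create(address).toURL();
            // openConnection() only creates the connection object; nothing is sent yet.
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            return connection.getRequestMethod();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The query string here is a made-up example of passing data in a GET request.
        System.out.println(defaultMethod("http://yhscs.us/index.php?lesson=http"));
    }
}
```

A different method can be chosen with setRequestMethod before the connection is used.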

POST and PUT are a bit more complex. The representation of the requested resource is sent in the body of the request after the header. The following four items are sent in order:

  1. A start line that includes the method, path and query string, and HTTP version
  2. An HTTP header
  3. A blank line (\r\n\r\n)
  4. The body

The following POST request sends form data to a server:

POST /alumniContact.php HTTP/1.1
Date: Mon, 18 Apr 2016 21:47:02 GMT
Host: yhscs.us
Content-Type: application/x-www-form-urlencoded
Content-Length: 82

name=Derek+Miller&email=demiller@y115.org&message=Hey%2C+Edric.&recipient=Edric+Yu
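
The four items can be assembled in code as well. The following sketch encodes form fields and computes the content length from the encoded body instead of typing it by hand; the FormPost class and its methods are assumptions made for this example. Java's URLEncoder also encodes some characters, such as @, that appear unencoded in the request above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class FormPost {

    // Encodes name/value pairs in the form format: name=value&name=value
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Assembles the start line, header, blank line, and body in that order.
    static String buildPost(String host, String path, String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Derek Miller");
        fields.put("message", "Hey, Edric.");
        System.out.println(buildPost("yhscs.us", "/alumniContact.php", encodeForm(fields)));
    }
}
```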

Cookies

Cookies are small strings of text used to store persistent information. Cookies are passed from server to client and back again through HTTP headers and are used for login credentials, shopping cart contents, user settings, and more.

To set a cookie in a browser, the server includes a Set-Cookie header line. The following example sets a username that is used to identify the current user on the website.

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: user=CoachMiller

If a browser makes a second request to the same server, it will send the cookie back in a Cookie line in the HTTP request header:

GET / HTTP/1.1
Host: ymsrunning.com
Cookie: user=CoachMiller
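
The round trip from Set-Cookie to Cookie can be sketched with Java's HttpCookie class, which parses a Set-Cookie line into cookie objects. The CookieEcho class below is made up for this example and ignores domain, path, and expiration rules.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical helper class for illustration only.
class CookieEcho {

    // Turns a Set-Cookie line received from a server into the Cookie line
    // a client would send back on its next request.
    static String toCookieHeader(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        StringBuilder header = new StringBuilder("Cookie: ");
        for (int i = 0; i < cookies.size(); i++) {
            if (i > 0) {
                header.append("; ");
            }
            HttpCookie cookie = cookies.get(i);
            header.append(cookie.getName()).append('=').append(cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCookieHeader("Set-Cookie: user=CoachMiller"));
    }
}
```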
Accept: text/html

A server can only set cookies for its own domain or for a parent domain it belongs to, so vgc.yhscs.us can set a cookie for vgc.yhscs.us or yhscs.us, but not for ymsrunning.com or for the top-level domain .us. Cookies are also limited by path, so a cookie set for yhscs.us/apcs/ also applies to yhscs.us/apcs/lessons, but not to yhscs.us.
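
The path rule can be written as a short prefix check. This is a simplified sketch in the spirit of the cookie specification's path matching, not a complete implementation; the CookiePath class is made up for the example and it does not look at domains at all.

```java
// Hypothetical helper class for illustration only.
class CookiePath {

    // Returns true when a cookie scoped to cookiePath should be sent for requestPath.
    static boolean pathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (!requestPath.startsWith(cookiePath)) {
            return false;
        }
        // A prefix only counts on a "/" boundary, so "/apcs" does not match "/apcsx".
        return cookiePath.endsWith("/") || requestPath.charAt(cookiePath.length()) == '/';
    }

    public static void main(String[] args) {
        System.out.println(pathMatches("/apcs/", "/apcs/lessons"));
        System.out.println(pathMatches("/apcs/", "/"));
    }
}
```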

Cookies can be set to expire at a specific time by using the expires attribute, as in the following example:

Set-Cookie: user=CoachMiller; expires=Mon, 21-Dec-2015 15:23:00 GMT

You can also set the cookie to expire after a certain amount of time (in seconds) has elapsed:

Set-Cookie: user=CoachMiller; Max-Age=3600

Sensitive data should always be sent over a secure channel such as HTTPS. You can set the Secure attribute so that browsers refuse to send the cookie over an unencrypted connection. You can also set the HttpOnly attribute, which tells the browser to return the cookie to the server only in HTTP or HTTPS requests and to hide it from JavaScript. Each Set-Cookie line sets a single name=value pair, with any attributes following it:

Set-Cookie: user=CoachMiller; Secure; HttpOnly
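
Java's HttpCookie class can read these attributes back out of a Set-Cookie line. The sketch below uses a made-up CookieAttributes class and a sample header that combines Max-Age, Secure, and HttpOnly on one cookie.

```java
import java.net.HttpCookie;

// Hypothetical helper class for illustration only.
class CookieAttributes {

    // Parses a Set-Cookie line that carries a single cookie.
    static HttpCookie parseOne(String setCookieHeader) {
        return HttpCookie.parse(setCookieHeader).get(0);
    }

    public static void main(String[] args) {
        HttpCookie cookie = parseOne("Set-Cookie: user=CoachMiller; Max-Age=3600; Secure; HttpOnly");
        System.out.println(cookie.getName() + "=" + cookie.getValue());
        System.out.println("max age in seconds: " + cookie.getMaxAge());
        System.out.println("secure: " + cookie.getSecure());
        System.out.println("http only: " + cookie.isHttpOnly());
    }
}
```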

Java 6 and later include java.net.CookieManager, a concrete subclass of CookieHandler, but cookie handling is off by default and must be enabled before use:

CookieManager manager = new CookieManager();
CookieHandler.setDefault(manager);

After those two lines, Java will store any cookies sent by HTTP servers that you connect to with the URL class and will send the stored cookies back to those servers in future requests.

To get and put cookies locally you can retrieve the store where CookieManager saves its cookies:

CookieStore store = manager.getCookieStore();

You can control the cookies in the store using the following methods:

public void add(URI uri, HttpCookie cookie)
public List<HttpCookie> get(URI uri)
public List<HttpCookie> getCookies()
public List<URI> getURIs()
public boolean remove(URI uri, HttpCookie cookie)
public boolean removeAll()
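
A few of these methods can be tried without making any HTTP requests by adding a cookie to the store by hand. This is a small sketch; the CookieStoreDemo class and its storeWith method are invented for the example, and the manager here is not installed as the default handler.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical helper class for illustration only.
class CookieStoreDemo {

    // Creates a store holding one cookie associated with the given site.
    static CookieStore storeWith(String site, String name, String value) {
        CookieManager manager = new CookieManager();
        CookieStore store = manager.getCookieStore();
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setPath("/");
        store.add(URI.create(site), cookie);
        return store;
    }

    public static void main(String[] args) {
        CookieStore store = storeWith("http://yhscs.us/", "user", "CoachMiller");
        for (HttpCookie cookie : store.getCookies()) {
            System.out.println(cookie.getName() + " = " + cookie.getValue());
        }
        System.out.println("URIs: " + store.getURIs());
        store.removeAll();
        System.out.println("cookies left: " + store.getCookies().size());
    }
}
```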

The HttpCookie class has several methods that are useful for inspecting cookies, although some of them apply only to the obsolete Cookie2 specification.
