Network Concepts
A network is a set of computers connected together for the purpose of sharing resources. A network can be as small and local as two computers connected in the same room, or as large and broad as the internet. Sending data across a network is much more complicated than sending other resources such as electricity or water. There is more to the data than the physical characteristics of the signals being sent back and forth. The signals have meaning and that meaning can be lost during transport if care isn't taken.
The different aspects of network communication are organized into multiple layers. Each layer only talks to the layers immediately above and immediately below it. There are different layer models depending on the needs of a particular network. The 4-layer model below is the TCP/IP model used for the internet.
Application Layer: This layer is where applications like Google Chrome or Call of Duty reside. The applications only talk to the transport layer. Some protocols at this level include HTTP for the web; SMTP, POP, and IMAP for email; and FTP for file transfer.
Transport Layer: This layer provides services to connect applications with the internet. A transport protocol is used in this layer to transport data. The Transmission Control Protocol (TCP) is the protocol that lent its name to the TCP/IP model and is used for transmissions that need an established connection. The User Datagram Protocol (UDP) is used instead for simple transmissions that do not need a connection. The transport layer talks to both the application layer and the internet layer. When TCP is used, this layer is also responsible for ensuring that data is delivered in the order it was sent and that nothing is lost or corrupted; UDP makes no such guarantees.
Internet Layer: This layer is a collection of methods and protocols used to transport packets of data across different networks. The protocols in this layer are defined by the Internet Protocol (IP) and this layer talks with the transport layer as well as the link layer. IP is actually two protocols: IPv4, which uses 32-bit addresses, and IPv6, which uses 128-bit addresses. IPv6 is still trying to gain ground on IPv4. According to Google's statistics, IPv6 was used by nearly 12% of their users in August 2016.
Link Layer: Sometimes called the network access layer or, loosely, the physical layer, this layer can only communicate with the internet layer. This layer moves the data across wires, cables, or wirelessly to its destination or back to the internet layer to return to the application. Some of the protocols available on this layer include PPP, Wi-Fi, and Ethernet.
In both IPv4 and IPv6, data is sent across the internet layer in packets known as datagrams. An IPv4 datagram has a header between 20 and 60 bytes and can carry up to 65,515 bytes of data, because the 16-bit total length field caps the whole datagram at 65,535 bytes. An IPv6 datagram has a fixed 40-byte header, optionally followed by extension headers, and a payload of up to 65,535 bytes; the jumbogram option raises that limit to just under four gigabytes. The datagram below is an example of an IPv4 datagram, and the video below also uses IPv4 datagrams as a basis.
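To make the header sizes concrete, here is a minimal Python sketch that builds a 20-byte IPv4 header by hand and reads its fields back. The function name `parse_ipv4_header`, the sample addresses, and the field values are illustrative assumptions; the sketch ignores header options and does not compute a real checksum.

```python
import socket
import struct

# Fixed part of an IPv4 header: 20 bytes in network (big-endian) byte order.
IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"


def parse_ipv4_header(raw):
    """Return selected fields from the first 20 bytes of an IPv4 datagram."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        IPV4_HEADER_FORMAT, raw[:20])
    # The low 4 bits give the header length in 32-bit words (5 to 15),
    # which is why a header is between 20 and 60 bytes long.
    header_length = (version_ihl & 0x0F) * 4
    return {
        "version": version_ihl >> 4,
        "header_length": header_length,
        "total_length": total_length,
        "data_length": total_length - header_length,
        "ttl": ttl,
        "protocol": protocol,  # 6 means TCP, 17 means UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hypothetical sample: version 4, 5-word header, 40 bytes total, TCP payload.
sample = struct.pack(
    IPV4_HEADER_FORMAT,
    (4 << 4) | 5, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("192.168.1.1"), socket.inet_aton("10.0.0.2"))

info = parse_ipv4_header(sample)
print(info)
```

Because the total length field is 16 bits wide, the largest value it can hold is 65,535, and subtracting the minimum 20-byte header gives the 65,515-byte data limit.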
The Internet Protocol was developed in cooperation with the military so it was designed to be robust and allow multiple routes between any two points on the network. IP also had to be open and platform-independent so different types of computers could all talk to each other. Since packets may take different routes to the same destination, they may not arrive in the same order that they were sent. TCP was layered on top of IP to allow each end of the connection the ability to acknowledge receipt of IP packets and request the retransmission of lost or corrupted packets. TCP has a lot of overhead, but is considered a reliable protocol.
UDP is an unreliable protocol that does not guarantee that packets arrive in the order they were sent, or even that they arrive at all. This would be a problem for applications like file transfer, but it is fine for video or audio transmissions where a few missing packets won't cause much of a problem. TCP would add too much overhead for such applications, because playback would have to wait for lost packets to be retransmitted.
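As a rough illustration of how little setup UDP needs, the Python sketch below sends a single datagram over the loopback interface without establishing a connection. The payload and the choice of an OS-assigned port are assumptions made for the example. Delivery over loopback is dependable in practice, but the receiver still uses a timeout because UDP itself promises nothing.

```python
import socket

# Receiver: bind to the loopback address and let the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)  # give up instead of waiting forever for a lost datagram
port = receiver.getsockname()[1]

# Sender: no connect() or handshake, just address the datagram and send it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame 1", ("127.0.0.1", port))

try:
    payload, source = receiver.recvfrom(1024)
except socket.timeout:
    payload = None  # UDP gives no notice when a datagram goes missing
finally:
    sender.close()
    receiver.close()

print(payload)
```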
Every data packet sent through TCP/IP carries IP addresses, and every computer on a network is identified by one. In IPv4 an address is a four-byte number written in dotted quad format, like 192.168.1.1. There are just over four billion possible IPv4 addresses, which is not enough, so the transition to IPv6 has begun. IPv6 addresses are written in eight blocks of four hexadecimal digits such as FEDC:0000:0000:0000:00DC:0000:7076:0010. A double colon in an IPv6 address stands for a run of one or more zero blocks and may appear only once per address, and leading zeroes do not have to be written, so the previous address could be rewritten as FEDC::DC:0:7076:10. IP addresses can change over time so you should never write code that relies upon a constant IP address. Many clients receive a new address, often from a DHCP server, every time they boot up. It is also possible, although rare, that an IP address can change while a program is running so you may want to check the IP address every time you need it.
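The shortening rules can be checked with Python's standard `ipaddress` module. This is only a sketch; the variable names are illustrative, and note that the module prints hexadecimal digits in lowercase.

```python
import ipaddress

addr = ipaddress.IPv6Address("FEDC:0000:0000:0000:00DC:0000:7076:0010")

short_form = addr.compressed  # zero run collapsed, leading zeroes dropped
long_form = addr.exploded     # all eight blocks written out in full

print(short_form)  # fedc::dc:0:7076:10
print(long_form)   # fedc:0000:0000:0000:00dc:0000:7076:0010

# An IPv4 address is just a 32-bit number shown in dotted quad form.
v4 = ipaddress.IPv4Address("192.168.1.1")
print(int(v4))     # 3232235777
print(2 ** 32)     # 4294967296 possible IPv4 addresses
```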
IPv4 addresses that begin with 10., 172.16. through 172.31., and 192.168. are reserved for private use: they are never assigned on the public internet, but can be used freely on internal networks. IPv4 addresses beginning with 127 are loopback addresses, most commonly 127.0.0.1, and always point back to the local computer. The hostname for this address is typically localhost, and in IPv6 the loopback address is 0:0:0:0:0:0:0:1, which can be shortened to ::1. The IPv4 address 255.255.255.255, in which every bit is set, is a broadcast address. Any packets sent to a broadcast address are received by all nodes on the local network, but not by anything outside the network.
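The special ranges can be expressed as a small classifier. The sketch below uses Python's `ipaddress` module; the `describe` function and its labels are hypothetical names chosen for this example, and it only covers the ranges discussed here.

```python
import ipaddress

# The three private IPv4 blocks: 10.x.x.x, 172.16.x.x-172.31.x.x, 192.168.x.x.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def describe(address):
    """Label an address as loopback, broadcast, private, or other."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:  # 127.x.x.x in IPv4, ::1 in IPv6
        return "loopback"
    if ip == LIMITED_BROADCAST:
        return "broadcast"
    if any(ip in block for block in PRIVATE_BLOCKS):
        return "private"
    return "other"


for sample in ["127.0.0.1", "::1", "10.1.2.3", "172.31.255.255",
               "172.32.0.1", "192.168.1.1", "255.255.255.255"]:
    print(sample, describe(sample))
```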
The Domain Name System (DNS) was developed to translate hostnames that are easier for humans to remember, such as "www.google.com" and "www.yhscs.us", into the IP addresses that computers use.
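A program usually asks the operating system's resolver to perform this translation. The sketch below uses `socket.getaddrinfo` from Python's standard library; the `resolve` helper is a hypothetical wrapper, and the result for any given hostname depends on the machine and network where the code runs.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses the resolver reports."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # the name could not be resolved
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in results})


print(resolve("localhost"))
```

Since addresses can change, calling a helper like this each time a connection is made is safer than storing the result.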
If computers only did one thing at a time then an address would be all you need. However, your computer will probably be processing multiple requests at a time so those requests are routed through different ports. Each computer with an IP address has 65,535 logical ports per transport layer protocol. Each port is identified by a number between 1 and 65,535 and can be allocated to a specific service. As an example, HTTP commonly uses port 80 so a web server listens on port 80 for any incoming connections. Below are some more common port assignments:
|Service||Port||Protocol||Purpose|
|echo||7||TCP/UDP||Echo is used to verify that two machines are able to connect to each other.|
|FTP data||20||TCP||FTP uses two ports. This port is used to transfer files.|
|FTP||21||TCP||This port is used for FTP commands such as put and get.|
|SSH||22||TCP||Used for encrypted logins.|
|SMTP||25||TCP||The Simple Mail Transfer Protocol is used to send email between two machines.|
|HTTP||80||TCP||The World Wide Web protocol.|
|POP3||110||TCP||Post Office Protocol version 3 transfers accumulated email to clients that only connect occasionally.|
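Ports are what let several services share one address. As a small sketch, the Python code below binds two TCP sockets to the same loopback address and asks the operating system to choose a free port for each by passing port 0; the variable names are illustrative. A real server would bind to its assigned port instead, such as 80 for HTTP, which usually requires elevated privileges.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first, \
        socket.socket(socket.AF_INET, socket.SOCK_STREAM) as second:
    # Port 0 means "let the operating system pick any free port".
    first.bind(("127.0.0.1", 0))
    second.bind(("127.0.0.1", 0))
    first_port = first.getsockname()[1]
    second_port = second.getsockname()[1]

# Same IP address, two different ports, so two independent endpoints.
print(first_port, second_port)
```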
The internet is the largest IP-based network in the world. It connects computers on all seven continents and is not governed by any single entity. Blocks of IPv4 addresses are assigned to internet service providers (ISPs) to avoid conflicting addresses. When an organization wants to set up an IP network connected to the internet, an ISP will assign it a block of addresses. Each block of addresses has a fixed prefix. For example, if the prefix is 244.233.80, then the local network can use addresses from 244.233.80.0 to 244.233.80.255. Since this prefix fixes the first 24 bits of the address, the block is called a /24. The smallest subnet in common use is a /30, which leaves only 2 bits, or a total of 4 IP addresses, for the organization. However, the lowest address in a block is used to specify the network itself and the highest address is a broadcast address, so there are always two fewer usable addresses. The map below shows the allocated IPv4 addresses as of 2006, but the unallocated spaces have filled in quite a bit since then.
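The address counts for a prefix can be worked out with Python's `ipaddress` module. This is a sketch using the example prefix from the text; the `summarize` helper is a hypothetical name.

```python
import ipaddress


def summarize(cidr):
    """Report the network, broadcast, total, and usable addresses of a block."""
    network = ipaddress.ip_network(cidr)
    return {
        "network": str(network.network_address),      # lowest address
        "broadcast": str(network.broadcast_address),  # highest address
        "total": network.num_addresses,               # 2 ** (32 - prefix length)
        "usable": len(list(network.hosts())),         # total minus the two above
    }


print(summarize("244.233.80.0/24"))
print(summarize("244.233.80.0/30"))
```

A /24 leaves 8 bits for hosts, so it holds 256 addresses and 254 usable ones; a /30 leaves 2 bits, so it holds 4 addresses and 2 usable ones.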
Because of the scarcity of IPv4 addresses, many networks use Network Address Translation (NAT) so that all of the hosts on a local network can share a single external address. In a NAT-based network, nodes only have local, non-routable addresses in the 10.x.x.x range, the 172.16.x.x to 172.31.x.x ranges, or the 192.168.x.x range. The video below goes into more detail on NAT.
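The bookkeeping behind NAT can be pictured as a lookup table. The toy Python model below is only an illustration of the idea, assuming the common port-translating form of NAT; the `ToyNat` class, its method names, the starting port 40000, and the external address 203.0.113.5 (taken from a range reserved for documentation) are all made up for the example, and no real packets are rewritten.

```python
import itertools


class ToyNat:
    """Toy model of a NAT table that maps local endpoints to external ports."""

    def __init__(self, external_ip, first_port=40000):
        self.external_ip = external_ip
        self._next_port = itertools.count(first_port)
        self._outbound = {}  # (local_ip, local_port) -> external_port
        self._inbound = {}   # external_port -> (local_ip, local_port)

    def translate_outbound(self, local_ip, local_port):
        """Return the (external_ip, external_port) seen by the outside world."""
        key = (local_ip, local_port)
        if key not in self._outbound:
            port = next(self._next_port)
            self._outbound[key] = port
            self._inbound[port] = key
        return (self.external_ip, self._outbound[key])

    def translate_inbound(self, external_port):
        """Return the local endpoint for a reply, or None if there is no mapping."""
        return self._inbound.get(external_port)


nat = ToyNat("203.0.113.5")
print(nat.translate_outbound("192.168.1.10", 5000))
print(nat.translate_outbound("192.168.1.11", 5000))
print(nat.translate_inbound(40001))
```

Two local machines using the same local port end up with different external ports, which is how replies find their way back to the right host.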
A firewall is hardware or software that sits between the internet and the local network and checks that the data coming in or going out is safe. The firewall is typically part of the router that connects the local network to the internet, but many modern operating systems have built-in firewalls to monitor data for just that machine.
Proxy servers can act as a go-between if a firewall prevents a host from making a direct connection to the internet. Check out the video below for an explanation about the difference between a firewall and a proxy.
Most network programming is based on a client/server model. Typically, a server sends data and a client receives data, but it is rare for one program to send or receive data exclusively. A better definition is that a client initiates a conversation while a server waits for clients to start conversations with it.
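The client/server roles can be sketched in a few lines of Python over the loopback interface. In this minimal example the server waits for one connection and replies with the uppercased message, while the client starts the conversation. The `serve_once` function, the message, and the use of a background thread are assumptions made to keep the example self-contained; a real server would loop and handle many clients.

```python
import socket
import threading


def serve_once(server_sock):
    """Wait for one client, read its message, and send back an uppercase copy."""
    conn, _addr = server_sock.accept()  # blocks until a client connects
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


# Server side: bind, listen, and wait in a background thread.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
server.listen(1)
port = server.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(server,), daemon=True)
thread.start()

# Client side: initiate the conversation.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

thread.join(timeout=5)
server.close()
print(reply)  # b'HELLO'
```

Both sides send and receive data here, which is why "who starts the conversation" is the more useful distinction.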
The two organizations that produce most of the standards relevant to network programming are the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C). The IETF is an informal group open to anyone interested. Its standards are based on working code and tend to follow rather than lead implementations. IETF standards include TCP/IP, MIME, and SMTP. The W3C, on the other hand, is controlled by dues-paying member organizations and offers individuals far fewer ways to take part. The W3C typically defines standards before implementation, and its standards include HTML, CSS, and XML. HTTP, although closely tied to the web, is published by the IETF as RFCs.
IETF standards are published as Requests for Comments (RFCs). An RFC may be made obsolete or replaced by a later RFC, but once published it is never changed. IETF working documents are called "internet drafts." Before a specification can become an IETF standard, it must exist and function in working implementations. A list of RFCs can be found on the IETF website.
The W3C has five levels of standards. A note is an unsolicited submission by a W3C member or anything by W3C staff that does not describe a full proposal. A note may or may not lead to a working group or recommendation. A working draft is a reflection of the current thinking of some members of a working group and should eventually lead to a proposal. A candidate recommendation indicates that a working group has reached a consensus on all major issues and is ready for comments and implementations. A proposed recommendation is mostly complete and will only undergo minor changes related to the document rather than the technology behind the recommendation. Finally, a recommendation is the highest level of W3C standards, although the W3C does not call it a "standard" to avoid running afoul of antitrust laws.