Thursday, October 25, 2012

Overview of HTTP

If you have been using the internet for a while, you have probably typed into your browser something that starts with "http:" and ends with ".html", hit "Go", watched your modem lights flicker on and off, and a couple of seconds later found yourself magically looking at a page of today's news or a page of pictures. Let's try to take some of the magic away ...

I thought about calling this article a "Geek's view of HTTP". I am not going into the nitty-gritty; rather, I will present a sketch of how a geek might see something like HTTP, how they can think something like HTTP is simple, and why they are not trying to make things complicated just for the sake of it!

--* Layers *--

The background to the sketch is that computers are full of layers. Programmers and designers think in layers, because organizing in layers makes it easier to build things.

You don't need to know what or where the layers are in your computer, or where one starts and another stops, but it is helpful to remember that they are everywhere because that is how geeks organize things.

--* Specifications and Protocols *--

Specifications and protocols are the most likely cause of all your computer frustrations ... but they are a necessary evil. A wise man once said that any specification longer than one line will have ambiguities and be a source of problems. There are lots of specifications and protocols in everyday life. An example of a protocol is that when you are driving and see a red light, you slow down and stop until it turns green. The specification of this protocol is the rule that is (probably) written down in the road-rule book. But we don't need to read the road rules; it is just common sense to stop for a red light. Of course, we all know the words "computers" and "common sense" don't belong in the same sentence, and that is why there are so many computer specifications and why they are generally so long and detailed.

A common sentiment is "you really need to spell things out for a computer". I think this is a bit misleading. It isn't the computer that needs things spelt out; it is the programmers and designers building one layer who need to spell things out for the other programmers building the layers on top of it.

If one programmer ignores the rules of a specification, or doesn't know there is a rule to follow, the result can be the same as if a driver runs a red light ... a crash.

--* Connection *--

I have finished with the background of the sketch, and now I am going to draw a picture of a couple of boats in a bay, each with a two-way radio. I grew up around boats and always liked listening in to the chatter on the two-way radios. The following conversations are from my memories of growing up, but the ideas are probably the same for CB and other forms of radio.

The first boat we will call "Rock n Roll", the second "Jazz". There was only one channel, and I noticed that there was a protocol that everyone seemed to follow (except on Sunday afternoons) to keep things orderly. I don't know if this protocol was written down anywhere as a set of specification rules or was just common sense. If Rock n Roll wanted to call Jazz, they would wait until they heard the current conversation end with an "Over and Out". They would wait a couple of seconds and then say something like "Jazz, Jazz, this is Rock n Roll, do you read me, over". If someone on Jazz was listening, they would say "Rock n Roll, this is Jazz, go ahead, over". Bingo! They have a connection.

The same kind of thing happens when you type an address into your web browser and hit "Go". Your computer is just moving a bunch of numbers to the modem. To keep things simple, let's ignore that the modem is converting those numbers into sounds, and pretend that it is those numbers that are traveling out along your phone line. The reason sending numbers out along your phone line can work is that at the other end of the line is your ISP's computer. This computer, and all the other computers on the internet, have layers written by programmers and designed by computer architects that follow strict and detailed specifications of what to do with the numbers coming out of your computer.

This sounds complex (and the details are!), but it is just the same as the person on Rock n Roll knowing to wait until they heard "over and out" before calling Jazz, and saying "over" at the end of each sentence. If no one followed these protocols the channel would have been chaos, everyone trying to talk over everyone else. In the same way, if your computer and the other computers on the internet did not follow the specifications, the fact that your computer sends out some numbers over your phone line would be as useless as it sounds in the first place! ... but, if the rules are followed, it works.

I am not going to go into the actual details of these specifications, but you have probably seen the acronyms: TCP/IP and DNS (and a heap more!).

--* HTTP *--

Let's say that someone on Rock n Roll knows there is a football almanac onboard Jazz which will confirm the score of a particular game and settle an argument onboard Rock n Roll. When Rock n Roll hears Jazz say "this is Jazz, go ahead, over" they know they have a connection to Jazz and can start a conversation. Rock n Roll might say "Jazz, can you get me the score of the 1987 Superbowl, over", and Jazz comes back with the answer, finishing with an "over". Rock n Roll might ask for another score, or might ask for the list of players, or might just say "thanks, over", at which point Jazz would say, "See ya, over and out".

--* *--

That is really what HTTP is doing. The connection has been made at separate, lower layers, which handle the numbers traveling out of your computer and move them to the computer you are connecting to. HTTP is a fairly simple specification that allows one computer to ask another for some information (by naming it), and for that information to be returned. It doesn't say anything about where that information comes from; as far as the HTTP specification is concerned, somebody could be sitting at the other computer typing in the response. Usually, however, the name that is asked for is the name of a file, which is a bunch of numbers on the other computer's hard drive. Those numbers get moved from the hard drive into memory, then into the modem, and back to your computer.
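To make the asking and answering a little more concrete, here is a rough Python sketch of what the two messages look like as plain text. The helper names build_request and parse_response and the host name jazz.example are made up for illustration; the only thing taken from HTTP/1.1 itself is the general shape of a message: a first line, some header lines, a blank line (the "over"), and then the information.

```python
def build_request(name, host="jazz.example"):
    """Return the text a client sends to ask for the thing called `name`.

    `host` is a made-up name for the computer being asked.
    """
    lines = [
        f"GET {name} HTTP/1.1",  # what to do, the name of the information, the version
        f"Host: {host}",         # which computer the question is meant for
        "",                      # a blank line is the "over"
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split the text of a response into (status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)        # e.g. ["HTTP/1.1", "200", "OK"]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return code, reason, headers, body
```

Calling build_request("/SuperbowlScore1987") gives the question Rock n Roll would put on the channel, and parse_response pulls the answer apart again. Nothing in either function cares how the text actually gets from one boat to the other.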

That's it! That is the essence of HTTP.

The point is, to see why a geek can think something like HTTP is simple, you need to think in layers like a geek. Thinking in layers is not some kind of Zen-like discipline for them; they probably do it without even being aware of it, as that is what their tools and languages encourage. If a programmer were writing an HTTP program, they might write something like:

LowerLayer connectTo: "Jazz".

LowerLayer send: "GET SuperbowlScore1987".

LowerLayer out.

The programmer who writes this is not thinking about the details of how the connection is established or how the message is sent. They may have no idea! When they are working with HTTP they just assume the lower layer works. If they, or you, do want to understand the lower layer, then put HTTP out of your mind and read up on the TCP/IP and DNS layers and specifications (have fun, and have a good supply of coffee ready).

Similarly, they are not trying to understand how the information that is received is displayed so nicely in your browser. That is a higher layer and yet another specification (HTML).
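Going back to the three lines of pseudocode, here is roughly how they might look in Python, with a socket playing the part of LowerLayer. This is only a sketch under some loud assumptions: "Jazz" is a little stand-in that runs on your own machine, the names ask_jazz, jazz_answers_once and ALMANAC are made up, and the almanac entry is a placeholder rather than a real score.

```python
import socket
import threading

# Placeholder almanac: the name and the answer are invented for illustration.
ALMANAC = {"/SuperbowlScore1987": "score as printed in the almanac"}


def jazz_answers_once(listener):
    """Play the part of Jazz: accept one caller, answer one request, hang up."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # a blank line ends the request
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        request_line = request.decode("ascii").split("\r\n")[0]
        path = request_line.split(" ")[1]
        status = "200 OK" if path in ALMANAC else "404 Not Found"
        body = ALMANAC.get(path, "not in the almanac").encode("utf-8")
        head = (
            f"HTTP/1.1 {status}\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        )
        conn.sendall(head.encode("ascii") + body)


def ask_jazz(path):
    """Ask the stand-in Jazz for `path`; return (status line, answer text)."""
    # Port 0 lets the operating system pick any free local port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        threading.Thread(
            target=jazz_answers_once, args=(listener,), daemon=True
        ).start()
        port = listener.getsockname()[1]

        # LowerLayer connectTo: "Jazz".
        lower_layer = socket.create_connection(("127.0.0.1", port), timeout=5)
        # LowerLayer send: "GET SuperbowlScore1987".
        message = f"GET {path} HTTP/1.1\r\nHost: jazz\r\n\r\n"
        lower_layer.sendall(message.encode("ascii"))
        reply = b""
        while True:  # keep listening until Jazz hangs up
            chunk = lower_layer.recv(1024)
            if not chunk:
                break
            reply += chunk
        # LowerLayer out.
        lower_layer.close()

    head, _, body = reply.partition(b"\r\n\r\n")
    status_line = head.decode("ascii").split("\r\n")[0]
    return status_line, body.decode("utf-8")
```

Notice that ask_jazz never says how the bytes get across. It connects, sends, listens, and hangs up, and trusts the socket underneath to do the rest, which is the layered habit of mind described above.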

I hope you enjoyed reading this article; it has taken an unusual perspective on HTTP! If you want to read up on the details, there are a number of good articles on the web, as well as the HTTP specification itself.




