Protocols vs Ports

Why ports 80 and 443 don't necessarily do what we expect.

When people are running their own server (like SimpleHelp) they inevitably need to understand ports in some sense. At a minimum they need to understand that:

Ports exist
They are a number
Servers run on them

Most of the time that's really all that is necessary. You launch the server, it turns up on a port and you use that port when you are connecting to it.

There are however some caveats, with complications arising around ports 80 and 443, and around HTTP and HTTPS. Some common points of confusion are:

If it’s running on port 80 and it’s serving web pages, I can run it forwarded from another web server?
If it’s running on port 443, why can’t I access it over HTTPS?
If it’s going to do SSL/HTTPS then I need to run it on port 443?
It’s running HTTP, but where do I set the HTTPS port?

All of these are perfectly reasonable assumptions. They stem from the fact that HTTP and HTTPS are well-known protocols that run on well-known ports: HTTP is synonymous with 80 and HTTPS is equally synonymous with 443. It’s not very often that you come into contact with any web page that isn’t running on port 80 or any HTTPS website that isn’t running on port 443.

There is a very good reason why you rarely come into contact with websites running on non-standard ports: if a site used one, your browser wouldn't know where to find it unless the port was spelled out in the address.

If you already know all about TCP, UDP, HTTP, ports and protocols then you can skip to the end section (‘How we make it simpler, and more confusing’) to find out why this matters to SimpleHelp and what we’re trying to do about it. If you don’t, I’ve tried to pick out the crucial bits and condense them here to fill in the blanks.

IPs and Ports

If you’re not already clear on it all then this is likely all getting a bit confusing, so I’ll backtrack a bit.

Ports exist. HTTP exists. HTTP runs on port 80. Simple… but not entirely true.

Ports exist and are part of TCP and UDP. TCP and UDP are layered on top of IP (Internet Protocol) – a medium for sending messages between computers. That means that when your computer needs to speak to another computer it can (amongst other methods) either open a TCP connection or start sending UDP packets. In both of these cases it needs two things:

an IP address, and
a port.

The IP address tells TCP and UDP where to send the packets. All the intermediaries in the network layer, like routers, need that IP address to know where to send your data. A good analogy here is a phone number. Knowing the IP address is like knowing somebody else's phone number – you can dial it and start talking.

But if computers had only one phone line they would only be able to talk to one computer at a time. That wouldn't be much use and here is where ports come to the rescue. Along with your IP address you also always need to specify a port.

This is like dialling the phone number for a company but also having to dial an extension. You don't just get to speak to one person now, you can speak to hundreds or thousands, even though they are all at the same place. Applications running on a computer can listen for connections or messages coming in on any port from 1 to 65535 or they can make outgoing connections or send data to other computers on any port from 1 to 65535.
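To make the extension analogy concrete, here is a minimal Python sketch. It assumes nothing about SimpleHelp itself: it opens two listeners on the loopback address 127.0.0.1 and lets the operating system pick two free ports, then connects to one of them. The helper name listen_on_loopback is made up for this illustration.

```python
import socket

def listen_on_loopback():
    """Open a TCP listener on 127.0.0.1 and let the OS choose a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # asking for port 0 means "any free port"
    server.listen()
    return server

# Two "extensions" behind the same "phone number".
first = listen_on_loopback()
second = listen_on_loopback()
first_port = first.getsockname()[1]
second_port = second.getsockname()[1]

# A client always needs both pieces: the IP address and the port.
client = socket.create_connection(("127.0.0.1", first_port))
conn, _peer = first.accept()
client.sendall(b"hello")
received = conn.recv(5)

for sock in (client, conn, first, second):
    sock.close()

print(first_port, second_port, received)
```

Both listeners share one IP address, and the port alone decides which of them receives the connection.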

As you’re probably aware, to make life simpler for humans, there is a service called DNS where, if you know the DNS server’s IP address (like its phone number), you can ask for a particular website by name (e.g. www.simple-help.com) and it will give you the IP address so you can talk directly to it. Now we can enter the website name into the browser instead of an IP address. DNS though doesn't tell you the port, and this is where protocols and default ports come in.
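A small Python sketch of that lookup, using the standard library's resolver. The helper name resolve_ipv4 is made up for this illustration, and the example resolves "localhost" so that it runs without an internet connection.

```python
import socket

def resolve_ipv4(hostname):
    """Return the IPv4 addresses the system resolver gives for a hostname."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result ends with a (address, port) pair. We passed no port,
    # so the resolver has nothing to say about it: names map to addresses only.
    return sorted({sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in results})

addresses = resolve_ipv4("localhost")
print(addresses)
```

On a machine with internet access you could pass "www.simple-help.com" instead. Either way the answer is a list of addresses, and the port still has to come from somewhere else.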

Protocols

When computers communicate, they send each other data – blocks of ones and zeroes that both sides know how to turn into something else – numbers, text, pictures, etc. They know how to do this because both sides follow agreed-upon standards. ASCII, for example, defines how bytes map to letters.

Computers are capable of transmitting text and numbers, but need a way to know how they should communicate so that they don't get into a mess. This is where protocols and HTTP come in. HTTP is in reality pretty simple. James Marshall’s page HTTP Made Really Easy is a great in-depth explanation, but essentially it goes like this.

Your browser says "Get me the root page or folder for www.simple-help.com, I am using HTTP version 1.1", like this:

GET / HTTP/1.1
Host: www.simple-help.com

The web server reads that and responds with “OK (code 200 means OK), it’s a web page, it’s 1020 bytes long, here it is”:

HTTP/1.1 200 OK
Content-type: text/html
Content-length: 1020
<html>...(1020 bytes of HTML)

Now your browser can speak to the web server and communicate with it, asking for pages, showing them to you, asking for other pages when you click on links and hey presto – the web is here.
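The exchange above can be sketched in a few lines of Python. This is only an illustration of the message format, not a real browser or server: the function names and the sample response are made up, and no network is involved. One detail that is easy to miss is that, on the wire, each header line ends with CR LF and a blank line separates the headers from the body.

```python
def build_get_request(host, path="/"):
    """Build the bytes a browser would send to ask for a page."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

request = build_get_request("www.simple-help.com")

# A made-up response in the same shape as the one shown above.
page = b"<html>hello</html>"
sample_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: " + str(len(page)).encode("ascii") + b"\r\n"
    b"\r\n" + page
)
code, reason, headers, body = parse_response(sample_response)
print(request)
print(code, reason, headers, body)
```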

But there is one last thing missing. We know the IP address of the computer because DNS told us where to find it, we know how to talk to the web server to get pages and images, but we don't know its extension – we don't know what port it is going to be on at the remote computer. Except we do, because the HTTP specification specifies a default port: port 80.

Disappearing Ports - 80 and 443

This is really the last piece in the puzzle and if we think back to the original issue it becomes clear why you hardly ever see websites on non-standard ports. DNS can't tell you what port a web server is on, only the IP address, so your browser always has to assume that the web server is going to be there on port 80. When you have another protocol like HTTPS, it specifies its own default port (443), so when you use HTTPS to connect to a website your browser is again always going to have to just assume it’s going to be there on port 443.
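The rule a browser follows here can be sketched with Python's standard URL parser. The DEFAULT_PORTS table and the effective_port helper are made up for this illustration and only cover the two protocols discussed.

```python
from urllib.parse import urlsplit

# Defaults taken from the protocols, not from DNS.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # the address spelled out a port, so use it
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the default

for url in (
    "http://simple-help.com/",
    "https://simple-help.com/",
    "https://simple-help.com:1234/",
):
    print(url, effective_port(url))
```

The port only comes from the address when somebody has written it there. In every other case it is assumed from the protocol.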

This also explains why you can't run more than one web server on the same port, whether that is 80 or 443. If a connection or packet comes in, who does it go to? The first web server, or the second?
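You can see the operating system enforce this with a short Python sketch. It assumes default socket options and uses a free loopback port rather than 80 so that it runs without administrator rights.

```python
import socket

# The first "server" claims a port on the loopback address.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
first.listen()
port = first.getsockname()[1]

# A second "server" tries to claim exactly the same address and port.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    second_bound = True
except OSError as error:
    second_bound = False
    print("second bind refused:", error)
finally:
    second.close()
    first.close()
```

With default options the second bind is refused, which is the operating system's way of answering the "who does it go to?" question before it can ever come up.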

If you’re running multiple websites from one computer they all have to be served from one web server application (like Apache or IIS). That way all connections go to that one application, and it can look for the website domain being asked for (e.g. www.simple-help.com) in the HTTP request to work out which website the request is for.
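A rough Python sketch of that lookup follows. It is not how Apache or IIS are implemented: the SITES table, the folder paths and the function names are all invented for this illustration, and it ignores details such as IPv6 addresses in the Host header.

```python
# Hypothetical mapping from domain name to the folder holding that site.
SITES = {
    "www.simple-help.com": "/srv/sites/simple-help",
    "www.example.org": "/srv/sites/example",
}

def host_from_request(raw):
    """Pull the domain out of the Host header of a raw HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the "GET / HTTP/1.1" line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().split(":")[0].lower()  # drop any :port suffix
    return None

def pick_site(raw):
    """Return the folder for the requested domain, or None if unknown."""
    return SITES.get(host_from_request(raw))

request = b"GET / HTTP/1.1\r\nHost: www.simple-help.com\r\n\r\n"
print(pick_site(request))
```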

It also explains why you never see any port when you look in a browser address bar. Instead you see:

http://simple-help.com/

No port? But that's just because your browser doesn't want to complicate life for you by sticking :80 on the end:

http://simple-help.com:80/

Default Ports

Port 80 is just a default port. There is no reason why you can't talk HTTP over port 81, or talk HTTP over port 443, or talk HTTPS over port 9999. It is just that if you did, most browsers wouldn't know where to find it. In fact, browsers and web servers are perfectly happy talking over other ports; they just need to be told which port to use. If you had a web server on port 1234 that could somehow speak HTTP and HTTPS then your browser could easily cope with either of those:

http://www.simple-help.com:1234/
https://www.simple-help.com:1234/

When a connection came in it would just be up to the web server to figure out whether the browser was trying to start a conversation with HTTP or HTTPS.

In fact, if the web server could distinguish between the protocols, you could speak all kinds of different protocols over the one port. You could do file transfers, HTTP, HTTPS, remote desktop, TCP and UDP at the same time, all over the one port.
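One way a server could tell the protocols apart is by peeking at the first bytes of a new connection. The Python sketch below is an assumption for illustration only and is not a description of how SimpleHelp does it. It relies on two facts: a TLS connection (which is what HTTPS runs over) opens with a handshake record whose first byte is 0x16 followed by a version byte of 0x03, while a plain HTTP request opens with a method name in ASCII.

```python
HTTP_METHODS = (
    b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ", b"CONNECT ",
)

def guess_protocol(first_bytes):
    """Guess what a client is speaking from the start of its first message."""
    # TLS handshake record: content type 0x16, then major version 0x03.
    if len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"
    # Plain HTTP starts with a readable method name and a space.
    if first_bytes.startswith(HTTP_METHODS):
        return "http"
    # Anything else would be some other protocol sharing the port.
    return "other"

print(guess_protocol(b"GET / HTTP/1.1\r\n"))
print(guess_protocol(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))
print(guess_protocol(b"\x00\x01"))
```

A server doing this would read the bytes without discarding them and then hand the connection to whichever handler matches.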

Just like SimpleHelp…

Yes. This is why it can get a bit confusing when we talk about why you don't need to open up port 443 for SimpleHelp to enable HTTPS and SSL, and why, even though you can point your browser at SimpleHelp and get a web page, it’s not a good idea to set up a reverse proxy. It is also why you don't even need to specify an ‘HTTPS port’ separately from the ‘HTTP’ port. Normally, if you run a web server on port 80, that's what you get and all you get: a web server. It’s speaking HTTP, maybe HTTPS, and that is it. It doesn't try to distinguish between the protocols because it does not have to; it knows HTTPS connections are coming in on 443 and HTTP on 80. When you run SimpleHelp on port 80, though, it’s doing a lot more than just that over the one port.

It makes it easier for SimpleHelp users to get through firewalls and NAT, and it means SimpleHelp can serve up web pages, serve up secure SSL web pages, do secure SSL connections, get through HTTP proxies in a number of different ways, do UDP connections and do other kinds of uncluttered, faster TCP connections, all with the user just having to think about one port.

There is no concept of an HTTP port in SimpleHelp, or an HTTPS port: we do everything over all the ports. If you want to use HTTPS, no problem. In fact it’s already there on whatever port you have already configured. You don't need to specify a particular port for it and it doesn't need to use port 443, but I think it unnerves some people when you tell them that they can either add port 443 and do:

https://yourwebsite.com

or they can just leave it as it is and do:

https://yourwebsite.com:80

because the server will usually be linked to from your website or embedded in it anyway, so the link will tell the browser what port to use and the customer / technician doesn't have to.

It also means that occasionally someone assumes that because SimpleHelp is a web server, it can be reverse-proxied to from some other web server instead of just running it on another port. In reality though, when the connection comes in to Apache or IIS before it is forwarded on, those servers will understand HTTP, but all the other protocols we speak over that port are going to be lost on them. It will work, because we can do everything over HTTP if necessary, but it is a restriction that slows things down quite a bit.

These aren't the common case. Most people get set up fine and never come into contact with any of this, but when we get a question like “where do I set the HTTPS port?” or “how can I set Apache/IIS up to forward connections to SH?” we know we are probably in for a few more emails.

Like everything we do though, we want to make it simpler and better. In SimpleHelp 3.12 we’ll be adding UDP support which, amongst other things, will often allow us to get a better connection even where we are being forwarded through another web server. We have also replaced the session graph with a connection information button that gives you the lowdown on how you got connected and what you can do to tweak it and make it better.