The worst decision in the design of the Internet

The Internet is a frankly incredible design. The IP protocol, which is at its heart, is technology from 1974. TCP, which implements connections on top of IP’s packet delivery, is from the same year. Forty-two years on, both are essentially unchanged. Even DNS, the domain-name service, dates back to 1983, and is going strong 33 years in with only minor tweaks.

The only big change in this time has been the slooow migration (still in its early stages really) from IPv4 to IPv6 — something that has proven necessary as the Internet has been so wildly more successful and popular than anyone anticipated, and the 32-bit-wide host addresses are running out. But in the scheme of things, this is a minor tweak. We’re running the Internet on 1970s technology, not due to sloth, but because it’s good.

[Photo: sushi]

There’s just one nasty misfeature in this suite of protocols, and that is port numbers. A running TCP/IP service is available on an address that consists of the host (expressed as an IP address such as 212.56.71.163 or, more often, a domain name such as google.com), together with a small integer — the port number. Different ports support different services, so you can run (for example) a remote-shell service and a Web server on the same host, on ports 22 and 80 respectively.
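To make that concrete, here is a minimal Python sketch of two independent services sharing one host, told apart only by their port numbers (ephemeral loopback ports stand in for 22 and 80, which need root privileges to bind):

```python
import socket

# Two unrelated services listening on the same host: only the port
# number distinguishes them.
shell_service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
shell_service.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
shell_service.listen(1)

web_service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
web_service.bind(("127.0.0.1", 0))
web_service.listen(1)

print(shell_service.getsockname())  # e.g. ('127.0.0.1', 49231)
print(web_service.getsockname())    # same host address, different port
```

A client picks which service it wants purely by choosing which of the two port numbers to connect to.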

The problem is, there are only a finite number of ports on a host — 2^16, or 65536 — and a potentially infinite number of services that you might wish to support. Different services conventionally run on different well-known ports (such as port 80 for Web servers, as in the example above). The space is very polluted and getting worse.
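These well-known assignments are baked into most operating systems (in /etc/services on Unix), and the standard sockets API can query them; a quick Python illustration:

```python
import socket

# Look up a well-known port by service name (consults /etc/services on Unix)
print(socket.getservbyname("http", "tcp"))  # 80
print(socket.getservbyname("ssh", "tcp"))   # 22

# ...and map a port number back to its conventional service name
print(socket.getservbyport(80, "tcp"))      # http
```

But the convention is advisory only: nothing stops any program from squatting on any port it likes, which is exactly how the pollution happens.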

In particular, port 8080 is woefully oversubscribed, with important services like Web proxies and caches, the Java-based Web server Tomcat, and several more all wanting to run there. This isn’t just a theoretical problem: I am writing this post because of the trouble I’ve had today trying to get Tomcat to start on my Ubuntu 15.10 box, where something is already running on port 8080. I don’t know what, and I can’t find out, because although it accepts connections, it immediately closes them.
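For anyone wanting to reproduce the symptom, a throwaway Python probe shows it (the helper names here are my own invention, and the demo listener stands in for the mystery service):

```python
import socket
import threading

def probe(host, port, timeout=2.0):
    """Connect and return the first bytes the service sends.
    b'' means the peer closed the connection without saying anything."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.settimeout(timeout)
        try:
            return s.recv(1024)
        except socket.timeout:
            return None  # connection stayed open, but silently

def demo_silent_closer():
    """Stand-in for the mystery service: accept one connection, close it."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    def run():
        conn, _ = srv.accept()
        conn.close()
        srv.close()
    threading.Thread(target=run, daemon=True).start()
    return port

port = demo_silent_closer()
print(probe("127.0.0.1", port))  # b'' -- accepts, then immediately closes
```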


How easy it would have been to avoid this problem, if only the protocols had specified ports as short strings instead of integers. Then it would be trivial to make up truly unique port-names for each service. They could be faceted, like Unix file paths or Java package names, so that for example my home-brew MUNDI server could run on port “uk.org.miketaylor.mundi”. Then Tomcat would never collide with whatever the heck is already running on my port 8080.

So it’s a real shame that in protocols so brilliantly engineered, which have stood the test of time so well, this one trivial wart causes so much avoidable grief.

 

22 responses to “The worst decision in the design of the Internet”

  1. Yep, that’s what did it for me in the end.

  2. You’d probably need to structure the port names. I can see half of everyone using ports like ‘datacollection’, ‘assetmanager’ or ‘authorization’ with totally different protocols. My first guess is that they should include the domain name of the protocol developer like Apple bundle identifiers.

  3. Maarten Daalder

    Would using a 2 byte long string help? :P

    (Reverse) proxies can help with this to some extent, but this is outside my expertise. Where I work we have a reverse proxy in front of multiple (web-facing) servers, and for some reason that works.

  4. Oh yummy! Stuffed peppers! One of my favorites, to eat and to cook!

  5. Kaleberg, did you skip the bit of the post where I suggested port-names like “uk.org.miketaylor.mundi”? :-)

    Rubberman: huh? Stuffed peppers?

  6. Try the command ‘netstat -vat’

  7. Yeah. Haven’t posted here for a long time. And aren’t those stuffed peppers in the top photo? Looks like it to me! And I don’t disagree with being able to use names for ports. Maybe you should join the IETF… :-)

  8. A couple of technical objections:
    – “uk.org.miketaylor.mundi” seems to imply variable length port identifiers. Ports have to be in the TCP header, where it would be highly impractical to have variable length fields. (This stuff has to be processed at lightning speed in hardware.)
    – Encoded as 7-bit ASCII, “uk.org.miketaylor.mundi” would be 161 bits. Times 2 (because both source and destination ports must be in the header), that’s 322 bits. For reference, the entire TCP header as currently defined is 160 bits (if no options are set).
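    The arithmetic, for anyone who wants to check it:

```python
name = "uk.org.miketaylor.mundi"
one_port = len(name) * 7       # 23 characters at 7 bits each
print(one_port)                # 161 bits for a single port name
print(one_port * 2)            # 322 bits for source + destination
print(20 * 8)                  # 160 bits: the entire option-free TCP header
```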

    We could probably get somewhere by hashing the port name instead of using it directly, but even that would require a big increase in the port identifier size otherwise collisions would be too frequent.

  9. Rubberman, welcome back. Do you mean this photo? That is an inside-out avocado roll topped with salmon. It’s sushi.

    Pedro: yes, variable-length strings (though I suppose a reasonable cap wouldn’t be the end of the world). I understand the reasons for the fixed-length two-byte port, and no doubt it was a good trade-off back in the 1970s. But it’s not a good trade-off in 2016; it’s a decision that has not aged well. These days, hardware can handle short strings essentially as fast as two-byte ints. (Bandwidth is more of an issue, but even there I think the trade-off we’ve made is not a good one.)

    Hashing the strings is a nice idea, and probably “good enough” even for a small target range — maybe even the current 16 bits, come to think of it.
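    A sketch of what that hashing might look like (a scheme invented on the spot, not any standard, and port_for is a made-up name):

```python
import zlib

def port_for(name: str, bits: int = 16) -> int:
    """Hash a faceted port-name down to a fixed-width port identifier.
    With only 16 bits, birthday collisions arrive once there are a few
    hundred names in play, so a real deployment would want a wider field."""
    return zlib.crc32(name.encode("ascii")) % (1 << bits)

print(port_for("uk.org.miketaylor.mundi"))
print(port_for("org.apache.tomcat"))
```

    Each name maps deterministically into the existing 16-bit space, so clients and servers would agree without a registry — at the price of occasional collisions.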

  10. If you want a fixed-size item, perhaps a GUID would be a better choice – fixed size IS very nice for certain things, but it’s large enough, and everyone can generate their own unique ones whenever they wish.

  11. Oh, that’s not bad either — though GUIDs are butt-ugly.

  12. Another way to spot in-use ports is netstat -ap, as root — this will actually tell you the process doing the listen(), so it doesn’t matter how shortlived the response is. (Even this, of course, will go wrong if you’re using something like inetd or systemd socket activation, where all the sockets would appear to be owned by systemd until a child forks off: but then it’ll at least tell you which program to look in the config of.)

  13. I’ve always thought that the problem was that a node or host is a “thing” in IP. The port should be the only thing, and a more detailed DNS would map port names (DNS names are already dotted variable-length strings) to a field that combines the existing host address and port-number-within-host bits. Maybe you would grandfather the host/port distinction by punctuating with a colon: com.wordpress.reprog:http.sushi

  14. Interesting idea. Of course a host really is a thing (even if sometimes a much more complex thing than what the inventors of the Internet probably envisaged), and the present host+port scheme reflects that reality. The question is whether that bit of reality is best reflected or concealed.

  15. Sorry, I’m late to this blog post, but I’ve implemented TCP/IP stacks, particularly IPv6 in Unix back in the early 1990s… memory… serves… me (not too well, must check Wikipedia etc. first…)

    Port numbers. Hmm, well, there’s an IANA text registry here of TCP and UDP port numbers and names, but I can’t think of an API that uses it.

    http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?&page=2

    Basically, if you want to be “legit” you ask for an official port. 50% did, and the other 50% just stomped over the limited 65535 port numbers. Hold on, aren’t ports < 1024 the special “well known” ports? WHY ARE THERE UNOFFICIAL ONES THERE? Sheesh. Well, who knew that if you don’t specify a good standard then 42 years later you’ll have a complete dog’s breakfast (of sushi at least, perhaps… mmmm sushi… why am I thinking of sushi…)

    What’s weird is there’s /etc/protocols for mapping names to protocols, using the getprotoent() API. For TCP/IP and UDP/IP, the socket() call’s last parameter, “protocol”, can just be 0, which gives you the “proper answer” for TCP and UDP in the socket API. So getprotoent() is essentially useless and the *wrong* thing.

    http://man7.org/linux/man-pages/man3/getprotoent.3.html
    http://man7.org/linux/man-pages/man2/socket.2.html
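    To illustrate how pointless the lookup is, in Python (which wraps the same C API):

```python
import socket

# The "proper" way: look up the protocol number by name (/etc/protocols)...
proto = socket.getprotobyname("tcp")
print(proto)  # 6

s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
# ...but passing 0 gets you the very same TCP socket, so the lookup buys nothing.
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
s1.close()
s2.close()
```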

    Which means URLs, which are essentially names, can’t encode ports as names; they’re still numbers. This can’t be right. Wikipedia to the rescue?

    https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers

    NOPE No API – you manually look it up.

    (NB. I was working on IPv6 APIs for SCO Unix back in 1992; I was speaking at conferences each year after that, saying “by next year you should all have migrated”, for about three years until I finally realized “next year” was probably “next century” and gave up beating the poor dead horse.)

  16. Inspired TCP/IP awesomeness, and worried by current political wall building.

    Software Architecture: “Build Bridges, Not Walls” at http://www.harth-lang.org/blog/2016/03/02/build-bridges-not-walls/

  17. Andrew Burrows

    This seems a fair solution to the problem, but the good news is that the problem is going away as we all inevitably adopt containerisation. Run your Tomcat (or whatever else) in a container: it listens on port 8080 within the container, but you choose which port on the host it maps to when you start the container. You can run any number of identically configured Tomcats on the same host, all thinking they are on port 8080, and connect to each of them on a different port.

  18. I know Docker is very trendy at the moment, but I’m going to hold fire before committing the architecture of the Internet to a technology-of-the-month. To me, it seems like a very heavyweight approach to running services. (I know it’s usually promoted as “lightweight”, but that’s by comparison with a complete virtual machine.)

  19. Andrew Burrows

    I agree; next week it will be something else saving the Internet. But it does show that the problem is really just one of internal housekeeping. There are very few ports/protocols that the world at large could expect to use when connecting to a given host; the rest are all “by private arrangement”.

  20. If only there were some kind of Authority for Assigning Internet Numbers.

  21. Yes, I did miss it. I had a nasty infection that has slowed my brain down to a rather pathetic state when I wrote that. As usual, one only realizes just how sick one was after recovery. Lord only knows what other weird blog comments I posted. My own blog entries were barely coherent. My friends were worried.
