TCP is a Byte Streaming Protocol

If you remember nothing else about TCP sockets, remember this: TCP is a BYTE STREAM protocol, not a packet or datagram protocol. With TCP you send and receive a stream of bytes, not messages, lines of text, packets, or datagrams, and any single read can hand you as little as one byte of that stream. Remembering this will save you many hours of frustration down the road.

~ or ~

"What are all these garbage characters doing in my MUD client?"

A very common mistake in MUD client development is to think that TCP is a packet protocol at the socket level, or that Line Mode Telnet guarantees that you read data from a game one whole line at a time. TCP doesn't work that way, and neither does Telnet.

TCP is a byte stream protocol, and if you start reading from a TCP socket thinking that you are getting packets of data, or worse, lines of data, you are going to end up in a world of hurt. A MUD server might blast a 50 byte line into the Telnet TCP stream with a single write, but that doesn't mean you are going to read exactly 50 bytes out on the client end. It doesn't work that way. "Write boundaries" and "packet boundaries" are not preserved by TCP.

This misunderstanding about how the socket works (the assumption that a read() from the socket is going to give you a complete line of data instead of an arbitrary chunk of the stream) is the most common reason why new MUD clients occasionally glitch out on color codes and UTF-8 sequences, and why player "triggers" fail to match. The client typically tries to regexp match for a color code or character or trigger on the most recent read() instead of on a full buffer, and doesn't behave correctly when the sequence ends up broken across more than one read.

You might be tempted to think, "I can just increase my read() buffer size to allow for the largest possible packet of data that I am likely to get in a single read, and that will fix it by always having enough space to avoid a break." Unfortunately, no. You can hand read() a larger destination buffer, but there's no guarantee that the OS will fill it for you. Even with the largest possible destination buffer, you can still get read sizes as small as a single byte from your OS. If your client is going to try to process "lines" of data, it MUST be able to buffer up multiple read() calls (perhaps containing a single byte each!) into a full line before processing it.
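Here's a minimal sketch of that kind of buffering in Python. The LineBuffer class and its method names are made up for illustration, and it assumes lines end in "\n" with an optional "\r" in front of it. Every read() result goes into feed(), and only completed lines come back out.

```python
class LineBuffer:
    """Collects bytes from any number of reads and hands back whole lines."""

    def __init__(self):
        self._pending = bytearray()

    def feed(self, chunk):
        """Add one read's worth of bytes; return the lines it completed."""
        self._pending.extend(chunk)
        lines = []
        while True:
            end = self._pending.find(b"\n")
            if end < 0:
                break  # no complete line yet, keep waiting for more bytes
            # Assumption: a "\r" right before the "\n" is part of the line ending.
            lines.append(bytes(self._pending[:end]).rstrip(b"\r"))
            del self._pending[:end + 1]
        return lines

    def pending(self):
        """Bytes received so far that do not yet form a complete line."""
        return bytes(self._pending)
```

Feed it one byte at a time or one megabyte at a time, and the lines that come out are the same. Whatever is left in pending() is an unfinished line, which might just be a prompt.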

Here's an example of what can and does happen. Let's say the player has a pattern matching trigger set up. They are trying to find the text "You are hungry." so that the client can automatically respond by eating some food.

The game sends this with a single write, "You are hungry.". Most of the time, the client would end up getting this message in a single read() call. The client sees "You are hungry." from that read() and can match it. But suppose something completely out of your control happens on the network, and some of those bytes take a little longer to arrive. The client does a read() and only gets the part of the message that has arrived so far: "You are hu". If it tries to look for the pattern in that read(), it isn't going to find the whole message and run the trigger. On the next read(), it sees "ngry.", and that's not the message it was looking for either. So the trigger doesn't fire, and the poor player goes hungry.
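A rough Python sketch of a trigger that survives this: match against everything received so far on the current line, not against the latest read(). TriggerBuffer is a hypothetical name, the text is assumed to be already decoded to str, and the pattern is assumed to never match an empty string or span more than one line.

```python
import re


class TriggerBuffer:
    """Matches a trigger pattern against accumulated text, not single reads."""

    def __init__(self, pattern):
        self._pattern = re.compile(pattern)
        self._text = ""

    def feed(self, chunk):
        """Append newly read text; return how many times the trigger fired."""
        self._text += chunk
        fired = 0
        while True:
            match = self._pattern.search(self._text)
            # Guard against empty matches so the loop always makes progress.
            if match is None or match.end() == match.start():
                break
            fired += 1
            self._text = self._text[match.end():]
        # Finished lines that did not match never will (single-line patterns
        # assumed), so keep only the unfinished tail of the stream.
        last_newline = self._text.rfind("\n")
        if last_newline >= 0:
            self._text = self._text[last_newline + 1:]
        return fired
```

With TriggerBuffer(r"You are hungry\."), feeding "You are hu" fires nothing, and feeding "ngry." right after it fires once, which is what the player wanted all along.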

The same problem can happen with multi-byte ANSI color sequences, or UTF-8 encoded characters, or any other multi-byte sequence. The read() breaks can show up anywhere in the byte stream.
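As a sketch of one way to cope in Python: the standard library's incremental UTF-8 decoder holds on to a partial character until the rest shows up, and a small regular expression can hold back an ANSI escape sequence that hasn't received its final byte yet. StreamDecoder is a made-up name, and the pattern only covers the common "ESC [ parameters letter" color sequences, not every escape sequence a game could send.

```python
import codecs
import re

# An escape sequence that has started but is still missing its final byte:
# a bare ESC, or ESC [ followed only by digits and semicolons, at the very end.
_INCOMPLETE_ESCAPE = re.compile(r"\x1b(?:\[[0-9;]*)?\Z")


class StreamDecoder:
    """Turns arbitrary byte chunks into text without splitting sequences."""

    def __init__(self):
        # The incremental decoder buffers incomplete UTF-8 characters itself.
        self._utf8 = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._held = ""

    def feed(self, chunk):
        """Decode one read; return only text that is safe to display now."""
        text = self._held + self._utf8.decode(chunk)
        match = _INCOMPLETE_ESCAPE.search(text)
        if match:
            # Keep the unfinished escape sequence for the next read.
            self._held = text[match.start():]
            return text[:match.start()]
        self._held = ""
        return text
```

Calling chunk.decode("utf-8", errors="replace") on each read separately is exactly what produces stray replacement characters when a character straddles two reads. The incremental decoder waits for the rest of the character.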

But it's a MUD, so no problem, right? Just look for the newline and buffer to that! Well, also no. Don't forget about MUD prompts. They typically get sent without any newline termination at all. Some games support the TELNET EOR option to indicate a prompt line, some don't. Some games also send the TELNET GA command for that. Some just leave you hanging, and timeouts and partial line pattern matches are about all you can do.
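For games that do mark their prompts, a byte-by-byte state machine can treat IAC GA and IAC EOR as "end of prompt" the same way it treats a newline as "end of line". This is a deliberately small Python sketch: PromptSplitter is a hypothetical name, the command values are the standard Telnet ones (IAC 255, GA 249, EOR 239), option negotiation is skipped rather than answered, and subnegotiation (IAC SB ... IAC SE) is not handled at all.

```python
IAC = 255          # Telnet "Interpret As Command"
GA = 249           # Go Ahead
EOR = 239          # End of Record
NEGOTIATION = (251, 252, 253, 254)  # WILL, WONT, DO, DONT: one option byte follows


class PromptSplitter:
    """Splits a Telnet byte stream into ("line", data) and ("prompt", data)."""

    def __init__(self):
        self._pending = bytearray()
        self._state = "data"

    def feed(self, chunk):
        events = []
        for byte in chunk:
            if self._state == "option":
                self._state = "data"       # skip the negotiated option byte
            elif self._state == "iac":
                self._state = "data"
                if byte in (GA, EOR):
                    events.append(("prompt", bytes(self._pending)))
                    self._pending.clear()
                elif byte == IAC:
                    self._pending.append(IAC)   # IAC IAC is a literal 0xFF
                elif byte in NEGOTIATION:
                    self._state = "option"
                # Other commands are silently dropped in this sketch.
            elif byte == IAC:
                self._state = "iac"
            elif byte == 0x0A:
                events.append(("line", bytes(self._pending).rstrip(b"\r")))
                self._pending.clear()
            else:
                self._pending.append(byte)
        return events
```

Because the state lives in the object and not in the chunk, an IAC at the end of one read and a GA at the start of the next still add up to a prompt. For games that send neither marker, you are still stuck with a timeout on whatever is left pending.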

What is The Baud Test?

If you want an easy way to torture test your client's stream parsing, try logging into The Last Outpost with the username "baudtest". There should be no ?'s or � characters, or broken colors. Try it from inside the LociTerm client for comparison with your own client!
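You can also run a rough local version of the same idea before you ever connect: take a captured chunk of game output, replay it through your parser with the read boundaries in different places, and check that the result never changes. The Python sketch below uses made-up helper names and a plain UTF-8 decoder as the stand-in "parser". It illustrates the approach and does not describe how the baudtest login itself works.

```python
import codecs


def chunkings(data):
    """Yield different ways the same bytes could arrive across reads."""
    yield [data]                                # everything in one read
    yield [bytes([b]) for b in data]            # one byte per read
    for cut in range(1, len(data)):             # every possible two-read split
        yield [data[:cut], data[cut:]]


def survives_any_chunking(make_feed, data):
    """True if the parser's joined output is identical for every chunking."""
    expected = None
    for chunks in chunkings(data):
        feed = make_feed()                      # fresh parser state per replay
        result = "".join(feed(chunk) for chunk in chunks)
        if expected is None:
            expected = result
        elif result != expected:
            return False
    return True


def buffered_decoder():
    """A decoder that remembers partial characters between reads."""
    return codecs.getincrementaldecoder("utf-8")(errors="replace").decode


def naive_decoder():
    """A decoder that treats every read as if it were a complete message."""
    return lambda chunk: chunk.decode("utf-8", errors="replace")
```

With a sample such as "café ☃" encoded as UTF-8, survives_any_chunking(buffered_decoder, sample) holds, while the naive decoder fails as soon as a split lands in the middle of a character.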
