So You Want to Make a Font for the SNES

It’s the early ’90s. You’re a developer at Nintendo implementing a font in The Legend of Zelda: A Link to the Past. The game is in Japanese with occasional English text, so your font needs to contain hiragana, katakana and as many kanji characters as you can fit, in addition to the Latin alphabet, punctuation, numbers and whatever game-specific symbols and icons you need. You come to the conclusion that a font of 512 characters might be a decent balance; 256 kanji characters, 160 kana characters and 96 characters for the Latin letters and everything else.

The full font from the Japanese version of A Link to the Past.

When the player talks to a character, you want to display a text box that uses your font. The game is written for the Super Nintendo (SNES), so unless you want to use special high-res graphics modes that impose additional restrictions, you have to make due with the default resolution of 256×224 pixels. To make things worse, you’re still stuck in the ’90s with CRT TVs that tend to crop parts of the image. For this reason, corporate has designated a 224×192 pixel “safe” area that all TVs ought to be able to display. The game should never display important information – like, you know, dialogue – outside that area. Add on top of that that you also need enough space for a border, stylish margins, and the game’s HUD (which the text boxes cannot overlap for technical reasons). Oh, and you also want to keep the player and the character they’re talking to visible on-screen, so you have even less screen real estate to work with. All things considered, you have been given 160×48 pixels to display text in. With such limitations, how do you fit as much text as possible while still keeping it legible?

An example of a text box and its space constraints.

As if you didn’t have enough limitations to worry about, the SNES handles its graphics using 8×8 pixel blocks called tiles. So, if you want to use the console’s native capabilities to display graphics, they need to be in multiples of 8×8 pixels. What character sizes would that permit, and how many characters would you then be able to fit?

Let’s start with 8×8. This would allow for 20×6=120 characters on screen at the same time. That’s a nice chunk of text. Now, can you actually make the characters 8×8 pixels large? Well, for the Latin letters, numbers and punctuation, yes. The kana should probably also fit within that space. The problem is that you want to have kanji characters as well. These characters are considerably more intricate and complex, so there’s absolutely no way to fit most of them into an 8×8 tile without them becoming an illegible mess of pixels. This wouldn’t really work.

Let’s go one size up; what about 8×16? Again you run into the same problem; the kanji is just too complex to fit within a width of 8 pixels. 16×8? Still no, and it would look really weird anyway. This brings you to 16×16. That would give you 10×3=30 characters. I mean, it might do? Japanese requires way fewer characters on average than English anyway. Still, is there a way you can do better?

This is where the cost-benefit analysis comes into the picture. Do you want to use 16×16 pixel characters – which the SNES can natively handle – or do you want to do the rendering manually in exchange for the freedom to use any character sizes you want? Perhaps the extra hassle is worth it, just to fit a little bit more text on screen? You eventually decide that yes, it is worth it.

This is still the SNES, so you are still locked into the 8×8 grid. The difference is that you’re not assembling predefined 8×8 tiles into an image, you’re generating new 8×8 tiles on-the-fly and assembling those into an image. You do this by allocating a buffer in Video RAM (VRAM), enough space for 20×6 tiles or 160×48 pixels, the amount of space you were given on-screen. You can then manually draw whatever you want into this area and display the resulting tiles on screen. This is less efficient since you’re not only using even more space in the SNES’s horrifyingly small VRAM, but you also have to spend precious time on manually drawing your characters into this space. This is actually more tricky than it sounds due to the way the SNES tiles are stored. Even calculating which bits in which bytes to change to affect a given pixel would be really slow.

A simplified view of how bit shifts and OR can be used to render characters despite being misaligned with the 8×8 grid.

If you internally store each character in the font as a 16×16 tile, regardless of the actual size of the character, you can optimize the drawing process a lot by writing code capable of taking a portion of the character, offsetting it by a few pixels and pasting it into the buffer. This can be done somewhat efficiently by some clever uses of bit-shifting and ORing the bytes that make up the tile, essentially taking advantage of the tile storage format rather than being encumbered by it. Copying-and-offsetting the characters like this is still several times slower than just simply copying them, but moving the characters closer to each other is sort of the whole point to free up more space on-screen for even more characters.

In the end, you find that the arbitrary-looking character size of 11×14 pixels is suitable; the characters are just barely wide enough for the kanji to be legible. Furthermore, the number of characters on-screen has now increased to 14×3=42, a 40% increase compared to when you had 16×16 pixel characters. Unfortunately, the characters being 14 pixels tall isn’t enough to fit any more lines in, so you resort to displaying them as if they were 11×16 to prevent having to deal with the slow offset management in both the X and Y directions.

Okay, so let’s stop for a second to remind yourself where you are. You have a font with 11×14 pixel characters and a system capable of drawing 16×16 pixel characters as if they were 11×14 pixel characters. Now you just need a way to actually store the font in the game’s ROM and some way to be able to load it.

The simplest way to store the font is to just save the padded 16×16 pixel characters as-is, but oh no, you forgot that you’re still stuck in the ’90s and cartridge space is at a premium. Sure, that extra padding only costs an additional 204 bits per character, but that quickly balloons to almost 13 KB of unused space for the full 512-character font. That’s an unacceptable waste; you desperately need to trim it down.

Problem: the SNES is a 16-bit console (except when it’s an 8-bit console; long story), meaning it deals with values 16 bits at a time. If you were to store your 11×14 font in the most obvious way – 22 consecutive bits per line times 14 lines – you have to jump through various hoops to compensate for each line going in and out of phase with your 16 bit window. It would make your life so much easier if you could ensure that you always dealt with your font in 16-bit chunks, i.e. contiguous 8-pixel segments.

Each 11×14 character is subdivided into 20 8-pixel strips and stored as an 8×20 character. This makes each line 16 bits – the native word length of the SNES – at the cost of 6 redundant pixels (12 bits).

That’s when inspiration strikes: “I shall make it more complicated!” An 11×14 character requires 308 bits to store. That’s 19 groups of 16 bits, plus an extra four bits. If you allow yourself to waste just a tiny bit of space, 12 bits per character, that’s 20 groups of 16 bits. Now, if you can find a way to logically subdivide an 11×14 pixel character into 8-pixel (16-bit) strips, only wasting at most 6 pixels (12 bits), you’re golden. Turns out you can deal with the 8 left-most pixels as 14 individual strips and then cover the rest of the pixels with six vertically rotated strips. Good enough!

Once that’s in place, the characters could now be stored as-is, but it just feels like something is missing. Ah yes, your old friend compression. A lot of compression algorithms used in games at this time focus on encoding stuff like “now there’s a long stretch of zeroes (empty space)”, “now we repeat the same byte/word a few times”, or “this part is exactly like that other part we did a few moments ago”. How would this approach work with your font? Let’s start by taking a look at the numbers and the Latin alphabet.

The alphanumerical portion of the font. Note how a lot of the letters are drawn to be made up of similar portions.

So let’s see, you have the zero which is basically the same row of pixels repeated over and over, save for the top and the bottom. You could probably save a little bit of space there. Oh, and the letter O is identical to the zero so no need to store that. The 1 and the I are identical, save for the very top, so that’s a few more bytes gained. E and F are identical until the very bottom; same goes for O and Q. 0, A, D, H, and U all have identical longer stretches of two vertical white lines, so that can probably be shared. R is P until halfway through, at which point it becomes an H. Yeah, there’s a lot of repetition in this font. Surely, you can use some fancy compression here to save space.

Now repeat the exercise for the kanji characters.

Oh wait, how silly of you. For a moment there you forgot you were a Japanese programmer who doesn’t primarily focus on the Latin alphabet, but rather the Japanese kana and kanji. Okay, these are a bit of a problem. Compression works by taking advantage of repeating patterns and/or data being predictable based on the data that came before. While that might work with the relatively simple Latin characters, that’s less of the case with Japanese ones. Sure, there are a few kana that look similar enough to save a few bytes here and there, but when you look at the kanji, all bets are off. Kanji characters are intricate. From a computer’s perspective, that means they’re chaotic and unpredictable; even more so at lower resolutions. So, seeing as the kanji, half of your font, probably won’t compress well at all with any compression system simple enough to implement into a ’90s Nintendo game, you may be forced to abandon compression altogether.

Hold on a second. You listed three types of compression tricks often used at this point in time. The same bytes/words repeated several times in a row, repetition of past data, and… What was the third one? Ah, yes, encoding large portions of empty space. You look at your font data. It’s currently 20KB, but it’s peppered with tons of 0x00 bytes. In fact, almost 6KB – close to a third of the total data – consists 0x00 bytes! Can you find some simplistic way to prevent having to store all those zeroes? Unfortunately, they’re spread out all over the font, so the RLE approach of “Now there are 17 0x00s, now there are these non-0x00 bytes: AA, BB, CC…” – which would work best for long stretches of 0x00s – might not be as efficient as you’d hope.

Since there’s going to be so many 0x00s, how can you get rid of them while somehow still storing enough information to know where to paste them back in? Actually, what you could do is to add a separate bitmask with a bit for each byte in the original data. If you store a 0 there, it means “this byte is 0x00” while a 1 means “this byte is something else”. The 0x00s can then safely be thrown away altogether, only retaining the non-0x00s. This effectively shrinks 0x00 bytes down to a single bit, at the expense of effectively increasing non-0x00 bytes from 8 to 9 bits. Still, if enough bytes get removed this way, it’ll be worth it. Now, how can you maximize the number of 0x00 bytes?

A visual description of how a four-color 8×8 pixel tile is stored on the SNES. The two bitplanes of the tile are separated and then interleaved byte-wise. Note how the blue part only shows up on odd lines, the red part only shows up on even lines and the white part shows up on both even and odd lines.

This is where a quirk of the SNES graphics format comes into effect. Each pixel in the font can be one of four colors, meaning you need two bits per pixel. How would any sane person store this? Well, you store the first bit and then the second bit of the first pixel, then the first and second bits for for the second pixel, and so on? No, that’s not what the SNES does. Instead, it stores all the first bits for the first set of 8 pixels, then all the second bits for the first set of 8 pixels, then all the first bits for the second set of 8 pixels, and so on. In other words, it splits the two-bits-per-pixel image into two separate one-bit-per-pixel images and interleaves the bytes together. As such, each line of pixels is split into the “low bits” byte and the “high bits” byte.

This split of each bit pair into separate bitplanes turns out to be really useful to you since it happens to massively improve your compression. You see, the four colors of your font are stored as such:

  • 00 – Background
  • 01 – Blue
  • 10 – Red
  • 11 – White

This specific color setup as a very interesting property: If a line contains nothing but the background color, both the line’s bytes will become 0x00 and can be omitted. If a line contains only the background and either blue or red, at least one of the bytes will become 0x00 and that one can be omitted. If a line contains white or both blue and red, both bytes will be non-0x00 and thus neither can be omitted. Now, if you look at the font, you might observe that there are a lot of empty lines, a lot of lines with background-and-blue, and some with background-and-red. These lines are the ones that shrink in size and thus save space.

In your font, almost all lines with white in them also use at least two other colors, usually background and blue. Three colors is impossible to store in a single bit, so that’s why you store white as 11, the least efficient value; you wouldn’t be able to compress those lines well anyway.

Graphical explanation of how the font encoding and compression works. In this example, the original 11×14 character is 308 bits, the encoded 8×20 character is 320 bits, and the compressed character is 144 bits plus the 40 bit bitmask for a total of 184 bits, 40% smaller than the original character. Note how the portion of the 8×20 image that only contains background-and-blue lines results in every other byte being omitted.

All in all, your font of 512 characters – each with 11×14 pixels – would have required 19.25KB if stored naively. With all these techniques you’ve just invented, you’ve managed to reduce it to… err… 16.57KB… which is an, um, mere 14% size reduction saving a grand total of 2.7KB. Okay, I mean, let’s put this into perspective. This is a large game squeezed into 1024KB, so the space you’ve saved corresponds to 0.25% of the total game size. Besides, even if that doesn’t sound too impressive on its own, keep in mind that if you try to squeeze a little bit out of every part of the game, all the little bits and pieces eventually add up. In the end, it’ll hopefully leave you with just enough space for that one extra feature you’d love to squeeze in before you wrap up the game.

Also, the 14% size reduction is compared to the most simplistic storage of the 11×14 characters. If you had instead done the extra-lazy storage of 16×16 characters, that would have instead been 32KB, so maybe you could argue that you’ve saved 48% of the space?

Okay, even if the space savings aren’t that great, the system you’ve just made is pretty darn cool and impressive from a technical standpoint. All things considered, you’re really proud of what you’ve accomplished here. Hm, what’s that? “The Americans are going to rewrite your entire font system from scratch when they localize the game?” Wait, what? “Something that’s way, way simpler?”

Oh.

Tags: , , , , ,