Destroying Virtual Cities for Fun and Profit

I’m a fan of Crowd Control, a service provided by Warp World. It’s an extension for the Twitch live streaming platform that allows viewers to influence the gameplay they are watching. This is done by letting the viewers purchase “coins” that can then be used to trigger specific effects in the game being played by the streamer. These effects range from being helpful (such as extra health or improved items) to malevolent (such as spawning additional enemies or impeding movement) or just silly-but-harmless. Since all games are different, the set of possible effects is game-specific. Regardless of the game and its exact effects, given the Chaotic Evil alignment of the internet as a collective, the malevolent effects typically end up being the more popular ones.

Since (most) games aren’t designed for this type of audience participation, each game needs its own effect pack, a bespoke piece of software that can talk to Crowd Control, inform it what effects are available, accept incoming effect requests, and provide feedback about whether each effect was successful.

This past September, I was watching a YouTube video from a charity stream of Cities Skylines. The gimmick was that the player was trying to build a city, but also promised that he would cause a meteor strike for every $1000 raised. I realized that this process, which in this case was manual, could be automated using Crowd Control. Seeing as this charity stream was a success, there clearly existed some largely untapped “fling virtual meteors at someone else’s virtual city” niche. Knowing that Cities Skylines had an official modding API, surely I could build my own effect pack, right?

Getting Started

Roughly speaking, the way a given Crowd Control effect pack works falls on a spectrum. The two extremes of this spectrum are what I think of as external and internal integrations.

An external integration is one where the effect pack is separate from the game, manipulating it from the outside. A lot of the effect packs for retro console games do this; they alter select parts of the game’s memory as it’s running to change how it behaves. This can be thought of as a sort of viewer-controlled GameShark or Cheat Engine. This is simultaneously the simpler and more difficult way; it’s conceptually simple to change memory address X to value Y, but finding which address to set to what value to have the desired outcome is the technically challenging part. Additionally, you are fundamentally limited to doing things that the game’s engine already supports.

The other side of the spectrum is an internal integration where you reprogram or otherwise augment the game itself, effectively writing a mod that triggers the desired effects on command. This is significantly more powerful as you are no longer limited to slightly tweaking the existing functionality of the original game; you can actually add new features. However, it is also more code-intensive and requires one to learn enough about the game engine to integrate your own code with it. The presence of modding infrastructure in the game, as is the case with Cities Skylines, simplifies this type of integration a lot.

So, where do you start with a project like this? Well, let’s outline the basic elements:

  • Learn how to mod the game.
    • This step varies enormously from game to game. Is there any documentation, tutorials, or example code? Do you have to reverse-engineer everything yourself from scratch?
  • Integrate the mod with Crowd Control.
    • One might think the building of a Crowd Control effect pack would involve a lot of Crowd Control integration. On the whole, I found this integration process, while slightly bumpy in my specific case, to be the least of my problems in this project.
  • Decide what effects to build.
    • This is by far the most important part of the project. If the pack has boring effects, nobody would want to use them. If the effects are too chaotic or destructive, the delicate balance between player progression and viewer sabotage is ruined.
    • This part must be thought of in less technical terms than the rest. Yes, you still need to base your effects on what you’re capable of implementing, but it’s fundamentally more about game design. You are designing a second game atop the original one; a game that both the streamer and the viewers are playing together. Balancing, engagement, player psychology, frustration, sense of accomplishment, and all the other aspects of regular game design are imperative here.
  • Implement the effects.
    • Again, depends on the game.
  • Testing, debugging, fixing bugs, testing, debugging, fixing more bugs, etc.
    • Oh no.
  • Talking to Warp World to get it published.
    • Luckily, they’re pretty chill people.

To me, experimenting with modding the game seemed like the reasonable place to start; even the best Crowd Control integration in the world would be useless if it cannot affect the game. I began by reading the modding documentation on the official community wiki and used it to set up a basic do-nothing mod that the game could recognize and load.

Despite having a basic mod in place, I immediately took a quick step backwards. Even if I were to implement an effect, how would I trigger it without a working Crowd Control integration? Should I start by researching how to add buttons to the in-game UI? Should I reconsider my choice of not starting with the integration? I quickly realized that none of this mattered; I was in an early prototype phase where elegance is irrelevant. I needed something quick, efficient and utterly disposable. My solution? A text file on my desktop that my mod would read a few times per second. If a specific line of text was in it, empty the file and trigger a specific effect. It’s caveman-level Crowd Control, but it only took a few minutes to write and, most importantly, allowed me to control what my mod did while it was running.
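For the curious, the core of that caveman-level trigger can be sketched in a few lines. The real mod was C# against the game’s API; this is an illustrative Python version, and the file name and effect names are made up:

```python
import os

COMMAND_FILE = "commands.txt"  # hypothetical path; the real file sat on my desktop

def poll_for_commands(path=COMMAND_FILE):
    """Read the trigger file, clear it, and return any effect names found.

    Called a few times per second from the mod's update loop.
    """
    if not os.path.exists(path):
        return []
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    # Empty the file so the same command isn't triggered twice.
    open(path, "w").close()
    return lines
```

Crude, but each effect then just needs a name to type into the file, which is exactly the amount of infrastructure a disposable prototype deserves.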

Disastrous Disasters

Now that I had a working mod and a way to externally trigger it to do stuff, I had a solid foundation to actually implement some effects. Given that this entire endeavor was inspired by someone launching meteors at their city, I felt that this was the right place to start. Let’s check the API documentation for how to trigger a disaster.

Oh no.

Now, don’t get me wrong. I absolutely adore this game and I admire what Colossal Order accomplished despite being a smaller studio. I must however come out and say it: modding this game is a nightmare.

Okay, so my first realization was that the API documentation made no mention of triggering disasters. This was strange to me since I had already checked the contents of the official modding API DLL and found the IDisaster interface, which seemed to contain all the basic functionality for controlling them. Was the documentation simply incomplete? Was it written at launch and not updated once disasters were added? Did they intentionally omit it since disasters require a paid DLC? Another possibility is “Because the disaster API is broken”.

Cities Skylines unfortunately has a number of… um… “quirks” that make modding problematic, at least from a beginner’s point of view:

  • Limited documentation. The information provided on the official wiki is at best bare-bones and didn’t answer most of the questions I came across. I was also unable to find any significant secondary sources of modding information. A lot of what I learned about this game instead came from reverse-engineering the game and enormous amounts of trial-and-error.
  • Limited API. After a closer inspection of the functionality provided by the official modding API, I found it incredibly limited. It didn’t really have a lot of potential for interesting effects. If I wanted anything beyond some bare-bones features, I had to look past the officially supported modding and instead start messing around with the undocumented internals of the game. The modding API instead became a bit of a starting point in my many reverse-engineering expeditions.
  • Broken API. Like I mentioned before, the modding API for disasters doesn’t actually work. To be fair, it is technically undocumented.
  • Threading. Most calls and alterations will crash the game unless they’re done from the correct thread. The documentation provides some clues about what thread is appropriate for what thing, but I had a hard time even getting the official threading constructs to work reliably. In fact, even the dispatch functionality (which is intended to safely transmit calls from one thread to another) seemed to occasionally break when used from one of the threads, forcing me to perform arcane nonsense like “from thread X, starting a new thread Y whose sole purpose is to schedule an action in thread Z”. Why this worked, I still don’t know.
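The marshaling that such dispatch functionality is supposed to perform can be illustrated generically: one designated thread owns a queue and drains it, and every other thread only ever enqueues. A minimal Python sketch of the pattern (this is the general idea, not the game’s actual dispatcher):

```python
import queue
import threading

class MainThreadDispatcher:
    """Marshal calls onto a single designated thread via a queue.

    Any thread may enqueue work with invoke(); only the owning
    thread executes it, by calling pump() regularly (e.g. per frame).
    """
    def __init__(self):
        self._queue = queue.Queue()

    def invoke(self, func, *args):
        # Safe to call from any thread.
        self._queue.put((func, args))

    def pump(self):
        # Drain pending work; runs on the owning thread only.
        while True:
            try:
                func, args = self._queue.get_nowait()
            except queue.Empty:
                break
            func(*args)
```

When this pattern works, it works well; my problem was that the game’s equivalent of it sometimes didn’t.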

Okay, where was I? Ah yes, the undocumented disasters API didn’t work. By checking the stacktraces and reverse-engineering how the disaster API is implemented, I realized that the game itself fails to initialize one of its own data structures properly. Hence, the first order of business was to use reflection to forcibly alter private data inside the disaster API. Yes, I’ve barely gotten started with my first effect and here I am already repairing broken data caused by a bug in the original game.

(As a brief tangent, the API allowed me to pick the exact coordinate for each disaster. For the longest time, I had a subtle bug where the edges of each area were statistically more likely to be hit. It turns out that the game lies to its players when it claims that each area is 2km². In fact, the real area is 1.92km²; the sides are 1920m rather than 2000m. I would hereby like to petition Colossal Order to immediately patch in a 4% discount for all in-game area purchases to compensate for the 0.08km² not provided.)

With the disaster API repaired, I was finally able to fire a meteor at the center of the map by typing some text in a file. Remotely controlled meteors were basically what I set out to make in the first place, so now what? I can’t release a pack with just a single lousy effect. Okay, let’s start with the low-hanging fruit: the rest of the disasters.

An Argument Against Arguments

The thing about disasters is that once you’ve implemented one, you’re most of the way to implementing the rest as they share a common interface. Each disaster has a type (meteor, tornado, earthquake, etc.), name, position, angle and intensity. Since some disasters have a massive difference in effect based on the specified intensity, I initially thought it would be a cute idea to utilize Crowd Control’s support for parameterized effects. When a viewer triggers an effect, they would be able to choose how powerful it would be.

While variable intensity would be cool, it would have required implementing some kind of dynamic pricing for the effect. After all, if a level 1 and a level 10 tornado cost the same, why bother with the weaker one? Also, the intensity value isn’t even used for some effects, so I’d have to implement two different systems, one for parameterized effects and one for constant ones. Worse yet, tsunamis are so slow and unpredictable that even an upper-mid-tier one takes forever and may end up not really affecting anything anyway. Allowing anything but the maximum intensity for tsunamis would probably just be a waste. Besides, even if I allow people to control the strength of some disasters, do we really need 100 different options for how destructive each one should be?

In the end, I came to the realization that parameterized disaster intensities would only serve to clutter and confuse the set of options. Instead, I created two categories, Minor and Major Disasters, and hand-picked which disaster at which intensity should go in which category (or both). By having, for instance, a “Meteor Strike” and “Major Meteor Strike”, it’s significantly simpler to price them. Additionally, the streamer could independently disable them if, for instance, the major one is a bit too destructive for their tastes.
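The resulting structure is just a small hand-picked table rather than a free-form parameter. Something along these lines, where the intensity numbers are invented for illustration (the game has its own scale):

```python
# Hand-picked mapping from viewer-facing effect name to a concrete
# disaster request. Intensity values here are made up; the point is
# that each entry is a fixed, individually priceable effect.
EFFECTS = {
    "Meteor Strike":       ("meteor",   3),
    "Major Meteor Strike": ("meteor",  10),
    "Tornado":             ("tornado",  4),
    "Major Tornado":       ("tornado",  9),
    "Tsunami":             ("tsunami", 10),  # only the maximum is worth using
}

def resolve_effect(name):
    """Map an effect name to its (disaster type, intensity) pair."""
    return EFFECTS[name]
```

A boring dictionary, but a boring dictionary that the streamer, the viewers, and the pricing model can all reason about at a glance.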

As software engineers, we often overthink and overgeneralize problems. There is something called the KISS principle that we frequently have to remind ourselves about: “Keep It Simple, Stupid”. I could have generalized the disasters and made them more flexible, but it would have come at the cost of making everything more complicated for myself, the streamer and the viewer.

Social Media Nonsense

I seem to be editorializing the timeline slightly, because there was in fact one effect I began implementing before the disasters. It was the first one I started making, yet one of the last ones to be finished. To test whether I could trigger anything remotely, I used some of the readily available API functionality to post a message to Chirper, the in-game social network where your citizens discuss your city. I knew from the start that I wanted some effect involving Chirper, but I didn’t know what. The naïve approach would be to parameterize the effect so that the viewer could type in a message to be displayed in-game. However, having people be able to send any unmoderated and uncensored message into a game being livestreamed would probably be a horrifyingly bad idea.

After a lot of time of not knowing what I wanted to do for the Chirper-based effect, I eventually settled on the idea of allowing the viewer to get namedropped in a Chirper message. The presumption here is twofold: Twitch usernames are moderated, and people like to leave their marks on things. To make it slightly more interesting, I added a few possible posts in which the viewer could be mentioned, one chosen randomly each time, in the hopes that people would notice, be curious what else it could say, and trigger the effect more times. I eventually decided to take it even further and combine this with my interest in procedural generation. Nothing fancy, just a simple set of nested mad-libs style messages. There are a few basic templates, each containing a few slots where I’d randomly pick one of a few possible options, to generate a quasi-unique message mentioning the viewer.
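A minimal Python sketch of the nested mad-libs idea; the templates and vocabulary here are invented examples, not the ones from the actual pack:

```python
import random

# Invented example templates; the real pack has many more,
# each with its own set of slots.
TEMPLATES = [
    "Just saw {viewer} at the {place}. {reaction}",
    "{viewer} says the new {place} is {adjective}!",
]
SLOTS = {
    "place": ["park", "harbor", "monorail station"],
    "reaction": ["What a day!", "Only in this city."],
    "adjective": ["amazing", "questionable", "surprisingly moist"],
}

def generate_chirp(viewer, rng=random):
    """Pick a template, fill its slots randomly, and mention the viewer."""
    template = rng.choice(TEMPLATES)
    values = {slot: rng.choice(options) for slot, options in SLOTS.items()}
    return template.format(viewer=viewer, **values)
```

Even with just a handful of templates and slots, the combinatorics multiply fast, which is exactly what makes people curious enough to trigger the effect again.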

Aside from Crowd Control, Warp World also produces podcasts. One of them, called I Got One, involves coming up with silly business ideas and inventions. Inspired by this, and the fact that I had just implemented a system that can combine words at random, I also added a chance for the message to be about how the viewer had invented a new product, which is then randomly generated from a set of possible words. With almost 200 possible products, including “self-heating ice cream”, “canned cheese”, and “drone-delivered fire”, my hope was that the random silliness would inspire viewers to try this effect multiple times to see what other things they might have invented.

Fix = Hack + Time

City with partially invisible terrain and trees seemingly floating in the void. The text "Oops" is superimposed on top.
The game doesn’t always like when you tamper with it.

The Cities Skylines game engine occasionally comes off as fragile. I’m sure there is a reasonable explanation for everything somewhere, but deducing what exactly that explanation is by reverse-engineering its failures is difficult. Case in point, I had major difficulties getting the “Expand City” effect working reliably. At various points during gameplay, the player gets the option to purchase an additional, ahem, 1.92km² of land. Given the normal requirements to unlock an additional area, as well as the cost, giving the viewers the ability to trade their coins in exchange for immediately granting the player another area seemed sensible. Unfortunately, it was also the single most difficult effect for me to implement, simply because it would cause random errors. Sometimes, these errors wouldn’t even manifest right away, instead causing some problem a minute or two down the line.

A good developer, assuming they are given the appropriate amount of time and resources, typically investigates a bug in detail. They analyze the code to figure out what sequence of events led up to the error, and in the end either pinpoint a specific coding mistake or which set of features turned out to be incompatible with one another. I normally try to stick to this approach; figure out what exactly is going on so that I can make an informed decision about what I want to do about it. Unfortunately, it does occasionally happen that the deck is sufficiently stacked against me that I’m eventually just forced to give up. Some call I make will, at random, trigger some chain of events that, seconds or minutes down the line, causes a graphical glitch or some nondescript exception being thrown. Tracking down complex problems in this code, which I don’t have access to, have no documentation for, cannot easily debug, and which has already shown indications of being incredibly thread-unsafe, is nigh impossible without an enormous time investment.

As the pack’s release date approached, I had to make a choice. Seeing as I wouldn’t have enough time to reverse-engineer enough of the game engine to find the root cause of the bug, I either had to cut the effect altogether, or find a band-aid solution to the bugs. I eventually managed the latter: I found a roundabout way of triggering the effect that, according to large amounts of testing on my end, for whatever reason made the issues disappear. This was done by adding a several-second cooldown before the effect could be triggered again, as consecutive uses seemed to increase the chances of failure, as well as triggering it from another thread that seemed less prone to issues. I don’t know why, but this way of doing it made me unable to reproduce the bug again. There is certainly a correct way to do this, but lacking access to the original source and proper documentation, finding that correct way can be enormously difficult. I can only hope that the large amount of testing I did was representative enough of how the effect will work in practice.
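The cooldown half of that band-aid is easy to sketch. Here is a generic Python version of the guard (the real mod did this in C#, and the cooldown length is arbitrary):

```python
import time

class CooldownGate:
    """Refuse to run an action again until a cooldown has elapsed.

    Takes an injectable clock so the behavior can be tested without
    actually sleeping.
    """
    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock
        self._last = None

    def try_run(self, action):
        now = self.clock()
        if self._last is not None and now - self._last < self.cooldown:
            # Too soon: the caller should report failure so the
            # viewer's coins can be refunded.
            return False
        self._last = now
        action()
        return True
```

Declining the effect during the cooldown is important: a silently dropped trigger would eat a viewer’s coins, while an explicit failure lets the system refund them.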

Power Curve

Given the length of this article, I estimate that approximately three years have passed since I mentioned that making Crowd Control effects has everything to do with game design. As a more software oriented person, I sometimes find it difficult to remember the human side of the equation. One of the effects I made early on lasted for most of the project until, a few days before release, I suddenly realized it was completely absurd.

The effect was simple: “Destroy all power plants”. I initially made it to test whether I could selectively destroy specific types of buildings and left it in as a “fun” effect. Oh no, the power is gone. I need to rebuild my power grid. What a humorous turn of events. It took me a long time to understand the enormous, no pun intended, power gap between the player and the viewers in this situation. Unlike a meteor, which may destroy a small portion of a city and necessitate partial reconstruction, destroying all power plants causes an enormous and potentially long-lasting setback for the entirety of the city unless the player reacts immediately and is able to replace them. Combine this with the fact that most power plants are expensive and you get an extremely overpowered and overly destructive effect. Turns out that this probably isn’t fun.

In the end, the effect was significantly weakened; each use of the effect only destroys a single power plant. It still creates a problem for the city that needs to be solved, but it’s less disastrous on the whole. Furthermore, for the more helpful segment of the viewers, I also added the opposite effect: the ability to spawn a random power plant somewhere in the city.

In the end, the destructive ability still needs to be balanced, though the last step of the way is typically done at the pricing level. I simply try to make sure that the individual effects are sufficiently granular so as to be more easily managed.

Legally Distinct Snap

Cities Skylines is not just a city building game, it’s also a city simulator. Never before have I been reminded of this fact as much as when I was working on this effect.

The “Infinity Snap” effect was initially called something else, but I was advised to change it for legal reasons. Regardless of its name, its effect should be recognizable to most contemporary pop culture consumers. Despite the simplicity of its description, “Kill half the population”, what exactly that meant went through three distinct stages.

My first implementation was simple. Pick half the population at random, then change their status to dead. It worked perfectly, so why didn’t I keep this version of the effect? Because Cities Skylines is also a simulation. Dead bodies don’t just disappear; managing the dead is a fundamental aspect of the game. You need graveyards and/or crematoriums, as well as having enough of the city budget allocated to hearses. Normally, this works fine even for large cities since only a small number of people would die at any given time. Now, imagine a city of a million people where half of them would suddenly die all at once. As I was playtesting this, it took years upon years of in-game time for all the bodies to be properly buried, and that’s even after a massive investment in additional graveyards and an increased hearse budget.

Oh right, the budget. Did I mention the secondary effect? Taxes. There are half as many people in the city paying income tax, and half as many people working in the factories and shops, and half as many people buying and consuming goods. Most of the commercial and industrial sector collapsed, creating an even more unfathomably large dent in the city budget. Normally, this would balance out quickly by a sudden influx of people moving in, but not in this case. No, guess what? Nobody wants to move into an apartment or house that’s still occupied by a dead body. Getting rid of them became a gargantuan bottleneck for the economic recovery of the city. Realistic? Absolutely. Playable? Not in the slightest. A single snap would effectively and quasi-permanently destroy the whole city. This is unusably powerful.

My second approach did the opposite. Instead of killing the citizens, they are merely instantly removed from the game. Same effect, but without the city being full of dead bodies. After all, in the piece of pop culture that this is referencing, no bodies were left behind anyway; they merely fade out of existence. This worked, but was extremely unsatisfying from a gameplay point of view. Some viewer traded their coins for an effect, and all they got was the citizen count and taxes dropping briefly before the almost immediate influx of new citizens moving into the vacant homes quickly brought things back to roughly the same state as before. Nobody would want to pay for that.

The third approach had the same outcome, but did it in a flashier and more visual way. Step one is to kill half the population, in the normal sense. This is also done in a staggered way; they don’t all die simultaneously, but rather spread out over a short period of time so you can see dead bodies popping up here and there in the city. After a brief wait, where the city is covered in skulls indicating dead bodies, we do the same process again, but instead remove the bodies altogether, making the skull icons fade out of existence, one by one. I found this version of the effect more visually appealing; you could feel that something important was happening to the city. However, since the bodies are removed, the city will still more or less recover before too long as new citizens move in, preventing a complete economic ruin of the city.
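The staggered two-phase structure can be sketched as a pure scheduling function. The timings below are invented for illustration, and the real mod naturally works against the game’s citizen manager rather than plain IDs:

```python
import random

def plan_snap(citizen_ids, kill_window=10.0, fade_delay=15.0,
              fade_window=10.0, rng=random):
    """Schedule a two-phase 'snap' over half the population.

    Phase one staggers the deaths across kill_window seconds; phase two
    staggers the removal of the bodies, starting after fade_delay.
    Returns (time_offset, action, citizen_id) triples, sorted by time.
    """
    victims = rng.sample(citizen_ids, len(citizen_ids) // 2)
    events = []
    for cid in victims:
        events.append((rng.uniform(0, kill_window), "kill", cid))
    for cid in victims:
        events.append((fade_delay + rng.uniform(0, fade_window), "remove", cid))
    return sorted(events)
```

Separating the plan from its execution also made the effect easier to test: the schedule can be inspected without touching the simulation at all.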

A graph showing the education levels climbing from zero to a near-perfect score over the course of thirty years, only to then plummet back to just above zero.
I accidentally the smarts.

The fascinating thing about simulation games is that they often have complex emergent properties that you never consider until you one day lean back and say “Well, that actually makes perfect sense”. It turns out that, aside from a brief economic hiccup following the disappearance of half the population, things don’t go back to exactly the same as they were. The lost individuals actually have things that the new ones don’t. One such aspect is education. Apparently, the game assumes that your city is the only place in the entire world with schools, meaning that whenever I use the Infinity Snap, half of my (presumably well-educated) citizens will disappear and people lacking any education whatsoever will move in to fill in the void.

Pictured above is the result of me using the “Infinity Snap” a few times in a row, waiting slightly between each use. Each time, the total amount of collective education was effectively halved. This turned out to have an enormous secondary effect since my city was well-developed and had many jobs that required higher education. As such, a lot of businesses were forced to close down, impacting the economy. This absolutely makes perfect sense, yet came as a total shock to me as I had never considered these potential ramifications.

I feel like I have inadvertently created the most sophisticated simulation ever of what would happen in a real-world snap scenario.

Oh Right, Crowd Control

It may seem weird that throughout this article about making a Crowd Control pack, I haven’t really talked much about actually integrating with Crowd Control. So, why is that? Well, to be blunt, because it doesn’t really matter.

Yes, the effect pack needs to talk to Crowd Control in order to trigger the effect, but honestly, that’s a technicality. The important part of an effect pack isn’t how you talk to Crowd Control; the important part is what effects you have, how they work, how interesting and balanced they are, and how stable the system is. After that, the integration is merely a formality, and not a particularly interesting one at that. And there is absolutely nothing wrong with that; dull integration between two systems is just a thing that happens a lot in software development. It also doesn’t in any way imply that either of the two systems themselves are uninteresting.

Although, to be honest, that’s slightly misleading. I’m glossing over some of my own experiences in this project in an attempt to prove the general point I’m trying to make. Integration was reasonably, though not perfectly, smooth for me and there are a few Crowd Control-specific notions to keep in mind throughout the entirety of the pack development.

First of all, a few things in the API weren’t completely clear, and the lack of documentation forced me to occasionally ask a few questions to one of the Crowd Control developers. I have since been told that this documentation does in fact exist and that I “just didn’t read it”, so I must have overlooked it in my eagerness to get started.

Second of all, the use of bleeding edge functionality. It was recommended that I use the integration style where the Crowd Control software acts as a TCP server and two-way communication happens through NUL-separated JSON messages. Using this built-in functionality (instead of having to also write the server portion of the effect pack integration myself) seemed like a perfect choice. And it was, aside from me doing it at the exact wrong time. The feature was still under active development, resulting in a time or two where I had to wait a few days for the next version of the SDK to be released that added support for a feature I wanted to use. Still, not a big issue, and one that wouldn’t happen again if I were to make a second pack some day.
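The framing itself is straightforward. This is roughly what NUL-separated JSON message handling looks like in general (Python for illustration; the actual message schema is Crowd Control’s own):

```python
import json

def encode_message(obj):
    """Serialize one message, NUL-terminated, ready for the socket."""
    return json.dumps(obj).encode("utf-8") + b"\x00"

class NulJsonDecoder:
    """Accumulate raw socket bytes and yield complete JSON messages.

    TCP gives you a byte stream, not messages, so a read may contain
    half a message or several; the NUL byte marks each boundary.
    """
    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        self._buffer += data
        # Everything before the last NUL is complete; the rest waits.
        *frames, self._buffer = self._buffer.split(b"\x00")
        return [json.loads(frame) for frame in frames if frame]
```

The decoder’s buffering matters: without it, a message split across two TCP reads would fail to parse and an effect request could be silently lost.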

However, the single most important thing to be aware of is that Crowd Control has a philosophy behind it that the effect packs need to adhere to. Seeing as an effect is triggered in exchange for coins, which in turn are (indirectly) obtained using real-world money, the pack needs to be very careful with its effects. The Crowd Control server requires the effect pack to provide feedback regarding the outcome of the effect. As such, proper error handling is essential; if an effect cannot be run, or it tries to run and fails in any way, the server absolutely needs to know so it can refund the coins spent. (Just gobbling up the coins whenever an error occurs would be a terrible user experience as people would feel cheated.) Likewise, if an event is successful, the server must be told in order to finalize the transaction. Let’s just say that the finally keyword is invaluable for this sort of thing.
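That guarantee is simple to sketch in Python; the function and callback names here are hypothetical, not the SDK’s actual API:

```python
def run_effect(request, apply_effect, report):
    """Run an effect and guarantee the server hears the outcome.

    `report` is called exactly once no matter what happens, so the
    viewer's coins are always either finalized or refunded, never
    silently swallowed.
    """
    outcome = "failure"  # pessimistic default: refund unless we succeed
    try:
        apply_effect(request)
        outcome = "success"
    except Exception:
        pass  # a real pack would log the error here
    finally:
        report(request, outcome)
    return outcome
```

The pessimistic default is the key design choice: any path that doesn’t explicitly reach success, including ones you never anticipated, resolves to a refund.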

If a pack isn’t written from the get-go with the knowledge that feedback to the server must be guaranteed, it can, depending on how the pack is written, be problematic to retrofit it with this functionality after the fact.

Lastly, the publishing of the effect pack. Warp World were made aware of my work on the pack from early on in the project, so the whole thing sort of just flowed naturally. Towards the end, I provided a few betas and release candidates for testing purposes. Once we felt sufficiently reassured that it was working properly, they simply took it from there, set up the appropriate backend stuff and just released it. From my point of view, it was effectively seamless.

Conclusion

So yeah. I made a Crowd Control effect pack. I learned how to integrate with Crowd Control. I learned a bunch of stuff about how Cities Skylines works. I also discovered some seemingly bizarre design decisions in Cities Skylines’ engine that I later realized were actually clever optimizations, which has already inspired the way I write performance-critical game logic. (Let’s just say that there are major fundamental philosophical differences in how you write performant game logic versus the “behemoth-sized enterprise Java”-type systems I’m more accustomed to.)

All in all, it was a fun and interesting journey. Now let’s go and destroy someone’s hard work with giant meteor strikes.


3D Camera Experiments

I made a lot of programming experiments in my free time while in elementary school, high school and at the university. The ones that were of a visual nature were typically in 2D. Those of a 3D nature were either primitive sprite-based pseudo-3D, vector-style 3D, or various types of raytracers. All of these used home-made rendering engines, either due to the limitations of the programming languages used or the immense learning curve of the graphics libraries I could find at the time. “Proper” 3D graphics became a pretty big knowledge gap; while I could throw together simple rendering engines from scratch, making something that’s actually fast, useful, and, um, not terrible was still beyond me.

A basic island built in Unity serving as the sandbox for my camera experiments. It additionally serves as a testing ground for me learning to create shaders.

In the last year or so, I have made a few experiments in Unity in an attempt to not only gain some experience with a proper game engine (instead of building something from scratch as usual), but also start to get a feel for how to use a “real” 3D system. While the intricate specifics of how rendering pipelines work are still mysteries for the future, I at least have a platform where I can generate and manipulate geometry and have a state-of-the-art renderer display it for me without having to worry too much about the details.

For some reason, “How do you program a camera for a third-person platformer?” was a question that I was curious to see if I could figure out an answer to. I often find that starting with nothing and figuring out things by myself as I work is more educational than just googling what the current state-of-the-art solution is. That solution may be better than anything I can come up with, but struggling with the problem myself gives me a deeper understanding of the fundamental problems that such a solution solves in the first place.

Unless there’s some third option I can’t think of, third-person 3D cameras can be roughly grouped into two categories that I call scripted cameras and autonomous cameras. The former has the level designer provide metadata about how the camera behaves in any given area. The latter tries to figure out on its own how it should behave by analyzing the terrain around the player, possibly with minor hints from the level designer to behave in specific ways at certain points of interest. So far, my camera experiments have been restricted to the scripted cameras category.

To allow me to experiment, I constructed a small world – a vaguely spiral-shaped mountainous island – and implemented extremely simple player movement and physics to let me walk around in it. My goal was to make a camera where a level designer can relatively easily configure how the camera moves and where it should focus, and define areas where the camera behaves differently (such as around points of interest).

A diagram of a simplistic system where the camera position and angle is calculated based on the player location and the camera path.

My very first attempt involved using a 3D Bézier curve to create a virtual path along which the camera can move. Each frame, the camera’s position is simply set to whatever point along that curve is closest to the player, and the camera is angled to point towards the player.
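Sketched in Python for clarity, this is roughly what that first attempt boils down to (a brute-force sampling approximation with made-up names, not my actual Unity code; a real implementation might solve the closest-point problem more cleverly):

```python
def bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(u**3 * a + 3*u**2*t * b + 3*u*t**2 * c + t**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))

def dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def camera_for(player, control_points, samples=64):
    """Approximate the closest point on the curve by dense sampling,
    then aim the camera from that point toward the player."""
    p0, p1, p2, p3 = control_points
    points = [bezier(p0, p1, p2, p3, i / samples) for i in range(samples + 1)]
    position = min(points, key=lambda p: dist_sq(p, player))
    look_dir = tuple(pl - po for pl, po in zip(player, position))
    return position, look_dir
```

With multiple paths, the same idea applies: sample every path and keep the overall closest point.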

By allowing for multiple separate paths (whatever point on any of the paths is closest to the player becomes the camera location), it’s possible for different areas to have independent camera setups without having to tie all possible camera locations into a single seamless path.

This is a very simple system, but it works well enough for basic purposes. There are some drawbacks to it, however.

The first drawback I noted was that it was somewhat awkward to define the path. Bézier curves are simple in 2D, but become more cumbersome in 3D due to the lack of depth perception. It’s considerably harder to estimate where exactly in 3D space a given point or portion of the line is without moving objects close to it to provide some frame of reference or looking at it from all angles to figure out how it lines up with everything else. Defining a path in 3D space suspended in mid-air felt very much like a trial-and-error process.

The second drawback is the strict correlation between camera position and what part of the path is being used. Consider a situation where a player stands in front of a tall tower. As the player moves closer, we may want to have the camera move to a separate path much farther away to show the scale of the building compared to the player. This becomes problematic as the selection of potential camera positions is directly tied to the camera distance.

Interestingly, both these issues can be solved by adding a level of indirection; we detach the camera from the path itself.

A diagram of how the camera position and angle is calculated based on the declared camera path and pivot, as well as the current player position.

We adjust our system by reinterpreting the paths. They’re no longer the set of camera positions; they are the expected player paths, the approximations of where the player will be. The closest point on the closest path relative to the player is then interpreted as an idealized player position. The camera is positioned relative to this idealized position, but still angled to point towards the actual player position.

Because the paths now describe where we expect the player to go, they always lie at ground level. This makes them considerably easier to define, since the depth perception problem is much less of an issue.

The camera position relative to the expected path is controlled by three parameters: a pivot point, a distance and a height. The position is calculated as the point that’s distance units away from the idealized point in the direction going away from the pivot. This calculation is done entirely in the XZ plane, keeping the camera at the same Y coordinate as the idealized point. This is then adjusted by moving the camera upwards by height units.
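The placement calculation can be written out as a minimal Python sketch (a direct transcription of the description above; the names are illustrative):

```python
import math

def place_camera(idealized, pivot, distance, height):
    """Position the camera `distance` units away from the idealized player
    position, in the XZ direction pointing away from the pivot, then raise
    it by `height` units."""
    x, y, z = idealized
    px, _, pz = pivot
    dx, dz = x - px, z - pz            # away-from-pivot direction in the XZ plane
    length = math.hypot(dx, dz)
    dx, dz = dx / length, dz / length  # normalize
    return (x + dx * distance, y + height, z + dz * distance)
```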

Example of how the pivot influences the camera movement. The two sets of camera positions (blue and red lines) result from the same expected path (black line) and distance, but with different pivot points (blue and red dots). The thinner lines show the approximate camera angle when the player is near the expected path.

The pivot acts as a secondary focus of sorts for the scene, the player being the primary one. A pivot close to a curved path causes exaggerated camera movements to highlight whatever thing is located at the pivot point, while a remote pivot has a more neutral feel to the camera movement.

Most of the paths in my sandbox world set their pivot to a point in the center of the island, moving the camera out towards the ocean and primarily pointing in towards the center. This creates an almost side-scroller feel to the camera; the player primarily moves to the right with occasional movements towards and away from the camera.

For this version of the experiment, I also added some momentum to smooth out the camera movement. This is most notable when swapping paths, giving the player a chance to react to the change since the movement is relative to the current camera angle.
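The momentum itself can be as simple as exponentially smoothing the camera toward its target each frame. This is a sketch of one common approach, not necessarily exactly what I used:

```python
import math

def smooth(current, target, rate, dt):
    """Move a frame-rate-independent fraction of the way toward the target.
    Higher `rate` means snappier movement; `dt` is the frame time."""
    t = 1.0 - math.exp(-rate * dt)
    return tuple(c + (g - c) * t for c, g in zip(current, target))
```

The exponential form makes the smoothing behave the same regardless of frame rate, unlike the naive `current += (target - current) * 0.1` per frame.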

A portion of the island with three different expected paths (green lines). Their pivots are far to the left, placing the camera on the right of each path. The outermost path has a longer distance and lower height for a mostly horizontal shot. The innermost path has a short distance and a higher height, creating an overhead perspective to prevent the player from being obscured by the terrain. The middle path is similar to the innermost.

This system of defining expected paths and assigning pivots, distances and heights works surprisingly well despite being relatively simple. In this form, it does however have a few annoying limitations. If a portion of an otherwise simple path requires special settings (such as a higher camera height), it would require breaking the path into three pieces: the before and after pieces with the same settings, and the extra piece between them with the different settings. This is awkward, so we add an extra layer of indirection: the ability to provide hints, i.e. to define areas where the standard camera settings are overridden.

The inner path is a short path set to a short distance and a higher height, allowing the camera to peek above the rock formation. The outer path has a box region, a hint, that overrides the normal distance and height settings of the path, making the camera move slightly upwards to avoid the outer structure.

Hints can be created using any collider to define its region and setting a new distance and height. Additionally, hints may also choose to override the pivot of the path, changing where the camera focuses. One example where this is used on this island is a dead end where the camera moves to focus on the end of it instead of trying to point the camera towards the center of the island like most other parts.
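To make the override behavior concrete, hint resolution could be sketched like this (illustrative Python with a plain axis-aligned box test standing in for Unity colliders; all names are made up):

```python
def in_box(pos, lo, hi):
    """Axis-aligned box containment test, standing in for a collider check."""
    return all(l <= p <= h for p, l, h in zip(pos, lo, hi))

def resolve_settings(defaults, hints, player):
    """Start from the path's default settings; any hint region containing
    the player overrides the settings it defines (None = keep the default)."""
    settings = dict(defaults)
    for (lo, hi), overrides in hints:
        if in_box(player, lo, hi):
            settings.update({k: v for k, v in overrides.items() if v is not None})
    return settings
```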

The path splitting in two, creating a dead end. A box defines the area where the camera should use a new pivot (the arrows at the right side of the box). This makes the camera focus on whatever might be at the end of the path (be it a building, an NPC or what have you).

In addition, the system also has the ability to restrict certain paths to only be considered while the player is within some collider region, allowing finer-grained control over how the camera moves. This was necessary in one spot due to the spiral nature of the map; the major path on the second level was sometimes slightly farther away from a second-level player than a minor path on the first level, making the camera focus on the wrong thing. This was solved by limiting the lower path to only be up for consideration while the player is within a cylindrical area around it, one that does not reach all the way up to the next tier of the spiral.

So yeah, that’s basically it for now. A simplistic third-person camera system that still works reasonably well and that is relatively simple to set up to match the level geometry. It’s not a masterpiece, but it’s a reasonable first attempt. Or, well, second attempt.

Unity is pretty cool. I hope to learn more about it.


So You Want to Make a Font for the SNES

It’s the early ’90s. You’re a developer at Nintendo implementing a font in The Legend of Zelda: A Link to the Past. The game is in Japanese with occasional English text, so your font needs to contain hiragana, katakana and as many kanji characters as you can fit, in addition to the Latin alphabet, punctuation, numbers and whatever game-specific symbols and icons you need. You come to the conclusion that a font of 512 characters might be a decent balance; 256 kanji characters, 160 kana characters and 96 characters for the Latin letters and everything else.

The full font from the Japanese version of A Link to the Past.

When the player talks to a character, you want to display a text box that uses your font. The game is written for the Super Nintendo (SNES), so unless you want to use special high-res graphics modes that impose additional restrictions, you have to make do with the default resolution of 256×224 pixels. To make things worse, you’re still stuck in the ’90s with CRT TVs that tend to crop parts of the image. For this reason, corporate has designated a 224×192 pixel “safe” area that all TVs ought to be able to display. The game should never display important information – like, you know, dialogue – outside that area. On top of that, you also need enough space for a border, stylish margins, and the game’s HUD (which the text boxes cannot overlap for technical reasons). Oh, and you also want to keep the player and the character they’re talking to visible on-screen, so you have even less screen real estate to work with. All things considered, you have been given 160×48 pixels to display text in. With such limitations, how do you fit as much text as possible while still keeping it legible?

An example of a text box and its space constraints.

As if you didn’t have enough limitations to worry about, the SNES handles its graphics using 8×8 pixel blocks called tiles. So, if you want to use the console’s native capabilities to display graphics, they need to be in multiples of 8×8 pixels. What character sizes would that permit, and how many characters would you then be able to fit?

Let’s start with 8×8. This would allow for 20×6=120 characters on screen at the same time. That’s a nice chunk of text. Now, can you actually make the characters 8×8 pixels large? Well, for the Latin letters, numbers and punctuation, yes. The kana should probably also fit within that space. The problem is that you want to have kanji characters as well. These characters are considerably more intricate and complex, so there’s absolutely no way to fit most of them into an 8×8 tile without them becoming an illegible mess of pixels. This wouldn’t really work.

Let’s go one size up; what about 8×16? Again you run into the same problem: the kanji are just too complex to fit within a width of 8 pixels. 16×8? Still no, and it would look really weird anyway. This brings you to 16×16. That would give you 10×3=30 characters. I mean, it might do? Japanese requires way fewer characters on average than English anyway. Still, is there a way you can do better?

This is where the cost-benefit analysis comes into the picture. Do you want to use 16×16 pixel characters – which the SNES can natively handle – or do you want to do the rendering manually in exchange for the freedom to use any character sizes you want? Perhaps the extra hassle is worth it, just to fit a little bit more text on screen? You eventually decide that yes, it is worth it.

This is still the SNES, so you are still locked into the 8×8 grid. The difference is that you’re not assembling predefined 8×8 tiles into an image, you’re generating new 8×8 tiles on-the-fly and assembling those into an image. You do this by allocating a buffer in Video RAM (VRAM), enough space for 20×6 tiles or 160×48 pixels, the amount of space you were given on-screen. You can then manually draw whatever you want into this area and display the resulting tiles on screen. This is less efficient since you’re not only using even more space in the SNES’s horrifyingly small VRAM, but you also have to spend precious time on manually drawing your characters into this space. This is actually more tricky than it sounds due to the way the SNES tiles are stored. Even calculating which bits in which bytes to change to affect a given pixel would be really slow.

A simplified view of how bit shifts and OR can be used to render characters despite being misaligned with the 8×8 grid.

If you internally store each character in the font as a 16×16 tile, regardless of the actual size of the character, you can optimize the drawing process a lot by writing code capable of taking a portion of the character, offsetting it by a few pixels and pasting it into the buffer. This can be done somewhat efficiently through some clever use of bit-shifting and ORing the bytes that make up the tile, essentially taking advantage of the tile storage format rather than being encumbered by it. Copying-and-offsetting the characters like this is still several times slower than simply copying them, but packing the characters closer together is the whole point: it frees up more space on-screen for even more characters.
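As a simplified illustration of the shift-and-OR idea, here is a single-bitplane version in Python, treating each row of pixels as one big integer (the real thing operates on the interleaved SNES tile format, which this deliberately ignores):

```python
def blit_row(buffer_row, glyph_row, x_offset, row_width=160):
    """OR a 16-pixel-wide glyph row into a wider row buffer at a pixel offset.
    Rows are integers whose bits are pixels, MSB = leftmost pixel."""
    shift = row_width - 16 - x_offset  # align the glyph's 16 bits at x_offset
    return buffer_row | (glyph_row << shift)
```

Because the operation is just a shift and an OR, characters can overlap the 8×8 tile grid freely; the hardware never needs to know the glyph wasn’t tile-aligned.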

In the end, you find that the arbitrary-looking character size of 11×14 pixels is suitable; the characters are just barely wide enough for the kanji to be legible. Furthermore, the number of characters on-screen has now increased to 14×3=42, a 40% increase compared to when you had 16×16 pixel characters. Unfortunately, the characters being 14 pixels tall isn’t enough to fit any more lines in, so you resort to displaying them as if they were 11×16 to prevent having to deal with the slow offset management in both the X and Y directions.

Okay, so let’s stop for a second and recap where you are. You have a font with 11×14 pixel characters and a system capable of drawing 16×16 pixel characters as if they were 11×14 pixel characters. Now you just need a way to actually store the font in the game’s ROM and some way to load it.

The simplest way to store the font is to just save the padded 16×16 pixel characters as-is, but oh no, you forgot that you’re still stuck in the ’90s and cartridge space is at a premium. Sure, that extra padding only costs an additional 204 bits per character, but that quickly balloons to almost 13 KB of unused space for the full 512-character font. That’s an unacceptable waste; you desperately need to trim it down.

Problem: the SNES is a 16-bit console (except when it’s an 8-bit console; long story), meaning it deals with values 16 bits at a time. If you were to store your 11×14 font in the most obvious way – 22 consecutive bits per line times 14 lines – you would have to jump through various hoops to compensate for each line going in and out of phase with your 16-bit window. It would make your life so much easier if you could ensure that you always dealt with your font in 16-bit chunks, i.e. contiguous 8-pixel segments.

Each 11×14 character is subdivided into 20 8-pixel strips and stored as an 8×20 character. This makes each line 16 bits – the native word length of the SNES – at the cost of 6 redundant pixels (12 bits).

That’s when inspiration strikes: “I shall make it more complicated!” An 11×14 character requires 308 bits to store. That’s 19 groups of 16 bits, plus an extra four bits. If you allow yourself to waste just a tiny bit of space, 12 bits per character, that’s 20 groups of 16 bits. Now, if you can find a way to logically subdivide an 11×14 pixel character into 8-pixel (16-bit) strips, only wasting at most 6 pixels (12 bits), you’re golden. Turns out you can deal with the 8 left-most pixels as 14 individual strips and then cover the rest of the pixels with six vertically rotated strips. Good enough!
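A quick back-of-the-envelope check of that arithmetic (my own numbers, assuming 2 bits per pixel throughout):

```python
width, height = 11, 14
bits_per_pixel = 2
raw_bits = width * height * bits_per_pixel             # 154 pixels = 308 bits

horizontal_strips = height                             # 8 leftmost columns: one strip per line
right_columns = width - 8                              # 3 columns of 14 pixels remain
vertical_strips = right_columns * ((height + 7) // 8)  # 2 strips of 8 per column = 6
total_strips = horizontal_strips + vertical_strips     # 20 strips in total

stored_bits = total_strips * 8 * bits_per_pixel        # each strip: 8 pixels = 16 bits
wasted_bits = stored_bits - raw_bits                   # padding: 12 bits (6 pixels)
```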

Once that’s in place, the characters can now be stored as-is, but it just feels like something is missing. Ah yes, your old friend compression. A lot of compression algorithms used in games at this time focus on encoding stuff like “now there’s a long stretch of zeroes (empty space)”, “now we repeat the same byte/word a few times”, or “this part is exactly like that other part we did a few moments ago”. How would this approach work with your font? Let’s start by taking a look at the numbers and the Latin alphabet.

The alphanumerical portion of the font. Note how a lot of the letters are drawn to be made up of similar portions.

So let’s see, you have the zero which is basically the same row of pixels repeated over and over, save for the top and the bottom. You could probably save a little bit of space there. Oh, and the letter O is identical to the zero so no need to store that. The 1 and the I are identical, save for the very top, so that’s a few more bytes gained. E and F are identical until the very bottom; same goes for O and Q. 0, A, D, H, and U all have identical longer stretches of two vertical white lines, so that can probably be shared. R is P until halfway through, at which point it becomes an H. Yeah, there’s a lot of repetition in this font. Surely, you can use some fancy compression here to save space.

Now repeat the exercise for the kanji characters.

Oh wait, how silly of you. For a moment there you forgot you were a Japanese programmer who doesn’t primarily focus on the Latin alphabet, but rather the Japanese kana and kanji. Okay, these are a bit of a problem. Compression works by taking advantage of repeating patterns and/or data being predictable based on the data that came before. While that might work with the relatively simple Latin characters, that’s less of the case with Japanese ones. Sure, there are a few kana that look similar enough to save a few bytes here and there, but when you look at the kanji, all bets are off. Kanji characters are intricate. From a computer’s perspective, that means they’re chaotic and unpredictable; even more so at lower resolutions. So, seeing as the kanji, half of your font, probably won’t compress well at all with any compression system simple enough to implement into a ’90s Nintendo game, you may be forced to abandon compression altogether.

Hold on a second. You listed three types of compression tricks often used at this point in time. The same bytes/words repeated several times in a row, repetition of past data, and… What was the third one? Ah, yes, encoding large portions of empty space. You look at your font data. It’s currently 20KB, but it’s peppered with tons of 0x00 bytes. In fact, almost 6KB – close to a third of the total data – consists of 0x00 bytes! Can you find some simplistic way to prevent having to store all those zeroes? Unfortunately, they’re spread out all over the font, so the RLE approach of “Now there are 17 0x00s, now there are these non-0x00 bytes: AA, BB, CC…” – which would work best for long stretches of 0x00s – might not be as efficient as you’d hope.

Since there are going to be so many 0x00s, how can you get rid of them while somehow still storing enough information to know where to paste them back in? What you could do is add a separate bitmask with a bit for each byte in the original data. If you store a 0 there, it means “this byte is 0x00” while a 1 means “this byte is something else”. The 0x00s can then safely be thrown away altogether, only retaining the non-0x00s. This effectively shrinks 0x00 bytes down to a single bit, at the expense of effectively increasing non-0x00 bytes from 8 to 9 bits. Still, if enough bytes get removed this way, it’ll be worth it. Now, how can you maximize the number of 0x00 bytes?
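A sketch of that scheme in Python, showing the general shape of the idea rather than the game’s exact on-cartridge layout:

```python
def compress(data):
    """Split data into (bitmask, payload): one mask bit per input byte,
    1 = byte kept in the payload, 0 = byte was 0x00 and dropped."""
    mask = bytearray((len(data) + 7) // 8)
    payload = bytearray()
    for i, b in enumerate(data):
        if b:
            mask[i // 8] |= 0x80 >> (i % 8)  # mark this byte as non-zero
            payload.append(b)
    return bytes(mask), bytes(payload)

def decompress(mask, payload, length):
    """Rebuild the original data by re-inserting 0x00 wherever the mask says so."""
    out = bytearray(length)
    it = iter(payload)
    for i in range(length):
        if mask[i // 8] & (0x80 >> (i % 8)):
            out[i] = next(it)
    return bytes(out)
```

For example, eight bytes of which five are 0x00 shrink to one mask byte plus three payload bytes; the scheme wins whenever more than one byte in eight is 0x00.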

A visual description of how a four-color 8×8 pixel tile is stored on the SNES. The two bitplanes of the tile are separated and then interleaved byte-wise. Note how the blue part only shows up on odd lines, the red part only shows up on even lines and the white part shows up on both even and odd lines.

This is where a quirk of the SNES graphics format comes into effect. Each pixel in the font can be one of four colors, meaning you need two bits per pixel. How would any sane person store this? Well, you store the first bit and then the second bit of the first pixel, then the first and second bits of the second pixel, and so on? No, that’s not what the SNES does. Instead, it stores all the first bits for the first set of 8 pixels, then all the second bits for the first set of 8 pixels, then all the first bits for the second set of 8 pixels, and so on. In other words, it splits the two-bits-per-pixel image into two separate one-bit-per-pixel images and interleaves the bytes together. As such, each line of pixels is split into the “low bits” byte and the “high bits” byte.
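A line of pixels can be encoded into its two bitplane bytes like so (an illustrative Python version of the format just described):

```python
def encode_2bpp_line(pixels):
    """pixels: 8 values in 0..3, left to right (leftmost pixel = MSB).
    Returns (low_byte, high_byte): the two bitplane bytes for this line."""
    low = high = 0
    for p in pixels:
        low = (low << 1) | (p & 1)           # bitplane 0: bit 0 of each pixel
        high = (high << 1) | ((p >> 1) & 1)  # bitplane 1: bit 1 of each pixel
    return low, high
```

Note that a line containing only pixel values 0 and 1 yields a high byte of 0x00, and a line of only 0 and 2 yields a low byte of 0x00; these are exactly the bytes the bitmask scheme gets to throw away.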

This split of each bit pair into separate bitplanes turns out to be really useful to you since it happens to massively improve your compression. You see, the four colors of your font are stored as such:

  • 00 – Background
  • 01 – Blue
  • 10 – Red
  • 11 – White

This specific color setup has a very interesting property: If a line contains nothing but the background color, both the line’s bytes will become 0x00 and can be omitted. If a line contains only the background and either blue or red, at least one of the bytes will become 0x00 and that one can be omitted. If a line contains white or both blue and red, both bytes will be non-0x00 and thus neither can be omitted. Now, if you look at the font, you might observe that there are a lot of empty lines, a lot of lines with background-and-blue, and some with background-and-red. These lines are the ones that shrink in size and thus save space.

In your font, almost all lines with white in them also use at least two other colors, usually background and blue. Three colors is impossible to store in a single bit, so that’s why you store white as 11, the least efficient value; you wouldn’t be able to compress those lines well anyway.

Graphical explanation of how the font encoding and compression works. In this example, the original 11×14 character is 308 bits, the encoded 8×20 character is 320 bits, and the compressed character is 144 bits plus the 40 bit bitmask for a total of 184 bits, 40% smaller than the original character. Note how the portion of the 8×20 image that only contains background-and-blue lines results in every other byte being omitted.

All in all, your font of 512 characters – each with 11×14 pixels – would have required 19.25KB if stored naively. With all these techniques you’ve just invented, you’ve managed to reduce it to… err… 16.57KB… which is an, um, mere 14% size reduction saving a grand total of 2.7KB. Okay, I mean, let’s put this into perspective. This is a large game squeezed into 1024KB, so the space you’ve saved corresponds to 0.25% of the total game size. Besides, even if that doesn’t sound too impressive on its own, keep in mind that if you try to squeeze a little bit out of every part of the game, all the little bits and pieces eventually add up. In the end, it’ll hopefully leave you with just enough space for that one extra feature you’d love to squeeze in before you wrap up the game.
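For the skeptical, the quoted figures check out (the 16.57KB compressed size depends on the actual font data, so it’s taken as a given rather than derived):

```python
chars = 512
naive_bits = 11 * 14 * 2 * chars                  # 2 bits per pixel, stored tightly
naive_kb = naive_bits / 8 / 1024                  # 19.25 KB
padded_bits = 16 * 16 * 2 * chars                 # the lazy 16x16 storage
padded_kb = padded_bits / 8 / 1024                # 32 KB

compressed_kb = 16.57                             # the figure quoted above
savings_vs_naive = 1 - compressed_kb / naive_kb   # ~14%
savings_vs_padded = 1 - compressed_kb / padded_kb # ~48%
```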

Also, the 14% size reduction is compared to the most simplistic storage of the 11×14 characters. If you had instead done the extra-lazy storage of 16×16 characters, that would have instead been 32KB, so maybe you could argue that you’ve saved 48% of the space?

Okay, even if the space savings aren’t that great, the system you’ve just made is pretty darn cool and impressive from a technical standpoint. All things considered, you’re really proud of what you’ve accomplished here. Hm, what’s that? “The Americans are going to rewrite your entire font system from scratch when they localize the game?” Wait, what? “Something that’s way, way simpler?”

Oh.


Procedural Overworld Generation

I sometimes have conversations with others about various game ideas. A lot of these conversations involve procedural generation, a topic I find absolutely fascinating. Procedural generation in its simplest sense refers to when you have a program generating some form of content instead of having it manually designed by a human. A current mainstream example would be Minecraft where the game automatically generates a seemingly infinite unique world for each player.

The creation of worlds or levels is a common use of procedural generation, though many other things can be generated as well. This ranges from more tangible things like graphics or music to the more abstract. An example of the latter would be the AI Director from Left 4 Dead, a system that constantly changes how and when hordes of zombies will attack each time you play a level so as to increase suspense and uncertainty.

Given my personal interest in the topic, I’ve been in numerous discussions about how to generate various different aspects of various different sorts of games, some of which I hope to discuss on this blog in the future.

Today’s showcase is a proof-of-concept procedural generator for overworld maps in the style of games like Super Mario Bros. 3 (SMB3) or Shovel Knight, i.e. level icons connected by a non-linear mesh of orthogonal roads. In this prototype, I use graphics borrowed from SMB3 as a placeholder. Despite this appearance, the generator is not intended to generate actual SMB3 maps (though integrating something similar into an SMB3 randomizer would be cool), but rather maps for an unannounced game idea still in an extremely early “throwing ideas around” phase.

I intend to go into more detail about the actual algorithms behind this generator in a future post, but for now, I’ll show some screenshots and describe some of the things it can do.

A screenshot from a prototype map generation UI. Various settings can be tweaked on the fly to see how they impact the map generation.

As is commonplace in procedural generation, the input to the generator consists of a seed – from which pseudo-randomness is derived – and a set of settings that constrain the generated map in various ways. During gameplay, the seed would be randomly generated whereas the settings would be carefully chosen in advance by the developer. Presumably, a few different sets of settings would be carefully crafted to produce different sorts of maps. These have to balance desired outcome with generation speed as some types of maps are considerably more difficult – i.e. slower – to generate.
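The seed/settings split can be illustrated with a deliberately tiny sketch (this is not the real generator, which does far more than scatter level coordinates; all names below are made up):

```python
import random

def generate_map(seed, settings):
    """All randomness flows from the seed, so the same (seed, settings)
    pair always reproduces the exact same map."""
    rng = random.Random(seed)
    return [(rng.randrange(settings["width"]), rng.randrange(settings["height"]))
            for _ in range(settings["levels"])]
```

Reproducibility from the seed is what makes debugging (and sharing interesting maps) practical; the settings then stay fixed per map type while the seed varies per playthrough.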

Another, slightly smaller, example of a map generated by this system.

The basic structure of the road network is controlled by two settings. The first one is called complexity, which determines how many additional features – such as branches or small loops – should be added to the network. To illustrate, consider this comparison of a map with a complexity of one and another map generated using the same settings, but with a complexity of five:

Comparison between a low and a high complexity map with the same number of levels.

Of course, the second image looks a bit exaggerated since that amount of road segments for a map with a single level is ridiculous. For a map with numerous levels, the same network would make more sense.

The second road network setting is called connectivity and controls how interconnected the map is. The map that’s initially generated is relatively linear, modulo a few minor branches and cycles as controlled by the complexity. The purpose of connectivity is to then take this semi-linear map and tie it together into a larger loop. This effectively creates a map with a vaguely circular structure and, roughly speaking, two primary routes that one can take through the world. A higher connectivity value would repeat this process more times, although doing so rapidly decreases the chances of being able to generate a map. This is due to the risk of the road network being non-planar (which is undesirable for this type of map) skyrocketing as the interconnectivity increases.

To further illustrate how connectivity affects the map, consider this comparison of a map with a connectivity of zero and another map with a connectivity of one:

A zero-connectivity map showcasing a pseudo-linear network, and a one-connectivity map where the road network was joined into a larger loop.

Aside from the road network settings, one can also control the number of rivers to generate, whether or not to generate a port level on the river, how many shortcut caves to generate (which are here represented as the vaguely analogous SMB3 warp pipes) and lastly how many actual levels to place.

Another generated map where only the left half was joined into a loop, causing a choke point in the middle of the map and disconnected branching on the right.

While the main purpose of this generator is to generate the road network itself, it also has basic biome and decoration generation. For reasons specific to the unannounced game, the map is primarily a dry terrain (with palm trees), but with occasional patches of more lush terrain (bushes) and some other patches of badlands (mushrooms??), though the approach can be adapted for many other scenarios. An additional detail to note is that the rivers partially override the natural biome generation by increasing the probability of lush terrain near them. Likewise, the castle at the end of the world has the opposite effect, increasing the badlands probability near it.

The biomes may look a bit patchy in the images, though that’s partially an artifact of the medium. The biome for any given tile is actually on a spectrum from lush to badlands, but the current SMB3 assets I’m using – with three types of terrain – cause the edges between the biomes to appear aliased. It would be possible to do smoother transitions even with these assets, though putting time into making a system for interpolating the placeholder graphics (which would not necessarily carry over into whatever the final appearance might be) seems a bit unnecessary.

So, that’s what this generator is currently capable of. In a future post, I will go more in depth about how the generator actually works. (Spoiler: it’s actually a pipeline of several different generators to handle the various aspects of the map.)


It Begins

Oops, I made a blog.

I often spend my free time working on various small programming projects. These are usually never seen by anyone but myself (and occasionally an extremely small group of people) and eventually just descend into the opaque depths of my Programming folder. This blog is my attempt at making that folder, and perhaps also the time I spend putting stuff into it, more transparent. My assumption is that there are more people than the few I occasionally talk to that would be interested in some of this stuff.

Instead of waiting until everything is “just perfect” (spoiler: no piece of software is ever perfect), my intent is to post random stuff I do, regardless of how “good”, “useful” or “done” it is. My mental model for this blog is closer to programming diary than peer-reviewed computer science article.

The posts on this blog will probably fall into three categories:

  • Shorter updates or hypothetical musings.
  • Retrospectives on some older projects of mine.
  • Descriptions of some of my ideas and algorithms.

I would say “stay tuned,” but this is not a TV channel.