Roguelike News

The articles in the Roguelike
Development Topic list have been
generously donated by very kind
individuals. All rights remain
with the authors.

Main | News | Development | Archives | Articles
Links | FAQs | Contact : rogue@skoardy.demon.co.uk
Compressing Savefiles - Ross Morgan-Linial [rmorganl@fhcrc.org].
(or, It's a Big World Out There)

It's fairly easy to write a simple algorithm for reading and writing
savefiles. Just dump all your data structures in a fixed order, and 
then read them in later (although, if your data structures have 
pointers, it's a bit more complicated - but that deserves another 
article). However, the problem with this simple algorithm is that the 
savefiles can be _big_. If you plan to save 100 levels, each 20x80 
tiles, and each tile takes 10 bytes, that's 1.6 megabytes right 
there... and that's just for the maps.


A Solution: Run-Length Encoding

Fortunately, there is a very simple algorithm that can achieve enormous
compression on the average map. That algorithm is called run-length 
encoding. The basic idea is simple. Most of the tiles in your dungeon 
will be walls or floors, and these walls and floors will tend to be 
grouped together in blocks. Instead of writing one byte to indicate 
the type of each tile in your dungeon, you use two bytes to indicate 
the type of a sequence of tiles. The first byte indicates how many 
consecutive tiles (in a left-to-right, top-to-bottom traverse of the 
map) are the type indicated by the second byte. Therefore, the basic 
algorithm to compress your dungeon is simple:

Algorithm 1: Compressing a dungeon, take 1
   For each tile in the dungeon, in some order
      If that tile is the same as the last tile and the tile count is < 255
         Increment the tile count
      Otherwise
         Output the tile count and the type of the last tile
         Set the tile count to 1
   Output the tile count and the type of the last tile

The algorithm to decompress it is even simpler:

Algorithm 2: Decompressing a dungeon, take 1
   Until you reach the end of the dungeon
      Read a run count
      Read a tile
      For a number of tiles equal to the run count
         Set that tile to the tile you read


Fixing the Problems

If you try the preceding algorithms, you'll find that they work well, 
but there is some room for improvement. Some data in each tile can be 
compressed more effectively than with RLE, and some data just isn't 
amenable to RLE encoding and tends to break the coding into a 
succession of one-tile entries.

Most games keep the data about in a monster's or object's location in 
two places: in the data structure representing the monster or object, 
and in the data structure representing the map. Because the 
information about the monsters and objects on the map has to be saved 
anyway, we can completely omit the monster and object information from 
the map portion of the savefile. When we read in the savefile, we will 
reconstruct the data about monsters and objects on the map from the 
data about the monsters and objects themselves. This will save two 
two-byte or four-byte fields, and allow the RLE algorithm to compress 
the map more effectively.

Some data fields usually have a constant value, or a value that can be 
easily computed from other values in the tile structure. We can save 
space, and make the RLE more effective, by omitting these values and 
only saving the exceptional values in the savefile. The easiest way to 
do this is emit a string of X-Y-value groups into the savefile, 
followed by a sentinel element:

Algorithm 3: Saving sparse data
   For each tile in the dungeon, in some order
      Compute the expected value of the data
      If the value is not the expected value
         Emit the X and Y coordinates of the current tile
         Emit the actual value of the data
   Emit a sentinel value larger than any possible real X

Algorithm 4: Reading sparse data
   For each tile in the dungeon, in some order
      Set the value of the data to the expected value
   Repeat until the sentinel is read
      Read the X coordinate
      If the X coordinate is the sentinel, stop
      Read the Y coordinate
      Read the real value
      Set the data at the indicated tile to the real value read

An alternative, if exceptional values are more common, is to emit a 
bit-packed array (bit map) indicating which tiles have unusual values, 
followed by the values in order.

One final trick, which might reduce the size of savefiles, is to 
run-length encode the various fields of the map data separately. If 
the fields don't tend to have the same runs, this significantly 
reduces the size of the RLE data. On the other hand, it might not. The 
only way to find out is to try it and see.


Conclusion

A naive savefile-creating algorithm can easily create savefiles that 
are five or ten times larger than necessary. Run length encoding is a 
simple, easy to understand and easy to code compression algorithm that 
can achieve very high compression ratios on the type of data usually 
found in a level map. With a bit of thought, and some experimentation, 
run length encoding can make savefiles much easier on your, and 
everyone else's, hard drives.


The Author:
Ross Morgan-Linial
rmorganl@fhcrc.org
Current RL project: Untitled roguelike game
Projected beta release: "When it's done"
© Copyright 1999 Darren Hebden.