(or, It's a Big World Out There)
It's fairly easy to write a simple algorithm for reading and writing
savefiles. Just dump all your data structures in a fixed order, and
then read them in later (although, if your data structures have
pointers, it's a bit more complicated - but that deserves another
article). However, the problem with this simple algorithm is that the
savefiles can be _big_. If you plan to save 100 levels, each 20x80
tiles, and each tile takes 10 bytes, that's 1.6 megabytes right
there... and that's just for the maps.
A Solution: Run-Length Encoding
Fortunately, there is a very simple algorithm that can achieve enormous
compression on the average map. That algorithm is called run-length
encoding. The basic idea is simple. Most of the tiles in your dungeon
will be walls or floors, and these walls and floors will tend to be
grouped together in blocks. Instead of writing one byte to indicate
the type of each tile in your dungeon, you use two bytes to indicate
the type of a sequence of tiles. The first byte indicates how many
consecutive tiles (in a left-to-right, top-to-bottom traverse of the
map) are the type indicated by the second byte. Therefore, the basic
algorithm to compress your dungeon is simple:
Algorithm 1: Compressing a dungeon, take 1
For each tile in the dungeon, in some order
If that tile is the same as the last tile and the tile count is < 255
Increment the tile count
Otherwise
Output the tile count and the type of the last tile
Set the tile count to 1
Output the tile count and the type of the last tile
The algorithm to decompress it is even simpler:
Algorithm 2: Decompressing a dungeon, take 1
Until you reach the end of the dungeon
Read a run count
Read a tile
For a number of tiles equal to the run count
Set that tile to the tile you read
Fixing the Problems
If you try the preceding algorithms, you'll find that they work well,
but there is some room for improvement. Some data in each tile can be
compressed more effectively than with RLE, and some data just isn't
amenable to RLE encoding and tends to break the coding into a
succession of one-tile entries.
Most games keep the data about in a monster's or object's location in
two places: in the data structure representing the monster or object,
and in the data structure representing the map. Because the
information about the monsters and objects on the map has to be saved
anyway, we can completely omit the monster and object information from
the map portion of the savefile. When we read in the savefile, we will
reconstruct the data about monsters and objects on the map from the
data about the monsters and objects themselves. This will save two
two-byte or four-byte fields, and allow the RLE algorithm to compress
the map more effectively.
Some data fields usually have a constant value, or a value that can be
easily computed from other values in the tile structure. We can save
space, and make the RLE more effective, by omitting these values and
only saving the exceptional values in the savefile. The easiest way to
do this is emit a string of X-Y-value groups into the savefile,
followed by a sentinel element:
Algorithm 3: Saving sparse data
For each tile in the dungeon, in some order
Compute the expected value of the data
If the value is not the expected value
Emit the X and Y coordinates of the current tile
Emit the actual value of the data
Emit a sentinel value larger than any possible real X
Algorithm 4: Reading sparse data
For each tile in the dungeon, in some order
Set the value of the data to the expected value
Repeat until the sentinel is read
Read the X coordinate
If the X coordinate is the sentinel, stop
Read the Y coordinate
Read the real value
Set the data at the indicated tile to the real value read
An alternative, if exceptional values are more common, is to emit a
bit-packed array (bit map) indicating which tiles have unusual values,
followed by the values in order.
One final trick, which might reduce the size of savefiles, is to
run-length encode the various fields of the map data separately. If
the fields don't tend to have the same runs, this significantly
reduces the size of the RLE data. On the other hand, it might not. The
only way to find out is to try it and see.
Conclusion
A naive savefile-creating algorithm can easily create savefiles that
are five or ten times larger than necessary. Run length encoding is a
simple, easy to understand and easy to code compression algorithm that
can achieve very high compression ratios on the type of data usually
found in a level map. With a bit of thought, and some experimentation,
run length encoding can make savefiles much easier on your, and
everyone else's, hard drives.
The Author:
Ross Morgan-Linial
rmorganl@fhcrc.org
Current RL project: Untitled roguelike game
Projected beta release: "When it's done"
|