doc/compression.txt

   1
   2 Description of the NTFS (de)compression algorithm (based on a modified LZ77
   3 algorithm)
   4
   5 Copyright (c) 2001 Anton Altaparmakov
   6
   7 This document is published under the GNU General Public License.
   8
   9 Credits: This is based on notes taken from various places (most notably from
  10 Regis Duchesne's NTFS documentation and from various LZ77 descriptions) and
  11 further refined by looking at a few compressed streams to figure out some
  12 uncertainties.
  13
  14 Note: You should also read the runlist description with regards to compression
  15 in linux-ntfs/include/layout.h. Just search for "Attribute compression".
  16 FIXME: Should merge the info from there into this document some time.
  17
  18 Compressed data is organized in logical "compression" blocks (cb). Each cb has
  19 a size (cb_size) of 2^compression_unit clusters. In all versions of Windows,
  20 NTFS (NT/2k/XP, NTFS 1.2-3.1), the only valid compression_unit is 4, IOW, each
  21 cb is 2^4 = 16 clusters in size.
  22
  23 We detect and warn about a compression_unit != 4 but we try to decompress the
  24 data anyway.
  25
  26 Compression is only supported for cluster sizes between 512 and 4096. Thus a
  27 cb can be between 8 and 64kiB in size.
  28
  29 Each cb is independent of the other cbs and is thus the minimal unit we have
  30 to parse even if we wanted to decompress only one byte.
  31
  32 Also, a cb can be totally uncompressed and this would be indicated as a sparse
  33 cb in the runlist.
  34
  35 Thus, we need to look at the runlist of the compressed data stream, starting
  36 at the beginning of the first cb overlapping @page. So we convert the page
  37 offset into units of clusters (vcn), and round the vcn down to a mutliple of
  38 cb_size clusters.
  39
  40 We then scan the runlist for the appropriate position. Based on what we find
  41 there, we decide how to proceed.
  42
  43 If the cb is not compressed at all, and covers the whole of @page, we pretend
  44 to be accessing an uncompressed file, so we fall back to what we do in
  45 aops.c::ntfs_file_readpage(), i.e. we do:
  46         return block_read_full_page(page, ntfs_file_get_block);
  47
  48 If the cb is completely sparse, and covers the whole of @page, we can just
  49 zero out @page and complete the io (set @page up-to-date, unlock it, and
  50 finally return 0).
  51
  52 In all other cases we initiate the decompression engine, but first some more
  53 on the compression algorithm.
  54
  55 Before compression the data of each cb is further divided into 4kiB blocks, we
  56 call them "sub compression" blocks (sb), each including a header specifying
  57 its compressed length. So we could just scan the cb for the first sb
  58 overlapping @page and skip the sbs before that, or we could decompress the
  59 whole cb injecting the superfluous decompressed pages into the page cache as a
  60 form of read ahead (this is what zisofs does for example).
  61
  62 In either case, we then need to read and decompress all sbs overlapping @page,
  63 potentially having to decompress one or more other cbs, too.
  64
  65 As soon as @page is completed we could either stop or continue until we finish
  66 the current cb, injecting pages as we go along (again following the zisofs
  67 example).
  68
  69 Because the sbs follow each other directly, we need to actually read in the
  70 whole cb in order to be able to scan through the cb to find the first sb
  71 overlapping @page, so it does make sense to follow the zisofs approach of
  72 decompressing the whole cb and injecting pages as we go along. So all
  73 discussion from now on will assume that we are going to do that. Although it
  74 might make sense not to decompress any sbs locate before @page because this
  75 would be a kind of "read-behind" which is probably silly, unless someone is
  76 reading the file backwards. Performing read-ahead by decompressing all sbs
  77 following @page OTOH, is very likely to be a good idea.
  78
  79 So, we read the whole cb from disk and start at the first sb.
  80
  81 As mentioned above, each sb is started with a header. The header is 16 bits of
  82 which the lower twelve bits (i.e. bits 0 to 11) are the length (L) - 3 of the
  83 sb (including the two bytes for the header itself, or L - 1 not counting the
  84 two bytes for the header). The higher four bits are set to 1011 (0xb) by the
  85 compressor for a compressed block, or to 0000 for an uncompressed block, but
  86 the decompressor only checks the most significant bit taking a 1 to signify a
  87 compressed block, and a 0 an uncompressed block.
  88
  89 So from the header we know how many compressed bytes we need to decompress to
  90 obtain the next 4kiB of uncompressed data and if we didn't want to decompress
  91 this sb we could just seek to the next next one using the length read from the
  92 header. We could then continue seeking until we reach the first sb overlapping
  93 @page.
  94
  95 In either case, we will reach a sb which we want to decompress.
  96
  97 Having dealt with the 16-bit header of the sb, we now have length bytes of
  98 compressed data to decompress. This compressed stream is further split into
  99 tokens which are organized into groups of eight tokens. Each token group (tg)
 100 starts with a tag byte, which is an eight bit bitmap, the bits specifying the
 101 type of each of the following eight tokens. The least significant bit (LSB)
 102 corresponds to the first token and the most significant bit (MSB) corresponds
 103 to the last token.
 104
 105 The two types of tokens are symbol tokens, specified by a zero bit, and phrase
 106 tokens, specified by a set bit.
 107
 108 A symbol token (st) is a single byte and is to be taken literally and copied
 109 into the sliding window (the decompressed data).
 110
 111 A phrase token (pt) is a pointer back into the sliding window (in bytes),
 112 together with a length (again in bytes), starting at the byte the back pointer
 113 is pointing to. Thus a phrase token defines a sequence of bytes in the sliding
 114 window which need to be copied at the current position into the sliding window
 115 (the decompressed data stream).
 116
 117 Each pt consists of 2 bytes split into the back pointer (p) and the length (l),
 118 each of variable bit width (but the sum of the widths of p and l is fixed at
 119 16 bits). p is at least 4 bits and l is at most 12 bits.
 120
 121 The most significant bits contain the back pointer (p), while the least
 122 significant bits contain the length (l).
 123
 124 l is actually stored as the number of bytes minus 3 (unsigned) as anything
 125 shorter than that would be at least as long as the 2 bytes needed for the
 126 actual pt, so no compression would be achieved.
 127
 128 p is stored as the positive number of bytes minus 1 (unsigned) as going zero
 129 bytes back is meaningless.
 130
 131 Note that decompression has to occur byte by byte, as it is possible that some
 132 of the bytes pointed to by the pt will only be generated in the sliding window
 133 as the byte sequence pointed to by the pt is being copied into it!
 134
 135 To give a concrete example; a block full of the letter A would be compressed
 136 by storing the byte A once as a symbol token, followed by a single phrase
 137 token with back pointer -1 (p = 0, therefore go back by -(0 + 1) bytes) and
 138 length 4095 (l=0xffc, therefore length 0xffc + 3 bytes).
 139
 140 The widths of p and l are determined from the current position within the
 141 decompressed data (cur_pos). We don't actually care about the widths as such
 142 however, but instead we want the mask (l_mask) with which to AND the pt to
 143 obtain l, and the number of bits (p_shift) by which to right shift the pt to
 144 obtain p. These are determined using the following algorithm:
 145
 146 for (i = cur_pos, l_mask = 0xfff, p_shift = 12; i >= 0x10; i >>= 1) {
 147         l_mask >>= 1;
 148         p_shift--;
 149 }
 150
 151 Note, that as usual in NTFS, the sb header, as well as each pt, are stored in
 152 little endian format.
 153