drivers/lib/bzip2/manual.texi

   1 \input texinfo  @c                                  -*- Texinfo -*-
   2 @setfilename bzip2.info
   3
   4 @ignore
   5 This file documents bzip2 version 1.0, and associated library
   6 libbzip2, written by Julian Seward (jseward@acm.org).
   7
   8 Copyright (C) 1996-2000 Julian R Seward
   9
  10 Permission is granted to make and distribute verbatim copies of
  11 this manual provided the copyright notice and this permission notice
  12 are preserved on all copies.
  13
  14 Permission is granted to copy and distribute translations of this manual
  15 into another language, under the above conditions for verbatim copies.
  16 @end ignore
  17
  18 @ifinfo
  19 @format
  20 START-INFO-DIR-ENTRY
  21 * Bzip2: (bzip2).               A program and library for data compression.
  22 END-INFO-DIR-ENTRY
  23 @end format
  24
  25 @end ifinfo
  26
  27 @iftex
  28 @c @finalout
  29 @settitle bzip2 and libbzip2
  30 @titlepage
  31 @title bzip2 and libbzip2
  32 @subtitle a program and library for data compression
  33 @subtitle copyright (C) 1996-2000 Julian Seward
  34 @subtitle version 1.0 of 21 March 2000
  35 @author Julian Seward
  36
  37 @end titlepage
  38
  39 @parindent 0mm
  40 @parskip 2mm
  41
  42 @end iftex
  43 @node Top, Overview, (dir), (dir)
  44
  45 This program, @code{bzip2},
  46 and associated library @code{libbzip2}, are
  47 Copyright (C) 1996-2000 Julian R Seward.  All rights reserved.
  48
  49 Redistribution and use in source and binary forms, with or without
  50 modification, are permitted provided that the following conditions
  51 are met:
  52 @itemize @bullet
  53 @item
  54    Redistributions of source code must retain the above copyright
  55    notice, this list of conditions and the following disclaimer.
  56 @item
  57    The origin of this software must not be misrepresented; you must
  58    not claim that you wrote the original software.  If you use this
  59    software in a product, an acknowledgment in the product
  60    documentation would be appreciated but is not required.
  61 @item
  62    Altered source versions must be plainly marked as such, and must
  63    not be misrepresented as being the original software.
  64 @item
  65    The name of the author may not be used to endorse or promote
  66    products derived from this software without specific prior written
  67    permission.
  68 @end itemize
  69 THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
  70 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  71 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  72 ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
  73 DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  74 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
  75 GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  76 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
  77 WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  78 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  79 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  80
  81 Julian Seward, Cambridge, UK.
  82
  83 @code{jseward@@acm.org}
  84
  85 @code{http://sourceware.cygnus.com/bzip2}
  86
  87 @code{http://www.cacheprof.org}
  88
  89 @code{http://www.muraroa.demon.co.uk}
  90
  91 @code{bzip2}/@code{libbzip2} version 1.0 of 21 March 2000.
  92
  93 PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
  94 algorithms.  However, I do not have the resources available to carry out
  95 a full patent search.  Therefore I cannot give any guarantee of the
  96 above statement.
  97
  98
  99
 100
 101
 102
 103
 104 @node Overview, Implementation, Top, Top
 105 @chapter Introduction
 106
 107 @code{bzip2}  compresses  files  using the Burrows-Wheeler
 108 block-sorting text compression algorithm,  and  Huffman  coding.
 109 Compression  is  generally  considerably  better than that
 110 achieved by more conventional LZ77/LZ78-based compressors,
 111 and  approaches  the performance of the PPM family of statistical compressors.
 112
 113 @code{bzip2} is built on top of @code{libbzip2}, a flexible library
 114 for handling compressed data in the @code{bzip2} format.  This manual
 115 describes both how to use the program and
 116 how to work with the library interface.  Most of the
 117 manual is devoted to this library, not the program,
 118 which is good news if your interest is only in the program.
 119
 120 Chapter 2 describes how to use @code{bzip2}; this is the only part
 121 you need to read if you just want to know how to operate the program.
 122 Chapter 3 describes the programming interfaces in detail, and
 123 Chapter 4 records some miscellaneous notes which I thought
 124 ought to be recorded somewhere.
 125
 126
 127 @chapter How to use @code{bzip2}
 128
 129 This chapter contains a copy of the @code{bzip2} man page,
 130 and nothing else.
 131
 132 @quotation
 133
 134 @unnumberedsubsubsec NAME
 135 @itemize
 136 @item @code{bzip2}, @code{bunzip2}
 137 - a block-sorting file compressor, v1.0
 138 @item @code{bzcat}
 139 - decompresses files to stdout
 140 @item @code{bzip2recover}
 141 - recovers data from damaged bzip2 files
 142 @end itemize
 143
 144 @unnumberedsubsubsec SYNOPSIS
 145 @itemize
 146 @item @code{bzip2} [ -cdfkqstvzVL123456789 ] [ filenames ...  ]
 147 @item @code{bunzip2} [ -fkvsVL ] [ filenames ...  ]
 148 @item @code{bzcat} [ -s ] [ filenames ...  ]
 149 @item @code{bzip2recover} filename
 150 @end itemize
 151
 152 @unnumberedsubsubsec DESCRIPTION
 153
 154 @code{bzip2} compresses files using the Burrows-Wheeler block sorting
 155 text compression algorithm, and Huffman coding.  Compression is
 156 generally considerably better than that achieved by more conventional
 157 LZ77/LZ78-based compressors, and approaches the performance of the PPM
 158 family of statistical compressors.
 159
 160 The command-line options are deliberately very similar to those of GNU
 161 @code{gzip}, but they are not identical.
 162
 163 @code{bzip2} expects a list of file names to accompany the command-line
 164 flags.  Each file is replaced by a compressed version of itself, with
 165 the name @code{original_name.bz2}.  Each compressed file has the same
 166 modification date, permissions, and, when possible, ownership as the
 167 corresponding original, so that these properties can be correctly
 168 restored at decompression time.  File name handling is naive in the
 169 sense that there is no mechanism for preserving original file names,
 170 permissions, ownerships or dates in filesystems which lack these
 171 concepts, or have serious file name length restrictions, such as MS-DOS.
 172
 173 @code{bzip2} and @code{bunzip2} will by default not overwrite existing
 174 files.  If you want this to happen, specify the @code{-f} flag.
 175
 176 If no file names are specified, @code{bzip2} compresses from standard
 177 input to standard output.  In this case, @code{bzip2} will decline to
 178 write compressed output to a terminal, as this would be entirely
 179 incomprehensible and therefore pointless.
 180
 181 @code{bunzip2} (or @code{bzip2 -d}) decompresses all
 182 specified files.  Files which were not created by @code{bzip2}
 183 will be detected and ignored, and a warning issued.
 184 @code{bzip2} attempts to guess the filename for the decompressed file
 185 from that of the compressed file as follows:
 186 @itemize
 187 @item @code{filename.bz2 } becomes @code{filename}
 188 @item @code{filename.bz  } becomes @code{filename}
 189 @item @code{filename.tbz2} becomes @code{filename.tar}
 190 @item @code{filename.tbz } becomes @code{filename.tar}
 191 @item @code{anyothername } becomes @code{anyothername.out}
 192 @end itemize
 193 If the file does not end in one of the recognised endings,
 194 @code{.bz2}, @code{.bz},
 195 @code{.tbz2} or @code{.tbz}, @code{bzip2} complains that it cannot
 196 guess the name of the original file, and uses the original name
 197 with @code{.out} appended.
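
As a rough illustration of the renaming rules above, here is a minimal sketch in C.  The function @code{guess_output_name} is a hypothetical helper written for this example and is not part of @code{bzip2}.  It also assumes that a name consisting of nothing but a recognised ending is treated as unrecognised, which the text above does not specify.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of bzip2: derives the decompressed file
   name from the compressed one using the endings listed in the man page.
   Returns 1 if a recognised ending was found, 0 if ".out" was appended. */
int guess_output_name(const char *in, char *out, size_t out_size)
{
    static const struct { const char *suffix; const char *replacement; } rules[] = {
        { ".bz2",  ""     },
        { ".bz",   ""     },
        { ".tbz2", ".tar" },
        { ".tbz",  ".tar" },
    };
    size_t in_len = strlen(in);

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t suf_len = strlen(rules[i].suffix);
        /* Assumption: a bare ".bz2" with no stem does not count as a match. */
        if (in_len > suf_len &&
            strcmp(in + in_len - suf_len, rules[i].suffix) == 0) {
            snprintf(out, out_size, "%.*s%s",
                     (int)(in_len - suf_len), in, rules[i].replacement);
            return 1;
        }
    }
    /* No recognised ending: keep the whole name and append ".out". */
    snprintf(out, out_size, "%s.out", in);
    return 0;
}
```

Calling it with @code{"archive.tbz2"} fills the output buffer with @code{archive.tar}, and calling it with @code{"notes.txt"} gives @code{notes.txt.out} and a return value of 0.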
 198
 199 As with compression, supplying no
 200 filenames causes decompression from standard input to standard output.
 201
 202 @code{bunzip2} will correctly decompress a file which is the
 203 concatenation of two or more compressed files.  The result is the
 204 concatenation of the corresponding uncompressed files.  Integrity
 205 testing (@code{-t}) of concatenated compressed files is also supported.
 206
 207 You can also compress or decompress files to the standard output by
 208 giving the @code{-c} flag.  Multiple files may be compressed and
 209 decompressed like this.  The resulting outputs are fed sequentially to
 210 stdout.  Compression of multiple files in this manner generates a stream
 211 containing multiple compressed file representations.  Such a stream
 212 can be decompressed correctly only by @code{bzip2} version 0.9.0 or
 213 later.  Earlier versions of @code{bzip2} will stop after decompressing
 214 the first file in the stream.
 215
 216 @code{bzcat} (or @code{bzip2 -dc}) decompresses all specified files to
 217 the standard output.
 218
 219 @code{bzip2} will read arguments from the environment variables
 220 @code{BZIP2} and @code{BZIP}, in that order, and will process them
 221 before any arguments read from the command line.  This gives a
 222 convenient way to supply default arguments.
 223
 224 Compression is always performed, even if the compressed file is slightly
 225 larger than the original.  Files of less than about one hundred bytes
 226 tend to get larger, since the compression mechanism has a constant
 227 overhead in the region of 50 bytes.  Random data (including the output
 228 of most file compressors) is coded at about 8.05 bits per byte, giving
 229 an expansion of around 0.5%.
 230
 231 As a self-check for your protection, @code{bzip2} uses 32-bit CRCs to
 232 make sure that the decompressed version of a file is identical to the
 233 original.  This guards against corruption of the compressed data, and
 234 against undetected bugs in @code{bzip2} (hopefully very unlikely).  The
  235 chance of data corruption going undetected is microscopic, about one
 236 chance in four billion for each file processed.  Be aware, though, that
 237 the check occurs upon decompression, so it can only tell you that
 238 something is wrong.  It can't help you recover the original uncompressed
 239 data.  You can use @code{bzip2recover} to try to recover data from
 240 damaged files.
 241
 242 Return values: 0 for a normal exit, 1 for environmental problems (file
 243 not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
 244 compressed file, 3 for an internal consistency error (eg, bug) which
 245 caused @code{bzip2} to panic.
 246
 247
 248 @unnumberedsubsubsec OPTIONS
 249 @table @code
 250 @item -c  --stdout
 251 Compress or decompress to standard output.
 252 @item -d  --decompress
 253 Force decompression.  @code{bzip2}, @code{bunzip2} and @code{bzcat} are
 254 really the same program, and the decision about what actions to take is
 255 done on the basis of which name is used.  This flag overrides that
  256 mechanism, and forces @code{bzip2} to decompress.
 257 @item -z --compress
 258 The complement to @code{-d}: forces compression, regardless of the
  259 invocation name.
 260 @item -t --test
 261 Check integrity of the specified file(s), but don't decompress them.
 262 This really performs a trial decompression and throws away the result.
 263 @item -f --force
 264 Force overwrite of output files.  Normally, @code{bzip2} will not overwrite
 265 existing output files.  Also forces @code{bzip2} to break hard links
 266 to files, which it otherwise wouldn't do.
 267 @item -k --keep
 268 Keep (don't delete) input files during compression
 269 or decompression.
 270 @item -s --small
 271 Reduce memory usage, for compression, decompression and testing.  Files
 272 are decompressed and tested using a modified algorithm which only
 273 requires 2.5 bytes per block byte.  This means any file can be
 274 decompressed in 2300k of memory, albeit at about half the normal speed.
 275
 276 During compression, @code{-s} selects a block size of 200k, which limits
 277 memory use to around the same figure, at the expense of your compression
 278 ratio.  In short, if your machine is low on memory (8 megabytes or
  279 less), use @code{-s} for everything.  See MEMORY MANAGEMENT below.
 280 @item -q --quiet
 281 Suppress non-essential warning messages.  Messages pertaining to
 282 I/O errors and other critical events will not be suppressed.
 283 @item -v --verbose
 284 Verbose mode -- show the compression ratio for each file processed.
 285 Further @code{-v}'s increase the verbosity level, spewing out lots of
 286 information which is primarily of interest for diagnostic purposes.
 287 @item -L --license -V --version
 288 Display the software version, license terms and conditions.
 289 @item -1 to -9
 290 Set the block size to 100 k, 200 k ..  900 k when compressing.  Has no
 291 effect when decompressing.  See MEMORY MANAGEMENT below.
 292 @item --
 293 Treats all subsequent arguments as file names, even if they start
 294 with a dash.  This is so you can handle files with names beginning
 295 with a dash, for example: @code{bzip2 -- -myfilename}.
 296 @item --repetitive-fast
 297 @item --repetitive-best
 298 These flags are redundant in versions 0.9.5 and above.  They provided
 299 some coarse control over the behaviour of the sorting algorithm in
 300 earlier versions, which was sometimes useful.  0.9.5 and above have an
 301 improved algorithm which renders these flags irrelevant.
 302 @end table
 303
 304
 305 @unnumberedsubsubsec MEMORY MANAGEMENT
 306
 307 @code{bzip2} compresses large files in blocks.  The block size affects
 308 both the compression ratio achieved, and the amount of memory needed for
 309 compression and decompression.  The flags @code{-1} through @code{-9}
 310 specify the block size to be 100,000 bytes through 900,000 bytes (the
 311 default) respectively.  At decompression time, the block size used for
 312 compression is read from the header of the compressed file, and
 313 @code{bunzip2} then allocates itself just enough memory to decompress
 314 the file.  Since block sizes are stored in compressed files, it follows
 315 that the flags @code{-1} to @code{-9} are irrelevant to and so ignored
 316 during decompression.
 317
 318 Compression and decompression requirements, in bytes, can be estimated
 319 as:
 320 @example
 321      Compression:   400k + ( 8 x block size )
 322
 323      Decompression: 100k + ( 4 x block size ), or
 324                     100k + ( 2.5 x block size )
 325 @end example
 326 Larger block sizes give rapidly diminishing marginal returns.  Most of
 327 the compression comes from the first two or three hundred k of block
 328 size, a fact worth bearing in mind when using @code{bzip2} on small machines.
 329 It is also important to appreciate that the decompression memory
 330 requirement is set at compression time by the choice of block size.
 331
 332 For files compressed with the default 900k block size, @code{bunzip2}
 333 will require about 3700 kbytes to decompress.  To support decompression
 334 of any file on a 4 megabyte machine, @code{bunzip2} has an option to
 335 decompress using approximately half this amount of memory, about 2300
 336 kbytes.  Decompression speed is also halved, so you should use this
 337 option only where necessary.  The relevant flag is @code{-s}.
 338
  339 In general, try to use the largest block size memory constraints allow,
 340 since that maximises the compression achieved.  Compression and
 341 decompression speed are virtually unaffected by block size.
 342
 343 Another significant point applies to files which fit in a single block
 344 -- that means most files you'd encounter using a large block size.  The
 345 amount of real memory touched is proportional to the size of the file,
 346 since the file is smaller than a block.  For example, compressing a file
 347 20,000 bytes long with the flag @code{-9} will cause the compressor to
 348 allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
 349 kbytes of it.  Similarly, the decompressor will allocate 3700k but only
 350 touch 100k + 20000 * 4 = 180 kbytes.
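
The estimates above can be written out as a few small C functions.  This is only a sketch: the function names are invented for this example, and it assumes that ``k'' means 1000 bytes, which is the reading under which @code{-9} gives the 7600k and 3700k figures quoted above.

```c
/* Hypothetical helpers that evaluate the man page's memory estimates.
   level is the digit of the -1 .. -9 flag.  Assumption: "k" = 1000 bytes. */
enum { BLOCK_UNIT = 100000 };   /* block size is level * 100,000 bytes */

/* Memory allocated for compression: 400k + 8 x block size. */
long compress_bytes(int level)
{
    return 400000L + 8L * level * BLOCK_UNIT;
}

/* Memory allocated for normal decompression: 100k + 4 x block size. */
long decompress_bytes(int level)
{
    return 100000L + 4L * level * BLOCK_UNIT;
}

/* Memory allocated for -s decompression: 100k + 2.5 x block size. */
long decompress_small_bytes(int level)
{
    return 100000L + (5L * level * BLOCK_UNIT) / 2;
}

/* Memory actually touched when compressing a file of file_bytes bytes:
   a file shorter than one block only uses its own length of the block. */
long compress_touched(long file_bytes, int level)
{
    long block = (long)level * BLOCK_UNIT;
    long used = file_bytes < block ? file_bytes : block;
    return 400000L + 8L * used;
}

/* The same for normal decompression. */
long decompress_touched(long file_bytes, int level)
{
    long block = (long)level * BLOCK_UNIT;
    long used = file_bytes < block ? file_bytes : block;
    return 100000L + 4L * used;
}
```

For the 20,000 byte file discussed above, @code{compress_touched(20000, 9)} gives 560000 and @code{decompress_touched(20000, 9)} gives 180000, against allocations of 7600000 and 3700000 bytes.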
 351
 352 Here is a table which summarises the maximum memory usage for different
 353 block sizes.  Also recorded is the total compressed size for 14 files of
 354 the Calgary Text Compression Corpus totalling 3,141,622 bytes.  This
 355 column gives some feel for how compression varies with block size.
 356 These figures tend to understate the advantage of larger block sizes for
 357 larger files, since the Corpus is dominated by smaller files.
 358 @example
 359           Compress   Decompress   Decompress   Corpus
 360    Flag     usage      usage       -s usage     Size
 361
 362     -1      1200k       500k         350k      914704
 363     -2      2000k       900k         600k      877703
 364     -3      2800k      1300k         850k      860338
 365     -4      3600k      1700k        1100k      846899
 366     -5      4400k      2100k        1350k      845160
 367     -6      5200k      2500k        1600k      838626
 368     -7      6100k      2900k        1850k      834096
 369     -8      6800k      3300k        2100k      828642
 370     -9      7600k      3700k        2350k      828642
 371 @end example
 372
 373 @unnumberedsubsubsec RECOVERING DATA FROM DAMAGED FILES
 374
 375 @code{bzip2} compresses files in blocks, usually 900kbytes long.  Each
 376 block is handled independently.  If a media or transmission error causes
 377 a multi-block @code{.bz2} file to become damaged, it may be possible to
 378 recover data from the undamaged blocks in the file.
 379
 380 The compressed representation of each block is delimited by a 48-bit
 381 pattern, which makes it possible to find the block boundaries with
 382 reasonable certainty.  Each block also carries its own 32-bit CRC, so
 383 damaged blocks can be distinguished from undamaged ones.
 384
 385 @code{bzip2recover} is a simple program whose purpose is to search for
 386 blocks in @code{.bz2} files, and write each block out into its own
 387 @code{.bz2} file.  You can then use @code{bzip2 -t} to test the
 388 integrity of the resulting files, and decompress those which are
 389 undamaged.
 390
 391 @code{bzip2recover}
 392 takes a single argument, the name of the damaged file,
 393 and writes a number of files @code{rec0001file.bz2},
  394 @code{rec0002file.bz2}, etc, containing the extracted blocks.
  395 The output filenames are designed so that the use of
  396 wildcards in subsequent processing -- for example,
  397 @code{bzip2 -dc rec*file.bz2 > recovered_data} -- lists the files in
  398 the correct order.
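
The reason the wildcard lists the files in order is the fixed-width, zero-padded block number.  The following C sketch shows one way such names could be generated.  The function @code{recovered_name} is hypothetical, and the real @code{bzip2recover} may build its names differently in detail.

```c
#include <stdio.h>

/* Hypothetical sketch of the naming scheme described in the text.
   block_no counts from 1; in_name is the name of the damaged file.
   Returns 1 if the complete name fitted into out, 0 otherwise. */
int recovered_name(int block_no, const char *in_name,
                   char *out, size_t out_size)
{
    /* %04d pads with zeros, so rec0002... sorts before rec0010...
       when the shell expands rec*file.bz2 in lexical order. */
    int n = snprintf(out, out_size, "rec%04d%s", block_no, in_name);
    return n >= 0 && (size_t)n < out_size;
}
```

With @code{block_no} set to 1 and @code{in_name} set to @code{"file.bz2"}, the buffer receives @code{rec0001file.bz2}, matching the names shown above.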
 399
 400 @code{bzip2recover} should be of most use dealing with large @code{.bz2}
  401 files, as these will contain many blocks.  It is clearly
  402 futile to use it on damaged single-block files, since a
  403 damaged block cannot be recovered.  If you wish to minimise
  404 any potential data loss through media or transmission errors,
  405 you might consider compressing with a smaller
  406 block size.
 407
 408
 409 @unnumberedsubsubsec PERFORMANCE NOTES
 410
 411 The sorting phase of compression gathers together similar strings in the
 412 file.  Because of this, files containing very long runs of repeated
 413 symbols, like "aabaabaabaab ..."  (repeated several hundred times) may
 414 compress more slowly than normal.  Versions 0.9.5 and above fare much
 415 better than previous versions in this respect.  The ratio between
 416 worst-case and average-case compression time is in the region of 10:1.
 417 For previous versions, this figure was more like 100:1.  You can use the
 418 @code{-vvvv} option to monitor progress in great detail, if you want.
 419
 420 Decompression speed is unaffected by these phenomena.
 421
 422 @code{bzip2} usually allocates several megabytes of memory to operate
 423 in, and then charges all over it in a fairly random fashion.  This means
 424 that performance, both for compressing and decompressing, is largely
 425 determined by the speed at which your machine can service cache misses.
 426 Because of this, small changes to the code to reduce the miss rate have
 427 been observed to give disproportionately large performance improvements.
 428 I imagine @code{bzip2} will perform best on machines with very large
 429 caches.
 430
 431
 432 @unnumberedsubsubsec CAVEATS
 433
 434 I/O error messages are not as helpful as they could be.  @code{bzip2}
 435 tries hard to detect I/O errors and exit cleanly, but the details of
 436 what the problem is sometimes seem rather misleading.
 437
 438 This manual page pertains to version 1.0 of @code{bzip2}.  Compressed
 439 data created by this version is entirely forwards and backwards
 440 compatible with the previous public releases, versions 0.1pl2, 0.9.0 and
 441 0.9.5, but with the following exception: 0.9.0 and above can correctly
 442 decompress multiple concatenated compressed files.  0.1pl2 cannot do
 443 this; it will stop after decompressing just the first file in the
 444 stream.
 445
 446 @code{bzip2recover} uses 32-bit integers to represent bit positions in
 447 compressed files, so it cannot handle compressed files more than 512
 448 megabytes long.  This could easily be fixed.
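
The 512 megabyte figure follows directly from the width of the bit index.  The short C sketch below does the arithmetic.  The function name is invented for this example, and it assumes the index is an unsigned integer able to hold 2^32 distinct bit positions.

```c
#include <stdint.h>

/* Hypothetical helper: the largest file size, in bytes, in which every
   bit position can be numbered by an unsigned index of index_bits bits.
   index_bits must be less than 64. */
uint64_t max_bytes_for_bit_index(unsigned index_bits)
{
    /* 2^index_bits bit positions, eight bits to the byte. */
    return ((uint64_t)1 << index_bits) / 8;
}
```

For a 32-bit index this gives 536870912 bytes, which is 512 times 1048576.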
 449
 450
 451 @unnumberedsubsubsec AUTHOR
 452 Julian Seward, @code{jseward@@acm.org}.
 453
 454 The ideas embodied in @code{bzip2} are due to (at least) the following
 455 people: Michael Burrows and David Wheeler (for the block sorting
 456 transformation), David Wheeler (again, for the Huffman coder), Peter
 457 Fenwick (for the structured coding model in the original @code{bzip},
 458 and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
 459 (for the arithmetic coder in the original @code{bzip}).  I am much
 460 indebted for their help, support and advice.  See the manual in the
 461 source distribution for pointers to sources of documentation.  Christian
 462 von Roques encouraged me to look for faster sorting algorithms, so as to
 463 speed up compression.  Bela Lubkin encouraged me to improve the
 464 worst-case compression performance.  Many people sent patches, helped
 465 with portability problems, lent machines, gave advice and were generally
 466 helpful.
 467
 468 @end quotation
 469
 470
 471
 472
 473 @chapter Programming with @code{libbzip2}
 474
 475 This chapter describes the programming interface to @code{libbzip2}.
 476
 477 For general background information, particularly about memory
 478 use and performance aspects, you'd be well advised to read Chapter 2
 479 as well.
 480
 481 @section Top-level structure
 482
 483 @code{libbzip2} is a flexible library for compressing and decompressing
 484 data in the @code{bzip2} data format.  Although packaged as a single
  485 entity, it helps to regard the library as three separate parts: the
  486 low-level interface, the high-level interface, and some utility
  487 functions.
 488
 489 The structure of @code{libbzip2}'s interfaces is similar to
 490 that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib}
 491 library.
 492
 493 All externally visible symbols have names beginning @code{BZ2_}.
 494 This is new in version 1.0.  The intention is to minimise pollution
 495 of the namespaces of library clients.
 496
 497 @subsection Low-level summary
 498
 499 This interface provides services for compressing and decompressing
 500 data in memory.  There's no provision for dealing with files, streams
 501 or any other I/O mechanisms, just straight memory-to-memory work.
 502 In fact, this part of the library can be compiled without inclusion
 503 of @code{stdio.h}, which may be helpful for embedded applications.
 504
 505 The low-level part of the library has no global variables and
 506 is therefore thread-safe.
 507
  508 Six routines make up the low-level interface:
 509 @code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, and @* @code{BZ2_bzCompressEnd}
 510 for compression,
 511 and a corresponding trio @code{BZ2_bzDecompressInit}, @* @code{BZ2_bzDecompress}
 512 and @code{BZ2_bzDecompressEnd} for decompression.
 513 The @code{*Init} functions allocate
 514 memory for compression/decompression and do other
 515 initialisations, whilst the @code{*End} functions close down operations
 516 and release memory.
 517
 518 The real work is done by @code{BZ2_bzCompress} and @code{BZ2_bzDecompress}.
 519 These compress and decompress data from a user-supplied input buffer
 520 to a user-supplied output buffer.  These buffers can be any size;
 521 arbitrary quantities of data are handled by making repeated calls
 522 to these functions.  This is a flexible mechanism allowing a
 523 consumer-pull style of activity, or producer-push, or a mixture of
 524 both.
 525
 526
 527
 528 @subsection High-level summary
 529
 530 This interface provides some handy wrappers around the low-level
 531 interface to facilitate reading and writing @code{bzip2} format
 532 files (@code{.bz2} files).  The routines provide hooks to facilitate
 533 reading files in which the @code{bzip2} data stream is embedded
 534 within some larger-scale file structure, or where there are
 535 multiple @code{bzip2} data streams concatenated end-to-end.
 536
 537 For reading files, @code{BZ2_bzReadOpen}, @code{BZ2_bzRead},
 538 @code{BZ2_bzReadClose} and @* @code{BZ2_bzReadGetUnused} are supplied.  For
 539 writing files, @code{BZ2_bzWriteOpen}, @code{BZ2_bzWrite} and
  540 @code{BZ2_bzWriteClose} are available.
 541
 542 As with the low-level library, no global variables are used
 543 so the library is per se thread-safe.  However, if I/O errors
 544 occur whilst reading or writing the underlying compressed files,
 545 you may have to consult @code{errno} to determine the cause of
 546 the error.  In that case, you'd need a C library which correctly
 547 supports @code{errno} in a multithreaded environment.
 548
 549 To make the library a little simpler and more portable,
 550 @code{BZ2_bzReadOpen} and @code{BZ2_bzWriteOpen} require you to pass them file
 551 handles (@code{FILE*}s) which have previously been opened for reading or
 552 writing respectively.  That avoids portability problems associated with
 553 file operations and file attributes, whilst not being much of an
 554 imposition on the programmer.
 555
 556
 557
 558 @subsection Utility functions summary
 559 For very simple needs, @code{BZ2_bzBuffToBuffCompress} and
 560 @code{BZ2_bzBuffToBuffDecompress} are provided.  These compress
 561 data in memory from one buffer to another buffer in a single
 562 function call.  You should assess whether these functions
 563 fulfill your memory-to-memory compression/decompression
 564 requirements before investing effort in understanding the more
 565 general but more complex low-level interface.
 566
 567 Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} /
 568 @code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to
 569 give better @code{zlib} compatibility.  These functions are
 570 @code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
 571 @code{BZ2_bzclose},
 572 @code{BZ2_bzerror} and @code{BZ2_bzlibVersion}.  You may find these functions
 573 more convenient for simple file reading and writing, than those in the
 574 high-level interface.  These functions are not (yet) officially part of
 575 the library, and are minimally documented here.  If they break, you
 576 get to keep all the pieces.  I hope to document them properly when time
 577 permits.
 578
 579 Yoshioka also contributed modifications to allow the library to be
 580 built as a Windows DLL.
 581
 582
 583 @section Error handling
 584
 585 The library is designed to recover cleanly in all situations, including
 586 the worst-case situation of decompressing random data.  I'm not
 587 100% sure that it can always do this, so you might want to add
 588 a signal handler to catch segmentation violations during decompression
 589 if you are feeling especially paranoid.  I would be interested in
 590 hearing more about the robustness of the library to corrupted
 591 compressed data.
 592
 593 Version 1.0 is much more robust in this respect than
 594 0.9.0 or 0.9.5.  Investigations with Checker (a tool for
 595 detecting problems with memory management, similar to Purify)
 596 indicate that, at least for the few files I tested, all single-bit
  597 errors in the compressed data are caught properly, with no
 598 segmentation faults, no reads of uninitialised data and no
 599 out of range reads or writes.  So it's certainly much improved,
 600 although I wouldn't claim it to be totally bombproof.
 601
 602 The file @code{bzlib.h} contains all definitions needed to use
 603 the library.  In particular, you should definitely not include
 604 @code{bzlib_private.h}.
 605
 606 In @code{bzlib.h}, the various return values are defined.  The following
 607 list is not intended as an exhaustive description of the circumstances
 608 in which a given value may be returned -- those descriptions are given
 609 later.  Rather, it is intended to convey the rough meaning of each
  610 return value.  The first five values are normal and are not intended to
  611 denote an error situation.
 612 @table @code
 613 @item BZ_OK
 614 The requested action was completed successfully.
 615 @item BZ_RUN_OK
 616 @itemx BZ_FLUSH_OK
 617 @itemx BZ_FINISH_OK
 618 In @code{BZ2_bzCompress}, the requested flush/finish/nothing-special action
 619 was completed successfully.
 620 @item BZ_STREAM_END
 621 Compression of data was completed, or the logical stream end was
 622 detected during decompression.
 623 @end table
 624
 625 The following return values indicate an error of some kind.
 626 @table @code
 627 @item BZ_CONFIG_ERROR
 628 Indicates that the library has been improperly compiled on your
 629 platform -- a major configuration error.  Specifically, it means
 630 that @code{sizeof(char)}, @code{sizeof(short)} and @code{sizeof(int)}
 631 are not 1, 2 and 4 respectively, as they should be.  Note that the
 632 library should still work properly on 64-bit platforms which follow
 633 the LP64 programming model -- that is, where @code{sizeof(long)}
 634 and @code{sizeof(void*)} are 8.  Under LP64, @code{sizeof(int)} is
 635 still 4, so @code{libbzip2}, which doesn't use the @code{long} type,
 636 is OK.
 637 @item BZ_SEQUENCE_ERROR
 638 When using the library, it is important to call the functions in the
 639 correct sequence and with data structures (buffers etc) in the correct
 640 states.  @code{libbzip2} checks as much as it can to ensure this is
 641 happening, and returns @code{BZ_SEQUENCE_ERROR} if not.  Code which
 642 complies precisely with the function semantics, as detailed below,
 643 should never receive this value; such an event denotes buggy code
 644 which you should investigate.
 645 @item BZ_PARAM_ERROR
 646 Returned when a parameter to a function call is out of range
 647 or otherwise manifestly incorrect.  As with @code{BZ_SEQUENCE_ERROR},
 648 this denotes a bug in the client code.  The distinction between
 649 @code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth
 650 making.
 651 @item BZ_MEM_ERROR
 652 Returned when a request to allocate memory failed.  Note that the
 653 quantity of memory needed to decompress a stream cannot be determined
 654 until the stream's header has been read.  So @code{BZ2_bzDecompress} and
 655 @code{BZ2_bzRead} may return @code{BZ_MEM_ERROR} even though some of
 656 the compressed data has been read.  The same is not true for
compression; once @code{BZ2_bzCompressInit} or @code{BZ2_bzWriteOpen} has
successfully completed, @code{BZ_MEM_ERROR} cannot occur.
 659 @item BZ_DATA_ERROR
 660 Returned when a data integrity error is detected during decompression.
 661 Most importantly, this means when stored and computed CRCs for the
 662 data do not match.  This value is also returned upon detection of any
 663 other anomaly in the compressed data.
 664 @item BZ_DATA_ERROR_MAGIC
 665 As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to
 666 know when the compressed stream does not start with the correct
 667 magic bytes (@code{'B' 'Z' 'h'}).
 668 @item BZ_IO_ERROR
 669 Returned by @code{BZ2_bzRead} and @code{BZ2_bzWrite} when there is an error
 670 reading or writing in the compressed file, and by @code{BZ2_bzReadOpen}
 671 and @code{BZ2_bzWriteOpen} for attempts to use a file for which the
 672 error indicator (viz, @code{ferror(f)}) is set.
 673 On receipt of @code{BZ_IO_ERROR}, the caller should consult
 674 @code{errno} and/or @code{perror} to acquire operating-system
 675 specific information about the problem.
 676 @item BZ_UNEXPECTED_EOF
 677 Returned by @code{BZ2_bzRead} when the compressed file finishes
 678 before the logical end of stream is detected.
 679 @item BZ_OUTBUFF_FULL
 680 Returned by @code{BZ2_bzBuffToBuffCompress} and
 681 @code{BZ2_bzBuffToBuffDecompress} to indicate that the output data
 682 will not fit into the output buffer provided.
 683 @end table
 684
 685
 686
 687 @section Low-level interface
 688
 689 @subsection @code{BZ2_bzCompressInit}
 690 @example
 691 typedef
 692    struct @{
 693       char *next_in;
 694       unsigned int avail_in;
 695       unsigned int total_in_lo32;
 696       unsigned int total_in_hi32;
 697
 698       char *next_out;
 699       unsigned int avail_out;
 700       unsigned int total_out_lo32;
 701       unsigned int total_out_hi32;
 702
 703       void *state;
 704
 705       void *(*bzalloc)(void *,int,int);
 706       void (*bzfree)(void *,void *);
 707       void *opaque;
 708    @}
 709    bz_stream;
 710
 711 int BZ2_bzCompressInit ( bz_stream *strm,
 712                          int blockSize100k,
 713                          int verbosity,
 714                          int workFactor );
 715
 716 @end example
 717
 718 Prepares for compression.  The @code{bz_stream} structure
 719 holds all data pertaining to the compression activity.
 720 A @code{bz_stream} structure should be allocated and initialised
 721 prior to the call.
 722 The fields of @code{bz_stream}
 723 comprise the entirety of the user-visible data.  @code{state}
 724 is a pointer to the private data structures required for compression.
 725
 726 Custom memory allocators are supported, via fields @code{bzalloc},
 727 @code{bzfree},
 728 and @code{opaque}.  The value
@code{opaque} is passed as the first argument to
 730 all calls to @code{bzalloc} and @code{bzfree}, but is
 731 otherwise ignored by the library.
 732 The call @code{bzalloc ( opaque, n, m )} is expected to return a
 733 pointer @code{p} to
 734 @code{n * m} bytes of memory, and @code{bzfree ( opaque, p )}
 735 should free
 736 that memory.
 737
 738 If you don't want to use a custom memory allocator, set @code{bzalloc},
 739 @code{bzfree} and
 740 @code{opaque} to @code{NULL},
 741 and the library will then use the standard @code{malloc}/@code{free}
 742 routines.
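
As an illustration of the allocator hooks, here is a sketch of a pair of
functions which wrap @code{malloc}/@code{free} and keep simple statistics.
The type @code{alloc_stats} and both function names are hypothetical; only
the two function signatures are taken from the @code{bz_stream}
declaration above.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record; its address would be stored
   in the opaque field of the bz_stream. */
typedef struct {
    long   live_blocks;   /* blocks allocated and not yet freed */
    size_t total_bytes;   /* cumulative number of bytes requested */
} alloc_stats;

/* Same shape as bzalloc: return n * m bytes, or NULL on failure. */
void *counting_alloc(void *opaque, int n, int m)
{
    alloc_stats *st = opaque;
    size_t bytes;
    void *p;

    if (n < 0 || m < 0) return NULL;
    bytes = (size_t)n * (size_t)m;
    p = malloc(bytes);
    if (p != NULL) {
        st->live_blocks++;
        st->total_bytes += bytes;
    }
    return p;
}

/* Same shape as bzfree: release a block obtained from counting_alloc. */
void counting_free(void *opaque, void *p)
{
    alloc_stats *st = opaque;

    if (p != NULL) {
        st->live_blocks--;
        free(p);
    }
}
```

To use the pair, one would set @code{bzalloc} to @code{counting_alloc},
@code{bzfree} to @code{counting_free} and @code{opaque} to the address of
an @code{alloc_stats} record before calling @code{BZ2_bzCompressInit}.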
 743
 744 Before calling @code{BZ2_bzCompressInit}, fields @code{bzalloc},
 745 @code{bzfree} and @code{opaque} should
 746 be filled appropriately, as just described.  Upon return, the internal
 747 state will have been allocated and initialised, and @code{total_in_lo32},
 748 @code{total_in_hi32}, @code{total_out_lo32} and
 749 @code{total_out_hi32} will have been set to zero.
 750 These four fields are used by the library
 751 to inform the caller of the total amount of data passed into and out of
 752 the library, respectively.  You should not try to change them.
 753 As of version 1.0, 64-bit counts are maintained, even on 32-bit
 754 platforms, using the @code{_hi32} fields to store the upper 32 bits
of the count.  So, for example, the total amount of data in
is @code{(total_in_hi32 << 32) + total_in_lo32}, evaluated in 64-bit arithmetic.
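
In C the expression above has to be evaluated in a 64-bit type, since
shifting a 32-bit @code{unsigned int} by 32 places is undefined.  A small
sketch, using the hypothetical helper name @code{bz_total64}:

```c
#include <stdint.h>

/* Hypothetical helper: combine the two 32-bit halves of a byte count.
   The high half is widened to 64 bits before it is shifted. */
uint64_t bz_total64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

For example, @code{bz_total64(strm.total_in_hi32, strm.total_in_lo32)}
gives the number of input bytes consumed so far.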
 757
 758 Parameter @code{blockSize100k} specifies the block size to be used for
 759 compression.  It should be a value between 1 and 9 inclusive, and the
 760 actual block size used is 100000 x this figure.  9 gives the best
 761 compression but takes most memory.
 762
 763 Parameter @code{verbosity} should be set to a number between 0 and 4
 764 inclusive.  0 is silent, and greater numbers give increasingly verbose
 765 monitoring/debugging output.  If the library has been compiled with
 766 @code{-DBZ_NO_STDIO}, no such output will appear for any verbosity
 767 setting.
 768
 769 Parameter @code{workFactor} controls how the compression phase behaves
 770 when presented with worst case, highly repetitive, input data.  If
 771 compression runs into difficulties caused by repetitive data, the
 772 library switches from the standard sorting algorithm to a fallback
 773 algorithm.  The fallback is slower than the standard algorithm by
 774 perhaps a factor of three, but always behaves reasonably, no matter how
 775 bad the input.
 776
 777 Lower values of @code{workFactor} reduce the amount of effort the
 778 standard algorithm will expend before resorting to the fallback.  You
 779 should set this parameter carefully; too low, and many inputs will be
 780 handled by the fallback algorithm and so compress rather slowly, too
 781 high, and your average-to-worst case compression times can become very
 782 large.  The default value of 30 gives reasonable behaviour over a wide
 783 range of circumstances.
 784
 785 Allowable values range from 0 to 250 inclusive.  0 is a special case,
 786 equivalent to using the default value of 30.
 787
 788 Note that the compressed output generated is the same regardless of
 789 whether or not the fallback algorithm is used.
 790
 791 Be aware also that this parameter may disappear entirely in future
 792 versions of the library.  In principle it should be possible to devise a
 793 good way to automatically choose which algorithm to use.  Such a
 794 mechanism would render the parameter obsolete.
 795
 796 Possible return values:
 797 @display
 798       @code{BZ_CONFIG_ERROR}
 799          if the library has been mis-compiled
 800       @code{BZ_PARAM_ERROR}
 801          if @code{strm} is @code{NULL}
         or @code{blockSize100k} < 1 or @code{blockSize100k} > 9
 803          or @code{verbosity} < 0 or @code{verbosity} > 4
 804          or @code{workFactor} < 0 or @code{workFactor} > 250
 805       @code{BZ_MEM_ERROR}
 806          if not enough memory is available
 807       @code{BZ_OK}
 808          otherwise
 809 @end display
 810 Allowable next actions:
 811 @display
 812       @code{BZ2_bzCompress}
 813          if @code{BZ_OK} is returned
 814       no specific action needed in case of error
 815 @end display
 816
 817 @subsection @code{BZ2_bzCompress}
 818 @example
 819    int BZ2_bzCompress ( bz_stream *strm, int action );
 820 @end example
 821 Provides more input and/or output buffer space for the library.  The
 822 caller maintains input and output buffers, and calls @code{BZ2_bzCompress} to
 823 transfer data between them.
 824
 825 Before each call to @code{BZ2_bzCompress}, @code{next_in} should point at
 826 the data to be compressed, and @code{avail_in} should indicate how many
 827 bytes the library may read.  @code{BZ2_bzCompress} updates @code{next_in},
@code{avail_in} and the @code{total_in_lo32}/@code{total_in_hi32} counter to reflect the number of bytes it
 829 has read.
 830
 831 Similarly, @code{next_out} should point to a buffer in which the
 832 compressed data is to be placed, with @code{avail_out} indicating how
 833 much output space is available.  @code{BZ2_bzCompress} updates
@code{next_out}, @code{avail_out} and the @code{total_out_lo32}/@code{total_out_hi32} counter to reflect the
 835 number of bytes output.
 836
 837 You may provide and remove as little or as much data as you like on each
 838 call of @code{BZ2_bzCompress}.  In the limit, it is acceptable to supply and
 839 remove data one byte at a time, although this would be terribly
 840 inefficient.  You should always ensure that at least one byte of output
 841 space is available at each call.
 842
 843 A second purpose of @code{BZ2_bzCompress} is to request a change of mode of the
 844 compressed stream.
 845
 846 Conceptually, a compressed stream can be in one of four states: IDLE,
 847 RUNNING, FLUSHING and FINISHING.  Before initialisation
 848 (@code{BZ2_bzCompressInit}) and after termination (@code{BZ2_bzCompressEnd}), a
 849 stream is regarded as IDLE.
 850
Upon initialisation (@code{BZ2_bzCompressInit}), the stream is placed in the
RUNNING state.  While further input remains to be supplied, calls to
@code{BZ2_bzCompress} should pass @code{BZ_RUN} as the requested action;
@code{BZ_FLUSH} and @code{BZ_FINISH} request the changes of mode described below.
 855
 856 At some point, the calling program will have provided all the input data
 857 it wants to.  It will then want to finish up -- in effect, asking the
 858 library to process any data it might have buffered internally.  In this
 859 state, @code{BZ2_bzCompress} will no longer attempt to read data from
 860 @code{next_in}, but it will want to write data to @code{next_out}.
 861 Because the output buffer supplied by the user can be arbitrarily small,
 862 the finishing-up operation cannot necessarily be done with a single call
 863 of @code{BZ2_bzCompress}.
 864
 865 Instead, the calling program passes @code{BZ_FINISH} as an action to
 866 @code{BZ2_bzCompress}.  This changes the stream's state to FINISHING.  Any
 867 remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and
 868 transferred to the output buffer.  To do this, @code{BZ2_bzCompress} must be
 869 called repeatedly until all the output has been consumed.  At that
 870 point, @code{BZ2_bzCompress} returns @code{BZ_STREAM_END}, and the stream's
 871 state is set back to IDLE.  @code{BZ2_bzCompressEnd} should then be
 872 called.
 873
 874 Just to make sure the calling program does not cheat, the library makes
 875 a note of @code{avail_in} at the time of the first call to
 876 @code{BZ2_bzCompress} which has @code{BZ_FINISH} as an action (ie, at the
 877 time the program has announced its intention to not supply any more
 878 input).  By comparing this value with that of @code{avail_in} over
 879 subsequent calls to @code{BZ2_bzCompress}, the library can detect any
 880 attempts to slip in more data to compress.  Any calls for which this is
 881 detected will return @code{BZ_SEQUENCE_ERROR}.  This indicates a
 882 programming mistake which should be corrected.
 883
 884 Instead of asking to finish, the calling program may ask
 885 @code{BZ2_bzCompress} to take all the remaining input, compress it and
 886 terminate the current (Burrows-Wheeler) compression block.  This could
 887 be useful for error control purposes.  The mechanism is analogous to
 888 that for finishing: call @code{BZ2_bzCompress} with an action of
 889 @code{BZ_FLUSH}, remove output data, and persist with the
@code{BZ_FLUSH} action until the value @code{BZ_RUN_OK} is returned.  As
 891 with finishing, @code{BZ2_bzCompress} detects any attempt to provide more
 892 input data once the flush has begun.
 893
 894 Once the flush is complete, the stream returns to the normal RUNNING
 895 state.
 896
 897 This all sounds pretty complex, but isn't really.  Here's a table
 898 which shows which actions are allowable in each state, what action
 899 will be taken, what the next state is, and what the non-error return
 900 values are.  Note that you can't explicitly ask what state the
 901 stream is in, but nor do you need to -- it can be inferred from the
 902 values returned by @code{BZ2_bzCompress}.
 903 @display
 904 IDLE/@code{any}
 905       Illegal.  IDLE state only exists after @code{BZ2_bzCompressEnd} or
 906       before @code{BZ2_bzCompressInit}.
 907       Return value = @code{BZ_SEQUENCE_ERROR}
 908
 909 RUNNING/@code{BZ_RUN}
 910       Compress from @code{next_in} to @code{next_out} as much as possible.
 911       Next state = RUNNING
 912       Return value = @code{BZ_RUN_OK}
 913
 914 RUNNING/@code{BZ_FLUSH}
 915       Remember current value of @code{next_in}.  Compress from @code{next_in}
 916       to @code{next_out} as much as possible, but do not accept any more input.
 917       Next state = FLUSHING
 918       Return value = @code{BZ_FLUSH_OK}
 919
 920 RUNNING/@code{BZ_FINISH}
 921       Remember current value of @code{next_in}.  Compress from @code{next_in}
 922       to @code{next_out} as much as possible, but do not accept any more input.
 923       Next state = FINISHING
 924       Return value = @code{BZ_FINISH_OK}
 925
 926 FLUSHING/@code{BZ_FLUSH}
 927       Compress from @code{next_in} to @code{next_out} as much as possible,
 928       but do not accept any more input.
 929       If all the existing input has been used up and all compressed
 930       output has been removed
 931          Next state = RUNNING; Return value = @code{BZ_RUN_OK}
 932       else
 933          Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK}
 934
 935 FLUSHING/other
 936       Illegal.
 937       Return value = @code{BZ_SEQUENCE_ERROR}
 938
 939 FINISHING/@code{BZ_FINISH}
 940       Compress from @code{next_in} to @code{next_out} as much as possible,
      but do not accept any more input.
 942       If all the existing input has been used up and all compressed
 943       output has been removed
 944          Next state = IDLE; Return value = @code{BZ_STREAM_END}
 945       else
         Next state = FINISHING; Return value = @code{BZ_FINISH_OK}
 947
 948 FINISHING/other
 949       Illegal.
 950       Return value = @code{BZ_SEQUENCE_ERROR}
 951 @end display
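
The table can also be read as a small state machine.  The following is a
model of the table only, not the library's implementation.  All type and
function names are hypothetical, and the flag @code{drained} stands for
the condition that all the existing input has been used up and all
compressed output has been removed.

```c
/* Model only: none of these names exist in libbzip2. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } model_state;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } model_action;
typedef enum {
    RET_SEQUENCE_ERROR, RET_RUN_OK, RET_FLUSH_OK,
    RET_FINISH_OK, RET_STREAM_END
} model_result;

/* Apply one action to the modelled stream.  On an illegal action the
   state is left unchanged and RET_SEQUENCE_ERROR is returned. */
model_result model_step(model_state *st, model_action act, int drained)
{
    switch (*st) {
    case ST_RUNNING:
        if (act == ACT_RUN)    { return RET_RUN_OK; }
        if (act == ACT_FLUSH)  { *st = ST_FLUSHING;  return RET_FLUSH_OK; }
        if (act == ACT_FINISH) { *st = ST_FINISHING; return RET_FINISH_OK; }
        break;
    case ST_FLUSHING:
        if (act != ACT_FLUSH) break;
        if (drained) { *st = ST_RUNNING; return RET_RUN_OK; }
        return RET_FLUSH_OK;
    case ST_FINISHING:
        if (act != ACT_FINISH) break;
        if (drained) { *st = ST_IDLE; return RET_STREAM_END; }
        return RET_FINISH_OK;
    case ST_IDLE:
        break;                  /* every action is illegal when IDLE */
    }
    return RET_SEQUENCE_ERROR;
}
```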
 952
 953 That still looks complicated?  Well, fair enough.  The usual sequence
 954 of calls for compressing a load of data is:
 955 @itemize @bullet
 956 @item Get started with @code{BZ2_bzCompressInit}.
 957 @item Shovel data in and shlurp out its compressed form using zero or more
 958 calls of @code{BZ2_bzCompress} with action = @code{BZ_RUN}.
 959 @item Finish up.
 960 Repeatedly call @code{BZ2_bzCompress} with action = @code{BZ_FINISH},
 961 copying out the compressed output, until @code{BZ_STREAM_END} is returned.
 962 @item Close up and go home.  Call @code{BZ2_bzCompressEnd}.
 963 @end itemize
 964 If the data you want to compress fits into your input buffer all
 965 at once, you can skip the calls of @code{BZ2_bzCompress ( ..., BZ_RUN )} and
 966 just do the @code{BZ2_bzCompress ( ..., BZ_FINISH )} calls.
 967
 968 All required memory is allocated by @code{BZ2_bzCompressInit}.  The
 969 compression library can accept any data at all (obviously).  So you
 970 shouldn't get any error return values from the @code{BZ2_bzCompress} calls.
 971 If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in
 972 your programming.
 973
 974 Trivial other possible return values:
 975 @display
 976       @code{BZ_PARAM_ERROR}
         if @code{strm} is @code{NULL}, or @code{strm->state} is @code{NULL}
 978 @end display
 979
 980 @subsection @code{BZ2_bzCompressEnd}
 981 @example
 982 int BZ2_bzCompressEnd ( bz_stream *strm );
 983 @end example
 984 Releases all memory associated with a compression stream.
 985
 986 Possible return values:
 987 @display
   @code{BZ_PARAM_ERROR}    if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL}
 989    @code{BZ_OK}    otherwise
 990 @end display
 991
 992
 993 @subsection @code{BZ2_bzDecompressInit}
 994 @example
 995 int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
 996 @end example
 997 Prepares for decompression.  As with @code{BZ2_bzCompressInit}, a
 998 @code{bz_stream} record should be allocated and initialised before the
 999 call.  Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be
1000 set if a custom memory allocator is required, or made @code{NULL} for
1001 the normal @code{malloc}/@code{free} routines.  Upon return, the internal
state will have been initialised, and @code{total_in_lo32},
@code{total_in_hi32}, @code{total_out_lo32} and @code{total_out_hi32} will be zero.
1004
1005 For the meaning of parameter @code{verbosity}, see @code{BZ2_bzCompressInit}.
1006
1007 If @code{small} is nonzero, the library will use an alternative
1008 decompression algorithm which uses less memory but at the cost of
1009 decompressing more slowly (roughly speaking, half the speed, but the
1010 maximum memory requirement drops to around 2300k).  See Chapter 2 for
1011 more information on memory management.
1012
1013 Note that the amount of memory needed to decompress
1014 a stream cannot be determined until the stream's header has been read,
1015 so even if @code{BZ2_bzDecompressInit} succeeds, a subsequent
1016 @code{BZ2_bzDecompress} could fail with @code{BZ_MEM_ERROR}.
1017
1018 Possible return values:
1019 @display
1020       @code{BZ_CONFIG_ERROR}
1021          if the library has been mis-compiled
1022       @code{BZ_PARAM_ERROR}
1023          if @code{(small != 0 && small != 1)}
1024          or @code{(verbosity < 0 || verbosity > 4)}
      @code{BZ_MEM_ERROR}
         if insufficient memory is available
      @code{BZ_OK}
         otherwise
1027 @end display
1028
1029 Allowable next actions:
1030 @display
1031       @code{BZ2_bzDecompress}
1032          if @code{BZ_OK} was returned
1033       no specific action required in case of error
1034 @end display
1035
1036
1037
1038 @subsection @code{BZ2_bzDecompress}
1039 @example
1040 int BZ2_bzDecompress ( bz_stream *strm );
1041 @end example
Provides more input and/or output buffer space for the library.  The
1043 caller maintains input and output buffers, and uses @code{BZ2_bzDecompress}
1044 to transfer data between them.
1045
1046 Before each call to @code{BZ2_bzDecompress}, @code{next_in}
1047 should point at the compressed data,
1048 and @code{avail_in} should indicate how many bytes the library
1049 may read.  @code{BZ2_bzDecompress} updates @code{next_in}, @code{avail_in}
and the @code{total_in_lo32}/@code{total_in_hi32} counter
1051 to reflect the number of bytes it has read.
1052
1053 Similarly, @code{next_out} should point to a buffer in which the uncompressed
1054 output is to be placed, with @code{avail_out} indicating how much output space
is available.  @code{BZ2_bzDecompress} updates @code{next_out},
@code{avail_out} and the @code{total_out_lo32}/@code{total_out_hi32} counter to reflect
the number of bytes output.
1058
1059 You may provide and remove as little or as much data as you like on
1060 each call of @code{BZ2_bzDecompress}.
1061 In the limit, it is acceptable to
1062 supply and remove data one byte at a time, although this would be
1063 terribly inefficient.  You should always ensure that at least one
1064 byte of output space is available at each call.
1065
1066 Use of @code{BZ2_bzDecompress} is simpler than @code{BZ2_bzCompress}.
1067
1068 You should provide input and remove output as described above, and
1069 repeatedly call @code{BZ2_bzDecompress} until @code{BZ_STREAM_END} is
1070 returned.  Appearance of @code{BZ_STREAM_END} denotes that
1071 @code{BZ2_bzDecompress} has detected the logical end of the compressed
1072 stream.  @code{BZ2_bzDecompress} will not produce @code{BZ_STREAM_END} until
1073 all output data has been placed into the output buffer, so once
1074 @code{BZ_STREAM_END} appears, you are guaranteed to have available all
1075 the decompressed output, and @code{BZ2_bzDecompressEnd} can safely be
1076 called.
1077
In case of an error return value, you should call @code{BZ2_bzDecompressEnd}
1079 to clean up and release memory.
1080
1081 Possible return values:
1082 @display
1083       @code{BZ_PARAM_ERROR}
         if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL}
1085          or @code{strm->avail_out < 1}
1086       @code{BZ_DATA_ERROR}
1087          if a data integrity error is detected in the compressed stream
1088       @code{BZ_DATA_ERROR_MAGIC}
1089          if the compressed stream doesn't begin with the right magic bytes
1090       @code{BZ_MEM_ERROR}
1091          if there wasn't enough memory available
1092       @code{BZ_STREAM_END}
1093          if the logical end of the data stream was detected and all
         output has been consumed, eg @code{strm->avail_out > 0}
1095       @code{BZ_OK}
1096          otherwise
1097 @end display
1098 Allowable next actions:
1099 @display
1100       @code{BZ2_bzDecompress}
1101          if @code{BZ_OK} was returned
1102       @code{BZ2_bzDecompressEnd}
1103          otherwise
1104 @end display
1105
1106
1107 @subsection @code{BZ2_bzDecompressEnd}
1108 @example
1109 int BZ2_bzDecompressEnd ( bz_stream *strm );
1110 @end example
1111 Releases all memory associated with a decompression stream.
1112
1113 Possible return values:
1114 @display
1115       @code{BZ_PARAM_ERROR}
         if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL}
1117       @code{BZ_OK}
1118          otherwise
1119 @end display
1120
1121 Allowable next actions:
1122 @display
1123       None.
1124 @end display
1125
1126
1127 @section High-level interface
1128
1129 This interface provides functions for reading and writing
1130 @code{bzip2} format files.  First, some general points.
1131
1132 @itemize @bullet
1133 @item All of the functions take an @code{int*} first argument,
1134   @code{bzerror}.
1135   After each call, @code{bzerror} should be consulted first to determine
1136   the outcome of the call.  If @code{bzerror} is @code{BZ_OK},
1137   the call completed
1138   successfully, and only then should the return value of the function
1139   (if any) be consulted.  If @code{bzerror} is @code{BZ_IO_ERROR},
1140   there was an error
1141   reading/writing the underlying compressed file, and you should
1142   then consult @code{errno}/@code{perror} to determine the
1143   cause of the difficulty.
1144   @code{bzerror} may also be set to various other values; precise details are
1145   given on a per-function basis below.
1146 @item If @code{bzerror} indicates an error
1147   (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}),
1148   you should immediately call @code{BZ2_bzReadClose} (or @code{BZ2_bzWriteClose},
1149   depending on whether you are attempting to read or to write)
1150   to free up all resources associated
1151   with the stream.  Once an error has been indicated, behaviour of all calls
1152   except @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) is undefined.
1153   The implication is that (1) @code{bzerror} should
1154   be checked after each call, and (2) if @code{bzerror} indicates an error,
1155   @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) should then be called to clean up.
1156 @item The @code{FILE*} arguments passed to
1157    @code{BZ2_bzReadOpen}/@code{BZ2_bzWriteOpen}
1158   should be set to binary mode.
1159   Most Unix systems will do this by default, but other platforms,
1160   including Windows and Mac, will not.  If you omit this, you may
1161   encounter problems when moving code to new platforms.
1162 @item Memory allocation requests are handled by
1163   @code{malloc}/@code{free}.
1164   At present
1165   there is no facility for user-defined memory allocators in the file I/O
1166   functions (could easily be added, though).
1167 @end itemize
1168
1169
1170
1171 @subsection @code{BZ2_bzReadOpen}
1172 @example
1173    typedef void BZFILE;
1174
1175    BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f,
                            int verbosity, int small,
1177                             void *unused, int nUnused );
1178 @end example
1179 Prepare to read compressed data from file handle @code{f}.  @code{f}
1180 should refer to a file which has been opened for reading, and for which
the error indicator (@code{ferror(f)}) is not set.  If @code{small} is 1,
1182 the library will try to decompress using less memory, at the expense of
1183 speed.
1184
1185 For reasons explained below, @code{BZ2_bzRead} will decompress the
1186 @code{nUnused} bytes starting at @code{unused}, before starting to read
1187 from the file @code{f}.  At most @code{BZ_MAX_UNUSED} bytes may be
1188 supplied like this.  If this facility is not required, you should pass
@code{NULL} and @code{0} for @code{unused} and @code{nUnused}
1190 respectively.
1191
1192 For the meaning of parameters @code{small} and @code{verbosity},
1193 see @code{BZ2_bzDecompressInit}.
1194
1195 The amount of memory needed to decompress a file cannot be determined
1196 until the file's header has been read.  So it is possible that
1197 @code{BZ2_bzReadOpen} returns @code{BZ_OK} but a subsequent call of
1198 @code{BZ2_bzRead} will return @code{BZ_MEM_ERROR}.
1199
1200 Possible assignments to @code{bzerror}:
1201 @display
1202       @code{BZ_CONFIG_ERROR}
1203          if the library has been mis-compiled
1204       @code{BZ_PARAM_ERROR}
1205          if @code{f} is @code{NULL}
1206          or @code{small} is neither @code{0} nor @code{1}
1207          or @code{(unused == NULL && nUnused != 0)}
1208          or @code{(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))}
1209       @code{BZ_IO_ERROR}
1210          if @code{ferror(f)} is nonzero
1211       @code{BZ_MEM_ERROR}
1212          if insufficient memory is available
1213       @code{BZ_OK}
1214          otherwise.
1215 @end display
1216
1217 Possible return values:
1218 @display
1219       Pointer to an abstract @code{BZFILE}
1220          if @code{bzerror} is @code{BZ_OK}
1221       @code{NULL}
1222          otherwise
1223 @end display
1224
1225 Allowable next actions:
1226 @display
1227       @code{BZ2_bzRead}
1228          if @code{bzerror} is @code{BZ_OK}
      @code{BZ2_bzReadClose}
1230          otherwise
1231 @end display
1232
1233
1234 @subsection @code{BZ2_bzRead}
1235 @example
1236    int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
1237 @end example
1238 Reads up to @code{len} (uncompressed) bytes from the compressed file
1239 @code{b} into
1240 the buffer @code{buf}.  If the read was successful,
1241 @code{bzerror} is set to @code{BZ_OK}
1242 and the number of bytes read is returned.  If the logical end-of-stream
1243 was detected, @code{bzerror} will be set to @code{BZ_STREAM_END},
1244 and the number
1245 of bytes read is returned.  All other @code{bzerror} values denote an error.
1246
1247 @code{BZ2_bzRead} will supply @code{len} bytes,
1248 unless the logical stream end is detected
1249 or an error occurs.  Because of this, it is possible to detect the
1250 stream end by observing when the number of bytes returned is
1251 less than the number
1252 requested.  Nevertheless, this is regarded as inadvisable; you should
1253 instead check @code{bzerror} after every call and watch out for
1254 @code{BZ_STREAM_END}.
1255
1256 Internally, @code{BZ2_bzRead} copies data from the compressed file in chunks
1257 of size @code{BZ_MAX_UNUSED} bytes
1258 before decompressing it.  If the file contains more bytes than strictly
1259 needed to reach the logical end-of-stream, @code{BZ2_bzRead} will almost certainly
read some of the trailing data before signalling @code{BZ_STREAM_END}.
To collect the read but unused data once @code{BZ_STREAM_END} has
appeared, call @code{BZ2_bzReadGetUnused} immediately before @code{BZ2_bzReadClose}.
1263
1264 Possible assignments to @code{bzerror}:
1265 @display
1266       @code{BZ_PARAM_ERROR}
1267          if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
1268       @code{BZ_SEQUENCE_ERROR}
1269          if @code{b} was opened with @code{BZ2_bzWriteOpen}
1270       @code{BZ_IO_ERROR}
1271          if there is an error reading from the compressed file
1272       @code{BZ_UNEXPECTED_EOF}
1273          if the compressed file ended before the logical end-of-stream was detected
1274       @code{BZ_DATA_ERROR}
1275          if a data integrity error was detected in the compressed stream
1276       @code{BZ_DATA_ERROR_MAGIC}
1277          if the stream does not begin with the requisite header bytes (ie, is not
1278          a @code{bzip2} data file).  This is really a special case of @code{BZ_DATA_ERROR}.
1279       @code{BZ_MEM_ERROR}
1280          if insufficient memory was available
1281       @code{BZ_STREAM_END}
1282          if the logical end of stream was detected.
1283       @code{BZ_OK}
1284          otherwise.
1285 @end display
1286
1287 Possible return values:
1288 @display
1289       number of bytes read
1290          if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END}
1291       undefined
1292          otherwise
1293 @end display
1294
1295 Allowable next actions:
1296 @display
1297       collect data from @code{buf}, then @code{BZ2_bzRead} or @code{BZ2_bzReadClose}
1298          if @code{bzerror} is @code{BZ_OK}
1299       collect data from @code{buf}, then @code{BZ2_bzReadClose} or @code{BZ2_bzReadGetUnused}
         if @code{bzerror} is @code{BZ_STREAM_END}
1301       @code{BZ2_bzReadClose}
1302          otherwise
1303 @end display
1304
1305
1306
1307 @subsection @code{BZ2_bzReadGetUnused}
1308 @example
1309    void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b,
1310                               void** unused, int* nUnused );
1311 @end example
1312 Returns data which was read from the compressed file but was not needed
1313 to get to the logical end-of-stream.  @code{*unused} is set to the address
1314 of the data, and @code{*nUnused} to the number of bytes.  @code{*nUnused} will
1315 be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive.
1316
1317 This function may only be called once @code{BZ2_bzRead} has signalled
1318 @code{BZ_STREAM_END} but before @code{BZ2_bzReadClose}.
1319
1320 Possible assignments to @code{bzerror}:
1321 @display
1322       @code{BZ_PARAM_ERROR}
1323          if @code{b} is @code{NULL}
1324          or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL}
1325       @code{BZ_SEQUENCE_ERROR}
1326          if @code{BZ_STREAM_END} has not been signalled
1327          or if @code{b} was opened with @code{BZ2_bzWriteOpen}
1328      @code{BZ_OK}
1329          otherwise
1330 @end display
1331
1332 Allowable next actions:
1333 @display
1334       @code{BZ2_bzReadClose}
1335 @end display
1336
1337
1338 @subsection @code{BZ2_bzReadClose}
1339 @example
1340    void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
1341 @end example
1342 Releases all memory pertaining to the compressed file @code{b}.
1343 @code{BZ2_bzReadClose} does not call @code{fclose} on the underlying file
1344 handle, so you should do that yourself if appropriate.
1345 @code{BZ2_bzReadClose} should be called to clean up after all error
1346 situations.
1347
1348 Possible assignments to @code{bzerror}:
1349 @display
1350       @code{BZ_SEQUENCE_ERROR}
         if @code{b} was opened with @code{BZ2_bzWriteOpen}
1352       @code{BZ_OK}
1353          otherwise
1354 @end display
1355
1356 Allowable next actions:
1357 @display
1358       none
1359 @end display
1360
1361
1362
1363 @subsection @code{BZ2_bzWriteOpen}
1364 @example
1365    BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f,
1366                              int blockSize100k, int verbosity,
1367                              int workFactor );
1368 @end example
1369 Prepare to write compressed data to file handle @code{f}.
1370 @code{f} should refer to
1371 a file which has been opened for writing, and for which the error
indicator (@code{ferror(f)}) is not set.
1373
1374 For the meaning of parameters @code{blockSize100k},
1375 @code{verbosity} and @code{workFactor}, see
1376 @* @code{BZ2_bzCompressInit}.
1377
1378 All required memory is allocated at this stage, so if the call
1379 completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a
1380 subsequent call to @code{BZ2_bzWrite}.
1381
1382 Possible assignments to @code{bzerror}:
1383 @display
1384       @code{BZ_CONFIG_ERROR}
1385          if the library has been mis-compiled
1386       @code{BZ_PARAM_ERROR}
1387          if @code{f} is @code{NULL}
1388          or @code{blockSize100k < 1} or @code{blockSize100k > 9}
1389       @code{BZ_IO_ERROR}
1390          if @code{ferror(f)} is nonzero
1391       @code{BZ_MEM_ERROR}
1392          if insufficient memory is available
1393       @code{BZ_OK}
1394          otherwise
1395 @end display
1396
1397 Possible return values:
1398 @display
1399       Pointer to an abstract @code{BZFILE}
1400          if @code{bzerror} is @code{BZ_OK}
1401       @code{NULL}
1402          otherwise
1403 @end display
1404
1405 Allowable next actions:
1406 @display
1407       @code{BZ2_bzWrite}
1408          if @code{bzerror} is @code{BZ_OK}
1409          (you could go directly to @code{BZ2_bzWriteClose}, but this would be pretty pointless)
1410       @code{BZ2_bzWriteClose}
1411          otherwise
1412 @end display
1413
1414
1415
1416 @subsection @code{BZ2_bzWrite}
1417 @example
1418    void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
1419 @end example
1420 Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be
1421 compressed and written to the file.
1422
1423 Possible assignments to @code{bzerror}:
1424 @display
1425       @code{BZ_PARAM_ERROR}
1426          if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
1427       @code{BZ_SEQUENCE_ERROR}
         if @code{b} was opened with @code{BZ2_bzReadOpen}
      @code{BZ_IO_ERROR}
         if there is an error writing the compressed file
1431       @code{BZ_OK}
1432          otherwise
1433 @end display
1434
1435
1436
1437
1438 @subsection @code{BZ2_bzWriteClose}
1439 @example
   void BZ2_bzWriteClose ( int *bzerror, BZFILE* b,
                           int abandon,
                           unsigned int* nbytes_in,
                           unsigned int* nbytes_out );

   void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b,
                             int abandon,
                             unsigned int* nbytes_in_lo32,
                             unsigned int* nbytes_in_hi32,
                             unsigned int* nbytes_out_lo32,
                             unsigned int* nbytes_out_hi32 );
1451 @end example
1452
1453 Compresses and flushes to the compressed file all data so far supplied
1454 by @code{BZ2_bzWrite}.  The logical end-of-stream markers are also written, so
1455 subsequent calls to @code{BZ2_bzWrite} are illegal.  All memory associated
1456 with the compressed file @code{b} is released.
1457 @code{fflush} is called on the
1458 compressed file, but it is not @code{fclose}'d.
1459
1460 If @code{BZ2_bzWriteClose} is called to clean up after an error, the only
1461 action is to release the memory.  The library records the error codes
1462 issued by previous calls, so this situation will be detected
1463 automatically.  There is no attempt to complete the compression
1464 operation, nor to @code{fflush} the compressed file.  You can force this
1465 behaviour to happen even in the case of no error, by passing a nonzero
1466 value to @code{abandon}.
1467
1468 If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to be the
1469 total volume of uncompressed data handled.  Similarly, @code{nbytes_out}
1470 will be set to the total volume of compressed data written.  For
1471 compatibility with older versions of the library, @code{BZ2_bzWriteClose}
1472 only yields the lower 32 bits of these counts.  Use
@code{BZ2_bzWriteClose64} if you want the full 64-bit counts.  These
1474 two functions are otherwise absolutely identical.
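
As a minimal sketch of how the two halves reported by
@code{BZ2_bzWriteClose64} might be joined into a single 64-bit count:
the helper name @code{combine_count} is an assumption made for
illustration and is not part of the library.

```c
#include <stdint.h>

/* Join the low and high 32-bit halves, as reported through the
   nbytes_*_lo32 / nbytes_*_hi32 arguments of BZ2_bzWriteClose64,
   into one 64-bit value.  combine_count is a hypothetical helper,
   not part of libbzip2. */
uint64_t combine_count ( unsigned int lo32, unsigned int hi32 )
{
   /* Mask both halves in case unsigned int is wider than 32 bits. */
   uint64_t lo = (uint64_t)lo32 & 0xFFFFFFFFu;
   uint64_t hi = (uint64_t)hi32 & 0xFFFFFFFFu;
   return (hi << 32) | lo;
}
```

You would call it once for the input pair and once for the output pair
after @code{BZ2_bzWriteClose64} returns.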
1475
1476
1477 Possible assignments to @code{bzerror}:
1478 @display
1479       @code{BZ_SEQUENCE_ERROR}
1480          if @code{b} was opened with @code{BZ2_bzReadOpen}
1481       @code{BZ_IO_ERROR}
1482          if there is an error writing the compressed file
1483       @code{BZ_OK}
1484          otherwise
1485 @end display
1486
1487 @subsection Handling embedded compressed data streams
1488
1489 The high-level library facilitates use of
1490 @code{bzip2} data streams which form some part of a surrounding, larger
1491 data stream.
1492 @itemize @bullet
1493 @item For writing, the library takes an open file handle, writes
1494 compressed data to it, @code{fflush}es it but does not @code{fclose} it.
1495 The calling application can write its own data before and after the
1496 compressed data stream, using that same file handle.
1497 @item Reading is more complex, and the facilities are not as general
1498 as they could be since generality is hard to reconcile with efficiency.
1499 @code{BZ2_bzRead} reads from the compressed file in blocks of size
1500 @code{BZ_MAX_UNUSED} bytes, and in doing so probably will overshoot
1501 the logical end of compressed stream.
1502 To recover this data once decompression has
1503 ended, call @code{BZ2_bzReadGetUnused} after the last call of @code{BZ2_bzRead}
1504 (the one returning @code{BZ_STREAM_END}) but before calling
1505 @code{BZ2_bzReadClose}.
1506 @end itemize
1507
1508 This mechanism makes it easy to decompress multiple @code{bzip2}
streams placed end-to-end.  At the end of one stream, when @code{BZ2_bzRead}
1510 returns @code{BZ_STREAM_END}, call @code{BZ2_bzReadGetUnused} to collect the
1511 unused data (copy it into your own buffer somewhere).
1512 That data forms the start of the next compressed stream.
1513 To start uncompressing that next stream, call @code{BZ2_bzReadOpen} again,
1514 feeding in the unused data via the @code{unused}/@code{nUnused}
1515 parameters.
Keep doing this until a @code{BZ_STREAM_END} return coincides with the
1517 physical end of file (@code{feof(f)}).  In this situation
1518 @code{BZ2_bzReadGetUnused}
1519 will of course return no data.
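
The "copy it into your own buffer" step could look roughly like the
sketch below.  It assumes the unused data must be saved before
@code{BZ2_bzReadClose} is called, presumably because the pointer handed
back by @code{BZ2_bzReadGetUnused} refers to storage owned by the
@code{BZFILE}.  The helper @code{save_unused} and the constant
@code{MAX_UNUSED} are made up for illustration; @code{MAX_UNUSED} is
assumed to equal @code{BZ_MAX_UNUSED} from @code{bzlib.h}.

```c
#include <string.h>

#define MAX_UNUSED 5000   /* assumed to equal BZ_MAX_UNUSED in bzlib.h */

/* Copy the nUnused bytes reported by BZ2_bzReadGetUnused into saved[],
   which must hold at least MAX_UNUSED bytes.  Returns the number of
   bytes copied, or -1 if the arguments are out of range.
   save_unused is a hypothetical helper, not part of libbzip2. */
int save_unused ( char *saved, const void *unused, int nUnused )
{
   if (saved == NULL || nUnused < 0 || nUnused > MAX_UNUSED) return -1;
   if (nUnused > 0 && unused == NULL) return -1;
   if (nUnused > 0) memcpy ( saved, unused, (size_t)nUnused );
   return nUnused;
}
```

The saved bytes and their count are then what you would pass as
@code{unused} and @code{nUnused} to the next @code{BZ2_bzReadOpen}.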
1520
1521 This should give some feel for how the high-level interface can be used.
1522 If you require extra flexibility, you'll have to bite the bullet and get
1523 to grips with the low-level interface.
1524
1525 @subsection Standard file-reading/writing code
1526 Here's how you'd write data to a compressed file:
@example
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );
if (!f) @{
   /* handle error */
@}
/* blockSize100k = 9, verbosity = 0, workFactor = 30 */
b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 30 );
if (bzerror != BZ_OK) @{
   /* abandon = 1: just release the memory */
   BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
   /* handle error */
@}

while ( /* condition */ ) @{
   /* get data to write into buf, and set nBuf appropriately */
   BZ2_bzWrite ( &bzerror, b, buf, nBuf );
   if (bzerror == BZ_IO_ERROR) @{
      BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
      /* handle error */
   @}
@}

/* abandon = 0: finish the stream; byte counts not wanted */
BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) @{
   /* handle error */
@}
@end example
1559 And to read from a compressed file:
1560 @example
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );
if (!f) @{
   /* handle error */
@}
/* verbosity = 0, small = 0, no unused data carried over */
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if (bzerror != BZ_OK) @{
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
@}

bzerror = BZ_OK;
while (bzerror == BZ_OK && /* arbitrary other conditions */) @{
   nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
   /* the final call, returning BZ_STREAM_END, may also yield data */
   if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) @{
      /* do something with buf[0 .. nBuf-1] */
   @}
@}
if (bzerror != BZ_STREAM_END) @{
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
@} else @{
   BZ2_bzReadClose ( &bzerror, b );
@}
1591 @end example
1592
1593
1594
1595 @section Utility functions
1596 @subsection @code{BZ2_bzBuffToBuffCompress}
1597 @example
1598    int BZ2_bzBuffToBuffCompress( char*         dest,
1599                                  unsigned int* destLen,
1600                                  char*         source,
1601                                  unsigned int  sourceLen,
1602                                  int           blockSize100k,
1603                                  int           verbosity,
1604                                  int           workFactor );
1605 @end example
1606 Attempts to compress the data in @code{source[0 .. sourceLen-1]}
1607 into the destination buffer, @code{dest[0 .. *destLen-1]}.
1608 If the destination buffer is big enough, @code{*destLen} is
1609 set to the size of the compressed data, and @code{BZ_OK} is
1610 returned.  If the compressed data won't fit, @code{*destLen}
1611 is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
1612
1613 Compression in this manner is a one-shot event, done with a single call
1614 to this function.  The resulting compressed data is a complete
1615 @code{bzip2} format data stream.  There is no mechanism for making
1616 additional calls to provide extra input data.  If you want that kind of
1617 mechanism, use the low-level interface.
1618
1619 For the meaning of parameters @code{blockSize100k}, @code{verbosity}
1620 and @code{workFactor}, @* see @code{BZ2_bzCompressInit}.
1621
1622 To guarantee that the compressed data will fit in its buffer, allocate
1623 an output buffer of size 1% larger than the uncompressed data, plus
1624 six hundred extra bytes.
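
That sizing rule can be written down as a small helper.  This is only a
sketch: the name @code{compress_bound} is made up here, and the 1% is
rounded up to stay on the safe side.

```c
/* Worst-case output buffer size following the rule above: the input
   size, plus 1% (rounded up), plus 600 bytes.  The sum is computed in
   unsigned long long so that it cannot wrap for large inputs.
   compress_bound is a hypothetical helper, not part of libbzip2. */
unsigned long long compress_bound ( unsigned int sourceLen )
{
   unsigned long long n = sourceLen;
   return n + (n + 99) / 100 + 600;
}
```

Since @code{*destLen} is an @code{unsigned int}, check that the result
fits in one before using it as a buffer length.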
1625
@code{BZ2_bzBuffToBuffCompress} will not write data at or
1627 beyond @code{dest[*destLen]}, even in case of buffer overflow.
1628
1629 Possible return values:
1630 @display
1631       @code{BZ_CONFIG_ERROR}
1632          if the library has been mis-compiled
1633       @code{BZ_PARAM_ERROR}
1634          if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
1635          or @code{blockSize100k < 1} or @code{blockSize100k > 9}
1636          or @code{verbosity < 0} or @code{verbosity > 4}
1637          or @code{workFactor < 0} or @code{workFactor > 250}
1638       @code{BZ_MEM_ERROR}
1639          if insufficient memory is available
1640       @code{BZ_OUTBUFF_FULL}
1641          if the size of the compressed data exceeds @code{*destLen}
1642       @code{BZ_OK}
1643          otherwise
1644 @end display
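
If you would rather reject bad settings before calling the library, the
ranges listed under @code{BZ_PARAM_ERROR} above can be checked up
front.  A minimal sketch, in which the function name
@code{compress_params_ok} is an assumption for illustration:

```c
/* Returns 1 if the three tuning parameters lie within the ranges
   documented for BZ_PARAM_ERROR, and 0 otherwise.
   compress_params_ok is a hypothetical helper, not part of libbzip2. */
int compress_params_ok ( int blockSize100k, int verbosity, int workFactor )
{
   return blockSize100k >= 1 && blockSize100k <= 9
       && verbosity     >= 0 && verbosity     <= 4
       && workFactor    >= 0 && workFactor    <= 250;
}
```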
1645
1646
1647
1648 @subsection @code{BZ2_bzBuffToBuffDecompress}
1649 @example
1650    int BZ2_bzBuffToBuffDecompress ( char*         dest,
1651                                     unsigned int* destLen,
1652                                     char*         source,
1653                                     unsigned int  sourceLen,
1654                                     int           small,
1655                                     int           verbosity );
1656 @end example
1657 Attempts to decompress the data in @code{source[0 .. sourceLen-1]}
1658 into the destination buffer, @code{dest[0 .. *destLen-1]}.
1659 If the destination buffer is big enough, @code{*destLen} is
1660 set to the size of the uncompressed data, and @code{BZ_OK} is
returned.  If the uncompressed data won't fit, @code{*destLen}
1662 is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
1663
1664 @code{source} is assumed to hold a complete @code{bzip2} format
1665 data stream.  @* @code{BZ2_bzBuffToBuffDecompress} tries to decompress
1666 the entirety of the stream into the output buffer.
1667
1668 For the meaning of parameters @code{small} and @code{verbosity},
1669 see @code{BZ2_bzDecompressInit}.
1670
1671 Because the compression ratio of the compressed data cannot be known in
1672 advance, there is no easy way to guarantee that the output buffer will
1673 be big enough.  You may of course make arrangements in your code to
1674 record the size of the uncompressed data, but such a mechanism is beyond
1675 the scope of this library.
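
One such arrangement, sketched here purely as an illustration, is to
store the uncompressed length in a small header in front of the
compressed data.  The 4-byte little-endian layout and the names
@code{put_len_header} and @code{get_len_header} are assumptions of this
sketch, not anything the library defines.

```c
#include <stdint.h>

#define LEN_HEADER_SIZE 4   /* hypothetical header: 4 bytes, little-endian */

/* Write len into hdr[0..3], least significant byte first. */
void put_len_header ( unsigned char *hdr, uint32_t len )
{
   hdr[0] = (unsigned char)( len        & 0xFF);
   hdr[1] = (unsigned char)((len >> 8)  & 0xFF);
   hdr[2] = (unsigned char)((len >> 16) & 0xFF);
   hdr[3] = (unsigned char)((len >> 24) & 0xFF);
}

/* Read back the length stored by put_len_header. */
uint32_t get_len_header ( const unsigned char *hdr )
{
   return  (uint32_t)hdr[0]
        | ((uint32_t)hdr[1] << 8)
        | ((uint32_t)hdr[2] << 16)
        | ((uint32_t)hdr[3] << 24);
}
```

The reader would use the recovered length to allocate @code{dest} and to
initialise @code{*destLen}, then pass the bytes after the header as
@code{source}.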
1676
1677 @code{BZ2_bzBuffToBuffDecompress} will not write data at or
1678 beyond @code{dest[*destLen]}, even in case of buffer overflow.
1679
1680 Possible return values:
1681 @display
1682       @code{BZ_CONFIG_ERROR}
1683          if the library has been mis-compiled
1684       @code{BZ_PARAM_ERROR}
1685          if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
1686          or @code{small != 0 && small != 1}
1687          or @code{verbosity < 0} or @code{verbosity > 4}
1688       @code{BZ_MEM_ERROR}
1689          if insufficient memory is available
1690       @code{BZ_OUTBUFF_FULL}
         if the size of the uncompressed data exceeds @code{*destLen}
1692       @code{BZ_DATA_ERROR}
1693          if a data integrity error was detected in the compressed data
1694       @code{BZ_DATA_ERROR_MAGIC}
1695          if the compressed data doesn't begin with the right magic bytes
1696       @code{BZ_UNEXPECTED_EOF}
1697          if the compressed data ends unexpectedly
1698       @code{BZ_OK}
1699          otherwise
1700 @end display
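
A cheap pre-check for the @code{BZ_DATA_ERROR_MAGIC} case can be done
before calling the library at all.  The sketch below relies on the fact
that a @code{bzip2} stream starts with the bytes @code{BZh} followed by
a block-size digit from @code{1} to @code{9}; the function name
@code{has_bzip2_magic} is made up for illustration.  Passing this check
says nothing about whether the rest of the data is intact.

```c
#include <stddef.h>

/* Returns 1 if the first four bytes of p look like the start of a
   bzip2 stream ("BZh" plus a block-size digit '1'..'9'), else 0.
   has_bzip2_magic is a hypothetical helper, not part of libbzip2. */
int has_bzip2_magic ( const unsigned char *p, size_t n )
{
   return p != NULL && n >= 4
       && p[0] == 'B' && p[1] == 'Z' && p[2] == 'h'
       && p[3] >= '1' && p[3] <= '9';
}
```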
1701
1702
1703
1704 @section @code{zlib} compatibility functions
1705 Yoshioka Tsuneo has contributed some functions to
1706 give better @code{zlib} compatibility.  These functions are
@code{BZ2_bzopen}, @code{BZ2_bzdopen}, @code{BZ2_bzread},
@code{BZ2_bzwrite}, @code{BZ2_bzflush}, @code{BZ2_bzclose},
@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}.
1710 These functions are not (yet) officially part of
1711 the library.  If they break, you get to keep all the pieces.
1712 Nevertheless, I think they work ok.
1713 @example
1714 typedef void BZFILE;
1715
1716 const char * BZ2_bzlibVersion ( void );
1717 @end example
1718 Returns a string indicating the library version.
1719 @example
1720 BZFILE * BZ2_bzopen  ( const char *path, const char *mode );
1721 BZFILE * BZ2_bzdopen ( int        fd,    const char *mode );
1722 @end example
1723 Opens a @code{.bz2} file for reading or writing, using either its name
1724 or a pre-existing file descriptor.
1725 Analogous to @code{fopen} and @code{fdopen}.
1726 @example
1727 int BZ2_bzread  ( BZFILE* b, void* buf, int len );
1728 int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
1729 @end example
1730 Reads/writes data from/to a previously opened @code{BZFILE}.
1731 Analogous to @code{fread} and @code{fwrite}.
1732 @example
1733 int  BZ2_bzflush ( BZFILE* b );
1734 void BZ2_bzclose ( BZFILE* b );
1735 @end example
1736 Flushes/closes a @code{BZFILE}.  @code{BZ2_bzflush} doesn't actually do
1737 anything.  Analogous to @code{fflush} and @code{fclose}.
1738
1739 @example
const char * BZ2_bzerror ( BZFILE *b, int *errnum );
1741 @end example
Returns a string describing the most recent error status of
1743 @code{b}, and also sets @code{*errnum} to its numerical value.
1744
1745
1746 @section Using the library in a @code{stdio}-free environment
1747
1748 @subsection Getting rid of @code{stdio}
1749
1750 In a deeply embedded application, you might want to use just
1751 the memory-to-memory functions.  You can do this conveniently
1752 by compiling the library with preprocessor symbol @code{BZ_NO_STDIO}
1753 defined.  Doing this gives you a library containing only the following
1754 eight functions:
1755
1756 @code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, @code{BZ2_bzCompressEnd} @*
1757 @code{BZ2_bzDecompressInit}, @code{BZ2_bzDecompress}, @code{BZ2_bzDecompressEnd} @*
1758 @code{BZ2_bzBuffToBuffCompress}, @code{BZ2_bzBuffToBuffDecompress}
1759
1760 When compiled like this, all functions will ignore @code{verbosity}
1761 settings.
1762
1763 @subsection Critical error handling
1764 @code{libbzip2} contains a number of internal assertion checks which
1765 should, needless to say, never be activated.  Nevertheless, if an
1766 assertion should fail, behaviour depends on whether or not the library
1767 was compiled with @code{BZ_NO_STDIO} set.
1768
1769 For a normal compile, an assertion failure yields the message
1770 @example
1771    bzip2/libbzip2: internal error number N.
1772    This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000.
1773    Please report it to me at: jseward@@acm.org.  If this happened
1774    when you were using some program which uses libbzip2 as a
1775    component, you should also report this bug to the author(s)
1776    of that program.  Please make an effort to report this bug;
1777    timely and accurate bug reports eventually lead to higher
1778    quality software.  Thanks.  Julian Seward, 21 March 2000.
1779 @end example
1780 where @code{N} is some error code number.  @code{exit(3)}
1781 is then called.
1782
1783 For a @code{stdio}-free library, assertion failures result
1784 in a call to a function declared as:
1785 @example
1786    extern void bz_internal_error ( int errcode );
1787 @end example
1788 The relevant code is passed as a parameter.  You should supply
1789 such a function.
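
What such a function should do depends entirely on your environment.
As one possible sketch, assuming @code{setjmp}/@code{longjmp} are
available on your platform, the handler can record the code and jump
back to a recovery point set up by the caller.  The names
@code{bz_recovery_point} and @code{bz_last_internal_error} are
assumptions of this sketch.  Note that memory the library had allocated
is not released on this path.

```c
#include <setjmp.h>

/* Hypothetical globals: the caller must arm bz_recovery_point with
   setjmp() before calling into the library. */
jmp_buf bz_recovery_point;
int     bz_last_internal_error;

/* Called by a BZ_NO_STDIO build of the library when an internal
   assertion fails.  Records the code and does not return to the
   library; control resumes at the caller's setjmp(). */
void bz_internal_error ( int errcode )
{
   bz_last_internal_error = errcode;
   longjmp ( bz_recovery_point, 1 );
}
```

The calling code would wrap its library calls in
@code{if (setjmp(bz_recovery_point) == 0)} and, on the other branch,
discard any @code{bz_stream} records that were in use.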
1790
1791 In either case, once an assertion failure has occurred, any
1792 @code{bz_stream} records involved can be regarded as invalid.
1793 You should not attempt to resume normal operation with them.
1794
1795 You may, of course, change critical error handling to suit
1796 your needs.  As I said above, critical errors indicate bugs
1797 in the library and should not occur.  All "normal" error
1798 situations are indicated via error return codes from functions,
1799 and can be recovered from.
1800
1801
1802 @section Making a Windows DLL
1803 Everything related to Windows has been contributed by Yoshioka Tsuneo
1804 @* (@code{QWF00133@@niftyserve.or.jp} /
1805 @code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to
1806 him (but perhaps Cc: me, @code{jseward@@acm.org}).
1807
1808 My vague understanding of what to do is: using Visual C++ 5.0,
1809 open the project file @code{libbz2.dsp}, and build.  That's all.
1810
1811 If you can't
1812 open the project file for some reason, make a new one, naming these files:
1813 @code{blocksort.c}, @code{bzlib.c}, @code{compress.c},
1814 @code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @*
1815 @code{randtable.c} and @code{libbz2.def}.  You will also need
1816 to name the header files @code{bzlib.h} and @code{bzlib_private.h}.
1817
If you don't use VC++, you may need to define the preprocessor symbol
1819 @code{_WIN32}.
1820
1821 Finally, @code{dlltest.c} is a sample program using the DLL.  It has a
1822 project file, @code{dlltest.dsp}.
1823
1824 If you just want a makefile for Visual C, have a look at
1825 @code{makefile.msc}.
1826
1827 Be aware that if you compile @code{bzip2} itself on Win32, you must set
1828 @code{BZ_UNIX} to 0 and @code{BZ_LCCWIN32} to 1, in the file
1829 @code{bzip2.c}, before compiling.  Otherwise the resulting binary won't
1830 work correctly.
1831
1832 I haven't tried any of this stuff myself, but it all looks plausible.
1833
1834
1835
1836 @chapter Miscellanea
1837
1838 These are just some random thoughts of mine.  Your mileage may
1839 vary.
1840
1841 @section Limitations of the compressed file format
1842 @code{bzip2-1.0}, @code{0.9.5} and @code{0.9.0}
1843 use exactly the same file format as the previous
1844 version, @code{bzip2-0.1}.  This decision was made in the interests of
1845 stability.  Creating yet another incompatible compressed file format
1846 would create further confusion and disruption for users.
1847
1848 Nevertheless, this is not a painless decision.  Development
1849 work since the release of @code{bzip2-0.1} in August 1997
1850 has shown complexities in the file format which slow down
1851 decompression and, in retrospect, are unnecessary.  These are:
1852 @itemize @bullet
1853 @item The run-length encoder, which is the first of the
1854       compression transformations, is entirely irrelevant.
1855       The original purpose was to protect the sorting algorithm
1856       from the very worst case input: a string of repeated
1857       symbols.  But algorithm steps Q6a and Q6b in the original
1858       Burrows-Wheeler technical report (SRC-124) show how
1859       repeats can be handled without difficulty in block
1860       sorting.
1861 @item The randomisation mechanism doesn't really need to be
1862       there.  Udi Manber and Gene Myers published a suffix
1863       array construction algorithm a few years back, which
1864       can be employed to sort any block, no matter how
1865       repetitive, in O(N log N) time.  Subsequent work by
1866       Kunihiko Sadakane has produced a derivative O(N (log N)^2)
1867       algorithm which usually outperforms the Manber-Myers
1868       algorithm.
1869
1870       I could have changed to Sadakane's algorithm, but I find
1871       it to be slower than @code{bzip2}'s existing algorithm for
1872       most inputs, and the randomisation mechanism protects
1873       adequately against bad cases.  I didn't think it was
1874       a good tradeoff to make.  Partly this is due to the fact
1875       that I was not flooded with email complaints about
1876       @code{bzip2-0.1}'s performance on repetitive data, so
1877       perhaps it isn't a problem for real inputs.
1878
1879       Probably the best long-term solution,
1880       and the one I have incorporated into 0.9.5 and above,
1881       is to use the existing sorting
1882       algorithm initially, and fall back to a O(N (log N)^2)
1883       algorithm if the standard algorithm gets into difficulties.
1884 @item The compressed file format was never designed to be
      handled by a library, and I have had to jump through
1886       some hoops to produce an efficient implementation of
1887       decompression.  It's a bit hairy.  Try passing
1888       @code{decompress.c} through the C preprocessor
1889       and you'll see what I mean.  Much of this complexity
1890       could have been avoided if the compressed size of
1891       each block of data was recorded in the data stream.
1892 @item An Adler-32 checksum, rather than a CRC32 checksum,
1893       would be faster to compute.
1894 @end itemize
1895 It would be fair to say that the @code{bzip2} format was frozen
1896 before I properly and fully understood the performance
1897 consequences of doing so.
1898
1899 Improvements which I was able to incorporate into
1900 0.9.0, despite using the same file format, are:
1901 @itemize @bullet
1902 @item Single array implementation of the inverse BWT.  This
1903       significantly speeds up decompression, presumably
1904       because it reduces the number of cache misses.
1905 @item Faster inverse MTF transform for large MTF values.  The
1906       new implementation is based on the notion of sliding blocks
1907       of values.
1908 @item @code{bzip2-0.9.0} now reads and writes files with @code{fread}
1909       and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}.
1910       Duh!  Well, you live and learn.
1911
1912 @end itemize
1913 Further ahead, it would be nice
1914 to be able to do random access into files.  This will
1915 require some careful design of compressed file formats.
1916
1917
1918
1919 @section Portability issues
1920 After some consideration, I have decided not to use
1921 GNU @code{autoconf} to configure 0.9.5 or 1.0.
1922
1923 @code{autoconf}, admirable and wonderful though it is,
1924 mainly assists with portability problems between Unix-like
1925 platforms.  But @code{bzip2} doesn't have much in the way
1926 of portability problems on Unix; most of the difficulties appear
1927 when porting to the Mac, or to Microsoft's operating systems.
1928 @code{autoconf} doesn't help in those cases, and brings in a
1929 whole load of new complexity.
1930
1931 Most people should be able to compile the library and program
1932 under Unix straight out-of-the-box, so to speak, especially
1933 if you have a version of GNU C available.
1934
1935 There are a couple of @code{__inline__} directives in the code.  GNU C
1936 (@code{gcc}) should be able to handle them.  If you're not using
1937 GNU C, your C compiler shouldn't see them at all.
1938 If your compiler does, for some reason, see them and doesn't
1939 like them, just @code{#define} @code{__inline__} to be @code{/* */}.  One
1940 easy way to do this is to compile with the flag @code{-D__inline__=},
1941 which should be understood by most Unix compilers.
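
If you prefer to make the same change in the source rather than on the
command line, something along these lines should do.  This is a sketch:
the guard on @code{__GNUC__} and the example function @code{twice} are
assumptions for illustration and are not taken from the @code{bzip2}
sources.

```c
/* GNU-compatible compilers define __GNUC__ and accept __inline__
   natively; for any other compiler, make the keyword vanish. */
#ifndef __GNUC__
#define __inline__ /* */
#endif

/* Hypothetical function, here only to show the keyword in use. */
static __inline__ int twice ( int x )
{
   return 2 * x;
}
```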
1942
1943 If you still have difficulties, try compiling with the macro
1944 @code{BZ_STRICT_ANSI} defined.  This should enable you to build the
1945 library in a strictly ANSI compliant environment.  Building the program
1946 itself like this is dangerous and not supported, since you remove
1947 @code{bzip2}'s checks against compressing directories, symbolic links,
1948 devices, and other not-really-a-file entities.  This could cause
1949 filesystem corruption!
1950
1951 One other thing: if you create a @code{bzip2} binary for public
distribution, please try and link it statically (@code{gcc -static}).  This
1953 avoids all sorts of library-version issues that others may encounter
1954 later on.
1955
1956 If you build @code{bzip2} on Win32, you must set @code{BZ_UNIX} to 0 and
1957 @code{BZ_LCCWIN32} to 1, in the file @code{bzip2.c}, before compiling.
1958 Otherwise the resulting binary won't work correctly.
1959
1960
1961
1962 @section Reporting bugs
1963 I tried pretty hard to make sure @code{bzip2} is
1964 bug free, both by design and by testing.  Hopefully
1965 you'll never need to read this section for real.
1966
1967 Nevertheless, if @code{bzip2} dies with a segmentation
1968 fault, a bus error or an internal assertion failure, it
1969 will ask you to email me a bug report.  Experience with
1970 version 0.1 shows that almost all these problems can
1971 be traced to either compiler bugs or hardware problems.
1972 @itemize @bullet
1973 @item
1974 Recompile the program with no optimisation, and see if it
1975 works.  And/or try a different compiler.
1976 I heard all sorts of stories about various flavours
1977 of GNU C (and other compilers) generating bad code for
1978 @code{bzip2}, and I've run across two such examples myself.
1979
1980 2.7.X versions of GNU C are known to generate bad code from
1981 time to time, at high optimisation levels.
1982 If you get problems, try using the flags
1983 @code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}.
1984 You should specifically @emph{not} use @code{-funroll-loops}.
1985
1986 You may notice that the Makefile runs six tests as part of
1987 the build process.  If the program passes all of these, it's
1988 a pretty good (but not 100%) indication that the compiler has
1989 done its job correctly.
1990 @item
1991 If @code{bzip2} crashes randomly, and the crashes are not
1992 repeatable, you may have a flaky memory subsystem.  @code{bzip2}
1993 really hammers your memory hierarchy, and if it's a bit marginal,
1994 you may get these problems.  Ditto if your disk or I/O subsystem
1995 is slowly failing.  Yup, this really does happen.
1996
1997 Try using a different machine of the same type, and see if
1998 you can repeat the problem.
1999 @item This isn't really a bug, but ... If @code{bzip2} tells
2000 you your file is corrupted on decompression, and you
2001 obtained the file via FTP, there is a possibility that you
2002 forgot to tell FTP to do a binary mode transfer.  That absolutely
2003 will cause the file to be non-decompressible.  You'll have to transfer
2004 it again.
2005 @end itemize
2006
2007 If you've incorporated @code{libbzip2} into your own program
2008 and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library are
2010 correct, and in accordance with what the documentation says
2011 is allowable.  I have tried to make the library robust against
2012 such problems, but I'm sure I haven't succeeded.
2013
2014 Finally, if the above comments don't help, you'll have to send
2015 me a bug report.  Now, it's just amazing how many people will
2016 send me a bug report saying something like
2017 @display
2018    bzip2 crashed with segmentation fault on my machine
2019 @end display
and absolutely nothing else.  Needless to say, such a report
2021 is @emph{totally, utterly, completely and comprehensively 100% useless;
2022 a waste of your time, my time, and net bandwidth}.
2023 With no details at all, there's no way I can possibly begin
2024 to figure out what the problem is.
2025
2026 The rules of the game are: facts, facts, facts.  Don't omit
2027 them because "oh, they won't be relevant".  At the bare
2028 minimum:
2029 @display
2030    Machine type.  Operating system version.
2031    Exact version of @code{bzip2} (do @code{bzip2 -V}).
2032    Exact version of the compiler used.
2033    Flags passed to the compiler.
2034 @end display
2035 However, the most important single thing that will help me is
2036 the file that you were trying to compress or decompress at the
2037 time the problem happened.  Without that, my ability to do anything
more than speculate about the cause is limited.
2039
2040 Please remember that I connect to the Internet with a modem, so
2041 you should contact me before mailing me huge files.
2042
2043
2044 @section Did you get the right package?
2045
2046 @code{bzip2} is a resource hog.  It soaks up large amounts of CPU cycles
2047 and memory.  Also, it gives very large latencies.  In the worst case, you
2048 can feed many megabytes of uncompressed data into the library before
2049 getting any compressed output, so this probably rules out applications
2050 requiring interactive behaviour.
2051
2052 These aren't faults of my implementation, I hope, but more
2053 an intrinsic property of the Burrows-Wheeler transform (unfortunately).
2054 Maybe this isn't what you want.
2055
2056 If you want a compressor and/or library which is faster, uses less
2057 memory but gets pretty good compression, and has minimal latency,
2058 consider Jean-loup
2059 Gailly's and Mark Adler's work, @code{zlib-1.1.2} and
2060 @code{gzip-1.2.4}.  Look for them at
2061
2062 @code{http://www.cdrom.com/pub/infozip/zlib} and
2063 @code{http://www.gzip.org} respectively.
2064
2065 For something faster and lighter still, you might try Markus F X J
2066 Oberhumer's @code{LZO} real-time compression/decompression library, at
2067 @* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}.
2068
2069 If you want to use the @code{bzip2} algorithms to compress small blocks
2070 of data, 64k bytes or smaller, for example on an on-the-fly disk
2071 compressor, you'd be well advised not to use this library.  Instead,
2072 I've made a special library tuned for that kind of use.  It's part of
2073 @code{e2compr-0.40}, an on-the-fly disk compressor for the Linux
2074 @code{ext2} filesystem.  Look at
2075 @code{http://www.netspace.net.au/~reiter/e2compr}.
2076
2077
2078
2079 @section Testing
2080
2081 A record of the tests I've done.
2082
2083 First, some data sets:
2084 @itemize @bullet
2085 @item B: a directory containing 6001 files, one for every length in the
2086       range 0 to 6000 bytes.  The files contain random lowercase
2087       letters.  18.7 megabytes.
2088 @item H: my home directory tree.  Documents, source code, mail files,
2089       compressed data.  H contains B, and also a directory of
2090       files designed as boundary cases for the sorting; mostly very
2091       repetitive, nasty files.  565 megabytes.
2092 @item A: directory tree holding various applications built from source:
2093       @code{egcs}, @code{gcc-2.8.1}, KDE, GTK, Octave, etc.
2094       2200 megabytes.
2095 @end itemize
2096 The tests conducted are as follows.  Each test means compressing
2097 (a copy of) each file in the data set, decompressing it and
2098 comparing it against the original.
2099
2100 First, a bunch of tests with block sizes and internal buffer
2101 sizes set very small,
2102 to detect any problems with the
2103 blocking and buffering mechanisms.
2104 This required modifying the source code so as to try to
2105 break it.
2106 @enumerate
2107 @item Data set H, with
2108       buffer size of 1 byte, and block size of 23 bytes.
2109 @item Data set B, buffer sizes 1 byte, block size 1 byte.
2110 @item As (2) but small-mode decompression.
2111 @item As (2) with block size 2 bytes.
2112 @item As (2) with block size 3 bytes.
2113 @item As (2) with block size 4 bytes.
2114 @item As (2) with block size 5 bytes.
2115 @item As (2) with block size 6 bytes and small-mode decompression.
2116 @item H with buffer size of 1 byte, but normal block
2117       size (up to 900000 bytes).
2118 @end enumerate
2119 Then some tests with unmodified source code.
2120 @enumerate
2121 @item H, all settings normal.
@item As (1), with small-mode decompression.
2123 @item H, compress with flag @code{-1}.
2124 @item H, compress with flag @code{-s}, decompress with flag @code{-s}.
2125 @item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing,
2126       @code{bzip2-0.9.5} decompressing, all settings normal.
2127 @item Backwards compatibility:  H, @code{bzip2-0.9.5} compressing,
2128       @code{bzip2-0.1pl2} decompressing, all settings normal.
2129 @item Bigger tests: A, all settings normal.
2130 @item As (7), using the fallback (Sadakane-like) sorting algorithm.
2131 @item As (8), compress with flag @code{-1}, decompress with flag
2132       @code{-s}.
2133 @item H, using the fallback sorting algorithm.
2134 @item Forwards compatibility: A, @code{bzip2-0.1pl2} compressing,
2135       @code{bzip2-0.9.5} decompressing, all settings normal.
2136 @item Backwards compatibility:  A, @code{bzip2-0.9.5} compressing,
2137       @code{bzip2-0.1pl2} decompressing, all settings normal.
2138 @item Misc test: about 400 megabytes of @code{.tar} files with
2139       @code{bzip2} compiled with Checker (a memory access error
2140        detector, like Purify).
2141 @item Misc tests to make sure it builds and runs ok on non-Linux/x86
2142       platforms.
2143 @end enumerate
2144 These tests were conducted on a 225 MHz IDT WinChip machine, running
2145 Linux 2.0.36.  They represent nearly a week of continuous computation.
2146 All tests completed successfully.
2147
2148
2149 @section Further reading
2150 @code{bzip2} is not research work, in the sense that it doesn't present
2151 any new ideas.  Rather, it's an engineering exercise based on existing
2152 ideas.
2153
2154 Four documents describe essentially all the ideas behind @code{bzip2}:
2155 @example
2156 Michael Burrows and D. J. Wheeler:
2157   "A block-sorting lossless data compression algorithm"
2158    10th May 1994.
2159    Digital SRC Research Report 124.
2160    ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
2161    If you have trouble finding it, try searching at the
2162    New Zealand Digital Library, http://www.nzdl.org.
2163
Daniel S. Hirschberg and Debra A. Lelewer:
2165   "Efficient Decoding of Prefix Codes"
2166    Communications of the ACM, April 1990, Vol 33, Number 4.
2167    You might be able to get an electronic copy of this
2168       from the ACM Digital Library.
2169
David J. Wheeler:
2171    Program bred3.c and accompanying document bred3.ps.
2172    This contains the idea behind the multi-table Huffman
2173    coding scheme.
2174    ftp://ftp.cl.cam.ac.uk/users/djw3/
2175
Jon L. Bentley and Robert Sedgewick:
2177   "Fast Algorithms for Sorting and Searching Strings"
2178    Available from Sedgewick's web page,
2179    www.cs.princeton.edu/~rs
2180 @end example
The following paper gives valuable additional insights into the
algorithm, but is not directly the basis of any code
used in @code{bzip2}.
2184 @example
2185 Peter Fenwick:
2186    Block Sorting Text Compression
2187    Proceedings of the 19th Australasian Computer Science Conference,
2188      Melbourne, Australia.  Jan 31 - Feb 2, 1996.
2189    ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
2190 @end example
2191 Kunihiko Sadakane's sorting algorithm, mentioned above,
2192 is available from:
2193 @example
2194 http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
2195 @end example
2196 The Manber-Myers suffix array construction
2197 algorithm is described in a paper
2198 available from:
2199 @example
2200 http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
2201 @end example
2202 Finally, the following paper documents some recent investigations
2203 I made into the performance of sorting algorithms:
2204 @example
2205 Julian Seward:
2206    On the Performance of BWT Sorting Algorithms
2207    Proceedings of the IEEE Data Compression Conference 2000
2208      Snowbird, Utah.  28-30 March 2000.
2209 @end example
2210
2211
2212 @contents
2213
2214 @bye
2215