3 <!-- This HTML file has been created by texi2html 1.54
4 from manual.texi on 23 March 2000 -->
6 <TITLE>bzip2 and libbzip2 - Programming with libbzip2</TITLE>
7 <link href="manual_4.html" rel=Next>
8 <link href="manual_2.html" rel=Previous>
9 <link href="manual_toc.html" rel=ToC>
17 <H1><A NAME="SEC12" HREF="manual_toc.html#TOC12">Programming with <CODE>libbzip2</CODE></A></H1>
20 This chapter describes the programming interface to <CODE>libbzip2</CODE>.
For general background information, particularly about memory
use and performance aspects, you'd be well advised to read Chapter 2
as well.
31 <H2><A NAME="SEC13" HREF="manual_toc.html#TOC13">Top-level structure</A></H2>
<CODE>libbzip2</CODE> is a flexible library for compressing and decompressing
data in the <CODE>bzip2</CODE> data format. Although packaged as a single
entity, it helps to regard the library as three separate parts: the
low-level interface, the high-level interface, and some utility
functions.
The structure of <CODE>libbzip2</CODE>'s interfaces is similar to
that of Jean-loup Gailly's and Mark Adler's excellent <CODE>zlib</CODE>
library.
48 All externally visible symbols have names beginning <CODE>BZ2_</CODE>.
49 This is new in version 1.0. The intention is to minimise pollution
50 of the namespaces of library clients.
55 <H3><A NAME="SEC14" HREF="manual_toc.html#TOC14">Low-level summary</A></H3>
58 This interface provides services for compressing and decompressing
59 data in memory. There's no provision for dealing with files, streams
60 or any other I/O mechanisms, just straight memory-to-memory work.
61 In fact, this part of the library can be compiled without inclusion
62 of <CODE>stdio.h</CODE>, which may be helpful for embedded applications.
66 The low-level part of the library has no global variables and
67 is therefore thread-safe.
71 Six routines make up the low level interface:
72 <CODE>BZ2_bzCompressInit</CODE>, <CODE>BZ2_bzCompress</CODE>, and <BR> <CODE>BZ2_bzCompressEnd</CODE>
74 and a corresponding trio <CODE>BZ2_bzDecompressInit</CODE>, <BR> <CODE>BZ2_bzDecompress</CODE>
75 and <CODE>BZ2_bzDecompressEnd</CODE> for decompression.
The <CODE>*Init</CODE> functions allocate
memory for compression/decompression and do other
initialisations, whilst the <CODE>*End</CODE> functions close down operations
and release memory.
83 The real work is done by <CODE>BZ2_bzCompress</CODE> and <CODE>BZ2_bzDecompress</CODE>.
84 These compress and decompress data from a user-supplied input buffer
85 to a user-supplied output buffer. These buffers can be any size;
86 arbitrary quantities of data are handled by making repeated calls
to these functions. This is a flexible mechanism allowing a
consumer-pull style of activity, or producer-push, or a mixture of
both.
95 <H3><A NAME="SEC15" HREF="manual_toc.html#TOC15">High-level summary</A></H3>
98 This interface provides some handy wrappers around the low-level
99 interface to facilitate reading and writing <CODE>bzip2</CODE> format
100 files (<CODE>.bz2</CODE> files). The routines provide hooks to facilitate
101 reading files in which the <CODE>bzip2</CODE> data stream is embedded
102 within some larger-scale file structure, or where there are
103 multiple <CODE>bzip2</CODE> data streams concatenated end-to-end.
107 For reading files, <CODE>BZ2_bzReadOpen</CODE>, <CODE>BZ2_bzRead</CODE>,
108 <CODE>BZ2_bzReadClose</CODE> and <BR> <CODE>BZ2_bzReadGetUnused</CODE> are supplied. For
writing files, <CODE>BZ2_bzWriteOpen</CODE>, <CODE>BZ2_bzWrite</CODE> and
<CODE>BZ2_bzWriteClose</CODE> are available.
114 As with the low-level library, no global variables are used
115 so the library is per se thread-safe. However, if I/O errors
116 occur whilst reading or writing the underlying compressed files,
117 you may have to consult <CODE>errno</CODE> to determine the cause of
118 the error. In that case, you'd need a C library which correctly
119 supports <CODE>errno</CODE> in a multithreaded environment.
123 To make the library a little simpler and more portable,
124 <CODE>BZ2_bzReadOpen</CODE> and <CODE>BZ2_bzWriteOpen</CODE> require you to pass them file
125 handles (<CODE>FILE*</CODE>s) which have previously been opened for reading or
126 writing respectively. That avoids portability problems associated with
127 file operations and file attributes, whilst not being much of an
128 imposition on the programmer.
134 <H3><A NAME="SEC16" HREF="manual_toc.html#TOC16">Utility functions summary</A></H3>
136 For very simple needs, <CODE>BZ2_bzBuffToBuffCompress</CODE> and
137 <CODE>BZ2_bzBuffToBuffDecompress</CODE> are provided. These compress
138 data in memory from one buffer to another buffer in a single
139 function call. You should assess whether these functions
140 fulfill your memory-to-memory compression/decompression
141 requirements before investing effort in understanding the more
142 general but more complex low-level interface.
146 Yoshioka Tsuneo (<CODE>QWF00133@niftyserve.or.jp</CODE> /
147 <CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>) has contributed some functions to
148 give better <CODE>zlib</CODE> compatibility. These functions are
149 <CODE>BZ2_bzopen</CODE>, <CODE>BZ2_bzread</CODE>, <CODE>BZ2_bzwrite</CODE>, <CODE>BZ2_bzflush</CODE>,
150 <CODE>BZ2_bzclose</CODE>,
151 <CODE>BZ2_bzerror</CODE> and <CODE>BZ2_bzlibVersion</CODE>. You may find these functions
more convenient for simple file reading and writing than those in the
high-level interface. These functions are not (yet) officially part of
the library, and are minimally documented here. If they break, you
get to keep all the pieces. I hope to document them properly when time
permits.
160 Yoshioka also contributed modifications to allow the library to be
161 built as a Windows DLL.
167 <H2><A NAME="SEC17" HREF="manual_toc.html#TOC17">Error handling</A></H2>
170 The library is designed to recover cleanly in all situations, including
171 the worst-case situation of decompressing random data. I'm not
172 100% sure that it can always do this, so you might want to add
173 a signal handler to catch segmentation violations during decompression
if you are feeling especially paranoid. I would be interested in
hearing more about the robustness of the library to corrupted
compressed data.
180 Version 1.0 is much more robust in this respect than
181 0.9.0 or 0.9.5. Investigations with Checker (a tool for
182 detecting problems with memory management, similar to Purify)
183 indicate that, at least for the few files I tested, all single-bit
184 errors in the decompressed data are caught properly, with no
185 segmentation faults, no reads of uninitialised data and no
186 out of range reads or writes. So it's certainly much improved,
187 although I wouldn't claim it to be totally bombproof.
191 The file <CODE>bzlib.h</CODE> contains all definitions needed to use
192 the library. In particular, you should definitely not include
193 <CODE>bzlib_private.h</CODE>.
197 In <CODE>bzlib.h</CODE>, the various return values are defined. The following
198 list is not intended as an exhaustive description of the circumstances
199 in which a given value may be returned -- those descriptions are given
200 later. Rather, it is intended to convey the rough meaning of each
return value. The first five values are normal and do not
denote an error situation.
205 <DT><CODE>BZ_OK</CODE>
207 The requested action was completed successfully.
208 <DT><CODE>BZ_RUN_OK</CODE>
210 <DT><CODE>BZ_FLUSH_OK</CODE>
212 <DT><CODE>BZ_FINISH_OK</CODE>
214 In <CODE>BZ2_bzCompress</CODE>, the requested flush/finish/nothing-special action
215 was completed successfully.
216 <DT><CODE>BZ_STREAM_END</CODE>
218 Compression of data was completed, or the logical stream end was
219 detected during decompression.
223 The following return values indicate an error of some kind.
226 <DT><CODE>BZ_CONFIG_ERROR</CODE>
228 Indicates that the library has been improperly compiled on your
229 platform -- a major configuration error. Specifically, it means
230 that <CODE>sizeof(char)</CODE>, <CODE>sizeof(short)</CODE> and <CODE>sizeof(int)</CODE>
231 are not 1, 2 and 4 respectively, as they should be. Note that the
232 library should still work properly on 64-bit platforms which follow
233 the LP64 programming model -- that is, where <CODE>sizeof(long)</CODE>
234 and <CODE>sizeof(void*)</CODE> are 8. Under LP64, <CODE>sizeof(int)</CODE> is
still 4, so <CODE>libbzip2</CODE>, which doesn't use the <CODE>long</CODE> type,
is OK.
237 <DT><CODE>BZ_SEQUENCE_ERROR</CODE>
239 When using the library, it is important to call the functions in the
240 correct sequence and with data structures (buffers etc) in the correct
241 states. <CODE>libbzip2</CODE> checks as much as it can to ensure this is
242 happening, and returns <CODE>BZ_SEQUENCE_ERROR</CODE> if not. Code which
243 complies precisely with the function semantics, as detailed below,
244 should never receive this value; such an event denotes buggy code
245 which you should investigate.
246 <DT><CODE>BZ_PARAM_ERROR</CODE>
248 Returned when a parameter to a function call is out of range
249 or otherwise manifestly incorrect. As with <CODE>BZ_SEQUENCE_ERROR</CODE>,
250 this denotes a bug in the client code. The distinction between
<CODE>BZ_PARAM_ERROR</CODE> and <CODE>BZ_SEQUENCE_ERROR</CODE> is a bit hazy, but still worth
making.
253 <DT><CODE>BZ_MEM_ERROR</CODE>
255 Returned when a request to allocate memory failed. Note that the
256 quantity of memory needed to decompress a stream cannot be determined
257 until the stream's header has been read. So <CODE>BZ2_bzDecompress</CODE> and
258 <CODE>BZ2_bzRead</CODE> may return <CODE>BZ_MEM_ERROR</CODE> even though some of
259 the compressed data has been read. The same is not true for
260 compression; once <CODE>BZ2_bzCompressInit</CODE> or <CODE>BZ2_bzWriteOpen</CODE> have
261 successfully completed, <CODE>BZ_MEM_ERROR</CODE> cannot occur.
262 <DT><CODE>BZ_DATA_ERROR</CODE>
264 Returned when a data integrity error is detected during decompression.
265 Most importantly, this means when stored and computed CRCs for the
266 data do not match. This value is also returned upon detection of any
267 other anomaly in the compressed data.
268 <DT><CODE>BZ_DATA_ERROR_MAGIC</CODE>
270 As a special case of <CODE>BZ_DATA_ERROR</CODE>, it is sometimes useful to
271 know when the compressed stream does not start with the correct
272 magic bytes (<CODE>'B' 'Z' 'h'</CODE>).
273 <DT><CODE>BZ_IO_ERROR</CODE>
275 Returned by <CODE>BZ2_bzRead</CODE> and <CODE>BZ2_bzWrite</CODE> when there is an error
276 reading or writing in the compressed file, and by <CODE>BZ2_bzReadOpen</CODE>
277 and <CODE>BZ2_bzWriteOpen</CODE> for attempts to use a file for which the
278 error indicator (viz, <CODE>ferror(f)</CODE>) is set.
279 On receipt of <CODE>BZ_IO_ERROR</CODE>, the caller should consult
280 <CODE>errno</CODE> and/or <CODE>perror</CODE> to acquire operating-system
281 specific information about the problem.
282 <DT><CODE>BZ_UNEXPECTED_EOF</CODE>
284 Returned by <CODE>BZ2_bzRead</CODE> when the compressed file finishes
285 before the logical end of stream is detected.
286 <DT><CODE>BZ_OUTBUFF_FULL</CODE>
288 Returned by <CODE>BZ2_bzBuffToBuffCompress</CODE> and
289 <CODE>BZ2_bzBuffToBuffDecompress</CODE> to indicate that the output data
290 will not fit into the output buffer provided.
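<P>
As a rough illustration of how a caller might sort these values into
normal and error cases, here is a small sketch. The helper names
<CODE>bz_is_error</CODE> and <CODE>bz_code_to_name</CODE> are made up for this
example and are not part of the library, and the numeric values in the
table are assumed to match those in <CODE>bzlib.h</CODE>. Real code should
include <CODE>bzlib.h</CODE> and use the <CODE>BZ_</CODE> names directly.
</P>

```c
#include <stddef.h>

/* Numeric values assumed to match bzlib.h; check them against your header. */
struct bz_code_name { int code; const char *name; };

static const struct bz_code_name bz_code_names[] = {
    {  0, "BZ_OK" },           {  1, "BZ_RUN_OK" },
    {  2, "BZ_FLUSH_OK" },     {  3, "BZ_FINISH_OK" },
    {  4, "BZ_STREAM_END" },   { -1, "BZ_SEQUENCE_ERROR" },
    { -2, "BZ_PARAM_ERROR" },  { -3, "BZ_MEM_ERROR" },
    { -4, "BZ_DATA_ERROR" },   { -5, "BZ_DATA_ERROR_MAGIC" },
    { -6, "BZ_IO_ERROR" },     { -7, "BZ_UNEXPECTED_EOF" },
    { -8, "BZ_OUTBUFF_FULL" }, { -9, "BZ_CONFIG_ERROR" }
};

/* Hypothetical helper: the five normal values are non-negative,
   every error value is negative. */
int bz_is_error(int code)
{
    return code < 0;
}

/* Hypothetical helper: symbolic name of a return value, or "unknown". */
const char *bz_code_to_name(int code)
{
    size_t i;
    for (i = 0; i < sizeof bz_code_names / sizeof bz_code_names[0]; i++) {
        if (bz_code_names[i].code == code)
            return bz_code_names[i].name;
    }
    return "unknown";
}
```

<P>
Something like <CODE>bz_code_to_name</CODE> is mostly useful in log messages;
the decision about what to do next should still be made on the value itself.
</P>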
295 <H2><A NAME="SEC18" HREF="manual_toc.html#TOC18">Low-level interface</A></H2>
299 <H3><A NAME="SEC19" HREF="manual_toc.html#TOC19"><CODE>BZ2_bzCompressInit</CODE></A></H3>
<PRE>
typedef struct {
   char *next_in;
   unsigned int avail_in;
   unsigned int total_in_lo32;
   unsigned int total_in_hi32;
   char *next_out;
   unsigned int avail_out;
   unsigned int total_out_lo32;
   unsigned int total_out_hi32;
   void *state;
   void *(*bzalloc)(void *,int,int);
   void (*bzfree)(void *,void *);
   void *opaque;
} bz_stream;

int BZ2_bzCompressInit ( bz_stream *strm, int blockSize100k,
                         int verbosity, int workFactor );
</PRE>
330 Prepares for compression. The <CODE>bz_stream</CODE> structure
331 holds all data pertaining to the compression activity.
A <CODE>bz_stream</CODE> structure should be allocated and initialised
prior to calling <CODE>BZ2_bzCompressInit</CODE>.
334 The fields of <CODE>bz_stream</CODE>
335 comprise the entirety of the user-visible data. <CODE>state</CODE>
336 is a pointer to the private data structures required for compression.
Custom memory allocators are supported, via fields <CODE>bzalloc</CODE>,
<CODE>bzfree</CODE>,
and <CODE>opaque</CODE>. The value
<CODE>opaque</CODE> is passed as the first argument to
all calls to <CODE>bzalloc</CODE> and <CODE>bzfree</CODE>, but is
otherwise ignored by the library.
The call <CODE>bzalloc ( opaque, n, m )</CODE> is expected to return a
pointer <CODE>p</CODE> to
<CODE>n * m</CODE> bytes of memory, and <CODE>bzfree ( opaque, p )</CODE>
should free that memory.
354 If you don't want to use a custom memory allocator, set <CODE>bzalloc</CODE>,
355 <CODE>bzfree</CODE> and
356 <CODE>opaque</CODE> to <CODE>NULL</CODE>,
and the library will then use the standard <CODE>malloc</CODE>/<CODE>free</CODE>
routines.
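<P>
To make the allocator hooks a bit more concrete, here is a sketch of a
pair of callbacks with the signatures of the <CODE>bzalloc</CODE> and
<CODE>bzfree</CODE> fields. They simply wrap <CODE>malloc</CODE>/<CODE>free</CODE>
and keep some counts in a record reached through <CODE>opaque</CODE>. The names
<CODE>alloc_stats</CODE>, <CODE>counting_bzalloc</CODE> and
<CODE>counting_bzfree</CODE> are invented for this example.
</P>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through the opaque pointer. */
struct alloc_stats {
    long live_blocks;   /* allocations not yet freed */
    long total_bytes;   /* bytes requested so far */
};

/* Same shape as the bzalloc field: return n * m bytes, or NULL on failure. */
void *counting_bzalloc(void *opaque, int n, int m)
{
    struct alloc_stats *stats = opaque;
    void *p;

    assert(stats != NULL);
    if (n < 0 || m < 0)
        return NULL;
    p = malloc((size_t)n * (size_t)m);
    if (p != NULL) {
        stats->live_blocks++;
        stats->total_bytes += (long)n * (long)m;
    }
    return p;
}

/* Same shape as the bzfree field: release a block from counting_bzalloc. */
void counting_bzfree(void *opaque, void *p)
{
    struct alloc_stats *stats = opaque;

    assert(stats != NULL);
    if (p != NULL) {
        stats->live_blocks--;
        free(p);
    }
}
```

<P>
To use them, you would store <CODE>counting_bzalloc</CODE>,
<CODE>counting_bzfree</CODE> and the address of an <CODE>alloc_stats</CODE> record
in <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE> before calling
<CODE>BZ2_bzCompressInit</CODE>. A nonzero <CODE>live_blocks</CODE> after the
matching <CODE>*End</CODE> call would then point at a leak.
</P>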
362 Before calling <CODE>BZ2_bzCompressInit</CODE>, fields <CODE>bzalloc</CODE>,
363 <CODE>bzfree</CODE> and <CODE>opaque</CODE> should
364 be filled appropriately, as just described. Upon return, the internal
365 state will have been allocated and initialised, and <CODE>total_in_lo32</CODE>,
366 <CODE>total_in_hi32</CODE>, <CODE>total_out_lo32</CODE> and
367 <CODE>total_out_hi32</CODE> will have been set to zero.
368 These four fields are used by the library
369 to inform the caller of the total amount of data passed into and out of
370 the library, respectively. You should not try to change them.
371 As of version 1.0, 64-bit counts are maintained, even on 32-bit
372 platforms, using the <CODE>_hi32</CODE> fields to store the upper 32 bits
of the count. So, for example, the total amount of data in
is <CODE>(total_in_hi32 << 32) + total_in_lo32</CODE>, with
<CODE>total_in_hi32</CODE> widened to a 64-bit type before the shift.
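<P>
Taken literally as C, the expression above would shift a 32-bit
<CODE>unsigned int</CODE> by 32 bits, which the language leaves undefined, so
the upper half needs widening first. A minimal sketch, assuming a C99
compiler with <CODE>stdint.h</CODE>; the helper name <CODE>bz_total_bytes</CODE>
is invented for this example and is not part of the library:
</P>

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a byte count into one 64-bit value.
   The high half is converted to 64 bits before it is shifted. */
uint64_t bz_total_bytes(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

<P>
It would be called as
<CODE>bz_total_bytes ( strm.total_in_hi32, strm.total_in_lo32 )</CODE>, and
likewise for the <CODE>total_out</CODE> pair.
</P>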
378 Parameter <CODE>blockSize100k</CODE> specifies the block size to be used for
379 compression. It should be a value between 1 and 9 inclusive, and the
380 actual block size used is 100000 x this figure. 9 gives the best
381 compression but takes most memory.
385 Parameter <CODE>verbosity</CODE> should be set to a number between 0 and 4
386 inclusive. 0 is silent, and greater numbers give increasingly verbose
387 monitoring/debugging output. If the library has been compiled with
<CODE>-DBZ_NO_STDIO</CODE>, no such output will appear for any verbosity
setting.
393 Parameter <CODE>workFactor</CODE> controls how the compression phase behaves
394 when presented with worst case, highly repetitive, input data. If
395 compression runs into difficulties caused by repetitive data, the
396 library switches from the standard sorting algorithm to a fallback
algorithm. The fallback is slower than the standard algorithm by
perhaps a factor of three, but always behaves reasonably, no matter how
bad the input.
403 Lower values of <CODE>workFactor</CODE> reduce the amount of effort the
404 standard algorithm will expend before resorting to the fallback. You
should set this parameter carefully: too low, and many inputs will be
handled by the fallback algorithm and so compress rather slowly; too
high, and your average-to-worst case compression times can become very
408 large. The default value of 30 gives reasonable behaviour over a wide
409 range of circumstances.
413 Allowable values range from 0 to 250 inclusive. 0 is a special case,
414 equivalent to using the default value of 30.
418 Note that the compressed output generated is the same regardless of
419 whether or not the fallback algorithm is used.
423 Be aware also that this parameter may disappear entirely in future
424 versions of the library. In principle it should be possible to devise a
425 good way to automatically choose which algorithm to use. Such a
426 mechanism would render the parameter obsolete.
430 Possible return values:
433 <CODE>BZ_CONFIG_ERROR</CODE>
434 if the library has been mis-compiled
435 <CODE>BZ_PARAM_ERROR</CODE>
436 if <CODE>strm</CODE> is <CODE>NULL</CODE>
or <CODE>blockSize100k</CODE> < 1 or <CODE>blockSize100k</CODE> > 9
438 or <CODE>verbosity</CODE> < 0 or <CODE>verbosity</CODE> > 4
439 or <CODE>workFactor</CODE> < 0 or <CODE>workFactor</CODE> > 250
440 <CODE>BZ_MEM_ERROR</CODE>
if not enough memory is available
<CODE>BZ_OK</CODE>
otherwise
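<P>
The parameter ranges described above can be checked before the call is
made, which sometimes gives nicer error messages than a bare
<CODE>BZ_PARAM_ERROR</CODE>. The following is only a sketch of such a
check. All three function names are hypothetical, and the library
performs its own validation regardless.
</P>

```c
/* Hypothetical pre-flight check: returns 1 if all three values fall inside
   the documented ranges, 0 otherwise. */
int compress_params_ok(int blockSize100k, int verbosity, int workFactor)
{
    if (blockSize100k < 1 || blockSize100k > 9)
        return 0;
    if (verbosity < 0 || verbosity > 4)
        return 0;
    if (workFactor < 0 || workFactor > 250)
        return 0;
    return 1;
}

/* Nominal block size implied by blockSize100k: 100000 times the figure. */
long block_size_nominal(int blockSize100k)
{
    return 100000L * blockSize100k;
}

/* A workFactor of 0 is documented as meaning the default value of 30. */
int effective_work_factor(int workFactor)
{
    return workFactor == 0 ? 30 : workFactor;
}
```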
447 Allowable next actions:
450 <CODE>BZ2_bzCompress</CODE>
451 if <CODE>BZ_OK</CODE> is returned
452 no specific action needed in case of error
457 <H3><A NAME="SEC20" HREF="manual_toc.html#TOC20"><CODE>BZ2_bzCompress</CODE></A></H3>
460 int BZ2_bzCompress ( bz_stream *strm, int action );
464 Provides more input and/or output buffer space for the library. The
465 caller maintains input and output buffers, and calls <CODE>BZ2_bzCompress</CODE> to
466 transfer data between them.
470 Before each call to <CODE>BZ2_bzCompress</CODE>, <CODE>next_in</CODE> should point at
471 the data to be compressed, and <CODE>avail_in</CODE> should indicate how many
472 bytes the library may read. <CODE>BZ2_bzCompress</CODE> updates <CODE>next_in</CODE>,
<CODE>avail_in</CODE> and <CODE>total_in</CODE> to reflect the number of bytes it
has read.
478 Similarly, <CODE>next_out</CODE> should point to a buffer in which the
479 compressed data is to be placed, with <CODE>avail_out</CODE> indicating how
480 much output space is available. <CODE>BZ2_bzCompress</CODE> updates
481 <CODE>next_out</CODE>, <CODE>avail_out</CODE> and <CODE>total_out</CODE> to reflect the
482 number of bytes output.
486 You may provide and remove as little or as much data as you like on each
487 call of <CODE>BZ2_bzCompress</CODE>. In the limit, it is acceptable to supply and
488 remove data one byte at a time, although this would be terribly
489 inefficient. You should always ensure that at least one byte of output
490 space is available at each call.
A second purpose of <CODE>BZ2_bzCompress</CODE> is to request a change of mode of the
compressed stream.
499 Conceptually, a compressed stream can be in one of four states: IDLE,
500 RUNNING, FLUSHING and FINISHING. Before initialisation
501 (<CODE>BZ2_bzCompressInit</CODE>) and after termination (<CODE>BZ2_bzCompressEnd</CODE>), a
502 stream is regarded as IDLE.
506 Upon initialisation (<CODE>BZ2_bzCompressInit</CODE>), the stream is placed in the
507 RUNNING state. Subsequent calls to <CODE>BZ2_bzCompress</CODE> should pass
508 <CODE>BZ_RUN</CODE> as the requested action; other actions are illegal and
509 will result in <CODE>BZ_SEQUENCE_ERROR</CODE>.
513 At some point, the calling program will have provided all the input data
514 it wants to. It will then want to finish up -- in effect, asking the
515 library to process any data it might have buffered internally. In this
516 state, <CODE>BZ2_bzCompress</CODE> will no longer attempt to read data from
517 <CODE>next_in</CODE>, but it will want to write data to <CODE>next_out</CODE>.
518 Because the output buffer supplied by the user can be arbitrarily small,
519 the finishing-up operation cannot necessarily be done with a single call
520 of <CODE>BZ2_bzCompress</CODE>.
524 Instead, the calling program passes <CODE>BZ_FINISH</CODE> as an action to
525 <CODE>BZ2_bzCompress</CODE>. This changes the stream's state to FINISHING. Any
526 remaining input (ie, <CODE>next_in[0 .. avail_in-1]</CODE>) is compressed and
527 transferred to the output buffer. To do this, <CODE>BZ2_bzCompress</CODE> must be
528 called repeatedly until all the output has been consumed. At that
529 point, <CODE>BZ2_bzCompress</CODE> returns <CODE>BZ_STREAM_END</CODE>, and the stream's
state is set back to IDLE. <CODE>BZ2_bzCompressEnd</CODE> should then be
called.
535 Just to make sure the calling program does not cheat, the library makes
536 a note of <CODE>avail_in</CODE> at the time of the first call to
537 <CODE>BZ2_bzCompress</CODE> which has <CODE>BZ_FINISH</CODE> as an action (ie, at the
538 time the program has announced its intention to not supply any more
539 input). By comparing this value with that of <CODE>avail_in</CODE> over
540 subsequent calls to <CODE>BZ2_bzCompress</CODE>, the library can detect any
541 attempts to slip in more data to compress. Any calls for which this is
542 detected will return <CODE>BZ_SEQUENCE_ERROR</CODE>. This indicates a
543 programming mistake which should be corrected.
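<P>
The bookkeeping behind this check is easy to picture with a small model.
The sketch below is not the library's code, and every name in it is
hypothetical. It also assumes that the remembered count is reduced as
input is consumed, so that each later call must present exactly the
input that was left over.
</P>

```c
#include <assert.h>

/* Model only: tracks how much input a finishing stream still expects. */
struct finish_guard {
    int          armed;     /* nonzero once the first finishing call is seen */
    unsigned int expected;  /* avail_in the next call must present */
};

/* Call at the start of each finishing call with the caller's avail_in.
   Returns 1 if the value is acceptable, 0 if extra input was slipped in
   (the situation reported as BZ_SEQUENCE_ERROR). */
int finish_guard_check(struct finish_guard *g, unsigned int avail_in)
{
    if (!g->armed) {
        g->armed = 1;
        g->expected = avail_in;
        return 1;
    }
    return avail_in == g->expected;
}

/* Call after some of the remaining input has been consumed. */
void finish_guard_consume(struct finish_guard *g, unsigned int consumed)
{
    assert(consumed <= g->expected);
    g->expected -= consumed;
}
```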
547 Instead of asking to finish, the calling program may ask
548 <CODE>BZ2_bzCompress</CODE> to take all the remaining input, compress it and
549 terminate the current (Burrows-Wheeler) compression block. This could
550 be useful for error control purposes. The mechanism is analogous to
551 that for finishing: call <CODE>BZ2_bzCompress</CODE> with an action of
552 <CODE>BZ_FLUSH</CODE>, remove output data, and persist with the
<CODE>BZ_FLUSH</CODE> action until the value <CODE>BZ_RUN_OK</CODE> is returned. As
554 with finishing, <CODE>BZ2_bzCompress</CODE> detects any attempt to provide more
555 input data once the flush has begun.
Once the flush is complete, the stream returns to the normal RUNNING
state.
564 This all sounds pretty complex, but isn't really. Here's a table
565 which shows which actions are allowable in each state, what action
566 will be taken, what the next state is, and what the non-error return
567 values are. Note that you can't explicitly ask what state the
568 stream is in, but nor do you need to -- it can be inferred from the
569 values returned by <CODE>BZ2_bzCompress</CODE>.
<PRE>
IDLE/<CODE>any</CODE>
      Illegal.  IDLE state only exists after <CODE>BZ2_bzCompressEnd</CODE> or
      before <CODE>BZ2_bzCompressInit</CODE>.
      Return value = <CODE>BZ_SEQUENCE_ERROR</CODE>

RUNNING/<CODE>BZ_RUN</CODE>
      Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible.
      Next state = RUNNING
      Return value = <CODE>BZ_RUN_OK</CODE>

RUNNING/<CODE>BZ_FLUSH</CODE>
      Remember current value of <CODE>avail_in</CODE>.  Compress from <CODE>next_in</CODE>
      to <CODE>next_out</CODE> as much as possible, but do not accept any more input.
      Next state = FLUSHING
      Return value = <CODE>BZ_FLUSH_OK</CODE>

RUNNING/<CODE>BZ_FINISH</CODE>
      Remember current value of <CODE>avail_in</CODE>.  Compress from <CODE>next_in</CODE>
      to <CODE>next_out</CODE> as much as possible, but do not accept any more input.
      Next state = FINISHING
      Return value = <CODE>BZ_FINISH_OK</CODE>

FLUSHING/<CODE>BZ_FLUSH</CODE>
      Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible,
      but do not accept any more input.
      If all the existing input has been used up and all compressed
      output has been removed
         Next state = RUNNING; Return value = <CODE>BZ_RUN_OK</CODE>
      else
         Next state = FLUSHING; Return value = <CODE>BZ_FLUSH_OK</CODE>

FLUSHING/other
      Illegal.
      Return value = <CODE>BZ_SEQUENCE_ERROR</CODE>

FINISHING/<CODE>BZ_FINISH</CODE>
      Compress from <CODE>next_in</CODE> to <CODE>next_out</CODE> as much as possible,
      but do not accept any more input.
      If all the existing input has been used up and all compressed
      output has been removed
         Next state = IDLE; Return value = <CODE>BZ_STREAM_END</CODE>
      else
         Next state = FINISHING; Return value = <CODE>BZ_FINISH_OK</CODE>

FINISHING/other
      Illegal.
      Return value = <CODE>BZ_SEQUENCE_ERROR</CODE>
</PRE>
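<P>
If it helps, the table can also be read as a small transition function.
The sketch below is a model of the table and nothing more: it is not how
the library is implemented, and all the names in it are hypothetical
stand-ins chosen so as not to clash with the real constants in
<CODE>bzlib.h</CODE>. It follows the table literally. The library itself
may be able to complete a flush or finish in the very call that requests
it, so real code should act on the value returned rather than on an
assumed number of calls.
</P>

```c
/* Hypothetical names modelling the states, actions and return values. */
enum cstate  { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING };
enum caction { ACT_RUN, ACT_FLUSH, ACT_FINISH };
enum cresult { RES_RUN_OK, RES_FLUSH_OK, RES_FINISH_OK,
               RES_STREAM_END, RES_SEQUENCE_ERROR };

/* drained is nonzero when all the existing input has been used up and all
   compressed output has been removed.  A legal request updates *state;
   an illegal one leaves it alone.  The model lumps every illegal request
   into RES_SEQUENCE_ERROR. */
enum cresult model_step(enum cstate *state, enum caction action, int drained)
{
    switch (*state) {
    case ST_RUNNING:
        if (action == ACT_RUN)
            return RES_RUN_OK;
        if (action == ACT_FLUSH) {
            *state = ST_FLUSHING;
            return RES_FLUSH_OK;
        }
        if (action == ACT_FINISH) {
            *state = ST_FINISHING;
            return RES_FINISH_OK;
        }
        return RES_SEQUENCE_ERROR;
    case ST_FLUSHING:
        if (action != ACT_FLUSH)
            return RES_SEQUENCE_ERROR;
        if (drained) {
            *state = ST_RUNNING;
            return RES_RUN_OK;
        }
        return RES_FLUSH_OK;
    case ST_FINISHING:
        if (action != ACT_FINISH)
            return RES_SEQUENCE_ERROR;
        if (drained) {
            *state = ST_IDLE;
            return RES_STREAM_END;
        }
        return RES_FINISH_OK;
    case ST_IDLE:
    default:
        return RES_SEQUENCE_ERROR;
    }
}
```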
622 That still looks complicated? Well, fair enough. The usual sequence
623 of calls for compressing a load of data is:
626 <LI>Get started with <CODE>BZ2_bzCompressInit</CODE>.
628 <LI>Shovel data in and shlurp out its compressed form using zero or more
630 calls of <CODE>BZ2_bzCompress</CODE> with action = <CODE>BZ_RUN</CODE>.
<LI>Repeatedly call <CODE>BZ2_bzCompress</CODE> with action = <CODE>BZ_FINISH</CODE>,
634 copying out the compressed output, until <CODE>BZ_STREAM_END</CODE> is returned.
635 <LI>Close up and go home. Call <CODE>BZ2_bzCompressEnd</CODE>.
640 If the data you want to compress fits into your input buffer all
641 at once, you can skip the calls of <CODE>BZ2_bzCompress ( ..., BZ_RUN )</CODE> and
642 just do the <CODE>BZ2_bzCompress ( ..., BZ_FINISH )</CODE> calls.
646 All required memory is allocated by <CODE>BZ2_bzCompressInit</CODE>. The
647 compression library can accept any data at all (obviously). So you
648 shouldn't get any error return values from the <CODE>BZ2_bzCompress</CODE> calls.
If you do, they will be <CODE>BZ_SEQUENCE_ERROR</CODE>, and indicate a bug in
your programming.
654 Trivial other possible return values:
657 <CODE>BZ_PARAM_ERROR</CODE>
if <CODE>strm</CODE> is <CODE>NULL</CODE>, or <CODE>strm->state</CODE> is <CODE>NULL</CODE>
663 <H3><A NAME="SEC21" HREF="manual_toc.html#TOC21"><CODE>BZ2_bzCompressEnd</CODE></A></H3>
666 int BZ2_bzCompressEnd ( bz_stream *strm );
670 Releases all memory associated with a compression stream.
674 Possible return values:
<CODE>BZ_PARAM_ERROR</CODE> if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE>
678 <CODE>BZ_OK</CODE> otherwise
683 <H3><A NAME="SEC22" HREF="manual_toc.html#TOC22"><CODE>BZ2_bzDecompressInit</CODE></A></H3>
686 int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
690 Prepares for decompression. As with <CODE>BZ2_bzCompressInit</CODE>, a
691 <CODE>bz_stream</CODE> record should be allocated and initialised before the
692 call. Fields <CODE>bzalloc</CODE>, <CODE>bzfree</CODE> and <CODE>opaque</CODE> should be
693 set if a custom memory allocator is required, or made <CODE>NULL</CODE> for
694 the normal <CODE>malloc</CODE>/<CODE>free</CODE> routines. Upon return, the internal
695 state will have been initialised, and <CODE>total_in</CODE> and
696 <CODE>total_out</CODE> will be zero.
700 For the meaning of parameter <CODE>verbosity</CODE>, see <CODE>BZ2_bzCompressInit</CODE>.
704 If <CODE>small</CODE> is nonzero, the library will use an alternative
705 decompression algorithm which uses less memory but at the cost of
706 decompressing more slowly (roughly speaking, half the speed, but the
707 maximum memory requirement drops to around 2300k). See Chapter 2 for
708 more information on memory management.
712 Note that the amount of memory needed to decompress
713 a stream cannot be determined until the stream's header has been read,
714 so even if <CODE>BZ2_bzDecompressInit</CODE> succeeds, a subsequent
715 <CODE>BZ2_bzDecompress</CODE> could fail with <CODE>BZ_MEM_ERROR</CODE>.
719 Possible return values:
722 <CODE>BZ_CONFIG_ERROR</CODE>
723 if the library has been mis-compiled
724 <CODE>BZ_PARAM_ERROR</CODE>
725 if <CODE>(small != 0 && small != 1)</CODE>
726 or <CODE>(verbosity < 0 || verbosity > 4)</CODE>
727 <CODE>BZ_MEM_ERROR</CODE>
if insufficient memory is available
<CODE>BZ_OK</CODE>
otherwise
732 Allowable next actions:
735 <CODE>BZ2_bzDecompress</CODE>
736 if <CODE>BZ_OK</CODE> was returned
737 no specific action required in case of error
746 <H3><A NAME="SEC23" HREF="manual_toc.html#TOC23"><CODE>BZ2_bzDecompress</CODE></A></H3>
749 int BZ2_bzDecompress ( bz_stream *strm );
Provides more input and/or output buffer space for the library. The
754 caller maintains input and output buffers, and uses <CODE>BZ2_bzDecompress</CODE>
755 to transfer data between them.
759 Before each call to <CODE>BZ2_bzDecompress</CODE>, <CODE>next_in</CODE>
760 should point at the compressed data,
761 and <CODE>avail_in</CODE> should indicate how many bytes the library
762 may read. <CODE>BZ2_bzDecompress</CODE> updates <CODE>next_in</CODE>, <CODE>avail_in</CODE>
763 and <CODE>total_in</CODE>
764 to reflect the number of bytes it has read.
768 Similarly, <CODE>next_out</CODE> should point to a buffer in which the uncompressed
769 output is to be placed, with <CODE>avail_out</CODE> indicating how much output space
is available. <CODE>BZ2_bzDecompress</CODE> updates <CODE>next_out</CODE>,
771 <CODE>avail_out</CODE> and <CODE>total_out</CODE> to reflect
772 the number of bytes output.
776 You may provide and remove as little or as much data as you like on
777 each call of <CODE>BZ2_bzDecompress</CODE>.
778 In the limit, it is acceptable to
779 supply and remove data one byte at a time, although this would be
780 terribly inefficient. You should always ensure that at least one
781 byte of output space is available at each call.
785 Use of <CODE>BZ2_bzDecompress</CODE> is simpler than <CODE>BZ2_bzCompress</CODE>.
789 You should provide input and remove output as described above, and
790 repeatedly call <CODE>BZ2_bzDecompress</CODE> until <CODE>BZ_STREAM_END</CODE> is
791 returned. Appearance of <CODE>BZ_STREAM_END</CODE> denotes that
792 <CODE>BZ2_bzDecompress</CODE> has detected the logical end of the compressed
793 stream. <CODE>BZ2_bzDecompress</CODE> will not produce <CODE>BZ_STREAM_END</CODE> until
794 all output data has been placed into the output buffer, so once
795 <CODE>BZ_STREAM_END</CODE> appears, you are guaranteed to have available all
the decompressed output, and <CODE>BZ2_bzDecompressEnd</CODE> can safely be
called.
In case of an error return value, you should call <CODE>BZ2_bzDecompressEnd</CODE>
802 to clean up and release memory.
806 Possible return values:
809 <CODE>BZ_PARAM_ERROR</CODE>
if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE>
811 or <CODE>strm->avail_out < 1</CODE>
812 <CODE>BZ_DATA_ERROR</CODE>
813 if a data integrity error is detected in the compressed stream
814 <CODE>BZ_DATA_ERROR_MAGIC</CODE>
815 if the compressed stream doesn't begin with the right magic bytes
816 <CODE>BZ_MEM_ERROR</CODE>
817 if there wasn't enough memory available
818 <CODE>BZ_STREAM_END</CODE>
if the logical end of the data stream was detected and all
output has been consumed, eg <CODE>strm->avail_out > 0</CODE>
<CODE>BZ_OK</CODE>
otherwise
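<P>
A caller who wants to reject obviously wrong input before setting up a
stream at all can test for the magic bytes directly. This is just a
sketch of such a pre-check: <CODE>has_bzip2_magic</CODE> is a made-up name,
it looks only at the three bytes mentioned earlier, and
<CODE>BZ2_bzDecompress</CODE> performs its own checks in any case.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf starts with the three magic bytes 'B' 'Z' 'h', else 0.
   Fewer than three bytes cannot be judged and yield 0. */
int has_bzip2_magic(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h';
}
```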
826 Allowable next actions:
829 <CODE>BZ2_bzDecompress</CODE>
830 if <CODE>BZ_OK</CODE> was returned
<CODE>BZ2_bzDecompressEnd</CODE>
otherwise
837 <H3><A NAME="SEC24" HREF="manual_toc.html#TOC24"><CODE>BZ2_bzDecompressEnd</CODE></A></H3>
840 int BZ2_bzDecompressEnd ( bz_stream *strm );
844 Releases all memory associated with a decompression stream.
848 Possible return values:
851 <CODE>BZ_PARAM_ERROR</CODE>
if <CODE>strm</CODE> is <CODE>NULL</CODE> or <CODE>strm->state</CODE> is <CODE>NULL</CODE>
<CODE>BZ_OK</CODE>
otherwise
Allowable next actions:
None.
866 <H2><A NAME="SEC25" HREF="manual_toc.html#TOC25">High-level interface</A></H2>
869 This interface provides functions for reading and writing
870 <CODE>bzip2</CODE> format files. First, some general points.
875 <LI>All of the functions take an <CODE>int*</CODE> first argument,
877 <CODE>bzerror</CODE>.
878 After each call, <CODE>bzerror</CODE> should be consulted first to determine
the outcome of the call. If <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>,
the call completed
successfully, and only then should the return value of the function
(if any) be consulted. If <CODE>bzerror</CODE> is <CODE>BZ_IO_ERROR</CODE>,
there was an error
reading/writing the underlying compressed file, and you should
885 then consult <CODE>errno</CODE>/<CODE>perror</CODE> to determine the
886 cause of the difficulty.
887 <CODE>bzerror</CODE> may also be set to various other values; precise details are
888 given on a per-function basis below.
889 <LI>If <CODE>bzerror</CODE> indicates an error
891 (ie, anything except <CODE>BZ_OK</CODE> and <CODE>BZ_STREAM_END</CODE>),
892 you should immediately call <CODE>BZ2_bzReadClose</CODE> (or <CODE>BZ2_bzWriteClose</CODE>,
893 depending on whether you are attempting to read or to write)
894 to free up all resources associated
895 with the stream. Once an error has been indicated, behaviour of all calls
896 except <CODE>BZ2_bzReadClose</CODE> (<CODE>BZ2_bzWriteClose</CODE>) is undefined.
897 The implication is that (1) <CODE>bzerror</CODE> should
898 be checked after each call, and (2) if <CODE>bzerror</CODE> indicates an error,
899 <CODE>BZ2_bzReadClose</CODE> (<CODE>BZ2_bzWriteClose</CODE>) should then be called to clean up.
900 <LI>The <CODE>FILE*</CODE> arguments passed to
902 <CODE>BZ2_bzReadOpen</CODE>/<CODE>BZ2_bzWriteOpen</CODE>
903 should be set to binary mode.
904 Most Unix systems will do this by default, but other platforms,
905 including Windows and Mac, will not. If you omit this, you may
906 encounter problems when moving code to new platforms.
<LI>Memory allocation requests are handled by
<CODE>malloc</CODE>/<CODE>free</CODE>.
At present
there is no facility for user-defined memory allocators in the file I/O
functions (could easily be added, though).
917 <H3><A NAME="SEC26" HREF="manual_toc.html#TOC26"><CODE>BZ2_bzReadOpen</CODE></A></H3>
BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f,
                         int verbosity, int small,
                         void *unused, int nUnused );
928 Prepare to read compressed data from file handle <CODE>f</CODE>. <CODE>f</CODE>
929 should refer to a file which has been opened for reading, and for which
the error indicator (<CODE>ferror(f)</CODE>) is not set. If <CODE>small</CODE> is 1,
the library will try to decompress using less memory, at the expense of
speed.
936 For reasons explained below, <CODE>BZ2_bzRead</CODE> will decompress the
937 <CODE>nUnused</CODE> bytes starting at <CODE>unused</CODE>, before starting to read
938 from the file <CODE>f</CODE>. At most <CODE>BZ_MAX_UNUSED</CODE> bytes may be
supplied like this. If this facility is not required, you should pass
<CODE>NULL</CODE> and <CODE>0</CODE> for <CODE>unused</CODE> and <CODE>nUnused</CODE>
respectively.
945 For the meaning of parameters <CODE>small</CODE> and <CODE>verbosity</CODE>,
946 see <CODE>BZ2_bzDecompressInit</CODE>.
950 The amount of memory needed to decompress a file cannot be determined
951 until the file's header has been read. So it is possible that
952 <CODE>BZ2_bzReadOpen</CODE> returns <CODE>BZ_OK</CODE> but a subsequent call of
953 <CODE>BZ2_bzRead</CODE> will return <CODE>BZ_MEM_ERROR</CODE>.
957 Possible assignments to <CODE>bzerror</CODE>:
960 <CODE>BZ_CONFIG_ERROR</CODE>
961 if the library has been mis-compiled
962 <CODE>BZ_PARAM_ERROR</CODE>
963 if <CODE>f</CODE> is <CODE>NULL</CODE>
964 or <CODE>small</CODE> is neither <CODE>0</CODE> nor <CODE>1</CODE>
965 or <CODE>(unused == NULL && nUnused != 0)</CODE>
966 or <CODE>(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))</CODE>
967 <CODE>BZ_IO_ERROR</CODE>
968 if <CODE>ferror(f)</CODE> is nonzero
969 <CODE>BZ_MEM_ERROR</CODE>
970 if insufficient memory is available
976 Possible return values:
979 Pointer to an abstract <CODE>BZFILE</CODE>
980 if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
986 Allowable next actions:
989 <CODE>BZ2_bzRead</CODE>
990 if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
<CODE>BZ2_bzReadClose</CODE>
otherwise
997 <H3><A NAME="SEC27" HREF="manual_toc.html#TOC27"><CODE>BZ2_bzRead</CODE></A></H3>
1000 int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
Reads up to <CODE>len</CODE> (uncompressed) bytes from the compressed file
<CODE>b</CODE> into
the buffer <CODE>buf</CODE>. If the read was successful,
<CODE>bzerror</CODE> is set to <CODE>BZ_OK</CODE>
and the number of bytes read is returned. If the logical end-of-stream
was detected, <CODE>bzerror</CODE> will be set to <CODE>BZ_STREAM_END</CODE>,
and the number
of bytes read is returned. All other <CODE>bzerror</CODE> values denote an error.
1015 <CODE>BZ2_bzRead</CODE> will supply <CODE>len</CODE> bytes,
1016 unless the logical stream end is detected
1017 or an error occurs. Because of this, it is possible to detect the
1018 stream end by observing when the number of bytes returned is
1019 less than the number
1020 requested. Nevertheless, this is regarded as inadvisable; you should
1021 instead check <CODE>bzerror</CODE> after every call and watch out for
1022 <CODE>BZ_STREAM_END</CODE>.
1026 Internally, <CODE>BZ2_bzRead</CODE> copies data from the compressed file in chunks
1027 of size <CODE>BZ_MAX_UNUSED</CODE> bytes
1028 before decompressing it. If the file contains more bytes than strictly
1029 needed to reach the logical end-of-stream, <CODE>BZ2_bzRead</CODE> will almost certainly
read some of the trailing data before signalling <CODE>BZ_STREAM_END</CODE>.
To collect the read but unused data once <CODE>BZ_STREAM_END</CODE> has
1032 appeared, call <CODE>BZ2_bzReadGetUnused</CODE> immediately before <CODE>BZ2_bzReadClose</CODE>.
1036 Possible assignments to <CODE>bzerror</CODE>:
1039 <CODE>BZ_PARAM_ERROR</CODE>
1040 if <CODE>b</CODE> is <CODE>NULL</CODE> or <CODE>buf</CODE> is <CODE>NULL</CODE> or <CODE>len < 0</CODE>
1041 <CODE>BZ_SEQUENCE_ERROR</CODE>
1042 if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE>
1043 <CODE>BZ_IO_ERROR</CODE>
1044 if there is an error reading from the compressed file
1045 <CODE>BZ_UNEXPECTED_EOF</CODE>
1046 if the compressed file ended before the logical end-of-stream was detected
1047 <CODE>BZ_DATA_ERROR</CODE>
1048 if a data integrity error was detected in the compressed stream
1049 <CODE>BZ_DATA_ERROR_MAGIC</CODE>
1050 if the stream does not begin with the requisite header bytes (ie, is not
1051 a <CODE>bzip2</CODE> data file). This is really a special case of <CODE>BZ_DATA_ERROR</CODE>.
1052 <CODE>BZ_MEM_ERROR</CODE>
1053 if insufficient memory was available
1054 <CODE>BZ_STREAM_END</CODE>
1055 if the logical end of stream was detected.
1061 Possible return values:
1064 number of bytes read
1065 if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE> or <CODE>BZ_STREAM_END</CODE>
1071 Allowable next actions:
1074 collect data from <CODE>buf</CODE>, then <CODE>BZ2_bzRead</CODE> or <CODE>BZ2_bzReadClose</CODE>
1075 if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
1076 collect data from <CODE>buf</CODE>, then <CODE>BZ2_bzReadClose</CODE> or <CODE>BZ2_bzReadGetUnused</CODE>
if <CODE>bzerror</CODE> is <CODE>BZ_STREAM_END</CODE>
<CODE>BZ2_bzReadClose</CODE>
otherwise
1084 <H3><A NAME="SEC28" HREF="manual_toc.html#TOC28"><CODE>BZ2_bzReadGetUnused</CODE></A></H3>
1087 void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b,
1088 void** unused, int* nUnused );
1092 Returns data which was read from the compressed file but was not needed
1093 to get to the logical end-of-stream. <CODE>*unused</CODE> is set to the address
1094 of the data, and <CODE>*nUnused</CODE> to the number of bytes. <CODE>*nUnused</CODE> will
1095 be set to a value between <CODE>0</CODE> and <CODE>BZ_MAX_UNUSED</CODE> inclusive.
1099 This function may only be called once <CODE>BZ2_bzRead</CODE> has signalled
1100 <CODE>BZ_STREAM_END</CODE> but before <CODE>BZ2_bzReadClose</CODE>.
1104 Possible assignments to <CODE>bzerror</CODE>:
1107 <CODE>BZ_PARAM_ERROR</CODE>
1108 if <CODE>b</CODE> is <CODE>NULL</CODE>
1109 or <CODE>unused</CODE> is <CODE>NULL</CODE> or <CODE>nUnused</CODE> is <CODE>NULL</CODE>
1110 <CODE>BZ_SEQUENCE_ERROR</CODE>
1111 if <CODE>BZ_STREAM_END</CODE> has not been signalled
1112 or if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE>
1118 Allowable next actions:
1121 <CODE>BZ2_bzReadClose</CODE>
1126 <H3><A NAME="SEC29" HREF="manual_toc.html#TOC29"><CODE>BZ2_bzReadClose</CODE></A></H3>
1129 void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
1133 Releases all memory pertaining to the compressed file <CODE>b</CODE>.
1134 <CODE>BZ2_bzReadClose</CODE> does not call <CODE>fclose</CODE> on the underlying file
1135 handle, so you should do that yourself if appropriate.
<CODE>BZ2_bzReadClose</CODE> should be called to clean up after all error
situations.
1141 Possible assignments to <CODE>bzerror</CODE>:
1144 <CODE>BZ_SEQUENCE_ERROR</CODE>
if <CODE>b</CODE> was opened with <CODE>BZ2_bzWriteOpen</CODE>
1151 Allowable next actions:
1159 <H3><A NAME="SEC30" HREF="manual_toc.html#TOC30"><CODE>BZ2_bzWriteOpen</CODE></A></H3>
BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f,
                          int blockSize100k, int verbosity,
                          int workFactor );
1168 Prepare to write compressed data to file handle <CODE>f</CODE>.
1169 <CODE>f</CODE> should refer to
1170 a file which has been opened for writing, and for which the error
indicator (<CODE>ferror(f)</CODE>) is not set.
1175 For the meaning of parameters <CODE>blockSize100k</CODE>,
1176 <CODE>verbosity</CODE> and <CODE>workFactor</CODE>, see
1177 <BR> <CODE>BZ2_bzCompressInit</CODE>.
1181 All required memory is allocated at this stage, so if the call
1182 completes successfully, <CODE>BZ_MEM_ERROR</CODE> cannot be signalled by a
1183 subsequent call to <CODE>BZ2_bzWrite</CODE>.
1187 Possible assignments to <CODE>bzerror</CODE>:
1190 <CODE>BZ_CONFIG_ERROR</CODE>
1191 if the library has been mis-compiled
1192 <CODE>BZ_PARAM_ERROR</CODE>
1193 if <CODE>f</CODE> is <CODE>NULL</CODE>
1194 or <CODE>blockSize100k < 1</CODE> or <CODE>blockSize100k > 9</CODE>
1195 <CODE>BZ_IO_ERROR</CODE>
1196 if <CODE>ferror(f)</CODE> is nonzero
1197 <CODE>BZ_MEM_ERROR</CODE>
1198 if insufficient memory is available
1204 Possible return values:
1207 Pointer to an abstract <CODE>BZFILE</CODE>
1208 if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
1214 Allowable next actions:
1217 <CODE>BZ2_bzWrite</CODE>
1218 if <CODE>bzerror</CODE> is <CODE>BZ_OK</CODE>
1219 (you could go directly to <CODE>BZ2_bzWriteClose</CODE>, but this would be pretty pointless)
<CODE>BZ2_bzWriteClose</CODE>
otherwise
1226 <H3><A NAME="SEC31" HREF="manual_toc.html#TOC31"><CODE>BZ2_bzWrite</CODE></A></H3>
1229 void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
1233 Absorbs <CODE>len</CODE> bytes from the buffer <CODE>buf</CODE>, eventually to be
1234 compressed and written to the file.
1238 Possible assignments to <CODE>bzerror</CODE>:
1241 <CODE>BZ_PARAM_ERROR</CODE>
1242 if <CODE>b</CODE> is <CODE>NULL</CODE> or <CODE>buf</CODE> is <CODE>NULL</CODE> or <CODE>len < 0</CODE>
1243 <CODE>BZ_SEQUENCE_ERROR</CODE>
if <CODE>b</CODE> was opened with <CODE>BZ2_bzReadOpen</CODE>
1245 <CODE>BZ_IO_ERROR</CODE>
1246 if there is an error writing the compressed file.
1253 <H3><A NAME="SEC32" HREF="manual_toc.html#TOC32"><CODE>BZ2_bzWriteClose</CODE></A></H3>
void BZ2_bzWriteClose ( int *bzerror, BZFILE* b,
                        int abandon,
                        unsigned int* nbytes_in,
                        unsigned int* nbytes_out );

void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b,
                          int abandon,
                          unsigned int* nbytes_in_lo32,
                          unsigned int* nbytes_in_hi32,
                          unsigned int* nbytes_out_lo32,
                          unsigned int* nbytes_out_hi32 );
1270 Compresses and flushes to the compressed file all data so far supplied
1271 by <CODE>BZ2_bzWrite</CODE>. The logical end-of-stream markers are also written, so
1272 subsequent calls to <CODE>BZ2_bzWrite</CODE> are illegal. All memory associated
1273 with the compressed file <CODE>b</CODE> is released.
1274 <CODE>fflush</CODE> is called on the
1275 compressed file, but it is not <CODE>fclose</CODE>'d.
1279 If <CODE>BZ2_bzWriteClose</CODE> is called to clean up after an error, the only
1280 action is to release the memory. The library records the error codes
1281 issued by previous calls, so this situation will be detected
1282 automatically. There is no attempt to complete the compression
1283 operation, nor to <CODE>fflush</CODE> the compressed file. You can force this
1284 behaviour to happen even in the case of no error, by passing a nonzero
1285 value to <CODE>abandon</CODE>.
If <CODE>nbytes_in</CODE> is non-null, <CODE>*nbytes_in</CODE> will be set to the
total volume of uncompressed data handled. Similarly, if <CODE>nbytes_out</CODE>
is non-null, <CODE>*nbytes_out</CODE> will be set to the total volume of compressed data written. For
1292 compatibility with older versions of the library, <CODE>BZ2_bzWriteClose</CODE>
1293 only yields the lower 32 bits of these counts. Use
1294 <CODE>BZ2_bzWriteClose64</CODE> if you want the full 64 bit counts. These
1295 two functions are otherwise absolutely identical.
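The two 32-bit halves reported by <CODE>BZ2_bzWriteClose64</CODE> can be joined into one 64-bit count. A minimal sketch, assuming a helper name (<CODE>bz_count64</CODE>) that is not part of the library and a C99 compiler providing <CODE>uint64_t</CODE>:

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit byte count from the low and high
   32-bit halves reported by BZ2_bzWriteClose64. */
static uint64_t bz_count64(unsigned int lo32, unsigned int hi32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

For example, after passing <CODE>&amp;in_lo</CODE> and <CODE>&amp;in_hi</CODE> as <CODE>nbytes_in_lo32</CODE> and <CODE>nbytes_in_hi32</CODE>, the total input size would be <CODE>bz_count64(in_lo, in_hi)</CODE>.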
1300 Possible assignments to <CODE>bzerror</CODE>:
1303 <CODE>BZ_SEQUENCE_ERROR</CODE>
1304 if <CODE>b</CODE> was opened with <CODE>BZ2_bzReadOpen</CODE>
1305 <CODE>BZ_IO_ERROR</CODE>
1306 if there is an error writing the compressed file
1313 <H3><A NAME="SEC33" HREF="manual_toc.html#TOC33">Handling embedded compressed data streams</A></H3>
The high-level library facilitates use of
<CODE>bzip2</CODE> data streams which form some part of a surrounding, larger
data stream.
1321 <LI>For writing, the library takes an open file handle, writes
1323 compressed data to it, <CODE>fflush</CODE>es it but does not <CODE>fclose</CODE> it.
1324 The calling application can write its own data before and after the
1325 compressed data stream, using that same file handle.
1326 <LI>Reading is more complex, and the facilities are not as general
1328 as they could be since generality is hard to reconcile with efficiency.
1329 <CODE>BZ2_bzRead</CODE> reads from the compressed file in blocks of size
1330 <CODE>BZ_MAX_UNUSED</CODE> bytes, and in doing so probably will overshoot
1331 the logical end of compressed stream.
1332 To recover this data once decompression has
1333 ended, call <CODE>BZ2_bzReadGetUnused</CODE> after the last call of <CODE>BZ2_bzRead</CODE>
1334 (the one returning <CODE>BZ_STREAM_END</CODE>) but before calling
1335 <CODE>BZ2_bzReadClose</CODE>.
1339 This mechanism makes it easy to decompress multiple <CODE>bzip2</CODE>
streams placed end-to-end. At the end of one stream, when <CODE>BZ2_bzRead</CODE>
1341 returns <CODE>BZ_STREAM_END</CODE>, call <CODE>BZ2_bzReadGetUnused</CODE> to collect the
1342 unused data (copy it into your own buffer somewhere).
1343 That data forms the start of the next compressed stream.
1344 To start uncompressing that next stream, call <CODE>BZ2_bzReadOpen</CODE> again,
feeding in the unused data via the <CODE>unused</CODE>/<CODE>nUnused</CODE>
parameters.
1347 Keep doing this until <CODE>BZ_STREAM_END</CODE> return coincides with the
1348 physical end of file (<CODE>feof(f)</CODE>). In this situation
1349 <CODE>BZ2_bzReadGetUnused</CODE>
1350 will of course return no data.
1354 This should give some feel for how the high-level interface can be used.
1355 If you require extra flexibility, you'll have to bite the bullet and get
1356 to grips with the low-level interface.
1361 <H3><A NAME="SEC34" HREF="manual_toc.html#TOC34">Standard file-reading/writing code</A></H3>
1363 Here's how you'd write data to a compressed file:
FILE*   f;
BZFILE* b;
int     nBuf, bzerror;
char    buf[ /* whatever size you like */ ];

f = fopen ( "myfile.bz2", "wb" );   /* binary mode */
if (!f) { /* handle error */ }
b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 0 );   /* verbosity 0, default workFactor */
if (bzerror != BZ_OK) {
   BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );   /* abandon */
   /* handle error */
}

while ( /* condition */ ) {
   /* get data to write into buf, and set nBuf appropriately */
   BZ2_bzWrite ( &bzerror, b, buf, nBuf );
   if (bzerror == BZ_IO_ERROR) {
      BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );
      /* handle error */
   }
}

BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) { /* handle error */ }
1399 And to read from a compressed file:
FILE*   f;
BZFILE* b;
int     nBuf, bzerror;
char    buf[ /* whatever size you like */ ];

f = fopen ( "myfile.bz2", "rb" );   /* binary mode */
if (!f) { /* handle error */ }
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if (bzerror != BZ_OK) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
}
while (bzerror == BZ_OK && /* arbitrary other conditions */) {
   nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
   if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) {
      /* do something with buf[0 .. nBuf-1] */
   }
}
if (bzerror != BZ_STREAM_END) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &bzerror, b );
}
1436 <H2><A NAME="SEC35" HREF="manual_toc.html#TOC35">Utility functions</A></H2>
1439 <H3><A NAME="SEC36" HREF="manual_toc.html#TOC36"><CODE>BZ2_bzBuffToBuffCompress</CODE></A></H3>
int BZ2_bzBuffToBuffCompress( char* dest,
                              unsigned int* destLen,
                              char* source,
                              unsigned int sourceLen,
                              int blockSize100k, int verbosity, int workFactor );
1452 Attempts to compress the data in <CODE>source[0 .. sourceLen-1]</CODE>
1453 into the destination buffer, <CODE>dest[0 .. *destLen-1]</CODE>.
1454 If the destination buffer is big enough, <CODE>*destLen</CODE> is
1455 set to the size of the compressed data, and <CODE>BZ_OK</CODE> is
1456 returned. If the compressed data won't fit, <CODE>*destLen</CODE>
1457 is unchanged, and <CODE>BZ_OUTBUFF_FULL</CODE> is returned.
1461 Compression in this manner is a one-shot event, done with a single call
1462 to this function. The resulting compressed data is a complete
1463 <CODE>bzip2</CODE> format data stream. There is no mechanism for making
1464 additional calls to provide extra input data. If you want that kind of
1465 mechanism, use the low-level interface.
1469 For the meaning of parameters <CODE>blockSize100k</CODE>, <CODE>verbosity</CODE>
1470 and <CODE>workFactor</CODE>, <BR> see <CODE>BZ2_bzCompressInit</CODE>.
1474 To guarantee that the compressed data will fit in its buffer, allocate
1475 an output buffer of size 1% larger than the uncompressed data, plus
1476 six hundred extra bytes.
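The sizing rule above is simple arithmetic. A minimal sketch, assuming a helper name (<CODE>bz_dest_bound</CODE>) that is not part of the library; the 1% term is rounded up so the result never falls short of the rule, and the return type is wider than <CODE>unsigned int</CODE> so the sum cannot wrap:

```c
/* Hypothetical helper: destination size given by the rule above, i.e. the
   source length, plus 1% (rounded up), plus 600 bytes. */
static unsigned long long bz_dest_bound(unsigned int sourceLen)
{
    unsigned long long n = sourceLen;
    return n + (n + 99) / 100 + 600;
}
```

Since <CODE>*destLen</CODE> is an <CODE>unsigned int</CODE>, a caller would still need to check that the result fits in that type before allocating the buffer.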
<CODE>BZ2_bzBuffToBuffCompress</CODE> will not write data at or
1481 beyond <CODE>dest[*destLen]</CODE>, even in case of buffer overflow.
1485 Possible return values:
1488 <CODE>BZ_CONFIG_ERROR</CODE>
1489 if the library has been mis-compiled
1490 <CODE>BZ_PARAM_ERROR</CODE>
1491 if <CODE>dest</CODE> is <CODE>NULL</CODE> or <CODE>destLen</CODE> is <CODE>NULL</CODE>
1492 or <CODE>blockSize100k < 1</CODE> or <CODE>blockSize100k > 9</CODE>
1493 or <CODE>verbosity < 0</CODE> or <CODE>verbosity > 4</CODE>
1494 or <CODE>workFactor < 0</CODE> or <CODE>workFactor > 250</CODE>
1495 <CODE>BZ_MEM_ERROR</CODE>
1496 if insufficient memory is available
1497 <CODE>BZ_OUTBUFF_FULL</CODE>
1498 if the size of the compressed data exceeds <CODE>*destLen</CODE>
1505 <H3><A NAME="SEC37" HREF="manual_toc.html#TOC37"><CODE>BZ2_bzBuffToBuffDecompress</CODE></A></H3>
int BZ2_bzBuffToBuffDecompress ( char* dest,
                                 unsigned int* destLen,
                                 char* source,
                                 unsigned int sourceLen,
                                 int small, int verbosity );
1517 Attempts to decompress the data in <CODE>source[0 .. sourceLen-1]</CODE>
1518 into the destination buffer, <CODE>dest[0 .. *destLen-1]</CODE>.
1519 If the destination buffer is big enough, <CODE>*destLen</CODE> is
1520 set to the size of the uncompressed data, and <CODE>BZ_OK</CODE> is
returned. If the uncompressed data won't fit, <CODE>*destLen</CODE>
1522 is unchanged, and <CODE>BZ_OUTBUFF_FULL</CODE> is returned.
1526 <CODE>source</CODE> is assumed to hold a complete <CODE>bzip2</CODE> format
1527 data stream. <BR> <CODE>BZ2_bzBuffToBuffDecompress</CODE> tries to decompress
1528 the entirety of the stream into the output buffer.
1532 For the meaning of parameters <CODE>small</CODE> and <CODE>verbosity</CODE>,
1533 see <CODE>BZ2_bzDecompressInit</CODE>.
1537 Because the compression ratio of the compressed data cannot be known in
1538 advance, there is no easy way to guarantee that the output buffer will
1539 be big enough. You may of course make arrangements in your code to
1540 record the size of the uncompressed data, but such a mechanism is beyond
1541 the scope of this library.
1545 <CODE>BZ2_bzBuffToBuffDecompress</CODE> will not write data at or
1546 beyond <CODE>dest[*destLen]</CODE>, even in case of buffer overflow.
1550 Possible return values:
1553 <CODE>BZ_CONFIG_ERROR</CODE>
1554 if the library has been mis-compiled
1555 <CODE>BZ_PARAM_ERROR</CODE>
1556 if <CODE>dest</CODE> is <CODE>NULL</CODE> or <CODE>destLen</CODE> is <CODE>NULL</CODE>
1557 or <CODE>small != 0 && small != 1</CODE>
1558 or <CODE>verbosity < 0</CODE> or <CODE>verbosity > 4</CODE>
1559 <CODE>BZ_MEM_ERROR</CODE>
1560 if insufficient memory is available
1561 <CODE>BZ_OUTBUFF_FULL</CODE>
if the size of the uncompressed data exceeds <CODE>*destLen</CODE>
1563 <CODE>BZ_DATA_ERROR</CODE>
1564 if a data integrity error was detected in the compressed data
1565 <CODE>BZ_DATA_ERROR_MAGIC</CODE>
1566 if the compressed data doesn't begin with the right magic bytes
1567 <CODE>BZ_UNEXPECTED_EOF</CODE>
1568 if the compressed data ends unexpectedly
1575 <H2><A NAME="SEC38" HREF="manual_toc.html#TOC38"><CODE>zlib</CODE> compatibility functions</A></H2>
1577 Yoshioka Tsuneo has contributed some functions to
1578 give better <CODE>zlib</CODE> compatibility. These functions are
<CODE>BZ2_bzopen</CODE>, <CODE>BZ2_bzdopen</CODE>, <CODE>BZ2_bzread</CODE>, <CODE>BZ2_bzwrite</CODE>, <CODE>BZ2_bzflush</CODE>,
1580 <CODE>BZ2_bzclose</CODE>,
1581 <CODE>BZ2_bzerror</CODE> and <CODE>BZ2_bzlibVersion</CODE>.
1582 These functions are not (yet) officially part of
1583 the library. If they break, you get to keep all the pieces.
1584 Nevertheless, I think they work ok.
1587 typedef void BZFILE;
1589 const char * BZ2_bzlibVersion ( void );
1593 Returns a string indicating the library version.
1596 BZFILE * BZ2_bzopen ( const char *path, const char *mode );
1597 BZFILE * BZ2_bzdopen ( int fd, const char *mode );
1601 Opens a <CODE>.bz2</CODE> file for reading or writing, using either its name
1602 or a pre-existing file descriptor.
1603 Analogous to <CODE>fopen</CODE> and <CODE>fdopen</CODE>.
1606 int BZ2_bzread ( BZFILE* b, void* buf, int len );
1607 int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
1611 Reads/writes data from/to a previously opened <CODE>BZFILE</CODE>.
1612 Analogous to <CODE>fread</CODE> and <CODE>fwrite</CODE>.
1615 int BZ2_bzflush ( BZFILE* b );
1616 void BZ2_bzclose ( BZFILE* b );
1620 Flushes/closes a <CODE>BZFILE</CODE>. <CODE>BZ2_bzflush</CODE> doesn't actually do
1621 anything. Analogous to <CODE>fflush</CODE> and <CODE>fclose</CODE>.
const char * BZ2_bzerror ( BZFILE *b, int *errnum );
Returns a string describing the most recent error status of
1631 <CODE>b</CODE>, and also sets <CODE>*errnum</CODE> to its numerical value.
1637 <H2><A NAME="SEC39" HREF="manual_toc.html#TOC39">Using the library in a <CODE>stdio</CODE>-free environment</A></H2>
1641 <H3><A NAME="SEC40" HREF="manual_toc.html#TOC40">Getting rid of <CODE>stdio</CODE></A></H3>
1644 In a deeply embedded application, you might want to use just
1645 the memory-to-memory functions. You can do this conveniently
1646 by compiling the library with preprocessor symbol <CODE>BZ_NO_STDIO</CODE>
defined. Doing this gives you a library containing only the following
eight functions:
1652 <CODE>BZ2_bzCompressInit</CODE>, <CODE>BZ2_bzCompress</CODE>, <CODE>BZ2_bzCompressEnd</CODE> <BR>
1653 <CODE>BZ2_bzDecompressInit</CODE>, <CODE>BZ2_bzDecompress</CODE>, <CODE>BZ2_bzDecompressEnd</CODE> <BR>
1654 <CODE>BZ2_bzBuffToBuffCompress</CODE>, <CODE>BZ2_bzBuffToBuffDecompress</CODE>
When compiled like this, all functions will ignore <CODE>verbosity</CODE>
settings.
1664 <H3><A NAME="SEC41" HREF="manual_toc.html#TOC41">Critical error handling</A></H3>
1666 <CODE>libbzip2</CODE> contains a number of internal assertion checks which
1667 should, needless to say, never be activated. Nevertheless, if an
1668 assertion should fail, behaviour depends on whether or not the library
1669 was compiled with <CODE>BZ_NO_STDIO</CODE> set.
1673 For a normal compile, an assertion failure yields the message
1676 bzip2/libbzip2: internal error number N.
1677 This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000.
1678 Please report it to me at: jseward@acm.org. If this happened
1679 when you were using some program which uses libbzip2 as a
1680 component, you should also report this bug to the author(s)
1681 of that program. Please make an effort to report this bug;
1682 timely and accurate bug reports eventually lead to higher
1683 quality software. Thanks. Julian Seward, 21 March 2000.
where <CODE>N</CODE> is some error code number. <CODE>exit(3)</CODE>
is then called.
1692 For a <CODE>stdio</CODE>-free library, assertion failures result
1693 in a call to a function declared as:
1696 extern void bz_internal_error ( int errcode );
The relevant code is passed as a parameter. You should supply
such a function.
1705 In either case, once an assertion failure has occurred, any
1706 <CODE>bz_stream</CODE> records involved can be regarded as invalid.
1707 You should not attempt to resume normal operation with them.
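One way to supply <CODE>bz_internal_error</CODE> in a <CODE>stdio</CODE>-free build is sketched below. Only the function's name and signature come from the declaration above; the recovery-point variables (<CODE>bz_panic_env</CODE>, <CODE>bz_panic_armed</CODE>, <CODE>bz_panic_code</CODE>) are assumptions, as is the availability of <CODE>setjmp</CODE> and <CODE>abort</CODE> on the target. The handler never returns to the library, since the streams involved can no longer be trusted.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical recovery state owned by the application. */
static jmp_buf bz_panic_env;        /* set with setjmp() before calling libbzip2 */
static int     bz_panic_armed = 0;  /* nonzero once bz_panic_env is valid */
static int     bz_panic_code  = 0;  /* last internal error number seen */

/* Called by a BZ_NO_STDIO build of libbzip2 when an internal assertion fails. */
void bz_internal_error(int errcode)
{
    bz_panic_code = errcode;
    if (bz_panic_armed)
        longjmp(bz_panic_env, 1);   /* jump back to the application's recovery point */
    abort();                        /* no recovery point: stop here */
}
```

An application would set <CODE>bz_panic_armed</CODE> and call <CODE>setjmp(bz_panic_env)</CODE> before entering the library; if control comes back through <CODE>longjmp</CODE>, it should discard every <CODE>bz_stream</CODE> that was in use.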
1711 You may, of course, change critical error handling to suit
1712 your needs. As I said above, critical errors indicate bugs
1713 in the library and should not occur. All "normal" error
1714 situations are indicated via error return codes from functions,
1715 and can be recovered from.
1721 <H2><A NAME="SEC42" HREF="manual_toc.html#TOC42">Making a Windows DLL</A></H2>
1723 Everything related to Windows has been contributed by Yoshioka Tsuneo
1724 <BR> (<CODE>QWF00133@niftyserve.or.jp</CODE> /
1725 <CODE>tsuneo-y@is.aist-nara.ac.jp</CODE>), so you should send your queries to
1726 him (but perhaps Cc: me, <CODE>jseward@acm.org</CODE>).
1730 My vague understanding of what to do is: using Visual C++ 5.0,
1731 open the project file <CODE>libbz2.dsp</CODE>, and build. That's all.
If you can't
open the project file for some reason, make a new one, naming these files:
1737 <CODE>blocksort.c</CODE>, <CODE>bzlib.c</CODE>, <CODE>compress.c</CODE>,
1738 <CODE>crctable.c</CODE>, <CODE>decompress.c</CODE>, <CODE>huffman.c</CODE>, <BR>
1739 <CODE>randtable.c</CODE> and <CODE>libbz2.def</CODE>. You will also need
1740 to name the header files <CODE>bzlib.h</CODE> and <CODE>bzlib_private.h</CODE>.
If you don't use VC++, you may need to define the preprocessor symbol
1745 <CODE>_WIN32</CODE>.
1749 Finally, <CODE>dlltest.c</CODE> is a sample program using the DLL. It has a
1750 project file, <CODE>dlltest.dsp</CODE>.
1754 If you just want a makefile for Visual C, have a look at
1755 <CODE>makefile.msc</CODE>.
1759 Be aware that if you compile <CODE>bzip2</CODE> itself on Win32, you must set
1760 <CODE>BZ_UNIX</CODE> to 0 and <CODE>BZ_LCCWIN32</CODE> to 1, in the file
<CODE>bzip2.c</CODE>, before compiling. Otherwise the resulting binary won't
work correctly.
1766 I haven't tried any of this stuff myself, but it all looks plausible.