libarchive_internals(3) manual page

The decompression layer not only handles decompression; it also buffers data so that the format handlers see a much nicer I/O model. The decompression API is a two-stage peek/consume model. A read_ahead request specifies a minimum read amount; the decompression layer must provide a pointer to at least that much data. If more data is immediately available, it should return more: the format layer handles bulk data reads by asking for a minimum of one byte and then copying as much data as is available.

A subsequent call to the consume() function advances the read pointer. Note that data returned from a read_ahead() call is guaranteed to remain in place until the next call to read_ahead(). Intervening calls to consume() should not cause the data to move.
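
The peek/consume contract can be illustrated with a small in-memory stand-in. This is only a sketch: `struct toy_stream`, `toy_read_ahead()`, `toy_consume()` and `toy_bulk_read()` are hypothetical names chosen for illustration, and their signatures are assumptions rather than the library's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a decompression layer. */
struct toy_stream {
	const unsigned char *buf;	/* all "decompressed" data */
	size_t len;			/* total bytes in buf */
	size_t pos;			/* current read pointer */
};

/*
 * Peek: point *out at the available data and return how much there is.
 * Returns 0 if fewer than `min` bytes remain. Never advances pos.
 */
static size_t
toy_read_ahead(struct toy_stream *s, const void **out, size_t min)
{
	size_t avail;

	assert(s != NULL && out != NULL);
	avail = s->len - s->pos;
	if (avail < min) {
		*out = NULL;
		return (0);
	}
	*out = s->buf + s->pos;
	return (avail);		/* may be more than `min` */
}

/* Consume: advance the read pointer; the data itself does not move. */
static void
toy_consume(struct toy_stream *s, size_t n)
{
	assert(n <= s->len - s->pos);
	s->pos += n;
}

/* Bulk read: ask for at least one byte, then copy as much as fits. */
static size_t
toy_bulk_read(struct toy_stream *s, void *dst, size_t want)
{
	size_t total = 0;

	while (total < want) {
		const void *p;
		size_t n = toy_read_ahead(s, &p, 1);

		if (n == 0)
			break;			/* end of data */
		if (n > want - total)
			n = want - total;	/* trim to what we can use */
		memcpy((unsigned char *)dst + total, p, n);
		toy_consume(s, n);
		total += n;
	}
	return (total);
}
```

Note that the peek never changes the read position, so a caller can inspect data repeatedly and consume only what it actually used.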

Skip requests must always be handled exactly. Decompression handlers that cannot seek forward should not register a skip handler; the API layer fills in a generic skip handler that reads and discards data.
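
A generic read-and-discard skip might look roughly like the following. This is a sketch under assumptions: the `toy_` names and the structure layout are hypothetical, and the `chunk` field exists only so that the toy source hands out data in pieces, the way a real decompressor would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler state; not the library's struct decompressor_t. */
struct toy_decompressor {
	const unsigned char *buf;	/* backing data */
	size_t len;			/* total bytes in buf */
	size_t pos;			/* current read pointer */
	size_t chunk;			/* most bytes handed out per peek */
	/* Optional exact skip; NULL when the handler cannot seek forward. */
	size_t (*skip)(struct toy_decompressor *, size_t);
};

/* Peek at up to `chunk` bytes; returns 0 if fewer than `min` remain. */
static size_t
toy_read_ahead(struct toy_decompressor *d, const void **out, size_t min)
{
	size_t avail = d->len - d->pos;

	if (avail < min) {
		*out = NULL;
		return (0);
	}
	if (avail > d->chunk && d->chunk >= min)
		avail = d->chunk;
	*out = d->buf + d->pos;
	return (avail);
}

/* Advance the read pointer. */
static void
toy_consume(struct toy_decompressor *d, size_t n)
{
	assert(n <= d->len - d->pos);
	d->pos += n;
}

/* Generic fallback: read and discard until `request` bytes are gone. */
static size_t
toy_generic_skip(struct toy_decompressor *d, size_t request)
{
	size_t skipped = 0;

	while (skipped < request) {
		const void *p;
		size_t n = toy_read_ahead(d, &p, 1);

		(void)p;			/* data is discarded */
		if (n == 0)
			break;			/* input ended early */
		if (n > request - skipped)
			n = request - skipped;	/* never overshoot */
		toy_consume(d, n);
		skipped += n;
	}
	return (skipped);
}

/* Dispatch: use the handler's own skip if registered, else the fallback. */
static size_t
toy_skip(struct toy_decompressor *d, size_t request)
{
	assert(d != NULL);
	if (d->skip != NULL)
		return (d->skip(d, request));
	return (toy_generic_skip(d, request));
}
```

In this sketch a return value smaller than the request means the input ended early; since skips must be exact, a caller would treat that as an error.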

A decompression handler has a specific lifecycle:

Registration/Configuration	When the client invokes the public support function, the decompression handler invokes the internal `__archive_read_register_compression()` function to provide bid and initialization functions. This function returns NULL on error or else a pointer to a struct decompressor_t. This structure contains a void * config slot that can be used for storing any customization information.
Bid	The bid function is invoked with a pointer and size of a block of data. The decompressor can access its config data through the decompressor element of the archive_read object. The bid function is otherwise stateless. In particular, it must not perform any I/O operations. The value returned by the bid function indicates its suitability for handling this data stream. A bid of zero will ensure that this decompressor is never invoked. Return zero if magic number checks fail. Otherwise, your initial implementation should return the number of bits actually checked. For example, if you verify two full bytes and three bits of another byte, bid 19. Note that the initial block may be very short; be careful to only inspect the data you are given. (The current decompressors require two bytes for correct bidding.)
Initialize	The winning bidder will have its init function called. This function should initialize the remaining slots of the struct decompressor_t object pointed to by the decompressor element of the archive_read object. In particular, it should allocate any working data it needs in the data slot of that structure. The init function is called with the block of data that was used for tasting. At this point, the decompressor is responsible for all I/O requests to the client callbacks. The decompressor is free to read more data as and when necessary.
Satisfy I/O requests	The format handler will invoke the read_ahead, consume, and skip functions as needed.
Finish	The finish method is called only once when the archive is closed. It should release anything stored in the data and config slots of the decompressor object. It should not invoke the client close callback.
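
The bid rule described above (return the number of bits actually verified, or zero on a failed check) can be sketched as follows. The format here is made up for illustration: it is assumed to begin with the bytes 'T' 'Y' and to require the top three bits of the third byte to be zero. The function name and signature are likewise assumptions, not the library's.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical bid function for a made-up format.
 * Stateless, performs no I/O, and inspects only the `n` bytes it was given.
 */
static int
toy_bid(const void *buff, size_t n)
{
	const unsigned char *p = buff;
	int bits_checked = 0;

	assert(buff != NULL || n == 0);

	/* Two bytes are the minimum needed for a meaningful bid. */
	if (n < 2)
		return (0);
	if (p[0] != 'T' || p[1] != 'Y')
		return (0);		/* magic number check failed */
	bits_checked += 16;		/* two full bytes verified */

	/* The initial block may be very short; stop if there is no more. */
	if (n < 3)
		return (bits_checked);

	/* Top three bits of the third byte are assumed reserved (zero). */
	if ((p[2] & 0xE0) != 0)
		return (0);
	bits_checked += 3;

	return (bits_checked);		/* 16 + 3 = 19 */
}
```

With a three-byte block that passes every check, this bids 19, matching the example in the text; with only two bytes available it bids 16.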

A format handler has a similar lifecycle:

Registration	Allocate your private data and initialize your pointers.
Bid	Formats bid by invoking the `read_ahead()` decompression method but not calling the `consume()` method. This allows each bidder to look ahead in the input stream. Bidders should not look further ahead than necessary, as long lookaheads put pressure on the decompression layer to buffer lots of data. Most formats only require a few hundred bytes of lookahead; lookaheads of a few kilobytes are reasonable. (The ISO9660 reader sometimes looks ahead by 48k, which should be considered an upper limit.)
Read header	The header read is usually the most complex part of any format. There are a few strategies worth mentioning: For formats such as tar or cpio, reading and parsing the header is straightforward since headers alternate with data. For formats that store all header data at the beginning of the file, the first header read request may have to read all headers into memory and store that data, sorted by the location of the file data. Subsequent header read requests will skip forward to the beginning of the file data and return the corresponding header.
Read Data	The read data interface supports sparse files; this requires that each call return a block of data specifying the file offset and size. This may require you to carefully track the location so that you can return accurate file offsets for each read. Remember that the decompressor will return as much data as it has. Generally, you will want to request one byte, examine the return value to see how much data is available, and possibly trim that to the amount you can use. You should invoke consume for each block just before you return it.
Skip All Data	The skip data call should skip over all file data and trailing padding. This is called automatically by the API layer just before each header read. It is also called in response to the client calling the public `data_skip()` function.
Cleanup	On cleanup, the format should release all of its allocated memory.
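
The "Read Data" step above can be sketched as follows: request one byte, see how much is available, trim to what the current entry can use, and consume just before returning the block with its file offset. Everything here is hypothetical: `struct toy_format` folds a toy data source and the format's bookkeeping into one structure, and the return convention (1 for a block, 0 at end of entry, -1 on truncated input) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format state plus a stand-in for the decompression layer. */
struct toy_format {
	const unsigned char *buf;	/* stand-in data source */
	size_t len;			/* total bytes in buf */
	size_t pos;			/* read pointer of the source */
	size_t chunk;			/* most bytes handed out per peek */
	size_t entry_remaining;		/* bytes left in the current entry */
	long long entry_offset;		/* file offset of the next block */
};

/* Peek at up to `chunk` bytes; returns 0 if fewer than `min` remain. */
static size_t
toy_read_ahead(struct toy_format *f, const void **out, size_t min)
{
	size_t avail = f->len - f->pos;

	if (avail < min) {
		*out = NULL;
		return (0);
	}
	if (avail > f->chunk && f->chunk >= min)
		avail = f->chunk;
	*out = f->buf + f->pos;
	return (avail);
}

/* Return the next block of entry data with its offset and size. */
static int
toy_read_data(struct toy_format *f, const void **buff, size_t *size,
    long long *offset)
{
	size_t n;

	assert(f != NULL && buff != NULL && size != NULL && offset != NULL);
	*offset = f->entry_offset;
	if (f->entry_remaining == 0) {
		*buff = NULL;
		*size = 0;
		return (0);			/* end of this entry */
	}
	n = toy_read_ahead(f, buff, 1);		/* ask for just one byte */
	if (n == 0) {
		*size = 0;
		return (-1);			/* truncated input */
	}
	if (n > f->entry_remaining)
		n = f->entry_remaining;		/* trim to what we can use */
	*size = n;
	f->pos += n;				/* consume before returning */
	f->entry_remaining -= n;
	f->entry_offset += (long long)n;	/* track offset for next call */
	return (1);
}
```

Tracking `entry_offset` in the format's own state is what lets each call report an accurate file offset, which the sparse-file interface depends on.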
