The bz2 Module
(New in 2.3) The bz2 module provides tools for bzip2 compression, the format used by the command-line tool of the same name. The format is based on the Burrows-Wheeler block-sorting algorithm, combined with Huffman coding. bzip2 usually compresses somewhat better than the more commonly used zlib/deflate format, typically by around 10%.
To compress data in a string, use the compress function. It returns an 8-bit string containing compressed data. To get the original data back, use the decompress function:
    # File: bz2-example-1.py

    import bz2

    MESSAGE = "the meaning of life"

    compressed_message = bz2.compress(MESSAGE)
    decompressed_message = bz2.decompress(compressed_message)

    print "original:", repr(MESSAGE)
    print "compressed message:", repr(compressed_message)
    print "decompressed message:", repr(decompressed_message)
    $ python bz2-example-1.py
    original: 'the meaning of life'
    compressed message: 'BZh91AY&SY\xcb\x18\xf4\x9e\x00\x00\t\x11 \x80@\x00#\xe7\x84\x00 \x00"\x8d\x94\xc3!\x03@\xd0\x00\xfb\xf6 U\xa6\xe1p\xb8Z.\xe4\x8ap\xa1!\x961\xe9<'
    decompressed message: 'the meaning of life'
(Note that for very short strings like this, the compressed byte stream is actually larger than the original string.)
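The size overhead noted above can be checked directly. The following is a minimal sketch, written for Python 3 (where compress takes a bytes object) rather than the Python 2 used in the listings here. The names short_text and long_text, and the repeat count of 1000, are illustrative assumptions only.

```python
import bz2

# Python 3: compress() and decompress() work on bytes, not text strings.
short_text = b"the meaning of life"
long_text = short_text * 1000  # highly repetitive input, chosen so it compresses well

short_packed = bz2.compress(short_text)
long_packed = bz2.compress(long_text)

# Print "original size -> compressed size" for each input.
print(len(short_text), "->", len(short_packed))
print(len(long_text), "->", len(long_packed))
```

The fixed cost of the bzip2 stream header and trailer dominates for the short input, while the long, repetitive input shrinks considerably.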
The module also provides BZ2Compressor and BZ2Decompressor classes, which support incremental compression and decompression. In the following example, the string is compressed word by word, and then decompressed by a single call to the decompress function:
    # File: bz2-example-2.py

    import bz2

    text = "the meaning of life"
    data = ""

    comp = bz2.BZ2Compressor()

    for word in text.split():
        data += comp.compress(word + " ")

    data += comp.flush()

    print repr(bz2.decompress(data))
    $ python bz2-example-2.py
    'the meaning of life '
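Decompression can be done incrementally as well, using the BZ2Decompressor class. The sketch below is written for Python 3 and feeds the compressed data to the decompressor in small slices, much as if it were arriving from a network connection. The chunk size of 16 bytes and the variable names are arbitrary choices for illustration.

```python
import bz2

text = b"the meaning of life " * 50
packed = bz2.compress(text)

decomp = bz2.BZ2Decompressor()
pieces = []
chunk_size = 16  # arbitrary; stands in for whatever size the data arrives in

for start in range(0, len(packed), chunk_size):
    chunk = packed[start:start + chunk_size]
    # decompress() may return an empty bytes object until the
    # decompressor has seen enough input to produce output.
    pieces.append(decomp.decompress(chunk))

restored = b"".join(pieces)
print(restored == text)
```

In Python 3.3 and later, the decompressor's eof attribute becomes true once the end of the compressed stream has been reached.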
The module also makes it easy to read and write compressed files. The BZ2File class is similar to the built-in open function, but it transparently decompresses data as it is read and compresses data as it is written.
    # File: bz2-example-3.py

    import bz2

    file = bz2.BZ2File("samples/sample.bz2", "r")

    for line in file:
        print repr(line)
    $ python bz2-example-3.py
    'We will perhaps eventually be writing only small\n'
    'modules which are identified by name as they are\n'
    'used to build larger ones, so that devices like\n'
    ...
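Writing works the same way. The following Python 3 sketch writes a few lines through BZ2File and reads them back. The file name demo.bz2 and the temporary directory are hypothetical, used only to keep the example self-contained.

```python
import bz2
import os
import tempfile

lines = [b"first line\n", b"second line\n", b"third line\n"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.bz2")  # hypothetical file name

    # Data is compressed as it is written.
    with bz2.BZ2File(path, "w") as out:
        out.writelines(lines)

    # Data is decompressed as it is read; iteration yields one line at a time.
    with bz2.BZ2File(path, "r") as src:
        read_back = list(src)

print(read_back)
```

Note that in Python 3, BZ2File reads and writes bytes objects. Text has to be encoded before writing and decoded after reading.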