The mmap module
(New in 2.0) This module provides an interface to the operating system’s memory mapping functions. The mapped region behaves pretty much like a string object, but data is read directly from the file.
Example: Using the mmap module
# File: mmap-example-1.py

import mmap
import os

filename = "samples/sample.txt"

file = open(filename, "r+")
size = os.path.getsize(filename)

data = mmap.mmap(file.fileno(), size)

# basics
print data
print len(data), size

# use slicing to read from the file
print repr(data[:10]), repr(data[:10])

# or use the standard file interface
print repr(data.read(10)), repr(data.read(10))
$ python mmap-example-1.py
<mmap object at 008A2A10>
302 302
'We will pe' 'We will pe'
'We will pe' 'rhaps even'
Under Windows, the file must currently be opened for both reading and writing (r+, or w+), or the mmap call will fail.
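The restriction above describes the module as it stood when this text was written. As a minimal sketch, assuming a later Python release (the example uses Python 3 syntax, unlike the listings in this section), the access keyword of mmap.mmap can request a read-only mapping of a file opened only for reading. The temporary file and the variable names are illustrative, not part of the original example.

```python
import mmap
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"We will perhaps eventually be writing only small modules")

with open(path, "rb") as f:  # opened for reading only, not "r+"
    # A length of 0 maps the whole file; ACCESS_READ asks for a read-only map.
    data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = data[:10]       # slicing does not move the read position
    chunk1 = data.read(10)  # read() starts at position 0 ...
    chunk2 = data.read(10)  # ... and continues where the last read stopped
    data.close()

os.remove(path)
print(first, chunk1, chunk2)
```

Attempting to assign to a slice of a mapping created this way raises an exception, which is usually what you want when the file should not change.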
Memory mapped regions can be used instead of ordinary strings in many places, including regular expressions and many string operations:
Example: Using string functions and regular expressions on a mapped region
# File: mmap-example-2.py

import mmap
import os, string, re

def mapfile(filename):
    file = open(filename, "r+")
    size = os.path.getsize(filename)
    return mmap.mmap(file.fileno(), size)

data = mapfile("samples/sample.txt")

# search
index = data.find("small")
print index, repr(data[index-5:index+15])

# regular expressions work too!
m = re.search("small", data)
print m.start(), m.group()
$ python mmap-example-2.py
43 'only small\015\012modules '
43 small
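For readers on Python 3, the same search can be sketched as follows. This is an illustration under stated assumptions rather than a port of the listing above: it writes its own temporary file instead of reading samples/sample.txt, and it uses bytes patterns, because a mapped region holds bytes rather than text in Python 3.

```python
import mmap
import os
import re
import tempfile

# Stand-in for samples/sample.txt (hypothetical content, CRLF line ending).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"We will perhaps eventually be writing only small\r\nmodules")

with open(path, "r+b") as f:
    data = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file

    # search with the mapping's own find method
    index = data.find(b"small")
    context = data[index-5:index+15]

    # regular expressions accept the mapping directly, given a bytes pattern
    m = re.search(b"small", data)
    match_start, match_text = m.start(), m.group()

    data.close()

os.remove(path)
print(index, repr(context))
print(match_start, match_text)
```

Passing a str pattern such as "small" to re.search here would raise a TypeError, so the b prefix is the main change from the Python 2 listing.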
Comment:
I use shared memory in C for much of my work in Linux. I've wanted access via Python for some time. This module is a good beginning. Most of my data is numeric values in Intel byte order (8-bit, 16-bit, 32-bit). Values are signed and unsigned. A few special cases are signed chars that would best be represented as a printable ASCII char. A very few char strings are used. Those require some knowledge of the field length. Access to those data types using a byte offset from the start of the mapped area would be a fine general-purpose addition to this module. Perhaps a flag indicating big endian or little endian would make it more general for users on other platforms.
Posted by Mark Edwards (2007-05-29)
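The byte-offset access described in the comment above can be approximated by combining a mapped region with the standard struct module. The following is a minimal sketch in Python 3 syntax, not a feature of mmap itself: the record layout, the field values, and the use of an anonymous mapping in place of a real shared memory segment are all assumptions made for illustration.

```python
import mmap
import struct

# An anonymous 16-byte mapping (fileno -1) stands in for a shared segment.
data = mmap.mmap(-1, 16)

# Hypothetical record at offset 0: unsigned 8-bit, signed 16-bit,
# unsigned 32-bit, then a 4-byte string. "<" selects little-endian
# (Intel) byte order with no padding between fields.
RECORD = "<BhI4s"
struct.pack_into(RECORD, data, 0, 200, -2, 70000, b"abcd")

# Read the fields back using a byte offset from the start of the mapping.
flag, small, big, tag = struct.unpack_from(RECORD, data, 0)

# The same two bytes at offset 1, interpreted as big-endian instead.
(swapped,) = struct.unpack_from(">h", data, 1)

data.close()
print(flag, small, big, tag, swapped)
```

The byte-order prefix in the format string plays the role of the big-endian or little-endian flag the comment asks for, and fixed-length strings are handled by giving the field length in the format (4s here).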