The md5 module
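Note: As of Python 2.5, this module is a compatibility wrapper for hashlib, and it is deprecated in favor of that module. The md5 module was removed in Python 3, where hashlib.md5 provides the same algorithm.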
This module is used to calculate message signatures (so-called “message digests”).
The MD5 algorithm calculates a strong 128-bit signature. This means that if two strings are different, it’s highly likely that their MD5 signatures are different as well. Or to put it another way, given an MD5 digest, it’s supposed to be nearly impossible to come up with a string that generates that digest.
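In Python 3 the same algorithm is available as hashlib.md5. The following is a minimal sketch, assuming Python 3, of the two properties described above: the digest is always 16 bytes (128 bits), and changing a single character of the input produces a different digest. The variable names are illustrative only.

```python
import hashlib

# Compute MD5 digests of two strings that differ in a single character.
a = hashlib.md5(b"spam, spam, and eggs").digest()
b = hashlib.md5(b"spam, spam, and eggz").digest()

print(len(a) * 8)  # the digest is 16 bytes, i.e. 128 bits
print(a == b)      # a one-character change gives a different digest
```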
Note: Since this was written, MD5 has been broken. It’s now relatively easy to generate files that differ slightly but have the same MD5 signature, if you can insert random-looking data somewhere in the file (e.g. in a comment, or in a part of the file that’s not used for any other purpose). Most published attacks use environments where such differences can be used to control the result in some way (e.g. a PostScript document that contains two texts and code that selects which one to display, or a self-extracting executable that extracts different files). While applications that use MD5 to sign only the data that’s actually displayed or extracted should be safe for now, use of MD5 in new applications should be avoided.
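For new code, hashlib offers stronger algorithms behind the same interface. The sketch below assumes Python 3 and uses SHA-256 as one example replacement (the choice of SHA-256 is ours, not the original text's); the helper name `digest` is hypothetical. Because hashlib.new accepts an algorithm name, the algorithm can be passed in as a parameter instead of being hard-wired.

```python
import hashlib

data = b"spam, spam, and eggs"

def digest(data, algorithm="sha256"):
    # hypothetical helper: hashlib.new looks the algorithm up by name
    return hashlib.new(algorithm, data).digest()

md5_digest = digest(data, "md5")
sha256_digest = digest(data)

print(len(md5_digest), len(sha256_digest))  # 16 32
```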
```python
# File: md5-example-1.py

import md5

hash = md5.new()
hash.update("spam, spam, and eggs")

print repr(hash.digest())
```
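Since the md5 module no longer exists in Python 3, here is a sketch of the same example using hashlib.md5, assuming Python 3 (where the input must be bytes rather than str). It also shows that calling update several times gives the same result as hashing the concatenated input in one call.

```python
import hashlib

# Python 3 equivalent of md5-example-1.py: hashlib.md5 replaces md5.new
h = hashlib.md5()
h.update(b"spam, spam, and eggs")
print(repr(h.digest()))

# update() can be called repeatedly; the result equals the digest
# of all the pieces concatenated together
pieces = hashlib.md5()
pieces.update(b"spam, spam, ")
pieces.update(b"and eggs")
print(pieces.digest() == h.digest())  # True
```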
Note that the checksum is returned as a binary string. Getting a hexadecimal or base64-encoded string is quite easy, though:
```python
# File: md5-example-2.py

import md5
import string
import base64

hash = md5.new()
hash.update("spam, spam, and eggs")

value = hash.digest()

print hash.hexdigest()
# in Python 1.5.2 and earlier, use this instead:
# print string.join(map(lambda v: "%02x" % ord(v), value), "")

print base64.encodestring(value)
```
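A Python 3 sketch of the same conversions, assuming hashlib and the base64 module's b64encode. Iterating over a bytes object yields integers in Python 3, so the manual hex conversion needs no ord(); and b64encode, unlike encodestring, does not append a trailing newline.

```python
import base64
import hashlib

value = hashlib.md5(b"spam, spam, and eggs").digest()

# hexdigest() and a manual hex conversion of the raw bytes agree
hex_value = hashlib.md5(b"spam, spam, and eggs").hexdigest()
manual_hex = "".join("%02x" % byte for byte in value)
print(hex_value == manual_hex)  # True

# base64-encode the raw 16-byte digest
b64_value = base64.b64encode(value)
print(b64_value)
```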
Among other things, the MD5 checksum can be used for challenge-response authentication (but see the note on random numbers below):
```python
# File: md5-example-3.py

import md5
import string, random

def getchallenge():
    # generate a 16-byte long random string.  (note that the
    # random module is not a cryptographically secure generator,
    # so this is not as good as it may seem...)
    challenge = map(lambda i: chr(random.randint(0, 255)), range(16))
    return string.join(challenge, "")

def getresponse(password, challenge):
    # calculate combined digest for password and challenge
    m = md5.new()
    m.update(password)
    m.update(challenge)
    return m.digest()

#
# server/client communication

# 1. client connects.  server issues challenge.

print "client:", "connect"

challenge = getchallenge()

print "server:", repr(challenge)

# 2. client combines password and challenge, and calculates
# the response

client_response = getresponse("trustno1", challenge)

print "client:", repr(client_response)

# 3. server does the same, and compares the result with the
# client response.  the result is a safe login in which the
# password is never sent across the communication channel.

server_response = getresponse("trustno1", challenge)

if server_response == client_response:
    print "server:", "login ok"
else:
    print "server:", "login failed"
```
```
client: connect
server: '\334\352\227Z#\272\273\212KG\330\265\032>\311o'
client: "l'\305\240-x\245\237\035\225A\254\233\337\225\001"
server: login ok
```
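The following is a Python 3 sketch of the same exchange. It keeps the MD5-over-password-and-challenge construction only to mirror the example above, but swaps in os.urandom for the challenge (addressing the random-number caveat) and hmac.compare_digest for the final comparison. Both substitutions are our assumptions, not part of the original example.

```python
import hashlib
import hmac
import os

def getchallenge():
    # os.urandom draws from the operating system's random source
    return os.urandom(16)

def getresponse(password, challenge):
    # same construction as md5-example-3.py: MD5 over password, then challenge
    m = hashlib.md5()
    m.update(password)
    m.update(challenge)
    return m.digest()

challenge = getchallenge()
client_response = getresponse(b"trustno1", challenge)
server_response = getresponse(b"trustno1", challenge)

# compare_digest avoids stopping at the first differing byte,
# which makes timing attacks on the comparison harder
if hmac.compare_digest(server_response, client_response):
    print("server: login ok")
else:
    print("server: login failed")
```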
A variation of this can be used to sign messages sent over a public network, so that their integrity can be verified at the receiving end.
```python
# File: md5-example-4.py

import md5
import array

class HMAC_MD5:
    # keyed MD5 message authentication

    def __init__(self, key):
        if len(key) > 64:
            key = md5.new(key).digest()
        ipad = array.array("B", [0x36] * 64)
        opad = array.array("B", [0x5C] * 64)
        for i in range(len(key)):
            ipad[i] = ipad[i] ^ ord(key[i])
            opad[i] = opad[i] ^ ord(key[i])
        self.ipad = md5.md5(ipad.tostring())
        self.opad = md5.md5(opad.tostring())

    def digest(self, data):
        ipad = self.ipad.copy()
        opad = self.opad.copy()
        ipad.update(data)
        opad.update(ipad.digest())
        return opad.digest()

#
# simulate server end

key = "this should be a well-kept secret"
message = open("samples/sample.txt").read()

signature = HMAC_MD5(key).digest(message)

# (send message and signature across a public network)

#
# simulate client end

key = "this should be a well-kept secret"

client_signature = HMAC_MD5(key).digest(message)

if client_signature == signature:
    print "this is the original message:"
    print
    print message
else:
    print "someone has modified the message!!!"
```
The copy method takes a snapshot of the internal object state. This allows you to precalculate partial digests (in this example, the digests of the two padded keys) once, and then reuse them for every message.
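Here is a small Python 3 sketch of what copy does, assuming hashlib (whose hash objects also provide copy). A digest object for a shared prefix is computed once; each copy then continues independently, and the original object is left unchanged.

```python
import hashlib

# Precompute the digest state for a common prefix once...
prefix = hashlib.md5(b"spam, spam, ")

# ...then copy it for each message that shares that prefix.
first = prefix.copy()
first.update(b"and eggs")
second = prefix.copy()
second.update(b"and spam")

# Each copy continues independently; the prefix object is unchanged.
print(first.digest() == hashlib.md5(b"spam, spam, and eggs").digest())   # True
print(second.digest() == hashlib.md5(b"spam, spam, and spam").digest())  # True
print(prefix.digest() == hashlib.md5(b"spam, spam, ").digest())          # True
```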
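For details on this algorithm, see HMAC-MD5: Keyed-MD5 for Message Authentication by Krawczyk et al. The HMAC construction was later published as RFC 2104, and Python 2.2 and later provide a ready-made implementation in the hmac module.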
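The same construction can be written compactly in Python 3 and checked against the standard library's hmac module. In this sketch, `hmac_md5` and `BLOCK_SIZE` are hypothetical names; zero-padding the key to 64 bytes is equivalent to what the array-based class above does, since XORing a pad byte with zero leaves it unchanged.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # MD5 processes input in 64-byte blocks

def hmac_md5(key, data):
    # keys longer than the block size are replaced by their digest
    if len(key) > BLOCK_SIZE:
        key = hashlib.md5(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\0")  # pad the key with zero bytes
    ipad = bytes(k ^ 0x36 for k in key)
    opad = bytes(k ^ 0x5C for k in key)
    inner = hashlib.md5(ipad + data).digest()
    return hashlib.md5(opad + inner).digest()

key = b"this should be a well-kept secret"
message = b"some message"
signature = hmac_md5(key, message)
print(signature == hmac.new(key, message, hashlib.md5).digest())  # True
```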
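Warning: Don’t forget that the pseudo-random number generator in the random module isn’t good enough for encryption purposes. For security-sensitive values such as challenges, use os.urandom (available since Python 2.4) instead. Be careful.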
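Here is a short Python 3 sketch of why this matters. The random module is deterministic: two generators given the same seed produce identical "random" challenges, so anyone who can guess or recover the seed can reproduce them. The secrets module used at the end (Python 3.6 and later) is our suggested alternative and is not mentioned in the original text; the seed value is arbitrary.

```python
import random
import secrets

# Two generators seeded with the same value produce the same bytes.
gen_a = random.Random(12345)
gen_b = random.Random(12345)
challenge_a = bytes(gen_a.randint(0, 255) for _ in range(16))
challenge_b = bytes(gen_b.randint(0, 255) for _ in range(16))
print(challenge_a == challenge_b)  # True

# The secrets module is intended for security-sensitive values.
token = secrets.token_bytes(16)
print(len(token))  # 16
```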