My program is too slow. How do I speed it up?
That’s a tough one, in general. There are many tricks to speed up Python code; consider rewriting parts in C as a last resort.
In some cases it’s possible to automatically translate Python to C or x86 assembly language, meaning that you don’t have to change your code much, if at all, to gain increased speed.
Pyrex can compile a slightly modified version of Python code into a C extension, and can be used on many different platforms.
Psyco is a just-in-time compiler that translates Python code into x86 machine code. If you can use it, Psyco can provide dramatic speedups for critical functions.
The rest of this answer will discuss various tricks for squeezing a bit more speed out of Python code. Never apply any optimization tricks unless you know you need them, after profiling has indicated that a particular function is the heavily executed hot spot in the code. Optimizations almost always make the code less clear, and you shouldn’t pay the costs of reduced clarity (increased development time, greater likelihood of bugs) unless the resulting performance benefit is worth it.
There is a page on the wiki devoted to performance tips.
Guido van Rossum has written up an anecdote related to optimization at
http://www.python.org/doc/essays/list2str.html (dead link).
One thing to notice is that function and (especially) method calls are rather expensive; if you have designed a purely OO interface with lots of tiny functions that don’t do much more than get or set an instance variable or call another method, you might consider using a more direct way such as directly accessing instance variables. Also see the standard module profile which makes it possible to find out where your program is spending most of its time (if you have some patience — the profiling itself can slow your program down by an order of magnitude).
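As a sketch of how to find the hot spots first, here is a minimal profiling run using the standard library’s cProfile and pstats modules (slow_square_sum is an invented example function, not from the original text):

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # A deliberately naive hot spot: a method call per iteration.
    result = []
    for i in range(n):
        result.append(i * i)
    return sum(result)

# Profile one call and capture the statistics as text.
buf = io.StringIO()
profiler = cProfile.Profile()
profiler.enable()
slow_square_sum(100000)
profiler.disable()

stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()

# The report names each function with its call count and cumulative time,
# which is what tells you where optimization effort is worth spending.
print("slow_square_sum" in report)
```

The functions at the top of the cumulative-time column are the only ones worth optimizing; everything else is noise.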
Remember that many standard optimization heuristics you may know from other programming experience may well apply to Python. For example it may be faster to send output to output devices using larger writes rather than smaller ones in order to reduce the overhead of kernel system calls. Thus CGI scripts that write all output in “one shot” may be faster than those that write lots of small pieces of output.
Also, be sure to use Python’s core features where appropriate. For example, slicing allows programs to chop up lists and other sequence objects in a single tick of the interpreter’s mainloop using highly optimized C implementations. Thus to get the same effect as:
L2 = []
for i in range(3):
    L2.append(L1[i])
it is much shorter and far faster to use
L2 = list(L1[:3]) # "list" is redundant if L1 is a list.
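A quick way to check this kind of claim on your own machine is the standard timeit module; the sketch below compares the two spellings (the list size and repeat count are arbitrary):

```python
import timeit

L1 = list(range(1000))

def copy_with_loop():
    # The explicit-loop version: one interpreter iteration per element.
    L2 = []
    for i in range(3):
        L2.append(L1[i])
    return L2

def copy_with_slice():
    # The slice version: one operation, implemented in C.
    return L1[:3]

# Both produce the same result.
assert copy_with_loop() == copy_with_slice() == [0, 1, 2]

# Exact numbers vary by machine; the slice is typically much faster.
loop_time = timeit.timeit(copy_with_loop, number=10000)
slice_time = timeit.timeit(copy_with_slice, number=10000)
print(loop_time, slice_time)
```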
Note that the functionally-oriented builtins such as map, zip, and friends can be a convenient accelerator for loops that perform a single task. For example to pair the elements of two lists together:
>>> zip([1,2,3], [4,5,6])
[(1, 4), (2, 5), (3, 6)]
or to compute a number of sines:
>>> map(math.sin, (1,2,3,4))
[0.841470984808, 0.909297426826, 0.14112000806, -0.756802495308]
The operation completes very quickly in such cases.
Other examples include the join and split methods of string objects. For example, if s1..s7 are large (10K+) strings then "".join([s1,s2,s3,s4,s5,s6,s7]) may be a lot faster than the more direct s1+s2+s3+s4+s5+s6+s7, since the “summation” will compute many intermediate subexpressions, whereas join does all the copying in one pass. For manipulating strings, use the replace method on string objects. Use regular expressions only when you’re not dealing with constant string patterns. Consider using the string formatting operations string % tuple and string % dictionary.
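A small self-contained demonstration of these string tricks (the strings here are tiny and illustrative; the performance gap grows with size):

```python
# join builds the result in one pass; repeated + builds intermediates.
parts = ["spam"] * 7
joined = "".join(parts)
concatenated = parts[0] + parts[1] + parts[2] + parts[3] + parts[4] + parts[5] + parts[6]
assert joined == concatenated

# str.replace handles constant patterns without the regex machinery.
assert "spam and eggs".replace("spam", "ham") == "ham and eggs"

# String formatting: the tuple and dictionary forms.
assert "%s: %d" % ("count", 3) == "count: 3"
assert "%(name)s: %(n)d" % {"name": "count", "n": 3} == "count: 3"
```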
Be sure to use the sort builtin method or the sorted function to do sorting, and see the sorting mini-HOWTO for examples of moderately advanced usage. list.sort beats other techniques for sorting in all but the most extreme circumstances.
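One sketch of the basic usage, including the key parameter that the sorting mini-HOWTO covers in more depth (the word list is invented):

```python
words = ["banana", "apple", "Cherry"]

# sorted returns a new list; list.sort sorts in place and returns None.
by_default = sorted(words)
by_lowercase = sorted(words, key=str.lower)

print(by_default)    # ['Cherry', 'apple', 'banana'] -- uppercase sorts first
print(by_lowercase)  # ['apple', 'banana', 'Cherry']
```

The key function is called once per element, so it stays cheap even for expensive comparisons.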
Another common trick is to “push loops into functions or methods.” For example, suppose you have a program that runs slowly and you use the profiler to determine that a Python function ff() is being called lots of times. If you notice that ff():
def ff(x):
    ...do something with x computing result...
    return result
tends to be called in loops like:
list = map(ff, oldlist)
for x in sequence:
    value = ff(x)
    ...do something with value...
then you can often eliminate function call overhead by rewriting ff() to:
def ffseq(seq):
    resultseq = []
    for x in seq:
        ...do something with x computing result...
        resultseq.append(result)
    return resultseq
and rewrite the two examples to list = ffseq(oldlist) and to:
for value in ffseq(sequence): ...do something with value...
Single calls to ff(x) translate to ffseq([x]) with little penalty. Of course this technique is not always appropriate and there are other variants which you can figure out.
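A concrete instance of the pattern, where the per-item work is simply squaring (any real computation would take its place):

```python
def ff(x):
    # The original one-item-at-a-time function.
    return x * x

def ffseq(seq):
    # The loop pushed inside: one call instead of one call per element.
    resultseq = []
    for x in seq:
        resultseq.append(x * x)
    return resultseq

oldlist = [1, 2, 3, 4]
# The batched version matches calling ff once per element...
assert ffseq(oldlist) == [ff(x) for x in oldlist] == [1, 4, 9, 16]
# ...and a single call translates to a one-element list.
assert ffseq([5]) == [ff(5)] == [25]
```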
You can gain some performance by explicitly storing the results of a function or method lookup into a local variable. A loop like:
for key in token:
    D[key] = D.get(key, 0) + 1
resolves D.get in every iteration. If the method isn’t going to change, a slightly faster implementation is:
get = D.get  # look up the method once
for key in token:
    D[key] = get(key, 0) + 1
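A self-contained version of this counting loop (the sample tokens are invented for illustration):

```python
tokens = ["a", "b", "a", "c", "a", "b"]

D = {}
get = D.get  # bind the method once, outside the loop
for key in tokens:
    D[key] = get(key, 0) + 1

print(D)  # {'a': 3, 'b': 2, 'c': 1}
```

Because get is a bound method, it keeps referring to D even though it is called through a plain local name.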
Default arguments can be used to determine values once, at function creation time instead of at run time. This can only be done for functions or objects which will not be changed during program execution, such as replacing
def degree_sin(deg):
    return math.sin(deg * math.pi / 180.0)
with:

def degree_sin(deg, factor=math.pi/180.0, sin=math.sin):
    return sin(deg * factor)
Because this trick uses default arguments for terms which should not be changed, it should only be used when you are not concerned with presenting a possibly confusing API to your users.
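A runnable version of the example above, with a sanity check on a couple of well-known angles:

```python
import math

def degree_sin(deg, factor=math.pi / 180.0, sin=math.sin):
    # factor and sin are evaluated once, when the def statement runs,
    # so each call skips the global and attribute lookups.
    return sin(deg * factor)

# Behaves like math.sin on degrees (to within floating-point error).
assert abs(degree_sin(90) - 1.0) < 1e-12
assert abs(degree_sin(30) - 0.5) < 1e-12
```

The cost of the trick is visible in help(degree_sin): callers see two extra parameters they were never meant to pass.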