Today I’m going to write about a not-so-minor inconvenience you face when using the built-in multiprocessing module: how child-process exceptions are presented to the user. I will also show how to improve the situation, so that when something goes wrong you don’t have to guess where the problem is.
Throughout this story we will stick to a very simple calculation, shown below. Our computation code is contained in the ‘go’ function and we want to apply it to a range of parameters, so we decided to make use of the facilities provided by the multiprocessing module. Unfortunately, during a long and tiring coding sprint, a bug crept into our code:
```python
from multiprocessing import Pool

def go(x):
    ret = 0.
    for i in xrange(x+1):
        ret += 1./(5-i)
    return ret

def main():
    pool = Pool(processes=4)
    print pool.map(go, range(10))

if __name__ == "__main__":
    main()
```
The output we get after running the code above is far from verbose:
```
Traceback (most recent call last):
  File "go_1.py", line 14, in <module>
    main()
  File "go_1.py", line 11, in main
    print pool.map(go, range(10))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
ZeroDivisionError: float division by zero
```
From such a traceback we can find out the type of the exception and the target function of the Pool.map call. In the case of our ‘go’ function, guessing where the problem is turns out to be fairly simple: the target function is short, with only a single place this exception may be coming from. In real life the target function will usually be far more complicated and may call other functions from external modules, so seeing a traceback similar to the one above doesn’t help at all. Is it our code that threw the exception? numpy? scikit-learn? Happy guessing – the lack of information about which line of our code caused it makes our life miserable. At this point we have two possibilities: launch a proper Python debugger, or try to obtain the traceback as it would be presented to us if the code were run in the non-multiprocessing way.
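For comparison, here is what the "non-multiprocessing way" buys us: running the same parameter range sequentially produces a traceback that names the failing function and line directly. A minimal sketch (written in Python 3 syntax for brevity):

```python
import traceback

def go(x):
    ret = 0.
    for i in range(x + 1):
        ret += 1. / (5 - i)
    return ret

# Run the same parameter range sequentially; at x=5 the inner loop
# evaluates 1./(5-5) and raises ZeroDivisionError.
tb = None
try:
    for x in range(10):
        go(x)
except ZeroDivisionError:
    # The formatted traceback pinpoints the frame inside go itself.
    tb = traceback.format_exc()

print(tb)
```

The frame for `go` shows up in this traceback, which is exactly the piece of information `Pool.map` throws away.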
Since a traceback is often enough to understand what the problem is, this time we will leave the debugger at rest and try to obtain a more informative printout.
The traceback module
In order to improve our situation we will use the traceback module to, ehm… obtain a traceback. To make our solution reusable, we will put it into a decorator:
```python
import functools
import traceback
import sys

def get_traceback(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception, ex:
            ret = '#' * 60
            ret += "\nException caught:"
            ret += "\n" + '-' * 60
            ret += "\n" + traceback.format_exc()
            ret += "\n" + '-' * 60
            ret += "\n" + '#' * 60
            print >> sys.stderr, ret
            sys.stderr.flush()
            raise ex
    return wrapper
```
The code above simply prints the traceback in case of problems, i.e. when an exception is thrown and not handled inside the wrapped function. We apply it to our function:
```python
@get_traceback
def go(x):
    (...)
```
And the error message starts to be meaningful:
```
Exception caught:
------------------------------------------------------------
Traceback (most recent call last):
  File "./go_2.py", line 10, in wrapper
    return f(*args, **kwargs)
  File "./go_2.py", line 28, in go
    ret += 1./(5-i)
ZeroDivisionError: float division by zero
```
(This is actually repeated a couple of times, since our exception is thrown inside more than one process.) In the output above you can see exactly which line the problem is coming from.
It is worth noting that the functools.wraps helper decorator is crucial in our case – without it, the __name__ attribute of the decorated function gets lost (i.e. set to ‘wrapper’), which then makes the pickle module fail. The latter is used by the multiprocessing module to serialize the function executed inside child processes. You can verify this by getting rid of functools.wraps and then setting the __name__ of the resulting decorated function manually.
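To see the difference functools.wraps makes, compare a plain wrapper with a wrapped one. A small sketch (Python 3 syntax; the decorator and function names are made up for illustration):

```python
import functools
import pickle

def plain_decorator(f):
    # No functools.wraps: the wrapper keeps its own identity.
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

def wraps_decorator(f):
    # functools.wraps copies __name__ (and other metadata) from f.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

@plain_decorator
def f_plain(x):
    return x + 1

@wraps_decorator
def f_wrapped(x):
    return x + 1

print(f_plain.__name__)    # the original name is lost: 'wrapper'
print(f_wrapped.__name__)  # the name survives: 'f_wrapped'

# pickle serializes functions by name, so the anonymous inner wrapper
# cannot be pickled -- which is exactly what breaks multiprocessing:
try:
    pickle.dumps(f_plain)
    plain_picklable = True
except Exception:
    plain_picklable = False
print(plain_picklable)
```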
So at this point we are able to get a proper traceback, which could be enough. But there is also a different possibility I would like to explore.
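A variation on the decorator is worth mentioning before we move on: instead of only printing to stderr (which may be interleaved or lost among child processes), we can re-raise the exception with the child's formatted traceback embedded in its message, so the parent process carries the full story. A sketch (Python 3 syntax; the decorator name `embed_traceback` is made up):

```python
import functools
import traceback

def embed_traceback(f):
    # Hypothetical variant of get_traceback: instead of printing,
    # re-raise the same exception type with the child's formatted
    # traceback appended to the message.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception as ex:
            raise type(ex)(
                "%s\n\nOriginal traceback:\n%s" % (ex, traceback.format_exc())
            )
    return wrapper

@embed_traceback
def go(x):
    return 1. / (5 - x)

# Demonstrated in-process for brevity; with Pool.map the parent's
# "raise self._value" would now carry the embedded traceback text.
message = None
try:
    go(5)
except ZeroDivisionError as ex:
    message = str(ex)
print(message)
```

One caveat of this approach: `type(ex)(...)` assumes the exception type accepts a single string argument, which holds for the common built-in exceptions but not for every custom one.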
The fun way
A while ago I discovered a little gem – the joblib package. To get it, run ‘pip install joblib’ inside your virtualenv. Among other things, it offers an alternative to the multiprocessing module for parallel computations similar to ours. With joblib, we can rewrite our code in the following way:
```python
from joblib import Parallel, delayed

def go(x):
    ret = 0.
    for i in xrange(x+1):
        ret += 1./(5-i)
    return ret

def main():
    print Parallel(n_jobs=4)(delayed(go)(i) for i in range(10))

if __name__ == "__main__":
    main()
```
The (partial) output we get from running it is the following:
```
/home/tfruboes/2017.02.threadedGIL/go_3.py in go(x=5)
      1 from joblib import Parallel, delayed
      2
      3 def go(x):
      4     ret = 0.
      5     for i in xrange(x+1):
----> 6         ret += 1./(5-i)
        ret = 2.283333333333333
        i = 5
      7     return ret
      8
      9 def main():
     10     print Parallel(n_jobs=4)(delayed(go)(i) for i in range(10))

ZeroDivisionError: float division by zero
```
As you can see, we got a code listing with the line causing the exception marked. Below that line you can also see the local variables at the point the exception was thrown. You may also notice that the arguments with which the ‘go’ function was called are printed too. That is a ton of useful information which in many cases will allow us to immediately understand the problem. Neat!
We have seen that under normal conditions the multiprocessing module won’t give us the usual amount of information about an exception being thrown inside a child process. This is slightly surprising, as one could expect (following the “batteries included” philosophy) that this would be done in exactly the same way as when no multiprocessing is used. To get this information you can use the traceback module. Or, in some cases, go for joblib. Note that it offers far more than nice printouts in case of problems.