Today I’m going to write about a not-so-minor inconvenience one faces when using the built-in multiprocessing module: how exceptions raised in child processes are presented to the user. I will also show you how to improve it, so that when something goes wrong you don’t have to guess where the problem is.

Standalone multiprocessing

Throughout this story we will stick to the very simple calculation shown below. Our computation code lives in the ‘go’ function and we want to apply it to a range of parameters, so we decided to use the facilities provided by the multiprocessing module. Unfortunately, during a long and tiring coding sprint, a bug crept into our code:

from multiprocessing import Pool

def go(x):
    ret = 0.
    for i in xrange(x+1):
        ret += 1./(5-i)
    return ret

def main():
    pool = Pool(processes=4)  
    print pool.map(go, range(10))

if __name__ == "__main__":
    main()

The output we get after running the code above is far from being over-verbose:

Traceback (most recent call last):
  File "go_1.py", line 14, in <module>
    main()
  File "go_1.py", line 11, in main
    print pool.map(go, range(10))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
ZeroDivisionError: float division by zero

From such a traceback we can find out the type of the exception and the target function of the Pool.map call. In the case of our ‘go’ function, guessing where the problem lies is fairly simple: the target function is short, and there is only one place this exception can come from. In real life the target function will usually be far more complicated and may call functions from external modules, so a traceback like the one above doesn’t help at all. Was it our code that threw the exception? numpy? scikit-learn? Happy guessing: not knowing which line of our code caused it makes our life miserable. At this point we have two options: launch a proper Python debugger, or try to obtain the traceback as it would be presented if the code were run without multiprocessing.

Since a traceback is often enough to understand the problem, this time we will let the debugger rest and try to obtain a more informative printout.
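For comparison, here is a minimal sketch of what “the non-multiprocessing way” gives us. It is written in Python 3 syntax (unlike the Python 2 listings in this post), and the helper names run_serially and failing_frames are made up for this illustration. When the same function is called in a plain loop, the traceback reaches all the way down into ‘go’:

```python
import traceback


def go(x):
    ret = 0.
    for i in range(x + 1):
        ret += 1. / (5 - i)   # fails when i == 5
    return ret


def run_serially(func, params):
    """Hypothetical helper: apply func to each parameter in this process."""
    results = []
    for p in params:
        results.append(func(p))
    return results


def failing_frames(func, params):
    """Return the function names on the traceback, outermost first."""
    try:
        run_serially(func, params)
    except Exception as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


# The innermost frame is the function that actually raised.
print(failing_frames(go, range(10)))
```

Running serially is handy for a quick check, but it gives up the parallelism, which is why the rest of the post looks for a way to keep the Pool.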

The traceback module

To improve our situation we will use the traceback module to, ehm… obtain a traceback. To make the solution reusable we will put it into a decorator:

import functools
import traceback
import sys

def get_traceback(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
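        except Exception:
            # Build a clearly delimited report around the child's traceback.
            ret = '#' * 60
            ret += "\nException caught:"
            ret += "\n" + '-' * 60
            ret += "\n" + traceback.format_exc()
            ret += "\n" + '-' * 60
            ret += "\n" + "#" * 60
            # Redirect the print statement to stderr.
            print >>sys.stderr, ret
            sys.stderr.flush()
            # A bare raise re-raises the original exception for the parent.
            raise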

    return wrapper

The code above simply prints the traceback when something goes wrong, i.e. when an exception is thrown and not handled inside the wrapped function. After applying it to our function:

@get_traceback
def go(x):
   (...)

The error message starts to be meaningful:

Exception caught:
------------------------------------------------------------
Traceback (most recent call last):
  File "./go_2.py", line 10, in wrapper
    return f(*args, **kwargs)
  File "./go_2.py", line 28, in go
    ret += 1./(5-i)
ZeroDivisionError: float division by zero

(This is actually repeated a couple of times, since our exception is thrown in more than one process.) In the output above you can see exactly which line the problem comes from.
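The listings in this post use Python 2 syntax. As a rough sketch, assuming Python 3, the same decorator could look like the following; the behaviour is meant to be the same (print the traceback to stderr, then re-raise), only the print call and the except clause change:

```python
import functools
import sys
import traceback


def get_traceback(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception:
            banner = "#" * 60
            rule = "-" * 60
            report = "\n".join([
                banner,
                "Exception caught:",
                rule,
                traceback.format_exc(),   # traceback as seen inside this process
                rule,
                banner,
            ])
            print(report, file=sys.stderr)
            sys.stderr.flush()
            raise   # let the caller (e.g. the pool) still see the exception
    return wrapper


@get_traceback
def go(x):
    ret = 0.
    for i in range(x + 1):
        ret += 1. / (5 - i)
    return ret
```

The decorated ‘go’ can be passed to Pool.map exactly as before; the report is printed by whichever process runs into the exception.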

It is worth noting that using the functools.wraps helper decorator is crucial in our case: without it the __name__ attribute of the decorated function gets lost (i.e. set to ‘wrapper’), which then makes the pickle module fail. The latter is used by the multiprocessing module to serialize the function executed inside child processes. You can verify this by getting rid of functools and then setting the __name__ of the resulting decorated function manually.
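The effect on __name__ is easy to see in isolation. The sketch below uses Python 3 syntax, and the decorator and function names (without_wraps, with_wraps, go_plain, go_wrapped) are invented for this illustration. It only shows the attribute; pickle stores a function as a reference to its name, which is why the name matters:

```python
import functools


def without_wraps(f):
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


def with_wraps(f):
    @functools.wraps(f)   # copies __name__, __doc__, etc. from f
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


@without_wraps
def go_plain(x):
    return x + 1


@with_wraps
def go_wrapped(x):
    return x + 1


print(go_plain.__name__)     # wrapper
print(go_wrapped.__name__)   # go_wrapped
```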

So at this point we are able to get a proper traceback, which could be enough. But there is also a different possibility I would like to explore.

The fun way

Some time ago I discovered a little gem: the joblib package. To get it, run ‘pip install joblib’ inside your virtualenv. Among other things, it offers an alternative to the multiprocessing module for parallel computations similar to ours. With joblib, we can rewrite our code in the following way:

from joblib import Parallel, delayed

def go(x):
    ret = 0.
    for i in xrange(x+1):
        ret += 1./(5-i)
    return ret

def main():
    print Parallel(n_jobs=4)(delayed(go)(i) for i in range(10))

if __name__ == "__main__":
    main()

The (partial) output we get from running it is the following:

/home/tfruboes/2017.02.threadedGIL/go_3.py in go(x=5)
      1 from joblib import Parallel, delayed
      2 
      3 def go(x):
      4     ret = 0.
      5     for i in xrange(x+1):
      6         ret += 1./(5-i)
        ret = 2.283333333333333
        i = 5
      7     return ret
      8 
      9 def main():
     10     print Parallel(n_jobs=4)(delayed(go)(i) for i in range(10))

ZeroDivisionError: float division by zero

As you can see, we got a code listing with the line causing the exception marked. Below that line you can also see the local variables at the point the exception was thrown. You may also notice that the arguments with which the ‘go’ function was called are printed as well. That is tons of useful information that in many cases will allow us to understand the problem immediately. Neat!
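If joblib is not an option, the standard library can produce something in a similar spirit. The following is a sketch only, assuming Python 3.5 or newer, where traceback.TracebackException accepts capture_locals=True; the helper name format_with_locals is made up for this example, and the output is formatted differently from joblib’s listing:

```python
import traceback


def go(x):
    ret = 0.
    for i in range(x + 1):
        ret += 1. / (5 - i)
    return ret


def format_with_locals(func, *args):
    """Call func(*args); return a traceback string including each frame's
    local variables, or None if the call succeeds."""
    try:
        func(*args)
    except Exception as exc:
        details = traceback.TracebackException.from_exception(
            exc, capture_locals=True)
        return "".join(details.format())
    return None


report = format_with_locals(go, 5)
print(report)   # each frame is followed by its locals, e.g. x, ret and i for go
```

Such a helper could be called from inside a decorator like get_traceback, so the locals are collected in the child process where the exception happened.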

Wrap up

We have seen that under normal conditions the multiprocessing module won’t give us the usual amount of information about an exception thrown inside a child process. This is slightly surprising, as one could expect (following the “batteries included” philosophy) that it would be reported in exactly the same way as when multiprocessing is not used. To get this information you should use the traceback module. Or, in some cases, go for joblib. Note that it offers far more than nice printouts in case of problems.


