MongoDB for Developers (python flavour) – course review

A while ago I decided to dive into the NoSQL world. This decision was driven partly by curiosity about whether it could be treated as an alternative to Elasticsearch, which I have seen used (along with Kibana) to implement rich analytics on mixed (text + numerical) data. I also wanted to understand when I would want… Continue reading

How I’ve managed to shoot myself in the foot with numpy and cPickle

Currently, I use Python mostly for data analysis and modeling. Whenever I can, I take a pipeline-like approach, where data is processed in multiple steps. These steps are implemented in separate .py files, with cPickle used for data persistence and exchange. You can think of this as a poor man’s MapReduce. Usually the development is… Continue reading

Multiprocessing and exceptions – some batteries not included

Today I’m going to write about a not-so-minor inconvenience one faces when using the built-in multiprocessing module: how child process exceptions are presented to the user. I will also show you how to improve it, so that when something goes wrong you don’t have to guess where the problem is. Standalone multiprocessing Through this… Continue reading

Pickle performance bottlenecks when using multiprocessing

A while ago I wrote a parameter scan (regularization in logistic regression, to be specific) that was taking a bit too long to execute. Since the machine on which it was executed was essentially other-user-free (and had some 20 cores lying unused 🙂 ) I decided to go multiprocessing. Usually I pick on such occasions where boilerplate… Continue reading

Being defensive with pickle in an evolving environment

Pickle is Python’s built-in object persistence solution. Although it is very useful, care must be taken when using it with class definitions that may change, i.e. those under active development. Consider the following example. Both printouts will show you var1 and var2 instance variables and no var3, despite the fact that the class logic changed in the meantime.… Continue reading
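
The example mentioned in this excerpt is not reproduced in the listing. What follows is only a minimal sketch of the behaviour it describes, not the original code: the class name `Model`, the variable `payload` and the attribute values are made up for illustration, and it uses the Python 3 `pickle` module rather than cPickle. An instance is pickled, the class is then redefined so that `__init__` also sets `var3`, and the old instance is loaded back.

```python
import pickle


class Model:
    """First version of a class that is still under development (hypothetical)."""

    def __init__(self):
        self.var1 = 1
        self.var2 = 2


# Pickle an instance created with the first version of the class.
payload = pickle.dumps(Model())


class Model:  # the class is later redefined with an extra attribute
    """Second version: __init__ now also sets var3."""

    def __init__(self):
        self.var1 = 1
        self.var2 = 2
        self.var3 = 3


# Unpickling looks the class up by name and restores the saved __dict__;
# it does not call __init__, so var3 is never set on the restored object.
restored = pickle.loads(payload)
print(sorted(vars(restored)))
print(hasattr(restored, "var3"))
```

The restored object carries only the attributes that were saved with it, so any code that expects `var3` has to cope with it being absent, for instance by checking with `hasattr` or reading it through `getattr` with a default.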