Good news everyone!
This blog post by Dan Crosta is interesting. It talks about how it is possible to optimise Python code for operations that get called many times by avoiding object orientation and using closures instead.
While closures get the highlight, the main idea is a little more general: avoid repeating work that is not necessary for the operation.
Compare the first proposed code, written in an OOP style,
    class PageCategoryFilter(object):
        def __init__(self, config):
            self.mode = config["mode"]
            self.categories = config["categories"]

        def filter(self, bid_request):
            if self.mode == "whitelist":
                return bool(
                    bid_request["categories"] & self.categories
                )
            else:
                return bool(
                    self.categories and not
                    bid_request["categories"] & self.categories
                )
and the last one
    def make_page_category_filter(config):
        categories = config["categories"]
        mode = config["mode"]

        def page_category_filter(bid_request):
            if mode == "whitelist":
                return bool(bid_request["categories"] & categories)
            else:
                return bool(
                    categories and not
                    bid_request["categories"] & categories
                )

        return page_category_filter
The main difference is that neither the config dictionary nor the instance attributes (which are also stored in a dictionary) are accessed on each call. We create a direct reference to the values (categories and mode) instead of making the Python interpreter look them up on self over and over.
This generates a significant increase in performance, as described in the post (around 20%).
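To get a feel for the lookup cost in isolation, here is a minimal sketch. The Holder class, the make_getter function and the compare helper are hypothetical names made up for this illustration, not part of the original post, and the numbers will vary with the interpreter and the machine.

```python
import timeit


class Holder(object):
    """Hypothetical class: reads the value through self on every call."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value  # attribute lookup on the instance


def make_getter(value):
    """Hypothetical closure factory: the value is captured once."""

    def get():
        return value  # closure variable, no attribute lookup

    return get


def compare(number=1000000):
    """Time both variants and return the total seconds for each."""
    method = Holder(42).get
    closure = make_getter(42)
    return {
        "method": timeit.timeit(method, number=number),
        "closure": timeit.timeit(closure, number=number),
    }


if __name__ == "__main__":
    for name, seconds in sorted(compare().items()):
        print("%-8s %.3f sec" % (name, seconds))
```

Both variants return the same value; only the way the value is reached differs, which is the part the timing tries to isolate.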
But why stop there? There is another clear win in terms of access, assuming that the filter doesn’t change. This is the “mode”, which we are comparing against whitelist or blacklist on each call. We can create a different closure depending on the mode value.
    def make_page_category_filter2(config):
        categories = config["categories"]
        if config['mode'] == "whitelist":
            def whitelist_filter(bid_request):
                return bool(bid_request["categories"] & categories)
            return whitelist_filter
        else:
            def blacklist_filter(bid_request):
                return bool(
                    categories and not
                    bid_request["categories"] & categories
                )
            return blacklist_filter
There are a couple of other details. The first one is to transform the config categories into a frozenset. Assuming that the config doesn’t change, a frozenset is more efficient than a regular mutable set. This is hinted at in the post, but maybe it didn’t make the final review (or was left out to simplify it).
Also, we are calculating the intersection of two sets (operator &) only to reduce it to a bool. There is a set method that gets the result without calculating the whole intersection (isdisjoint).
The same basic principle applies to calculating the bool of the categories for the blacklist filter. We can calculate it only once, as it’s there to short-circuit the result in case of an empty config category.
    def make_page_category_filter2(config):
        categories = frozenset(config["categories"])
        bool_cat = bool(categories)
        if config['mode'] == "whitelist":
            def whitelist_filter(bid_request):
                return not categories.isdisjoint(bid_request["categories"])
            return whitelist_filter
        else:
            def blacklist_filter(bid_request):
                return (bool_cat and
                        categories.isdisjoint(bid_request["categories"]))
            return blacklist_filter
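As a sanity check that swapping the & operator for isdisjoint keeps the same results, here is a small sketch. The old_style and new_style helpers and the sample categories are hypothetical, written only for this comparison.

```python
def old_style(categories, requested):
    """Whitelist and blacklist results using the & operator."""
    whitelist = bool(requested & categories)
    blacklist = bool(categories and not requested & categories)
    return whitelist, blacklist


def new_style(categories, requested):
    """Same results using isdisjoint and a precomputed bool."""
    bool_cat = bool(categories)
    whitelist = not categories.isdisjoint(requested)
    blacklist = bool_cat and categories.isdisjoint(requested)
    return whitelist, blacklist


# Hypothetical sample data covering the interesting cases.
CASES = [
    (frozenset(["sports", "news"]), {"news", "weather"}),  # overlap
    (frozenset(["sports", "news"]), {"weather"}),          # no overlap
    (frozenset(["sports", "news"]), set()),                # empty request
    (frozenset(), {"news"}),                               # empty config
]

for categories, requested in CASES:
    assert old_style(categories, requested) == new_style(categories, requested)
```

The empty config case is the one the precomputed bool is there for: the blacklist filter returns False without looking at the request at all.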
Even if all of this falls under the definition of micro-optimisations (which should be used with care, and only after a hot spot has been found), it actually makes a significant difference, reducing the time by around 35% from the closure implementation and ~50% from the initial reference implementation.
All these elements are totally applicable to the OOP implementation, by the way. Python is quite flexible about assigning methods. No closures!
    class PageCategoryFilter2(object):
        ''' Keep the interface of the object '''
        def __init__(self, config):
            self.mode = config["mode"]
            self.categories = frozenset(config["categories"])
            self.bool_cat = bool(self.categories)
            if self.mode == "whitelist":
                self.filter = self.filter_whitelist
            else:
                self.filter = self.filter_blacklist

        def filter_whitelist(self, bid_request):
            return not bid_request["categories"].isdisjoint(self.categories)

        def filter_blacklist(self, bid_request):
            return (self.bool_cat and
                    bid_request["categories"].isdisjoint(self.categories))
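The trick of assigning a method to an instance attribute can be seen in isolation in this minimal sketch. The Greeter class is a hypothetical example invented for this purpose; the point is that the choice is made once, in __init__, and callers keep using a single name.

```python
class Greeter(object):
    """Hypothetical class that picks its implementation once."""

    def __init__(self, formal):
        # The bound method is stored on the instance under one public name,
        # so no mode check is needed on each call.
        if formal:
            self.greet = self.greet_formal
        else:
            self.greet = self.greet_casual

    def greet_formal(self, name):
        return "Good morning, %s." % name

    def greet_casual(self, name):
        return "Hi %s!" % name


formal = Greeter(formal=True)
casual = Greeter(formal=False)
print(formal.greet("Ada"))
print(casual.greet("Ada"))
```

One thing to keep in mind with this approach is that greet lives on each instance rather than on the class, so a subclass that defines its own greet method would be shadowed by the assignment in __init__.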
Here is the updated code, adding these implementations to the test.
The results on my desktop (2011 iMac 2.7GHz i5) are
             total time (sec)   time per iteration
    class    9.59787607193      6.39858404795e-07
    func     8.38110518456      5.58740345637e-07
    closure  7.96493911743      5.30995941162e-07
    class2   6.00997519493      4.00665012995e-07
    closur2  5.09431600571      3.39621067047e-07
The new class performs better than the initial closure! The optimised closure still comes out on top, saving a big chunk compared with the slowest implementation. The PyPy results are all very close, and it speeds up the code 10x, which is an amazing feat.
Of course, a word of caution. The configuration is assumed not to change for a filter, which I think is reasonable.
More great reads!
I watched the Apple keynote at WWDC live yesterday. I am far from an Apple developer, but I use OS X and iOS every day, and I’m interested in new stuff. There was a full section devoted to developers, which is great (well, it’s supposed to be a developers’ conference, after all), and, arguably, the most interesting stuff in that part (from a developer’s perspective) was the release of a new programming language, Swift.
It was announced with an (irrelevant) comparison with Python in terms of speed (I actually have plans to write a post about “why Python is not really slow”, but I digress), as well as a lot of other details that (IMO) are completely pointless in terms of what makes a good or bad programming language.
I am generally skeptical about the announcement of new languages. Almost as much as new web frameworks. Sure, it adds a new flavour, but I’m not that sure about real advancement in tech. Creating a new language, complete with proper “clean and beautiful” syntax, is not really that difficult. The difficult part is to create a vibrant community behind it, one that loves the language and works to expand it, to push the boundaries of current tech, to make amazing applications and tools, to convince other developers to use it and to carry the torch. The target audience of a language is developers. “End customers” couldn’t care less about how the guts of their products are done. “Ruby sharp? Whatever, I just need it to help us increase our sales”
Interestingly enough, languages get a lot of character from their communities, as they embed their values in the relevant modules and tools. A great example of that is “The Zen Of Python”. There’s nothing there about whitespace, list comprehensions or classes, but it reflects a lot of the ideas that are common in the Python world, values of the Python Community. Using a language is not just writing code, but also interacting with other developers, directly or even just by reading the documents and using the APIs.
Obviously, Apple is in a very special situation, as it can force developers to use whatever it likes for its platform. Hey, they managed to create an Objective-C ecosystem out of nowhere, which is impressive. For what it’s worth, they can even tailor a language for their platform, and not worry about anything else. iOS is a platform big enough for devs to have to learn the language and official IDE and use it. And I am pretty sure that in this case it will be an improvement over the previous environment.
But the one part that I am most skeptical about is the “visual programming” stuff. One of the “wow” announcements was the possibility of creating “playgrounds”, to interactively show the results of the code. That means that, for example, a loaded image will be available, or that a graph can be displayed showing the results of a function. And that’s the part that I’m not really sure is interesting or relevant at all.
Does it look cool? Absolutely. Might it be interesting once in a while? Sure. But I think that’s the kind of process that, in day-to-day operation, is not really that useful in most kinds of programming.
Programming, more than anything else, is creating a mental image of code. Code can be a very complex thing. Especially in a big application. But normally we don’t need to keep the whole code in our mind. We only have to keep certain parts of it, allowing us to focus on one problem at a time. That’s the main principle behind modules, classes and other abstractions. I can use OS calls to open a file, to draw some pixels on the screen, or to make a call to a remote server. All of that without having to worry about file systems, graphic drivers or network protocols. And I can also use higher level modules to search in files, create 3D models or make HTTPS calls.
And the amazing power of programming is that you are coding on the shoulders of giants. And on the shoulders of regular people. And on the shoulders of your co-workers. And on your own shoulders. That’s a lot of shoulders combined.
But a lot of that process deals with the unavoidable complexity of the interaction. And being able to move from an abstracted view to a more specific one, to look inside and outside the black box, is crucial. It may not be evident, but the mental process of programming deals a lot with that sudden change in perspective. This is one of the reasons why being multiparadigm is a useful thing. Because you can move between different abstractions and levels, using the proper one in each case (especially for leaky ones).
And there are lots of those processes that are not easily represented with graphs or images. They are constructs in your mind: loops, flexible structures, intuitions about the weak points of an algorithm, variables changing values, corner cases… Showing all intermediate results may be detrimental to that quick change in perspective. Too much information.
There have been experiments with visual programming, trying to represent code as visual blocks in one way or another, for a long time (at least 25 years). They are useful in certain areas, but they are far from a general solution. There are also interactive notebooks that allow easy display of graphs and help with the interactivity. IPython Notebook is an excellent example (and a very similar idea to the playground). But, again, I feel that those are specialised tools, not something that is that useful in most programming contexts.
I’m just skeptical. All of this doesn’t necessarily mean that Swift is bad, or that those tools are wrong. Maybe the new Xcode will have a lot of amazing tools that will help create fantastic applications (I still don’t like IDEs, though). There are already people checking the docs and trying out the new language. But I think that it has to show how good or bad it is by itself, and through the developers that decide to use it. So far, it is just an announcement. I just feel that most of what was said in the keynote was not relevant to determine whether it’s a good working environment or not, but was just a gimmick. Yes, obviously these kinds of announcements are publicity stunts, but in this particular case it looks especially so.
Looks cool, but is not particularly relevant to how the mental process of programming works or what makes a language good.