Compendium of Wondrous Links vol IX

wondrous_links

Welcome back to this totally non-regular compilation of interesting reads. Enjoy!

1427381663-20150326

 

 

Do you want to see the whole series?

ffind v0.8 released

Good news everyone!

The new version of find (0.8) is available in GitHub and PyPi. This version includes performance improvements, man page and fuzzy search support.

Enjoy!

Optimise Python with closures

This blog post by Dan Crosta is interesting. It talks about how is possible to optimise Python code for operations that get called multiple times avoiding the usage of Object Orientation and using Closures instead.

While the “closures” gets the highlight, the main idea is a little more general. Avoid repeating code that is not necessary for the operation.

The difference between the first proposed code, in OOP way

class PageCategoryFilter(object):
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            return bool(
                bid_request["categories"] & self.categories
            )
        else:
            return bool(
                self.categories and not
                bid_request["categories"] & self.categories
            )

and the last one

def make_page_category_filter(config):
    categories = config["categories"]
    mode = config["mode"]
    def page_category_filter(bid_request):
        if mode == "whitelist":
            return bool(bid_request["categories"] & categories)
        else:
            return bool(
                categories and not
                bid_request["categories"] & categories
            )
    return page_category_filter

The main differences are that both the config dictionary and the methods (which are also implemented as a dictionary) are not accessed. We create a direct reference to the value (categories and mode) instead of making the Python interpreter search on the self methods over and over.

This generates a significant increase in performance, as described on the post (around 20%).

But why stop there? There is another clear win in terms of access, assuming that the filter doesn’t change. This is the “mode”, which we are comparing for whitelist of blacklist on each iteration. We can create a different closure depending on the mode value.

def make_page_category_filter2(config):
    categories = config["categories"]
    if config['mode'] == "whitelist":
        def whitelist_filter(bid_request):
            return bool(bid_request["categories"] & categories)
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return bool(
                categories and not
                bid_request["categories"] & categories
            )
        return blacklist_filter

There are another couple of details. The first one is to transform the config categories into a frozenset. Assuming that the config doesn’t change, a frozenset is more efficient than a regular mutable set. This is insinuated in the post, but maybe didn’t get the final review (or to simplify it).

Also, we are calculating the intersection of a set (operand &) to then reduce it to a bool. There is currently a set operation that gets the result without calculating the whole intersection (isdisjoint).

The same basic principle applies to calculate the bool category for the black filter. We can calculate it only once, as it’s there to short-circuit the result in case of an empty config category.

def make_page_category_filter2(config):
    categories = frozenset(config["categories"])
    bool_cat = bool(categories)
    if config['mode'] == "whitelist":
        def whitelist_filter(bid_request):
            return not categories.isdisjoint(bid_request["categories"])
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return (bool_cat and categories.isdisjoint(bid_request["categories"]))
        return blacklist_filter

Even if all of this enters the definition of micro-optimisations (which should be used with care, and only after a hot spot has been found), it actually makes a significant difference, reducing the time around 35% from the closure implementation and ~50% from the initial reference implementation.

All these elements are totally applicable to the OOP implementation, by the way. Python is quite flexible about assigning methods. No closures!

class PageCategoryFilter2(object):
    ''' Keep the interface of the object '''
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = frozenset(config["categories"])
        self.bool_cat = bool(self.categories)
        if self.mode == "whitelist":
            self.filter = self.filter_whitelist
        else:
            self.filter = self.filter_blacklist

    def filter_whitelist(self, bid_request):
        return not bid_request["categories"].isdisjoint(self.categories)

    def filter_blacklist(self, bid_request):
        return (self.bool_cat and
                bid_request["categories"].isdisjoint(self.categories))

Show me the time!

Here is the updated code, adding this implementations to the test.

The results in my desktop (2011 iMac 2.7GHz i5) are

        total time (sec)  time per iteration
class   9.59787607193     6.39858404795e-07
func    8.38110518456     5.58740345637e-07
closure 7.96493911743     5.30995941162e-07
class2  6.00997519493     4.00665012995e-07
closur2 5.09431600571     3.39621067047e-07

The new class performs better than the initial closure! The optimised closure is anyway trumping, saving a big chunk compared with the slower implementation. The PyPy results are all very close, and it speeds up 10x the code, which is an amazing feat.

Of course, a word of caution. The configuration is assumed to not change for a filter, which I think is reasonable.

Happy optimising!

Compendium of Wondrous Links vol VIII

wondrous_links

More great reads!

hands-typing

About code creation

office

The job of developing

knowledge_workers_productive

concept artOther stuff

 

 

Compendium of Wondrous Links VI

wondrous_links

  • They finally found all those buried Atari cartridges, and confirmed a beloved urban legend. Just wonderful.
  • This episode of @ExtraCreditz follows up an idea I always had about education. The key is being demanding, but allowing a lot of opportunities.
  • Amazing book introduction, showing how no one is immune to think that they are stupid. Lots of things in live are hard.
  • Readability in code is not about being literary. Is about making the code easy to understand. You don’t read code, you explore it.
  • The Great Works of Software. The premise is extremely interesting. What are the most influential pieces of software?
  • The hilarious (is funny because it’s true) Programming Sucks and a follow-up What programming is Like.
  • Is programming a dead end job? I still can’t help but feel sad each time that a (good) developer decides to move into management.
  • It’s easy to forget how much the things have changed in term of software distribution. What Writing and Selling Software Was Like in the 80’s (yep, also from The Codist. You should subscribe)
  • The computer world is very dominated by English, and even so with latin alphabet. This idea about making a computer language in Arabic is fascinating. It not only shows how difficult is to set up an environment without problems out of “the ASCII world” (the magnitude is not comparable, but trying to code in languages like French or Spanish has a lot of friction), but it also shows up how alien (yet beautiful) a different alphabet looks. I wonder how code and programming will be if the dominant language would’ve been something like Chinese or Arabic.
  • What is the “Agile mindset” anyway? The graph is very interesting. Specially the “Chaos labeled as Agile” side.
  • I don’t really like the idea of “rivalry” against Vim and Emacs. I prefer to consider them two valid options. But this article goes into explaining their different appeals and why they have been around since an extremely long time ago in computer-years.
  • 10 Most common Python mistakes. Good to check.

Visual Programming and Mental Constructs

I saw yesterday live the Apple keynote on the WWDC. I am far from an Apple developer, but I use OS X and iOS everyday, and I’m interested on new stuff. There was a full section devoted to developers, which is great (well, it’s supposed to be a developer’s conference, after all), and, arguably, the most interesting stuff on that part (for a developer’s perspective) was the release of a new programming language, Swift.

It was announced with an (irrelevant) comparison with Python in terms of speed (I actually have plans to write a post about “why Python is not really slow“, but I digress), as well as a lot of other details that (IMO) are completely pointless in terms of what makes a good or bad programming language.

Most pointless benchmark ever
Most pointless benchmark ever

I am generally skeptic about the announcement of new languages. Almost as much as new web frameworks. Sure, it adds a new flavour, but I’m not that sure about real advancement in tech. Creating a new language, full with proper “clean and beautiful” syntax is not really that difficult. The difficult part is to create a vibrant community behind it, one that loves the language and works to expand it, to push the boundaries of current tech, to make amazing applications and tools, to convince other developers to use it and to carry on the torch. The target of a language are developers. “End customers” couldn’t care less about how the guts of their products are done. “Ruby sharp? Whatever, I just need that it help us increase our sales

Interestingly enough, languages get a lot of character from their communities, as they embed their values on the relevant modules and tools. A great example of that is “The Zen Of Python“. There’s nothing there about whitespaces, list comprehensions or classes, but it reflects a lot of the ideas that are common on the Python world, values of the Python Community. Using a language is not just writing code, but also interacting with other developers, directly or even just reading the documents and using the APIs.

As mandatory as honor
As mandatory as honor

Obviously, Apple is a very special situation, as it can force developers to use whatever they like for their platform. Hey, they managed to create an Objective-C ecosystem out from nowhere, which is impressive. For what is worth, they can even tailor a language for their platform, and not to worry about anything else. iOS is a platform big enough for devs to have to learn the language and official IDE and use it. And I am pretty sure that in this case it will be an improvement over the previous environment.

But the one part that I am most skeptic about is the “visual programming” stuff. One of the “wow” announcements was the possibility of creating “playgrounds”, to show interactively the results of the code. That means that, for example, a loaded image will be available, or that a graph can be displayed showing the results of a function. And that’s the part that I’m not really that sure that is interesting or relevant at all.

Does it look cool? Absolutely. May it be interesting once in a while? Sure. But I think that’s the kind of process that, in day to day operation, is not really that useful in most kinds of programming.

Programming, more than anything else, is creating a mental image of code. Code can be a very complex thing. Especially on a big application. But normally we don’t need to keep the whole code in our mind. We only have to keep certain parts of it, allowing to focus in a problem at a time. That’s the main principle behind modules, classes and other abstractions. I can use OS calls to open a file, to draw some pixels on the screen, or to make a call to a remote server. All of that without having to worry about file systems, graphic drivers or network protocols. And I can also use higher level modules to search on files, create 3d models or make HTTPS calls.

And the amazing power of programming is that you are coding on the shoulders of giants. And on the shoulders of regular people. And on the shoulders of your co-workers. And on your own shoulders. That’s a lot of shoulders combined.

But a lot of that process deals with the unavoidable complexity of the interaction. And being able to move from an abstracted view to a more specific one, to look inside and outside the black box, is crucial. It may not be evident, but the mental process of programming deals a lot with that sudden change in perspective. This is one of the reasons of multiparadigm being a useful thing. Because you can move between different abstractions and levels, using the proper one on each case (especially for leaky ones).

And there are lots of those processes that are not easily represented with graphs or images. They are constructs on your mind: loops, flexible structures, intuitions on the weak points of an algorithm, variables changing values, corner cases… Showing all intermediate results may be detrimental to that quick change in perspective. Too much information.

There has been experiments with visual programming, trying to represent code as visual blocks in one way or another, since a long time ago (at least 25 years). They are useful in certain areas, but they are far from a general solution. There are also interactive notepads to allow easy display of graphs and help with the interactivity. iPython Notebook is an excellent example (and a very similar idea to the playground). But, again, I feel that those are specialised tools, not something that is that useful in most programming contexts.

I’m just skeptic. All of this doesn’t necessarily means that Swift is bad, or that those tools are wrong. Maybe the new X-Code will have a lot of amazing tools that will help create fantastic applications (I still don’t like IDEs, though). There are already people checking the docs and giving a try to the new language.  But I think that it has to show up how good or bad it is for itself, and by the developers that decide to use it. So far, it is just an announcement. I just feel that most that was said on the keynote was not relevant to determine whether it’s a good working environment or not, but was just a gimmick. Yes, obviously these kind of announcements are publicity stunts, but in this particular case it looks especially so.

Looks cool, but is not particularly relevant to how the mental process of programming works or what makes a language good.

Hmph. Visual blocks. Heh. Excitement. Heh. A Developer craves not these things.
Hmph. Visual blocks. Heh. Excitement. Heh. A Developer craves not these things.

Compendium of Wondrous Links vol V

wondrous_links