Compendium of Wondrous Links vol X

More interesting reads worth checking out

Tech

About development

  • I’m still confused by this “learning to code is cool” idea, as this article says. I’m not sure this is a bad time to be a beginner. Yes, it’s true that too many options are confusing, but the amount and quality of instructional material at the moment is absolutely incredible. Beginners right now are a thousand times more capable of getting things done than 20 years ago, just from the increase in productivity and clarity.
  • Tools don’t solve the web’s problems. Related to the first link: the constant stream of new tools for web development, and the problems they bring.
  • This tweet chain describes quite well the constant roller coaster of developing code.
  • Be friends with failure. The master has failed more times than the beginner has even tried.

Leonardo numbers

I have my own set of numbers!

Because Fibonacci numbers are quite abused in programming, here is a similar concept.


L(0) = L(1) = 1

L(n) = L(n-2) + L(n-1) + 1

My first impulse is to describe them in a recursive way:

def leonardo(n):
    if n in (0, 1):
        return 1
    return leonardo(n - 2) + leonardo(n - 1) + 1

NUMBER = 10  # example limit; NUMBER is not defined in the original snippet

for i in range(NUMBER):
    print('leonardo[{}] = {}'.format(i, leonardo(i)))

But this is not a very efficient way to calculate them, as each call recursively recalculates all the previous ones.
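To make the cost concrete, here is a minimal sketch that counts how many calls the naive recursion makes. The counter argument and the helper names are illustrative additions, not part of the original function. The number of calls follows the same recurrence as the numbers themselves, so it grows just as fast as they do.

```python
def leonardo_counting(n, counter):
    """Naive recursive Leonardo number that also counts its own calls."""
    counter[0] += 1
    if n in (0, 1):
        return 1
    return (leonardo_counting(n - 2, counter)
            + leonardo_counting(n - 1, counter) + 1)


def calls_needed(n):
    """Return (value, number_of_calls) for the naive recursion."""
    counter = [0]
    value = leonardo_counting(n, counter)
    return value, counter[0]


for n in (5, 10, 20):
    value, calls = calls_needed(n)
    print('leonardo({}) = {} took {} calls'.format(n, value, calls))
```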

Here memoization works beautifully:


cache = {}

def leonardo(n):
    if n in (0, 1):
        return 1

    if n not in cache:
        result = leonardo(n - 1) + leonardo(n - 2) + 1
        cache[n] = result

    return cache[n]

NUMBER = 10  # example limit; NUMBER is not defined in the original snippet

for i in range(NUMBER):
    print('leonardo[{}] = {}'.format(i, leonardo(i)))

Keep in mind that this uses more memory, and that calculating the Nth element when the previous ones are not already cached is still costly.
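The hand-written cache can also be expressed with functools.lru_cache from the standard library. This is only a sketch, assuming Python 3.2 or later; the name leonardo_cached is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def leonardo_cached(n):
    """Leonardo number, with results remembered by functools.lru_cache."""
    if n in (0, 1):
        return 1
    return leonardo_cached(n - 1) + leonardo_cached(n - 2) + 1


# Filling the cache in increasing order keeps the recursion shallow.
for i in range(10):
    print('leonardo[{}] = {}'.format(i, leonardo_cached(i)))
```

The trade-off is the same as with the dictionary: every computed value stays in memory for as long as the function lives.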

I saw this on Programming Praxis, and I really like the solution proposed by Graham in the comments, using a generator.

def leonardo_numbers():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b + 1

The code is really clean.
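Since the generator never ends, it needs something to cut it off. One possible way to use it, sketched here with itertools.islice (the limit of ten is an arbitrary choice):

```python
from itertools import islice


def leonardo_numbers():
    """Yield the Leonardo numbers one by one, forever."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b + 1


# Take only the first ten values from the endless generator.
first_ten = list(islice(leonardo_numbers(), 10))
print(first_ten)
```

Each value is produced from the previous two in constant time, and nothing is stored beyond the two running variables.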

Compendium of Wondrous Links vol IX

Welcome back to this totally non-regular compilation of interesting reads. Enjoy!

Do you want to see the whole series?

ffind v0.8 released

Good news everyone!

The new version of ffind (0.8) is available on GitHub and PyPI. This version includes performance improvements, a man page and fuzzy search support.

Enjoy!

Optimise Python with closures

This blog post by Dan Crosta is interesting. It talks about how it is possible to optimise Python code for operations that get called many times by avoiding object orientation and using closures instead.

While “closures” get the highlight, the main idea is a little more general: avoid repeating work that is not necessary for the operation.

Look at the difference between the first proposed code, written in an OOP way

class PageCategoryFilter(object):
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            return bool(
                bid_request["categories"] & self.categories
            )
        else:
            return bool(
                self.categories and not
                bid_request["categories"] & self.categories
            )

and the last one

def make_page_category_filter(config):
    categories = config["categories"]
    mode = config["mode"]
    def page_category_filter(bid_request):
        if mode == "whitelist":
            return bool(bid_request["categories"] & categories)
        else:
            return bool(
                categories and not
                bid_request["categories"] & categories
            )
    return page_category_filter

The main difference is that neither the config dictionary nor the attributes of self (which are also stored in a dictionary) are accessed on each call. We create a direct reference to the values (categories and mode) instead of making the Python interpreter look them up on self over and over.

This generates a significant increase in performance, as described on the post (around 20%).
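The effect can be checked on any machine with a small timeit comparison. This is a rough sketch, not the benchmark from the post: the two stand-ins below are hypothetical reductions of the filter to a single set intersection, and the numbers will vary with the interpreter and hardware.

```python
import timeit


class AttributeFilter(object):
    """Hypothetical stand-in: reads its data through self on every call."""
    def __init__(self, categories):
        self.categories = categories

    def matches(self, requested):
        return bool(requested & self.categories)


def make_closure_filter(categories):
    """Hypothetical stand-in: captures the data in a closure variable."""
    def matches(requested):
        return bool(requested & categories)
    return matches


categories = {"news", "sports"}
requested = {"sports", "weather"}

by_attribute = AttributeFilter(categories).matches
by_closure = make_closure_filter(categories)

# Both variants must agree before comparing their speed.
assert by_attribute(requested) == by_closure(requested)

for name, func in (("attribute", by_attribute), ("closure", by_closure)):
    seconds = timeit.timeit(lambda: func(requested), number=200000)
    print('{:<10} {:.4f} s'.format(name, seconds))
```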

But why stop there? There is another clear win in terms of access, assuming that the filter doesn’t change. This is the “mode”, which we are comparing against whitelist or blacklist on every call. We can create a different closure depending on the mode value.

def make_page_category_filter2(config):
    categories = config["categories"]
    if config['mode'] == "whitelist":
        def whitelist_filter(bid_request):
            return bool(bid_request["categories"] & categories)
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return bool(
                categories and not
                bid_request["categories"] & categories
            )
        return blacklist_filter

There are another couple of details. The first one is to transform the config categories into a frozenset. Assuming that the config doesn’t change, a frozenset is more efficient than a regular mutable set. This is hinted at in the post, but maybe it didn’t make the final review (or was left out to simplify it).

Also, we are calculating the intersection of two sets (operator &) only to reduce it to a bool. There is a set method that gets the result without calculating the whole intersection (isdisjoint).
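A small sketch of that equivalence, using made-up category names. The intersection builds a new set before it is reduced to a bool, while isdisjoint only has to answer yes or no.

```python
categories = frozenset(["news", "sports"])

samples = [
    {"sports", "weather"},   # shares "sports" with the config
    {"weather"},             # nothing in common
    set(),                   # empty request
]

for requested in samples:
    via_intersection = bool(requested & categories)
    via_isdisjoint = not categories.isdisjoint(requested)
    # The two expressions always agree; isdisjoint just avoids building the set.
    assert via_intersection == via_isdisjoint
    print(sorted(requested), via_isdisjoint)
```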

The same basic principle applies to calculating the bool of the categories for the blacklist filter. We can calculate it only once, as it’s only there to short-circuit the result in case of an empty config category.

def make_page_category_filter2(config):
    categories = frozenset(config["categories"])
    bool_cat = bool(categories)
    if config['mode'] == "whitelist":
        def whitelist_filter(bid_request):
            return not categories.isdisjoint(bid_request["categories"])
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return (bool_cat and categories.isdisjoint(bid_request["categories"]))
        return blacklist_filter

Even if all of this falls under the definition of micro-optimisations (which should be used with care, and only after a hot spot has been found), it actually makes a significant difference, reducing the time by around 35% compared with the closure implementation and by ~50% compared with the initial reference implementation.

All these elements are totally applicable to the OOP implementation, by the way. Python is quite flexible about assigning methods. No closures!

class PageCategoryFilter2(object):
    ''' Keep the interface of the object '''
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = frozenset(config["categories"])
        self.bool_cat = bool(self.categories)
        if self.mode == "whitelist":
            self.filter = self.filter_whitelist
        else:
            self.filter = self.filter_blacklist

    def filter_whitelist(self, bid_request):
        return not bid_request["categories"].isdisjoint(self.categories)

    def filter_blacklist(self, bid_request):
        return (self.bool_cat and
                bid_request["categories"].isdisjoint(self.categories))
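Because the optimised versions change how the result is computed, a quick sanity check that they still agree with the reference seems worthwhile. The sketch below restates the reference and the optimised logic as two small factories (the names reference_filter and optimised_filter, the configs and the requests are all made up for illustration) and compares them over a handful of combinations, including the empty-category case.

```python
def reference_filter(config):
    """Behaviour of the original implementation, written as a closure."""
    categories = config["categories"]
    mode = config["mode"]

    def page_category_filter(bid_request):
        if mode == "whitelist":
            return bool(bid_request["categories"] & categories)
        return bool(categories and not bid_request["categories"] & categories)
    return page_category_filter


def optimised_filter(config):
    """Behaviour of the optimised version: one closure per mode."""
    categories = frozenset(config["categories"])
    bool_cat = bool(categories)
    if config["mode"] == "whitelist":
        return lambda req: not categories.isdisjoint(req["categories"])
    return lambda req: bool_cat and categories.isdisjoint(req["categories"])


# Made-up sample data, for illustration only.
category_sets = [set(), {"news"}, {"news", "sports"}]
requests = [{"categories": c} for c in (set(), {"news"}, {"weather"})]

checked = 0
for mode in ("whitelist", "blacklist"):
    for cats in category_sets:
        config = {"mode": mode, "categories": cats}
        old, new = reference_filter(config), optimised_filter(config)
        for request in requests:
            assert old(request) == new(request), (mode, cats, request)
            checked += 1

print('{} combinations agree'.format(checked))
```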

Show me the time!

Here is the updated code, adding these implementations to the test.

The results on my desktop (2011 iMac 2.7GHz i5) are

        total time (sec)  time per iteration
class   9.59787607193     6.39858404795e-07
func    8.38110518456     5.58740345637e-07
closure 7.96493911743     5.30995941162e-07
class2  6.00997519493     4.00665012995e-07
closur2 5.09431600571     3.39621067047e-07

The new class performs better than the initial closure! The optimised closure still comes out on top, saving a big chunk compared with the slowest implementation. The PyPy results are all very close to each other, and PyPy speeds up the code 10x, which is an amazing feat.

Of course, a word of caution. The configuration is assumed to not change for a filter, which I think is reasonable.

Happy optimising!

Compendium of Wondrous Links vol VIII

More great reads!

About code creation

The job of developing

Other stuff

Compendium of Wondrous Links VI

  • They finally found all those buried Atari cartridges, and confirmed a beloved urban legend. Just wonderful.
  • This episode of @ExtraCreditz follows up on an idea I’ve always had about education. The key is being demanding, but allowing a lot of opportunities.
  • Amazing book introduction, showing how no one is immune to thinking that they are stupid. Lots of things in life are hard.
  • Readability in code is not about being literary. It’s about making the code easy to understand. You don’t read code, you explore it.
  • The Great Works of Software. The premise is extremely interesting. What are the most influential pieces of software?
  • The hilarious (it’s funny because it’s true) Programming Sucks and a follow-up, What programming is Like.
  • Is programming a dead end job? I still can’t help but feel sad each time that a (good) developer decides to move into management.
  • It’s easy to forget how much things have changed in terms of software distribution. What Writing and Selling Software Was Like in the 80’s (yep, also from The Codist. You should subscribe)
  • The computer world is very much dominated by English, and even more so by the Latin alphabet. This idea of making a programming language in Arabic is fascinating. It not only shows how difficult it is to set up an environment without problems outside “the ASCII world” (the magnitude is not comparable, but trying to code in languages like French or Spanish has a lot of friction), but it also shows how alien (yet beautiful) a different alphabet looks. I wonder what code and programming would be like if the dominant language had been something like Chinese or Arabic.
  • What is the “Agile mindset” anyway? The graph is very interesting. Especially the “Chaos labeled as Agile” side.
  • I don’t really like the idea of a “rivalry” between Vim and Emacs. I prefer to consider them two valid options. But this article explains their different appeals and why they have been around for an extremely long time in computer-years.
  • 10 Most common Python mistakes. Good to check.