ffind v0.8 released

Good news everyone!

The new version of ffind (0.8) is available on GitHub and PyPI. This version includes performance improvements, a man page and fuzzy search support.

Enjoy!

Optimise Python with closures

This blog post by Dan Crosta is interesting. It talks about how it is possible to optimise Python code for operations that get called many times by avoiding Object Orientation and using closures instead.

While closures get the spotlight, the main idea is a little more general: avoid repeating work that is not necessary for the operation.

Consider the difference between the first proposed code, written in an OOP way,

class PageCategoryFilter(object):
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            return bool(
                bid_request["categories"] & self.categories
            )
        else:
            return bool(
                self.categories and not
                bid_request["categories"] & self.categories
            )

and the last one, using a closure:

def make_page_category_filter(config):
    categories = config["categories"]
    mode = config["mode"]
    def page_category_filter(bid_request):
        if mode == "whitelist":
            return bool(bid_request["categories"] & categories)
        else:
            return bool(
                categories and not
                bid_request["categories"] & categories
            )
    return page_category_filter

The main difference is that, on each call, the values are not looked up through self, whose attributes (like its methods) are stored in a dictionary. We create a direct reference to the values (categories and mode) instead of making the Python interpreter search the attributes of self over and over.

This yields a significant performance increase, around 20% as described in the post.
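The difference in lookups can be seen directly on the compiled code objects. This is a small illustration, not code from the original post, and the sample config is hypothetical: in the method, `mode` and `categories` are attribute names (they end up in `co_names` and are resolved on `self` at every call), while in the closure they are free variables read from captured cells (`co_freevars`, backed by `__closure__`).

```python
class PageCategoryFilter(object):
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            return bool(bid_request["categories"] & self.categories)
        else:
            return bool(self.categories and not bid_request["categories"] & self.categories)


def make_page_category_filter(config):
    categories = config["categories"]
    mode = config["mode"]

    def page_category_filter(bid_request):
        if mode == "whitelist":
            return bool(bid_request["categories"] & categories)
        else:
            return bool(categories and not bid_request["categories"] & categories)

    return page_category_filter


config = {"mode": "whitelist", "categories": {"news", "sports"}}  # hypothetical config
closure = make_page_category_filter(config)

# Method: `mode` and `categories` are attribute names, looked up on self at every call
print(sorted({"mode", "categories"} & set(PageCategoryFilter.filter.__code__.co_names)))
# ['categories', 'mode']

# Closure: the same names are free variables, read straight from the captured cells
print(sorted(closure.__code__.co_freevars))  # ['categories', 'mode']
captured = dict(zip(closure.__code__.co_freevars,
                    (cell.cell_contents for cell in closure.__closure__)))
print(captured["mode"])  # whitelist
```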

But why stop there? There is another clear win in terms of access, assuming that the filter doesn't change: the mode, which we compare against whitelist or blacklist on every call. We can create a different closure depending on the mode value.

def make_page_category_filter2(config):
    categories = config["categories"]
    if config['mode'] == "whitelist":
        def whitelist_filter(bid_request):
            return bool(bid_request["categories"] & categories)
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return bool(
                categories and not
                bid_request["categories"] & categories
            )
        return blacklist_filter
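A quick usage sketch of this version, with hypothetical configs and request: the factory hands back a different function per mode, and the returned function only captures `categories`, because the mode was resolved once when the filter was created.

```python
def make_page_category_filter2(config):
    categories = config["categories"]
    if config["mode"] == "whitelist":
        def whitelist_filter(bid_request):
            return bool(bid_request["categories"] & categories)
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return bool(categories and not bid_request["categories"] & categories)
        return blacklist_filter


# Hypothetical configs and request, for illustration only
allow = make_page_category_filter2({"mode": "whitelist", "categories": {"news"}})
block = make_page_category_filter2({"mode": "blacklist", "categories": {"news"}})
request = {"categories": {"news", "tech"}}

print(allow.__name__, allow(request))  # whitelist_filter True
print(block.__name__, block(request))  # blacklist_filter False
# No mode comparison is left in the returned function: it only captures `categories`
print(allow.__code__.co_freevars)  # ('categories',)
```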

There are a couple of other details. The first one is to transform the config categories into a frozenset. Assuming that the config doesn't change, a frozenset expresses that immutability and may be slightly more efficient than a regular mutable set. This is hinted at in the post, but maybe it didn't make it into the final review (or was left out for simplicity).

Also, we compute the whole intersection of two sets (with the & operator) only to reduce it to a bool. There is a set method, isdisjoint, that gets the answer without building the intersection: it can stop as soon as it finds a common element.
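The two forms are interchangeable for this purpose. A small sketch (hypothetical helper names, random sets) checks that `bool(a & b)` and `not a.isdisjoint(b)` always agree; the difference is that the first builds a temporary set, while `isdisjoint` returns as soon as it finds a shared element.

```python
import random


def via_intersection(a, b):
    # Builds the whole intersection set, then only checks whether it is empty
    return bool(a & b)


def via_isdisjoint(a, b):
    # Returns at the first shared element; no intermediate set is created
    return not a.isdisjoint(b)


random.seed(0)
for _ in range(1000):
    a = set(random.sample(range(20), random.randint(0, 10)))
    b = set(random.sample(range(20), random.randint(0, 10)))
    assert via_intersection(a, b) == via_isdisjoint(a, b)

print(via_isdisjoint({"news", "tech"}, {"tech"}))  # True
print(via_isdisjoint({"news"}, {"sports"}))        # False
```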

The same basic principle applies to the bool of the categories in the blacklist filter. We can calculate it only once, as it is only there to short-circuit the result when the configured categories are empty.

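def make_page_category_filter2(config):
    # Take an immutable copy of the categories once, at creation time
    categories = frozenset(config["categories"])
    # Only used to short-circuit the blacklist when no categories are configured
    bool_cat = bool(categories)
    # The mode is checked once here, not on every call
    if config['mode'] == "whitelist":
        def whitelist_filter(bid_request):
            # isdisjoint stops at the first shared category
            return not categories.isdisjoint(bid_request["categories"])
        return whitelist_filter
    else:
        def blacklist_filter(bid_request):
            return bool_cat and categories.isdisjoint(bid_request["categories"])
        return blacklist_filter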
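Before trusting the speed-up, it is worth checking that the optimised closure gives exactly the same answers as the original class. A brute-force sketch over every combination of a small, hypothetical set of categories, for both modes (including the empty-config edge case):

```python
import itertools


class PageCategoryFilter(object):
    # Original reference implementation
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            return bool(bid_request["categories"] & self.categories)
        return bool(self.categories and not bid_request["categories"] & self.categories)


def make_page_category_filter2(config):
    # Optimised closure
    categories = frozenset(config["categories"])
    bool_cat = bool(categories)
    if config["mode"] == "whitelist":
        def whitelist_filter(bid_request):
            return not categories.isdisjoint(bid_request["categories"])
        return whitelist_filter

    def blacklist_filter(bid_request):
        return bool_cat and categories.isdisjoint(bid_request["categories"])
    return blacklist_filter


CATEGORIES = ["news", "sports", "tech"]  # hypothetical category universe


def all_subsets(items):
    # Every subset of items, from the empty set up to the full set
    return [set(c) for r in range(len(items) + 1) for c in itertools.combinations(items, r)]


mismatches = 0
for mode in ("whitelist", "blacklist"):
    for configured in all_subsets(CATEGORIES):
        config = {"mode": mode, "categories": configured}
        reference = PageCategoryFilter(config).filter
        optimised = make_page_category_filter2(config)
        for requested in all_subsets(CATEGORIES):
            request = {"categories": requested}
            if reference(request) != optimised(request):
                mismatches += 1

print(mismatches)  # 0
```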

Even if all of this falls under the definition of micro-optimisation (which should be used with care, and only after a hot spot has been found), it actually makes a significant difference, reducing the time by around 35% compared with the closure implementation and by ~50% compared with the initial reference implementation.

All of these elements are fully applicable to the OOP implementation, by the way. Python is quite flexible about assigning methods, so the right one can be picked in __init__. No closures!

class PageCategoryFilter2(object):
    ''' Keep the interface of the object '''
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = frozenset(config["categories"])
        self.bool_cat = bool(self.categories)
        if self.mode == "whitelist":
            self.filter = self.filter_whitelist
        else:
            self.filter = self.filter_blacklist

    def filter_whitelist(self, bid_request):
        return not bid_request["categories"].isdisjoint(self.categories)

    def filter_blacklist(self, bid_request):
        return (self.bool_cat and
                bid_request["categories"].isdisjoint(self.categories))
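A usage sketch with hypothetical configs: callers still call `.filter()`, but it is now an instance attribute holding the bound method chosen in `__init__`, so there is no mode check per call. Note that this version calls `isdisjoint` on `bid_request["categories"]`, so it assumes the request categories are a set (as they already had to be for `&`). Storing a bound method on the instance also creates a small reference cycle, which the garbage collector takes care of.

```python
class PageCategoryFilter2(object):
    ''' Keep the interface of the object '''
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = frozenset(config["categories"])
        self.bool_cat = bool(self.categories)
        # Pick the implementation once; callers keep calling .filter()
        if self.mode == "whitelist":
            self.filter = self.filter_whitelist
        else:
            self.filter = self.filter_blacklist

    def filter_whitelist(self, bid_request):
        return not bid_request["categories"].isdisjoint(self.categories)

    def filter_blacklist(self, bid_request):
        return (self.bool_cat and
                bid_request["categories"].isdisjoint(self.categories))


whitelist = PageCategoryFilter2({"mode": "whitelist", "categories": {"news"}})
blacklist = PageCategoryFilter2({"mode": "blacklist", "categories": {"news"}})
request = {"categories": {"news", "tech"}}  # hypothetical bid request

print(whitelist.filter(request), blacklist.filter(request))  # True False
print(whitelist.filter.__name__)  # filter_whitelist
# `filter` lives on the instance; the class itself has no `filter` attribute
print("filter" in vars(whitelist), hasattr(PageCategoryFilter2, "filter"))  # True False
```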

Show me the time!

Here is the updated code, adding these implementations to the test.

The results on my desktop (2011 iMac, 2.7GHz i5) are:

         total time (s)   time per iteration (s)
class    9.59787607193    6.39858404795e-07
func     8.38110518456    5.58740345637e-07
closure  7.96493911743    5.30995941162e-07
class2   6.00997519493    4.00665012995e-07
closur2  5.09431600571    3.39621067047e-07

The new class performs better than the initial closure! The optimised closure still comes out on top, though, saving a big chunk of time compared with the slowest implementation. Under PyPy the results are all very close, and it speeds up the code around 10x, which is an amazing feat.
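The original benchmark code isn't reproduced here, so the following is only a hypothetical timeit harness along the same lines, with made-up config and request data; absolute numbers will not match the table. Dividing each total time in the table by its time per iteration gives 15 million calls per implementation, which can be passed as `number=15_000_000` to get a comparable run.

```python
import timeit


class PageCategoryFilter(object):
    # Original OOP reference implementation
    def __init__(self, config):
        self.mode = config["mode"]
        self.categories = config["categories"]

    def filter(self, bid_request):
        if self.mode == "whitelist":
            return bool(bid_request["categories"] & self.categories)
        return bool(self.categories and not bid_request["categories"] & self.categories)


def make_page_category_filter2(config):
    # Optimised closure: frozenset, mode resolved once, isdisjoint
    categories = frozenset(config["categories"])
    bool_cat = bool(categories)
    if config["mode"] == "whitelist":
        def whitelist_filter(bid_request):
            return not categories.isdisjoint(bid_request["categories"])
        return whitelist_filter

    def blacklist_filter(bid_request):
        return bool_cat and categories.isdisjoint(bid_request["categories"])
    return blacklist_filter


# Hypothetical inputs; the original benchmark data is not shown in the post
CONFIG = {"mode": "blacklist", "categories": {"adult", "gambling", "violence"}}
REQUEST = {"categories": {"news", "sports"}}


def bench(func, number=1_000_000):
    """Time `number` calls of func(REQUEST); return (total seconds, seconds per call)."""
    total = timeit.timeit("f(r)", globals={"f": func, "r": REQUEST}, number=number)
    return total, total / number


if __name__ == "__main__":
    for label, func in [("class", PageCategoryFilter(CONFIG).filter),
                        ("closur2", make_page_category_filter2(CONFIG))]:
        total, per_call = bench(func)
        print(f"{label:8}{total:10.4f}  {per_call:.3e}")
```

Passing the statement as a string with `globals` avoids wrapping each call in an extra lambda, which would add its own call overhead to every measurement.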

Of course, a word of caution: the configuration is assumed not to change for a given filter, which I think is reasonable.
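A small sketch of what that assumption means in practice, with a hypothetical config: the optimised filter keeps its own frozenset snapshot, so later changes to the config (even in-place changes to the original set) are not seen, and a new filter has to be built. With the first closure version, which keeps a reference to the same set, in-place changes to the categories would be visible, but a change to the mode would not.

```python
def make_page_category_filter2(config):
    # Optimised closure: takes a frozenset snapshot of the categories
    categories = frozenset(config["categories"])
    bool_cat = bool(categories)
    if config["mode"] == "whitelist":
        def whitelist_filter(bid_request):
            return not categories.isdisjoint(bid_request["categories"])
        return whitelist_filter

    def blacklist_filter(bid_request):
        return bool_cat and categories.isdisjoint(bid_request["categories"])
    return blacklist_filter


config = {"mode": "blacklist", "categories": {"gambling"}}  # hypothetical config
request = {"categories": {"gambling"}}
block = make_page_category_filter2(config)
print(block(request))  # False: "gambling" is blacklisted

# Changing the configuration in place does not reach the existing filter...
config["categories"].discard("gambling")
config["categories"].add("violence")
print(block(request))  # still False: the filter kept its own frozenset

# ...so a new filter has to be built whenever the configuration changes
block = make_page_category_filter2(config)
print(block(request))  # True: "gambling" is no longer blacklisted
```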

Happy optimising!

Some characteristics of the best developers I worked with

Last November at PyConES, I was in a conversation where I mentioned that I work with truly brilliant people at DemonWare, and someone asked me: “Do you have problems agreeing on what to do? Normally great developers have problems reaching consensus in tech discussions.” My answer was something like: “Well, in my experience, truly awesome developers know when to have a strong argument, and they are usually OK reaching an agreement in a reasonable time.”

So, as a sort of follow-up, I wanted to summarise the characteristics I've seen in the best developers I've been lucky enough to work with. This is not a list of “what my ideal developer is”, but more a reflection on the common traits I've seen in my experience…

  • Awesome developers are obviously smart, but that's not typically shown as bursts of brilliance, solving really difficult issues with “aha!” moments. In my experience, genius ideas are rarely required or expressed (though they surely happen once in a blue moon). Instead, great developers are consistently smart. They present reasonable solutions to problems all the time. They find and fix typical bugs with ease. They struggle with very difficult problems, but are able to deal with them. They are able to quickly present something that will make you say “Actually, that's a nice point. Why didn't I think of this?”. They do not typically present something ingenious and never heard of, but deliver perfectly fine working ideas over and over, day after day. Their code is not full of mind-blowing concepts, but it is logical, clean and easy to follow most of the time (and when it's not, there is a good reason). They are able to remove complexity and simplify stuff, to a degree that it almost looks easy (but it's not).
Brilliant people in real life do not come up with insanely great ideas out of nowhere
  • They keep a lot of relevant information in their minds. They are able to relate something under discussion to something that happened three months ago. They seem to have the extraordinary ability of pulling out of a hat some obscure piece of knowledge that applies to the current problem.
  • While they have a passion for coding, it is not the only thing in their lives. They have hobbies and interests, and they don't usually go home at the weekend to keep working on open source all day, though they may occasionally do so.
  • They love to do things “the right way”, but even more than that, they love to make things work. This means that they will use tools they consider inferior to achieve something if that's the best or most convenient way. They'll complain and will try to change it, but delivering will be more important than being right. They have strong opinions about which language/framework/way of doing stuff is best, be that Python, Ruby, Haskell, PostgreSQL, Riak or COBOL, but that won't stop them from knowing when it's important to just stop arguing and do it.
  • They are humble. They are confident most of the time, but far from arrogant. My impression is that they don't think they are as awesome as they truly are. They want to learn from everyone else, and they ask when they have questions. They also pick up new ideas very fast. They are also friendly and nice.
  • Communication is among their best skills. They are very good communicators, especially, but not only, about tech issues. They may be a little socially awkward sometimes (though this is not as common as stereotypes portray), but when they have the motivation to express some idea, they'll do it very clearly.
  • In some of the truly remarkable cases, they'll be able to fulfil different roles when needed. I mean different roles in the broadest sense, basically being able to be what's needed at that particular moment. Sometimes they'll have to be leaders, sometimes they'll be OK being led. They'll know when a joke is the proper thing and when to remain formal. They'll be the person who helps you with a difficult technical question, or the one who tells you “you're tired, just go home, tomorrow will be another day”.
  • And they’ll have a great sense of humour. I know that almost everyone thinks that they have a good sense of humour. That’s not totally true.

Again, this is a sort of personal collection of traits based on my experience and on what I consider the best developers I've been honoured to work with. Any ideas?

My concerns with Bitcoin as a currency

Today I retweeted this brilliant tweet:

So, to start the year, I've decided to share some of my thoughts on the Bitcoin issue, and some of the problems I see. As I am not an economist, I'm not going to go into the deflation / long-term scenario. From what I know, that's very bad, but it can lead to a deep economic conversation that I don't really want to get into, as I lack the required knowledge, so I'm going to concede that point. Let's imagine that Bitcoin, from the macroeconomic point of view, is absolutely sound. Even in that case, my impression is that it is not very safe from the user's point of view. These are “social problems“ more than “tech problems“.

(I am also going to assume that it is cryptographically sound, as I don't have any reason to think it is not.)

One of the main problems the system has is that you are entirely on your own to keep your bitcoins / wallets safe. I guess some people don't perceive this as a “real problem“, but, as someone who can be considered tech-savvy, the prospect of a virus, a hardware problem or a lost password making my money disappear forever is really worrying. Even a common problem like transferring money from a dead person (unfortunately, everyone gets to that point) can be impossible if not planned in advance. A Bitcoin wallet (which can be reduced to a private key, a sequence of bits that should be kept secret) associated with all your bitcoins can be gone or inaccessible in seconds. Accidental deletion, hardware problems, a malicious virus… Yes, there are countermeasures to this, like backups (if you're reading this and you don't have a backup in place, PLEASE DO), but the sad truth is that most people out there don't make regular backups.

Gone in 10 minutes

The single most important quality of any currency is trust. I trust that, if I have money in Dollars or Euros, it is not going to vanish for a stupid reason like a failing hard drive. All it takes is a few horror stories of people losing all their savings in Bitcoin because of a virus, and non-tech-savvy people will be scared, losing trust in the currency.

Of course, this scenario can be avoided with a smart move. Hey, I don't keep my Euros with me in cash precisely because of these problems. I put them in the bank! Awesome. I can move all my bitcoins to a bank and interact with my money in the usual way, with credit cards, getting some from time to time from the ATM (in this case, a virtual online ATM). But then, what's the point of Bitcoin? If I rely on a bank, I am using the currency exactly as I use Dollars, Euros or Pounds Sterling (and the banks will charge accordingly). It could have some small benefits, like taking the money out of the bank to transfer it to someone else more easily than with a traditional currency (especially for small amounts), but I doubt it will be different or advantageous enough to justify using Bitcoin instead of regular currencies for most people.

I must say that Casascius coins are gorgeous

Another insidious problem I can see is privacy. Bitcoin is pseudonymous, meaning that all transactions are public, but there is no explicit association between a wallet and its owner. I don't find that reassuring, as finding out that wallet A belongs to person B is definitely not an extremely difficult operation. If Bitcoin were popular, there would be a lot of transactions, and most people would use a couple of wallets at most, for convenience. If you need to send goods to someone, for example, it won't be that difficult to associate the wallet that pays for the goods with the person receiving them. Again, this can be obscured, and some people will use complex schemes to hide who they are, but in a typical operation I'd say that most people wouldn't care too much about it, just as they don't care about it with a credit card today.

OK, so you manage to find out that person B is behind wallet A. Now you can track all the activity of wallet A (because it is public) and use it for whatever you want. For a lot of wallets it will simply be obvious what they are (known shops), so, for example, that will be a great source for “directed marketing”. Amazon could know that you pay Vodafone something that looks like a mobile contract. Now you'll get “directed information” about all the million offers that Amazon has on mobile products. Great, now you have more spam in your inbox. The data mining implications are incredible.
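To make the point concrete, here is a toy sketch over an entirely hypothetical, simplified ledger of (sender, receiver, amount) records; it is not how Bitcoin actually stores transactions. Once a single wallet is linked to a person, anyone holding the public record can list that wallet's whole history and aggregate where the money goes.

```python
from collections import defaultdict

# A toy, entirely hypothetical public ledger: (sender, receiver, amount in BTC)
LEDGER = [
    ("wallet_A", "known_bookshop", 0.05),
    ("wallet_A", "mobile_operator", 0.02),
    ("wallet_C", "wallet_A", 0.30),
    ("wallet_A", "pharmacy", 0.01),
]


def activity(ledger, wallet):
    """Every transaction involving `wallet`; anyone can run this on a public ledger."""
    return [tx for tx in ledger if wallet in (tx[0], tx[1])]


def spending_by_counterparty(ledger, wallet):
    """Total amount sent from `wallet` to each receiver."""
    totals = defaultdict(float)
    for sender, receiver, amount in ledger:
        if sender == wallet:
            totals[receiver] += amount
    return dict(totals)


# Once wallet_A is linked to person B (e.g. through a delivery address), this is B's history
print(len(activity(LEDGER, "wallet_A")))  # 4
print(spending_by_counterparty(LEDGER, "wallet_A"))
```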

Of course, any purchase that you don't necessarily want to share with the world can be exposed. And it's there, publicly available, forever. If you move to a different wallet and move your bitcoins around, hey, that's recorded, so you can't hide unless you transfer all the money out of the system and then exchange it back into new wallet(s) that, this time, hopefully won't be discovered. Plus all the inconvenience of doing so, of course.

Of course, there are ways of dealing with it: using a lot of wallets, circulating the money among them (and hoping this is safe enough, as there could be advanced detection methods for common patterns), being aware of what information is being shared. But, seriously, are we expecting everyone who just wants to use a currency for common operations to take on all that overhead and knowledge? I think that's asking too much.

As the objective of a currency is to be used as a means of payment, to be exchanged often, I think these problems stand in the way of Bitcoin becoming a currency replacement that can get real traction in the world. The potential risks are quite big, and not well understood by a lot of people at the moment. Of course, these problems are currently less important than the fact that Bitcoin is being used as an investment / speculation product, making the exchange rate so volatile that using Bitcoin as a currency is unviable. But even assuming that Bitcoin can leave this state behind, I still see these issues standing in the way of it becoming a viable currency.

I am not an expert in this subject, so if I am mistaken at some point, let me know. Comments welcome :-P

Python Wizard

Ever since I was a young boy,
I typed on keyboards
From bash commands to Java
I must have coded them all
but I ain’t seen nothing like him
In any Hackathon
That nice, nerd and shy kid
Sure codes great Python!

He stands like a statue,
Becomes part of the machine.
Lots of comprehensions
always writing clean
right code indentation
dicts used the most
That nice, nerd and shy kid
Sure codes great Python!

He’s a coding wizard
There has to be a twist.
A coding wizard,
S’got such a supple wrist.

How do you think he does it?
I don’t know!
What makes him so good?

ain’t got no distractions
semicolons or brackets
Nice packaged modules
produced everyday
Functional programming
when it fits the best
That nice, nerd and shy kid
Sure codes great Python!

I thought I was
The system admin king.
But I just handed
My hacker crown to him.

Even on my favorite system
He can beat my best.
Opens the text editor
And he just does the rest
He’s got crazy vi fingers
no IDE at all
That nice, nerd and shy kid
Sure codes great Python!

Make beautiful Python code (talk at PyCon IE ’13)

Another year, another amazing PyCon. I guess I'm repeating myself, but I keep being impressed by the quality of the talks and the friendly, vibrant atmosphere. It is always a pleasure to spend some time with people interested in code and technology… There was also an increase in the number of attendees, and quite a lot of students. I already said it on Twitter, but Python Ireland, you guys rock.

Of all the talks I attended, I'd like to comment on two that were especially interesting. The first was one of the keynotes, PRISM-as-a-Service: Not Subject to American Law, by Lynn Root. This whole thing is pretty scary when you think about it. Definitely worth a read. The other one was The Clean Architecture in Python, by Brandon Rhodes, about ways of designing code and making it data-centric.

I also gave a talk, and other than a problem with the projector that made me rush a little, I think it went well. Just in case you're interested, here are the slides. Here is also the PDF version with notes.

Oh, and another thing: they are launching the PyLadies Dublin group this Wednesday, 15th October, so if you're interested, show up.

 

UPDATE: Added slides for Brandon Rhodes' talk

ffind is now available on PyPI

Remember ffind (a sane replacement for command-line file search)? I've just pushed it to PyPI, so anyone interested in giving it a try can install it by doing

pip install ffind

Brilliant!

As this was my first submission to PyPI, I followed this guide. It was quite simple once the project was prepared to use setup.py. And remember, the code is available on GitHub, so feel free to check it out and contribute!