All you need is cache

Cache is all you need

What is a cache

More than a formal definition, I think the best way of thinking about a cache is as a result from an operation (data) that gets saved (cached) for future use.

The cached value should be identifiable with a key that is reasonably small. This is normally the call name and its parameters, usually hashed in some way.

A proper cache has the following three properties:

  1. The result is always replicable. The value can be scrapped without remorse.
  2. Obtaining the result from the cache is faster than generating it.
  3. The same result will be used more than once.

The first property implies that the cache is never the True Source of Data. A cache that’s the True Source of Data is not a cache, it’s a database; and it needs to be treated as such.

The second one implies that retrieving from cache is useful. If getting the result from the cache is slower (or only marginally better) than from the True Source of Data, the cache can (and should) be removed. A good candidate for caching is a slow I/O operation or a computationally expensive call. When in doubt, measure and compare.
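A minimal sketch of such a comparison (the operation being timed is a hypothetical placeholder):

import time

def timed(func, *args):
    # Return the result of func(*args) and the seconds it took.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def expensive_call(x):
    # Placeholder for the slow operation we are considering caching.
    time.sleep(0.1)
    return x * 2

cache = {1: 2}

_, slow = timed(expensive_call, 1)
_, fast = timed(cache.get, 1)
print('speedup: {:.0f}x'.format(slow / fast))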

The third property simply warns against storing values that will be used only once, so the cached value would never be read again. For example, big parts of online games are uncacheable because the data changes so often that it is read fewer times than it is written.

The simplest cache

The humblest cache is a well-known technique called memoization, which simply stores the result of a call in process memory and serves it from there on subsequent calls with the same parameters. For example:

NUMBER = 100
def leonardo(number):

    if number in (0, 1):
        return 1

    return leonardo(number - 1) + leonardo(number - 2) + 1

for i in range(NUMBER):
    print('leonardo[{}] = {}'.format(i, leonardo(i)))  

This terribly slow code will print the first 100 Leonardo numbers. But each number gets calculated recursively over and over, so by storing the results we can greatly speed it up. The key to store the results under is simply the number.

cache = {}

def leonardo(number):

    if number in (0, 1):
        return 1

    if number not in cache:
        result = leonardo(number - 1) + leonardo(number - 2) + 1
        cache[number] = result

    return cache[number]

for i in range(NUMBER):
    print('leonardo[{}] = {}'.format(i, leonardo(i)))

Normally, though, we’d like to limit the total size of the cache to keep our program from running wild in memory. The following version restricts the cache to only 10 elements, so we’ll need to delete values from the cache to make room for new ones:

def leonardo(number):

    if number in (0, 1):
        return 1

    if number not in cache:
        result = leonardo(number - 1) + leonardo(number - 2) + 1
        cache[number] = result

    ret_value = cache[number]

    while len(cache) > 10:
        # Maximum size allowed, 10 elements
        # this is extremely naive (not a real LRU), but it's just an example.
        # dict keys can't be indexed in Python 3, so take the first one
        # from an iterator instead.
        key = next(iter(cache))
        del cache[key]

    return ret_value

Of course, in this example the cached values never change, which may not always be the case. There’s further discussion about this issue below.

Cache keys

Cache keys deserve a small note. They are not usually complicated, but the key point is that they need to be unique. A non-unique key, which may be produced by improper hashing, will produce cache collisions, returning the wrong data. Be sure that this doesn’t happen.
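As a minimal sketch (this particular hashing scheme is just an illustration, not a standard), a key can be built from the call name and its parameters:

import hashlib

def cache_key(func_name, *args, **kwargs):
    # Build a small, deterministic key from the call name and parameters.
    raw = '{}:{}:{}'.format(func_name, args, sorted(kwargs.items()))
    # A cryptographic hash keeps the key short and makes collisions
    # practically impossible, unlike a truncated or home-made hash.
    return hashlib.sha256(raw.encode('utf-8')).hexdigest()

print(cache_key('leonardo', 42))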

Python support

Just for the sake of being useful: Python 3 includes a decorator in the standard library to cache calls, so the previous code can look like this.

from functools import lru_cache

@lru_cache(maxsize=None)
def leonardo(number):

    if number in (0, 1):
        return 1

    return leonardo(number - 1) + leonardo(number - 2) + 1

so you can use it instead of implementing your own.

The stereotypical web app cache

In the context of web apps, everyone normally thinks of memcached when they think of cache.

Memcached will, in this stereotypical usage, use some allocated memory to cache database results or full HTML pages, identified by an appropriately unique key, speeding up the whole operation. There are a lot of tools integrating it with web frameworks, and it can be clustered, increasing the total amount of memory and the reliability of the system.

In a production environment with more than one server, the cache can be shared among different servers, so the generation of content only happens once in the whole cluster and the result can then be read by every consumer. Just be sure to keep the first property, making it possible to obtain the value from the True Source of Data at any point, from any server.

This is a fantastic setup, and worth using in most services. Memcached can also be replaced by other tools like Redis, but the general operation is similar.

But there are more ways to cache!

Assuming a typical distributed deployment on a production web server, there are a lot of places where a cache can be introduced to speed up things.

The service described here will have one DB (or a cluster) that contains the True Source of Data, several servers with a web server channeling requests to several backend workers, and a load balancer on top of that as the entry point of the service.


Typically, the farther away from the True Source of Data we introduce a cache, the less work we produce for the system and the more efficient the cache is.

Let’s describe possible caches from closest to the True Source of Data to farthest away.

Cache inside the Database

(other than the internal cache of the database itself)

Some values can be stored directly in the database, derived from the True Source of Data in a more manageable form.

A good example of that is periodic reports. If some data is produced during the day, and a report is generated every hour, that report can be stored in the database as well. Subsequent accesses will read the already-compiled report, which should be less expensive than crunching the numbers again.
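A minimal sketch of the idea, using sqlite3 with made-up table and report names (the compilation step is just a placeholder):

import sqlite3

conn = sqlite3.connect('reports.db')
conn.execute('CREATE TABLE IF NOT EXISTS hourly_report (hour TEXT PRIMARY KEY, body TEXT)')

def compile_report(hour):
    # Placeholder for the expensive aggregation over the raw data.
    return 'report for {}'.format(hour)

def get_report(hour):
    row = conn.execute('SELECT body FROM hourly_report WHERE hour = ?', (hour,)).fetchone()
    if row:
        return row[0]  # already compiled, cheap to return
    body = compile_report(hour)
    conn.execute('INSERT INTO hourly_report (hour, body) VALUES (?, ?)', (hour, body))
    conn.commit()
    return body

print(get_report('2016-02-01 10:00'))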


Another useful way of caching values is to use replication. This is supported by many databases, making it possible to read from different nodes at the same time, increasing throughput.

For example, using Master-Slave replication on MySQL, the True Source of Data is on the Master, but that information gets replicated to the slaves, which can be used to increase the read throughput.

Here the third property of cache shows up, as this is only useful if we read the data more often than we write it. Write throughput is not increased.

Cache in the Application Level

The juiciest part of a service is normally at this level, and this is where the most alternatives are available.

From the raw results of database queries, to the completed HTML (or JSON, or any other format) resulting from the request, to any other meaningful intermediate result, this is where the application of caches can be most creative.

Memory caches can be set up either internally per worker, per server, or externally, to hold intermediate values.

  • Cache per worker. This is the fastest option, as the overhead will be minimal, being internal memory of the process serving the requests. But it will be multiplied by the number of workers per box, and it will need to be generated individually for each of them. No extra maintenance needs to be done, though.
  • External cache. An external service, like memcached. This will share the cache among the whole service, but access will be limited by the network. There are extra maintenance costs in setting up the external service (see the get-or-set sketch after this list).
  • Cache per server. An intermediate option. Normally, setting up on each server a cache service like memcached. Local, faster access shared among all workers on the same box, with the small overhead of using a protocol.
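As a rough sketch of the get-or-set pattern used with an external or per-server cache (assuming the pymemcache client and a memcached instance on localhost; the profile rendering is a placeholder):

from pymemcache.client.base import Client

client = Client(('localhost', 11211))  # local or remote memcached instance

def render_profile(user_id):
    # Placeholder for the real call to the True Source of Data.
    return 'profile page for user {}'.format(user_id).encode('utf-8')

def get_profile(user_id):
    key = 'profile:{}'.format(user_id)
    cached = client.get(key)
    if cached is not None:
        return cached
    value = render_profile(user_id)
    client.set(key, value, expire=300)  # keep it for 5 minutes
    return value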

Another possibility worth noting in some cases is to cache on the local hard drive, instead of in RAM. Reading from the local hard drive can be faster than accessing external services, in particular if the external service is very slow (like a connection to an external network) or if the data needs to be heavily processed before being used. Hard drive caches can also be helpful for high volumes of data that won’t fit in memory, or for reducing startup time, if starting a worker requires complex operations that produce a cacheable outcome.

Cache in the Web Server

Widely available web servers like Apache or Nginx have integrated caches. This is typically less flexible than application-level caching, and needs to fit into common patterns, but it’s simple to set up and operate.

There’s also the possibility of returning an empty response with status code 304 Not Modified, indicating that the data hasn’t changed since the last time the client requested it. This can also be triggered from the application layer, as shown in the sketch below.
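A minimal sketch of the application-layer version, assuming Flask and a made-up report endpoint with a hand-rolled ETag:

from flask import Flask, request

app = Flask(__name__)

REPORT_BODY = '<html>...the report...</html>'
REPORT_ETAG = '"report-v42"'  # changes whenever the report is regenerated

@app.route('/report')
def report():
    # If the client already has this version, answer with an empty 304.
    if request.headers.get('If-None-Match') == REPORT_ETAG:
        return '', 304
    return REPORT_BODY, 200, {'ETag': REPORT_ETAG}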

Static data should, as much as possible, be stored as files and returned directly from the web server, as web servers are optimised for that use case. This allows the strategy of storing responses as static files and serving them through the web server. Done offline, this is the strategy behind static website generators like Nikola or Jekyll.

For sites that deal with a huge number of requests that should return the same data, like online newspapers or Wikipedia, a cache server like Varnish can be set up in front of the service, and it may be able to act as a load balancer as well. At this level the data may already be stored compressed with gzip, for maximum performance.

Cache in the Client

Of course, the fastest request is the one that doesn’t happen, so any information that can be stored in the client to avoid making a call at all will greatly speed up an application. To achieve real responsiveness this needs to be taken into account. This is a slightly different issue than caching, but I translated an article a while ago about tips and tricks for improving user experience on web applications here.

The dreaded cache invalidation

The elephant in the room when talking about cache is “cache invalidation”. This can be an extremely difficult problem to solve in distributed environments, depending on the nature of the data.

The basic problem is very easy to describe: “What happens when the cache contains different data than the True Source of Data?”

Sometimes this won’t be a problem. In the first example, the cached Leonardo numbers just can’t be different from the True Source of Data. If the value is cached, it will be the correct value. The same would happen with prime numbers, a calendar for 2016, or last month’s report. If the cached data is static, happy days.

But most of the data that we’d like to cache is not really static. Good data candidates for being cached are values that rarely change. For example, your Facebook friends, or your schedule for today. This is something that will be relatively static, but it can change (a friend can be added, a meeting cancelled). What would happen then?

The most basic approach is to refresh the cache periodically, for example deleting the cached value after a predetermined time. This is very straightforward and normally supported natively by cache tools, which allow storing a value with an expiration time. For example, assuming the user has a cached copy of their friends’ avatars locally available, only ask again every 15 minutes. Sure, there will be up to 15 minutes where a friend’s new avatar won’t be available and the old one will be displayed, but that’s probably not a big deal.
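A minimal sketch of that idea in plain Python (the avatar download is a placeholder):

import time

CACHE_TTL = 15 * 60  # seconds
avatar_cache = {}  # friend_id -> (avatar, time it was cached)

def download_avatar(friend_id):
    # Placeholder for the real call to the True Source of Data.
    return 'avatar-bytes-for-{}'.format(friend_id)

def get_avatar(friend_id):
    cached = avatar_cache.get(friend_id)
    if cached and time.monotonic() - cached[1] < CACHE_TTL:
        return cached[0]
    avatar = download_avatar(friend_id)
    avatar_cache[friend_id] = (avatar, time.monotonic())
    return avatar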

On the other hand, the position on a leaderboard for a competitive video game, or the result of a live World Cup match, is probably much more sensitive to such a delay.

Even worse, we’ve seen that some options involve having more than one cache (cache per server, or per worker; or redundant copies for reliability purposes). If two caches contain different data, the user may be alternating between old and new data, which will be confusing at best and produce inconsistent results at worst.

This is a very real problem in applications working with eventually consistent databases (like the Master-Slave configuration mentioned above). If a single operation involves writing a value and then reading the same value back, the read could return the old value, potentially creating inconsistent results or corrupting the data. Two users modifying the same data in two very close operations could also produce this effect.

Periodically refreshing the cache can also produce bad effects in production environments, like all the refreshes synchronising and happening at the same time. This is typical in systems that refresh data for the day at exactly 00:00. At exactly that time all workers will try to refresh all the data at once, orchestrating a perfectly coordinated distributed attack against the True Source of Data. It is better to avoid perfectly round expiration times and use some randomness instead, or to set expirations relative to the last time the data was requested from the True Source of Data, avoiding synchronised access.
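A minimal sketch of adding jitter to the expiration time (the exact spread is an arbitrary choice):

import random

BASE_TTL = 24 * 60 * 60  # one day, in seconds

def jittered_ttl():
    # Spread expirations over an extra hour so all workers don't
    # hit the True Source of Data at exactly the same moment.
    return BASE_TTL + random.uniform(0, 3600)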

This avalanche effect can also happen when the cache cluster changes (like adding or removing nodes, for example, when one node fails). These operations can invalidate or make unavailable a high number of cached values, producing an avalanche of requests to the True Source of Data. There are techniques to mitigate this, like consistent hash rings, but it can be a nightmare if faced in production.

Manually invalidating the cache when the data changes in the True Source of Data is a valid strategy, but it needs to invalidate the results in all the caches, which is normally only feasible for external cache services. You simply can’t access the internal memory of a worker on a different server. Also, depending on the ratio of invalidations to reads of a cached value, it can be counterproductive, as it will produce an overhead of calls to the cache services. It also normally requires more development work, as it needs a better knowledge of the data flow and of when the value in the cache is no longer valid. Sometimes that’s very subtle and not evident at all.


Caching is an incredibly powerful tool to improve performance in software systems. But it can also be a huge pain due to all those subtle issues.

So, some tips to deal with caches:

  • Understand the data and how it’s consumed by the user. A value that changes more often than it gets read is not a good cache candidate.
  • Ensure the system has a proper cache cycle. At the very least, understand how the cache flows and what the implications of a cache failure are.
  • There are a lot of ways and levels to cache. Use the most adequate ones to make caching efficient.
  • Cache invalidation can be very difficult. Sorry about that.

ffind v0.8 released

Good news everyone!

The new version of ffind (0.8) is available on GitHub and PyPI. This version includes performance improvements, a man page and fuzzy search support.


Future as a developer and the ever changing picture

A few weeks ago I came across a couple of articles by Marco Arment that share the theme of the current state of accelerated change within the development community being a source of stress, and of how difficult it is to stay up to date. After all, one gets tired of learning a new framework or language every six months. It gets to a point where it is not fun or interesting anymore.

It seems like two different options are presented as available for developers after some time:

  • Keep up, meaning that you adopt rapidly each new technology
  • Move to other areas, typically management

Both are totally valid options, though, as I’ve already said on this blog, I don’t like it when good developers move to different areas (to me it’s sort of like a surgeon deciding she’s had enough after a few years and moving on to manage the hospital). Obviously, each person has every right to choose their own career path.

But I think that it’s all mostly based on a biased and incorrect view of the field of technology and the real pace of change.

In recent years there has been an explosion of technologies, in particular for the web. Ruby on Rails almost feels like it was introduced at the same time as COBOL. NodeJS seemed to be in fashion for a while. The same with MongoDB or jQuery.

We all know that being stressed is not a great way to learn

In the last 6 or 7 years there has been an incredible explosion in terms of open source fragmentation. Probably because of GitHub (and other online repos) and the increase in communication through the Internet, the bar to create a web framework and offer it to the world has been lowered so much that a lot of projects that would previously not have been exposed have gotten more exposure. The general effect is positive, but it comes with the negative side that every year there is a revolution in terms of technologies, which forces everyone to catch up and learn the brand new tool that is supposedly the best for current development, increasing the churn of buzzwords.

But all this is mostly an illusion. We developers tend to laugh at the common “minimum 3+ years of experience in Swift”, but we still get the notion that we should be experts in a particular language, DB or framework from day one. And of course in the one in demand today, or we are just outdated dinosaurs that should retire.

Software development is a young field, full of young people. That’s great in a lot of aspects, but we need to appreciate experience, even if it comes from using a different technology. It may not look like it, but there are still a lot of projects done in “not-so-fancy” technologies. That includes really old stuff like Fortran or COBOL, but also C++, Java, Perl, PHP or Ruby.

Technologies get established by a combination of features, maturity, community and a little luck. But once they are established, they’re quite resilient and don’t go away easily. They are useful for quite a long time. Right now it’s not that difficult to pick a tool that is almost guaranteed to be around for the next 10-15 years. Also, most of the really important stuff is totally technology agnostic: things like writing clean code, structure, debuggability, communication, teamwork, transforming abstract ideas into concrete implementations, etc. That simply does not go away.

Think about this: iOS development started in 2008. Smartphones are radically different beasts than the ones available 6 years ago, probably the environment that has changed the most. The basics are the same, though. And even if Swift has been introduced this year, it’s based on the same principles. Every year there have been tweaks, changing APIs, new functionality. But the basic ideas are still the same. Today a new web development using LAMP is totally viable. Video games still rely on C++ and OpenGL. Java is still heavily used. I use ideas mainly developed in the 70s, like the UNIX command line or Vim, all the time.

Just because every day we get tons of news about new startups setting up applications on new paradigms, that doesn’t mean that they don’t coexist with “older” technologies.

Of course, there are new tricks to learn, but it’s a day-by-day additive effort. Real revolutions and paradigm changes are rare, and normally not a good sign. Changing from MySQL to PostgreSQL shouldn’t be considered a major career change. Seeking a certain stability in the tools you use should be seen as a good move.

We developers love to stress the part about learning something new every day and constantly challenging ourselves, but that should also be put in perspective by allowing time to breathe. We’ve created a lot of pressure on ourselves in terms of having to be constantly pushing new ideas, investigating side projects and devoting 100% of our time to software. That’s not only unrealistic. It’s not good.

You only have to breathe. And just worry about doing good work and enjoying learning.

Requests per second. A server load reference

Just as there seem to be a lot of misconceptions about what Big Data is, there is also not really a good baseline to know “how much is high load”, especially from the point of view of people without that much experience with servers. If you have some experience dealing with servers, you will probably know all this. So, just for the sake of convenience, I am going to do some back-of-the-envelope calculations to try to set a few numbers and explain how to calculate how many requests per second a server can handle.

We are going to use RPS (requests per second) as the metric. This measures the throughput, which is typically the most important measure. There are other parameters that can be interesting (latency) depending on the application, but in a typical application, throughput is the main metric.

Those requests can be pure HTTP requests (getting a URL from a web server), or they can be other kinds of server requests: database queries, fetching mail, bank transactions, etc. The principles are the same.

I/O bound or CPU bound

There are two types of requests: I/O bound and CPU bound.

Everything you do or learn will be imprinted on this disc. This will limit the number of requests you can keep open

Typically, requests are limited by I/O. That means the request fetches info from a database, or reads a file, or gets info from the network. The CPU is doing nothing most of the time. Thanks to the wonders of the Operating System, you can create multiple workers that keep handling requests while other workers wait. In this case, the server is limited by the number of workers it has running. That means RAM. More memory, more workers.[1]

In memory-bound systems, the number of RPS comes from the following calculation:

RPS = (memory / worker memory) * (1 / task time)

For example:

Total RAM    Worker memory    Task time    RPS
16Gb         40Mb             100ms        4,000
16Gb         40Mb             50ms         8,000
16Gb         400Mb            100ms        400
16Gb         400Mb            50ms         800
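A quick sketch of the same calculation (treating 16Gb as 16,000Mb, as the table does):

def rps_memory_bound(total_ram_mb, worker_mb, task_time_s):
    # Ideal throughput for an I/O bound service limited by RAM.
    workers = total_ram_mb / worker_mb      # how many workers fit in memory
    return workers * (1 / task_time_s)      # each worker serves 1/task_time requests per second

print(rps_memory_bound(16000, 40, 0.100))   # -> 4000.0
print(rps_memory_bound(16000, 400, 0.050))  # -> 800.0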

Crunch those requests!
Crunch those requests!

Some other requests, like image processing or doing calculations, are CPU bound. That means that the limiting factor is the amount of CPU power the machine has. Having a lot of workers does not help, as only one can work at a time per core. Two cores means two workers can run at the same time. The limit here is CPU power and number of cores. More cores, more workers.

In CPU-bound systems, the number of RPS comes from the following calculation:

RPS = num. cores * (1 / task time)

For example:

Num. cores    Task time    RPS
4             10ms         400
4             100ms        40
16            10ms         1,600
16            100ms        160
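And the equivalent sketch for the CPU bound case:

def rps_cpu_bound(cores, task_time_s):
    # Ideal throughput when only one request per core can run at a time.
    return cores * (1 / task_time_s)

print(rps_cpu_bound(4, 0.010))   # -> 400.0
print(rps_cpu_bound(16, 0.100))  # -> 160.0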

Of course, those are ideal numbers. Servers need time and memory to run other processes, not only workers. And, of course, there can be errors. But they are good numbers to check and keep in mind.

Calculating the load of a system

If we don’t know the load a system is going to face, we’ll have to make an educated guess. The most important number is the sustained peak. That means the maximum number of requests that are going to arrive at any second during a sustained period of time. That’s the breaking point of the server.

That can depend a lot on the service, but typically services follow a pattern with ups and downs. During the night the load decreases, and during the day it increases up to a certain point, stays there, and then goes down again. Assuming we have no idea what the load is going to look like, just assume that all the expected requests in a day are going to arrive within 4 hours. Unless the load is very, very spiky, it’ll probably be a safe bet.

For example, 1 million requests a day means around 70 RPS. 100 million requests a day mean around 7,000 RPS. A regular server can process a lot of requests during a whole day.

That’s assuming that the load can be estimated as a number of requests. Other times it’s better to estimate the number of requests a user will generate, and then start from the number of users. E.g. a user will make 5 requests in a session. With 1 million users in 4 hours, that means around 350 RPS at peak. If the same users make 50 requests per session, that’s around 3,500 RPS at peak.
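The same back-of-the-envelope numbers, as a quick sketch:

PEAK_WINDOW = 4 * 60 * 60  # assume the whole day's traffic arrives in 4 hours

def peak_rps(users, requests_per_user):
    return users * requests_per_user / PEAK_WINDOW

print(round(peak_rps(1000000, 5)))   # -> 347, around 350 RPS at peak
print(round(peak_rps(1000000, 50)))  # -> 3472, around 3,500 RPS at peak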


A typical load for a server

These two numbers should only be used as a reference, but, in my experience, I’ve found them good numbers to have in my head. This is just to get an idea, and everything should be measured. But, just as a rule of thumb:

1,000 RPS is not difficult to achieve on a normal server for a regular service.

2,000 RPS is a decent amount of load for a normal server for a regular service.

More than 2K either needs big servers, lightweight services, non-obvious optimisations, etc. (or it means you’re awesome!). Less than 1K seems low these days for a server doing typical work (meaning requests that are simple and don’t do a lot of work).

Again, these are just my personal “measures”, which depend on a lot of factors, but they are useful to keep in mind when checking whether there’s a problem or the servers can be pushed a little more.

1 – Small detail: async systems work a little differently than this, so they can be faster in a purely I/O bound system. That’s one of the reasons why new async frameworks seem to be getting traction. They are really good for I/O bound operations. And most of the operations these days are I/O bound.

You can’t make One Perfect Final Decision

How to Make Technology Choices

Truly awesome post by Steven Lott.

The expectation of finality is the most disturbing: the expectation that someone can make OnePerfectFinalDecision.
No technology choice is ever final. Today’s greatest ever state-of-the-art, kick-ass-and-take-names SDK may evaporate in a cloud of lawsuits tomorrow. Today’s tech giant may collapse. Today’s platform of choice may be tomorrow’s weird anachronism.
If you are really, really lucky, you may get big enough to have scalability issues. Having a scalability issue is something we all dream about. Until you actually have a specific scalability issue, don’t try to “pre-solve” a potential problem you don’t yet have. If your software is even moderately well designed, adding architectural layers to increase parallelism is not as painful as supporting obscure edge cases in user stories.
When you’re still circulating your ideas prior to writing a demo, all technology choices are equally good. And equally bad. It’s more important to get started than it is to make some impossibly PerfectFinalDecision. Hence the advice to build early and build often.

Making a tech choice and migrating after you know more is not necessarily a problem. It is at least unavoidable, and probably even good for design.

Vim as IDE. Are you getting the wrong parts?

There is a lot of discussion about how to make Vim “an IDE”. Vim is a great text editor, but when we are developing, there are lots of extra tools that are very useful. Code completion. Easy navigation through files and classes. Source control integration. Syntax checking. Navigation that understands the semantics. An integrated debugger.

My problem with IDEs (and I have used a few over the years) is that they give you a lot of support, but at the cost of increased complexity. They are slow for a lot of common operations and they use a lot of resources. Code completion, the good kind that allows you to get the types of a method, not just finish your words (which most of the time is trivial), is useful when developing in a static language, but since I program in Python it’s something I can’t find a good use for. I just don’t use all that stuff, so I don’t feel I can justify paying the price.

Another problem with IDEs is that they tend to be designed, by default, for the newbie. That’s not necessarily a bad thing; Vim is a pretty intimidating tool because it is totally crazy for the rookie. But it generates a bloated environment to start with. If you open Eclipse, a popular IDE (and one I’ve used for some years), you’ll get a relatively small frame for the code, and a lot of extra stuff surrounding it: logs, file navigation, class definitions, a huge toolbar, maybe even a TO DO list…

This is a lot of stuff!

For example, think about file navigation. Sure, it’s useful for moving around. But it is used only at certain points in time. Most of the time it’s just used as an entry point to the code, and then navigation can be achieved either by moving around in the same file, by following a reference from the code (like going to a method definition), or by searching the whole project. In case you need to go to a specific file, you can then show the navigation window, or even better, search by filename. That only happens during small periods of time, so the rest of the time the window is just wasted space on screen. The same thing happens with a task list. It is useful for knowing the next step. But not while you’re working on one task.

Hey, it is possible to hide the navigation window and display it only at the proper moment, to save space. I have done that. But it’s not like that by default, so most of the people I know just keep it open “just in case, giving context”. They just get used to it and don’t perceive it as a problem. But having half of your screen full of information that is irrelevant 95% of the time is a huge price to pay. And certainly not a good use of an IDE. The good parts of an IDE are things like automatic compilation and deployment, refactoring tools (not just renaming), debugging, help with types in static languages, automatic generation of code, etc. Not showing everything, all the time.

Mimicking the wrong parts of an IDE. You can do better.

Vim is a text editor, but it is also sort of a philosophy. It is not about showing stuff, but about having stuff available at the right moment, with a quick command. It is not just about using hjkl to move around and having an insert mode. It’s about being precise. It is difficult at first, because you can’t simply discover what the available options are, but it also pays off in terms of focus and a clean environment. When programming, the most important part is to keep as few things in mind as possible: all the relevant information, which is already enough, but nothing more that can distract you from the task. It is also about doing one thing, not a hundred. I use other tools for things like source control and debugging. After all, we have the whole OS to work as our IDE.

I use a small number of plugins in Vim. When you learn a little about it, you find out that the number of features and things that can be achieved “out of the box” is amazing, and how little extra is actually needed to have a really productive environment. It’s just that we need to move away from the usual “browse through menus” world that most software uses, and devote some time to, well, studying and learning. It’s really worth it.

Notifications and emails

Yet another vintage representation of Email

We all know that email, being a technology created a long time ago and developed organically into some sort of lingua franca of Internet identity and communication, has a series of problems. No easy ones. Managing email is a problem of its own, and there are lots of articles about it on the Internet.

One of the most annoying is notifications. We all receive too many emails that are only reminders of something relatively interesting in a different app. That could be a new comment on a blog post, an update on LinkedIn, or even a new post on a forum (yep, that used to be a huge thing). Gmail’s recent move to group together all notification emails is a great example of how inefficient this system is. It is difficult to find the balance between keeping a user informed and not sending spam.

To increase the annoyance, notifications will typically be produced in bursts. There is some discussion on a blog, with 4 or 5 messages in an hour, then it stops for several hours, and then someone else posts another comment, producing another couple of comments.

My impression is that any serious app that produces a significant number of notifications (not even very high, something like twice a week or more) and wants to show some respect to its users should move to a notification system. Hey, Facebook has done it. Remember when Facebook used to send tons of mail every day with new likes, friends and posts? They changed that to a notification system on their page. That means you can always close Facebook, and when coming back, you can easily catch up on everything since last time.

But, of course, Facebook is a special case, because most people keep it open or at least check it regularly. Most other apps that are not used that frequently need to use email, or no one will check them.

So that’s the deal. Send only one email. One saying “You have new stuff on app X. Go to this link to check your new notifications. No new email will be sent until you visit our page.” And maybe send another reminder after a week (which can be disabled). This way, if I don’t want to go to the page immediately, no more spammy notifications are received. If I’m interested in the app, I’ll check every time I get that email, but the email is not spam. It allows a very natural flow. And it also shows respect for your users.

PS: Yes, I know this is inspired by the way phpBB works, but with a more high-level approach. Not sure why that way of doing things is not more common.