Posted on October 21, 2013 by Jaime Buelta

Requests per second. A server load reference

Just as there are many misconceptions about what Big Data is, there is also no good baseline for knowing “how much is high load”, especially for people without much experience with servers. If you have some experience dealing with servers, you will probably know all this. So, just for the sake of convenience, I am going to do some back-of-the-envelope calculations to set a few numbers and explain how to calculate how many requests per second a server can handle.

We are going to use RPS (requests per second) as the metric. This measures the throughput, which is typically the most important measure. There are other parameters that can be interesting (latency) depending on the application, but in a typical application, throughput is the main metric.

Those requests can be pure HTTP requests (getting a URL from a web server), or other kinds of server requests: database queries, fetching mail, bank transactions, etc. The principles are the same.

I/O bound or CPU bound

There are two types of requests: I/O bound and CPU bound.

Everything you do or learn will be imprinted on this disc. This will limit the number of requests you can keep open

Typically, requests are limited by I/O. That means that the request fetches info from a database, reads a file, or gets info over the network. The CPU is doing nothing most of the time. Thanks to the wonders of the operating system, you can create multiple workers that keep serving requests while other workers wait. In this case, the server is limited by the number of workers it has running, and that means RAM. More memory, more workers.[1]

In memory bound systems, the number of RPS comes from the following calculation:

RPS = (memory / worker memory) * (1 / Task time)

For example:

Total RAM	Worker memory	Task time	RPS
16 GB	40 MB	100 ms	4,000
16 GB	40 MB	50 ms	8,000
16 GB	400 MB	100 ms	400
16 GB	400 MB	50 ms	800
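As a quick check, the formula above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function name `memory_bound_rps` and its parameters are made up for illustration, and 16 GB is taken as 16,000 MB to match the round numbers in the table.

```python
def memory_bound_rps(total_ram_mb: float, worker_mb: float, task_time_ms: float) -> float:
    """Ideal RPS when the number of workers is limited by available RAM."""
    # Only whole workers fit in memory.
    workers = total_ram_mb // worker_mb
    # Each worker completes (1000 / task_time_ms) requests per second.
    return workers * 1000 / task_time_ms


# Rows of the table, taking 16 GB as 16,000 MB (an assumption for round numbers).
for worker_mb, task_ms in [(40, 100), (40, 50), (400, 100), (400, 50)]:
    rps = memory_bound_rps(16_000, worker_mb, task_ms)
    print(f"{worker_mb} MB workers, {task_ms} ms tasks: {rps:,.0f} RPS")
```

In a real server not all RAM is available to workers, so the first argument should be whatever is left after the OS and other processes take their share.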

Crunch those requests!

Some other requests, like image processing or doing calculations, are CPU bound. That means that the limiting factor is the amount of CPU power the machine has. Having a lot of workers does not help, as only one can run at a time per core. Two cores means two workers can run at the same time. The limit here is CPU power and number of cores. More cores, more workers.

In CPU bound systems, the number of RPS comes from the following calculation:

RPS = Num. cores * (1 / Task time)

For example:

Num. cores	Task time	RPS
4	10ms	400
4	100ms	40
16	10ms	1,600
16	100ms	160
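The CPU bound formula above is just as easy to sketch. Again, `cpu_bound_rps` is a hypothetical helper name, and it assumes one worker per core with no other overhead.

```python
def cpu_bound_rps(cores: int, task_time_ms: float) -> float:
    """Ideal RPS when each core can run only one task at a time."""
    # Each core completes (1000 / task_time_ms) requests per second.
    return cores * 1000 / task_time_ms


# Rows of the table above.
for cores, task_ms in [(4, 10), (4, 100), (16, 10), (16, 100)]:
    print(f"{cores} cores, {task_ms} ms tasks: {cpu_bound_rps(cores, task_ms):,.0f} RPS")
```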

Of course, those are ideal numbers. Servers need time and memory to run other processes, not only workers. And, of course, there can be errors. But these are good numbers to check and keep in mind.

Calculating the load of a system

If we don’t know the load a system is going to face, we’ll have to make an educated guess. The most important number is the sustained peak. That means the maximum number of requests that are going to arrive at any second during a sustained period of time. That’s the breaking point of the server.

That can depend a lot on the service, but typically services follow a pattern with ups and downs. During the night the load decreases, and during the day it increases up to a certain point, stays there, and then goes down again. If we have no idea what the load is going to look like, just assume that all the expected requests in a day are going to arrive within 4 hours. Unless the load is very spiky, it’ll probably be a safe bet.

For example, 1 million requests per day means about 70 RPS. 100 million requests per day means about 7,000 RPS. A regular server can process a lot of requests during a whole day.

That’s assuming that the load can be calculated in number of requests. Other times it is better to estimate the number of requests a user will generate, and then work from the number of users. E.g. a user will make 5 requests in a session. With 1 million users in 4 hours, that means around 350 RPS at peak. If the same users make 50 requests per session, that’s 3,500 RPS at peak.
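The "everything arrives in 4 hours" rule of thumb above can be sketched like this. The function names are made up for illustration, and the 4 hour window is just the default assumption from the text, so change it if you know your traffic pattern better.

```python
def peak_rps(requests_per_day: float, peak_window_hours: float = 4) -> float:
    """Sustained peak RPS, assuming all daily requests arrive within the peak window."""
    return requests_per_day / (peak_window_hours * 3600)


def peak_rps_from_users(users: float, requests_per_user: float,
                        peak_window_hours: float = 4) -> float:
    """Same estimate, starting from users and requests per session."""
    return peak_rps(users * requests_per_user, peak_window_hours)


print(f"1M requests/day:   {peak_rps(1_000_000):,.0f} RPS")
print(f"100M requests/day: {peak_rps(100_000_000):,.0f} RPS")
print(f"1M users x 5 req:  {peak_rps_from_users(1_000_000, 5):,.0f} RPS")
print(f"1M users x 50 req: {peak_rps_from_users(1_000_000, 50):,.0f} RPS")
```

The exact results are a bit below the rounded figures in the text (for example, roughly 69 rather than 70), which is fine for an estimate this rough.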

HULK CAN HOLD ANY LOAD!

A typical load for a server

These two numbers should only be used as a reference but, in my experience, they are good numbers to keep in my head. This is just to get an idea, and everything should be measured. Take them just as a rule of thumb.

1,000 RPS is not difficult to achieve on a normal server for a regular service.

2,000 RPS is a decent amount of load for a normal server for a regular service.

More than 2K needs either big servers, lightweight services, non-obvious optimisations, etc. (or it means you’re awesome!). Less than 1K seems low these days for a server doing typical work (meaning requests that are simple and don’t do a lot of work).

Again, these are just my personal “measures”, which depend on a lot of factors, but they are useful to keep in mind when checking whether there’s a problem or whether the servers can be pushed a little more.

—

1 – Small detail: async systems work a little differently, so they can be faster in a purely I/O bound system. That’s one of the reasons why new async frameworks seem to be gaining traction. They are really good for I/O bound operations, and most operations these days are I/O bound.
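To get a feel for the async case, here is a minimal sketch using Python’s standard `asyncio` module. The I/O wait is simulated with `asyncio.sleep`, and the function names are made up for illustration. A real service would be waiting on a database or the network instead, so treat the timing as indicative only.

```python
import asyncio
import time


async def fake_io_request(delay_s: float) -> None:
    # Stands in for a database or network wait; the CPU is idle meanwhile.
    await asyncio.sleep(delay_s)


async def handle_many(n: int, delay_s: float) -> float:
    """Run n simulated I/O bound requests concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_io_request(delay_s) for _ in range(n)))
    return time.perf_counter() - start


elapsed = asyncio.run(handle_many(1000, 0.1))
print(f"1000 simulated requests of 100 ms each took {elapsed:.2f}s")
```

All the waits overlap inside a single process, so the total time stays close to one task time instead of growing with the number of requests, and no per-worker memory is paid for each request in flight.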


11 Comments on “Requests per second. A server load reference”

Ahmad
September 2, 2014

I am a beginner and confused about the following. It would be great if you could reply.

What is meant by “worker memory”?
Does this mean memory being used for a request?

Thanks

- Jaime Buelta
  September 2, 2014
  
  Typically, a web server will create X “workers” and distribute the requests among them. These workers are just copies of the server application (the code specific to the application, your code) that are already started.
  With only one worker, the server will be able to attend to only one request at a time, waiting until the first one is finished. With more, the app server (Apache, nginx, etc.) will direct new requests to other workers to spread the load.
  
  As the worker is started and normally not killed after one request (this is wasteful because it adds all the start up time to each request), it keeps a memory footprint. This is related to the memory used on each request, but not the same thing. There could be memory leaks over time, or maybe the memory used is the memory used by the worst kind of request in the system. (A common approach to avoid memory leaks is to restart each worker after X requests)
  
Shelley
October 11, 2014

I am very interested in this topic. Here you mention “2,000 RPS is a decent amount of load for a normal server for a regular service.” Could you please help me with the following question?

What is the decent amount of load for a laptop or a desktop? a volume server? a mid-range server?

Thanks.

- Jaime Buelta
  October 12, 2014
  
  Well, I have to say that those are very, very rough numbers. It’s just a round number that may be used as a very general reference, a ballpark estimate, but it shouldn’t be taken too seriously.
  
  I’m also talking about “simple requests”. If your request takes ages to complete because it does a lot of stuff (a complicated DB query, for example), that has to be taken into account.
  
  It is going to depend on memory, CPU, how much work each request does, etc., and on what the bottleneck is (DB? CPU? Memory? An external service? Reading from disk?)
  Normally a laptop or desktop is going to have less available memory (you’ll have other stuff running, most likely) and fewer cores than a server. That’s a huge difference. Most likely, the number of workers will be much lower.
  
  As for the difference between servers, it depends. If your task needs a lot of CPU, a beefy server with more cores and more raw power will make the difference. If the task is more dependent on the DB, it could depend on the memory or disk speed (obviously of the DB machine, which may or may not be the same server).
  
  I know, I’m not really answering the question, but I’m afraid that the only possible answer is “it depends”. The numbers I gave above are just a way of keeping something in my mind, so I get a very rough idea. But they need to be analysed critically.
  
paul
April 10, 2015

Thanks for posting this. I really found it very insightful. Your examples were especially useful and made things easier to understand how RPS and load.

paul
April 10, 2015

… how RPS and load relate.

D
April 11, 2017

Great post, Jaime! I was curious: how many requests does a user generate? You gave us the RPS per 1M requests, which is helpful, and now I’m having trouble finding how many requests an average user would generate…

Do you have any data about this? I’m trying to figure out what kind of load I will get on a server if, say, 1 million people responded to a marketing campaign within a 4 hour period.

Thanks again for the great article.

- Jaime Buelta
  April 24, 2017
  
  Knowing how many requests a user will generate depends greatly on the application. It can be as simple as one (a marketing campaign that displays some info, only a single call) or a few per minute in a session (e.g. the user registers and needs to fill out a form with 3 pages). I’d say that if your user needs to call the backend more than a couple of times per minute for a sustained period of time, it’s probable that you can reduce it (aka you’re likely doing it wrong), but it really depends on the case.
  
  In your particular case, you can do a test run responding to the campaign and see how many requests you get from a single user. Then you can do the math. E.g. let’s say that each user that responds to your campaign will perform 5 actions in 5 minutes (registering on your site). With the responses spread over the 4 hours (240 minutes), you can then expect an increase in load of:
  
  users responding * 5 req / 240 min ≈ 21 req/min per 1K users
  
  Unless you have millions of users responding, it’s probably an increase in load that won’t be too difficult to handle. But your particular case may demand way more load, maybe the session will be more intense, a lot of analytics produce more requests, or the users will all connect at the start of the period creating a huge spike… It’s all about checking your specific application and running the possibilities.
  
JBorhani
August 25, 2017

Thank you for the post. You said “async systems work a little different than this”. Does it mean there is another parameter in the calculation of RPS? How should I measure worker memory (is there any special tool)?

Kapil Sharma
January 22, 2018

Thanks for the post, buddy!! I had this confusion for a long time, but it’s clear now. Keep posting such topics!!!
