I’ve been playing recently an online game that has recently launched, that uses the following idea.
When a user starts a match, it spawns a process in the server that acts as the opponent, generating the actions against the user.
The game had a rough launch, with a lot of problems due it being played by a lot of people. And, IMHO, a lot of the problems can be traced to that idea.
I see it’s a seductive one. If a user generates an interaction with the service that takes time (for example, a match for this game), spawn a process/thread in the server that generates the responses in “real time“. The user then will be notified through polling or pushing the information, and can react to it. The process will receive the new information from the user and adjust the responses.
I know is seductive because I had it once, and I was very lucky to have someone around with more experience that show me how it will break under pressure. It’s not a sane architecture to scale.
Some bad ideas:
No limit on processes, meaning the servers can be overflown by context switching. Once you have several thousand processes running on a server, you are in a bad place.
The very definition of state on the server. You need to keep track of processes started on different servers (so no two servers perform the same job). High Availability is impossible, as losing one server will mean destroying the state on all those processes. For scalability, always look at stateless servers: read all the data, store the resulting data.
Start up times. Each time a process starts, there’s some time to boot. This can be a problem if processes are always being started and stopped, adding overhead to the system. Even starting a thread is not free (and will require probably starting internal work like connect to the DB, read from cache, etc)
Connections explosion. If each process needs to connect to other parts of the infrastructure (DB, logging, cache, etc) you can have a problem in number of connections.
Process monitoring. What if a process gets stuck? A request can be cancelled easily by a web server (if a request takes more than X, kill it), but an individual process or thread can be more complicated and require specific tooling.
Alternative: Pool of workers
Generate a defined number of processes that can perform the individual actions that generates a match. Each process will get an action from a queue, execute it, and store the resulting state. Any process can produce an action for any user.
For example, if a match is a set of 20 actions, each one happening every minute, the start match request will introduce 20 actions in a queue, to be extracted at the proper time, introducing the proper delay on each action. Note that the queue needs to have a way of deliver delayed messages, not every messaging queue can (in particular, RabbitMQ doesn’t have a good support). Beanstalkd or Amazon SQS supports it.
Or, alternatively, a single action, that will end inserting the next step in the queue with the adequate delay. The action can be as simple as checking if it should change something and, if not, end.
The processes will be extracting the next action from the queue, and executing them. Note that here you minimise the time the worker is waiting for a new task to do. Each worker is active as much as possible, while any user has a pending task, ready to be executed.
The number of processes are limited, so you won’t have an explosion. You can test the system and have a good idea on the limit, when your throughput is not good enough to execute the actions within a reasonable delay, so you can stop the users from starting a new match. This is a better fallback option than allowing everyone to start one and then not giving a good experience.
A priority queue can be put in place, in that case, to inform the user: “You will be able to start your match in ~3 minutes“
Or you can add more processes/servers to increase the throughput in a predictable manner.
Alternative: Whole match pregeneration
Another alternative is actually generating a set of actions and returning them in the first go, and display them at the proper times in the client side. If any adjustment is required due the actions of the user, redo all the results from that time on.
For example, a match starts, and returns the 20 server actions to the client, which shows them to the user one each minute. In the 3rd minute, the user performs an action, which makes the server to recalculate the remainder of the match and return another 17 actions. This is a good strategy if generating actions in advance is possible and few interactions from the user are expected.
The bottom line
The main word here is stateless. It is a basic component of an scalable system, and it’s always worth it to keep in mind when designing a system to be used to more than a couple of users.
It seemed like a good idea at the time. How tech decisions done at some point in time can have a big impact much much later. Unfortunately, this is unavoidable, developing software is based in dealing with imperfect information all the time.
“Forty years later Weinberg’s “egoless” approach, in which mistakes are accepted as inevitable and reviews are performed in a collegial way, remains the sanest way to produce code” “Rock Star” Programmers
A set of articles talking about management skills. Is a great compilation of the different aspect of management. Great read even if you’re not “an official manager”, as is capital to understand what are the things that a (good) manager should do and challenges that the role presents
Some ideas about productivity. I like this quote a lot: “There is no one secret to becoming more productive, but there are hundreds of tactics you can use to get more done”
We tend to idealise the work involved in some stuff we really like. In particular in creating video games. It’s okay not to follow your dreams, but I think it could apply to a lot of other aspects
Dragon’s Lair was a mind-blowing game. Arcade story. The Banner Saga proves that 2D animation is totally ageless and can be a fantastic look for a new game.
Lessons from Doom. An game design analysis on the game, including the differences between Doom and modern FPS.
I think that one of the most overlooked components on any sane company culture is Respect. That’s probably true also for any relationship, also outside work environment, but I think is usually forgotten when nice places to work are described.
When I look back about the things that bothered me the most, most of them are related to disrespect, even in relatively minor form. It can be personal disrespect or not respecting the work itself or even the customers. Probably because is something engraved, it’s easy to take for granted when it exists, and to identify more problems deviating from the lack of it when is not present. We typically talk about how great cultures are innovative, open, communicative, fun, collaborative, etc. but one of the prerequisites that makes these values worthwhile is Respect, both to your coworkers and to the work itself.
Without Respect, ideas are accepted mostly depending on who present them, and need to be imposed. Even when there are explicit request for ideas, they take the shape of “suggestion boxes” where no one really looks into them. So, in practice, being proactive is discouraged unless you’re in a power position.
When there is Respect, ideas can be freely exchanged without fear of not being talking seriously. They are also welcomed from any source, not only through the “chosen channels”. There can be hard scrutiny, but it will be fair, and rejections will be reasonably based in facts.
Without Respect, a “funny, relaxed atmosphere” can be easily transformed into harassment and abuse. Jokes will actually hurt. Closed groups, extremely aggressive with everyone external with them, will be formed. That can include groups outside the company, like mocking customers or partners. Some groups will be appointed as intrinsically “better” (engineers, executives…) as others (secretaries, workers…) and generate asymmetrical relationships, with one part dominating the other.
When there is Respect, jokes are played just for the laugh, and are taken up to the correct limit for everyone, as there are people with thicker skin than others. If those limits happen to be crossed, the problem will be arisen and people will sincerely apologise and correct their behaviours in the future, without external influence. Occasionally the customers or partners can be make fun of, but the quality of the delivered software will be took extremely seriously (the highest form of Respect for customers) and their requests or suggestions will be taken into account when making new features.
Without Respect within the company and the different groups, no particular measures will be enforced to protect anyone or anything. Therefore, it will be easy for someone to take advantage of that, ranging from lower the quality of the work to be a moron and degrade the working environment. Code will devolve into an unreadable mess, and technical debt will grow uncontrollably. Hiring standards will get lower, and not-that-great people will be part of the team (technically, but also in a more personal sense). Also, the expectations will be to work overtime regularly, without any contingency plans or treating it as a bad sign.
When there is Respect, the organisation truly cares about the people, and not just as an empty statement. This includes understanding when overtime is unavoidable evil and work as a team to avoid it as much as possible. And when it happens, everyone do as much as they can to make it as short and enjoyable as possible. There will be understanding when someone wants to leave because they have a genuine different interest, leaving the door open if things don’t turn out for the good. Learning and personal growth will be encouraged with actions, not only with words.
Trust, a extremely important value, can only arise if there is Respect. Without Respect, fear and uncertainty will replace real trust. Being honest needs trust and confidence in the other part, as real honesty can be, and sometimes should be, uncomfortable to hear. Formality and defensiveness take control over honest feedback and team work when respect is not present. Any long-term relationship also needs Respect to stay healthy.
Being imperfect human beings, we cannot probably achieve perfect respectful relationships at all times. But we should try to be as respectful as possible, identifying our mistakes and the ones of the organisation, and move up towards the Respect ladder. That makes a much healthier (and happier) environment for all. We should recognise the Real Respect, as the word is often abused.
It is great to aim for having a great organisation or startup, with a thrilling culture. But, in order to get to establish a funny, exciting, learning, diverse and passionate place to work, we should lay strong foundations with Respect. Identify it, and not tolerate the lack of it.
There is a lot of Agile talking and I think it has reached a point where it is, if not standard, at least a common way of doing software. But, even if there is a lot talking about Agile methodologies, and companies telling that the are doing Agile, are they really doing it? I’m not so sure. When relating to Agile, I always come back to the source, which is the Agile manifesto. I really like its simplicity. Let me copy it here
We are uncovering better ways of developing software by doing it and helping others do it.Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
The first time I read it, I must say I wasn’t impressed. Yeah, sure… Great values, dude. But after spending more time, I come to see that as a really good set of values that, in mi opinion, work really well for software development. In practice, I think there are some of those values that are somehow “forgotten” over the day to day operations. I’ve been thinking about what are the ideas that, again in my opinion, are the ones I consider the best way of implementing those values. Consider them my personal “Agile highlighted parts”
In my opinion, the single word that is capital in software development is “LEARN”. The same thing applies to Agile. Agile is about being constantly learning. Change (which is also a very important word) is just a consequence of this learning process, the outcome. Because we learn how to do things better, we change our way to work. Because we truly understand the problems of the customer, we can develop what the customer needs (which may not be what the customer has in mind in the first place). This learning process should not view restricted to the developers. It is also applicable to the rest of the people involved, including the customer. If the customer is not willing to learn in the process and refuses to accept feedback, then the process is much more difficult. That could be the single most important risk in Agile, as that will make the collaboration difficult and adds a lot of friction. Even in products aimed for mass consumption, this process can be present one way or another. For example, recently there has been a lot of discussion about iOS 7, and how consumers have learned how to use a touch screen and elements that were present on previous versions to help usage are no longer needed as average consumer know now what it is about. Refusal of learning can also be a problem in Agile. There are people that do not like the constant effort and the change in mindset that it implies.
Team centric view.
The team in Agile is king. When I say “a team” I am not referring to a list of people put together on the same building working on the same product. A team is much more than that. It is people working towards the same goal, but also effectively working close together and helping each other. Just as a soccer team needs players with different abilities, so does software teams. The people working on the same product will have to be, with high probability, multidisciplinary. That has not been usually the case on software companies, where you have the testing department in one side, the design department in another, and the R&D department detached from the rest of teams, communicating through big documents and formal meetings. That creates the need of strict interfaces, and negotiation between the parts to agree how to exchange information.
Instead, a team will try to learn and improve how to work and exchange information. Asking constantly, “how can I make the work of X easier?”. To be able to do that, you have to know and understand what are the problems that X will face, and the only way of knowing that is to work closely with them. That is more difficult that it looks. We developers are usually very “machine centric”, tending to try to fix everything with a script, or adding a new feature that complicates the system. It is not simple to learn about how and why other “non-techies” people (sales, designers, etc) are doing the things they are doing the way they are doing them. We prefer to keep running out scripts and our UNIX commands. We talk our techie talk of threads, classes, recursion and functional programming. But the learning of the”domain knowledge” is really what makes the difference between a good team and a great one. And that knowledge can only be achieved constantly learning from one another. Both the developers understanding the real problems of the customer, but also, in some cases, the customer understanding how the team works and what is possible on a certain amount of time. Creating a good team is very important and challenging. But also accepting that it is formed by different individuals, with different capabilities, strong and weak points, is probably one of the most difficult parts in any organisation. I always think that the most challenging task is to deal with people, which, no matter how “good cultural fit” is achieved, will be individuals different from any other one. Acknowledging it and being able to make everyone on the same page is difficult, but capital for highly successful teams.
Interact in a constant, but structured fashion
While interaction is really important, some balance need to be achieved with interrupting ongoing work. Agile is not about “hey, we can change all at any point”. I know, the name is a little misleading. It about knowing that you are going to change stuff, and deciding what at certain points. One of the details that I found out more efficient in that are sprints and stand up meetings. Both are tools to structure the conversation while providing constant feedback.
Stand up meetings
Doing stand up meetings in the best way is more difficult that it looks. It needs to be kept very simple, fast, and with as little noise as possible. Basically each participant need to say, keeping it simple: What did I do yesterday? What I am going to do today? Are there any problems in the way? And listen carefully to the rest of the people to be aware of the work being done on the team, and to see if you can give support to anyone (during the meeting or later, if more than a few minutes are necessary). Being actually standing up and in a different room helps, as you tend to keep things up to the point, eliminating distractions. One interesting part is not the meeting itself, but the time that previously is used to structure your mind into what you’re going to say. That helps a lot in planning and in keeping you focused, as everyday you’ll have to explain what have you done. Focus is paramount in software development. Another important thing is to keep that as a strong habit. There are always moments when it looks like no one is saying something new, and it feel as redundant. If the meeting is all about the same things for a long time, then is clearly a problem (your tasks are not as small as they should). But the benefits in terms of constant feedback and focus are not achieved until some time. I also think that managers/product owners should be present on the meeting, and probably participate actively. Remember, it is part of the learning process and it’s a very good opportunity to show how the team is working and what are the management tasks. A proper star up meeting reduces the need to interrupt the work of the day with typical “what are you working on?” questions, and detects very fast problems and blockers. It structures communication to channel it.
While the standup meetings structures the communication between the team, Sprints structure the communication between the team and “the external world”. The idea of the sprint is to produce something that can be shown, and ideally works, even if it’s small and not complete. That gives feedback and then the goal for the next sprint can be decided. While having a simple objective for a sprint can be good, I don’t think is necessary. A simple grouping of tasks with no particular meaning together may well be the objective. I don’t like the word “Sprint”. I think is not fortunate, as it will not give the idea a long term race, but a strong burst in effort. Software projects are long, and teams should find a comfortable development speed, or the quality of the result will suffer. I’d prefer something like “stage” (using bicycle race metaphor), because the main idea is that they should have a sustainable pace. The sprint objective is not a contract signed by blood of your firstborn. It is a reasonable objective that the team honestly thinks can be achieved. Let me get back to couple of ideas there. What can be achieved on a sprint is something that can only be decided by the team. They are the only ones with the knowledge of what is possible and what is not. Estimations are that, assumptions. A typical problem in software development is unrealistic deadlines by management, that will be used as weapons against the developers, and the natural response from developers is to produce “safe” estimations, much bigger that they should, so they can be protected. This is not a good thing, and it is a reflection of distrust. The way to overcome that is to treat estimations as that, and try to improve them, without punishment for mistakes. Let me rephrase it: Estimations will be wrong. Often. The objective should be to improve them, and to not be wrong by a huge margin. Dividing work in small tasks will help, but that’s an inherent problem.
Make it work
I consider myself a pragmatist. At the end, everything should be done not for the sake of it, but because it helps towards an end. As with a lot of good ideas, people has fallen too often in “Agile cargo cult”, just following processes without thinking why, or without analysing who are the people on charge of doing it. Again, analysing and learning what’s going on is important, to enable the best concepts applied to a particular team.
As there seems to be a lot of misconceptions about what Big Data, there are also not really a good baseline to know “how much is high load”, specially from the point of view of people with not that much experience in servers. If you have some experience dealing with servers, you will probably know all this. So, just for the sake of convenience, I am going to do some back-of-the-envelope calculations to try to set a few numbers and explain how to calculate how many requests per second a server can hold.
We are going to use RPS (requests per second) as the metric. This measures the throughput, which is typically the most important measure. There are other parameters that can be interesting (latency) depending on the application, but in a typical application, throughput is the main metric.
Those requests can be pure HTTP requests (getting a URL from a web server), or can be other kind of server requests. Database queries, fetch the mail, bank transactions, etc. The principles are the same.
I/O bound or CPU bound
There are two type of requests, I/O bound and CPU bound.
Typically, requests are limited by I/O. That means that it fetches the info from a database, or reads a file, or gets the info from network. CPU is doing nothing most of the time. Due the wonders of the Operative System, you can create multiple workers that will keep doing requests while other workers wait. In this case, the server is limited by the amount or workers it has running. That means RAM memory. More memory, more workers.
In memory bound systems, getting the number of RPS is making the following calculation:
RPS = (memory / worker memory) * (1 / Task time)
Some other requests, like image processing or doing calculations, are CPU bound. That means that the limiting factor in the amount of CPU power the machine has. Having a lot of workers does not help, as only one can work at the same time per core. Two cores means two workers can run at the same time. The limit here is CPU power and number of cores. More cores, more workers.
In CPU bound systems, getting the number of RPS is making the following calculation:
RPS = Num. cores * (1 /Task time)
Of course, those are ideal numbers. Servers need time and memory to run other processes, not only workers. And, of course, they can be errors. But there are good numbers to check and keep in mind.
Calculating the load of a system
If we don’t know the load a system is going to face, we’ll have to make an educated guess. The most important number is the sustained peak. That means the maximum number of requests that are going to arrive at any second during a sustained period of time. That’s the breaking point of the server.
That can depend a lot on the service, but typically services follow a pattern with ups and downs. During the night the load decreases, and during day it increases up to certain point, stays there, and then goes down again. Assuming that we don’t have any idea how the load is going to be, just assume that all the expected requests in a day are going to be done in 4 hours. Unless load is very very spiky, it’ll probably be a safe bet.
For example,1 million requests means 70 RPS. 100 million requests mean 7,000 RPS. A regular server can process a lot of requests during a whole day.
That’s assuming that the load can be calculated in number of requests. Other times is better to try to estimate the number of requests a user will generate, and then move from the number of users. E.g. A user will make 5 requests in a session. With 1 Million users in 4 hours, that means around 350 RPS at peak. If the same users make 50 requests per sessions, that’s 3,500 RPS at peak.
A typical load for a server
This two numbers should only be user per reference, but, in my experience, I found out that are numbers good to have on my head. This is just to get an idea, and everything should be measured. But just as rule of thumb.
1,000 RPS is not difficult to achieve on a normal server for a regular service.
2,000 RPS is a decent amount of load for a normal server for a regular service.
More than 2K either need big servers, lightweight services, not-obvious optimisations, etc (or it means you’re awesome!). Less than 1K seems low for a server doing typical work (this means a request that is simple and not doing a lot of work) these days.
Again, this are just my personal “measures”, that depends on a lot of factors, but are useful to keep in mind when checking if there’s a problem or the servers can be pushed a little more.
1 – Small detail, async systems work a little different than this, so they can be faster in a purely I/O bound system. That’s one of the reasons why new async frameworks seems to get traction. They are really good for I/O bound operations. And most of the operations these days are I/O bound.
There has been some discussion about the so-called Rockstar Programmer. You know, that awesome engineer (also called 10x engineer) that can produce what 10 other, average engineers can.
This post by Scott Hanselmanfueled some discussion on Hacker News. What has been overseen about the original post is that he advocates about 10x teams.
That resonates a lot, because I think that we should agree that, while there is people with potential to be ninja programmers, that’s not something that can be achieved without the proper care on the environment.
A good team is one that reinforces the good points of their members while hiding away (or at least mitigating) their weaknesses. It makes not sense to talk about a guru engineer that can come in a parachute in a project (any project), replacing an average Joe on a 10 members team, and simply double their output! And, after a while, she’s probably take her umbrella and fly away to the sunset (to a better payed work, one can only imagine)
Real life just doesn’t work like that. Everything is a little more complex. A toxic team or project can be beyond salvation. A regular programmer can achieve a lot just by giving some motivation and direction. A great engineer can be disastrous working in a particular area.
Do you want to see someone transformed from an x programmer to a Nx programmer? Just take a look on the same engineer the first day in a new job and then again after a whole year. The first day she’ll have to ask lots of questions. After a while, she’ll be committing patches, and, later, she’ll reach a cruise speed much much faster than the first couple of days. Or… maybe she is an x programmer, and during the first days she was a x/N programmer. Mmmm….
I also like how Haselman he approaches the subject talking about “titles” and “loud programmers”. The Rockstar Engineer idea is more a recruitment-marketing issue. It is used to hype possible hires: “Hey, we are a Rockstar company and we are looking for Ninja developers. Maybe you’re an Stellar Programmer”. It has been so used that it doesn’t mean anything anymore. I’ve even seen a job offer asking for a Python Admiral. It is currently more a way of signalling spam than any other thing. 
But there is still the myth of “the 10x programmer”, not as a way of describing that there are obviously people more productive than other (and who can reach high notes), but taking for granted that is mostly a characteristic of the programmer itself, while the truly stelar results are achieved mostly when the environment is the adequate. A lot of the great results in a team is not a magical increase in productivity by some gifted individual, but more the constant improvements in the good directions or good vision to focus in what’s truly relevant. A single good developer can move quite fast in the good direction not because she’s wildly more productive but because she has a clear view and focus.
Because an average programmer can be at least a 5x programmer when the proper details fall in place, and a great developer can be a 1/5x programmer in the wrong place.